Projects
- pullword: Unsupervised Word Discovery. Pullword is intended for settings where only unlabelled text data is available.
- FMR: Functional Meaning Representation. A formal language for representing meaning and a system for semantic parsing.
- gocc: Golang version of OpenCC (Traditional-Simplified Chinese conversion). gocc is a Golang port of OpenCC ([Open Chinese Convert 開放中文轉換](https://github.com/BYVoid/OpenCC/)), a project for converting between Traditional and Simplified Chinese developed by [BYVoid](https://www.byvoid.com/).
- ling: A Natural Language Processing toolkit in Golang. Tokenization, normalization, lemmatization, tagging, etc.
- Goobot: A general multilingual web article extractor. It works without rules or training, like diffbot.com, and is more than 10 times faster than Diffbot.
- Knowledge extraction from web pages. Extraction of worldwide phone number information from a web page corpus.
- Web Crawling at Scale. A flexible, high-performance distributed crawler framework.