Other NLP tools
- Stanford CoreNLP (Java)
- spaCy (Python)
- lingo (Go)
Multilingual text tokenization
Text normalization
Lemmatization
Stemming (词干提取) and lemmatization (词形还原)
- Stemming and lemmatization
- Lemmatization Lists: datasets by MBM
- The UniMorph Project
- Traditional/Simplified Chinese conversion
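
A dictionary-based lemmatizer built on a word list can be sketched roughly as follows. This is a minimal illustration, not the method of any project listed above: the `lemma<TAB>form` column order is an assumption about the list format, and `loadLemmas` and `lemmatize` are hypothetical helper names.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// loadLemmas reads "lemma<TAB>form" lines into a form -> lemma map.
// The column order is an assumption; swap the indices if a list differs.
func loadLemmas(data string) map[string]string {
	lemmas := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(data))
	for sc.Scan() {
		parts := strings.Split(sc.Text(), "\t")
		if len(parts) != 2 {
			continue // skip blank or malformed lines
		}
		lemmas[strings.ToLower(parts[1])] = parts[0]
	}
	return lemmas
}

// lemmatize returns the lemma for a word, matching case-insensitively.
// Words missing from the list are returned unchanged.
func lemmatize(lemmas map[string]string, word string) string {
	if lemma, ok := lemmas[strings.ToLower(word)]; ok {
		return lemma
	}
	return word
}

func main() {
	// Inline sample data standing in for a real list file.
	sample := "go\twent\ngo\tgoes\nmouse\tmice\n"
	lemmas := loadLemmas(sample)
	for _, w := range []string{"went", "Mice", "cat"} {
		fmt.Println(w, "->", lemmatize(lemmas, w))
	}
}
```

A lookup table like this handles irregular forms that suffix-stripping stemmers miss, but it cannot lemmatize words absent from the list.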
Tagging
- Regex tagger
- commonregex, a collection of common regular expressions for Go.
- xurls, a Go package of regular expressions for extracting URLs.
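
As a rough illustration of the regex tagger idea, the sketch below assigns a tag to each whitespace-separated token using only Go's standard `regexp` package. The tag names, the patterns, and the `tagTokens` helper are all assumptions made for this example; they are not taken from commonregex or xurls, which ship far more thorough patterns.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// rule pairs a tag with the pattern that assigns it.
// Tags and patterns are illustrative assumptions.
type rule struct {
	tag     string
	pattern *regexp.Regexp
}

// Rules are tried in order; the first pattern that matches the
// whole token wins.
var rules = []rule{
	{"URL", regexp.MustCompile(`^https?://\S+$`)},
	{"DATE", regexp.MustCompile(`^\d{4}-\d{2}-\d{2}$`)},
	{"NUM", regexp.MustCompile(`^\d+(\.\d+)?$`)},
}

// tagged is a token together with its assigned tag.
type tagged struct {
	Token string
	Tag   string
}

// tagTokens splits text on whitespace and tags each token,
// falling back to "WORD" when no rule matches.
func tagTokens(text string) []tagged {
	var out []tagged
	for _, tok := range strings.Fields(text) {
		tag := "WORD"
		for _, r := range rules {
			if r.pattern.MatchString(tok) {
				tag = r.tag
				break
			}
		}
		out = append(out, tagged{Token: tok, Tag: tag})
	}
	return out
}

func main() {
	for _, t := range tagTokens("Visit https://example.com on 2024-01-15 with 42 friends") {
		fmt.Printf("%s/%s\n", t.Token, t.Tag)
	}
}
```

Rule order matters here: more specific patterns such as `DATE` should come before more general ones such as `NUM`.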