Projects

pullword: Unsupervised Word Discovery

Pullword is intended for settings where only unlabelled text data is available.

FMR: functional meaning representation

A formal language for representing meaning and a system for semantic parsing.

gocc: Golang version of OpenCC (Traditional/Simplified Chinese conversion)

gocc is a Golang port of OpenCC (Open Chinese Convert, 開放中文轉換), a project developed by BYVoid for converting between Traditional and Simplified Chinese.

ling: A Natural Language Processing toolkit in Golang

Tokenization, normalization, lemmatization, tagging, etc.

Goobot: A general multilingual web article extractor

Goobot works without rules or training, just like diffbot.com, and it is more than 10 times faster than diffbot.

Knowledge extraction from web pages

Extraction of worldwide phone number information from a web page repository.

Web Crawling at Scale

A flexible, high-performance distributed crawler framework.

Recent Posts

UIE is a unified text-to-structure generation framework that can universally model different IE tasks, adaptively generate targeted structures, and learn general IE abilities from different knowledge sources in a unified way.

Vector search engines, or vector databases, are a core piece of infrastructure that fuels every big deep learning deployment in industry.

A new paper from Amazon Alexa AI, Language Model Is All You Need, treats the NLU problem as a QA problem. The original paper is on arXiv.

I found a very worthwhile article while surfing medium.com a few days ago. It summarizes a Twitter thread that covered meaning, semantics, language models, learning Thai and Java, entailment, and co-reference, all in one fascinating thread. The original article is here.

Filtered Space-Saving (FSS) is a data structure and algorithm combination useful for accurately estimating the top k most frequent values appearing in a stream while using a constant, minimal memory footprint.

Recent & Upcoming Talks

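Natural Language Processing and Industry Applications
Nov 19, 2019 3:00 PM
Applications of Semantic Parsing in Intelligent Search and Industry AI Deployment
Sep 21, 2019 11:40 AM
The Past, Present, and Future of Artificial Intelligence
Jun 17, 2017 2:00 PM
"Big Data" Methodology and Examples
Nov 26, 2015 9:00 AM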

Contact