Information Extraction

UIE - Universal Information Extraction

UIE is a unified text-to-structure generation framework that can universally model different IE tasks, adaptively generate targeted structures, and learn general IE abilities from different knowledge sources in a unified way.
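To make the text-to-structure idea concrete, here is a minimal sketch of how results from different IE tasks could share one nested target structure that is then written out as a single string. The `Record` class, the `linearize` function, and the bracketed output format are hypothetical names and conventions chosen for illustration; they are not taken from the UIE code base and may differ from the structure language UIE actually generates.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One extracted item (hypothetical structure, for illustration only)."""
    label: str  # an entity type, relation name, or event role
    span: str   # the text span copied from the input sentence
    children: list["Record"] = field(default_factory=list)


def linearize(record: Record) -> str:
    """Write a record and its nested children as one bracketed string."""
    inner = "".join(" " + linearize(child) for child in record.children)
    return f"({record.label}: {record.span}{inner})"


# An entity-only result and an entity-with-relation result share one format.
entity = Record("organization", "Apple")
relation = Record("person", "Steve", [Record("work for", "Apple")])

print(linearize(entity))
print(linearize(relation))
```

Because both outputs are plain strings in the same format, a single sequence-generation model could in principle be trained to emit either one, which is the sense in which one framework can cover several IE tasks.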

Goobot: A general multilingual web article extractor

Goobot is a general multilingual web article extractor. Like diffbot.com, it works without rules or training, and it is more than 10 times faster than Diffbot.

Knowledge extraction from web pages

Global phone number information extraction based on a web page repository.
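As a rough illustration of pulling phone numbers out of page text, the sketch below scans for international-style numbers and normalizes them. The regular expression, the `extract_phone_numbers` function, and the 8 to 15 digit length filter are assumptions made for this example; the approach actually used in the project is not described here, and a production system would need country-specific rules and validation.

```python
import re

# Hypothetical pattern: a leading "+", a 1-3 digit country code, then 2-4
# groups of 2-4 digits, each optionally preceded by a space, dot, or hyphen.
PHONE_RE = re.compile(r"\+\d{1,3}(?:[ .-]?\d{2,4}){2,4}")


def extract_phone_numbers(page_text: str) -> list[str]:
    """Return unique candidate numbers as '+' followed by digits only."""
    numbers: list[str] = []
    for match in PHONE_RE.finditer(page_text):
        digits = re.sub(r"\D", "", match.group())
        # Keep only plausible lengths (E.164 allows at most 15 digits).
        if 8 <= len(digits) <= 15:
            normalized = "+" + digits
            if normalized not in numbers:
                numbers.append(normalized)
    return numbers


sample = "Call +1 415-555-0132 or +44 20 7946 0958. Fax: +1 415-555-0132"
print(extract_phone_numbers(sample))
```

Normalizing to digits before deduplicating means the same number written with different separators on different pages is counted once.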

Web Crawling at Scale

A flexible, high-performance distributed crawler framework.