Information Extraction

Goobot: A general multilingual web article extractor

Goobot is a general-purpose multilingual web article extractor. Like diffbot.com, it works without rules or training, and it runs more than 10 times faster than Diffbot.

Knowledge extraction from web pages

Extraction of worldwide phone number information from a web page repository

Web Crawling at Scale

A flexible, high-performance distributed crawler framework.