Goobot: A general multilingual web article extractor
Goobot is a general multilingual web article extractor. Like diffbot.com, it works without rules or training, and it is more than 10 times faster than Diffbot.
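Goobot's actual algorithm is not described here. As an illustration only of how article extraction can work without site-specific rules or a trained model, the sketch below applies a generic text-length and link-density heuristic using Python's standard `html.parser`. The names `BlockCollector` and `extract_article`, and the threshold values, are assumptions made for this example and are not part of Goobot.

```python
from html.parser import HTMLParser

# Tags whose boundaries split the page into candidate text blocks.
BLOCK_TAGS = {"p", "div", "article", "section", "td", "li",
              "h1", "h2", "h3", "blockquote", "pre"}
# Tags whose content is never visible article text.
SKIP_TAGS = {"script", "style", "noscript"}


class BlockCollector(HTMLParser):
    """Collects (text, link_char_count) pairs, one per block-level chunk."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.blocks = []
        self._buf = []
        self._link_chars = 0
        self._in_link = 0
        self._skip = 0

    def flush(self):
        # Normalize whitespace and store the block if it has any text.
        text = " ".join("".join(self._buf).split())
        if text:
            self.blocks.append((text, self._link_chars))
        self._buf = []
        self._link_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip += 1
        elif tag == "a":
            self._in_link += 1
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip = max(0, self._skip - 1)
        elif tag == "a":
            self._in_link = max(0, self._in_link - 1)
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_data(self, data):
        if self._skip:
            return
        self._buf.append(data)
        if self._in_link:
            self._link_chars += len(data.strip())


def extract_article(html, min_chars=25, max_link_density=0.3):
    """Keep blocks that are long enough and are not mostly link text.

    Lengths are counted in characters rather than words, so the heuristic
    does not depend on whitespace-delimited languages.
    """
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    parser.flush()
    kept = [
        text
        for text, link_chars in parser.blocks
        if len(text) >= min_chars and link_chars / len(text) <= max_link_density
    ]
    return "\n\n".join(kept)
```

Calling `extract_article(page_html)` returns the retained blocks joined by blank lines. Navigation bars and "read more" teasers tend to be dropped because almost all of their characters sit inside links, while body paragraphs pass both thresholds. Real pages would likely need tuned thresholds.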
Knowledge extraction from web pages
Web Crawling at Scale
A flexible, high-performance distributed crawler framework.