UIE - a unified text-to-structure generation framework, which can universally model different IE tasks, adaptively generate targeted structures, and learn general IE abilities in a unified way from different knowledge sources.
Zhanliang Liu is a hands-on research and engineering manager who builds complex systems and products, with experience in both mature and startup companies.
He is currently in charge of the Innovation Center of Intelligent Information Processing Technology at Beijing Academy of Artificial Intelligence (BAAI) and is the Vice President at Elensdata. Before that, he was a principal engineer and senior manager at Baidu. Before joining Baidu, he was the manager of the Sogou Haomatong R&D team. Prior to Sogou, he was a researcher at Tencent working on Web Crawling, IR, NLP and Data Mining. Before that, he was an RSDE in the Web Search and Mining Group at Microsoft Research Asia, with research interests in IR and Web Search Architecture. He was also one of the founding team members of Hitchsters.com (named one of Time Magazine’s 50 Top Websites for 2007) and Initialview.com.
MEng in Computer Science, 2007
Tianjin University
BSc in Computer Science, 2005
Tianjin University
Pullword is intended for settings where only unlabelled text data is available.
A formal language for representing meaning and a system for semantic parsing.
gocc is a Go port of OpenCC (Open Chinese Convert 開放中文轉換), a project developed by BYVoid for conversion between Traditional and Simplified Chinese.
Tokenization, Normalization, Lemmatization, Tagging, etc.
Goobot is a general multilingual web article extractor. Like diffbot.com, it works without rules or training, and it is more than 10 times faster than diffbot.
Global phone number information extraction based on a web page repository.
A flexible and high-performance distributed crawler framework.
Vector search engines or vector databases are a core piece of infrastructure that fuels every big deep learning deployment in industry.
A new paper from Amazon Alexa AI, Language Model Is All You Need, treats the NLU problem as a QA problem. The original paper is on arXiv.
I found a very worthwhile article while surfing medium.com a few days ago. The article is a summary of a Twitter thread that talked about meaning, semantics, language models, learning Thai and Java, entailment, and co-reference, all in one fascinating thread. The original article is here.
Filtered-Space Saving (FSS) is a data structure and algorithm combination useful for accurately estimating the top k most frequent values appearing in a stream while using a constant, minimal memory footprint.
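To make the idea concrete, here is a minimal Python sketch in the spirit of FSS, not a reference implementation. It keeps at most `k` monitored items plus a small hashed filter of per-cell error bounds, and admits a new item only when its filter cell suggests it could compete with the weakest monitored item. The class name `FilteredSpaceSaving`, the parameters `k` and `filter_size`, the CRC32-based cell hash, and the linear scan for the minimum are all assumptions made for illustration. The published algorithm also keeps a per-cell count of monitored items as a lookup shortcut, which is omitted here because a dictionary lookup plays that role.

```python
import zlib


class FilteredSpaceSaving:
    """Simplified top-k frequency estimator with a hashed admission filter."""

    def __init__(self, k, filter_size):
        self.k = k                      # maximum number of monitored items
        self.filter_size = filter_size  # number of cells in the filter
        self.alpha = [0] * filter_size  # per-cell bound on unmonitored hits
        self.counts = {}                # item -> [estimate, error]

    def _cell(self, item):
        # Deterministic hash (an assumption; any uniform hash would do).
        return zlib.crc32(repr(item).encode("utf-8")) % self.filter_size

    def add(self, item):
        entry = self.counts.get(item)
        if entry is not None:
            entry[0] += 1               # already monitored: count the hit
            return

        cell = self._cell(item)
        bound = self.alpha[cell]        # upper bound on hits missed so far

        if len(self.counts) < self.k:
            self.counts[item] = [bound + 1, bound]
            return

        # Linear scan for the weakest monitored item (kept simple on purpose).
        victim, (min_count, _) = min(self.counts.items(), key=lambda kv: kv[1][0])

        if bound + 1 >= min_count:
            # The newcomer may be as frequent as the weakest item: swap them,
            # and remember the victim's estimate in its filter cell.
            del self.counts[victim]
            self.alpha[self._cell(victim)] = min_count
            self.counts[item] = [bound + 1, bound]
        else:
            # Too rare to monitor: only the filter cell absorbs the hit.
            self.alpha[cell] += 1

    def top(self, n):
        """Return up to n (item, estimate, error) tuples, largest first."""
        ranked = sorted(self.counts.items(), key=lambda kv: kv[1][0], reverse=True)
        return [(item, est, err) for item, (est, err) in ranked[:n]]
```

Feeding a stream one item at a time through `add` and then calling `top(n)` yields estimates that never undercount a monitored item, while `estimate - error` never overcounts it. Memory stays fixed at `k` entries plus `filter_size` integers regardless of stream length.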