pullword: Unsupervised Word Discovery

Pullword is intended for settings where only unlabelled text data is available.

FMR: Functional Meaning Representation

A formal language for representing meaning and a system for semantic parsing.

gocc: Golang version of OpenCC (Traditional-Simplified Chinese conversion)

gocc is a Golang port of OpenCC (Open Chinese Convert 開放中文轉換), a project developed by BYVoid for converting between Traditional and Simplified Chinese.

ling: A Natural Language Processing toolkit in Golang

Tokenization, Normalization, Lemmatization, Tagging, etc.

Goobot: A general multilingual web article extractor

Goobot is a general multilingual web article extractor. It works without rules or training, and it is more than 10 times faster than diffbot.

Knowledge extraction from web pages


Web Crawling at Scale

A flexible and high performance distributed crawler framework.

Vector search engines, or vector databases, are a core piece of infrastructure that fuels every big deep learning deployment in industry.


A new paper from Amazon Alexa AI, "Language Model Is All You Need", treats the natural language understanding (NLU) problem as a question answering (QA) problem. The original paper is on arXiv.


I came across a very worthwhile article a few days ago. It summarizes a Twitter thread that covers meaning, semantics, language models, learning Thai and Java, entailment, and co-reference, all in one fascinating thread. The original article is here.


Filtered Space-Saving (FSS) is a data structure and algorithm combination for accurately estimating the top k most frequent values appearing in a stream while using a constant, minimal memory footprint.
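
To make the idea concrete, here is a minimal sketch of an FSS-style counter in Go. It is a simplified illustration, not a reference implementation. The names `FSS`, `Item`, `NewFSS`, `Add`, and `Top` are all hypothetical, the minimum is found with a linear scan for brevity, and the per-cell count of monitored values used in the published algorithm is omitted because a map lookup plays that role here. The sketch assumes `cells >= 1` and `k >= 1`.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// Item is one monitored value with its estimated count and error bound.
type Item struct {
	Value string
	Count int // estimated frequency (never an underestimate)
	Err   int // maximum possible overestimation included in Count
}

// FSS is a simplified Filtered Space-Saving sketch (hypothetical type).
type FSS struct {
	alpha []int            // per-cell filter: bound on hits from unmonitored values
	items map[string]*Item // monitored values, at most k of them
	k     int
}

// NewFSS creates a sketch with the given number of filter cells that
// monitors at most k values. Assumes cells >= 1 and k >= 1.
func NewFSS(cells, k int) *FSS {
	return &FSS{alpha: make([]int, cells), items: make(map[string]*Item), k: k}
}

// cell hashes a value to its filter cell.
func (s *FSS) cell(v string) int {
	h := fnv.New32a()
	h.Write([]byte(v))
	return int(h.Sum32() % uint32(len(s.alpha)))
}

// lowest returns the monitored item with the smallest count (linear scan).
func (s *FSS) lowest() *Item {
	var low *Item
	for _, it := range s.items {
		if low == nil || it.Count < low.Count {
			low = it
		}
	}
	return low
}

// Add records one occurrence of v.
func (s *FSS) Add(v string) {
	if it, ok := s.items[v]; ok {
		it.Count++ // already monitored: just count it
		return
	}
	i := s.cell(v)
	a := s.alpha[i] // read the bound before any eviction can change it
	if len(s.items) >= s.k {
		low := s.lowest()
		if a+1 < low.Count {
			s.alpha[i]++ // filtered out: v cannot yet beat the current minimum
			return
		}
		// Evict the minimum and remember its count in its own filter cell.
		delete(s.items, low.Value)
		s.alpha[s.cell(low.Value)] = low.Count
	}
	s.items[v] = &Item{Value: v, Count: a + 1, Err: a}
}

// Top returns the monitored items, most frequent first.
func (s *FSS) Top() []Item {
	out := make([]Item, 0, len(s.items))
	for _, it := range s.items {
		out = append(out, *it)
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Count != out[j].Count {
			return out[i].Count > out[j].Count
		}
		return out[i].Value < out[j].Value
	})
	return out
}

func main() {
	s := NewFSS(16, 2)
	for _, v := range []string{"a", "a", "a", "a", "a", "b", "b", "b", "c"} {
		s.Add(v)
	}
	for _, it := range s.Top() {
		fmt.Printf("%s count=%d err=%d\n", it.Value, it.Count, it.Err)
	}
}
```

Memory stays fixed at `cells` integers plus at most `k` monitored items, however long the stream runs. In the small stream above, the single "c" is absorbed by the filter and never displaces "b", which is the behavior the filter is meant to provide.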


Gödel’s incompleteness theorems are two theorems of mathematical logic that demonstrate the inherent limitations of every formal axiomatic system containing basic arithmetic. These results, published by Kurt Gödel in 1931, are important both in mathematical logic and in the philosophy of mathematics.


Nov 19, 2019 3:00 PM
Sep 21, 2019 11:40 AM
Jun 17, 2017 2:00 PM
Nov 26, 2015 9:00 AM