Before ChatGPT: my stint on a large model

A look back at the days before ChatGPT, when I was working on large language models.

We got our start in partnership with the Beijing Academy of Artificial Intelligence (BAAI). GPT-3 had just come out, and Beijing quickly launched a big project to build a large model of its own. The main groups involved were Jie Tang and Maosong Sun’s teams at Tsinghua, and Ji-Rong Wen’s team at Renmin University. There were four tracks in all: a Chinese pretrained language model, a knowledge-infused model, a multimodal model, and a protein-sequence model.

Professor Wen led the multimodal track, which later produced the WuDao·WenLan model. Within it, I ran an innovation center, launched in mid-2020, with a concrete goal: to build a personal intelligent information assistant. We grounded the work in intelligent information retrieval and mining, built up core technology for intelligent search, question answering, and dialogue systems, and pushed the results toward a product. One of the main target settings was government services, so that ordinary people could look up information and handle civic tasks with less friction. It looks like an early, tentative attempt now, but at the time it was a genuinely hard problem.

WIRED ran a piece on the project back then, and it touched on our line of work too:

“This is a big project,” Wen says with a big grin. “It takes a lot of computing infrastructure and money.” … Wen says his language system could serve as an intelligent assistant to help citizens perform civic tasks online … Zhanliang Liu, project lead for the effort and previously an engineer at Baidu … says his team has built a prototype [for one such government service]. “It is a really tough challenge,” he says.