Text2Structure - Structure Extraction Language
(
(Spot Name: Info Span
(Asso Name: Info Span)
(Asso Name: Info Span)
)
)
# Structure extrating language (SEL) for Universal IE
Spot Name
represents there is a specific information piece with the type of spot name existing in the source text.Asso Name
indicates there exists a specific information piece in the source text that is with the AssoName association to its upper-level Spotted information in the structure.Info Span
represents the text span corresponding to the specific spotting or associating information piece in the source text.
Following is an example:
(
(person: Steve
(work for: Apple)
)
(start-position: became
(employee: Steve)
(employer: Apple)
(title: CEO)
(time: 1997)
)
(orgnization: Apple)
(time: 1997)
)
# The SEL representation for "Steve became CEO of Apple in 1997."
Prompt paradigm
feature engineering -> neural network architechure engineering -> fine tuning -> prompt engineering
How to choose prompt
Different prompts have different zero-shot or few-shot capbilities. For example, the prompt for extrating person name could be:
- Which people are contained in the original text?
- Who are in the text?
- What are the names?
UIE对Prompt的统一:UIE通过大量数据训练固定了Prompt的构造方式,就是条件+抽取标签
,省去了传统Prompt选择太多的问题。
Prompt和原文越相似,效果越好。
Conclusion
UIE可以统一建模不同信息抽取任务,按需自适应地生成目标抽取结构,并从不同的知识来源统一学习通用信息抽取能力。
References
- Yaojie Lu, etc, from CAS, Baidu and BAAI. Unified Structure Generation for Universal Information Extraction., ACL 2022.
- PaddleNLP - UIE. 通用信息抽取 UIE(Universal Information Extraction)
- 通用信息抽取技术与产业应用实战, 2022.5.21
- 《UIE:基于统一结构生成的通用信息抽取》-韩先培, 2022.7.22
- https://github.com/universal-ie/UIE