Lexical Semantics
WordNet
WordNet is a large lexical database of English: a database of facts about words.
Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept.
Relations
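The most frequently encoded relation among synsets is the super-subordinate relation (also called hyperonymy, hyponymy, or the ISA relation). A hypernym is the more general (parent) term and a hyponym is the more specific one.
Other relations include meronymy (has-part) and holonymy (part-of). For example, tail is a meronym of dog, and dog is a holonym of tail.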
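The ISA and has-part relations can be pictured as links in a small graph. The sketch below uses a hand-written toy table rather than real WordNet data; the entries and the helper names `hypernym_chain` and `holonyms` are assumptions made for illustration only.

```python
# Toy relation tables: hand-written for illustration, not real WordNet data.
HYPERNYM = {            # child -> parent (ISA relation)
    "poodle": "dog",
    "dog": "canine",
    "canine": "mammal",
    "mammal": "animal",
}
PART_MERONYMS = {       # whole -> its parts (has-part relation)
    "dog": ["tail", "paw"],
}


def hypernym_chain(word):
    """Follow ISA links upward and return every ancestor, nearest first."""
    chain = []
    while word in HYPERNYM:
        word = HYPERNYM[word]
        chain.append(word)
    return chain


def holonyms(part):
    """Invert the has-part table: return the wholes that `part` belongs to."""
    return [whole for whole, parts in PART_MERONYMS.items() if part in parts]


print(hypernym_chain("poodle"))  # ['dog', 'canine', 'mammal', 'animal']
print(holonyms("tail"))          # ['dog']
```

Storing only the child-to-parent and whole-to-part directions keeps the tables small; the inverse relations (hyponyms, holonyms) are recovered by scanning, which is fine at toy scale.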
Ontology
In philosophy, ontology categorizes everything in the world and studies the nature of what exists.
Next, some well-known lexicons.
Sentiment Lexicon
LIWC
Linguistic Inquiry and Word Count
http://liwc.wpengine.com/compare-dictionaries/
Often used for positive and negative emotion words in opinion mining, i.e. to judge whether a word has a positive or negative connotation.
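A lexicon of this kind is typically applied by counting how many words of a text fall into each category. The following is a minimal sketch of that idea, assuming a tiny hypothetical word list; it does not reproduce the actual LIWC dictionaries or software.

```python
import re

# Hypothetical mini-lexicon for illustration; not the LIWC categories.
POSITIVE = {"good", "great", "love", "happy"}
NEGATIVE = {"bad", "awful", "hate", "sad"}


def polarity_counts(text):
    """Return (positive_hits, negative_hits) for the lowercase word tokens of `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return pos, neg


def polarity(text):
    """Label the text by whichever category has more hits; ties are neutral."""
    pos, neg = polarity_counts(text)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"


review = "I love this phone, the screen is great but the battery is bad"
print(polarity_counts(review))  # (2, 1)
print(polarity(review))         # positive
```

Plain counting ignores negation ("not good") and intensity, which is one reason graded lexicons are also used.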
ANEW
Affective Norms for English Words
Participants gave graded reactions from 1-9 on three dimensions:
• Good/bad, psychological valence
• Active/passive, arousal
• Strong/weak, dominance
Words receive graded scores rather than binary labels.
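One common way to use graded ratings is to average each dimension over the words of a text. The sketch below assumes this averaging scheme; the numbers are placeholders invented for the example and are not actual ANEW norms, and `mean_ratings` is a hypothetical helper.

```python
# Placeholder ratings on the 1-9 scale, invented for this sketch (NOT real ANEW values).
# word -> (valence, arousal, dominance)
RATINGS = {
    "happy": (8.0, 6.0, 7.0),
    "calm": (7.0, 2.0, 6.0),
    "afraid": (2.0, 7.0, 3.0),
}


def mean_ratings(tokens):
    """Average each dimension over the tokens found in the lexicon.

    Returns None when no token is covered, so callers can tell
    "no evidence" apart from a mid-scale score.
    """
    hits = [RATINGS[t] for t in tokens if t in RATINGS]
    if not hits:
        return None
    return tuple(sum(dim) / len(hits) for dim in zip(*hits))


print(mean_ratings(["happy", "and", "calm"]))  # (7.5, 4.0, 6.5)
```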
https://csea.phhp.ufl.edu/Media.html
Word Sense Disambiguation (WSD)
How do people disambiguate polysemous words? Local context, domain knowledge, frequency data.
Lesk Algorithm
Measure overlap between sense definitions of a word and current context
- Identify the correct sense for one word at a time
- Current context is the set of words in the surrounding sentence/paragraph/document.
Take all the definitions (senses) of a word, compute each one's overlap with the context, and pick the sense with the highest overlap.
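The steps above can be sketched as a simplified Lesk procedure. The glosses, sense labels and stopword list below are hypothetical and written only for this example; a real system would read definitions from a dictionary such as WordNet.

```python
import re

# Hypothetical glosses for illustration; not copied from any dictionary.
SENSES = {
    "bank": {
        "bank.financial": "an institution that accepts deposits of money and makes loans",
        "bank.river": "sloping land beside a body of water such as a river",
    }
}
# Small assumed stopword list so that function words do not count as overlap.
STOPWORDS = {"a", "an", "the", "of", "and", "that", "as", "such", "to", "in", "at", "i", "is"}


def content_words(text):
    """Lowercase word tokens with stopwords removed, as a set."""
    return {t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS}


def simplified_lesk(word, context):
    """Return (best_sense, overlap): the sense whose gloss shares most words with the context."""
    ctx = content_words(context) - {word}   # the target word itself is not evidence
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(content_words(gloss) & ctx)
        if overlap > best_overlap:          # strict '>' keeps the first sense on ties
            best_sense, best_overlap = sense, overlap
    return best_sense, best_overlap


print(simplified_lesk("bank", "I went to the bank to deposit money and ask about loans"))
# ('bank.financial', 2)
print(simplified_lesk("bank", "We sat on the bank of the river and watched the water"))
# ('bank.river', 2)
```

Exact string matching misses related forms ("deposit" vs. "deposits"), so practical variants usually add stemming or lemmatization before counting.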
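Use a hand-labeled dataset as the training set, derive features from it and apply them to the test set, then go back and tune the features.
Besides overlap, a similarity measure can also be used.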
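The notes do not say which similarity measure is meant. As one possible choice, the sketch below scores each definition by cosine similarity between bag-of-words count vectors; the function names and the example glosses are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Count lowercase word tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity of two count vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def most_similar_sense(glosses, context):
    """Pick the sense whose gloss is most similar to the context."""
    ctx = bag_of_words(context)
    return max(glosses, key=lambda sense: cosine(bag_of_words(glosses[sense]), ctx))


# Hypothetical glosses, written only for this example.
glosses = {
    "financial": "institution that holds money",
    "river": "land beside a river",
}
print(most_similar_sense(glosses, "fishing by the river"))  # river
```

Unlike a raw overlap count, cosine similarity normalizes by length, so long definitions are not favored simply for containing more words.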