
Lexical Semantics

nlp664-week6

Posted by renjie on February 24, 2020


WordNet

WordNet is a large lexical database of English: a database of facts about words.

Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept.

Relations

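The most frequently encoded relation among synsets is the super-subordinate relation (also called hyperonymy, hyponymy, or the ISA relation). A hypernym is the more general (parent) term and a hyponym is the more specific (child) term.

Other relations include meronymy (has-part) and holonymy (part-of). For example, "tail" is a meronym of "dog".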
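The relations above can be sketched with two small lookup tables. This is only an illustration: the tables are hand-written toy data, not real WordNet entries, and the names `HYPERNYM`, `MERONYM`, `hypernym_chain`, `holonyms`, and `is_a` are made up for this example.

```python
# Toy, hand-written relation tables (NOT real WordNet data).
HYPERNYM = {  # child -> parent (the ISA relation)
    "dog": "canine",
    "canine": "carnivore",
    "carnivore": "mammal",
    "mammal": "animal",
}
MERONYM = {  # whole -> parts (the has-part relation)
    "dog": ["tail", "paw"],
}


def hypernym_chain(word):
    """Follow ISA links upward until no parent is recorded."""
    chain = []
    while word in HYPERNYM:
        word = HYPERNYM[word]
        chain.append(word)
    return chain


def holonyms(part):
    """Invert the has-part table: which wholes list this part?"""
    return [whole for whole, parts in MERONYM.items() if part in parts]


def is_a(word, ancestor):
    """True if `ancestor` appears somewhere above `word` in the ISA chain."""
    return ancestor in hypernym_chain(word)
```

Holonymy falls out as the inverse of meronymy, so only one of the two tables needs to be stored.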

Ontology

In philosophy, ontology categorizes everything in the world and studies the nature of what exists.

Next, some well-known lexicons.

Sentiment Lexicon

LIWC

Linguistic Inquiry and Word Count

http://liwc.wpengine.com/compare-dictionaries/

Often used for its positive and negative emotion word categories in opinion mining, that is, to judge whether a text is favorable or unfavorable.
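A minimal sketch of how a category lexicon can be used for this kind of judgment. The word lists below are tiny and hypothetical (the real LIWC dictionaries are much larger and are not reproduced here), and the category labels and function names are illustrative only.

```python
import re
from collections import Counter

# Hypothetical mini word lists for illustration; not the real LIWC dictionaries.
CATEGORIES = {
    "posemo": {"good", "great", "love", "happy"},
    "negemo": {"bad", "awful", "hate", "sad"},
}


def category_counts(text):
    """Count how many tokens fall into each lexicon category."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for tok in tokens:
        for name, words in CATEGORIES.items():
            if tok in words:
                counts[name] += 1
    return counts, len(tokens)


def polarity(text):
    """Compare positive and negative counts; ties are reported as neutral."""
    counts, _ = category_counts(text)
    diff = counts["posemo"] - counts["negemo"]
    if diff > 0:
        return "positive"
    if diff < 0:
        return "negative"
    return "neutral"
```

Plain counting ignores negation ("not good"), so this is a baseline rather than a full opinion-mining method.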

ANEW

Affective Norms for English Words

Participants gave graded reactions from 1 to 9 on three dimensions:
• Good/bad: valence
• Active/passive: arousal
• Strong/weak: dominance
Each word receives a numeric rating on each dimension. https://csea.phhp.ufl.edu/Media.html
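One simple way to use graded ratings like these is to average them over the rated words in a text. The sketch below assumes that approach. The numbers in `TOY_NORMS` are made up for illustration and are not actual ANEW norms, and `mean_ratings` is a hypothetical helper.

```python
# Made-up ratings on the 1-9 scale, for illustration only (NOT actual ANEW norms).
# Each entry is (valence, arousal, dominance).
TOY_NORMS = {
    "calm": (7.0, 2.0, 6.0),
    "angry": (2.0, 8.0, 5.0),
}


def mean_ratings(words, norms=TOY_NORMS):
    """Average each dimension over the words that have a rating.

    Words missing from the lexicon are skipped; returns None if no word is rated.
    """
    rated = [norms[w] for w in words if w in norms]
    if not rated:
        return None
    n = len(rated)
    return tuple(sum(r[i] for r in rated) / n for i in range(3))
```

Returning `None` for texts with no rated words keeps "no evidence" distinct from a mid-scale score.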


Word Sense Disambiguation (WSD)

How do people disambiguate polysemous words? They rely on local context, domain knowledge, and frequency data.

Lesk Algorithm

The Lesk algorithm measures the overlap between the sense definitions (glosses) of a word and its current context.

  • Identify the correct sense for one word at a time
  • Current context is the set of words in the surrounding sentence/paragraph/document.

Take all the definitions of a word, compute the overlap between each definition and the context, and pick the sense with the highest overlap.
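The procedure can be sketched as a simplified Lesk over a hand-written sense inventory. The glosses in `SENSES`, the stopword list, and the sense labels are assumptions made for this example. A real system would take glosses from WordNet.

```python
import re

# Small illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"a", "an", "the", "of", "in", "on", "at", "to",
             "and", "or", "that", "is", "i", "my"}

# Hand-written glosses for illustration; not copied from WordNet.
SENSES = {
    "bank": {
        "bank/finance": "a financial institution that accepts deposits and lends money",
        "bank/river": "sloping land beside a body of water such as a river",
    }
}


def content_words(text):
    """Lowercase, tokenize, and drop stopwords."""
    return {t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS}


def simplified_lesk(word, context, senses=SENSES):
    """Return (sense, overlap) for the gloss sharing the most words with the context.

    Ties go to the sense listed first.
    """
    ctx = content_words(context) - {word}
    best_sense, best_overlap = None, -1
    for sense, gloss in senses[word].items():
        overlap = len(ctx & content_words(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense, best_overlap
```

For example, `simplified_lesk("bank", "We sat on the bank of the river near the water")` selects `"bank/river"`, because "river" and "water" both occur in that gloss. Exact token matching misses inflected forms ("deposit" vs. "deposits"), so stemming or lemmatization would be a natural refinement.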

Use a human-labeled dataset as the training set, derive features from it and apply them to the test set, then go back and tune the features.

Instead of overlap, a similarity measure can also be used.
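As one possible similarity measure, the sketch below scores each gloss by the cosine similarity between bag-of-words count vectors. Cosine is an assumption here, since the notes do not name a specific measure, and the function names are illustrative.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Token counts for a lowercased text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two count vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def most_similar_sense(context, glosses):
    """Pick the sense whose gloss is most similar to the context."""
    ctx = bag_of_words(context)
    return max(glosses, key=lambda s: cosine(ctx, bag_of_words(glosses[s])))
```

Unlike a raw overlap count, cosine normalizes for length, so a long gloss is not favored simply for containing more words.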