Natural Language Processing Basics (NLP Basics)
Natural Language Processing (NLP) is an important direction within computer science and artificial intelligence. It studies theories and methods that enable effective communication between humans and computers in natural language, drawing on linguistics, computer science, and mathematics. Because it deals with natural language, i.e., the language people use every day, it is closely related to linguistics, but with an important difference: NLP does not study natural language for its own sake; rather, it aims to build computer systems, especially software systems, that can carry out natural-language communication effectively. It is therefore a part of computer science.
Why is NLP Important?
- Turing Test: a test of a machine's ability to exhibit intelligent behavior indistinguishable from that of a human
- Language is the communication tool in the test
Alan Turing's earliest formulation: the Imitation Game.
Benedict Cumberbatch also starred in the film The Imitation Game: to read German military communications, Turing and a group of brilliant colleagues worked on breaking the Enigma cipher. Breaking it purely by hand was practically impossible, so they built an electromechanical code-breaking machine, an early forerunner of machine intelligence. At first the machine could not finish its search in time, until they noticed that the intercepted messages regularly contained the phrase "Heil Hitler"; this known piece of plaintext gave the machine a starting point and let them crack the cipher.
Distributed Word Representation
Word Representation
- Word representation: a process that transforms symbols into machine-understandable meanings
- Definition of meaning (Webster Dictionary)
  - The thing one intends to convey especially by language
  - The logical extension of a word
- How do we represent meaning so that machines can understand it?
Goal of Word Representation
- Compute word similarity
- Infer word relations (semantic relations between words)
Synonym and Hypernym
- Use a set of related words, such as synonyms and hypernyms, to represent a word (see the sketch below)
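As a rough illustration of this idea, synonym and hypernym sets can be looked up in a thesaurus such as WordNet. The following is a hedged sketch assuming NLTK and its WordNet corpus are available; the query word and the number of senses shown are illustrative only.

```python
# A hedged sketch: look up synonyms and hypernyms of a word with NLTK's WordNet
# interface. Assumes nltk is installed and the WordNet corpus can be downloaded.
import nltk
nltk.download("wordnet", quiet=True)
from nltk.corpus import wordnet as wn

word = "good"                                         # illustrative query word
for synset in wn.synsets(word)[:3]:                   # first few senses of the word
    synonyms = synset.lemma_names()                    # words sharing this sense
    hypernyms = [h.name() for h in synset.hypernyms()] # more general concepts
    print(synset.name(), "| synonyms:", synonyms, "| hypernyms:", hypernyms)
```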
Problems of Synonym/Hypernym Representation
- Missing nuance: subtle differences between words are lost, e.g., proficient vs. good
- Missing new meanings of words: newly emerged senses are not covered, e.g., Apple (fruit → IT company)
- Subjective
- Data sparsity
- Requires significant human labor to create and maintain the lexicon
One-Hot Representation
- Regard words as discrete, independent symbols
- Word ID or one-hot representation; this works reasonably well for computing the similarity between two documents
Problems of One-Hot Representation
- similarity(star, sun) = (v_star, v_sun) = 0: one-hot representation assumes every pair of words is orthogonal, so the computed similarity between any two words is always zero
- All the vectors are orthogonal; there is no natural notion of similarity for one-hot vectors (see the sketch below).
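A minimal sketch of the orthogonality problem; the tiny vocabulary is made up for illustration.

```python
# One-hot vectors of different words are orthogonal, so their dot-product
# similarity is always 0, no matter how related the words actually are.
import numpy as np

vocab = ["star", "sun", "movie"]
word2id = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    v = np.zeros(len(vocab))
    v[word2id[word]] = 1.0
    return v

v_star, v_sun = one_hot("star"), one_hot("sun")
print(np.dot(v_star, v_sun))   # 0.0 -- no notion of similarity between star and sun
```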
Represent Word by Context
- The meaning of a word is given by the words that frequently appear close-by; a word's meaning is closely tied to its contexts
- Use context words to represent stars, e.g., represent the word stars by the words appearing around it
Co-Occurrence Counts
- Count-based distributional representation
- Term-Term matrix: How often a word occurs with another
- Term-Document matrix: How often a word occurs in a document
The vector of co-occurrence counts collected for each word can then be used to compute the similarity between two words (see the sketch below).
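A toy sketch of a count-based (term-term) representation; the two-sentence corpus and the ±2-word window are illustrative assumptions.

```python
# Build a term-term co-occurrence matrix over a tiny made-up corpus, then
# compare the count vectors of two words with cosine similarity.
import numpy as np

corpus = [
    "the stars shine bright in the sky".split(),
    "the sun shines bright in the sky".split(),
]
vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}
M = np.zeros((len(vocab), len(vocab)))   # M[i, j] = how often word i co-occurs with word j

window = 2
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if j != i:
                M[idx[w], idx[sent[j]]] += 1

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8)

# "stars" and "sun" share contexts such as "the" and "bright", so similarity > 0
print(cosine(M[idx["stars"]], M[idx["sun"]]))
```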
Problems of Count-Based Representation
- Increase in size with vocabulary
- Require a lot of storage
- Sparsity issues for less frequent words
As the vocabulary grows, the storage requirement grows with it; and for words that occur rarely, the observed contexts are very sparse.
Word Embedding
- Distributed representation
- Build a dense vector for each word, learned from large-scale text corpora: construct a low-dimensional dense vector space and represent each word by a vector in that space
- Learning method: Word2Vec (we will learn it in the next class; a hedged preview follows below)
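As a preview only, here is a hedged sketch using the gensim library, one possible Word2Vec implementation; the toy corpus and hyperparameters are illustrative assumptions, not the course's setup.

```python
# Learn dense word vectors with gensim's Word2Vec on a tiny made-up corpus.
from gensim.models import Word2Vec

sentences = [
    "the stars shine bright in the sky".split(),
    "the sun shines bright in the sky".split(),
    "we watched the stars and the sun".split(),
]
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, epochs=50)

print(model.wv["stars"].shape)               # a 50-dimensional dense vector
print(model.wv.similarity("stars", "sun"))   # cosine similarity of the two embeddings
```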
Language Modeling
- Language Modeling is the task of predicting the upcoming word: a language model predicts the next word from the words that precede it
- A language model is a probability distribution over a sequence of words
- Compute the joint probability of a sequence of words: P(W) = P(w1, w2, …, wn), i.e., the probability of the whole word sequence forming a valid sentence
- Compute the conditional probability of an upcoming word wn: P(wn | w1, w2, …, wn-1), i.e., predict the next word given the words already seen
- How to compute the sentence probability?
- Assumption: the probability of an upcoming word is determined only by all of its previous words
- Language Model: P(W) = P(w1) · P(w2 | w1) · P(w3 | w1, w2) · … · P(wn | w1, w2, …, wn-1) (a toy sketch follows below)
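A toy sketch of this chain-rule decomposition; the conditional probability table below is made up purely for illustration.

```python
# P(w1, ..., wn) = product over i of P(wi | w1, ..., wi-1)
cond_prob = {
    ("<s>",): {"too": 0.2},
    ("<s>", "too"): {"late": 0.5},
    ("<s>", "too", "late"): {"to": 0.8},
}

def sentence_prob(words):
    p, history = 1.0, ("<s>",)
    for w in words:
        p *= cond_prob[history][w]   # P(w | all previous words)
        history = history + (w,)
    return p

print(sentence_prob(["too", "late", "to"]))   # 0.2 * 0.5 * 0.8 = 0.08
```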
N-gram Model
- Collect statistics about how frequent different n-grams are, and use these to predict the next word.
- E.g., 4-gram: the expression below relates the count of "too late to wj" to the count of "too late to"
- P(wj | too late to) = count(too late to wj) / count(too late to)
Problem:
- Need to store counts for all possible n-grams, so the model size is O(exp(n))
Markov Assumption
- Simplifying the language model: under the Markov assumption, the next word depends only on the previous N−1 words (see the sketch after this list)
- Bigram (N-gram, N = 2): P(wn | wn-1)
- Trigram (N-gram, N = 3): P(wn | wn-2, wn-1)
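A small sketch of a bigram model under the Markov assumption: P(wn | w1, …, wn-1) is approximated by count(wn-1 wn) / count(wn-1). The tiny corpus is an illustrative assumption.

```python
# Estimate bigram probabilities from raw counts.
from collections import Counter

corpus = "too late to go home , never too late to learn".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def bigram_prob(prev, word):
    # P(word | prev) = count(prev word) / count(prev)
    return bigrams[(prev, word)] / unigrams[prev]

print(bigram_prob("too", "late"))    # 2/2 = 1.0
print(bigram_prob("late", "to"))     # 2/2 = 1.0
print(bigram_prob("to", "learn"))    # 1/2 = 0.5
```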
Problems of N-gram
- Not considering contexts farther than 1 or 2 words: in practice mostly bigrams or trigrams are used, so longer-range context cannot be taken into account
- Not capturing the similarity between words: behind an N-gram model is essentially a one-hot representation; all words are treated as mutually independent symbols and the context statistics are collected over symbols, so the model has no way to understand similarity between words
Neural Language Model
- A neural language model is a language model based on neural networks to learn distributed representations of words; it uses the distributed representations of the preceding words to compute the conditional probability of the current word
- Associate words with distributed vectors
- Compute the joint probability of word sequences in terms of the feature vectors
- Optimize the word feature vectors (embedding matrix E) and the parameters of the prediction function (map matrix W); a hedged sketch follows below
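A minimal sketch of a feed-forward neural language model in this spirit, assuming PyTorch: an embedding matrix E gives each word a dense feature vector, and a map matrix W projects the concatenated context features onto scores over the vocabulary. Layer sizes and names are illustrative, not a reference implementation.

```python
import torch
import torch.nn as nn

class NeuralLM(nn.Module):
    def __init__(self, vocab_size, embed_dim=64, context_size=3):
        super().__init__()
        self.E = nn.Embedding(vocab_size, embed_dim)               # word feature vectors (embedding matrix E)
        self.W = nn.Linear(context_size * embed_dim, vocab_size)   # map matrix W -> next-word scores

    def forward(self, context_ids):                        # context_ids: (batch, context_size)
        feats = self.E(context_ids).flatten(start_dim=1)   # concatenate the context embeddings
        return self.W(feats)                               # logits; train with softmax + cross-entropy

model = NeuralLM(vocab_size=1000)
context = torch.randint(0, 1000, (2, 3))        # two toy contexts of three word ids each
probs = torch.softmax(model(context), dim=-1)   # P(wn | previous context_size words)
print(probs.shape)                              # torch.Size([2, 1000])
```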