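"Did you mean" is a very important feature in search engines: it helps users by showing suggested terms so that they can search more accurately. For example, when we search on Baidu, it usually shows some more commonly used search options for us to choose from.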
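To build "Did you mean", we will use the phrase suggester, because it lets us suggest corrections for a whole phrase and not just for individual terms. I covered this topic in my earlier article "Elasticsearch: How to implement phrase suggestions - phrase suggester".
First, we will use a shingle filter, because it produces the word n-gram tokens that the phrase suggester matches against in order to return corrections. For a description of the shingle filter, see the earlier article "Elasticsearch: Ngrams, edge ngrams, and shingles".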
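To make the idea of shingles concrete, here is a minimal Python sketch of what a lowercase plus shingle filter chain roughly emits for a title, using shingle sizes 2 to 3 as in the mapping in this article. The function name `shingles` is my own, whitespace splitting is only a crude stand-in for the standard tokenizer, and the token order is not guaranteed to match what Elasticsearch emits.

```python
def shingles(text, min_size=2, max_size=3, output_unigrams=True):
    """Approximate the tokens produced by lowercase + shingle filtering."""
    # Assumption: whitespace splitting stands in for the standard tokenizer.
    words = text.lower().split()
    tokens = []
    for start in range(len(words)):
        if output_unigrams:
            # Single words are kept alongside the shingles.
            tokens.append(words[start])
        for size in range(min_size, max_size + 1):
            if start + size <= len(words):
                tokens.append(" ".join(words[start:start + size]))
    return tokens


print(shingles("Revenge of the Fallen"))
```

These word n-grams are what allow the suggester to judge how likely a word is given its neighbours, rather than checking each word in isolation.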
Preparing the data
We start by defining the mapping:
PUT movies
{
  "settings": {
    "analysis": {
      "analyzer": {
        "en_analyzer": {
          "tokenizer": "standard",
          "filter": ["lowercase", "stop"]
        },
        "shingle_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "shingle_filter"]
        }
      },
      "filter": {
        "shingle_filter": {
          "type": "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 3
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "en_analyzer",
        "fields": {
          "suggest": { "type": "text", "analyzer": "shingle_analyzer" }
        }
      },
      "actors": {
        "type": "text",
        "analyzer": "en_analyzer",
        "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
      },
      "description": {
        "type": "text",
        "analyzer": "en_analyzer",
        "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
      },
      "director": {
        "type": "text",
        "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
      },
      "genre": {
        "type": "text",
        "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
      },
      "metascore": { "type": "long" },
      "rating": { "type": "float" },
      "revenue": { "type": "float" },
      "runtime": { "type": "long" },
      "votes": { "type": "long" },
      "year": { "type": "long" },
      "title_suggest": {
        "type": "completion",
        "analyzer": "simple",
        "preserve_separators": true,
        "preserve_position_increments": true,
        "max_input_length": 50
      }
    }
  }
}
Next, we use the _bulk command to write some documents into this index. The documents come from the linked data file, and we load them as follows:
POST movies/_bulk
{"index": {}}
{"title": "Guardians of the Galaxy", "genre": "Action,Adventure,Sci-Fi", "director": "James Gunn", "actors": "Chris Pratt, Vin Diesel, Bradley Cooper, Zoe Saldana", "description": "A group of intergalactic criminals are forced to work together to stop a fanatical warrior from taking control of the universe.", "year": 2014, "runtime": 121, "rating": 8.1, "votes": 757074, "revenue": 333.13, "metascore": 76}
{"index": {}}
{"title": "Prometheus", "genre": "Adventure,Mystery,Sci-Fi", "director": "Ridley Scott", "actors": "Noomi Rapace, Logan Marshall-Green, Michael Fassbender, Charlize Theron", "description": "Following clues to the origin of mankind, a team finds a structure on a distant moon, but they soon realize they are not alone.", "year": 2012, "runtime": 124, "rating": 7, "votes": 485820, "revenue": 126.46, "metascore": 65}....
For brevity, I have omitted the remaining documents above. You need to copy the entire movies.txt file and write all of it into Elasticsearch. It contains 1000 documents in total.
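If your copy of the data is a plain list of documents rather than a ready-made bulk file, a small helper can produce the newline-delimited body that _bulk expects. This is only a sketch: the function name `to_bulk_body` is hypothetical, and it assumes the target index is given in the request URL (as in POST movies/_bulk), so each action line can stay empty.

```python
import json


def to_bulk_body(docs):
    """Serialize documents into a newline-delimited _bulk request body."""
    lines = []
    for doc in docs:
        # Action line: index the next document into the index named in the URL.
        lines.append(json.dumps({"index": {}}))
        # Source line: the document itself, kept on a single line.
        lines.append(json.dumps(doc, ensure_ascii=False))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


docs = [
    {"title": "Guardians of the Galaxy", "year": 2014},
    {"title": "Prometheus", "year": 2012},
]
print(to_bulk_body(docs), end="")
```

The resulting string can be pasted after POST movies/_bulk in the console or sent by any HTTP client with the content type application/x-ndjson.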
Searching the data
Now let's run a basic query to see what the suggester returns:
GET movies/_search?filter_path=suggest
{
  "suggest": {
    "text": "transformers revenge of the falen",
    "did_you_mean": {
      "phrase": {
        "field": "title.suggest",
        "size": 5
      }
    }
  }
}
The command above returns:
{
  "suggest": {
    "did_you_mean": [
      {
        "text": "transformers revenge of the falen",
        "offset": 0,
        "length": 33,
        "options": [
          { "text": "transformers revenge of the fallen", "score": 0.004467494 },
          { "text": "transformers revenge of the fall", "score": 0.00020402104 },
          { "text": "transformers revenge of the face", "score": 0.00006419608 }
        ]
      }
    ]
  }
}
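Note that with only a few lines we already get some promising results.
Now let's extend the query with more phrase suggester features. We set max_errors = 2, so that at most two terms in the phrase may be treated as misspellings when forming a correction. We also add highlight so that the corrected terms are marked in each suggestion.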
GET movies/_search?filter_path=suggest
{
  "suggest": {
    "text": "transformer revenge of the falen",
    "did_you_mean": {
      "phrase": {
        "field": "title.suggest",
        "size": 5,
        "confidence": 1,
        "max_errors": 2,
        "highlight": {
          "pre_tag": "<strong>",
          "post_tag": "</strong>"
        }
      }
    }
  }
}
The command above returns:
{
  "suggest": {
    "did_you_mean": [
      {
        "text": "transformer revenge of the falen",
        "offset": 0,
        "length": 32,
        "options": [
          {
            "text": "transformers revenge of the fallen",
            "highlighted": "<strong>transformers</strong> revenge of the <strong>fallen</strong>",
            "score": 0.004382903
          },
          {
            "text": "transformers revenge of the fall",
            "highlighted": "<strong>transformers</strong> revenge of the <strong>fall</strong>",
            "score": 0.00020015794
          },
          {
            "text": "transformers revenge of the face",
            "highlighted": "<strong>transformers</strong> revenge of the <strong>face</strong>",
            "score": 0.00006298054
          },
          {
            "text": "transformers revenge of the falen",
            "highlighted": "<strong>transformers</strong> revenge of the falen",
            "score": 0.00006159308
          },
          {
            "text": "transformer revenge of the fallen",
            "highlighted": "transformer revenge of the <strong>fallen</strong>",
            "score": 0.000048000533
          }
        ]
      }
    ]
  }
}
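On the application side, a response like this usually has to be reduced to a single "Did you mean" line. The following Python sketch shows one possible way to do that. The helper name `best_suggestion` is my own, and the sample dictionary is a trimmed copy of the response above (first two options only).

```python
def best_suggestion(response, name="did_you_mean"):
    """Return (text, highlighted) for the top-scoring option, or None."""
    for entry in response.get("suggest", {}).get(name, []):
        options = entry.get("options", [])
        if not options:
            continue
        top = max(options, key=lambda opt: opt["score"])
        # Only report a correction that differs from what the user typed.
        if top["text"] != entry["text"]:
            return top["text"], top.get("highlighted", top["text"])
    return None


# Trimmed copy of the response shown above.
response = {
    "suggest": {
        "did_you_mean": [
            {
                "text": "transformer revenge of the falen",
                "offset": 0,
                "length": 32,
                "options": [
                    {
                        "text": "transformers revenge of the fallen",
                        "highlighted": "<strong>transformers</strong> revenge of the <strong>fallen</strong>",
                        "score": 0.004382903,
                    },
                    {
                        "text": "transformers revenge of the fall",
                        "highlighted": "<strong>transformers</strong> revenge of the <strong>fall</strong>",
                        "score": 0.00020015794,
                    },
                ],
            }
        ]
    }
}

print(best_suggestion(response))
```

Returning the highlighted form as well makes it easy to render the corrected words in bold in a search page.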
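Can we improve this a little further? We add "collate", which runs a query for each suggestion and so lets us check the suggestions against the indexed documents. I use a match query with the "and" operator, so that all terms of a suggestion must match in the same document. If I still want to see the suggestions that do not satisfy the query, I set prune = true.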
GET movies/_search?filter_path=suggest
{
  "suggest": {
    "text": "transformer revenge of the falen",
    "did_you_mean": {
      "phrase": {
        "field": "title.suggest",
        "size": 5,
        "confidence": 1,
        "max_errors": 2,
        "collate": {
          "query": {
            "source": {
              "match": {
                "{{field_name}}": {
                  "query": "{{suggestion}}",
                  "operator": "and"
                }
              }
            }
          },
          "params": { "field_name": "title" },
          "prune": true
        },
        "highlight": {
          "pre_tag": "<strong>",
          "post_tag": "</strong>"
        }
      }
    }
  }
}
The result has now changed: each option carries a new field, "collate_match", which indicates whether the suggestion matched the collate query. The field appears because prune = true.
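With prune = true, the filtering decision moves to the client: every option is returned, and the collate_match flag tells us which ones are backed by a document. A minimal Python sketch of that client-side step follows. The helper name `keep_collated` is hypothetical, and the sample options are illustrative values, not output copied from a cluster.

```python
def keep_collated(options):
    """Drop options whose collate query matched no document.

    Options without a collate_match flag are kept, on the assumption that
    the server already removed the non-matching ones (prune = false).
    """
    return [opt for opt in options if opt.get("collate_match", True)]


# Illustrative options only, not real cluster output.
sample = [
    {"text": "transformers revenge of the fallen", "collate_match": True},
    {"text": "transformers revenge of the fall", "collate_match": False},
]
print([opt["text"] for opt in keep_collated(sample)])
```

Keeping the unmatched options around can still be useful, for example to log which corrections the suggester proposed but the index could not confirm.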
Let's set prune to false:
GET movies/_search?filter_path=suggest
{
  "suggest": {
    "text": "transformer revenge of the falen",
    "did_you_mean": {
      "phrase": {
        "field": "title.suggest",
        "size": 5,
        "confidence": 1,
        "max_errors": 2,
        "collate": {
          "query": {
            "source": {
              "match": {
                "{{field_name}}": {
                  "query": "{{suggestion}}",
                  "operator": "and"
                }
              }
            }
          },
          "params": { "field_name": "title" },
          "prune": false
        },
        "highlight": {
          "pre_tag": "<strong>",
          "post_tag": "</strong>"
        }
      }
    }
  }
}
This time only one suggestion comes back, and it is the most relevant one.
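To pull the pieces together, the request bodies used in this article can be generated from a few parameters. The Python sketch below builds the same JSON structure as the console requests above. The function name `did_you_mean_body` and its defaults are my own choices, not part of any Elasticsearch client library.

```python
import json


def did_you_mean_body(text, field="title.suggest", size=5, max_errors=2,
                      collate_field=None, prune=False):
    """Build a phrase suggester request body like the ones in this article."""
    phrase = {
        "field": field,
        "size": size,
        "confidence": 1,
        "max_errors": max_errors,
        "highlight": {"pre_tag": "<strong>", "post_tag": "</strong>"},
    }
    if collate_field is not None:
        # Check each suggestion with a match query that requires all terms.
        phrase["collate"] = {
            "query": {
                "source": {
                    "match": {
                        "{{field_name}}": {
                            "query": "{{suggestion}}",
                            "operator": "and",
                        }
                    }
                }
            },
            "params": {"field_name": collate_field},
            "prune": prune,
        }
    return {"suggest": {"text": text, "did_you_mean": {"phrase": phrase}}}


body = did_you_mean_body("transformer revenge of the falen", collate_field="title")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent to GET movies/_search?filter_path=suggest, and switching prune between true and false reproduces the two collate variants shown above.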