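Aggregation queries
-
Concepts
Aggregations (aggs) are different from ordinary queries. They are the second major kind of search request covered so far, the first being "query", so the top-level key of the request body changes from "query" to "aggs". A field used for aggregation must hold exact values; analyzed (tokenized) fields cannot be aggregated. To aggregate on a text field you have to enable fielddata, which is generally not recommended: fielddata moves the data structure used for aggregation from disk (doc_values) into heap memory (fielddata), so aggregating over a large data set can easily cause an OOM. The underlying mechanics are covered in the advanced part.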
-
Aggregation types
- Bucket aggregations: comparable to GROUP BY in SQL; mainly used to count the documents in each category.
- Metric aggregations: compute metrics such as the maximum, minimum, average, or sum of a field.
- Pipeline aggregations: aggregate the results of other aggregations a second time. For example, to find the tag bucket with the most documents, first bucket by tag, then compute the maximum over those buckets.
-
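Syntax
GET product/_search
{"aggs": {"<aggs_name>": {"<agg_type>": {"field": "<field_name>"}}}}
aggs_name: the name you give the aggregation
agg_type: the kind of aggregation, e.g. a bucket aggregation (terms) or a metric aggregation (avg, sum, min, max, etc.)
field_name: the name of the field to aggregate on.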
-
Bucket aggregations
Use case: count the documents in each category; bucket aggregations can be nested.
Function: terms
Note: the aggregated field must hold exact values, e.g. a keyword field
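To make the bucket idea concrete, here is a plain-Python sketch of what a terms aggregation counts. It is not Elasticsearch code: the sample documents and the helper name terms_agg are made up for illustration, and the tie-breaking by key is an assumption of this sketch. The point it shows is that a document is counted once for every distinct value it holds in a multi-valued field.

```python
from collections import Counter

# Hypothetical sample documents; only the "tags" field matters here.
docs = [
    {"tags": ["value", "fast-charge"]},
    {"tags": ["value", "nfc"]},
    {"tags": ["waterproof"]},
]

def terms_agg(documents, field, size=10):
    """Count documents per distinct value of `field`, largest buckets first."""
    counts = Counter()
    for doc in documents:
        # A document is counted once per distinct value it holds.
        for value in set(doc.get(field, [])):
            counts[value] += 1
    # Sort by doc_count descending, then by key ascending to break ties.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": k, "doc_count": n} for k, n in ordered[:size]]

buckets = terms_agg(docs, "tags")
```

The first bucket is the tag shared by two documents; the remaining three tags each get a bucket with a count of one.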
-
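Metric aggregations
Use case: compute a metric such as the maximum, minimum, or average. Metric aggregations can be combined with bucket aggregations, e.g. bucket by product type and compute the average price of each bucket.
Functions: avg (average), max (maximum), min (minimum), sum (sum), stats (count, min, max, avg and sum in one result), value_count (number of values)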
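As a rough illustration of what the stats function bundles together, the following plain-Python sketch computes the same five numbers over a list of prices. The function name stats_agg and the price list are assumptions for this example only; it mirrors the shape of the result, not how Elasticsearch computes it.

```python
def stats_agg(values):
    """Return count, min, max, avg and sum over the given numbers."""
    values = list(values)
    total = sum(values)
    return {
        "count": len(values),
        "min": min(values) if values else None,
        "max": max(values) if values else None,
        "avg": total / len(values) if values else None,
        "sum": total,
    }

# Hypothetical prices standing in for the "price" field.
prices = [3999, 4999, 2999, 999, 399]
price_stats = stats_agg(prices)
```

Asking for max, min and avg separately gives the same numbers; stats just saves writing three aggregations.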
-
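Pipeline aggregations
Use case: aggregate the result of another aggregation a second time. For example, to find the product type with the lowest average price, first bucket by product type and compute each bucket's average price, then take the minimum over those averages.
Functions: min_bucket (bucket with the minimum), max_bucket (bucket with the maximum), avg_bucket (average across buckets), sum_bucket (sum across buckets), stats_bucket (statistics across buckets)
Note: buckets_path is the key parameter of a pipeline aggregation. The path is resolved relative to the level at which the pipeline aggregation itself sits, so its first segment names a sibling aggregation. In the example below, my_aggs and my_min_bucket are siblings, so the path starts at my_aggs.
GET product/_search
{"size": 0, "aggs": {"my_aggs": {"terms": {...},"aggs": {"my_price_bucket": {...}}},"my_min_bucket":{"min_bucket": {"buckets_path": "my_aggs>my_price_bucket"}}}}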
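The two steps behind a min_bucket pipeline can be sketched in plain Python. This is an illustration of the logic only, under the assumption of a few made-up (type, price) pairs; the variable names are hypothetical and none of this is Elasticsearch API.

```python
from collections import defaultdict

# Hypothetical (type, price) pairs standing in for indexed documents.
docs = [("phone", 3999), ("phone", 2999), ("earphone", 999), ("earphone", 399), ("tv", 2998)]

# Step 1 (bucket + metric): group by type and average the price per bucket.
grouped = defaultdict(list)
for product_type, price in docs:
    grouped[product_type].append(price)
avg_by_type = {t: sum(p) / len(p) for t, p in grouped.items()}

# Step 2 (pipeline): take the minimum over the per-bucket averages.
# This is the value a path of the form "<terms_agg>><avg_agg>" points at.
min_value = min(avg_by_type.values())
min_keys = [t for t, v in avg_by_type.items() if v == min_value]
```

Step 2 never touches the documents again; it only reads the numbers produced by step 1, which is what makes it a pipeline aggregation.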
-
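Nested aggregations
Syntax:
GET product/_search
{"size": 0,"aggs": {"<agg_name>": {"<agg_type>": {"field": "<field_name>"},"aggs": {"<agg_name_child>": {"<agg_type>": {"field": "<field_name>"}}}}}}
Purpose: aggregate again on top of the result of another aggregation. For example, to get the average price per product type, bucket by product type first and then compute the average price within each bucket.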
-
How aggregations and queries interact
-
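Aggregating over a query or filter
Syntax:
GET product/_search
{"query": {...}, "aggs": {...}}
Note: with this syntax the query runs first and the aggregations run over its results, regardless of which key appears first in the request body. The query part can be a full-text query, a filter, or a bool query.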
-
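Filtering hits after aggregating (post_filter)
GET product/_search
{"aggs": {...},"post_filter": {...}}
Note: with this syntax the aggregations are computed first and post_filter is then applied to the hits only, regardless of which key appears first in the request body. post_filter does not change the aggregation results.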
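The difference between the hits and the aggregation results under post_filter can be sketched in plain Python. The documents, the search helper and its parameter are all hypothetical; the sketch only mimics the order of operations described above.

```python
from collections import Counter

# Hypothetical documents with a name and a list of tags.
docs = [
    {"name": "a", "tags": ["value", "nfc"]},
    {"name": "b", "tags": ["value"]},
    {"name": "c", "tags": ["waterproof"]},
]

def search(documents, post_filter_tag):
    # Aggregations are computed over every document matched by the query
    # (here: all documents, since this sketch has no query).
    tag_counts = Counter(tag for d in documents for tag in set(d["tags"]))
    # The post filter then narrows only the returned hits.
    hits = [d["name"] for d in documents if post_filter_tag in d["tags"]]
    return {"hits": hits, "aggregations": dict(tag_counts)}

result = search(docs, "value")
```

The hits contain only the two documents tagged "value", while the tag counts still include "waterproof" from the document that was filtered out.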
-
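Scope of the query conditions
GET product/_search
{"size": 10,"query": {...},"aggs": {"avg_price": {...},"all_avg_price": {"global": {},"aggs": {...}}}}
In the example above, avg_price is computed over the documents matched by the query, whereas all_avg_price, wrapped in a global aggregation, is computed over all documents in the index.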
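A small plain-Python sketch of the two scopes, assuming four made-up prices and a hypothetical query condition of price >= 4000. It only illustrates which documents each average sees.

```python
# Hypothetical documents; only the price matters here.
docs = [{"price": 399}, {"price": 2999}, {"price": 4399}, {"price": 4999}]

def avg(values):
    values = list(values)
    return sum(values) / len(values) if values else None

# The query narrows the document set (assumed condition: price >= 4000).
matched = [d for d in docs if d["price"] >= 4000]

avg_price = avg(d["price"] for d in matched)   # scoped to the query results
all_avg_price = avg(d["price"] for d in docs)  # "global": ignores the query
```

The scoped average only sees the two expensive products, while the global one averages all four.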
-
-
Ordering aggregation results
-
Ordering rules:
order_type: _count (document count), _key (the key of the bucket), _term (deprecated but still usable; use _key instead)
GET product/_search
{"aggs": {"type_agg": {"terms": {"field": "tags.keyword","order": {"<order_type>": "desc"},"size": 10}}}}
-
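Multi-level ordering: each level of a nested aggregation has its own order, and the outer level takes precedence
GET product/_search?size=0
{"aggs": {"first_sort": {..."aggs": {"second_sort": {...}}}}}
In this example the buckets are ordered by first_sort first, and the buckets inside each of them by second_sort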
-
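Ordering by a nested aggregation: order the buckets by the result of an aggregation nested several levels inside them
GET product/_search
{"size": 0,"aggs": {"tag_avg_price": {"terms": {"field": "type.keyword","order": {"agg_stats>my_stats.sum": "desc"}},"aggs": {"agg_stats": {..."aggs": {"my_stats": {"extended_stats": {...}}}}}}}}
In this example the outer buckets are ordered by the sum computed by the inner aggregation "my_stats"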
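Ordering buckets by a metric computed inside them boils down to "compute the inner value per bucket, then sort the buckets by it". A plain-Python sketch, with made-up (type, price) pairs and hypothetical variable names:

```python
from collections import defaultdict

# Hypothetical (type, price) pairs standing in for indexed documents.
docs = [("phone", 3999), ("phone", 2999), ("earphone", 999), ("tv", 2998), ("tv", 2999)]

# Inner metric: the sum of price within each type bucket.
sums = defaultdict(int)
for product_type, price in docs:
    sums[product_type] += price

# Order the buckets by the inner metric, descending.
ordered_types = sorted(sums, key=lambda t: sums[t], reverse=True)
```

The path in "order" plays the role of the sort key here: it tells the outer aggregation which inner number to sort by.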
-
-
Commonly used aggregation functions
-
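histogram: histogram / bar chart statistics
Purpose: interval statistics, e.g. sales broken down by price range
Syntax:
GET product/_search?size=0
{
  "aggs": {
    "<histogram_name>": {
      "histogram": {
        "field": "price",          # field name
        "interval": 1000,          # bucket width
        "keyed": true,             # return the buckets as an object keyed by bucket key instead of an array
        "min_doc_count": <num>,    # minimum document count; buckets with fewer than num documents are omitted
        "missing": 1999            # substitute value: documents with no value in the field are treated as if they had 1999
      }
    }
  }
}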
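How values land in histogram buckets can be sketched in plain Python. The helper below is hypothetical and only mirrors the parameters discussed above (interval, missing, min_doc_count); it assumes a zero offset, so a bucket key is floor(value / interval) * interval.

```python
import math
from collections import Counter

def histogram(values, interval, missing=None, min_doc_count=0):
    """Bucket numbers by key = floor(value / interval) * interval."""
    counts = Counter()
    for v in values:
        if v is None:
            if missing is None:
                continue   # documents without a value are skipped
            v = missing    # unless a substitute value is configured
        counts[math.floor(v / interval) * interval] += 1
    if not counts:
        return {}
    # Fill the gaps between the lowest and highest key with empty buckets,
    # then drop the buckets below the min_doc_count threshold.
    keys = range(min(counts), max(counts) + interval, interval)
    return {k: counts[k] for k in keys if counts[k] >= min_doc_count}

# Hypothetical prices; None stands for a document without a price.
prices = [3999, 4999, 2999, 999, 399, None]
buckets = histogram(prices, interval=1000, missing=1999)
```

With missing set to 1999 the document without a price is counted in the 1000 bucket; without it, that document would not appear in any bucket.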
-
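date_histogram: a histogram over dates, e.g. sales per month over one year
Syntax:
GET product/_search?size=0
{
  "aggs": {
    "my_date_histogram": {
      "date_histogram": {
        "field": "createtime",          # must be a date field
        "<interval_type>": "month",     # one of the interval parameters listed below
        "format": "yyyy-MM",            # output format of the bucket keys
        "extended_bounds": {            # also emit empty buckets across this range
          "min": "2020-01",
          "max": "2020-12"
        }
      }
    }
  }
}
interval_type: the options for the time interval parameter
- fixed_interval: ms (milliseconds), s (seconds), m (minutes), h (hours), d (days). The unit must carry a number, e.g. 2d for two days. Be careful: too small a unit produces a very large number of buckets and can bring the service down.
- calendar_interval: e.g. month, year
- interval: (deprecated, but still usable)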
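The effect of a monthly calendar interval combined with extended_bounds can be sketched in plain Python. The helper name and sample dates are made up, and the sketch assumes every date falls inside the bounds (dates outside them are simply not reported here, which is a simplification of this example).

```python
from collections import Counter
from datetime import date

def month_histogram(dates, min_month, max_month):
    """Count dates per calendar month, emitting empty months within the bounds."""
    counts = Counter((d.year, d.month) for d in dates)
    buckets = {}
    year, month = min_month
    while (year, month) <= max_month:
        buckets[f"{year:04d}-{month:02d}"] = counts[(year, month)]
        # Step to the next calendar month.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return buckets

# Hypothetical creation dates.
created = [date(2020, 5, 21), date(2020, 6, 20), date(2020, 6, 23), date(2020, 10, 1)]
buckets = month_histogram(created, (2020, 1), (2020, 12))
```

All twelve months appear in the result, including the months without any document, which is what makes the output usable for a chart with a fixed axis.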
-
percentile: percentile statistics (or pie charts)
-
percentiles: describes how the values are distributed. For example, a 99th percentile of 1000 means that 99% of the values are at or below 1000. A common use is an SLA such as "99% of requests complete within 100 ms": query the 99th percentile of the latency, and if it is within 100 ms the SLA is met.
Syntax:
GET product/_search?size=0
{
  "aggs": {
    "<percentiles_name>": {
      "percentiles": {
        "field": "price",
        "percents": [
          percent1,   # the percentiles to compute, e.g. 5, 10, 30, 50, 99 for the 5%, 10%, 30%, 50% and 99% points of the distribution
          percent2,
          ...
        ]
      }
    }
  }
}
-
percentile_ranks: the inverse of percentiles. For example, to see where 1000 and 3000 fall within the current values, look up their ranks; if the results are 95 and 99, then 95% of the values are at or below 1000 and 99% are at or below 3000.
GET product/_search?size=0
{"aggs": {"<percentile_ranks_name>": {"percentile_ranks": {"field": "<field_name>","values": [value1,value2,...]}}}}
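The inverse relationship between the two functions can be shown with an exact, plain-Python sketch. Note the assumptions: this uses the simple nearest-rank definition on a small made-up list of latencies, whereas Elasticsearch computes approximate percentiles, so its numbers can differ slightly. The function names are hypothetical.

```python
import math

def percentile(values, percent):
    """Nearest-rank percentile: the smallest value with at least percent% of values <= it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(percent * len(ordered) / 100))
    return ordered[rank - 1]

def percentile_rank(values, threshold):
    """Percentage of values that are less than or equal to threshold."""
    return 100 * sum(v <= threshold for v in values) / len(values)

# Hypothetical request latencies in milliseconds.
latencies_ms = [12, 20, 35, 40, 55, 60, 72, 80, 95, 250]
p90 = percentile(latencies_ms, 90)
rank_of_100 = percentile_rank(latencies_ms, 100)
```

percentile answers "which value sits at 90%?", while percentile_rank answers "what share of requests finished within 100 ms?". For an SLA check either direction works.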
-
-
Examples
# Aggregation queries
DELETE product
## Data
PUT product
{"mappings" : {"properties" : {"createtime" : {"type" : "date"},"date" : {"type" : "date"},"desc" : {"type" : "text","fields" : {"keyword" : {"type" : "keyword","ignore_above" : 256}},"analyzer":"ik_max_word"},"lv" : {"type" : "text","fields" : {"keyword" : {"type" : "keyword","ignore_above" : 256}}},"name" : {"type" : "text","analyzer":"ik_max_word","fields" : {"keyword" : {"type" : "keyword","ignore_above" : 256}}},"price" : {"type" : "long"},"tags" : {"type" : "text","fields" : {"keyword" : {"type" : "keyword","ignore_above" : 256}}},"type" : {"type" : "text","fields" : {"keyword" : {"type" : "keyword","ignore_above" : 256}}}}}
}
PUT /product/_doc/1
{"name" : "小米手機(jī)","desc" : "手機(jī)中的戰(zhàn)斗機(jī)","price" : 3999,"lv":"旗艦機(jī)","type":"手機(jī)","createtime":"2020-10-01T08:00:00Z","tags": [ "性價比", "發(fā)燒", "不卡頓" ]
}
PUT /product/_doc/2
{"name" : "小米NFC手機(jī)","desc" : "支持全功能NFC,手機(jī)中的滑翔機(jī)","price" : 4999,"lv":"旗艦機(jī)","type":"手機(jī)","createtime":"2020-05-21T08:00:00Z","tags": [ "性價比", "發(fā)燒", "公交卡" ]
}
PUT /product/_doc/3
{"name" : "NFC手機(jī)","desc" : "手機(jī)中的轟炸機(jī)","price" : 2999,"lv":"高端機(jī)","type":"手機(jī)","createtime":"2020-06-20","tags": [ "性價比", "快充", "門禁卡" ]
}
PUT /product/_doc/4
{"name" : "小米耳機(jī)","desc" : "耳機(jī)中的黃燜雞","price" : 999,"lv":"百元機(jī)","type":"耳機(jī)","createtime":"2020-06-23","tags": [ "降噪", "防水", "藍(lán)牙" ]
}
PUT /product/_doc/5
{"name" : "紅米耳機(jī)","desc" : "耳機(jī)中的肯德基","price" : 399,"type":"耳機(jī)","lv":"百元機(jī)","createtime":"2020-07-20","tags": [ "防火", "低音炮", "聽聲辨位" ]
}
PUT /product/_doc/6
{"name" : "小米手機(jī)10","desc" : "充電賊快掉電更快,超級無敵望遠(yuǎn)鏡,高刷電競屏","price" : "","lv":"旗艦機(jī)","type":"手機(jī)","createtime":"2020-07-27","tags": [ "120HZ刷新率", "120W快充", "120倍變焦" ]
}
PUT /product/_doc/7
{"name" : "挨炮 SE2","desc" : "除了CPU,一無是處","price" : "3299","lv":"旗艦機(jī)","type":"手機(jī)","createtime":"2020-07-21","tags": [ "割韭菜", "割韭菜", "割新韭菜" ]
}
PUT /product/_doc/8
{"name" : "XS Max","desc" : "聽說要出新款12手機(jī)了,終于可以換掉手中的4S了","price" : 4399,"lv":"旗艦機(jī)","type":"手機(jī)","createtime":"2020-08-19","tags": [ "5V1A", "4G全網(wǎng)通", "大" ]
}
PUT /product/_doc/9
{"name" : "小米電視","desc" : "70寸性價比只選,不要一萬八,要不要八千八,只要兩千九百九十八","price" : 2998,"lv":"高端機(jī)","type":"耳機(jī)","createtime":"2020-08-16","tags": [ "巨饃", "家庭影院", "游戲" ]
}
PUT /product/_doc/10
{"name" : "紅米電視","desc" : "我比上邊那個更劃算,我也2998,我也70寸,但是我更好看","price" : 2999,"type":"電視","lv":"高端機(jī)","createtime":"2020-08-28","tags": [ "大片", "藍(lán)光8K", "超薄" ]
}
PUT /product/_doc/11
{"name": "紅米電視","desc": "我比上邊那個更劃算,我也2998,我也70寸,但是我更好看","price": 2998,"type": "電視","lv": "高端機(jī)","createtime": "2020-08-28","tags": ["大片","藍(lán)光8K","超薄"]
}
## Syntax
GET product/_search
{"aggs": {"<aggs_name>": {"<agg_type>": {"field": "<field_name>"}}}
}
## Bucket aggregation. Example: count the products per tag
GET product/_search
{"aggs": {"tag_bucket": {"terms": {"field": "tags.keyword"}}}
}
## Leave the hits out of the response: "size": 0
GET product/_search
{"size": 0, "aggs": {"tag_bucket": {"terms": {"field": "tags.keyword"}}}
}
## Ordering
GET product/_search
{"size": 0, "aggs": {"tag_bucket": {"terms": {"field": "tags.keyword","size": 3,"order": {"_count": "desc"}}}}
}
## doc_values and fielddata: aggregating on the analyzed text field name fails until fielddata is enabled
GET product/_search
{"size": 0, "aggs": {"tag_bucket": {"terms": {"field": "name"}}}
}
GET product/_search
{"size": 0, "aggs": {"tag_bucket": {"terms": {"field": "name.keyword"}}}
}
POST product/_mapping
{"properties": {"name": {"type": "text","analyzer": "ik_max_word","fielddata": true}}
}
GET product/_search
{"size": 0,"aggs": {"tag_bucket": {"terms": {"size": 20,"field": "name"}}}
}
#*****************************************
## Metric aggregations
## Example: highest, lowest and average price
GET product/_search
{"size": 0, "aggs": {"max_price": {"max": {"field": "price"}},"min_price": {"min": {"field": "price"}},"avg_price": {"avg": {"field": "price"}}}
}
## All of these metrics from a single aggregation
GET product/_search
{"size": 0, "aggs": {"price_stats": {"stats": {"field": "price"}}}
}
## Number of distinct values of name
GET product/_search
{"size": 0, "aggs": {"type_count": {"cardinality": {"field": "name"}}}
}
GET product/_search
{"size": 0, "aggs": {"type_count": {"cardinality": {"field": "name.keyword"}}}
}
## Number of distinct values of lv
GET product/_search
{"size": 0, "aggs": {"type_count": {"cardinality": {"field": "lv.keyword"}}}
}
##*********************************************
## Pipeline aggregations (aggregating aggregation results)
## Example: find the product type with the lowest average price
GET product/_search
{"size": 0, "aggs": {"type_bucket": {"terms": {"field": "type.keyword"},"aggs": {"price_bucket": {"avg": {"field": "price"}}}},"min_bucket":{"min_bucket": {"buckets_path": "type_bucket>price_bucket"}}}
}
##=============================================
## Nested aggregations
## Syntax
GET product/_search
{"size": 0,"aggs": {"<agg_name>": {"<agg_type>": {"field": "<field_name>"},"aggs": {"<agg_name_child>": {"<agg_type>": {"field": "<field_name>"}}}}}
}
# Example: count the products per level (lv) within each product type
GET product/_search
{"size": 0, "aggs": {"type_lv": {"terms": {"field": "type.keyword"},"aggs": {"lv": {"terms": {"field": "lv.keyword"}}}}}
}
# Bucket by lv and output the price statistics of each bucket
GET product/_search
{"size": 0, "aggs": {"lv_price": {"terms": {"field": "lv.keyword"},"aggs": {"price": {"stats": {"field": "price"}}}}}
}
## Combining the two examples above:
## price statistics and tag counts per level within each product type
GET product/_search
{"size": 0, "aggs": {"type_agg": {"terms": {"field": "type.keyword"},"aggs": {"lv_agg": {"terms": {"field": "lv.keyword"},"aggs": {"price_stats": {"stats": {"field": "price"}},"tags_buckets": {"terms": {"field": "tags.keyword"}}}}}}}
}
## For each product type, find the level with the lowest average price
GET product/_search
{"size": 0,"aggs": {"type_bucket": {"terms": {"field": "type.keyword"},"aggs": {"lv_bucket": {"terms": {"field": "lv.keyword"},"aggs": {"price_avg": {"avg": {"field": "price"}}}},"min_bucket": {"min_bucket": {"buckets_path": "lv_bucket>price_avg"}}}}}
}
#======================================================
# Aggregating over query results
GET product/_search
{"size": 0, "query": {"range": {"price": {"gte": 5000}}}, "aggs": {"tags_bucket": {"terms": {"field": "tags.keyword"}}}
}
# Aggregating over a filter
GET product/_search
{"query": {"constant_score": {"filter": {"range": {"price": {"gte": 5000}}}}},"aggs": {"tags_bucket": {"terms": {"field": "tags.keyword"}}}
}
GET product/_search
{"query": {"bool": {"filter": {"range": {"price": {"gte": 5000}}}}}, "aggs": {"tags_bucket": {"terms": {"field": "tags.keyword"}}}
}
# Filtering the hits after aggregating (post_filter)
GET product/_search
{"aggs": {"tags_bucket": {"terms": {"field": "tags.keyword"}}},"post_filter": {"term": {"tags.keyword": "性價比"}}
}
# Ignoring the query scope (global) and nesting an extra filter
## Example: highest, lowest and average price
GET product/_search
{"size": 10,"query": {"range": {"price": {"gte": 4000}}},"aggs": {"max_price": {"max": {"field": "price"}},"min_price": {"min": {"field": "price"}},"avg_price": {"avg": {"field": "price"}},"all_avg_price": {"global": {},"aggs": {"avg_price": {"avg": {"field": "price"}}}},"muti_avg_price": {"filter": {"range": {"price": {"lte": 4500}}}, "aggs": {"avg_price": {"avg": {"field": "price"}}}}}
}
#===============================================
# Aggregation ordering: _count, _key, _term
GET product/_search
{"size": 0,"aggs": {"type_agg": {"terms": {"field": "tags.keyword","order": {"_count": "desc"},"size": 10}}}
}
# Multi-level ordering
GET product/_search?size=0
{"aggs": {"first_sort": {"terms": {"field": "type.keyword","order": {"_count": "desc"}},"aggs": {"second_sort": {"terms": {"field": "lv.keyword","order": {"_count": "asc"}}}}}}
}
# Ordering by a nested aggregation
GET product/_search
{"size": 0,"aggs": {"tag_avg_price": {"terms": {"field": "type.keyword","order": {"agg_stats>stats.sum": "desc"}},"aggs": {"agg_stats": {"filter": {"terms": {"type.keyword": ["耳機(jī)","手機(jī)","電視"]}},"aggs": {"stats": {"extended_stats": {"field": "price"}}}}}}}
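}
#===========================================================
# Commonly used aggregation functions
## range and histogram (bar charts); the first two requests use the range aggregation with explicit intervals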
GET product/_search
{"aggs": {"price_range": {"range": {"field": "price","ranges": [{"from": 0,"to": 1000},{"from": 1000,"to": 2000},{"from": 3000,"to": 4000},{"from": 4000,"to": 5000}]}}}
}
GET product/_search?size=0
{"aggs": {"price_range": {"range": {"field": "createtime","ranges": [{"from": "2020-05-01", "to": "2020-05-31"},{"from": "2020-06-01","to": "2020-06-30"},{"from": "2020-07-01","to": "2020-07-31"},{"from": "2020-08-01"}]}}}
}
# Handling missing values: give documents without the field a default value
GET product/_search?size=0
{"aggs": {"price_histogram": {"histogram": {"field": "price","interval": 1000,"keyed": true,"min_doc_count": 0,"missing": 1999}}}
}
#date-histogram
#ms s m h d
GET product/_search?size=0
{"aggs": {"my_date_histogram": {"date_histogram": {"field": "createtime","calendar_interval": "month","min_doc_count": 0,"format": "yyyy-MM", "extended_bounds": {"min": "2020-01","max": "2020-12"},"order": {"_count": "desc"}}}}
}
GET product/_search?size=0
{"aggs": {"my_auto_histogram": {"auto_date_histogram": {"field": "createtime","format": "yyyy-MM-dd","buckets": 180}}}
}
#cumulative_sum
GET product/_search?size=0
{"aggs": {"my_date_histogram": {"date_histogram": {"field": "createtime","calendar_interval": "month","min_doc_count": 0,"format": "yyyy-MM", "extended_bounds": {"min": "2020-01","max": "2020-12"}},"aggs": {"sum_agg": {"sum": {"field": "price"}},"my_cumulative_sum":{"cumulative_sum": {"buckets_path": "sum_agg"}}}}}
}
## percentile: percentile statistics (or pie charts)
## https://www.elastic.co/guide/en/elasticsearch/reference/7.10/search-aggregations-metrics-percentile-aggregation.html
GET product/_search?size=0
{"aggs": {"price_percentiles": {"percentiles": {"field": "price","percents": [1,5,25,50,75,95,99]}}}
}
#percentile_ranks
#TDigest
GET product/_search?size=0
{"aggs": {"price_percentiles": {"percentile_ranks": {"field": "price","values": [1000,2000,3000,4000,5000,6000]}}}
}