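1 The Prometheus philosophy
Every alert should be dealt with immediately; no alert should stay unresolved for a long time. In practice this shows up as high-frequency data collection and automatic resolution of alerts (after 5 minutes by default).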
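2 Calling the Alertmanager API
The following command creates an alert by hand. Note that startsAt and endsAt must reflect the actual current time, written as RFC 3339 timestamps (in UTC, or with an explicit offset such as +08:00 as in the example).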
curl -H "Content-Type: application/json" -X POST -d '[{"labels":{"field1": "value1", "field2": "value2", "field3": "value3"},"annotations":{"desc": "xxxx"},"generatorURL":"http://1.1.1.1","startsAt":"2022-08-10T20:57:46.000+08:00"}]' "http://127.0.0.1:9093/api/v2/alerts"
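Because the timestamps have to track the current time, it can be convenient to generate the request body instead of typing it. The following is a minimal sketch, not an official client: `build_alert` and its `duration_minutes` parameter are hypothetical names introduced here, and the 5-minute default is only an illustrative assumption.

```python
import json
from datetime import datetime, timedelta, timezone


def build_alert(labels, annotations, generator_url, now=None, duration_minutes=5):
    """Return one alert object with startsAt/endsAt as RFC 3339 UTC strings.

    duration_minutes is a hypothetical knob for this sketch; it only decides
    how far after startsAt the endsAt timestamp is placed.
    """
    now = now or datetime.now(timezone.utc)
    ends = now + timedelta(minutes=duration_minutes)
    fmt = "%Y-%m-%dT%H:%M:%S.000Z"
    return {
        "labels": labels,
        "annotations": annotations,
        "generatorURL": generator_url,
        # Convert to UTC before formatting so the trailing "Z" is accurate.
        "startsAt": now.astimezone(timezone.utc).strftime(fmt),
        "endsAt": ends.astimezone(timezone.utc).strftime(fmt),
    }


if __name__ == "__main__":
    alert = build_alert({"field1": "value1"}, {"desc": "xxxx"}, "http://1.1.1.1")
    # The request body for /api/v2/alerts is a JSON array of alert objects.
    print(json.dumps([alert]))
```

The printed string can be passed to curl's -d option in place of the hand-written body.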
3 Alertmanager alert JSON
Alertmanager sends the receiver a single JSON document in which multiple alerts are collected into the alerts array. For example:
{"receiver": "email", "status": "firing", "alerts": [{"status": "firing", "labels": {"field1": "value1", "field2": "value2", "field3": "value3"}, "annotations": {"desc": "xxxx"}, "startsAt": "2023-02-09T09:58:45+08:00", "endsAt": "2023-02-09T10:00:45+08:00", "generatorURL": "http://1.1.1.1", "fingerprint": "12345"},{"status": "firing", "labels": {"field1": "value1", "field2": "value2", "field3": "value3"}, "annotations": {"desc": "xxxx"}, "startsAt": "2023-02-09T09:58:45+08:00", "endsAt": "2023-02-09T10:00:45+08:00", "generatorURL": "http://1.1.1.1", "fingerprint": "12345"},{"status": "firing", "labels": {"field1": "value1", "field2": "value2", "field3": "value3"}, "annotations": {"desc": "xxxx"}, "startsAt": "2023-02-09T09:58:45+08:00", "endsAt": "2023-02-09T10:00:45+08:00", "generatorURL": "http://1.1.1.1", "fingerprint": "12345"}], "groupLabels": {"field1": "value1"}, "commonLabels": {"field1": "value1", "field2": "value2"}, "commonAnnotations": {"desc": "xxxx"}, "externalURL": "http://prometheus:9093", "version": "4", "truncatedAlerts": 0}
Once an alert recovers, its own status field is set to resolved. The top-level status of the JSON becomes resolved only when every alert in the alerts array is resolved.
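On the receiving side, the rule above can be checked with a few lines of code. This is a sketch of what a receiver might do with the payload; `summarize` and `expected_group_status` are hypothetical helper names, not part of Alertmanager.

```python
import json


def summarize(payload_text):
    """Return the top-level status and a per-status count of the alerts."""
    payload = json.loads(payload_text)
    counts = {"firing": 0, "resolved": 0}
    for alert in payload["alerts"]:
        counts[alert["status"]] = counts.get(alert["status"], 0) + 1
    return payload["status"], counts


def expected_group_status(alerts):
    """The group is 'resolved' only when every alert in it is resolved."""
    if all(a["status"] == "resolved" for a in alerts):
        return "resolved"
    return "firing"
```

A receiver that opens and closes tickets would typically act on the per-alert status and treat the top-level status only as a summary of the whole group.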
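4 Parameters
- group_wait: when the first alert of a group arrives, sending is delayed by this amount of time. Any other alerts merged into the same group during the wait are sent to the receiver in the same JSON. Every new group incurs this delay before its first notification.
- group_interval: after the group_wait period, the group is checked every group_interval, and a new JSON is sent to the receiver if the group has changed (alerts added or resolved).
- repeat_interval: if the group has not changed at all, the JSON is sent to the receiver again only after repeat_interval has passed.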
4.1 Example
Assume group_wait is 30 seconds, group_interval is 1 minute, and repeat_interval is 10 minutes.
- The first alert arrives at 10:00:00 (t0) and the second at 10:00:20, so the first JSON is sent at 10:00:30 (t0+group_wait):
{"receiver": "email", "status": "firing", "alerts": [{"status": "firing", "labels": {"field1": "value1", "field2": "value2", "field3": "value3"}...},{"status": "firing", "labels": {"field1": "value1", "field2": "value2", "field3": "value3"}...}], ...}
- A third alert fires at 10:00:40, so the second JSON is sent at 10:01:30 (t0+group_wait+group_interval):
{"receiver": "email", "status": "firing", "alerts": [{"status": "firing", "labels": {"field1": "value1", "field2": "value2", "field3": "value3"}...},{"status": "firing", "labels": {"field1": "value1", "field2": "value2", "field3": "value3"}...},{"status": "firing", "labels": {"field1": "value1", "field2": "value2", "field3": "value3"}...}], ...}
- The first alert recovers at 10:01:40, so the third JSON is sent at 10:02:30 (t0+group_wait+group_interval*2):
{"receiver": "email", "status": "firing", "alerts": [{"status": "resolved", "labels": {"field1": "value1", "field2": "value2", "field3": "value3"}...},{"status": "firing", "labels": {"field1": "value1", "field2": "value2", "field3": "value3"}...},{"status": "firing", "labels": {"field1": "value1", "field2": "value2", "field3": "value3"}...}], ...}
- The other two alerts recover at 10:02:40, so the fourth JSON is sent at 10:03:30 (t0+group_wait+group_interval*3):
{"receiver": "email", "status": "resolved", "alerts": [{"status": "resolved", "labels": {"field1": "value1", "field2": "value2", "field3": "value3"}...},{"status": "resolved", "labels": {"field1": "value1", "field2": "value2", "field3": "value3"}...}], ...}
If, after the first JSON is sent at 10:00:30, steps 2, 3, and 4 never happen and the alerts never recover, the first JSON is sent again at 10:10:30 (t0+group_wait+repeat_interval).
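The timeline above can be reproduced with a small simulation. This is a deliberately simplified model of the three parameters, not Alertmanager's actual scheduling code: it assumes the group is checked at t0+group_wait and then every group_interval, and that a notification goes out when the group has changed since the last check or when repeat_interval has elapsed since the last notification. The function names are hypothetical.

```python
from datetime import datetime, timedelta


def notification_times(change_times, group_wait, group_interval,
                       repeat_interval, horizon):
    """Return the offsets (seconds after t0) at which a JSON would be sent.

    change_times lists the offsets at which the group changes
    (an alert fires or recovers). All values are in seconds.
    """
    sent = []
    pending = sorted(change_times)
    last_sent = None
    tick = group_wait
    while tick <= horizon:
        # Consume every change that happened up to this check.
        changed = False
        while pending and pending[0] <= tick:
            pending.pop(0)
            changed = True
        repeat_due = last_sent is not None and tick - last_sent >= repeat_interval
        if last_sent is None or changed or repeat_due:
            sent.append(tick)
            last_sent = tick
        tick += group_interval
    return sent


def as_clock(offset, t0="10:00:00"):
    """Format an offset in seconds as a wall-clock time relative to t0."""
    base = datetime.strptime(t0, "%H:%M:%S")
    return (base + timedelta(seconds=offset)).strftime("%H:%M:%S")


if __name__ == "__main__":
    # Alerts fire at +0s, +20s, +40s; recoveries happen at +100s and +160s.
    busy = notification_times([0, 20, 40, 100, 160], 30, 60, 600, 240)
    print([as_clock(t) for t in busy])
    # Only the first alert fires and nothing else ever changes.
    quiet = notification_times([0], 30, 60, 600, 700)
    print([as_clock(t) for t in quiet])
```

Under these assumptions the busy case yields the four send times of the walkthrough, and the quiet case yields the initial notification followed by one repeat.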