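First of all, you are welcome to use DHorse to deploy k8s applications.
k8s can report the resource usage of pods and nodes through the top command. Running the command directly, however, produces the following result.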
[root@centos05 deployment]# kubectl top pod
W0306 15:23:24.990550 8247 top_pod.go:140] Using json format to get metrics. Next release will switch to protocol-buffers, switch early by passing --use-protocol-buffers flag
error: Metrics API not available
The top command depends on metrics server, which k8s does not install by default. The rest of this post walks through installing and using it.
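Before installing anything, it can help to check whether the metrics API is already registered in the cluster. The following is a minimal sketch, assuming kubectl is configured for the target cluster. The function name metrics_api_status and its three result strings are illustrative only; the APIService name v1beta1.metrics.k8s.io is the one created by the deployment step later in this post.

```shell
#!/bin/sh
# Sketch: report whether the metrics API is registered and available.
# metrics_api_status and its output strings are hypothetical names.
metrics_api_status() {
  # Ask for the status of the "Available" condition of the APIService.
  # kubectl exits non-zero when the APIService does not exist.
  status=$(kubectl get apiservice v1beta1.metrics.k8s.io \
    -o jsonpath='{.status.conditions[?(@.type=="Available")].status}' 2>/dev/null) \
    || { echo "not-registered"; return; }

  case $status in
    True) echo "available" ;;
    *)    echo "registered-but-unavailable" ;;
  esac
}
```

Calling metrics_api_status on a cluster without metrics server should print not-registered. A result of registered-but-unavailable would be expected while the metrics-server pod exists but is not yet ready.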
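Installation
- Download the deployment file
Download the components.yaml file.
- Change the image address
Change the image address in the deployment file to a mirror registry in mainland China. The line is at around line 140 of the deployment file.
The original configuration is: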
image: k8s.gcr.io/metrics-server/metrics-server:v0.6.2
The modified configuration is:
image: registry.cn-hangzhou.aliyuncs.com/google_containers/metrics-server:v0.6.2
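The image change can also be scripted instead of edited by hand. Here is a minimal sketch, assuming the manifest contains the image line exactly as shown above. The helper name swap_image is hypothetical, and the demo runs on a one-line stand-in rather than the real file.

```shell
#!/bin/sh
# Sketch: rewrite the metrics-server image reference in a manifest.
# swap_image is a hypothetical helper; both image strings come from the text above.
OLD_IMAGE='k8s.gcr.io/metrics-server/metrics-server:v0.6.2'
NEW_IMAGE='registry.cn-hangzhou.aliyuncs.com/google_containers/metrics-server:v0.6.2'

# Reads a manifest on stdin and writes the rewritten manifest to stdout.
swap_image() {
  # '|' is the sed delimiter because the image names contain '/'.
  # The dots in OLD_IMAGE act as regex wildcards, which still match literal dots.
  sed "s|image: ${OLD_IMAGE}|image: ${NEW_IMAGE}|"
}

# Demo on a single indented line standing in for the real file.
demo_out=$(printf '        image: %s\n' "$OLD_IMAGE" | swap_image)
printf '%s\n' "$demo_out"
```

On the real file this would be run as swap_image < components.yaml > components.patched.yaml, and the result compared with the original before deploying.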
- Deploy metrics server
[root@centos05 deployment]# kubectl create -f components.yaml
serviceaccount/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created
service/metrics-server created
deployment.apps/metrics-server created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
Check how metrics server is running. The pod never becomes ready, and its events report a probe problem: Readiness probe failed: HTTP probe failed with statuscode: 500
[root@centos05 deployment]# kubectl get pods -A | grep metrics
kube-system metrics-server-6ffc8966f5-84hbb 0/1 Running 0 2m23s
[root@centos05 deployment]# kubectl describe pod metrics-server-6ffc8966f5-84hbb -n kube-system
Next, look at the pod's logs:
[root@centos05 deployment]# kubectl logs metrics-server-6ffc8966f5-84hbb -n kube-system
I1010 16:27:46.228594 1 serving.go:342] Generated self-signed cert (/tmp/apiserver.crt, /tmp/apiserver.key)
I1010 16:27:46.633494 1 secure_serving.go:266] Serving securely on [::]:4443
I1010 16:27:46.633585 1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I1010 16:27:46.633616 1 shared_informer.go:240] Waiting for caches to sync for RequestHeaderAuthRequestController
I1010 16:27:46.633653 1 dynamic_serving_content.go:131] "Starting controller" name="serving-cert::/tmp/apiserver.crt::/tmp/apiserver.key"
I1010 16:27:46.634221 1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
W1010 16:27:46.634296 1 shared_informer.go:372] The sharedIndexInformer has started, run more than once is not allowed
I1010 16:27:46.634365 1 configmap_cafile_content.go:201] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
I1010 16:27:46.634370 1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I1010 16:27:46.634409 1 configmap_cafile_content.go:201] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
I1010 16:27:46.634415 1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
E1010 16:27:46.641663 1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.100.22:10250/metrics/resource\": x509: cannot validate certificate for 192.168.100.22 because it doesn't contain any IP SANs" node="k8s-slave2"
E1010 16:27:46.645389 1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.100.20:10250/metrics/resource\": x509: cannot validate certificate for 192.168.100.20 because it doesn't contain any IP SANs" node="k8s-master"
E1010 16:27:46.652261 1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.100.21:10250/metrics/resource\": x509: cannot validate certificate for 192.168.100.21 because it doesn't contain any IP SANs" node="k8s-slave1"
I1010 16:27:46.733747 1 shared_informer.go:247] Caches are synced for RequestHeaderAuthRequestController
I1010 16:27:46.735167 1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I1010 16:27:46.735194 1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
E1010 16:28:01.643646 1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.100.22:10250/metrics/resource\": x509: cannot validate certificate for 192.168.100.22 because it doesn't contain any IP SANs" node="k8s-slave2"
E1010 16:28:01.643805 1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.100.21:10250/metrics/resource\": x509: cannot validate certificate for 192.168.100.21 because it doesn't contain any IP SANs" node="k8s-slave1"
E1010 16:28:01.646721 1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.100.20:10250/metrics/resource\": x509: cannot validate certificate for 192.168.100.20 because it doesn't contain any IP SANs" node="k8s-master"
I1010 16:28:13.397373 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
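The logs explain why the pod is unhealthy. Metrics server cannot scrape any node, so it has no metrics to serve and the HTTP GET readiness probe fails. The underlying error, reported for every node, is: x509: cannot validate certificate for 192.168.100.22 because it doesn't contain any IP SANs (node="k8s-slave2")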
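With a long log it is easy to miss which nodes are affected. The sketch below pulls the node names out of the x509 error lines. It assumes the log format shown above, where each failing line ends with node="...". The function name failed_nodes is hypothetical, and the demo uses a few lines copied from the log in this post.

```shell
#!/bin/sh
# Sketch: list nodes that metrics-server could not scrape because of the x509 error.
# failed_nodes is a hypothetical helper; it reads log text on stdin.
failed_nodes() {
  grep "doesn't contain any IP SANs" \
    | sed -n 's/.* node="\([^"]*\)"$/\1/p' \
    | sort -u
}

# Demo input: lines copied from the log above, written to a temporary file.
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
I1010 16:27:46.633494 1 secure_serving.go:266] Serving securely on [::]:4443
E1010 16:27:46.641663 1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.100.22:10250/metrics/resource\": x509: cannot validate certificate for 192.168.100.22 because it doesn't contain any IP SANs" node="k8s-slave2"
E1010 16:27:46.645389 1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.100.20:10250/metrics/resource\": x509: cannot validate certificate for 192.168.100.20 because it doesn't contain any IP SANs" node="k8s-master"
E1010 16:28:01.643646 1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.100.22:10250/metrics/resource\": x509: cannot validate certificate for 192.168.100.22 because it doesn't contain any IP SANs" node="k8s-slave2"
EOF

demo_nodes=$(failed_nodes < "$sample_log")
rm -f "$sample_log"
printf '%s\n' "$demo_nodes"
```

Against a live cluster the same filter could be fed from the logs command used above, for example kubectl logs metrics-server-6ffc8966f5-84hbb -n kube-system | failed_nodes.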
The metrics-server documentation (https://github.com/kubernetes…) contains the following note:
Kubelet certificate needs to be signed by cluster Certificate Authority (or disable certificate validation by passing --kubelet-insecure-tls to Metrics Server)
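In other words, either the kubelet certificates must be signed by the cluster certificate authority, or certificate validation must be disabled by passing --kubelet-insecure-tls to Metrics Server.
Since this is a test environment, we chose to disable certificate validation with the flag. This is not recommended for production!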
Append the argument --kubelet-insecure-tls at around line 139 of the deployment file. After the change, the section looks like this:
spec:
  containers:
  - args:
    - --cert-dir=/tmp
    - --secure-port=4443
    - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
    - --kubelet-use-node-status-port
    - --metric-resolution=15s
    - --kubelet-insecure-tls
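Appending the flag can be scripted as well. The following is a rough sketch, assuming the args list contains a --metric-resolution entry as in the listing above, and that the new entry should follow it with the same indentation. The function name add_insecure_tls is hypothetical. As noted above, this is only appropriate for test environments.

```shell
#!/bin/sh
# Sketch: append --kubelet-insecure-tls to the metrics-server args (test clusters only).
# add_insecure_tls is a hypothetical helper; it prints the patched manifest to stdout.
add_insecure_tls() {
  manifest=$1
  # Pass the file through unchanged if the flag is already there.
  if grep -q -- '--kubelet-insecure-tls' "$manifest"; then
    cat "$manifest"
    return
  fi
  awk '
    { print }
    /^[ ]*- --metric-resolution=/ {
      indent = $0
      sub(/-.*/, "", indent)   # keep only the leading spaces of the matched line
      print indent "- --kubelet-insecure-tls"
    }
  ' "$manifest"
}

# Demo on a fragment shaped like the args section of components.yaml.
fragment=$(mktemp)
cat > "$fragment" <<'EOF'
      containers:
      - args:
        - --cert-dir=/tmp
        - --secure-port=4443
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        - --metric-resolution=15s
EOF

demo_patched=$(add_insecure_tls "$fragment")
rm -f "$fragment"
printf '%s\n' "$demo_patched"
```

The guard at the top makes the helper safe to run twice on the same file. On the real manifest the output would be redirected to a new file and reviewed before deploying it.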
Deploy the file again:
[root@centos05 deployment]# kubectl apply -f components.yaml
The pod is now running normally:
[root@centos05 deployment]# kubectl get pod -A | grep metrics
kube-system metrics-server-fd9598766-8zphn 1/1 Running 0 89s
The kubectl top command now succeeds:
[root@centos05 deployment]# kubectl top pod
NAME CPU(cores) MEMORY(bytes)
hello-1-qa-dhorse-6fc54647c-5zkjc 501m 133Mi
[root@centos05 deployment]# kubectl top node
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
centos05 192m 4% 1610Mi 59%
centos06 107m 2% 854Mi 50%
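Once top works, its tabular output is easy to post-process. As an illustration, the sketch below prints the nodes whose MEMORY% is above a limit. It assumes the five-column layout shown above with a header row. The function name nodes_over_memory and the limit of 55 are made up for the demo, which reuses the sample output from this post.

```shell
#!/bin/sh
# Sketch: print nodes whose MEMORY% exceeds a limit, given `kubectl top node` output on stdin.
# nodes_over_memory is a hypothetical helper; the limit is a plain number such as 55.
nodes_over_memory() {
  awk -v limit="$1" '
    NR > 1 {                 # skip the header row
      pct = $5               # fifth column is MEMORY%
      sub(/%/, "", pct)      # drop the percent sign
      if (pct + 0 > limit + 0) print $1
    }
  '
}

# Demo on the sample output shown above.
demo_hot=$(printf '%s\n' \
  'NAME       CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%' \
  'centos05   192m         4%     1610Mi          59%' \
  'centos06   107m         2%     854Mi           50%' | nodes_over_memory 55)
printf '%s\n' "$demo_hot"
```

On a live cluster this would be used as kubectl top node | nodes_over_memory 55.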
Resource metrics can also be retrieved in code, for example:
// Lists the pod metrics of a namespace through the metrics API.
// Returns null if the API call fails, so callers must check for null.
public PodMetricsList replicaMetrics(ClusterPO clusterPO, String namespace) {
    ApiClient apiClient = this.apiClient(clusterPO.getClusterUrl(), clusterPO.getAuthToken());
    Metrics metrics = new Metrics(apiClient);
    try {
        return metrics.getPodMetrics(namespace);
    } catch (ApiException e) {
        logger.error("Failed to list pod metrics", e);
    }
    return null;
}
Finally, you are welcome to use DHorse to deploy and monitor your applications.