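Preface
Before installing, make sure the version relationships are clear: which K8S version a given Kubeflow release supports, which versions of the related tools it needs, and so on.
Here we use kubeflow-1.6.1 and K8s v1.22.8.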
Single-node deployment
Deploying K8S
Initializing Linux
1. Disable SELinux
setenforce 0 && sed -i "s/SELINUX=enforcing/SELINUX=disabled/g" /etc/selinux/config
2. Disable the firewall
systemctl stop firewalld
systemctl disable firewalld
3. Set the hostname
hostnamectl set-hostname ai-node
4. Disable swap
swapoff -a && sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab
5. Configure kernel parameters and modules
cat <<EOF > /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
# Load br_netfilter first: the net.bridge.* keys only exist once the module is loaded
modprobe br_netfilter
lsmod | grep br_netfilter
# Apply the kernel parameters
sysctl --system
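To confirm that step 5 actually took effect, a small Python sketch can read the values back from /proc/sys (each sysctl key maps to a file path, with dots replaced by slashes). The REQUIRED table and the function names are illustrative assumptions, not part of any tool:

```python
from pathlib import Path
from typing import Callable, Dict

# The values written to /etc/sysctl.d/k8s.conf above
REQUIRED: Dict[str, str] = {
    "net.bridge.bridge-nf-call-iptables": "1",
    "net.bridge.bridge-nf-call-ip6tables": "1",
}


def read_proc_sys(key: str) -> str:
    # net.bridge.bridge-nf-call-iptables -> /proc/sys/net/bridge/bridge-nf-call-iptables
    return Path("/proc/sys", *key.split(".")).read_text().strip()


def check_sysctl(read: Callable[[str], str] = read_proc_sys) -> Dict[str, str]:
    """Return {key: problem} for every required key that is missing or wrong."""
    problems = {}
    for key, want in REQUIRED.items():
        try:
            got = read(key)
        except FileNotFoundError:
            # The bridge keys only exist after `modprobe br_netfilter`
            problems[key] = "missing (is br_netfilter loaded?)"
            continue
        if got != want:
            problems[key] = f"expected {want}, got {got}"
    return problems


if __name__ == "__main__":
    for key, problem in check_sysctl().items():
        print(f"{key}: {problem}")
```

An empty result means both bridge settings are active; the read parameter exists only so the check can be exercised without a real /proc.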
6. Update the system and kernel (optional)
Upgrade CentOS 7 and its kernel
Installing Docker
1. Install docker-ce
yum install -y yum-utils device-mapper-persistent-data lvm2
yum-config-manager --add-repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
yum -y install docker-ce
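2. Switch to registry mirrors reachable from China (written to /etc/docker/daemon.json)
mkdir -p /etc/docker
cat <<EOF > /etc/docker/daemon.json
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "registry-mirrors": ["https://registry.docker-cn.com", "http://hub-mirror.c.163.com", "https://docker.mirrors.ustc.edu.cn"]
}
EOF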
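A malformed daemon.json stops Docker from starting, so it can be worth validating the file before starting the service. The sketch below is a hypothetical checker (check_daemon_json is not an existing tool). It assumes kubeadm's default kubelet cgroup driver, which is systemd from v1.22 on, so Docker must be set to systemd as well:

```python
import json


def check_daemon_json(text: str) -> list:
    """Return a list of problems found in a daemon.json document (empty = OK)."""
    try:
        cfg = json.loads(text)
    except json.JSONDecodeError as e:
        return [f"invalid JSON: {e.msg} (line {e.lineno})"]
    problems = []
    # Docker's cgroup driver must match kubelet's (systemd for kubeadm >= 1.22 by default)
    if "native.cgroupdriver=systemd" not in cfg.get("exec-opts", []):
        problems.append("exec-opts lacks native.cgroupdriver=systemd")
    # Each mirror must be a full http(s) URL
    mirrors = cfg.get("registry-mirrors", [])
    if not all(isinstance(m, str) and m.startswith(("http://", "https://")) for m in mirrors):
        problems.append("every registry mirror must be an http(s) URL")
    return problems
```

Usage: check_daemon_json(open("/etc/docker/daemon.json").read()) returns an empty list when the file looks fine; a stray trailing comma, for instance, is reported as invalid JSON.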
3. Start docker-ce
systemctl start docker
systemctl enable docker
Installing Kubernetes
1. Configure the yum repository
cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=1
# repo_gpgcheck must be 0; with 1, installing kubelet, kubeadm and kubectl later fails with "[Errno -1] repomd.xml signature could not be verified for kubernetes Trying other mirror."
repo_gpgcheck=0
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
2. Install the Kubernetes base components
yum install -y kubelet-1.22.8 kubeadm-1.22.8 kubectl-1.22.8
systemctl start kubelet
systemctl enable kubelet.service
3. Initialize K8S
# --apiserver-advertise-address sets the address the master advertises; --kubernetes-version must match the installed K8S version; --pod-network-cidr sets the Pod network range (10.244.0.0/16 is flannel's default, matching the flannel network scheme used here).
# On success kubeadm prints a "kubeadm join" command; save it, because it is needed later to join the other nodes to the cluster.
kubeadm init --apiserver-advertise-address=192.168.0.240 --kubernetes-version v1.22.8 --service-cidr=10.96.0.0/12 --pod-network-cidr=10.244.0.0/16
## If init fails, reset and run init again:
# kubeadm reset
# rm -rf $HOME/.kube/config
4. Set up kubectl
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
# Check that all pods are Running (coredns stays Pending until a network plugin from step 6 is installed)
kubectl get pod -n kube-system -o wide
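The two CIDRs passed to kubeadm init in step 3 must not overlap each other, and the node's own address should fall outside both. Python's ipaddress module makes this easy to check; the values below are the ones used in this article, and find_conflicts is a hypothetical helper:

```python
import ipaddress
from itertools import combinations

# Values from the kubeadm init command in step 3
NETWORKS = {
    "service-cidr": ipaddress.ip_network("10.96.0.0/12"),
    "pod-network-cidr": ipaddress.ip_network("10.244.0.0/16"),
}
NODE_IP = ipaddress.ip_address("192.168.0.240")


def find_conflicts(networks, node_ip):
    """List every overlapping CIDR pair and every CIDR that contains the node IP."""
    conflicts = []
    for (a, net_a), (b, net_b) in combinations(networks.items(), 2):
        if net_a.overlaps(net_b):
            conflicts.append(f"{a} {net_a} overlaps {b} {net_b}")
    for name, net in networks.items():
        if node_ip in net:
            conflicts.append(f"node IP {node_ip} falls inside {name} {net}")
    return conflicts


if __name__ == "__main__":
    print(find_conflicts(NETWORKS, NODE_IP) or "no conflicts")
```

With the values above it prints "no conflicts": 10.96.0.0/12 ends at 10.111.255.255, well below 10.244.0.0/16, and 192.168.0.240 lies in neither range.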
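5. Enable kubectl auto-completion (requires the bash-completion package)
source <(kubectl completion bash)
Append that line to ~/.bashrc so that new sessions load it automatically:
echo 'source <(kubectl completion bash)' >> ~/.bashrc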
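6. Network plugin
The two most common choices are flannel and calico. flannel is simple and cannot handle complex network configurations. calico is a capable network management plugin, but that capability comes with more complex configuration of its own. In short: use flannel for a small, simple cluster; if you expect to scale out later, add more devices to the network, or configure more policies, calico is the better choice.
Install only one of the two plugins below.
6.1 Install the calico network plugin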
kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
## This URL now returns 404, apparently because Calico reorganized its docs. For the current installation method, see the official guide: install-calico
6.2 Install the flannel network plugin
For Kubernetes v1.17+
kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml
7. Remove the master taint
By default, a k8s master node does not run workload pods. On a single-node cluster, run the following command to lift that restriction:
kubectl taint nodes --all node-role.kubernetes.io/master-
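Deploying Kubeflow
Download the installation manifests
Official repository: https://github.com/kubeflow/manifests (download the v1.6.1 source archive, manifests-1.6.1.tar.gz)
Install kustomize
kustomize customizes Kubernetes objects through a kustomization file: it can generate new resources from existing ones and customize collections of resources.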
wget https://github.com/kubernetes-sigs/kustomize/releases/download/v3.2.0/kustomize_3.2.0_linux_amd64
mv kustomize_3.2.0_linux_amd64 kustomize
chmod +x kustomize
mv kustomize /usr/bin/
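Image synchronization
1. Background
Kubeflow's images are hosted in Google's registry (gcr.io), which is blocked in mainland China, so a normal installation will not succeed there. There are two workarounds:
1. Use images that have been synchronized to a domestic mirror
2. Synchronize the images from Google's registry yourself
The first option is clearly simpler, but its drawback is just as obvious: the mirrors are often not updated promptly, so many images are old versions. If you want to install the newest release, finding a suitable mirror is a lot of work.
The second option has a single requirement: a way around the firewall (a proxy). [If you don't know how to set that up, go with the first option.]
kubeflow-1.6.1, which we are installing, was the latest release at the time of writing, and I found no domestic mirror carrying it, so we take the second option.
2. Synchronize
2.1 Once the network problem is solved (you can reach gcr.io through a proxy), extract the manifests-1.6.1.tar.gz package downloaded earlier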
tar -zxvf manifests-1.6.1.tar.gz
2.2 Enter the directory
cd manifests-1.6.1
List the gcr.io images. My network only fails to reach gcr.io (quay.io works fine), so the filter below selects gcr.io only; adjust it to your needs.
kustomize build example |grep 'image: gcr.io'|awk '$2 != "" { print $2}' |sort -u
Check the output: if any image has no tag, the extraction went wrong, so drop the awk and inspect the raw lines. gcr.io images must be pulled by tag (or digest); in other words, there seems to be no latest tag.
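The grep/awk/sort pipeline above can also be written in Python, which makes the untagged-image check explicit. This sketch assumes, as the grep does, that image references appear on lines of the form "image: gcr.io/..." in the kustomize build output; gcr_images is an illustrative name:

```python
import re

# Matches "image: gcr.io/..." and "- image: gcr.io/...", optionally quoted
IMAGE_LINE = re.compile(r"""^\s*-?\s*image:\s*['"]?(gcr\.io/[^\s'"]+)""")


def gcr_images(manifest_text):
    """Return (sorted unique gcr.io images, the subset with no tag or digest)."""
    images = sorted({m.group(1) for line in manifest_text.splitlines()
                     if (m := IMAGE_LINE.match(line))})
    # Tagged refs have ":" in the last path component; digest refs contain "@"
    untagged = [img for img in images
                if "@" not in img and ":" not in img.rsplit("/", 1)[-1]]
    return images, untagged
```

Usage: save the build with kustomize build example > build.yaml, then call gcr_images(open("build.yaml").read()). The first list can go straight into images.txt; the second should be empty.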
2.3 Use a script to sync the images listed above to a registry of your choice, either Docker Hub or a private registry
Script setup. The script was written by another user; source: https://github.com/kenwoodjw/sync_gcr
# tree sync_gcr
sync_gcr/
├── images.txt
├── load_image.py
├── README.md
└── sync_image.py
Put the image list from step 2.2 into images.txt, one image per line.
Edit the target registry and login details in sync_image.py (no login is needed for a public registry):
# coding:utf-8
import subprocess

# Target registry. For faster image pulls I push to a private registry on the LAN.
REGISTRY = "192.168.8.38:9090/grc-io/"


def get_filename():
    # Read images.txt, one image per line, skipping blank lines
    with open("images.txt", "r") as f:
        return [line.strip() for line in f if line.strip()]


def pull_image():
    # Log in first if the target registry requires it:
    # subprocess.call("docker login -u user -p passwd", shell=True)
    for name in get_filename():
        if "sha256" in name:
            # Digest reference, e.g. gcr.io/.../controller@sha256:dc0ac2...
            print(name)
            sha256_name = name.split("@")
            new_name = sha256_name[0].split("/")[-1]
            # Use the first 6 hex characters of the digest as the tag
            tag = sha256_name[-1].split(":")[-1][0:6]
            image = REGISTRY + new_name + ":" + tag
        else:
            # Tagged reference: keep the last path component and its tag
            image = REGISTRY + name.split("/")[-1]
        subprocess.call(["docker", "pull", name])
        subprocess.call(["docker", "tag", name, image])
        subprocess.call(["docker", "push", image])


if __name__ == "__main__":
    pull_image()
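Note that sync_image.py keeps only the last path component of each image, so the Knative eventing, net-istio and serving controller images, for example, all land under .../grc-io/controller and stay distinct only through their 6-character digest-prefix tags. A hypothetical pre-flight check (target_ref and find_collisions are illustrative names) that reproduces the script's naming rule and reports any two sources mapping to the same target:

```python
from collections import defaultdict

REGISTRY = "192.168.8.38:9090/grc-io/"  # same target prefix as sync_image.py


def target_ref(name):
    """Reproduce sync_image.py's naming rule for one source image."""
    if "sha256" in name:
        parts = name.split("@")
        return REGISTRY + parts[0].split("/")[-1] + ":" + parts[-1].split(":")[-1][0:6]
    return REGISTRY + name.split("/")[-1]


def find_collisions(names):
    """Group sources by target reference, keeping only targets hit more than once."""
    by_target = defaultdict(list)
    for name in names:
        by_target[target_ref(name)].append(name)
    return {t: srcs for t, srcs in by_target.items() if len(srcs) > 1}
```

Running find_collisions over images.txt before starting the sync should return an empty dict; any entry means two different images would overwrite each other in the target registry.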
2.4 Run the script
python sync_image.py
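This takes a long time, mainly because every image has to be downloaded from gcr.io; there are many images of varying size, so the duration depends entirely on your network speed. Be patient.
While waiting, you can edit the installation files (step 2.5).
2.5 Modify the deployment files
Because the images were synced to our own registry, the image addresses must be changed.
In the manifests-1.6.1 directory, open the config file: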
vim example/kustomization.yaml
Add the following [image versions differ between releases, so don't copy this blindly]:
images:
- name: gcr.io/arrikto/istio/pilot:1.14.1-1-g19df463bb
  newName: 192.168.8.38:9090/grc-io/pilot
  newTag: "1.14.1-1-g19df463bb"
- name: gcr.io/arrikto/kubeflow/oidc-authservice:28c59ef
  newName: 192.168.8.38:9090/grc-io/oidc-authservice
  newTag: "28c59ef"
- name: gcr.io/knative-releases/knative.dev/eventing/cmd/controller@sha256:dc0ac2d8f235edb04ec1290721f389d2bc719ab8b6222ee86f17af8d7d2a160f
  newName: 192.168.8.38:9090/grc-io/controller
  newTag: "dc0ac2"
- name: gcr.io/knative-releases/knative.dev/eventing/cmd/mtping@sha256:632d9d710d070efed2563f6125a87993e825e8e36562ec3da0366e2a897406c0
  newName: 192.168.8.38:9090/grc-io/mtping
  newTag: "632d9d"
- name: gcr.io/knative-releases/knative.dev/eventing/cmd/webhook@sha256:b7faf7d253bd256dbe08f1cac084469128989cf39abbe256ecb4e1d4eb085a31
  newName: 192.168.8.38:9090/grc-io/webhook
  newTag: "b7faf7"
- name: gcr.io/knative-releases/knative.dev/net-istio/cmd/controller@sha256:f253b82941c2220181cee80d7488fe1cefce9d49ab30bdb54bcb8c76515f7a26
  newName: 192.168.8.38:9090/grc-io/controller
  newTag: "f253b8"
- name: gcr.io/knative-releases/knative.dev/net-istio/cmd/webhook@sha256:a705c1ea8e9e556f860314fe055082fbe3cde6a924c29291955f98d979f8185e
  newName: 192.168.8.38:9090/grc-io/webhook
  newTag: "a705c1"
- name: gcr.io/knative-releases/knative.dev/serving/cmd/activator@sha256:93ff6e69357785ff97806945b284cbd1d37e50402b876a320645be8877c0d7b7
  newName: 192.168.8.38:9090/grc-io/activator
  newTag: "93ff6e"
- name: gcr.io/knative-releases/knative.dev/serving/cmd/autoscaler@sha256:007820fdb75b60e6fd5a25e65fd6ad9744082a6bf195d72795561c91b425d016
  newName: 192.168.8.38:9090/grc-io/autoscaler
  newTag: "007820"
- name: gcr.io/knative-releases/knative.dev/serving/cmd/controller@sha256:75cfdcfa050af9522e798e820ba5483b9093de1ce520207a3fedf112d73a4686
  newName: 192.168.8.38:9090/grc-io/controller
  newTag: "75cfdc"
- name: gcr.io/knative-releases/knative.dev/serving/cmd/domain-mapping@sha256:23baa19322320f25a462568eded1276601ef67194883db9211e1ea24f21a0beb
  newName: 192.168.8.38:9090/grc-io/domain-mapping
  newTag: "23baa1"
- name: gcr.io/knative-releases/knative.dev/serving/cmd/domain-mapping-webhook@sha256:847bb97e38440c71cb4bcc3e430743e18b328ad1e168b6fca35b10353b9a2c22
  newName: 192.168.8.38:9090/grc-io/domain-mapping-webhook
  newTag: "847bb9"
- name: gcr.io/knative-releases/knative.dev/serving/cmd/queue@sha256:14415b204ea8d0567235143a6c3377f49cbd35f18dc84dfa4baa7695c2a9b53d
  newName: 192.168.8.38:9090/grc-io/queue
  newTag: "14415b"
- name: gcr.io/knative-releases/knative.dev/serving/cmd/webhook@sha256:9084ea8498eae3c6c4364a397d66516a25e48488f4a9871ef765fa554ba483f0
  newName: 192.168.8.38:9090/grc-io/webhook
  newTag: "9084ea"
- name: gcr.io/kubebuilder/kube-rbac-proxy:v0.8.0
  newName: 192.168.8.38:9090/grc-io/kube-rbac-proxy
  newTag: "v0.8.0"
- name: gcr.io/ml-pipeline/api-server:2.0.0-alpha.5
  newName: 192.168.8.38:9090/grc-io/api-server
  newTag: "2.0.0-alpha.5"
- name: gcr.io/ml-pipeline/cache-server:2.0.0-alpha.5
  newName: 192.168.8.38:9090/grc-io/cache-server
  newTag: "2.0.0-alpha.5"
- name: gcr.io/ml-pipeline/frontend:2.0.0-alpha.5
  newName: 192.168.8.38:9090/grc-io/frontend
  newTag: "2.0.0-alpha.5"
- name: gcr.io/ml-pipeline/metadata-writer:2.0.0-alpha.5
  newName: 192.168.8.38:9090/grc-io/metadata-writer
  newTag: "2.0.0-alpha.5"
- name: gcr.io/ml-pipeline/minio:RELEASE.2019-08-14T20-37-41Z-license-compliance
  newName: 192.168.8.38:9090/grc-io/minio
  newTag: "RELEASE.2019-08-14T20-37-41Z-license-compliance"
- name: gcr.io/ml-pipeline/mysql:5.7-debian
  newName: 192.168.8.38:9090/grc-io/mysql
  newTag: "5.7-debian"
- name: gcr.io/ml-pipeline/persistenceagent:2.0.0-alpha.5
  newName: 192.168.8.38:9090/grc-io/persistenceagent
  newTag: "2.0.0-alpha.5"
- name: gcr.io/ml-pipeline/scheduledworkflow:2.0.0-alpha.5
  newName: 192.168.8.38:9090/grc-io/scheduledworkflow
  newTag: "2.0.0-alpha.5"
- name: gcr.io/ml-pipeline/viewer-crd-controller:2.0.0-alpha.5
  newName: 192.168.8.38:9090/grc-io/viewer-crd-controller
  newTag: "2.0.0-alpha.5"
- name: gcr.io/ml-pipeline/visualization-server:2.0.0-alpha.5
  newName: 192.168.8.38:9090/grc-io/visualization-server
  newTag: "2.0.0-alpha.5"
- name: gcr.io/ml-pipeline/workflow-controller:v3.3.8-license-compliance
  newName: 192.168.8.38:9090/grc-io/workflow-controller
  newTag: "v3.3.8-license-compliance"
- name: gcr.io/tfx-oss-public/ml_metadata_store_server:1.5.0
  newName: 192.168.8.38:9090/grc-io/ml_metadata_store_server
  newTag: "1.5.0"
- name: gcr.io/ml-pipeline/metadata-envoy:2.0.0-alpha.5
  newName: 192.168.8.38:9090/grc-io/metadata-envoy
  newTag: "2.0.0-alpha.5"
Where it goes in the file:
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
# Cert-Manager
- ../common/cert-manager/cert-manager/base
- ../common/cert-manager/kubeflow-issuer/base
# Istio
- ../common/istio-1-14/istio-crds/base
- ../common/istio-1-14/istio-namespace/base
- ../common/istio-1-14/istio-install/base
# OIDC Authservice
- ../common/oidc-authservice/base
# Dex
- ../common/dex/overlays/istio
# KNative
- ../common/knative/knative-serving/overlays/gateways
- ../common/knative/knative-eventing/base
- ../common/istio-1-14/cluster-local-gateway/base
# Kubeflow namespace
- ../common/kubeflow-namespace/base
# Kubeflow Roles
- ../common/kubeflow-roles/base
# Kubeflow Istio Resources
- ../common/istio-1-14/kubeflow-istio-resources/base

# Kubeflow Pipelines
- ../apps/pipeline/upstream/env/cert-manager/platform-agnostic-multi-user
# Katib
- ../apps/katib/upstream/installs/katib-with-kubeflow
# Central Dashboard
- ../apps/centraldashboard/upstream/overlays/kserve
# Admission Webhook
- ../apps/admission-webhook/upstream/overlays/cert-manager
# Jupyter Web App
- ../apps/jupyter/jupyter-web-app/upstream/overlays/istio
# Notebook Controller
- ../apps/jupyter/notebook-controller/upstream/overlays/kubeflow
# Profiles + KFAM
- ../apps/profiles/upstream/overlays/kubeflow
# Volumes Web App
- ../apps/volumes-web-app/upstream/overlays/istio
# Tensorboards Controller
- ../apps/tensorboard/tensorboard-controller/upstream/overlays/kubeflow
# Tensorboard Web App
- ../apps/tensorboard/tensorboards-web-app/upstream/overlays/istio
# Training Operator
- ../apps/training-operator/upstream/overlays/kubeflow
# User namespace
- ../common/user-namespace/base

# KServe
- ../contrib/kserve/kserve
- ../contrib/kserve/models-web-app/overlays/kubeflow
images:
- name: gcr.io/arrikto/istio/pilot:1.14.1-1-g19df463bb
  newName: 192.168.8.38:9090/grc-io/pilot
  newTag: "1.14.1-1-g19df463bb"
- name: gcr.io/arrikto/kubeflow/oidc-authservice:28c59ef
  newName: 192.168.8.38:9090/grc-io/oidc-authservice
  newTag: "28c59ef"
- name: gcr.io/knative-releases/knative.dev/eventing/cmd/controller@sha256:dc0ac2d8f235edb04ec1290721f389d2bc719ab8b6222ee86f17af8d7d2a160f
  newName: 192.168.8.38:9090/grc-io/controller
  newTag: "dc0ac2"
........
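Writing the images: list by hand is error-prone. Assuming the same naming rule as sync_image.py (last path component as the new name; for digest references, the first 6 hex characters of the digest as the tag), a small generator can produce the entries from images.txt. kustomize_images_block is a hypothetical helper, and its output should still be reviewed before it goes into example/kustomization.yaml:

```python
def kustomize_images_block(names, registry="192.168.8.38:9090/grc-io/"):
    """Build the `images:` section of example/kustomization.yaml as text."""
    lines = ["images:"]
    for name in names:
        if "sha256" in name:
            # Digest reference: repo@sha256:<hex>
            repo, _, digest = name.partition("@")
            new_name = registry + repo.split("/")[-1]
            new_tag = digest.split(":")[-1][0:6]
        else:
            # Tagged reference: the tag follows ":" in the last path component
            last = name.split("/")[-1]
            if ":" not in last:
                raise ValueError(f"untagged image: {name}")
            image, new_tag = last.split(":", 1)
            new_name = registry + image
        lines += [f"- name: {name}", f"  newName: {new_name}", f'  newTag: "{new_tag}"']
    return "\n".join(lines) + "\n"
```

For the pilot and eventing controller images it produces exactly the first and third entries listed above; an untagged image raises ValueError instead of silently producing a broken entry.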
2.6 Create PVs (optional)
If your single-node cluster has no provisioner, you must create the PVs manually; if a provisioner and StorageClass are already installed, skip this step.
2.6.1 Manual creation
Four Kubeflow services need persistent volumes. Since this is a single-node install, local (hostPath) storage is enough; size the PVs as you see fit.
## Create the volume directories
mkdir -p /opt/kubeflow/{pv1,pv2,pv3,pv4}
# The PV manifest
cat pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-001
spec:
  capacity:
    storage: 80Gi
  accessModes:
  - ReadWriteOnce
  hostPath:
    path: "/opt/kubeflow/pv1"
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-002
spec:
  capacity:
    storage: 80Gi
  accessModes:
  - ReadWriteOnce
  hostPath:
    path: "/opt/kubeflow/pv2"
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-003
spec:
  capacity:
    storage: 80Gi
  accessModes:
  - ReadWriteOnce
  hostPath:
    path: "/opt/kubeflow/pv3"
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-004
spec:
  capacity:
    storage: 80Gi
  accessModes:
  - ReadWriteOnce
  hostPath:
    path: "/opt/kubeflow/pv4"
Create them:
kubectl apply -f pv.yaml
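The four PV definitions differ only in name and path, so they can be generated rather than copied. A minimal sketch that prints the same manifest as pv.yaml above, with count, size and base directory as parameters (pv_manifests is an illustrative name):

```python
def pv_manifests(count=4, size="80Gi", base="/opt/kubeflow"):
    """Render `count` hostPath PersistentVolumes as one multi-document YAML string."""
    docs = []
    for i in range(1, count + 1):
        docs.append(
            "apiVersion: v1\n"
            "kind: PersistentVolume\n"
            "metadata:\n"
            f"  name: pv-{i:03d}\n"
            "spec:\n"
            "  capacity:\n"
            f"    storage: {size}\n"
            "  accessModes:\n"
            "  - ReadWriteOnce\n"
            "  hostPath:\n"
            f'    path: "{base}/pv{i}"\n'
        )
    # Documents are separated by a "---" line
    return "---\n".join(docs)


if __name__ == "__main__":
    print(pv_manifests(), end="")
```

Usage: python gen_pv.py > pv.yaml, then kubectl apply -f pv.yaml as above (gen_pv.py is a hypothetical file name for the script). Remember to create the matching /opt/kubeflow/pvN directories if you change the count.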
2.6.2 Dynamic Local PV provisioning with OpenEBS
OpenEBS (https://openebs.io) is open-source, container-based storage software that emulates block storage such as AWS EBS or Alibaba Cloud disks. See the official site for details; here is a short installation walkthrough.
For a default installation, simply run:
kubectl apply -f https://openebs.github.io/charts/openebs-operator.yaml
To customize the installation, download the yaml file, edit it, and then deploy it. For example, the default PV storage path is /var/openebs/local; the system disk is usually small, so if you have a separate data disk, change the OPENEBS_IO_LOCALPV_HOSTPATH_DIR parameter.
Also change the value of "- name: BasePath" under openebs-hostpath to the same path as OPENEBS_IO_LOCALPV_HOSTPATH_DIR.
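The two edits above can be scripted. This sketch assumes that, in the downloaded openebs-operator.yaml, the default path /var/openebs/local appears only in the OPENEBS_IO_LOCALPV_HOSTPATH_DIR value and the openebs-hostpath BasePath value; check that against your copy before relying on it. relocate_hostpath is a hypothetical helper:

```python
DEFAULT_DIR = "/var/openebs/local"


def relocate_hostpath(manifest: str, new_dir: str) -> str:
    """Point OPENEBS_IO_LOCALPV_HOSTPATH_DIR and BasePath at new_dir (textual replace)."""
    count = manifest.count(DEFAULT_DIR)
    if count < 2:
        # Expect at least the env var value and the StorageClass BasePath value
        raise ValueError(f"expected >= 2 occurrences of {DEFAULT_DIR}, found {count}")
    return manifest.replace(DEFAULT_DIR, new_dir.rstrip("/"))
```

Usage: read the downloaded file, write relocate_hostpath(text, "/data/openebs/local") to a new file (the data-disk path is only an example), and kubectl apply that file.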
Check that the pods have started:
kubectl get pods -n openebs
By default, OpenEBS also installs some built-in StorageClass objects:
kubectl get sc
Make openebs-hostpath the default StorageClass:
kubectl patch storageclass openebs-hostpath -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
The benefit of dynamic provisioning is that, once Kubeflow is deployed, many tasks need PVs (creating a notebook, for example). With manual PVs, every new object that needs persistent storage means creating a PV by hand, which is tedious; dynamic provisioning removes that chore.
2.7 Once all images are synced, run the deployment
Still in the manifests-1.6.1 directory:
while ! kustomize build example | kubectl apply -f -; do echo "Retrying to apply resources"; sleep 10; done
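The loop retries because the first passes legitimately fail: custom resources can only be created once their CRDs (and webhooks such as cert-manager's) are ready. The shell version retries forever; below is a Python sketch of the same loop with an upper bound. kustomize_apply and apply_with_retries are illustrative names, and the 30-attempt limit is an arbitrary assumption:

```python
import subprocess
import time


def kustomize_apply(path="example"):
    """One attempt of `kustomize build <path> | kubectl apply -f -`; True on success."""
    build = subprocess.run(["kustomize", "build", path], capture_output=True, text=True)
    if build.returncode != 0:
        return False
    apply = subprocess.run(["kubectl", "apply", "-f", "-"], input=build.stdout, text=True)
    return apply.returncode == 0


def apply_with_retries(attempt=kustomize_apply, max_attempts=30, delay=10.0, sleep=time.sleep):
    """Retry until an attempt succeeds; return the number of attempts used."""
    for n in range(1, max_attempts + 1):
        if attempt():
            return n
        if n < max_attempts:
            print("Retrying to apply resources")
            sleep(delay)
    raise RuntimeError(f"still failing after {max_attempts} attempts")
```

Run apply_with_retries() from the manifests-1.6.1 directory. If it gives up, the last kubectl error usually points at the resource that cannot be created (often an image that has not finished syncing).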
Check the pod status:
kubectl get pods --all-namespaces
[root@ai-node manifests-1.6.1]# kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
auth dex-559dbcd758-wmf57 1/1 Running 2 (21h ago) 46h
cert-manager cert-manager-7b8c77d4bd-8jjmd 1/1 Running 2 (21h ago) 46h
cert-manager cert-manager-cainjector-7c744f57b5-vmgws 1/1 Running 2 (21h ago) 46h
cert-manager cert-manager-webhook-fcd445bc4-rspjk 1/1 Running 2 (21h ago) 46h
istio-system authservice-0 1/1 Running 0 5h7m
istio-system cluster-local-gateway-55ff4696f4-ddjzl 1/1 Running 0 5h24m
istio-system istio-ingressgateway-6668f9548d-zh6tp 1/1 Running 0 5h24m
istio-system istiod-64bd848cc4-8ktxh 1/1 Running 0 5h24m
knative-eventing eventing-controller-55c757fbf4-z5z8b 1/1 Running 1 (21h ago) 22h
knative-eventing eventing-webhook-78dccf77d-7xf2c 1/1 Running 1 (21h ago) 22h
knative-serving activator-f6fbdbdd7-kgxjk 2/2 Running 0 21h
knative-serving autoscaler-5c546f654c-w8rmv 2/2 Running 0 21h
knative-serving controller-594bc5bbb9-dn7nx 2/2 Running 0 21h
knative-serving domain-mapping-849f785857-8h9hx 2/2 Running 0 21h
knative-serving domainmapping-webhook-5954cfd85b-hrdcl 2/2 Running 0 21h
knative-serving net-istio-controller-655fd85bc4-gqgrk 2/2 Running 0 21h
knative-serving net-istio-webhook-66c78c9cdb-6lwtl 2/2 Running 0 21h
knative-serving webhook-6895c68dfd-h58l4 2/2 Running 0 21h
kube-system calico-kube-controllers-796cc7f49d-ldvfp 1/1 Running 2 (21h ago) 46h
kube-system calico-node-fdqqp 1/1 Running 2 (21h ago) 46h
kube-system coredns-7f6cbbb7b8-8lwrc 1/1 Running 2 (21h ago) 46h
kube-system coredns-7f6cbbb7b8-cgqkc 1/1 Running 2 (21h ago) 46h
kube-system etcd-ai-node 1/1 Running 2 (21h ago) 46h
kube-system kube-apiserver-ai-node 1/1 Running 2 (21h ago) 46h
kube-system kube-controller-manager-ai-node 1/1 Running 2 (21h ago) 46h
kube-system kube-proxy-4xzxn 1/1 Running 2 (21h ago) 46h
kube-system kube-scheduler-ai-node 1/1 Running 2 (21h ago) 46h
kubeflow-user-example-com ml-pipeline-ui-artifact-69cc696464-mh4pb 2/2 Running 0 3h8m
kubeflow-user-example-com ml-pipeline-visualizationserver-64d797bd94-xn74q 2/2 Running 0 3h9m
kubeflow admission-webhook-deployment-79d6f8c8fb-qkssk 1/1 Running 0 5h24m
kubeflow cache-server-746dd68dd9-qff72 2/2 Running 0 5h23m
kubeflow centraldashboard-f64b457f-6npr2 2/2 Running 0 5h23m
kubeflow jupyter-web-app-deployment-576c56f555-th492 1/1 Running 0 5h23m
kubeflow katib-controller-75b988dccc-d5xsz 1/1 Running 0 5h23m
kubeflow katib-db-manager-5d46869758-9khlf 1/1 Running 0 5h23m
kubeflow katib-mysql-5bf95ddfcc-p55d6 1/1 Running 0 5h23m
kubeflow katib-ui-766d5dc8ff-vngv2 1/1 Running 0 5h24m
kubeflow kserve-controller-manager-0 2/2 Running 0 5h23m
kubeflow kserve-models-web-app-5878544ffd-s8d84 2/2 Running 0 5h23m
kubeflow kubeflow-pipelines-profile-controller-5d98fd7b4f-765x7 1/1 Running 0 5h23m
kubeflow metacontroller-0 1/1 Running 0 5h23m
kubeflow metadata-envoy-deployment-5b96bc6fd6-rfhwj 1/1 Running 0 3h25m
kubeflow metadata-grpc-deployment-59d555db46-gbgn9 2/2 Running 5 (5h22m ago) 5h23m
kubeflow metadata-writer-76bbdb799f-kftpv 2/2 Running 1 (5h22m ago) 5h23m
kubeflow minio-86db59fd6-jv5xv 2/2 Running 0 5h23m
kubeflow ml-pipeline-8587f95f8f-2xl4w 2/2 Running 2 (5h20m ago) 5h24m
kubeflow ml-pipeline-persistenceagent-568f4bddb5-6bqxk 2/2 Running 0 5h24m
kubeflow ml-pipeline-scheduledworkflow-5c74f8dff4-wwpr5 2/2 Running 0 5h24m
kubeflow ml-pipeline-ui-5c684875c-5dm98 2/2 Running 0 5h23m
kubeflow ml-pipeline-viewer-crd-748d77b759-wz5r2 2/2 Running 1 (5h21m ago) 5h23m
kubeflow ml-pipeline-visualizationserver-5b697bd55d-97xwt 2/2 Running 0 3h25m
kubeflow mysql-77578849cc-dtx42 2/2 Running 0 5h23m
kubeflow notebook-controller-deployment-68756676d9-zdbr4 2/2 Running 1 (5h21m ago) 5h23m
kubeflow profiles-deployment-79d49b8648-xd7pg 3/3 Running 1 (5h21m ago) 5h23m
kubeflow tensorboard-controller-deployment-6f879dd7f6-s7mk7 3/3 Running 1 (5h23m ago) 5h23m
kubeflow tensorboards-web-app-deployment-6849d8c9bc-bs52h 1/1 Running 0 5h23m
kubeflow training-operator-6c9f6fd894-qsxcg 1/1 Running 0 5h23m
kubeflow volumes-web-app-deployment-5f56dd78-22jvh 1/1 Running 0 5h23m
kubeflow workflow-controller-686dd58c95-5cmxg 2/2 Running 1 (5h23m ago) 5h23m
Once all containers are up, find which NodePort port 80 is mapped to:
kubectl -n istio-system get svc istio-ingressgateway
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
istio-ingressgateway NodePort 10.1.109.168 <none> 15021:30965/TCP,80:31714/TCP,443:30626/TCP,31400:32535/TCP,15443:32221/TCP 46h
The output shows that port 80 maps to 31714, so the Kubeflow UI is at http://<server IP>:31714/
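When the port list is long, the NodePort for port 80 can be picked out programmatically. A minimal parser for the PORT(S) column shown above (node_port_for is a hypothetical helper):

```python
def node_port_for(ports_column, port=80):
    """Find the NodePort mapped to `port` in a PORT(S) value like '80:31714/TCP,...'."""
    for entry in ports_column.split(","):
        mapping, _, _proto = entry.partition("/")
        svc_port, sep, node_port = mapping.partition(":")
        # Entries without ":" have no NodePort (e.g. a ClusterIP service)
        if sep and int(svc_port) == port:
            return int(node_port)
    return None


cols = "15021:30965/TCP,80:31714/TCP,443:30626/TCP,31400:32535/TCP,15443:32221/TCP"
print(node_port_for(cols))
```

With the output above this prints 31714. kubectl can also do it directly with a JSONPath filter: kubectl -n istio-system get svc istio-ingressgateway -o jsonpath='{.spec.ports[?(@.port==80)].nodePort}'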
The default login is usually user@example.com with password 12341234; if that doesn't work, check the dex ConfigMap:
kubectl -n auth get configmaps dex -o yaml
apiVersion: v1
data:
  config.yaml: |
    issuer: http://dex.auth.svc.cluster.local:5556/dex
    storage:
      type: kubernetes
      config:
        inCluster: true
    web:
      http: 0.0.0.0:5556
    logger:
      level: "debug"
      format: text
    oauth2:
      skipApprovalScreen: true
    enablePasswordDB: true
    staticPasswords:
    - email: user@example.com
      hash: $2y$12$4K/VkmDd1q1Orb3xAt82zu8gk7Ad6ReFR4LCP9UeYE90NLiN9Df72
      # https://github.com/dexidp/dex/pull/1601/commits
      # FIXME: Use hashFromEnv instead
      username: user
      userID: "15841185641784"
    staticClients:
    # https://github.com/dexidp/dex/pull/1664
    - idEnv: OIDC_CLIENT_ID
      redirectURIs: ["/login/oidc"]
      name: 'Dex Login Application'
      secretEnv: OIDC_CLIENT_SECRET
kind: ConfigMap
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","data":{"config.yaml":"issuer: http://dex.auth.svc.cluster.local:5556/dex\nstorage:\n type: kubernetes\n config:\n inCluster: true\nweb:\n http: 0.0.0.0:5556\nlogger:\n level: \"debug\"\n format: text\noauth2:\n skipApprovalScreen: true\nenablePasswordDB: true\nstaticPasswords:\n- email: user@example.com\n hash: $2y$12$4K/VkmDd1q1Orb3xAt82zu8gk7Ad6ReFR4LCP9UeYE90NLiN9Df72\n # https://github.com/dexidp/dex/pull/1601/commits\n # FIXME: Use hashFromEnv instead\n username: user\n userID: \"15841185641784\"\nstaticClients:\n# https://github.com/dexidp/dex/pull/1664\n- idEnv: OIDC_CLIENT_ID\n redirectURIs: [\"/login/oidc\"]\n name: 'Dex Login Application'\n secretEnv: OIDC_CLIENT_SECRET\n"},"kind":"ConfigMap","metadata":{"annotations":{},"name":"dex","namespace":"auth"}}
  creationTimestamp: "2022-10-25T08:51:08Z"
  name: dex
  namespace: auth
  resourceVersion: "2956"
  uid: a0716cd6-f259-4a49-bee3-783c1069e8d2
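staticPasswords.email is the username
staticPasswords.hash is the password, stored as a bcrypt hash
Generating a hash (Python, requires the passlib package with a bcrypt backend such as the bcrypt package):
Run this in a Python console; it prompts for the password and prints its hash: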
from passlib.hash import bcrypt;import getpass;print(bcrypt.using(rounds=12, ident="2y").hash(getpass.getpass()))
Troubleshooting
1. authservice-0 pod fails to start
authservice-0 in istio-system fails to start; its log shows that it has no permission to access /var/lib/authservice/data.db:
Error opening bolt store: open /var/lib/authservice/data.db: permission denied
Solution:
Edit common/oidc-authservice/base/statefulset.yaml and add the following init container, which makes the data directory writable before authservice starts:
initContainers:
- name: fix-permission
  image: busybox
  command: ['sh', '-c']
  args: ['chmod -R 777 /var/lib/authservice;']
  volumeMounts:
  - mountPath: /var/lib/authservice
    name: data
The modified file (vim common/oidc-authservice/base/statefulset.yaml) looks like this:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: authservice
spec:
  replicas: 1
  selector:
    matchLabels:
      app: authservice
  serviceName: authservice
  template:
    metadata:
      annotations:
        sidecar.istio.io/inject: "false"
      labels:
        app: authservice
    spec:
      initContainers:
      - name: fix-permission
        image: busybox
        command: ['sh', '-c']
        args: ['chmod -R 777 /var/lib/authservice;']
        volumeMounts:
        - mountPath: /var/lib/authservice
          name: data
      containers:
      - name: authservice
        image: gcr.io/arrikto/kubeflow/oidc-authservice:6ac9400
        imagePullPolicy: Always
        ports:
        - name: http-api
          containerPort: 8080
        envFrom:
        - secretRef:
            name: oidc-authservice-client
        - configMapRef:
            name: oidc-authservice-parameters
        volumeMounts:
        - name: data
          mountPath: /var/lib/authservice
        readinessProbe:
          httpGet:
            path: /
            port: 8081
      securityContext:
        fsGroup: 111
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: authservice-pvc
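2. Image pulls fail in kubeflow-user-example-com
Two images fail to pull in the kubeflow-user-example-com namespace:
ml-pipeline-ui-artifact, ml-pipeline-visualizationserver
On inspection, they are still being pulled from gcr.io. I haven't yet tracked down which config file sets them, but they are the same images used in the kubeflow namespace, so it is enough to change the image addresses of these two Deployments. I'll look into which deployment file to change when I have time.
Cluster deployment
Deploy K8S as a multi-node cluster
The Kubeflow deployment steps stay the same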