蕪湖做網(wǎng)站的鄧健照片企業(yè)培訓(xùn)
論文:ICWS 2021 移動(dòng)邊緣計(jì)算中的用戶分配:一種深度強(qiáng)化學(xué)習(xí)方法
代碼地址:使用強(qiáng)化學(xué)習(xí)在移動(dòng)邊緣計(jì)算環(huán)境中進(jìn)行用戶分配
目錄
Ⅰ.Introduction
II. MOTIVATION-A.驗(yàn)證假設(shè)的觀察結(jié)果?
II. MOTIVATION-A Motivating Example
數(shù)據(jù)驅(qū)動(dòng)方法的基本思想
III.強(qiáng)化學(xué)習(xí)分配
RL框架
?RL Allocation Algorithm代碼
IV.確定性方法作為BASELINE
?ILP Algorithm整數(shù)線性規(guī)劃(ILP)算法代碼
Nearest Neighbourhood Greedy Algorithm代碼?
整體框架代碼:
框架中用到的函數(shù):
對(duì)比實(shí)驗(yàn):
V.實(shí)驗(yàn)與結(jié)果分析
Ⅰ.Introduction
隨著對(duì)低延遲需求的增加,邊緣計(jì)算或霧計(jì)算逐漸成為主流。當(dāng)前最先進(jìn)的技術(shù)假設(shè)邊緣服務(wù)器上的總資源利用率等于從邊緣服務(wù)器提供的所有服務(wù)的資源利用率的總和。然而,邊緣服務(wù)器的資源利用率與服務(wù)請(qǐng)求數(shù)量之間通常存在高度非線性的關(guān)系,尤其CPU-GPU協(xié)同執(zhí)行,使資源利用率的數(shù)學(xué)建模異常復(fù)雜。
Motivation:目前用于解決邊緣用戶分配(EUA)問題的算法普遍假設(shè)服務(wù)的資源利用率與邊緣服務(wù)器上提供的服務(wù)數(shù)呈線性關(guān)系,假設(shè)服務(wù)器的總資源利用率是每個(gè)服務(wù)請(qǐng)求的資源占用量的累積總和。然而,實(shí)際服務(wù)過程中資源使用量是高度動(dòng)態(tài)的,難以通過數(shù)學(xué)建模精確描述。
Method:提出一種設(shè)備端深度強(qiáng)化學(xué)習(xí)(DRL)框架來解決邊緣用戶分配(EUA)問題,基于與 MEC 系統(tǒng)的經(jīng)驗(yàn)和交互逐步學(xué)習(xí)適當(dāng)?shù)馁Y源分配。DRL Agent在服務(wù)延遲閾值約束下學(xué)習(xí)在某邊緣服務(wù)器上服務(wù)的用戶數(shù)量。DRL Agent通過同時(shí)觀察系統(tǒng)參數(shù),直接從邊緣服務(wù)器中學(xué)習(xí)非線性依賴關(guān)系。
Conclusion:實(shí)驗(yàn)評(píng)估表明,DRL框架在用戶分配數(shù)量方面優(yōu)于傳統(tǒng)的確定性方法。
II. MOTIVATION-A.驗(yàn)證假設(shè)的觀察結(jié)果?
分別使用 GPU 和不使用 GPU 觀察目標(biāo)檢測(cè)應(yīng)用程序 YOLO處理圖像的服務(wù)執(zhí)行時(shí)間:
非線性關(guān)系:YOLO的執(zhí)行時(shí)間與CPU和GPU參數(shù)之間的關(guān)系是非線性的。表明執(zhí)行時(shí)間不僅僅取決于單一的參數(shù)變化,還受到許多隱含因素的影響,例如CPU/GPU的可用性。
復(fù)雜性:由于存在多個(gè)隱藏參數(shù),精確建模YOLO執(zhí)行時(shí)間與系統(tǒng)資源之間的關(guān)系是相對(duì)困難的。
建模服務(wù)執(zhí)行時(shí)間困難:
?可用處理器資源和執(zhí)行時(shí)間之間的非線性關(guān)系:可用處理器資源和執(zhí)行時(shí)間之間存在非線性關(guān)系,內(nèi)核數(shù)量少時(shí)減少內(nèi)核數(shù)量對(duì)執(zhí)行時(shí)間的影響更顯著,在高工作負(fù)載情況下增加后臺(tái)工作負(fù)載會(huì)顯著減慢執(zhí)行速度。
? 跨時(shí)間的變化:相同配置的同一臺(tái)機(jī)器上執(zhí)行時(shí)間也存在顯著差異,受服務(wù)調(diào)用模式和溫度等多個(gè)隱藏參數(shù)影響。
? 跨服務(wù)的變化:不同服務(wù)的執(zhí)行時(shí)間模式各異,如 Yolo 執(zhí)行時(shí)間隨用戶增加近似線性增長(zhǎng),而 MobileNet 表現(xiàn)為非線性,使建模任務(wù)復(fù)雜化。
II. MOTIVATION-A Motivating Example
七個(gè)移動(dòng)用戶 U1、U2 .. U7和兩個(gè)邊緣服務(wù)器 e1 和 e2,每個(gè)用戶請(qǐng)求邊緣服務(wù)器上可用的兩個(gè)服務(wù) s1 和 s2 之一,用戶U1、U2、U4和U6請(qǐng)求服務(wù)s1,其余用戶請(qǐng)求服務(wù)s2,每個(gè)邊緣服務(wù)器由一個(gè)資源向量4 元組(Available RAM、Core 的數(shù)量、CPU 后臺(tái)工作負(fù)載%、GPU Utilized%)。
解釋傳統(tǒng)的確定性方法如何導(dǎo)致用戶到服務(wù)器的分配效率低下:
如對(duì)于邊緣服務(wù)器 e1,服務(wù)s1的單個(gè)用戶請(qǐng)求的執(zhí)行時(shí)間為 3.12s,四個(gè)用戶的預(yù)期執(zhí)行時(shí)間用線性插值得到12.48 s。然而實(shí)際 3.468 s。
假設(shè)延遲閾值6.58 秒,用僅考慮服務(wù)單個(gè)請(qǐng)求的執(zhí)行時(shí)間的確定性方法將U1、U2和U3分配給e1,只會(huì)給s1分配2個(gè)用戶(每個(gè)用戶3.12s),給s2分配1個(gè)用戶(6.32s)
使用確定性方法分配的用戶總數(shù)為 3
數(shù)據(jù)驅(qū)動(dòng)方法的基本思想
數(shù)據(jù)驅(qū)動(dòng)方法基于實(shí)際的執(zhí)行時(shí)間數(shù)據(jù)和更精確的資源利用模型進(jìn)行用戶到服務(wù)器的分配,克服確定性方法的缺點(diǎn),提供更有效的資源分配。
運(yùn)行 YOLO 的四個(gè)用戶的執(zhí)行時(shí)間低于 6.55 的延遲閾值。實(shí)際上可以在 e1 上分配用戶 U1、U2、U4(只需 3.35)。e2 可以容納 U5 和 U7(只需 6.12)。
使用數(shù)據(jù)驅(qū)動(dòng)的方法分配總共 5 個(gè)用戶(比確定性方法多 2 個(gè)用戶),更準(zhǔn)確地建模資源利用率。
III.強(qiáng)化學(xué)習(xí)分配
MEC環(huán)境中每個(gè)邊緣服務(wù)器
的覆蓋率半徑為
。邊緣服務(wù)器覆蓋半徑內(nèi)的移動(dòng)users
可以請(qǐng)求托管在該服務(wù)器上的服務(wù)
。每個(gè)邊緣服務(wù)器上可用資源(RAM、Cores、CPU 背景工作負(fù)載%、GPU Utilized%)
分配策略根據(jù)動(dòng)態(tài)因素確定用戶與特定邊緣服務(wù)器的綁定:
a)新用戶加入邊緣服務(wù)器的覆蓋區(qū)域? (b)用戶遠(yuǎn)離邊緣服務(wù)器的覆蓋區(qū)域?? (c)用戶服務(wù)請(qǐng)求更改? (d)邊緣服務(wù)器或移動(dòng)用戶離線
分配策略目標(biāo)是在遵守服務(wù)執(zhí)行的延遲閾值𝛤Γ的同時(shí)盡可能多滿足服務(wù)請(qǐng)求。傳統(tǒng)的確定性方法依賴歷史數(shù)據(jù)預(yù)測(cè)執(zhí)行時(shí)間,但由于執(zhí)行時(shí)間的動(dòng)態(tài)特性,可能導(dǎo)致資源分配過度或不足。提出的RL學(xué)習(xí)框架通過從邊緣服務(wù)器的實(shí)際經(jīng)驗(yàn)中學(xué)習(xí)服務(wù)執(zhí)行模式,實(shí)時(shí)優(yōu)化用戶-服務(wù)器綁定決策,更有效地應(yīng)對(duì)動(dòng)態(tài)環(huán)境。
RL框架
RL 框架中的Agent通過探索環(huán)境并從動(dòng)作接收反饋來學(xué)習(xí)環(huán)境以選擇更好的動(dòng)作選擇,RL可以在不需要大量標(biāo)記數(shù)據(jù)的情況下學(xué)習(xí)底層環(huán)境。
在這個(gè) RL 框架中,Agent不斷地與邊緣服務(wù)器交互以采取行動(dòng),執(zhí)行多個(gè)服務(wù)請(qǐng)求并根據(jù)執(zhí)行占用空間獲得相應(yīng)的獎(jiǎng)勵(lì)。
狀態(tài):邊緣服務(wù)器的資源向量以及服務(wù)請(qǐng)求的數(shù)量
動(dòng)作:等待在邊緣服務(wù)器上執(zhí)行的服務(wù)請(qǐng)求集。
Reward的計(jì)算:?
#compute latencydef get_reward(self, state, action):#將動(dòng)作 action 轉(zhuǎn)換為兩個(gè)用戶數(shù)量 u1 和 u2u1 = action//5 + 1u2 = (action+1) - (u1-1)*5#sample time from dataframegram = state[0]gcores = state[1]gwl_c = state[2]gwl_g = state[3]gs1 = u1*100gs2 = u2*100#查找與當(dāng)前狀態(tài)和動(dòng)作匹配的記錄fetch_state = self.df.loc[ (self.df['ram'] == gram) & (self.df['cores']== gcores) & (self.df['workload_cpu']==gwl_c) & (self.df['workload_gpu']==gwl_g) & (self.df['users_yolo']==gs1) & (self.df['users_mnet']==gs2)]#計(jì)算獎(jiǎng)勵(lì): if fetch_state.empty:#找不到匹配的狀態(tài)信息,則返回較大的負(fù)獎(jiǎng)勵(lì),表示這是一個(gè)不利的動(dòng)作選擇return -20 # 計(jì)算網(wǎng)絡(luò)延遲:time1 = fetch_state.sample().iloc[0]['time_yolo'] #從匹配的狀態(tài)信息中隨機(jī)選擇一個(gè)樣本time2 = fetch_state.sample().iloc[0]['time_mnet']#獲取 time_yolo 和 time_mnet 的延遲時(shí)間tm = max(time1, time2)#兩者中較大的延遲時(shí)間用作網(wǎng)絡(luò)延遲的閾值#add total latencies due to network based on number of u1 and u2if (tm <= latency_threshold): #用戶數(shù)量的變化對(duì)應(yīng)的獎(jiǎng)勵(lì),以及動(dòng)作本身的基礎(chǔ)獎(jiǎng)勵(lì)(u1 + u2)return 0.01*(gs1 - state[4]) + 0.01*(gs2 - state[5]) + u1 + u2 else:return -5 - u1 - u2
?訓(xùn)練RL Agent
RL環(huán)境:
import numpy as np
import gym
from gym import spaces
from gym.utils import seedingclass yolosystem(gym.Env):metadata = {'render.modes': ['human']}def __init__(self, n_actions, filename):super(yolosystem, self).__init__()self.n_actions = n_actions #total number of action space after ranging [10, 20, 30 ...]self.action_space = spaces.Discrete(self.n_actions) #total number of users in the action space; starts with zeroself.observation_space = spaces.Box(low=np.array([0,0,0,0,0,0]), high=np.array([11000]*6), shape=(6, ), dtype=np.int32) #<RAM, Core, Workload>self.seed()self.current_obs = np.array( [3000, 2, 40, 2, 100, 100] ) #current observation = <ram, cores, workload%>#Load datasetself.df = pd.read_csv(filename)# computer percentage of GPU usage from actual useself.df['workload_gpu'] = self.df['workload_gpu'].multiply(1/80).round(0).astype(int) #round gpu workload#get unique data in setself.ram = self.df.ram.unique()self.cores = self.df.cores.unique()self.workload_cpu = self.df.workload_cpu.unique()print(self.df) #print datasetdef seed(self, seed=1010):self.np_random, seed = seeding.np_random(seed)return [seed]def step(self, action):assert self.action_space.contains(action) #action should be in action spacestate = self.current_obsdone = True #Episodes ends after each action#compute latecy from the number of usersreward = self.get_reward(state, action) #linear latency
# print(action, reward)self.current_obs = self.get_random_state() #go to a random state# print(self.current_obs)return self.current_obs, reward, done, {} #no-states, reward, episode-done, no-infodef reset(self):self.current_obs = self.get_random_state()return self.current_obs #current state of the system with no loaddef render(self, mode='human', close=False):print(f"Current State:<{self.current_obs}>")#compute latencydef get_reward(self, state, action):#將動(dòng)作 action 轉(zhuǎn)換為兩個(gè)用戶數(shù)量 u1 和 u2u1 = action//5 + 1u2 = (action+1) - (u1-1)*5#sample time from dataframegram = state[0]gcores = state[1]gwl_c = state[2]gwl_g = state[3]gs1 = u1*100gs2 = u2*100#查找與當(dāng)前狀態(tài)和動(dòng)作匹配的記錄fetch_state = self.df.loc[ (self.df['ram'] == gram) & (self.df['cores']== gcores) & (self.df['workload_cpu']==gwl_c) & (self.df['workload_gpu']==gwl_g) & (self.df['users_yolo']==gs1) & (self.df['users_mnet']==gs2)]#計(jì)算獎(jiǎng)勵(lì): if fetch_state.empty:#找不到匹配的狀態(tài)信息,則返回較大的負(fù)獎(jiǎng)勵(lì),表示這是一個(gè)不利的動(dòng)作選擇return -20 # 計(jì)算網(wǎng)絡(luò)延遲:time1 = fetch_state.sample().iloc[0]['time_yolo'] #從匹配的狀態(tài)信息中隨機(jī)選擇一個(gè)樣本time2 = fetch_state.sample().iloc[0]['time_mnet']#獲取 time_yolo 和 time_mnet 的延遲時(shí)間tm = max(time1, time2)#兩者中較大的延遲時(shí)間用作網(wǎng)絡(luò)延遲的閾值#add total latencies due to network based on number of u1 and u2if (tm <= latency_threshold): #用戶數(shù)量的變化對(duì)應(yīng)的獎(jiǎng)勵(lì),以及動(dòng)作本身的基礎(chǔ)獎(jiǎng)勵(lì)(u1 + u2)return 0.01*(gs1 - state[4]) + 0.01*(gs2 - state[5]) + u1 + u2 else:return -5 - u1 - u2 #get to some random state after taking an actiondef get_random_state(self):#generate state randomlygram = np.random.choice(self.ram, 1)[0]gcores = np.random.choice(self.cores, 1)[0]gwl_c = np.random.choice(self.workload_cpu, 1)[0]#fetch gamma for the statefetch_state = self.df.loc[ (self.df['ram'] == gram) & (self.df['cores']== gcores) & (self.df['workload_cpu']==gwl_c) ]gwl_g = fetch_state.sample().iloc[0]['workload_gpu'] #fetch workload randmolygs1 = random.randrange(50, 550, 50)gs2 = random.randrange(50, 550, 50)return np.array( [gram, gcores, gwl_c, gwl_g, gs1, gs2] )
from stable_baselines3.common.monitor import Monitor
import os
# Create log dir
log_dir = './agent_tensorboard/'
os.makedirs(log_dir, exist_ok=True)env = Monitor(env, log_dir)from stable_baselines3 import DQN
from stable_baselines3.dqn import MlpPolicy
from stable_baselines3.common.vec_env import DummyVecEnv# wrap it非向量化的環(huán)境 env 轉(zhuǎn)換為一個(gè)向量化的環(huán)境 env
env = DummyVecEnv([lambda: env])
?
訓(xùn)練?
model = DQN(MlpPolicy, env, verbose=0, tensorboard_log = log_dir, exploration_fraction=0.4, learning_starts=150000, train_freq=30, target_update_interval=30000, exploration_final_eps=0.07)begin = time.time()
model.learn(total_timesteps=500000) #reset_num_timesteps=False
end = time.time()
training_time = end-begin
?RL Allocation Algorithm代碼
#Load model
def rl_algo():#對(duì)于每臺(tái)服務(wù)器,使用RL預(yù)測(cè)每個(gè)服務(wù)器的容量server_capacity = np.zeros((N, S))for server_id in range(N):state = server_state[server_id]
# if model_type == 'lin':action = model_rl.predict(np.array(state), deterministic=True)
# if model_type == 'exp':
# action = model_exp.predict(np.array(state), deterministic=True)
# 根據(jù)action計(jì)算兩種服務(wù)的預(yù)測(cè)容量 (u1 和 u2)u1 = action[0]//5 + 1u2 = (action[0]+1) - (u1-1)*5server_capacity[server_id][0] = u1*100 #model outputserver_capacity[server_id][1] = u2*100 #model output#根據(jù)每個(gè)用戶的服務(wù)器數(shù)進(jìn)行排序col1 = np.array([np.sum(ngb,axis=1)])col2 = np.array([np.arange(U)])sorted_ngb = np.concatenate((ngb, col1.T, col2.T), axis=1) #add rowsum and index column添加行和索引列sorted_ngb = sorted_ngb[np.argsort(sorted_ngb[:, N])] #sort the rows based on rowsum column根據(jù)行和列對(duì)行進(jìn)行排序#分配用戶到服務(wù)器#run allocation algorithmrl_allocation = []
# 遍歷用戶,根據(jù)用戶連接的服務(wù)器列表和服務(wù)請(qǐng)求,選擇最大預(yù)測(cè)容量的服務(wù)器分配。服務(wù)器有足夠容量則更新服務(wù)器容量并記錄分配結(jié)果for i in range(U):server_list = np.where(sorted_ngb[i, :N] == 1)[0] #獲取用戶連接到的服務(wù)器列表if len(server_list) == 0: #跳過沒有服務(wù)器的用戶continueser = int(service[i]) #用戶正在請(qǐng)求哪個(gè)服務(wù)choosen_server = server_list[np.argmax(server_capacity[server_list, ser])] #找到所選服務(wù)器的 IDif server_capacity[choosen_server][ser] > 0: #將用戶分配給choosen_serverserver_capacity[choosen_server][ser] -= 1 #減少服務(wù)器容量rl_allocation.append( (int(sorted_ngb[i, N+1]), choosen_server) ) #(user, server) alloc pairprint('RL Num of allocation: {}'.format(len(rl_allocation)))return rl_allocation
IV.確定性方法作為BASELINE
使用歷史服務(wù)執(zhí)行數(shù)據(jù)的平均值確定邊緣服務(wù)器上服務(wù)
的執(zhí)行時(shí)間
,進(jìn)而確定可以分配到邊緣服務(wù)器的用戶數(shù)量的相應(yīng)代碼:allocation.ipynb? def generate_server_state(num_server)
#獲取與選擇的 GPU 工作負(fù)載值匹配的行 計(jì)算YOLO 和 MNet 的平均時(shí)間time_yolo = fetch_time['time_yolo'].mean() #average of time for particular statetime_mnet = fetch_time['time_mnet'].mean()# 根據(jù)每個(gè)服務(wù)器的服務(wù)請(qǐng)求分配狀態(tài)gs1 = server_service[s_id][0]gs2 = server_service[s_id][1] server_state.append( [gram, gcores, gwl_c, gwl_g, gs1, gs2] )# 追加每個(gè)服務(wù)器的 gamma 值gamma.append((time_yolo, time_mnet)) #append the gamma value of each server
?ILP Algorithm整數(shù)線性規(guī)劃(ILP)算法代碼
解決用戶分配到服務(wù)器的問題,目標(biāo)是最大化覆蓋的用戶數(shù)量
def ilp_algo():## ===================================ILP with python mip# >> solver_name=GRB# >> Currently using CBCI = range(U) #user 用戶的范圍J = range(N) #server服務(wù)器的范圍alloc = Model(sense=MAXIMIZE, name="alloc", solver_name=CBC)alloc.verbose = 0def coverage(user_ix, server_ix):if ngb[user_ix][server_ix]==1:return 1else:return 0#U: num of users, N: num of servers# 創(chuàng)建二進(jìn)制變量矩陣 x,其中 x[i][j] 表示用戶 i 是否被分配到服務(wù)器 jx = [[ alloc.add_var(f"x{i}{j}", var_type=BINARY) for j in J] for i in I]#Objective Equation# 目標(biāo)函數(shù):最大化分配的用戶數(shù)量alloc.objective = xsum( x[i][j] for i in I for j in J )#1. 覆蓋約束for i in I:for j in J: if not coverage(i,j):alloc += x[i][j] == 0# 2. 每個(gè)用戶只能被分配到一個(gè)服務(wù)器for i in I:alloc += xsum( x[i][j] for j in J ) <=1# 3. 延遲約束for j in J:alloc += xsum( gamma[j][int(service[i])]*x[i][j] for i in I ) <=latency_threshold-network_latency[j] #alloc.write("test-model.lp")#===========Start Optimization=========alloc.optimize(max_seconds=25)# 優(yōu)化模型#==========ILP Ends here#print(f"Number of Solutions:{qoe.num_solutions}")ilp_allocation = [ (i,j) for i in I for j in J if x[i][j].x >= 0.99] # 獲取分配結(jié)果#print(f"Number of Solutions:{qoe.num_solutions}")#print(f"Objective Value:{qoe.objective_value}")allocated_num_users = len(ilp_allocation)print("ILP Allocated Num of Users: {}".format(allocated_num_users))# 輸出分配的用戶數(shù)量# selected.sort()return ilp_allocation
Nearest Neighbourhood Greedy Algorithm代碼?
def greedy_algo():server_capacity = np.zeros(N)# 初始化服務(wù)器容量數(shù)組rl_allocation = []for user_id in range(U):#獲取與用戶連接的服務(wù)器列表server_ngb_list = np.where(ngb[user_id, :N] == 1)[0] #get the list of server to which user is connectedif len(server_ngb_list) == 0: #ignore the users which are not under any serverscontinue # 計(jì)算每個(gè)用戶到各個(gè)服務(wù)器的距離并排序#find the distance to each users in the server_ngb_listdist_list = np.array([ server_ngb_list, [server.iloc[i]['geometry'].centroid.distance(user.iloc[user_id]['geometry']) for i in server_ngb_list] ])# sorted list of servers based on the distance from userssorted_distance_list = dist_list[ :, dist_list[1].argsort()]#get the list of servers arranged in least to max distanceserver_list = sorted_distance_list[0].astype(int)# 分配算法lat = 0for server_id in server_list:lat = gamma[server_id][int(service[user_id])]#根據(jù)用戶請(qǐng)求的服務(wù)類型和服務(wù)器,獲取相應(yīng)的服務(wù)延遲if server_capacity[server_id]+lat <= latency_threshold-network_latency[server_id]:server_capacity[server_id] += lat #increment the server_capacity of serverrl_allocation.append( (user_id, server_id) ) #(user, server) alloc pairbreakprint('Greedy-Ngb Num of allocation: {}'.format(len(rl_allocation)))return rl_allocation
整體框架代碼:
固定服務(wù)器數(shù)量更改用戶數(shù)量的情況:
對(duì)于不同用戶數(shù)量,先拿到用戶和服務(wù)器之間的鄰居矩陣ngb,并計(jì)算網(wǎng)絡(luò)延遲network_latency
之后生成服務(wù)器狀態(tài)server_state 計(jì)算每個(gè)服務(wù)器的gamma 值?generate_server_state(num_server)
之后分別使用三種算法計(jì)算成功分配的數(shù)量
if alloc_type == 'server': #服務(wù)器固定,變化用戶數(shù)量"for U in range(100, 600, 100):#用戶數(shù)量100-500for epoch in range(50):print("User:", U, 'Server:', N, 'Epoch:', epoch)ngb, user, server, service, server_service, network_latency = ngb_matrix(U, N, S) #從EUA數(shù)據(jù)生成服務(wù)器和用戶 # 確定鄰域矩陣server_state, gamma = generate_server_state(N) #為每個(gè)用戶分配狀態(tài)和γ值#=======ILP startsstart = 0stop = 0execution_time_ilp = 0start = timeit.default_timer()ilp_aloc = ilp_algo() #call ILP algorithmstop = timeit.default_timer()execution_time_ilp = stop - start#========ILP ends#=======Greedy startsstart = 0stop = 0execution_time_greedy = 0start = timeit.default_timer()greedy_aloc = greedy_algo() #call ILP algorithmstop = timeit.default_timer()execution_time_greedy = stop - start#========Greedy ends#=======RL_linear startsstart = 0stop = 0execution_time_rl = 0start = timeit.default_timer()rl_aloc = rl_algo() #call ILP algorithmstop = timeit.default_timer()execution_time_rl = stop - start#========RL_linear ends#========Store results to fileto_append = [U, N,len(ilp_aloc), execution_time_ilp,len(greedy_aloc), execution_time_greedy,len(rl_aloc), execution_time_rl,] dseries = pd.Series(to_append, index = result_user.columns)result_user = result_user.append(dseries, ignore_index=True)print("epoch:", epoch)result_user.to_csv(result_file, index=False)
框架中用到的函數(shù):
一、生成服務(wù)器狀態(tài)計(jì)算每個(gè)服務(wù)器的 gamma 值
def generate_server_state(num_server):#生成服務(wù)器狀態(tài)計(jì)算每個(gè)服務(wù)器的 gamma 值df = pd.read_csv(filename_base)# 將 GPU 工作負(fù)載的數(shù)值進(jìn)行縮放和四舍五入
# df['ram'] = df['ram'].div(1000).round(0).astype(int)
# df['workload_cpu'] = df['workload_cpu'].div(10).round(0).astype(int)df['workload_gpu'] = df['workload_gpu'].multiply(1/80).round(0).astype(int) #round gpu workload
# df['users_yolo'] = df['users_yolo'].div(100).round(0).astype(int)
# df['users_mnet'] = df['users_mnet'].div(100).round(0).astype(int)#get unique data in set獲取數(shù)據(jù)集中唯一的 RAM、核心數(shù)和 CPU 工作負(fù)載值ram = df.ram.unique()cores = df.cores.unique()workload_cpu = df.workload_cpu.unique()server_state = []#服務(wù)器狀態(tài)gamma = []for s_id in range(num_server):#對(duì)于每一個(gè)服務(wù)器,隨機(jī)選擇一個(gè) RAM、核心數(shù)和 CPU 工作負(fù)載值gram = np.random.choice(ram, 1)[0]gcores = np.random.choice(cores, 1)[0]gwl_c = np.random.choice(workload_cpu, 1)[0]#fetch gamma for the state獲取對(duì)應(yīng)狀態(tài)的行fetch_state = df.loc[ (df['ram'] == gram) & (df['cores']== gcores) & (df['workload_cpu']==gwl_c) ]# 從匹配的狀態(tài)中隨機(jī)選擇一個(gè) GPU 工作負(fù)載值gwl_g = fetch_state.sample().iloc[0]['workload_gpu'] #fetch workload randmolyfetch_time = fetch_state.loc[ (df['workload_gpu'] == gwl_g) ]#獲取與選擇的 GPU 工作負(fù)載值匹配的行 計(jì)算YOLO 和 MNet 的平均時(shí)間time_yolo = fetch_time['time_yolo'].mean() #average of time for particular statetime_mnet = fetch_time['time_mnet'].mean()# 根據(jù)每個(gè)服務(wù)器的服務(wù)請(qǐng)求分配狀態(tài)gs1 = server_service[s_id][0]gs2 = server_service[s_id][1] server_state.append( [gram, gcores, gwl_c, gwl_g, gs1, gs2] )# 追加每個(gè)服務(wù)器的 gamma 值gamma.append((time_yolo, time_mnet)) #append the gamma value of each serverreturn server_state, gamma
二、計(jì)算鄰接矩陣
#================neighbourhood Computing
def ngb_matrix(U, N, S):
# 生成用戶和服務(wù)器之間的鄰居矩陣,并計(jì)算網(wǎng)絡(luò)延遲 #U: number of users#N: number of servers#S: number of services# U X N matrixuser = load_users(U)server = load_servers(N)neighbourhood = np.zeros([U, N]) #用戶和服務(wù)器之間的鄰居矩陣network_latency = np.zeros(N) #每個(gè)服務(wù)器的網(wǎng)絡(luò)延遲latency_data = load_planetlab() #加載 PlanetLab 數(shù)據(jù),返回一個(gè)距離矩陣(bin size 150)# 檢查每個(gè)用戶是否在服務(wù)器的緩沖區(qū)內(nèi),并計(jì)算網(wǎng)絡(luò)延遲for u in range(0, U):for n in range(0, N):#檢查用戶是否在服務(wù)器的緩沖區(qū)內(nèi)(使用幾何空間的 contains 方法)if server.iloc[n].geometry.contains(user.iloc[u].geometry):neighbourhood[u,n]=1#鄰居矩陣中相應(yīng)位置設(shè)為 1# 計(jì)算距離并分配延遲distance = server.iloc[n].geometry.centroid.distance(user.iloc[u].geometry)rep_lat = fetch_network_lat(int(distance), latency_data) #根據(jù)距離從 latency_data 中獲取網(wǎng)絡(luò)延遲if network_latency[n] < rep_lat:#最大可能延遲network_latency[n] = rep_latelse:neighbourhood[u,n]=0service = np.zeros(U)#為用戶分配服務(wù)請(qǐng)求for u in range(0, U):#為每個(gè)用戶隨機(jī)分配一個(gè)從 0 到 S-1 的服務(wù)請(qǐng)求service[u] = random.randrange(0, S, 1) server_service = np.zeros((N, S))#統(tǒng)計(jì)每個(gè)服務(wù)器的服務(wù)請(qǐng)求數(shù)量for n in range(0, N):for u in range(0, U):if neighbourhood[u][n] == 1:server_service[n][int(service[u])] += 1return neighbourhood, user, server, service, server_service, network_latency
三、計(jì)算鄰接矩陣中用到的函數(shù)
#================Load Planet Lab data
# 加載 PlanetLab 數(shù)據(jù)并轉(zhuǎn)換為一個(gè)矩陣格式
def load_planetlab():#Convert to triangleldata = np.loadtxt('eua/PlanetLabData_1')[np.tril_indices(490)]ldata = ldata[ ldata != 0]#提取下三角矩陣的非零值ldata =np.unique(ldata)#去重并重置數(shù)據(jù)大小,使其符合150行的矩陣格式np.set_printoptions(suppress=True)length = ldata.shape[0]latency_row = 150latency_col = (length//latency_row) #Global Data usedldata = np.resize(ldata, latency_col*latency_row)latency = ldata.reshape(latency_row,-1)return latency#=================Fetch Network latency
# 根據(jù)距離從延遲數(shù)據(jù)中獲取網(wǎng)絡(luò)延遲
def fetch_network_lat(distance, latency_data):rep_lat = np.random.choice(latency_data[distance], size=1, replace=True)#根據(jù)距離從延遲數(shù)據(jù)中隨機(jī)選擇一個(gè)延遲值return rep_lat/1000 #將延遲值轉(zhuǎn)換為秒#===============User Data
# 加載用戶數(shù)據(jù)并轉(zhuǎn)換為地理數(shù)據(jù)格式
def load_users(num_of_users):user_raw = pd.read_csv("eua/users.csv")user_raw = user_raw.rename_axis("UID")#將數(shù)據(jù)框的索引軸重命名為 "UID",即用戶的唯一標(biāo)識(shí)符df = user_raw.sample(num_of_users)#隨機(jī)抽樣指定數(shù)量的用戶數(shù)據(jù)
# 創(chuàng)建地理數(shù)據(jù)框,使用Longitude和Latitude列創(chuàng)建點(diǎn)幾何對(duì)象,并轉(zhuǎn)換坐標(biāo)參考系統(tǒng)(CRS)gdf = geopandas.GeoDataFrame(df, geometry = geopandas.points_from_xy(df.Longitude, df.Latitude), crs = 'epsg:4326')#創(chuàng)建地理數(shù)據(jù)框user = gdf [['geometry']] #保留geometry列user = user.to_crs(epsg=28355) #指定數(shù)據(jù)的坐標(biāo)參考系統(tǒng)(WGS84投影)#Insert additional data to dataframe#user = user.apply(add_data, axis=1)return user#================Server Data
def load_servers(num_of_servers):# 加載服務(wù)器數(shù)據(jù),并將其轉(zhuǎn)換為地理數(shù)據(jù)格式server_raw = pd.read_csv("eua/servers.csv")server_raw = server_raw.rename_axis("SID")#將數(shù)據(jù)框的索引軸重命名為 "SID",即服務(wù)器的唯一標(biāo)識(shí)符df = server_raw.sample(num_of_servers) #隨機(jī)抽樣指定數(shù)量的服務(wù)器gdf = geopandas.GeoDataFrame(df, geometry = geopandas.points_from_xy(df.LONGITUDE, df.LATITUDE), crs = 'epsg:4326')#創(chuàng)建地理數(shù)據(jù)框server = gdf [['geometry']] #Keep Geometry columnserver = server.to_crs(epsg=28355) #Cover to crs in Australian EPSGdef add_radius(series):
# radius = random.randrange(150, 250, 10)
# 為每個(gè)服務(wù)器添加一個(gè)固定半徑的緩沖區(qū)radius = 150 #每個(gè)服務(wù)器的緩沖區(qū)半徑設(shè)為固定值 150series.geometry = series.geometry.buffer(radius)series['radius'] = radius
# series['resource'] = tcompreturn seriesserver = server.apply(add_radius, axis = 1)return server#繪制用戶和服務(wù)器數(shù)據(jù)在地圖上的分布情況
def plot_data(user, server):%config InlineBackend.figure_format='retina'%matplotlib inlinecbd = geopandas.read_file('eua/maps', crs = {'init': 'epsg=28355'} ) #read cbd-australia location datafig, ax = plt.subplots(1, 1, figsize=(15,10))ax.set_aspect('equal')ax.set_xlim(319400, 322100)ax.set_ylim(5811900, 5813700)user.plot(ax=ax, marker='o', color='red', markersize=20, zorder=3, label="users")server.plot(ax =ax, linestyle='dashed', edgecolor='green', linewidth=1, facecolor="none", zorder=1)server.centroid.plot(ax=ax, marker='s', color='blue', markersize=50, zorder=2, label="server")cbd.plot(ax=ax, color='grey', zorder=0, alpha = 0.3);ax.set_title("MEC Environment(EUA): CBD Melbourne(Australia)")ax.legend(bbox_to_anchor=(1, 0), loc='lower left')plt.show()
運(yùn)行結(jié)果:
對(duì)比實(shí)驗(yàn):
一、對(duì)于RL算法 使用不同訓(xùn)練回合數(shù):
rl_algo_prop()和rl_algo_und()只是用了兩個(gè)不同程度訓(xùn)練的agent模型,其余部分完全一致,這里只展示rl_algo_prop()。
分別是訓(xùn)練30,000回合的RL Agent生成的分配數(shù)量和訓(xùn)練1,50,000回合
動(dòng)作空間的量化大小= 2
model_und = DQN.load("trained_agents/edge_agent_under_train")
model_prop = DQN.load("trained_agents/edge_agent_proper_train")
#Load model
def rl_algo_prop():#...同rl_algo()#換個(gè)模型就OK action = model_prop.predict(np.array(state), deterministic=True)print('Actionprop: {}'.format(action))u1 = (action[0]//10)*2 + 1u2 = (action[0]%10)*2 + 1server_capacity[server_id][0] = u1 #model outputserver_capacity[server_id][1] = u2 #model output#...同rl_algo()
二、對(duì)于RL算法 使用不同量化因子:
其余兩種只在模型和和動(dòng)作空間映射改變
rl_algo_act() 代碼中說是=5:
action = model_act.predict(np.array(state), deterministic=True)print('Actionact: {}'.format(action))u1 = (action[0]//5)*4 + 1 #25 action spaceu2 = (action[0]%5)*4 + 1server_capacity[server_id][0] = u1 #model outputserver_capacity[server_id][1] = u2 #model output
?rl_algo_thres10()代碼中說是=100:
action = model_thres10.predict(np.array(state), deterministic=True)print('Actionthres10: {}'.format(action))u1 = action[0]//5 + 1u2 = (action[0]+1) - (u1-1)*5server_capacity[server_id][0] = u1*100 #model outputserver_capacity[server_id][1] = u2*100 #model output
對(duì)于訓(xùn)練不同回合數(shù)rl_algo_prop中動(dòng)作空間的量化大小= 2的個(gè)人理解,不一定對(duì):
每次agent預(yù)測(cè)出action之后,從中還原出兩個(gè)服務(wù)s1、s2上的服務(wù)請(qǐng)求數(shù)(動(dòng)作)使用的方法不同,= 2時(shí)的映射方法如下,輸出action
for action in range(25):#rl_algo_prop()u1 = (action//10)*2 + 1u2 = (action%10)*2 + 1print(f"Action: {action}, u1: {u1}, u2: {u2}")
Action: 0, u1: 1, u2: 1
Action: 1, u1: 1, u2: 3
Action: 2, u1: 1, u2: 5
Action: 3, u1: 1, u2: 7
Action: 4, u1: 1, u2: 9
Action: 5, u1: 1, u2: 11
Action: 6, u1: 1, u2: 13
Action: 7, u1: 1, u2: 15
Action: 8, u1: 1, u2: 17
Action: 9, u1: 1, u2: 19
Action: 10, u1: 3, u2: 1
Action: 11, u1: 3, u2: 3
Action: 12, u1: 3, u2: 5
Action: 13, u1: 3, u2: 7
Action: 14, u1: 3, u2: 9
Action: 15, u1: 3, u2: 11
Action: 16, u1: 3, u2: 13
Action: 17, u1: 3, u2: 15
Action: 18, u1: 3, u2: 17
Action: 19, u1: 3, u2: 19
Action: 20, u1: 5, u2: 1
Action: 21, u1: 5, u2: 3
Action: 22, u1: 5, u2: 5
Action: 23, u1: 5, u2: 7
Action: 24, u1: 5, u2: 9
文章提到:使用大小為的量化減少動(dòng)作空間的基數(shù),
=10時(shí)新的動(dòng)作元組 (2,2) 表示舊動(dòng)作空間中范圍 (11 - 20,11 - 20) 中的所有動(dòng)作
于是我暫且認(rèn)為=2指的是,量化后的動(dòng)作空間中的一個(gè)動(dòng)作,代表原來動(dòng)作空間中的兩個(gè)動(dòng)作,也就是,第一個(gè)動(dòng)作中的 u1: 1是我們選擇來代表原來動(dòng)作空間中u1: 1、u1: 2;同理u2:1代表u2:1、u2:2。下一個(gè)動(dòng)作就從3開始。
但無法解釋后邊=5,
=100情況:
=5:
for action in range(25):#rl_algo_act()u1 = (action//5)*4 + 1u2 = (action%5)*4 + 1print(f"Action: {action+1}, u1: {u1}, u2: {u2}")
Action: 0, u1: 1, u2: 1
Action: 1, u1: 1, u2: 5
Action: 2, u1: 1, u2: 9
Action: 3, u1: 1, u2: 13
Action: 4, u1: 1, u2: 17
Action: 5, u1: 5, u2: 1
Action: 6, u1: 5, u2: 5
Action: 7, u1: 5, u2: 9
Action: 8, u1: 5, u2: 13
Action: 9, u1: 5, u2: 17
Action: 10, u1: 9, u2: 1
Action: 11, u1: 9, u2: 5
Action: 12, u1: 9, u2: 9
Action: 13, u1: 9, u2: 13
Action: 14, u1: 9, u2: 17
Action: 15, u1: 13, u2: 1
Action: 16, u1: 13, u2: 5
Action: 17, u1: 13, u2: 9
Action: 18, u1: 13, u2: 13
Action: 19, u1: 13, u2: 17
Action: 20, u1: 17, u2: 1
Action: 21, u1: 17, u2: 5
Action: 22, u1: 17, u2: 9
Action: 23, u1: 17, u2: 13
Action: 24, u1: 17, u2: 17
?
=100:
for action in range(25):#rl_algo_thres10u1 = action//5 + 1u2 = (action+1) - (u1-1)*5print(f"Action: {action+1}, u1: {u1}, u2: {u2}")
?Action: 1, u1: 1, u2: 1
Action: 2, u1: 1, u2: 2
Action: 3, u1: 1, u2: 3
Action: 4, u1: 1, u2: 4
Action: 5, u1: 1, u2: 5
Action: 6, u1: 2, u2: 1
Action: 7, u1: 2, u2: 2
Action: 8, u1: 2, u2: 3
Action: 9, u1: 2, u2: 4
Action: 10, u1: 2, u2: 5
Action: 11, u1: 3, u2: 1
Action: 12, u1: 3, u2: 2
Action: 13, u1: 3, u2: 3
Action: 14, u1: 3, u2: 4
Action: 15, u1: 3, u2: 5
Action: 16, u1: 4, u2: 1
Action: 17, u1: 4, u2: 2
Action: 18, u1: 4, u2: 3
Action: 19, u1: 4, u2: 4
Action: 20, u1: 4, u2: 5
Action: 21, u1: 5, u2: 1
Action: 22, u1: 5, u2: 2
Action: 23, u1: 5, u2: 3
Action: 24, u1: 5, u2: 4
Action: 25, u1: 5, u2: 5
暫且存疑
V.實(shí)驗(yàn)與結(jié)果分析
B. Experimental Results