
The GPU-Puzzles project teaches GPU programming and the concepts of CUDA parallel programming. Through a series of small puzzles you learn how to write and invoke CUDA kernels, create shared-memory buffers, and implement things like convolution and matrix multiplication; after each puzzle you are also rewarded with a short puppy video 😁

Project repository: https://github.com/srush/GPU-Puzzles

I have worked through all of the puzzles myself, so here is a walkthrough for reference.

Puzzle 1: Map

Implement a "kernel" (GPU function) that adds 10 to each position of vector a and stores it in vector out. You have 1 thread per position.

The goal is for every element of out to equal the corresponding element of a plus 10.

```python
def map_spec(a):
    return a + 10

def map_test(cuda):
    def call(out, a) -> None:
        local_i = cuda.threadIdx.x
        # FILL ME IN (roughly 1 lines)
        out[local_i] = a[local_i] + 10
    return call

SIZE = 4
out = np.zeros((SIZE,))
a = np.arange(SIZE)
problem = CudaProblem(
    "Map", map_test, [a], out, threadsperblock=Coord(SIZE, 1), spec=map_spec
)
problem.show()
```

Don't read this with a purely sequential Python mindset: the kernel body runs once per CUDA thread, and cuda.threadIdx.x gives each thread its own index, so every position is processed in parallel. Here is the visualized run:

```
# Map
Score (Max Per Thread):
| Global Reads | Global Writes | Shared Reads | Shared Writes |
|            1 |             1 |            0 |             0 |
```
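To build intuition for the execution model, the kernel launch can be emulated in plain Python by looping over every thread index. This is only a sketch: run_kernel and map_kernel are made-up names for illustration, not part of the project's API, and on a real GPU the iterations execute concurrently rather than in a loop.

```python
import numpy as np

def run_kernel(kernel, num_threads, out, *args):
    # Emulate launching one GPU thread per index; on real hardware
    # these calls run in parallel, not sequentially.
    for local_i in range(num_threads):
        kernel(local_i, out, *args)

def map_kernel(local_i, out, a):
    # Body of the Map puzzle: each thread handles exactly one position.
    out[local_i] = a[local_i] + 10

SIZE = 4
a = np.arange(SIZE)
out = np.zeros(SIZE)
run_kernel(map_kernel, SIZE, out, a)
print(out)  # [10. 11. 12. 13.]
```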

Puzzle 2 - Zip

Implement a kernel that adds together each position of a and b and stores it in out. You have 1 thread per position.

Each element of out is the sum of the elements at the same position in a and b; no extra work is needed, just use local_i to index both inputs.

```python
def zip_spec(a, b):
    return a + b

def zip_test(cuda):
    def call(out, a, b) -> None:
        local_i = cuda.threadIdx.x
        # FILL ME IN (roughly 1 lines)
        out[local_i] = a[local_i] + b[local_i]
    return call

SIZE = 4
out = np.zeros((SIZE,))
a = np.arange(SIZE)
b = np.arange(SIZE)
problem = CudaProblem(
    "Zip", zip_test, [a, b], out, threadsperblock=Coord(SIZE, 1), spec=zip_spec
)
problem.show()
```

Visualization:


```
# Zip
Score (Max Per Thread):
| Global Reads | Global Writes | Shared Reads | Shared Writes |
|            2 |             1 |            0 |             0 |
```

Puzzle 3 - Guards

Implement a kernel that adds 10 to each position of a and stores it in out. You have more threads than positions.

This is Map upgraded: there are more CUDA threads than positions, but that is fine; an if guard keeps local_i within valid bounds.

```python
def map_guard_test(cuda):
    def call(out, a, size) -> None:
        local_i = cuda.threadIdx.x
        # FILL ME IN (roughly 2 lines)
        if local_i < size:
            out[local_i] = a[local_i] + 10
    return call

SIZE = 4
out = np.zeros((SIZE,))
a = np.arange(SIZE)
problem = CudaProblem(
    "Guard",
    map_guard_test,
    [a],
    out,
    [SIZE],
    threadsperblock=Coord(8, 1),
    spec=map_spec,
)
problem.show()
```

Visualization:

```
# Guard
Score (Max Per Thread):
| Global Reads | Global Writes | Shared Reads | Shared Writes |
|            1 |             1 |            0 |             0 |
```

Puzzle 4 - Map 2D

Implement a kernel that adds 10 to each position of a and stores it in out. Input a is 2D and square. You have more threads than positions.

One step further: CUDA can lay threads out in two dimensions. Since we launch more threads than there are positions, bounds must be checked in both dimensions.

```python
def map_2D_test(cuda):
    def call(out, a, size) -> None:
        local_i = cuda.threadIdx.x
        local_j = cuda.threadIdx.y
        # FILL ME IN (roughly 2 lines)
        if local_i < size and local_j < size:
            out[local_i, local_j] = a[local_i, local_j] + 10
    return call

SIZE = 2
out = np.zeros((SIZE, SIZE))
a = np.arange(SIZE * SIZE).reshape((SIZE, SIZE))
problem = CudaProblem(
    "Map 2D", map_2D_test, [a], out, [SIZE], threadsperblock=Coord(3, 3), spec=map_spec
)
problem.show()
```

Visualization:

```
# Map 2D
Score (Max Per Thread):
| Global Reads | Global Writes | Shared Reads | Shared Writes |
|            1 |             1 |            0 |             0 |
```

Puzzle 5 - Broadcast

Implement a kernel that adds a and b and stores it in out. Inputs a and b are vectors. You have more threads than positions.

This is the 2D version of Zip: a column vector and a row vector are combined so that out[i, j] = a[i, 0] + b[0, j]. The thread grid is larger than the output matrix, so mind the bounds.

```python
def broadcast_test(cuda):
    def call(out, a, b, size) -> None:
        local_i = cuda.threadIdx.x
        local_j = cuda.threadIdx.y
        # FILL ME IN (roughly 2 lines)
        if local_i < size and local_j < size:
            out[local_i, local_j] = a[local_i, 0] + b[0, local_j]
    return call

SIZE = 2
out = np.zeros((SIZE, SIZE))
a = np.arange(SIZE).reshape(SIZE, 1)
b = np.arange(SIZE).reshape(1, SIZE)
problem = CudaProblem(
    "Broadcast",
    broadcast_test,
    [a, b],
    out,
    [SIZE],
    threadsperblock=Coord(3, 3),
    spec=zip_spec,
)
problem.show()
```

Visualization:

```
# Broadcast
Score (Max Per Thread):
| Global Reads | Global Writes | Shared Reads | Shared Writes |
|            2 |             1 |            0 |             0 |
```
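The zip_spec used here relies on NumPy broadcasting: adding a (SIZE, 1) column to a (1, SIZE) row expands both operands to a full (SIZE, SIZE) matrix, which is exactly what the kernel computes elementwise. A quick sanity check:

```python
import numpy as np

SIZE = 2
a = np.arange(SIZE).reshape(SIZE, 1)  # column vector [[0], [1]]
b = np.arange(SIZE).reshape(1, SIZE)  # row vector    [[0, 1]]

out = a + b  # broadcasting expands both operands to (SIZE, SIZE)
print(out.tolist())  # [[0, 1], [1, 2]]
```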

Puzzle 6 - Blocks

Implement a kernel that adds 10 to each position of a and stores it in out. You have fewer threads per block than the size of a.

Unlike before, each block holds fewer threads than the array has elements. That is still fine: blocks group the threads, the grid supplies enough blocks to cover the whole array, and each thread computes its global index from its block and thread indices, guarded by the usual bounds check.

```python
def map_block_test(cuda):
    def call(out, a, size) -> None:
        i = cuda.blockIdx.x * cuda.blockDim.x + cuda.threadIdx.x
        # FILL ME IN (roughly 2 lines)
        if i < size:
            out[i] = a[i] + 10
    return call

SIZE = 9
out = np.zeros((SIZE,))
a = np.arange(SIZE)
problem = CudaProblem(
    "Blocks",
    map_block_test,
    [a],
    out,
    [SIZE],
    threadsperblock=Coord(4, 1),
    blockspergrid=Coord(3, 1),
    spec=map_spec,
)
problem.show()
```

Visualization:

```
# Blocks
Score (Max Per Thread):
| Global Reads | Global Writes | Shared Reads | Shared Writes |
|            1 |             1 |            0 |             0 |
```
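The formula i = cuda.blockIdx.x * cuda.blockDim.x + cuda.threadIdx.x flattens the (block, thread) pair into a unique global position, and the guard masks off the surplus threads. Enumerating it for this launch (9 elements, 3 blocks of 4 threads) in plain Python:

```python
SIZE = 9
THREADS_PER_BLOCK = 4  # blockDim.x
BLOCKS = 3             # 3 * 4 = 12 threads for 9 positions

covered = []
for block_idx in range(BLOCKS):
    for thread_idx in range(THREADS_PER_BLOCK):
        i = block_idx * THREADS_PER_BLOCK + thread_idx
        if i < SIZE:  # the guard from the kernel; 3 threads are masked off
            covered.append(i)

print(covered)  # [0, 1, 2, 3, 4, 5, 6, 7, 8] -- each position exactly once
```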

Puzzle 7 - Blocks 2D

Implement the same kernel in 2D. You have fewer threads per block than the size of a in both directions.

Here the per-block thread grid is smaller than the matrix in both directions, but the same idea applies: compute global indices from the block and thread coordinates and guard the boundaries.

```python
def map_block2D_test(cuda):
    def call(out, a, size) -> None:
        i = cuda.blockIdx.x * cuda.blockDim.x + cuda.threadIdx.x
        # FILL ME IN (roughly 4 lines)
        j = cuda.blockIdx.y * cuda.blockDim.y + cuda.threadIdx.y
        if i < size and j < size:
            out[i, j] = a[i, j] + 10
    return call

SIZE = 5
out = np.zeros((SIZE, SIZE))
a = np.ones((SIZE, SIZE))
problem = CudaProblem(
    "Blocks 2D",
    map_block2D_test,
    [a],
    out,
    [SIZE],
    threadsperblock=Coord(3, 3),
    blockspergrid=Coord(2, 2),
    spec=map_spec,
)
problem.show()
```

Visualization:

```
# Blocks 2D
Score (Max Per Thread):
| Global Reads | Global Writes | Shared Reads | Shared Writes |
|            1 |             1 |            0 |             0 |
```

Puzzle 8 - Shared

Implement a kernel that adds 10 to each position of a and stores it in out. You have fewer threads per block than the size of a.

This puzzle introduces shared memory. In CUDA (here via Numba), a shared-memory array must be declared with a compile-time-constant size, which is why TPB is a module-level constant.

```python
TPB = 4
def shared_test(cuda):
    def call(out, a, size) -> None:
        shared = cuda.shared.array(TPB, numba.float32)
        i = cuda.blockIdx.x * cuda.blockDim.x + cuda.threadIdx.x
        local_i = cuda.threadIdx.x
        if i < size:
            shared[local_i] = a[i]
        cuda.syncthreads()
        # FILL ME IN (roughly 2 lines)
        if i < size:
            out[i] = shared[local_i] + 10
    return call

SIZE = 8
out = np.zeros(SIZE)
a = np.ones(SIZE)
problem = CudaProblem(
    "Shared",
    shared_test,
    [a],
    out,
    [SIZE],
    threadsperblock=Coord(TPB, 1),
    blockspergrid=Coord(2, 1),
    spec=map_spec,
)
problem.show()
```

Visualization:

```
# Shared
Score (Max Per Thread):
| Global Reads | Global Writes | Shared Reads | Shared Writes |
|            1 |             1 |            1 |             1 |
```
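cuda.syncthreads() is a barrier: no thread in the block may proceed until every thread has finished its shared-memory write. The load-then-use pattern can be sketched sequentially as two separate phases per block (shared_add10 is a hypothetical helper for illustration, not the puzzle's API):

```python
import numpy as np

def shared_add10(a, tpb=4):
    out = np.zeros_like(a, dtype=float)
    for block_start in range(0, len(a), tpb):
        shared = np.zeros(tpb)
        n = min(tpb, len(a) - block_start)
        # Phase 1: every thread in the block loads one element.
        for local_i in range(n):
            shared[local_i] = a[block_start + local_i]
        # -- cuda.syncthreads() sits here: all writes are finished --
        # Phase 2: every thread reads shared memory and writes out.
        for local_i in range(n):
            out[block_start + local_i] = shared[local_i] + 10
    return out

print(shared_add10(np.ones(8)))  # [11. 11. 11. 11. 11. 11. 11. 11.]
```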

Puzzle 9 - Pooling

Implement a kernel that sums together the last 3 positions of a and stores it in out. You have 1 thread per position. You only need 1 global read and 1 global write per thread.

The lesson here is to limit global reads and writes: staging data that is read repeatedly into shared memory improves efficiency.

```python
def pool_spec(a):
    out = np.zeros(*a.shape)
    for i in range(a.shape[0]):
        out[i] = a[max(i - 2, 0) : i + 1].sum()
    return out

TPB = 8
def pool_test(cuda):
    def call(out, a, size) -> None:
        shared = cuda.shared.array(TPB, numba.float32)
        i = cuda.blockIdx.x * cuda.blockDim.x + cuda.threadIdx.x
        local_i = cuda.threadIdx.x
        # FILL ME IN (roughly 8 lines)
        if i < size:
            shared[local_i] = a[i]
        cuda.syncthreads()
        if i == 0:
            out[i] = shared[local_i]
        elif i == 1:
            out[i] = shared[local_i] + shared[local_i - 1]
        else:
            out[i] = shared[local_i] + shared[local_i - 1] + shared[local_i - 2]
    return call

SIZE = 8
out = np.zeros(SIZE)
a = np.arange(SIZE)
problem = CudaProblem(
    "Pooling",
    pool_test,
    [a],
    out,
    [SIZE],
    threadsperblock=Coord(TPB, 1),
    blockspergrid=Coord(1, 1),
    spec=pool_spec,
)
problem.show()
```

Visualization:

```
# Pooling
Score (Max Per Thread):
| Global Reads | Global Writes | Shared Reads | Shared Writes |
|            1 |             1 |            3 |             1 |
```
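The pooling spec is simply a trailing-window sum of width 3, so it can be cross-checked against np.convolve with a window of ones (a verification sketch, not part of the puzzle):

```python
import numpy as np

def pool_spec(a):
    out = np.zeros(*a.shape)
    for i in range(a.shape[0]):
        out[i] = a[max(i - 2, 0) : i + 1].sum()
    return out

a = np.arange(8)
# A full convolution with ones(3) produces the same trailing sums:
# entry k is a[k] + a[k-1] + a[k-2], with missing indices contributing 0.
via_convolve = np.convolve(a, np.ones(3))[: len(a)]
print(np.array_equal(pool_spec(a), via_convolve))  # True
```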

Puzzle 10 - Dot Product

Implement a kernel that computes the dot-product of a and b and stores it in out. You have 1 thread per position. You only need 2 global reads and 1 global write per thread.

This is a hand-rolled dot product: each thread stores its elementwise product in shared memory, then one thread sums the shared buffer and writes the single result, which keeps the global reads and writes within the limit.

```python
def dot_spec(a, b):
    return a @ b

TPB = 8
def dot_test(cuda):
    def call(out, a, b, size) -> None:
        shared = cuda.shared.array(TPB, numba.float32)
        i = cuda.blockIdx.x * cuda.blockDim.x + cuda.threadIdx.x
        local_i = cuda.threadIdx.x
        # FILL ME IN (roughly 9 lines)
        if i < size:
            shared[local_i] = a[i] * b[i]
        cuda.syncthreads()
        if local_i == 0:
            total = 0.0
            for j in range(TPB):
                total += shared[j]
            out[0] = total
    return call

SIZE = 8
out = np.zeros(1)
a = np.arange(SIZE)
b = np.arange(SIZE)
problem = CudaProblem(
    "Dot",
    dot_test,
    [a, b],
    out,
    [SIZE],
    threadsperblock=Coord(SIZE, 1),
    blockspergrid=Coord(1, 1),
    spec=dot_spec,
)
problem.show()
```

Visualization:

```
# Dot
Score (Max Per Thread):
| Global Reads | Global Writes | Shared Reads | Shared Writes |
|            2 |             1 |            8 |             1 |
```
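Summing the whole shared buffer on thread 0 is O(TPB) serial work. A common alternative (not required by the puzzle) is a tree reduction that halves the number of active threads each step; here is a plain-Python sketch of that pattern, assuming the buffer length is a power of two:

```python
import numpy as np

def tree_reduce(shared):
    # Pairwise-sum reduction: each round, position local_i adds the element
    # `stride` positions away; on a GPU each round would end with
    # cuda.syncthreads() so all partial sums land before the next round.
    shared = shared.copy()
    stride = len(shared) // 2
    while stride > 0:
        for local_i in range(stride):  # these run in parallel on a GPU
            shared[local_i] += shared[local_i + stride]
        stride //= 2
    return shared[0]

a = np.arange(8)
b = np.arange(8)
print(tree_reduce(a * b))  # 140, the same as a @ b
```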

Puzzles 11-14 are harder and their solutions are more involved, so I'll cover them in the next post:

👉🏻 GPU Puzzles walkthrough (Part 2) - CSDN blog

