news 2025/7/5 14:22:43

💥 Today we'll look at the usual way of handling data in PyTorch.

Typically, a Dataset is used to wrap your own data, and a DataLoader is used to read it.

Dataset format

💬 A Dataset defines the total length of the dataset and which values are returned for each item. Template:

from torch.utils.data import Dataset
class MyDataset(Dataset):
    def __init__(self):
        # Define the data and labels that the dataset contains
        pass
    def __len__(self):
        return len(...)
    def __getitem__(self, index):
        # When the dataset is read, return a tuple of (data, label)
        return self.x_data[index], self.y_data[index]

DataLoader format

from torch.utils.data import DataLoader
my_dataset = DataLoader(mydataset, batch_size=2, shuffle=True, num_workers=4)  # num_workers: number of worker processes used to load data
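To make the batch_size and shuffle arguments more concrete, here is a minimal plain-Python sketch of the idea: sample indices are optionally shuffled, then cut into chunks of batch_size. This is only an illustration and not PyTorch's actual implementation; the function name epoch_batches and its seed parameter are hypothetical.

```python
import random

def epoch_batches(n_samples, batch_size, shuffle, seed=None):
    """Return a list of index batches for one pass over the data (illustrative only)."""
    indices = list(range(n_samples))
    if shuffle:
        # Shuffle the visiting order; a seed makes the order reproducible
        random.Random(seed).shuffle(indices)
    # Cut the index list into consecutive chunks of batch_size
    return [indices[i:i + batch_size] for i in range(0, n_samples, batch_size)]

# Without shuffling, indices are grouped in their natural order
print(epoch_batches(10, 2, shuffle=False))
# With shuffling, every index still appears exactly once, in a different order
print(epoch_batches(10, 2, shuffle=True, seed=0))
```

In this sketch, when the number of samples is not a multiple of batch_size, the last batch is simply shorter.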

Loading two lists into a Dataset

from torch.utils.data import Dataset, DataLoader
class MyDataset(Dataset):
    def __init__(self):
        # Define the data and labels that the dataset contains
        self.x_data = [i for i in range(10)]
        self.y_data = [2 * i for i in range(10)]
    def __len__(self):
        return len(self.x_data)
    def __getitem__(self, index):
        # When the dataset is read, return a tuple of (data, label)
        return self.x_data[index], self.y_data[index]
mydataset = MyDataset()
my_dataset = DataLoader(mydataset)  # default batch_size is 1
for x_i, y_i in my_dataset:
    print(x_i, y_i)

💬 Output:

tensor([0]) tensor([0])
tensor([1]) tensor([2])
tensor([2]) tensor([4])
tensor([3]) tensor([6])
tensor([4]) tensor([8])
tensor([5]) tensor([10])
tensor([6]) tensor([12])
tensor([7]) tensor([14])
tensor([8]) tensor([16])
tensor([9]) tensor([18])

💬 If batch_size is changed to 2, the output becomes:

tensor([0, 1]) tensor([0, 2])
tensor([2, 3]) tensor([4, 6])
tensor([4, 5]) tensor([ 8, 10])
tensor([6, 7]) tensor([12, 14])
tensor([8, 9]) tensor([16, 18])
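The grouping seen in the outputs above can be mimicked in plain Python. The sketch below takes the same two lists, cuts the (x, y) samples into chunks of batch_size, and regroups each chunk field by field. It is a rough illustration under simplifying assumptions: plain lists stand in for tensors, and the helper name collate is hypothetical rather than PyTorch's own function.

```python
def collate(samples):
    """Turn a list of (x, y) samples into one tuple of lists, one list per field."""
    return tuple(list(field) for field in zip(*samples))

x_data = [i for i in range(10)]
y_data = [2 * i for i in range(10)]
samples = list(zip(x_data, y_data))  # [(0, 0), (1, 2), (2, 4), ...]

batch_size = 2
batches = [
    collate(samples[i:i + batch_size])
    for i in range(0, len(samples), batch_size)
]

for xs, ys in batches:
    print(xs, ys)  # first line printed: [0, 1] [0, 2]
```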

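As we can see, batch_size controls how many samples are grouped into each batch.
You can also control how many worker processes are used to speed up data loading (num_workers). This parameter is tied to the number of CPU cores, so try not to set it higher than the number of cores on your machine.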
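One way to follow that advice is to cap the requested worker count at the number of cores Python reports. The helper below is a small sketch; the name suggest_num_workers is hypothetical, and the right value in practice also depends on the workload and the operating system.

```python
import os

def suggest_num_workers(requested):
    """Cap a requested worker count at the number of CPU cores reported by the OS."""
    # os.cpu_count() can return None when the count cannot be determined
    available = os.cpu_count() or 1
    # Never return a negative value; 0 means "load data in the main process"
    return max(0, min(requested, available))

num_workers = suggest_num_workers(4)
print(num_workers)
```

The result can then be passed as the num_workers argument of DataLoader.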

Loading Excel data into a Dataset

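💥 A Dataset is just a class, so its data can be loaded from an external source. We can also have the Dataset apply extra processing to the data as it is returned, and an item does not have to consist of exactly two values.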
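As a small illustration of both points, the sketch below applies a scaling step inside __getitem__ and returns three values per item. It is written in plain Python around the same __len__ / __getitem__ pair that a map-style dataset uses, so it runs without PyTorch; in real code the class would subclass torch.utils.data.Dataset. The class name ScaledRows, the scale parameter, and the sample rows are all hypothetical.

```python
class ScaledRows:
    """Dataset-like container: rescales the features when an item is fetched."""

    def __init__(self, rows, scale=1.0):
        self.rows = rows    # list of (x1, x2, y) tuples
        self.scale = scale  # factor applied to the features on access

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, idx):
        x1, x2, y = self.rows[idx]
        # Extra processing happens here, at the moment the item is returned
        return x1 * self.scale, x2 * self.scale, y

rows = [(1, 2, 0), (3, 4, 1), (5, 6, 0)]
ds = ScaledRows(rows, scale=0.5)
print(len(ds), ds[1])
```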

pip install pandas
pip install openpyxl

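import pandas as pd
from torch.utils.data import Dataset, DataLoader
class myDataset(Dataset):
    def __init__(self, data_loc):
        # Read the spreadsheet; columns x1..x4 and y are expected to exist.
        # Note: for a legacy .xls file pandas typically needs the xlrd package; openpyxl covers .xlsx.
        data = pd.read_excel(data_loc)
        self.x1, self.x2, self.x3, self.x4, self.y = data['x1'], data['x2'], data['x3'], data['x4'], data['y']
    def __len__(self):
        return len(self.x1)
    def __getitem__(self, idx):
        # Each item consists of five values rather than two
        return self.x1[idx], self.x2[idx], self.x3[idx], self.x4[idx], self.y[idx]
# Raw string so the backslashes in the Windows path are kept literally
mydataset = myDataset(data_loc=r'e:\pythonProject Pytorch1\data.xls')
my_dataset = DataLoader(mydataset, batch_size=2)
for x1_i, x2_i, x3_i, x4_i, y_i in my_dataset:
    print(x1_i, x2_i, x3_i, x4_i, y_i)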