做家鄉(xiāng)網(wǎng)站源代碼網(wǎng)站收錄查詢網(wǎng)
PyTorch Notes: Data Preprocessing with pandas
- PyTorch Notes: Data Preprocessing with pandas (work in progress)
- Sample test code
- Related operators
PyTorch Notes: Data Preprocessing with pandas (work in progress)
Sample test code
import pandas as pd
import torch

# NOTE: train_data / test_data are assumed to have been loaded beforehand
# (e.g. the Kaggle house-price train/test CSVs read with pd.read_csv);
# their shapes appear as the first two lines of the output below.
print(train_data.shape)
print(test_data.shape)
print(train_data.iloc[0:4, [0, 1, 2, 3, -3, -2, -1]])
# (※1) the slice for test_data does not stop at -1 because test_data has no price column
all_features = pd.concat((train_data.iloc[:, 1:-1], test_data.iloc[:, 1:]))
print('-----------------------------------------------')
print(all_features.iloc[0:4, [0, 1, 2, 3, -3, -2, -1]])
# (※2) collect the index of the numeric (non-object) columns
numeric_features = all_features.dtypes[all_features.dtypes != 'object'].index
# print('++++++++++++++++++++++++')
# (※3) print(all_features[numeric_features].iloc[0:3, [0,1,2,3,-3,-2,-1]])
# print('----------------------')
all_features[numeric_features] = all_features[numeric_features].apply(lambda x: (x - x.mean()) / (x.std()))
# print(all_features[numeric_features].iloc[0:3, [0,1,2,3,-3,-2,-1]])
# input()
# (※4) after standardization every column's mean becomes 0, so the missing values can simply be set to 0
all_features[numeric_features] = all_features[numeric_features].fillna(0)
# (※5) dummies & pd to tensor
print('++++++++++ demo test dummies +++++++++++')
test = pd.DataFrame({'“x”': [1, 2, 3, 4, 5, 6], "seasion": ['here', 'over', '', 'next', '', 'here']})
print(test)
print('-------------------------------')
test = pd.get_dummies(test, dummy_na=True)
print(test)
test = test * 1  # multiply by 1 to turn the boolean dummy columns into 0/1 integers
print(test)
print('++++++++++ test trans to tensor +++++++++++')
# test1 = torch.tensor(test)  # passing the DataFrame directly does not work; use .values
# convert the whole DataFrame
test1 = torch.tensor(test.values, dtype=torch.float32)
print(test1.shape)
print(test1)
print('-------------------------------')
# plain slicing without iloc only selects rows
test2 = torch.tensor(test[:3].values, dtype=torch.float32)
print(test2.shape)
print(test2)
print('-------------------------------')
# selecting specific rows and columns requires comfortable use of iloc
test3 = torch.tensor(test.iloc[:2, :-1].values, dtype=torch.float32)
print(test3.shape)
print(test3)
input()

output-begin:
(1460, 81)
(1459, 80)
   Id  MSSubClass MSZoning  LotFrontage SaleType SaleCondition  SalePrice
0   1          60       RL         65.0       WD        Normal     208500
1   2          20       RL         80.0       WD        Normal     181500
2   3          60       RL         68.0       WD        Normal     223500
3   4          70       RL         60.0       WD       Abnorml     140000
-----------------------------------------------
   MSSubClass MSZoning  LotFrontage  LotArea  YrSold SaleType SaleCondition
0          60       RL         65.0     8450    2008       WD        Normal
1          20       RL         80.0     9600    2007       WD        Normal
2          60       RL         68.0    11250    2008       WD        Normal
3          70       RL         60.0     9550    2006       WD       Abnorml
++++++++++ demo test dummies +++++++++++
   “x” seasion
0    1    here
1    2    over
2    3
3    4    next
4    5
5    6    here
-------------------------------
   “x”  seasion_  seasion_here  seasion_next  seasion_over  seasion_nan
0    1     False          True         False         False        False
1    2     False         False         False          True        False
2    3      True         False         False         False        False
3    4     False         False          True         False        False
4    5      True         False         False         False        False
5    6     False          True         False         False        False
   “x”  seasion_  seasion_here  seasion_next  seasion_over  seasion_nan
0    1         0             1             0             0            0
1    2         0             0             0             1            0
2    3         1             0             0             0            0
3    4         0             0             1             0            0
4    5         1             0             0             0            0
5    6         0             1             0             0            0
++++++++++ test trans to tensor +++++++++++
torch.Size([6, 6])
tensor([[1., 0., 1., 0., 0., 0.],
        [2., 0., 0., 0., 1., 0.],
        [3., 1., 0., 0., 0., 0.],
        [4., 0., 0., 1., 0., 0.],
        [5., 1., 0., 0., 0., 0.],
        [6., 0., 1., 0., 0., 0.]])
-------------------------------
torch.Size([3, 6])
tensor([[1., 0., 1., 0., 0., 0.],
        [2., 0., 0., 0., 1., 0.],
        [3., 1., 0., 0., 0., 0.]])
-------------------------------
torch.Size([2, 5])
tensor([[1., 0., 1., 0., 0.],
        [2., 0., 0., 0., 1.]])
output-end
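A quick check of the reasoning in (※4): after (x - x.mean()) / x.std() each column has mean 0 and standard deviation 1, so filling the remaining NaN entries with 0 is the same as filling them with the column mean. A minimal sketch, using a small made-up Series rather than the house-price data:

import pandas as pd

col = pd.Series([10.0, 20.0, None, 40.0])        # toy column with one missing value
standardized = (col - col.mean()) / col.std()    # mean()/std() skip the NaN
print(standardized.mean(), standardized.std())   # ~0.0 and 1.0
print(standardized.fillna(0))                    # 0 is now exactly the column mean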
Related operators
concat — concatenate DataFrames.
iloc — select rows and columns by position.
apply — transform column data.
fillna — fill in missing values.
get_dummies — one-hot encoding (demonstrated in the test above; see the combined sketch after this list).
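As a quick reference, here is a minimal sketch that exercises each operator listed above in one pass, using two small made-up frames (hypothetical data, not the house-price CSVs):

import pandas as pd
import torch

a = pd.DataFrame({'val': [1.0, None, 3.0], 'kind': ['x', 'y', 'x']})
b = pd.DataFrame({'val': [4.0, 5.0], 'kind': ['y', 'z']})

df = pd.concat((a, b), ignore_index=True)                            # concat: stack the frames row-wise
print(df.iloc[0:2, [0, -1]])                                         # iloc: pick rows/columns by position
df[['val']] = df[['val']].apply(lambda x: (x - x.mean()) / x.std())  # apply: per-column standardization
df[['val']] = df[['val']].fillna(0)                                  # fillna: fill the remaining NaN with 0
df = pd.get_dummies(df, dummy_na=True) * 1                           # get_dummies: one-hot encode 'kind' as 0/1
print(torch.tensor(df.values, dtype=torch.float32))                  # everything is numeric now -> tensor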
PS: omitted.