
Contents

Introduction

Dataset

Setup

Prepare the data

Transform the movie ratings data into sequences

Define metadata

Create a tf.data.Dataset for training and evaluation

Create model inputs

Encode input features

Create the BST model

Run the training and evaluation experiment


Author's homepage: 政安晨


I hope this blog is useful to you. If you spot any shortcomings, corrections are welcome in the comments!

Goal of this post: predict movie ratings on MovieLens with the Behavior Sequence Transformer (BST) model.

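Introduction

This example demonstrates the Behavior Sequence Transformer (BST) model by Qiwei Chen et al. on the MovieLens dataset. The BST model uses the sequential behavior of users in watching and rating movies, together with user profile and movie features, to predict the rating a user gives to a target movie.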

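More precisely, the BST model predicts the rating of a target movie from the following inputs:

  1. A fixed-length sequence of the movie_ids watched by a user.
  2. A fixed-length sequence of the ratings for the movies watched by a user.
  3. A set of user features: user_id, sex, occupation, and age_group.
  4. A set of genres for each movie in the input sequence and for the target movie.
  5. The target_movie_id whose rating is to be predicted.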

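This example modifies the original BST model in the following ways:

1. We incorporate the movie features (genres) into the embedding of each movie in the input sequence and of the target movie, rather than treating them as "other features" outside the transformer layer.

2. We use the ratings of the movies in the input sequence, along with their positions in the sequence, to update the movie embeddings before feeding them into the self-attention layer.

Note that this example should be run with TensorFlow 2.4 or higher. The code also calls keras.ops, which is only available in Keras 3.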

Dataset

We use the 1M version of the MovieLens dataset. It contains around 1 million ratings from 6000 users on 4000 movies, along with some user features and movie genres. The dataset also records the timestamp of each user-movie rating, which makes it possible to build a sequence of movie ratings for each user, as the BST model expects.

Setup

import os

os.environ["KERAS_BACKEND"] = "tensorflow"

import math
from zipfile import ZipFile
from urllib.request import urlretrieve

import keras
import numpy as np
import pandas as pd
import tensorflow as tf
from keras import layers
from keras.layers import StringLookup

Prepare the data

Download and prepare the DataFrames

First, let's download the MovieLens data.

The downloaded folder contains three data files: users.dat, movies.dat, and ratings.dat.

urlretrieve("http://files.grouplens.org/datasets/movielens/ml-1m.zip", "movielens.zip")
ZipFile("movielens.zip", "r").extractall()

Then, we load the data into pandas DataFrames with the proper column names.

users = pd.read_csv(
    "ml-1m/users.dat",
    sep="::",
    names=["user_id", "sex", "age_group", "occupation", "zip_code"],
    encoding="ISO-8859-1",
    engine="python",
)

ratings = pd.read_csv(
    "ml-1m/ratings.dat",
    sep="::",
    names=["user_id", "movie_id", "rating", "unix_timestamp"],
    encoding="ISO-8859-1",
    engine="python",
)

movies = pd.read_csv(
    "ml-1m/movies.dat",
    sep="::",
    names=["movie_id", "title", "genres"],
    encoding="ISO-8859-1",
    engine="python",
)
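The .dat files separate fields with `::`. pandas' default C parser does not support multi-character separators, which is why the calls above pass `engine="python"`. As a rough illustration of what that parsing amounts to for a single ratings row, here is a plain-Python sketch. The sample line and the helper `parse_rating_line` are assumptions made for this example and are not part of the tutorial code.

```python
def parse_rating_line(line):
    """Split one '::'-delimited ratings row into typed fields (hypothetical helper)."""
    user_id, movie_id, rating, unix_timestamp = line.strip().split("::")
    return {
        "user_id": int(user_id),
        "movie_id": int(movie_id),
        "rating": int(rating),
        "unix_timestamp": int(unix_timestamp),
    }


sample_line = "1::1193::5::978300760\n"  # assumed example row in the ratings.dat layout
record = parse_rating_line(sample_line)
print(record)
```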

Here, we do some simple data processing to fix the data types of the columns.

users["user_id"] = users["user_id"].apply(lambda x: f"user_{x}")
users["age_group"] = users["age_group"].apply(lambda x: f"group_{x}")
users["occupation"] = users["occupation"].apply(lambda x: f"occupation_{x}")

movies["movie_id"] = movies["movie_id"].apply(lambda x: f"movie_{x}")

ratings["movie_id"] = ratings["movie_id"].apply(lambda x: f"movie_{x}")
ratings["user_id"] = ratings["user_id"].apply(lambda x: f"user_{x}")
ratings["rating"] = ratings["rating"].apply(lambda x: float(x))

Each movie has multiple genres. We split them into separate columns in the movies DataFrame.

genres = ["Action", "Adventure", "Animation", "Children's", "Comedy", "Crime"]
genres += ["Documentary", "Drama", "Fantasy", "Film-Noir", "Horror", "Musical"]
genres += ["Mystery", "Romance", "Sci-Fi", "Thriller", "War", "Western"]

for genre in genres:
    movies[genre] = movies["genres"].apply(
        lambda values: int(genre in values.split("|"))
    )
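The genre columns built above form a multi-hot vector per movie: one 0/1 flag for each known genre. A minimal sketch of the same idea on a single genre string follows. The shortened vocabulary `GENRES_DEMO` and the helper `multi_hot` are made up for illustration.

```python
GENRES_DEMO = ["Action", "Comedy", "Drama", "Romance"]  # shortened, illustrative vocabulary


def multi_hot(genre_string, vocabulary):
    """Return a 0/1 flag per vocabulary entry for a '|'-separated genre string."""
    present = set(genre_string.split("|"))
    return [int(genre in present) for genre in vocabulary]


vector = multi_hot("Comedy|Romance", GENRES_DEMO)
print(vector)  # [0, 1, 0, 1]
```

Splitting on `|` before testing membership compares whole genre names, so one genre can never match as a substring of another.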

Transform the movie ratings data into sequences

First, we sort the ratings data by unix_timestamp, then group the movie_id values and the rating values by user_id.

ratings_group = ratings.sort_values(by=["unix_timestamp"]).groupby("user_id")

ratings_data = pd.DataFrame(
    data={
        "user_id": list(ratings_group.groups.keys()),
        "movie_ids": list(ratings_group.movie_id.apply(list)),
        "ratings": list(ratings_group.rating.apply(list)),
        "timestamps": list(ratings_group.unix_timestamp.apply(list)),
    }
)
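The sort-then-group step is what turns a flat ratings table into one chronological list per user. The same logic can be sketched in plain Python, with no pandas. The rows below are hypothetical and deliberately out of order.

```python
from collections import defaultdict

# Hypothetical (user_id, movie_id, rating, unix_timestamp) rows, out of order on purpose.
rows = [
    ("user_1", "movie_3", 4.0, 300),
    ("user_2", "movie_9", 2.0, 150),
    ("user_1", "movie_1", 5.0, 100),
    ("user_1", "movie_2", 3.0, 200),
]

movie_ids_by_user = defaultdict(list)
ratings_by_user = defaultdict(list)

# Sorting by timestamp first guarantees each per-user list is in viewing order.
for user_id, movie_id, rating, _ in sorted(rows, key=lambda row: row[3]):
    movie_ids_by_user[user_id].append(movie_id)
    ratings_by_user[user_id].append(rating)

print(dict(movie_ids_by_user))
print(dict(ratings_by_user))
```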

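Now, let's split the movie_ids list into a set of fixed-length sequences. We do the same for the ratings. Set the sequence_length variable to change the length of the model's input sequence. You can also change step_size to control how many sequences are generated for each user.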

sequence_length = 4
step_size = 2


def create_sequences(values, window_size, step_size):
    sequences = []
    start_index = 0
    while True:
        end_index = start_index + window_size
        seq = values[start_index:end_index]
        if len(seq) < window_size:
            seq = values[-window_size:]
            if len(seq) == window_size:
                sequences.append(seq)
            break
        sequences.append(seq)
        start_index += step_size
    return sequences


ratings_data.movie_ids = ratings_data.movie_ids.apply(
    lambda ids: create_sequences(ids, sequence_length, step_size)
)

ratings_data.ratings = ratings_data.ratings.apply(
    lambda ids: create_sequences(ids, sequence_length, step_size)
)

del ratings_data["timestamps"]
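To see how the windowing behaves at the edges, it helps to run create_sequences on a few short lists. The sketch below repeats the function so that it runs on its own. The integer lists are stand-ins for movie ids.

```python
def create_sequences(values, window_size, step_size):
    sequences = []
    start_index = 0
    while True:
        end_index = start_index + window_size
        seq = values[start_index:end_index]
        if len(seq) < window_size:
            # Short tail: fall back to the last full window, if one exists.
            seq = values[-window_size:]
            if len(seq) == window_size:
                sequences.append(seq)
            break
        sequences.append(seq)
        start_index += step_size
    return sequences


windows_7 = create_sequences([1, 2, 3, 4, 5, 6, 7], window_size=4, step_size=2)
windows_6 = create_sequences([1, 2, 3, 4, 5, 6], window_size=4, step_size=2)
windows_3 = create_sequences([1, 2, 3], window_size=4, step_size=2)

print(windows_7)  # [[1, 2, 3, 4], [3, 4, 5, 6], [4, 5, 6, 7]]
print(windows_6)  # [[1, 2, 3, 4], [3, 4, 5, 6], [3, 4, 5, 6]]
print(windows_3)  # []
```

Two edge cases are visible here. When the list length lines up with the stride, the final fallback window repeats the previous one, so that user contributes a duplicate sequence. Users with fewer than window_size ratings contribute no sequences at all.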

After that, we process the output so that each sequence becomes a separate record in the DataFrame. In addition, we join the user features with the ratings data.

ratings_data_movies = ratings_data[["user_id", "movie_ids"]].explode(
    "movie_ids", ignore_index=True
)
ratings_data_rating = ratings_data[["ratings"]].explode("ratings", ignore_index=True)
ratings_data_transformed = pd.concat([ratings_data_movies, ratings_data_rating], axis=1)
ratings_data_transformed = ratings_data_transformed.join(
    users.set_index("user_id"), on="user_id"
)
ratings_data_transformed.movie_ids = ratings_data_transformed.movie_ids.apply(
    lambda x: ",".join(x)
)
ratings_data_transformed.ratings = ratings_data_transformed.ratings.apply(
    lambda x: ",".join([str(v) for v in x])
)

del ratings_data_transformed["zip_code"]

ratings_data_transformed.rename(
    columns={"movie_ids": "sequence_movie_ids", "ratings": "sequence_ratings"},
    inplace=True,
)
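The explode-and-join step above flattens the per-user list of windows into one row per window, with each window stored as a comma-separated string. The following plain-Python sketch shows the shape of that transformation. The `per_user` dictionary is hypothetical data shaped like the output of create_sequences, and the user-feature join is left out.

```python
# Hypothetical per-user windows, shaped like the output of create_sequences.
per_user = {
    "user_1": {
        "movie_ids": [["m1", "m2", "m3", "m4"], ["m3", "m4", "m5", "m6"]],
        "ratings": [[5.0, 3.0, 4.0, 4.0], [4.0, 4.0, 2.0, 5.0]],
    },
}

records = []
for user_id, data in per_user.items():
    # One output row per window; movie and rating windows stay aligned via zip.
    for movie_window, rating_window in zip(data["movie_ids"], data["ratings"]):
        records.append(
            {
                "user_id": user_id,
                "sequence_movie_ids": ",".join(movie_window),
                "sequence_ratings": ",".join(str(v) for v in rating_window),
            }
        )

for record in records:
    print(record)
```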

With sequence_length set to 4 and step_size set to 2, we end up with 498,623 sequences. Finally, we split the data into training and testing sets containing 85% and 15% of the instances, respectively, and store them in CSV files.

random_selection = np.random.rand(len(ratings_data_transformed.index)) <= 0.85
train_data = ratings_data_transformed[random_selection]
test_data = ratings_data_transformed[~random_selection]

train_data.to_csv("train_data.csv", index=False, sep="|", header=False)
test_data.to_csv("test_data.csv", index=False, sep="|", header=False)
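The split above draws one uniform random number per row and keeps the row for training when the draw is at most 0.85, so the 85/15 ratio holds only approximately. A small NumPy sketch of the same masking idea follows. The seeded generator and the row count are assumptions added here for reproducibility and are not in the tutorial code.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # assumption: seeded so the sketch is repeatable
num_rows = 10_000  # hypothetical number of sequences

mask = rng.random(num_rows) <= 0.85
train_idx = np.flatnonzero(mask)   # row positions that go to training
test_idx = np.flatnonzero(~mask)   # the complement goes to testing

train_fraction = train_idx.size / num_rows
print(f"train fraction: {train_fraction:.3f}")
```

Because the mask is drawn per sequence, overlapping windows from the same user can land on both sides of the split. A per-user or time-based split is an alternative when that matters.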

Define metadata

CSV_HEADER = list(ratings_data_transformed.columns)

CATEGORICAL_FEATURES_WITH_VOCABULARY = {
    "user_id": list(users.user_id.unique()),
    "movie_id": list(movies.movie_id.unique()),
    "sex": list(users.sex.unique()),
    "age_group": list(users.age_group.unique()),
    "occupation": list(users.occupation.unique()),
}

USER_FEATURES = ["sex", "age_group", "occupation"]

MOVIE_FEATURES = ["genres"]

Create a tf.data.Dataset for training and evaluation

def get_dataset_from_csv(csv_file_path, shuffle=False, batch_size=128):
    def process(features):
        movie_ids_string = features["sequence_movie_ids"]
        sequence_movie_ids = tf.strings.split(movie_ids_string, ",").to_tensor()

        # The last movie id in the sequence is the target movie.
        features["target_movie_id"] = sequence_movie_ids[:, -1]
        features["sequence_movie_ids"] = sequence_movie_ids[:, :-1]

        ratings_string = features["sequence_ratings"]
        sequence_ratings = tf.strings.to_number(
            tf.strings.split(ratings_string, ","), tf.dtypes.float32
        ).to_tensor()

        # The last rating in the sequence is the target for the model to predict.
        target = sequence_ratings[:, -1]
        features["sequence_ratings"] = sequence_ratings[:, :-1]

        return features, target

    dataset = tf.data.experimental.make_csv_dataset(
        csv_file_path,
        batch_size=batch_size,
        column_names=CSV_HEADER,
        num_epochs=1,
        header=False,
        field_delim="|",
        shuffle=shuffle,
    ).map(process)

    return dataset
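Inside process, each stored window is split so that its last element becomes the prediction target and the remaining elements become the input sequence. For a single record, the same split looks like this in plain Python. The helper name `split_sequence_and_target` and the sample strings are illustrative assumptions.

```python
def split_sequence_and_target(sequence_movie_ids, sequence_ratings):
    """Mimic, for one record, the split that process() applies to a batch."""
    movie_ids = sequence_movie_ids.split(",")
    ratings = [float(value) for value in sequence_ratings.split(",")]
    features = {
        "sequence_movie_ids": movie_ids[:-1],  # history fed to the model
        "target_movie_id": movie_ids[-1],      # movie whose rating is predicted
        "sequence_ratings": ratings[:-1],      # ratings of the history only
    }
    target = ratings[-1]                       # label: rating of the target movie
    return features, target


features, target = split_sequence_and_target(
    "movie_1,movie_2,movie_3,movie_4", "5.0,3.0,4.0,2.0"
)
print(features)
print(target)  # 2.0
```

With sequence_length set to 4, the model therefore sees three rated movies plus the target movie id, and never sees the target's rating as an input.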

Create model inputs

def create_model_inputs():
    return {
        "user_id": keras.Input(name="user_id", shape=(1,), dtype="string"),
        "sequence_movie_ids": keras.Input(
            name="sequence_movie_ids", shape=(sequence_length - 1,), dtype="string"
        ),
        "target_movie_id": keras.Input(
            name="target_movie_id", shape=(1,), dtype="string"
        ),
        "sequence_ratings": keras.Input(
            name="sequence_ratings", shape=(sequence_length - 1,), dtype=tf.float32
        ),
        "sex": keras.Input(name="sex", shape=(1,), dtype="string"),
        "age_group": keras.Input(name="age_group", shape=(1,), dtype="string"),
        "occupation": keras.Input(name="occupation", shape=(1,), dtype="string"),
    }

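Encode input features

The encode_input_features method works as follows:

Each categorical user feature is encoded using layers.Embedding, with an embedding dimension equal to the square root of the feature's vocabulary size.

Each movie in the movie sequence, as well as the target movie, is encoded using layers.Embedding, with an embedding dimension equal to the square root of the number of movies.

A multi-hot genres vector for each movie is concatenated with its embedding vector and processed by a non-linear layers.Dense to output a vector with the same dimensions as the movie embedding.

A positional embedding is added to each movie embedding in the sequence, and the result is multiplied by the movie's rating from the ratings sequence.

The target movie embedding is concatenated with the sequence movie embeddings, producing a tensor of shape [batch size, sequence length, embedding size], as expected by the attention layer of the transformer architecture.

The method returns a tuple of two elements: encoded_transformer_features and encoded_other_features.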
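The arithmetic in these steps can be sketched with NumPy, using random arrays in place of the learned embedding tables. This is only an illustration of the shapes and operations, under assumed sizes (4000 movies, a batch of 2). It leaves out the genre vectors and the user features, and it is not the Keras implementation.

```python
import math

import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes for illustration; the tutorial derives them from the data.
num_movies = 4000
sequence_length = 4
batch_size = 2
embedding_dims = int(math.sqrt(num_movies))  # square-root rule gives 63

# Random stand-ins for the learned movie and position embedding tables.
movie_table = rng.normal(size=(num_movies, embedding_dims))
position_table = rng.normal(size=(sequence_length, embedding_dims))

# Integer indices, as StringLookup would produce, plus ratings in 1..5.
sequence_idx = rng.integers(0, num_movies, size=(batch_size, sequence_length - 1))
target_idx = rng.integers(0, num_movies, size=(batch_size, 1))
sequence_ratings = rng.integers(1, 6, size=(batch_size, sequence_length - 1)).astype(float)

encoded_sequence = movie_table[sequence_idx]                # (2, 3, 63)
encoded_positions = position_table[: sequence_length - 1]   # (3, 63), broadcast over batch
weighted = (encoded_sequence + encoded_positions) * sequence_ratings[..., None]

encoded_target = movie_table[target_idx]                    # (2, 1, 63)
transformer_features = np.concatenate([weighted, encoded_target], axis=1)
print(transformer_features.shape)  # (2, 4, 63)
```

Note that the target movie is appended without a positional embedding or a rating weight, since its rating is the quantity being predicted.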

def encode_input_features(
    inputs,
    include_user_id=True,
    include_user_features=True,
    include_movie_features=True,
):
    encoded_transformer_features = []
    encoded_other_features = []

    other_feature_names = []
    if include_user_id:
        other_feature_names.append("user_id")
    if include_user_features:
        other_feature_names.extend(USER_FEATURES)

    ## Encode user features
    for feature_name in other_feature_names:
        # Convert the string input values into integer indices.
        vocabulary = CATEGORICAL_FEATURES_WITH_VOCABULARY[feature_name]
        idx = StringLookup(vocabulary=vocabulary, mask_token=None, num_oov_indices=0)(
            inputs[feature_name]
        )
        # Compute embedding dimensions
        embedding_dims = int(math.sqrt(len(vocabulary)))
        # Create an embedding layer with the specified dimensions.
        embedding_encoder = layers.Embedding(
            input_dim=len(vocabulary),
            output_dim=embedding_dims,
            name=f"{feature_name}_embedding",
        )
        # Convert the index values to embedding representations.
        encoded_other_features.append(embedding_encoder(idx))

    ## Create a single embedding vector for the user features
    if len(encoded_other_features) > 1:
        encoded_other_features = layers.concatenate(encoded_other_features)
    elif len(encoded_other_features) == 1:
        encoded_other_features = encoded_other_features[0]
    else:
        encoded_other_features = None

    ## Create a movie embedding encoder
    movie_vocabulary = CATEGORICAL_FEATURES_WITH_VOCABULARY["movie_id"]
    movie_embedding_dims = int(math.sqrt(len(movie_vocabulary)))
    # Create a lookup to convert string values to integer indices.
    movie_index_lookup = StringLookup(
        vocabulary=movie_vocabulary,
        mask_token=None,
        num_oov_indices=0,
        name="movie_index_lookup",
    )
    # Create an embedding layer with the specified dimensions.
    movie_embedding_encoder = layers.Embedding(
        input_dim=len(movie_vocabulary),
        output_dim=movie_embedding_dims,
        name="movie_embedding",
    )
    # Create a vector lookup for movie genres.
    genre_vectors = movies[genres].to_numpy()
    movie_genres_lookup = layers.Embedding(
        input_dim=genre_vectors.shape[0],
        output_dim=genre_vectors.shape[1],
        embeddings_initializer=keras.initializers.Constant(genre_vectors),
        trainable=False,
        name="genres_vector",
    )
    # Create a processing layer for genres.
    movie_embedding_processor = layers.Dense(
        units=movie_embedding_dims,
        activation="relu",
        name="process_movie_embedding_with_genres",
    )

    ## Define a function to encode a given movie id.
    def encode_movie(movie_id):
        # Convert the string input values into integer indices.
        movie_idx = movie_index_lookup(movie_id)
        movie_embedding = movie_embedding_encoder(movie_idx)
        encoded_movie = movie_embedding
        if include_movie_features:
            movie_genres_vector = movie_genres_lookup(movie_idx)
            encoded_movie = movie_embedding_processor(
                layers.concatenate([movie_embedding, movie_genres_vector])
            )
        return encoded_movie

    ## Encoding target_movie_id
    target_movie_id = inputs["target_movie_id"]
    encoded_target_movie = encode_movie(target_movie_id)

    ## Encoding sequence movie_ids.
    sequence_movies_ids = inputs["sequence_movie_ids"]
    encoded_sequence_movies = encode_movie(sequence_movies_ids)
    # Create positional embedding.
    position_embedding_encoder = layers.Embedding(
        input_dim=sequence_length,
        output_dim=movie_embedding_dims,
        name="position_embedding",
    )
    positions = tf.range(start=0, limit=sequence_length - 1, delta=1)
    encoded_positions = position_embedding_encoder(positions)
    # Retrieve sequence ratings to incorporate them into the encoding of the movie.
    sequence_ratings = inputs["sequence_ratings"]
    sequence_ratings = keras.ops.expand_dims(sequence_ratings, -1)
    # Add the positional encoding to the movie encodings and multiply them by rating.
    encoded_sequence_movies_with_position_and_rating = layers.Multiply()(
        [(encoded_sequence_movies + encoded_positions), sequence_ratings]
    )

    # Construct the transformer inputs.
    for i in range(sequence_length - 1):
        feature = encoded_sequence_movies_with_position_and_rating[:, i, ...]
        feature = keras.ops.expand_dims(feature, 1)
        encoded_transformer_features.append(feature)
    encoded_transformer_features.append(encoded_target_movie)

    encoded_transformer_features = layers.concatenate(
        encoded_transformer_features, axis=1
    )

    return encoded_transformer_features, encoded_other_features

Create the BST model

include_user_id = False
include_user_features = False
include_movie_features = False

hidden_units = [256, 128]
dropout_rate = 0.1
num_heads = 3


def create_model():
    inputs = create_model_inputs()
    transformer_features, other_features = encode_input_features(
        inputs, include_user_id, include_user_features, include_movie_features
    )

    # Create a multi-headed attention layer.
    attention_output = layers.MultiHeadAttention(
        num_heads=num_heads, key_dim=transformer_features.shape[2], dropout=dropout_rate
    )(transformer_features, transformer_features)

    # Transformer block.
    attention_output = layers.Dropout(dropout_rate)(attention_output)
    x1 = layers.Add()([transformer_features, attention_output])
    x1 = layers.LayerNormalization()(x1)
    x2 = layers.LeakyReLU()(x1)
    x2 = layers.Dense(units=x2.shape[-1])(x2)
    x2 = layers.Dropout(dropout_rate)(x2)
    transformer_features = layers.Add()([x1, x2])
    transformer_features = layers.LayerNormalization()(transformer_features)
    features = layers.Flatten()(transformer_features)

    # Included the other features.
    if other_features is not None:
        features = layers.concatenate(
            [features, layers.Reshape([other_features.shape[-1]])(other_features)]
        )

    # Fully-connected layers.
    for num_units in hidden_units:
        features = layers.Dense(num_units)(features)
        features = layers.BatchNormalization()(features)
        features = layers.LeakyReLU()(features)
        features = layers.Dropout(dropout_rate)(features)

    outputs = layers.Dense(units=1)(features)
    model = keras.Model(inputs=inputs, outputs=outputs)
    return model


model = create_model()
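For intuition about the first half of the transformer block in create_model, here is a heavily simplified NumPy sketch: single-head scaled dot-product self-attention with identity projections, followed by the residual connection, layer normalization, and flattening. It is a teaching aid under stated assumptions, not a reimplementation of layers.MultiHeadAttention. It omits the learned query/key/value projections, multiple heads, dropout, the learned scale and offset of layer normalization, and the feed-forward sub-layer. The tensor sizes and `eps` value are made up.

```python
import numpy as np


def layer_norm(x, eps=1e-6):
    """Normalize the last axis to zero mean and unit variance (no learned scale/offset)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def self_attention(x):
    """Single-head scaled dot-product attention where queries = keys = values = x."""
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(x.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over the sequence
    return weights @ x, weights


rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4, 8))  # hypothetical [batch, sequence, embedding] tensor

attended, weights = self_attention(x)
x1 = layer_norm(x + attended)        # residual connection, then normalization
flat = x1.reshape(x1.shape[0], -1)   # what Flatten does before the dense layers
print(flat.shape)  # (2, 32)
```

Each row of `weights` says how much one position in the sequence draws from every other position, which is how the target movie's representation gets mixed with the user's rated history.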

Run the training and evaluation experiment

# Compile the model.
model.compile(
    optimizer=keras.optimizers.Adagrad(learning_rate=0.01),
    loss=keras.losses.MeanSquaredError(),
    metrics=[keras.metrics.MeanAbsoluteError()],
)

# Read the training data.
train_dataset = get_dataset_from_csv("train_data.csv", shuffle=True, batch_size=265)

# Fit the model with the training data.
model.fit(train_dataset, epochs=5)

# Read the test data.
test_dataset = get_dataset_from_csv("test_data.csv", batch_size=265)

# Evaluate the model on the test data.
# evaluate() returns the loss followed by the compiled metric, which is MAE here.
_, mae = model.evaluate(test_dataset, verbose=0)
print(f"Test MAE: {round(mae, 3)}")
Epoch 1/5
1600/1600 ━━━━━━━━━━━━━━━━━━━━ 19s 11ms/step - loss: 1.5762 - mean_absolute_error: 0.9892
Epoch 2/5
1600/1600 ━━━━━━━━━━━━━━━━━━━━ 17s 11ms/step - loss: 1.1263 - mean_absolute_error: 0.8502
Epoch 3/5
1600/1600 ━━━━━━━━━━━━━━━━━━━━ 17s 11ms/step - loss: 1.0885 - mean_absolute_error: 0.8361
Epoch 4/5
1600/1600 ━━━━━━━━━━━━━━━━━━━━ 17s 11ms/step - loss: 1.0943 - mean_absolute_error: 0.8388
Epoch 5/5
1600/1600 ━━━━━━━━━━━━━━━━━━━━ 17s 10ms/step - loss: 1.0360 - mean_absolute_error: 0.8142
Test MAE: 0.782

You should achieve a mean absolute error (MAE) at or around 0.7 on the test data; the run logged above reached 0.782.
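The model is trained on mean squared error but reported on mean absolute error, and the two can differ noticeably because squaring weights large misses more heavily. A tiny plain-Python sketch on made-up ratings and predictions shows the difference. The numbers are hypothetical and unrelated to the run above.

```python
def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [5.0, 3.0, 4.0, 1.0]  # hypothetical true ratings
y_pred = [4.0, 3.5, 4.0, 3.0]  # hypothetical model predictions

mse = mean_squared_error(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
print(f"MSE: {mse}, MAE: {mae}")  # MSE: 1.3125, MAE: 0.875
```

An MAE of 0.875 here means the predictions are off by a little under one star on average, which is the same way to read the test MAE reported by the tutorial.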

