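The Yelp dataset is a good resource for studying B2C businesses. Identifying potentially popular businesses is a multidimensional analysis that involves user behavior, business attributes, and community structure. The following information, all of which can be mined from the Yelp dataset, helps identify popular businesses.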

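User ratings and review analysis

  • Mean rating: A business's average rating is an important indicator of its popularity. A higher average rating usually means higher customer satisfaction, which makes the business more likely to become popular.
  • Review count: The number of reviews reflects how active a business is and how engaged its users are. Businesses with many reviews are more likely to attract broad attention.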

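User activity

  • User rating behavior: Analyzing how active users (those who rate frequently) score each business shows which businesses are favored within the user base.
  • User influence: Some users' ratings strongly affect other users' choices (for example, social media influencers). Looking at how these high-influence users rate a business can help identify potentially popular businesses.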
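One way to make the activity idea concrete is to weight each rating by how active its author is. The sketch below is illustrative only: the function name `activity_weighted_rating`, the logarithmic weight, and the toy data are assumptions, not part of the Yelp dataset or the pipeline in this post.

```python
import math
from collections import defaultdict


def activity_weighted_rating(reviews, user_review_counts):
    """Average each business's stars, weighting every review by
    log(1 + the reviewer's total review count)."""
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for user_id, business_id, stars in reviews:
        # log1p keeps very prolific users from dominating completely
        w = math.log1p(user_review_counts.get(user_id, 0))
        weighted_sum[business_id] += w * stars
        weight_total[business_id] += w
    # Skip businesses whose reviewers all have zero weight
    return {
        b: weighted_sum[b] / weight_total[b]
        for b in weighted_sum
        if weight_total[b] > 0
    }


# Toy data: (user_id, business_id, stars)
reviews = [("u1", "b1", 5), ("u2", "b1", 1), ("u1", "b2", 4)]
user_review_counts = {"u1": 100, "u2": 1}
scores = activity_weighted_rating(reviews, user_review_counts)
```

In the toy data, business `b1` has a plain mean of 3.0, but the weighted score leans toward the 5-star rating because it comes from the far more active user.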

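Social network analysis

  • User-business relationship network: Use algorithms such as graph neural networks to analyze the relationships between users and businesses. A business that interacts with many users, and whose users have high influence in the network, can be regarded as popular.
  • Community detection: Analyze the user-business relationship network to find groups of similar users, then identify the businesses that are popular within those groups.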

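Multidimensional evaluation

  • Composite score: Combine several indicators (such as rating, review count, user activity, and geographic location) with a weighting scheme or a multi-criteria decision model to assess a business's overall popularity.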
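A weighted composite score can be sketched in a few lines. This is a minimal illustration, assuming min-max normalization and hand-picked weights; the metric names, weights, and sample businesses are hypothetical.

```python
def min_max(values):
    """Scale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def composite_scores(businesses, weights):
    """businesses: {name: {metric: value}}, weights: {metric: weight}."""
    names = list(businesses)
    scores = {name: 0.0 for name in names}
    for metric, weight in weights.items():
        normalized = min_max([businesses[name][metric] for name in names])
        for name, value in zip(names, normalized):
            scores[name] += weight * value
    return scores


# Hypothetical businesses and weights
businesses = {
    "A": {"rating": 4.5, "reviews": 200},
    "B": {"rating": 3.0, "reviews": 50},
    "C": {"rating": 5.0, "reviews": 5},
}
weights = {"rating": 0.6, "reviews": 0.4}
scores = composite_scores(businesses, weights)
best = max(scores, key=scores.get)
```

With these weights, business `A` ranks first: `C` has the top rating but almost no reviews, which is the kind of trade-off a composite score is meant to capture.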

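Files used

  1. yelp_academic_dataset_business.json:

    • Contains basic business information, such as business ID, name, categories, and location.
  2. yelp_academic_dataset_review.json:

    • Contains users' reviews and ratings of businesses, which can be used to analyze business popularity and user behavior.
  3. yelp_academic_dataset_user.json:

    • Contains basic user information, such as user ID, registration date, and review count, which can be used to analyze user activity and influence.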
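Each of these files stores one JSON object per line, and the review file in particular can be large. The following sketch shows one way to stream such a file and keep only the fields needed, rather than loading every column into memory. The helper name `iter_json_lines` and the in-memory sample are assumptions for illustration; with the real data, an open file handle would be passed instead of the `StringIO` object.

```python
import io
import json


def iter_json_lines(handle, keep_fields):
    """Yield one dict per non-empty line, restricted to keep_fields."""
    for line in handle:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        yield {key: record.get(key) for key in keep_fields}


# In-memory stand-in for a JSON Lines file
sample = io.StringIO(
    '{"business_id": "b1", "name": "Cafe", "stars": 4.5, "hours": null}\n'
    '{"business_id": "b2", "name": "Diner", "stars": 3.0, "hours": null}\n'
)
rows = list(iter_json_lines(sample, ["business_id", "stars"]))
```

The resulting list of small dicts can be handed to `pd.DataFrame(rows)` in the same way the loading code in this post builds its DataFrames.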

Identifying business influence with a graph neural network (GNN):

First, load the required libraries and read the data files:

import pandas as pd
import json

# Read the data (each file holds one JSON object per line)
with open('yelp_academic_dataset_business.json', 'r', encoding='utf-8') as f:
    businesses = pd.DataFrame([json.loads(line) for line in f])

with open('yelp_academic_dataset_review.json', 'r', encoding='utf-8') as f:
    reviews = pd.DataFrame([json.loads(line) for line in f])

with open('yelp_academic_dataset_user.json', 'r', encoding='utf-8') as f:
    users = pd.DataFrame([json.loads(line) for line in f])

Clean the data to extract the useful fields:

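# Keep only the business and user columns we need
# (.copy() avoids pandas' SettingWithCopyWarning on the assignment below)
businesses = businesses[['business_id', 'name', 'categories', 'city', 'state', 'review_count', 'stars']].copy()
reviews = reviews[['user_id', 'business_id', 'stars']]
users = users[['user_id', 'review_count', 'average_stars']]

# Process the category data: keep the first category only.
# Businesses without categories yield NaN from str.split, so check for a list explicitly.
businesses['categories'] = businesses['categories'].str.split(', ').apply(
    lambda x: x[0] if isinstance(x, list) and x else None
)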

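Build a graph between businesses and users. The nodes are businesses and users, and each edge is a user's rating of a business.

    # Excerpt from the body of build_graph; node_mapping and total_nodes
    # are defined earlier in that function, and torch must be imported.
    edges = []
    for _, row in reviews.iterrows():
        if row['user_id'] in node_mapping and row['business_id'] in node_mapping:
            edges.append([node_mapping[row['user_id']], node_mapping[row['business_id']]])
    edge_index = torch.tensor(edges, dtype=torch.long).t().contiguous()
    return node_mapping, edge_index, total_nodes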
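The edges built here run in one direction only, from user to business. Message-passing layers such as GCNConv propagate information along edge direction, so a common variation is to add the reverse edge as well (PyTorch Geometric also offers `torch_geometric.utils.to_undirected` for this). The sketch below shows the idea in plain Python; the function name `build_bidirectional_edges` and the toy pairs are assumptions, and the author's pipeline does not do this.

```python
def build_bidirectional_edges(review_pairs):
    """review_pairs: iterable of (user_id, business_id).

    Returns (node_mapping, [sources, targets]) where user nodes are
    numbered first and business nodes follow, as in build_graph.
    """
    user_ids = sorted({u for u, _ in review_pairs})
    business_ids = sorted({b for _, b in review_pairs})

    node_mapping = {u: i for i, u in enumerate(user_ids)}
    offset = len(user_ids)
    node_mapping.update({b: offset + i for i, b in enumerate(business_ids)})

    sources, targets = [], []
    for u, b in review_pairs:
        ui, bi = node_mapping[u], node_mapping[b]
        # Add both directions so messages flow user->business and back
        sources += [ui, bi]
        targets += [bi, ui]
    return node_mapping, [sources, targets]


pairs = [("u1", "b1"), ("u2", "b1"), ("u1", "b2")]
node_mapping, edge_lists = build_bidirectional_edges(pairs)
```

The two lists in `edge_lists` already have the `[2, num_edges]` layout, so `torch.tensor(edge_lists, dtype=torch.long)` would give an `edge_index` without the transpose step.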

We can measure a business's influence in the following ways:

  • Mean user rating: Indicates how well liked the business is.
  • Review count: Gives a direct indicator of the business's influence.

business_reviews = reviews.groupby('business_id').agg({
    'stars': ['mean', 'count']
}).reset_index()
business_reviews.columns = ['business_id', 'average_rating', 'review_count']

# Merge business info with review info
merged_data = businesses.merge(business_reviews, on='business_id', how='left')

# 3. Target variable definition
# Define the criteria for a popular business
merged_data['is_popular'] = (
    (merged_data['average_rating'] > 4.0) &
    (merged_data['review_count'] > 10)
).astype(int)

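To analyze business influence further, we can build and train a GNN model. Below is a basic GNN example using PyTorch Geometric:

import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv


class GNNModel(torch.nn.Module):
    def __init__(self, num_node_features):
        super(GNNModel, self).__init__()
        # Three graph convolution layers that shrink the hidden size step by step
        self.conv1 = GCNConv(num_node_features, 64)
        self.conv2 = GCNConv(64, 32)
        self.conv3 = GCNConv(32, 16)
        # Final linear layer outputs one logit per node
        self.fc = torch.nn.Linear(16, 1)
        self.dropout = torch.nn.Dropout(0.3)

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        x = self.dropout(x)
        x = F.relu(self.conv2(x, edge_index))
        x = self.dropout(x)
        x = F.relu(self.conv3(x, edge_index))
        x = self.fc(x)
        return x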
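For reference, each GCNConv layer in this model applies the graph convolution of Kipf and Welling. In the notation below (ours, not the post's), $A$ is the adjacency matrix built from `edge_index`, $X$ is the node feature matrix, and $\Theta$ is the layer's learnable weight matrix; the bias term is omitted, and the ReLU is applied separately in `forward`.

```latex
X' = \hat{D}^{-1/2}\,\hat{A}\,\hat{D}^{-1/2}\,X\,\Theta,
\qquad
\hat{A} = A + I,
\qquad
\hat{D}_{ii} = \sum_{j} \hat{A}_{ij}
```

The added identity matrix gives every node a self-loop, so a node's own features are kept alongside the normalized sum of its neighbors' features.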

Use the model's output (a per-node popularity probability after a sigmoid) to identify potentially popular businesses: those the model scores as popular even though they are not labeled popular yet.

print("Making predictions...")
model.eval()
with torch.no_grad():
    predictions = torch.sigmoid(model(data.x.to(device), data.edge_index.to(device))).cpu()

# Add the predictions to the DataFrame
merged_data['predicted_popularity'] = 0.0
for _, row in merged_data.iterrows():
    if row['business_id'] in node_mapping:
        idx = node_mapping[row['business_id']]
        merged_data.loc[row.name, 'predicted_popularity'] = predictions[idx].item()

# Output the potentially popular businesses
potential_hot = merged_data[
    (merged_data['predicted_popularity'] > 0.5) &
    (merged_data['is_popular'] == 0)
].sort_values('predicted_popularity', ascending=False)

print("\nPotential Hot Businesses:")
print(potential_hot[['name', 'average_rating', 'review_count', 'predicted_popularity']].head())

Running the training with the pipeline defined above raised an error:

Traceback (most recent call last):
  File "/opt/miniconda3/envs/lora/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3805, in get_loc
    return self._engine.get_loc(casted_key)
  File "index.pyx", line 167, in pandas._libs.index.IndexEngine.get_loc
  File "index.pyx", line 196, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 7081, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 7089, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'review_count'

Add print('merged_data', merged_data) and try again:

[150346 rows x 16 columns]
Index(['business_id', 'name', 'address', 'city', 'state', 'postal_code',
       'latitude', 'longitude', 'stars', 'review_count_x', 'is_open',
       'attributes', 'categories', 'hours', 'average_rating',
       'review_count_y'],
      dtype='object')

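The review_count column was renamed to review_count_x and review_count_y. This happens because both DataFrames contain a review_count column at merge time, so pandas appends its default suffixes: _x for the left DataFrame and _y for the right. To continue, we need to choose which column to use as the review count. Here review_count_x comes from the businesses DataFrame, and review_count_y comes from the business_reviews DataFrame.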
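The suffix behavior is easy to reproduce on a toy example. The sketch below uses two small hypothetical DataFrames and shows both the default result and the `suffixes` parameter of `DataFrame.merge`, which lets the two columns carry more descriptive names; the suffix names `_yelp` and `_computed` are our own choice.

```python
import pandas as pd

# Both frames carry a review_count column, as in the failing pipeline
businesses = pd.DataFrame({
    "business_id": ["b1", "b2"],
    "review_count": [10, 3],
})
business_reviews = pd.DataFrame({
    "business_id": ["b1", "b2"],
    "average_rating": [4.5, 3.0],
    "review_count": [9, 3],
})

# Default merge: the clashing column gets _x (left) and _y (right)
default = businesses.merge(business_reviews, on="business_id", how="left")

# Explicit suffixes make the origin of each column obvious
named = businesses.merge(
    business_reviews,
    on="business_id",
    how="left",
    suffixes=("_yelp", "_computed"),
)
```

After the default merge there is no plain `review_count` column any more, which is exactly what triggers the `KeyError` above.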

Here is the revised code:

import torch
import pandas as pd
import numpy as np
import torch.nn.functional as F
from torch_geometric.data import Data
from torch_geometric.nn import GCNConv
from sklearn.preprocessing import StandardScaler


# 1. Data loading
def load_data():
    businesses = pd.read_json('yelp_academic_dataset_business.json', lines=True)
    reviews = pd.read_json('yelp_academic_dataset_review.json', lines=True)
    users = pd.read_json('yelp_academic_dataset_user.json', lines=True)
    return businesses, reviews, users


# 2. Data preprocessing
def preprocess_data(businesses, reviews):
    # Aggregate the review data per business
    business_reviews = reviews.groupby('business_id').agg({
        'stars': ['mean', 'count'],
        'useful': 'sum',
        'funny': 'sum',
        'cool': 'sum',
    }).reset_index()

    # Flatten the column names
    business_reviews.columns = ['business_id', 'average_rating', 'review_count',
                                'total_useful', 'total_funny', 'total_cool']

    # Drop review_count from businesses (if present) so the merge
    # does not produce review_count_x / review_count_y
    if 'review_count' in businesses.columns:
        businesses = businesses.drop('review_count', axis=1)

    # Merge the business info
    merged_data = businesses.merge(business_reviews, on='business_id', how='left')

    # Fill missing values
    merged_data = merged_data.fillna(0)
    return merged_data


# 3. Feature engineering
def engineer_features(merged_data):
    # Engagement per review; add 1 to avoid division by zero
    merged_data['engagement_score'] = (
        merged_data['total_useful'] +
        merged_data['total_funny'] +
        merged_data['total_cool']
    ) / (merged_data['review_count'] + 1)

    # Define popular businesses: rating of at least 4.0 and
    # review count in the top quartile
    merged_data['is_popular'] = (
        (merged_data['average_rating'] >= 4.0) &
        (merged_data['review_count'] >= merged_data['review_count'].quantile(0.75))
    ).astype(int)
    return merged_data


# 4. Graph construction
def build_graph(merged_data, reviews):
    # Create the node mapping
    business_ids = merged_data['business_id'].unique()
    user_ids = reviews['user_id'].unique()

    # User nodes are indexed from 0
    node_mapping = {user_id: i for i, user_id in enumerate(user_ids)}

    # Business node indices continue after the user node indices
    business_start_idx = len(user_ids)
    node_mapping.update({business_id: i + business_start_idx
                         for i, business_id in enumerate(business_ids)})

    # Total number of nodes
    total_nodes = len(user_ids) + len(business_ids)

    # Create the edges (user -> business)
    edges = []
    for _, row in reviews.iterrows():
        if row['user_id'] in node_mapping and row['business_id'] in node_mapping:
            edges.append([node_mapping[row['user_id']], node_mapping[row['business_id']]])
    edge_index = torch.tensor(edges, dtype=torch.long).t().contiguous()
    return node_mapping, edge_index, total_nodes


def prepare_node_features(merged_data, node_mapping, num_user_nodes, total_nodes):
    feature_cols = ['average_rating', 'review_count', 'engagement_score']

    # Make sure all feature columns are numeric
    for col in feature_cols:
        merged_data[col] = merged_data[col].astype(float)

    # Standardize the features. Note: this overwrites the columns in merged_data,
    # so any later printout of them shows standardized values, not raw ones.
    scaler = StandardScaler()
    merged_data[feature_cols] = scaler.fit_transform(merged_data[feature_cols])

    # Create the feature matrix for all nodes
    num_features = len(feature_cols)
    x = torch.zeros(total_nodes, num_features, dtype=torch.float)

    # User node features (use the column means)
    mean_values = merged_data[feature_cols].mean().values.astype(np.float32)
    x[:num_user_nodes] = torch.tensor(mean_values, dtype=torch.float)

    # Business node features
    for _, row in merged_data.iterrows():
        if row['business_id'] in node_mapping:
            idx = node_mapping[row['business_id']]
            feature_values = row[feature_cols].values.astype(np.float32)
            if not np.isfinite(feature_values).all():
                print(f"Warning: found invalid values {feature_values}")
                # Replace NaN and infinities with 0
                feature_values = np.nan_to_num(feature_values, nan=0.0, posinf=0.0, neginf=0.0)
            x[idx] = torch.tensor(feature_values, dtype=torch.float)
    return x


def main():
    print("Starting the program...")

    # Select the device
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    print(f"Using device: {device}")

    # Load the data
    print("Loading data...")
    businesses, reviews, users = load_data()

    # Preprocess the data
    print("Preprocessing data...")
    merged_data = preprocess_data(businesses, reviews)
    merged_data = engineer_features(merged_data)

    # Build the graph
    print("Building graph...")
    node_mapping, edge_index, total_nodes = build_graph(merged_data, reviews)
    num_user_nodes = len(reviews['user_id'].unique())

    # Print node information
    print(f"Total nodes: {total_nodes}")
    print(f"User nodes: {num_user_nodes}")
    print(f"Business nodes: {total_nodes - num_user_nodes}")
    print(f"Max node index in mapping: {max(node_mapping.values())}")

    # Prepare the features
    print("Preparing node features...")
    x = prepare_node_features(merged_data, node_mapping, num_user_nodes, total_nodes)

    # Prepare the labels; only business nodes carry a label
    print("Preparing labels...")
    labels = torch.zeros(total_nodes)
    business_mask = torch.zeros(total_nodes, dtype=torch.bool)
    for _, row in merged_data.iterrows():
        if row['business_id'] in node_mapping:
            idx = node_mapping[row['business_id']]
            labels[idx] = row['is_popular']
            business_mask[idx] = True

    # Create the graph data object
    data = Data(x=x, edge_index=edge_index)

    # Initialize the model
    print("Initializing model...")
    model = GNNModel(num_node_features=x.size(1)).to(device)

    # Train the model
    print("Training model...")
    train_model(model, data, labels, business_mask, device)

    # Predict
    print("Making predictions...")
    model.eval()
    with torch.no_grad():
        predictions = torch.sigmoid(model(data.x.to(device), data.edge_index.to(device))).cpu()

    # Add the predictions to the DataFrame
    merged_data['predicted_popularity'] = 0.0
    for _, row in merged_data.iterrows():
        if row['business_id'] in node_mapping:
            idx = node_mapping[row['business_id']]
            merged_data.loc[row.name, 'predicted_popularity'] = predictions[idx].item()

    # Output the potentially popular businesses
    potential_hot = merged_data[
        (merged_data['predicted_popularity'] > 0.5) &
        (merged_data['is_popular'] == 0)
    ].sort_values('predicted_popularity', ascending=False)

    print("\nPotential Hot Businesses:")
    print(potential_hot[['name', 'average_rating', 'review_count', 'predicted_popularity']].head())


# 6. GNN model definition
class GNNModel(torch.nn.Module):
    def __init__(self, num_node_features):
        super(GNNModel, self).__init__()
        self.conv1 = GCNConv(num_node_features, 64)
        self.conv2 = GCNConv(64, 32)
        self.conv3 = GCNConv(32, 16)
        self.fc = torch.nn.Linear(16, 1)
        self.dropout = torch.nn.Dropout(0.3)

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        x = self.dropout(x)
        x = F.relu(self.conv2(x, edge_index))
        x = self.dropout(x)
        x = F.relu(self.conv3(x, edge_index))
        x = self.fc(x)
        return x


# 7. Training function
def train_model(model, data, labels, business_mask, device, epochs=100):
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)
    criterion = torch.nn.BCEWithLogitsLoss()

    # Move the tensors to the device once, outside the loop
    x = data.x.to(device)
    edge_index = data.edge_index.to(device)
    mask = business_mask.to(device)
    target = labels.unsqueeze(1).to(device)

    model.train()
    for epoch in range(epochs):
        optimizer.zero_grad()
        out = model(x, edge_index)
        # The loss is computed on business nodes only
        loss = criterion(out[mask], target[mask])
        loss.backward()
        optimizer.step()
        print(f'Epoch [{epoch + 1}/{epochs}], Loss: {loss.item():.4f}')


if __name__ == "__main__":
    main()

Now run the actual training, starting with 100 epochs as a test. The loss moves toward convergence.
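A falling training loss alone does not show how well the model generalizes, because the training function computes the loss on every business node. One possible follow-up, sketched below under our own assumptions, is to hold out a fraction of the business nodes and measure precision and recall on them. The helper names `split_indices` and `precision_recall`, the 20% split, and the toy numbers are all hypothetical and not part of the original script.

```python
import random


def split_indices(indices, val_fraction=0.2, seed=0):
    """Shuffle node indices and split them into (train, validation) lists."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


def precision_recall(labels, probabilities, threshold=0.5):
    """Precision and recall of the positive class at a given threshold."""
    tp = fp = fn = 0
    for y, p in zip(labels, probabilities):
        predicted = 1 if p > threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 1:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy usage: ten business node indices, and four held-out predictions
train_idx, val_idx = split_indices(range(10))
precision, recall = precision_recall([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1])
```

In the training loop, the validation indices would be excluded from the mask used for the loss and used only for evaluation after each epoch or at the end.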

The popular businesses identified are shown below. The average_rating and review_count columns contain standardized values rather than raw ones, because prepare_node_features overwrites them with the StandardScaler output.

Potential Hot Businesses:
                                   name  average_rating  review_count  predicted_popularity
100024              Mother's Restaurant       -0.154731     41.821089              0.999941
31033                       Royal House        0.207003     40.953749              0.999933
113983             Pat's King of Steaks       -0.361171     34.103369              0.999805
64541   Felix's Restaurant & Oyster Bar        0.389155     32.023360              0.999725
42331                        Gumbo Shop        0.340872     31.517411              0.999701
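The average_rating and review_count values in this printout are standardized scores, since prepare_node_features replaces those columns with StandardScaler output before the results are printed. To report values on the original scale, the transformation can be inverted. The sketch below shows the arithmetic in plain Python with hypothetical review counts; in the script itself, keeping a copy of the raw columns or calling the scaler's `inverse_transform` would serve the same purpose.

```python
import statistics


def standardize(values):
    """Return z-scores plus the mean and standard deviation used."""
    mean = statistics.fmean(values)
    # Population standard deviation, which is what StandardScaler uses
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values], mean, std


def destandardize(z_values, mean, std):
    """Invert standardize: z * std + mean."""
    return [z * std + mean for z in z_values]


raw_review_counts = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical values
z_scores, mean, std = standardize(raw_review_counts)
restored = destandardize(z_scores, mean, std)
```

A standardized review_count of about 41, as in the first row of the printout, means the business sits roughly 41 standard deviations above the mean review count, which is why it stands out so strongly.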
