

news 2025/7/14 19:30:51


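The Yelp dataset is a good dataset for studying B2C businesses. Identifying potential hot businesses is a multi-dimensional analysis that involves user behavior, business characteristics, community structure, and other factors. From the Yelp dataset we can mine the following information to help identify hot businesses.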

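User ratings and review analysis

Average rating: A business's average rating is an important indicator of its popularity. A higher average rating usually means higher customer satisfaction, which makes the business more likely to become popular.
Review count: The number of reviews reflects how active a business is and how engaged its users are. Businesses with many reviews are more likely to attract wide attention.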
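On its own, the average rating is noisy for businesses with only a few reviews: a single 5-star review yields a perfect mean. One common way to combine both indicators is a damped (Bayesian) average, which pulls the mean toward a global prior until enough reviews accumulate. This is a sketch that is not part of the original pipeline; `damped_rating`, `prior_mean=3.5` and `prior_weight=10` are illustrative assumptions (in practice `prior_mean` might be the dataset-wide mean star rating).

```python
def damped_rating(ratings, prior_mean=3.5, prior_weight=10):
    """Shrink a business's mean rating toward a global prior.

    prior_mean / prior_weight are assumed values: the prior acts like
    `prior_weight` extra reviews that all have the rating `prior_mean`.
    """
    n = len(ratings)
    return (prior_weight * prior_mean + sum(ratings)) / (prior_weight + n)


print(round(damped_rating([5]), 2))        # one 5-star review -> 3.64
print(round(damped_rating([5] * 100), 2))  # one hundred 5-star reviews -> 4.86
```

With more reviews the score approaches the raw mean, so a business needs both a high rating and a reasonable review count to rank near the top.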

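User activity

User rating behavior: Analyzing the ratings that active users (users who rate frequently) give to businesses shows which businesses are more popular within the user base.
User influence: Some users' ratings have a large influence on other users' choices (for example, social media influencers). Identifying the ratings these high-influence users give to businesses helps identify potential hot businesses.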
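One simple way to let active users count for more is to weight each review by its author's activity. The sketch below assumes activity is measured by the `review_count` field of `yelp_academic_dataset_user.json`; the function name and the `log1p` weighting are illustrative assumptions, chosen so that very prolific users do not dominate.

```python
import math


def activity_weighted_rating(reviews, user_review_counts):
    """Average star rating where each review is weighted by its author's activity.

    reviews: list of (user_id, stars) pairs for one business.
    user_review_counts: dict user_id -> review_count.
    Weight = log1p(review_count) (an assumed choice); users missing from the
    dict get weight 0 and are effectively ignored.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for user_id, stars in reviews:
        weight = math.log1p(user_review_counts.get(user_id, 0))
        weighted_sum += weight * stars
        total_weight += weight
    # None when no review carries any weight
    return weighted_sum / total_weight if total_weight > 0 else None


counts = {"a": 1000, "b": 0}
print(activity_weighted_rating([("a", 5), ("b", 1)], counts))  # dominated by the active user "a"
```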

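Social network analysis

User-business relationship network: Use algorithms such as graph neural networks to analyze the relationships between users and businesses. A business that interacts with many users, especially users who are influential in the network, may be considered a hot business.
Community detection: By analyzing the user-business relationship network, identify groups of similar users, and then identify the businesses that are popular within these groups.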
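As a much simpler stand-in for community detection (not a proper community-detection algorithm, and not what the GNN below does), one can project the user-business graph onto businesses: link two businesses when enough users reviewed both, then take connected components. `business_groups` and `min_shared_users` are hypothetical names for this sketch; an algorithm such as Louvain would normally replace the connected-components step.

```python
from collections import defaultdict
from itertools import combinations


def business_groups(reviews, min_shared_users=2):
    """Group businesses that share reviewers.

    reviews: iterable of (user_id, business_id) pairs.
    Two businesses are linked if at least `min_shared_users` users reviewed
    both; groups are the connected components of that co-review graph.
    """
    businesses_by_user = defaultdict(set)
    for user_id, business_id in reviews:
        businesses_by_user[user_id].add(business_id)

    # Count shared reviewers for every business pair
    shared = defaultdict(int)
    for biz in businesses_by_user.values():
        for a, b in combinations(sorted(biz), 2):
            shared[(a, b)] += 1

    # Union-find over businesses
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for biz in businesses_by_user.values():
        for b in biz:
            find(b)  # register every business, even isolated ones
    for (a, b), count in shared.items():
        if count >= min_shared_users:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for b in list(parent):
        groups[find(b)].add(b)
    return sorted(groups.values(), key=len, reverse=True)
```

The number of pairs grows quadratically with how many businesses a single user has reviewed, so on the full review file one would likely need to cap or sample very prolific users (an assumption, not something the article measures).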

Multi-dimensional evaluation

Composite evaluation: Combine several indicators (such as rating, review count, user activity, and location) using a weighted method or a multi-criteria decision model to assess a business's overall popularity.
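A minimal sketch of the weighted method: min-max normalize each indicator to [0, 1] so that stars and counts are on the same scale, then take a weighted sum. The indicator names, the weights and the function name `composite_scores` are illustrative assumptions; min-max scaling is one choice among several (z-scores or ranks would also work).

```python
def composite_scores(businesses, weights):
    """Weighted sum of min-max normalized indicators.

    businesses: dict business_id -> dict of indicator values.
    weights: dict indicator -> weight (assumed values; summing to 1 keeps scores in [0, 1]).
    """
    scores = {b: 0.0 for b in businesses}
    for indicator, weight in weights.items():
        values = [feats[indicator] for feats in businesses.values()]
        lo, hi = min(values), max(values)
        span = hi - lo
        for b, feats in businesses.items():
            # A constant indicator carries no information, so it contributes 0
            normalized = (feats[indicator] - lo) / span if span else 0.0
            scores[b] += weight * normalized
    return scores


example = {
    "A": {"rating": 4.5, "reviews": 300},
    "B": {"rating": 4.8, "reviews": 20},
    "C": {"rating": 3.0, "reviews": 50},
}
scores = composite_scores(example, {"rating": 0.6, "reviews": 0.4})
print(sorted(scores, key=scores.get, reverse=True))  # ['A', 'B', 'C']
```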

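Files used

yelp_academic_dataset_business.json:
- Contains basic business information such as business ID, name, categories, and location.
yelp_academic_dataset_review.json:
- Contains users' reviews and ratings of businesses; used to analyze business popularity and user behavior.
yelp_academic_dataset_user.json:
- Contains basic user information such as user ID, registration date, and review count; used to analyze user activity and influence.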
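All three files are JSON Lines (one JSON object per line), which is why the code below parses them line by line. Because the review file is by far the largest, it can help to keep only the needed fields while parsing. A stdlib sketch; `parse_json_lines` is a hypothetical helper, not part of the article's code:

```python
import json


def parse_json_lines(lines, fields):
    """Yield one dict per non-empty JSON line, keeping only `fields`.

    `lines` can be any iterable of strings, e.g. an open file object.
    Fields missing from a record come back as None.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        yield {field: record.get(field) for field in fields}


# Usage on the review file, keeping only the columns the pipeline needs:
# with open('yelp_academic_dataset_review.json', 'r', encoding='utf-8') as f:
#     rows = list(parse_json_lines(f, ['user_id', 'business_id', 'stars']))
```

pandas offers a similar streaming option through `pd.read_json(path, lines=True, chunksize=...)`, which returns an iterator of DataFrames.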

Identifying business influence with a graph neural network (GNN):

First, import the required libraries and read the data files:

```python
import json
import pandas as pd

# Read the data (each file has one JSON object per line)
with open('yelp_academic_dataset_business.json', 'r') as f:
    businesses = pd.DataFrame([json.loads(line) for line in f])
with open('yelp_academic_dataset_review.json', 'r') as f:
    reviews = pd.DataFrame([json.loads(line) for line in f])
with open('yelp_academic_dataset_user.json', 'r') as f:
    users = pd.DataFrame([json.loads(line) for line in f])
```

Clean the data to extract the useful information:

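```python
# Keep only the business and user columns we need
# (.copy() avoids pandas' SettingWithCopyWarning when we modify businesses below)
businesses = businesses[['business_id', 'name', 'categories', 'city', 'state', 'review_count', 'stars']].copy()
reviews = reviews[['user_id', 'business_id', 'stars']]
users = users[['user_id', 'review_count', 'average_stars']]

# Process categories: keep only the first listed category.
# .str[0] yields NaN for businesses without categories instead of raising a TypeError.
businesses['categories'] = businesses['categories'].str.split(', ').str[0]
```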

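Build a graph of businesses and users: the nodes are businesses and users, and each edge links a user to a business they rated (the rating value itself is not used as an edge weight here).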

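```python
import torch


def build_graph(merged_data, reviews):
    business_ids = merged_data['business_id'].unique()
    user_ids = reviews['user_id'].unique()
    # User nodes take indices starting from 0
    node_mapping = {user_id: i for i, user_id in enumerate(user_ids)}
    # Business node indices continue after the user indices
    business_start_idx = len(user_ids)
    node_mapping.update({business_id: i + business_start_idx
                         for i, business_id in enumerate(business_ids)})
    total_nodes = len(user_ids) + len(business_ids)

    # One edge user -> business per review
    edges = []
    for _, row in reviews.iterrows():
        if row['user_id'] in node_mapping and row['business_id'] in node_mapping:
            edges.append([node_mapping[row['user_id']], node_mapping[row['business_id']]])
    # edge_index has shape [2, num_edges]
    edge_index = torch.tensor(edges, dtype=torch.long).t().contiguous()
    return node_mapping, edge_index, total_nodes
```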
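In PyTorch Geometric, messages flow from `edge_index[0]` (source) to `edge_index[1]` (target) by default, so with only user -> business edges the user nodes never receive information from businesses. A common fix is to add the reverse edges, which is what `torch_geometric.utils.to_undirected(edge_index)` does. The stdlib sketch below shows the idea on a toy review list; `bipartite_edges` is a hypothetical helper, and unlike the article's `build_graph` it keeps separate user and business mappings and drops duplicate edges.

```python
def bipartite_edges(reviews, undirected=True):
    """Build node mappings and an edge list for a user-business graph.

    reviews: iterable of (user_id, business_id) pairs.
    Users get indices 0..U-1 and businesses U..U+B-1 (the same layout as build_graph).
    Returns (user_mapping, business_mapping, [sources, targets]).
    """
    reviews = list(reviews)
    users = list(dict.fromkeys(u for u, _ in reviews))       # first-seen order
    businesses = list(dict.fromkeys(b for _, b in reviews))
    user_mapping = {u: i for i, u in enumerate(users)}
    offset = len(users)
    business_mapping = {b: offset + i for i, b in enumerate(businesses)}

    edges = set()  # a set removes duplicate (user, business) reviews
    for u, b in reviews:
        src, dst = user_mapping[u], business_mapping[b]
        edges.add((src, dst))
        if undirected:
            edges.add((dst, src))  # reverse edge: business -> user
    ordered = sorted(edges)
    return user_mapping, business_mapping, [[s for s, _ in ordered], [t for _, t in ordered]]
```

The `[sources, targets]` list can be passed to `torch.tensor(..., dtype=torch.long)` to obtain an `edge_index` of shape `[2, num_edges]`.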

We can measure a business's influence with:

Mean user rating: indicates how popular the business is.
Review count: a direct indicator of the business's influence.

```python
# Aggregate ratings per business: mean and count
business_reviews = reviews.groupby('business_id').agg({'stars': ['mean', 'count']}).reset_index()
business_reviews.columns = ['business_id', 'average_rating', 'review_count']

# Merge business information with review statistics
merged_data = businesses.merge(business_reviews, on='business_id', how='left')

# 3. Define the target variable
# Criteria for a popular business
merged_data['is_popular'] = ((merged_data['average_rating'] > 4.0) &
                             (merged_data['review_count'] > 10)).astype(int)
```

To analyze business influence further, build and train a GNN model. Below is a basic GNN model using PyTorch Geometric:

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv


class GNNModel(torch.nn.Module):
    def __init__(self, num_node_features):
        super(GNNModel, self).__init__()
        self.conv1 = GCNConv(num_node_features, 64)
        self.conv2 = GCNConv(64, 32)
        self.conv3 = GCNConv(32, 16)
        self.fc = torch.nn.Linear(16, 1)
        self.dropout = torch.nn.Dropout(0.3)

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        x = self.dropout(x)
        x = F.relu(self.conv2(x, edge_index))
        x = self.dropout(x)
        x = F.relu(self.conv3(x, edge_index))
        # Raw logit per node; the sigmoid is applied by the loss or at prediction time
        x = self.fc(x)
        return x
```
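For reference, each `GCNConv` layer follows the graph-convolution rule documented in PyTorch Geometric (self-loops and a bias term are added by default). Writing $A$ for the adjacency matrix, $X$ for the node-feature matrix and ignoring dropout, `GNNModel` computes roughly the following; $\Theta^{(l)}, b^{(l)}$ are the layer weights and $w, c$ belong to the final `Linear` layer.

```latex
\hat{A} = A + I, \qquad \hat{D}_{ii} = \sum_{j} \hat{A}_{ij}

H^{(l+1)} = \mathrm{ReLU}\!\left(\hat{D}^{-1/2} \hat{A} \hat{D}^{-1/2} H^{(l)} \Theta^{(l)} + b^{(l)}\right),
\qquad l = 0, 1, 2, \qquad H^{(0)} = X

z = H^{(3)} w + c, \qquad \hat{y} = \sigma(z)
```

`forward` returns $z$; training uses `BCEWithLogitsLoss`, which applies $\sigma$ internally, and $\sigma$ is applied explicitly only when making predictions.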

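Use the model's output to identify potential hot businesses: score every business with the trained model and list those predicted as popular (probability > 0.5) that are not yet labeled popular.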

```python
# Runs inside main(), after training
print("Making predictions...")
model.eval()
with torch.no_grad():
    predictions = torch.sigmoid(model(data.x.to(device), data.edge_index.to(device))).cpu()

# Add the predictions to the DataFrame
merged_data['predicted_popularity'] = 0.0
for _, row in merged_data.iterrows():
    if row['business_id'] in node_mapping:
        idx = node_mapping[row['business_id']]
        merged_data.loc[row.name, 'predicted_popularity'] = predictions[idx].item()

# Potential hot businesses: predicted popular but not labeled popular
potential_hot = merged_data[(merged_data['predicted_popularity'] > 0.5) &
                            (merged_data['is_popular'] == 0)].sort_values('predicted_popularity', ascending=False)
print("\nPotential Hot Businesses:")
print(potential_hot[['name', 'average_rating', 'review_count', 'predicted_popularity']].head())
```
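Another way to spot candidates is to compare businesses in embedding space: businesses whose learned representation is close to that of known popular businesses. The sketch below assumes you already have one embedding vector per business, for example the 16-dimensional output of `conv3`; this is hypothetical, since `GNNModel.forward` as written returns only the final logit and would need to be changed to expose it. `similar_to_popular` is an illustrative helper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def similar_to_popular(embeddings, popular_ids, top_k=5):
    """Rank non-popular businesses by their highest cosine similarity to any popular one.

    embeddings: dict business_id -> list[float]
    popular_ids: set of business_ids labeled popular
    """
    popular = [embeddings[b] for b in popular_ids if b in embeddings]
    candidates = []
    for b, emb in embeddings.items():
        if b in popular_ids:
            continue
        score = max((cosine(emb, p) for p in popular), default=0.0)
        candidates.append((score, b))
    candidates.sort(reverse=True)
    return [(b, s) for s, b in candidates[:top_k]]
```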

Running the training with the pipeline defined above fails with an error:

```text
Traceback (most recent call last):
  File "/opt/miniconda3/envs/lora/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3805, in get_loc
    return self._engine.get_loc(casted_key)
  File "index.pyx", line 167, in pandas._libs.index.IndexEngine.get_loc
  File "index.pyx", line 196, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 7081, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 7089, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'review_count'
```

Add print('merged_data', merged_data) to inspect the merged DataFrame and run again:

```text
[150346 rows x 16 columns]
Index(['business_id', 'name', 'address', 'city', 'state', 'postal_code',
       'latitude', 'longitude', 'stars', 'review_count_x', 'is_open',
       'attributes', 'categories', 'hours', 'average_rating',
       'review_count_y'],
      dtype='object')
```

The review_count column has been renamed to review_count_x and review_count_y. This happens because both DataFrames contain a review_count column, so the merge adds suffixes to tell them apart: review_count_x comes from the left DataFrame (businesses) and review_count_y from the right DataFrame (business_reviews). To continue, either choose one of the two columns as the review count or drop the duplicate column before merging.
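The suffix behavior is easy to reproduce on toy data (the values below are made up for illustration). `DataFrame.merge` accepts a `suffixes` argument to name the clashing columns explicitly; alternatively, dropping the overlapping column first avoids the clash entirely.

```python
import pandas as pd

# Toy frames: both carry a review_count column
businesses = pd.DataFrame({"business_id": ["b1", "b2"], "review_count": [120, 8]})
business_reviews = pd.DataFrame({"business_id": ["b1", "b2"],
                                 "average_rating": [4.2, 3.9],
                                 "review_count": [118, 8]})

# Default suffixes: left column -> review_count_x, right column -> review_count_y
merged_default = businesses.merge(business_reviews, on="business_id", how="left")
print(list(merged_default.columns))

# Option 1: explicit, self-describing suffixes
merged_named = businesses.merge(business_reviews, on="business_id", how="left",
                                suffixes=("_business", "_reviews"))

# Option 2: drop the overlapping column before merging
merged_dropped = businesses.drop(columns="review_count").merge(
    business_reviews, on="business_id", how="left")
print(list(merged_dropped.columns))
```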

The revised code:

```python
import numpy as np
import pandas as pd
import torch
import torch.nn.functional as F
from torch_geometric.data import Data
from torch_geometric.nn import GCNConv
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split  # imported but not used below


# 1. Load data
def load_data():
    businesses = pd.read_json('yelp_academic_dataset_business.json', lines=True)
    reviews = pd.read_json('yelp_academic_dataset_review.json', lines=True)
    users = pd.read_json('yelp_academic_dataset_user.json', lines=True)
    return businesses, reviews, users


# 2. Preprocess data
def preprocess_data(businesses, reviews):
    # Aggregate review data per business
    business_reviews = reviews.groupby('business_id').agg({
        'stars': ['mean', 'count'],
        'useful': 'sum',
        'funny': 'sum',
        'cool': 'sum',
    }).reset_index()
    # Flatten the MultiIndex column names
    business_reviews.columns = ['business_id', 'average_rating', 'review_count',
                                'total_useful', 'total_funny', 'total_cool']
    # Drop businesses' own review_count column (if present) so the merge
    # does not produce review_count_x / review_count_y
    if 'review_count' in businesses.columns:
        businesses = businesses.drop('review_count', axis=1)
    # Merge business information
    merged_data = businesses.merge(business_reviews, on='business_id', how='left')
    # Fill missing values (e.g. businesses without reviews)
    merged_data = merged_data.fillna(0)
    return merged_data


# 3. Feature engineering
def engineer_features(merged_data):
    # Engagement: useful + funny + cool votes per review (+1 avoids division by zero)
    merged_data['engagement_score'] = (merged_data['total_useful'] +
                                       merged_data['total_funny'] +
                                       merged_data['total_cool']) / (merged_data['review_count'] + 1)
    # Popular: rating >= 4.0 and review count in the top quartile
    merged_data['is_popular'] = ((merged_data['average_rating'] >= 4.0) &
                                 (merged_data['review_count'] >= merged_data['review_count'].quantile(0.75))).astype(int)
    return merged_data


# 4. Build the graph
def build_graph(merged_data, reviews):
    business_ids = merged_data['business_id'].unique()
    user_ids = reviews['user_id'].unique()
    # User node indices start from 0
    node_mapping = {user_id: i for i, user_id in enumerate(user_ids)}
    # Business node indices continue after the user indices
    business_start_idx = len(user_ids)
    node_mapping.update({business_id: i + business_start_idx
                         for i, business_id in enumerate(business_ids)})
    total_nodes = len(user_ids) + len(business_ids)
    # One edge user -> business per review
    edges = []
    for _, row in reviews.iterrows():
        if row['user_id'] in node_mapping and row['business_id'] in node_mapping:
            edges.append([node_mapping[row['user_id']], node_mapping[row['business_id']]])
    edge_index = torch.tensor(edges, dtype=torch.long).t().contiguous()
    return node_mapping, edge_index, total_nodes


def prepare_node_features(merged_data, node_mapping, num_user_nodes, total_nodes):
    feature_cols = ['average_rating', 'review_count', 'engagement_score']
    # Make sure all feature columns are numeric
    for col in feature_cols:
        merged_data[col] = merged_data[col].astype(float)
    # Standardize features. This overwrites these columns of merged_data in place,
    # so later printouts of average_rating / review_count show z-scores.
    scaler = StandardScaler()
    merged_data[feature_cols] = scaler.fit_transform(merged_data[feature_cols])
    # Feature matrix covering all nodes
    num_features = len(feature_cols)
    x = torch.zeros(total_nodes, num_features, dtype=torch.float)
    # User nodes get the mean feature vector (close to 0 after standardization)
    mean_values = merged_data[feature_cols].mean().values.astype(np.float32)
    x[:num_user_nodes] = torch.tensor(mean_values, dtype=torch.float)
    # Business node features
    for _, row in merged_data.iterrows():
        if row['business_id'] in node_mapping:
            idx = node_mapping[row['business_id']]
            feature_values = row[feature_cols].values.astype(np.float32)
            if not np.isfinite(feature_values).all():
                print(f"Warning: invalid values found {feature_values}")
                feature_values = np.nan_to_num(feature_values, nan=0.0)
            x[idx] = torch.tensor(feature_values, dtype=torch.float)
    return x


def main():
    print("Starting the program...")
    # Select the device
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    print(f"Using device: {device}")

    # Load data
    print("Loading data...")
    businesses, reviews, users = load_data()

    # Preprocess data
    print("Preprocessing data...")
    merged_data = preprocess_data(businesses, reviews)
    merged_data = engineer_features(merged_data)

    # Build the graph
    print("Building graph...")
    node_mapping, edge_index, total_nodes = build_graph(merged_data, reviews)
    num_user_nodes = len(reviews['user_id'].unique())

    # Print node statistics
    print(f"Total nodes: {total_nodes}")
    print(f"User nodes: {num_user_nodes}")
    print(f"Business nodes: {total_nodes - num_user_nodes}")
    print(f"Max node index in mapping: {max(node_mapping.values())}")

    # Prepare features
    print("Preparing node features...")
    x = prepare_node_features(merged_data, node_mapping, num_user_nodes, total_nodes)

    # Prepare labels (only business nodes are labeled)
    print("Preparing labels...")
    labels = torch.zeros(total_nodes)
    business_mask = torch.zeros(total_nodes, dtype=torch.bool)
    for _, row in merged_data.iterrows():
        if row['business_id'] in node_mapping:
            idx = node_mapping[row['business_id']]
            labels[idx] = row['is_popular']
            business_mask[idx] = True

    # Create the graph data object
    data = Data(x=x, edge_index=edge_index)

    # Initialize the model
    print("Initializing model...")
    model = GNNModel(num_node_features=x.size(1)).to(device)

    # Train the model
    print("Training model...")
    train_model(model, data, labels, business_mask, device)

    # Predict
    print("Making predictions...")
    model.eval()
    with torch.no_grad():
        predictions = torch.sigmoid(model(data.x.to(device), data.edge_index.to(device))).cpu()

    # Add the predictions to the DataFrame
    merged_data['predicted_popularity'] = 0.0
    for _, row in merged_data.iterrows():
        if row['business_id'] in node_mapping:
            idx = node_mapping[row['business_id']]
            merged_data.loc[row.name, 'predicted_popularity'] = predictions[idx].item()

    # Output potential hot businesses
    potential_hot = merged_data[(merged_data['predicted_popularity'] > 0.5) &
                                (merged_data['is_popular'] == 0)].sort_values('predicted_popularity', ascending=False)
    print("\nPotential Hot Businesses:")
    print(potential_hot[['name', 'average_rating', 'review_count',
                         'predicted_popularity']].head())


# 6. GNN model definition
class GNNModel(torch.nn.Module):
    def __init__(self, num_node_features):
        super(GNNModel, self).__init__()
        self.conv1 = GCNConv(num_node_features, 64)
        self.conv2 = GCNConv(64, 32)
        self.conv3 = GCNConv(32, 16)
        self.fc = torch.nn.Linear(16, 1)
        self.dropout = torch.nn.Dropout(0.3)

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        x = self.dropout(x)
        x = F.relu(self.conv2(x, edge_index))
        x = self.dropout(x)
        x = F.relu(self.conv3(x, edge_index))
        x = self.fc(x)  # raw logit; BCEWithLogitsLoss applies the sigmoid
        return x


# 7. Training function
def train_model(model, data, labels, business_mask, device, epochs=100):
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)
    criterion = torch.nn.BCEWithLogitsLoss()
    # Move tensors to the device once (the mask must be on the same device as the output)
    x = data.x.to(device)
    edge_index = data.edge_index.to(device)
    business_mask = business_mask.to(device)
    targets = labels.to(device)[business_mask].unsqueeze(1)
    model.train()
    for epoch in range(epochs):
        optimizer.zero_grad()
        out = model(x, edge_index)
        # Loss only on business nodes
        loss = criterion(out[business_mask], targets)
        loss.backward()
        optimizer.step()
        print(f'Epoch [{epoch + 1}/{epochs}], Loss: {loss.item():.4f}')


if __name__ == "__main__":
    main()
```
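The revised program imports `train_test_split` but never uses it, so the model is trained and then scored on the very same businesses; the "potential hot businesses" it prints are businesses whose training label is 0 but which the model still scores as popular. A minimal sketch of holding out business nodes for validation, using only the standard library (`split_business_nodes`, the 20% fraction and the seed are illustrative assumptions; `train_test_split` would do the same job):

```python
import random


def split_business_nodes(business_indices, val_fraction=0.2, seed=42):
    """Split business node indices into disjoint train / validation lists."""
    indices = list(business_indices)
    rng = random.Random(seed)  # fixed seed for a reproducible split
    rng.shuffle(indices)
    n_val = int(len(indices) * val_fraction)
    return sorted(indices[n_val:]), sorted(indices[:n_val])


# Hypothetical usage with the node layout above (businesses follow the users):
# train_idx, val_idx = split_business_nodes(range(num_user_nodes, total_nodes))
```

The two index lists could then become boolean masks (for example `train_mask = torch.zeros(total_nodes, dtype=torch.bool)` followed by `train_mask[train_idx] = True`), with the loss computed on the training mask and accuracy reported on the validation mask. Because `average_rating` and `review_count` are both node features and the inputs to the `is_popular` rule, a high validation score would mostly show that the model recovers that rule.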

Start the actual training. As a first test, train for epochs=100; the loss moves toward convergence.

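Potential hot businesses identified. Note that average_rating and review_count in this output are standardized values (z-scores), not raw stars and counts, because prepare_node_features overwrites those columns of merged_data in place: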

```text
Potential Hot Businesses:
                                   name  average_rating  review_count  predicted_popularity
100024              Mother's Restaurant       -0.154731     41.821089              0.999941
31033                       Royal House        0.207003     40.953749              0.999933
113983             Pat's King of Steaks       -0.361171     34.103369              0.999805
64541   Felix's Restaurant & Oyster Bar        0.389155     32.023360              0.999725
42331                        Gumbo Shop        0.340872     31.517411              0.999701
```
