

news 2025/7/9 17:27:32


This project is for learning purposes only.

1 Scrapy code

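The crawl logic is simple: pagination is handled through the URL, the links to the detail pages are collected from each list page, each detail page is then scraped, and the data is finally written to an Excel file.
In testing, the spider retrieved a total of 11,299 Chinese herbal medicine records.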
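The pagination and link-resolution steps described above can be tried in isolation, without Scrapy or network access. The following is a minimal sketch using only the standard library; the helper names `list_page_urls` and `detail_url` and the sample `href` are hypothetical, while the URL pattern and page range come from the spider itself. Scrapy's `response.urljoin` resolves relative links against the page URL in a similar way.

```python
from urllib.parse import urljoin

# URL pattern of the list pages, as used by the spider's start_urls.
LIST_URL_TEMPLATE = "https://www.zysj.com.cn/zhongyaocai/index__{page}.html"


def list_page_urls(first=1, last=26):
    """Return the list-page URLs for pages first..last (inclusive)."""
    return [LIST_URL_TEMPLATE.format(page=p) for p in range(first, last + 1)]


def detail_url(list_url, href):
    """Resolve a (possibly relative) href found on a list page."""
    return urljoin(list_url, href)


pages = list_page_urls()
# Hypothetical href, for illustration only; real values come from the page.
example = detail_url(pages[0], "/zhongyaocai/example/index.html")
```

Because the page range is fixed up front, the spider does not need to follow "next page" links; every list page is requested directly.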

import pandas as pd
import scrapy


class ZhongyaoSpider(scrapy.Spider):
    name = "zhongyao"
    # List pages 1 through 26
    start_urls = [f"https://www.zysj.com.cn/zhongyaocai/index__{i}.html" for i in range(1, 27)]

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Scraped items accumulate here and are written out in closed()
        self.data = []

    def parse(self, response):
        for li in response.css('div#list-content ul li'):
            a_tag = li.css('a')
            title = a_tag.css('::attr(title)').get()
            href = a_tag.css('::attr(href)').get()
            if title and href:
                # Build the absolute detail-page URL
                detail_url = response.urljoin(href)
                yield scrapy.Request(detail_url, callback=self.parse_detail, meta={'title': title})

    # Parse a detail page
    def parse_detail(self, response):
        title = response.meta['title']
        pinyin = response.css('div.item.pinyin_name_phonetic div.item-content::text').get(default='').strip()
        alias = response.css('div.item.alias div.item-content p::text').get(default='').strip()
        english_name = response.css('div.item.english_name div.item-content::text').get(default='').strip()
        # ASSUMPTION: class name 'source' follows the naming pattern of the other fields; verify against the page
        source = response.css('div.item.source div.item-content p::text').get(default='').strip()
        # Nature and flavor
        flavor = response.css('div.item.flavor div.item-content p::text').get(default='').strip()
        # ASSUMPTION: class name 'functional_indications' follows the same pattern; verify against the page
        functional_indications = response.css('div.item.functional_indications div.item-content p::text').get(default='').strip()
        usage = response.css('div.item.usage div.item-content p::text').get(default='').strip()
        excerpt = response.css('div.item.excerpt div.item-content::text').get(default='').strip()
        # Habitat (must be defined, since the item dict below uses it)
        habitat = response.css('div.item.habitat div.item-content p::text').get(default='').strip()
        # Provenance
        provenance = response.css('div.item.provenance div.item-content p::text').get(default='').strip()
        # Physical characteristics
        shape_properties = response.css('div.item.shape_properties div.item-content p::text').get(default='').strip()
        # Channel tropism
        attribution = response.css('div.item.attribution div.item-content p::text').get(default='').strip()
        # Original plant form
        prototype = response.css('div.item.prototype div.item-content p::text').get(default='').strip()
        # Commentary by noted physicians
        discuss = response.css('div.item.discuss div.item-content p::text').get(default='').strip()
        # Chemical composition
        chemical_composition = response.css('div.item.chemical_composition div.item-content p::text').get(default='').strip()

        item = {
            'title': title,
            'pinyin': pinyin,
            'alias': alias,
            'source': source,
            'english_name': english_name,
            'habitat': habitat,
            'flavor': flavor,
            'functional_indications': functional_indications,
            'usage': usage,
            'excerpt': excerpt,
            'provenance': provenance,
            'shape_properties': shape_properties,
            'attribution': attribution,
            'prototype': prototype,
            'discuss': discuss,
            'chemical_composition': chemical_composition,
        }
        self.data.append(item)
        yield item

    def closed(self, reason):
        # When the spider closes, save the data to an Excel file (writing .xlsx requires openpyxl)
        df = pd.DataFrame(self.data)
        df.to_excel('zhongyao_data.xlsx', index=False)
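Since almost every field in `parse_detail` is extracted with the same selector shape, one possible variation is to drive the extraction from a table. The sketch below is an assumption about how that could look, not the author's code: `FIELD_SPECS`, `css_for`, and `extract_fields` are hypothetical names, and the table lists only the fields whose class names appear unambiguously in the spider.

```python
# Item key -> (CSS class of the div.item block, whether the text is inside a <p>).
# Only fields whose selectors are spelled out in the spider are listed here.
FIELD_SPECS = {
    "pinyin": ("pinyin_name_phonetic", False),
    "alias": ("alias", True),
    "english_name": ("english_name", False),
    "flavor": ("flavor", True),
    "usage": ("usage", True),
    "excerpt": ("excerpt", False),
    "habitat": ("habitat", True),
    "provenance": ("provenance", True),
    "shape_properties": ("shape_properties", True),
    "attribution": ("attribution", True),
    "prototype": ("prototype", True),
    "discuss": ("discuss", True),
    "chemical_composition": ("chemical_composition", True),
}


def css_for(css_class, in_paragraph):
    """Build the CSS selector for one field's text node."""
    tail = " p::text" if in_paragraph else "::text"
    return f"div.item.{css_class} div.item-content{tail}"


def extract_fields(response):
    """Return {item key: stripped text}, with '' for missing fields.

    `response` is any object whose .css(query).get(default=...) behaves
    like a Scrapy response, so this also works with a test double.
    """
    return {
        key: response.css(css_for(css_class, in_p)).get(default="").strip()
        for key, (css_class, in_p) in FIELD_SPECS.items()
    }
```

Inside the spider, `parse_detail` could then build its item as `{'title': response.meta['title'], **extract_fields(response)}`. The trade-off is that a field with an unusual page structure needs its own entry or a special case.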

2 Crawl screenshot

[Image: screenshot of the crawl]

3 Screenshot of the scraped data

[Image: screenshot of the scraped data]
