Scrapy overview
Scrapy is a fast, high-level screen-scraping and web-crawling framework written in Python, used to crawl websites and extract structured data from their pages. It has a wide range of uses, including data mining, monitoring, and automated testing.
Installing Scrapy
pip install scrapy -i https://pypi.tuna.tsinghua.edu.cn/simple
An older version installed at first raised builtins.AttributeError: module 'OpenSSL.SSL' has no attribute 'SSLv3_METHOD'; upgrading to the latest version, 2.10.0, fixed the problem.
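A quick way to confirm which version actually got installed:

scrapy version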
Using Scrapy
Creating a project and its structure
Create the project
scrapy startproject <project-name>
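The command generates a directory skeleton roughly like the following (minor details may vary between Scrapy releases):

project_name/
    scrapy.cfg            # deploy configuration
    project_name/
        __init__.py
        items.py          # item definitions (data models)
        middlewares.py    # spider and downloader middlewares
        pipelines.py      # item pipelines
        settings.py       # project settings
        spiders/
            __init__.py   # spider files are created in this package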
Custom spider classes
Create the spider file
scrapy genspider <spider-name> <web-address>
scrapy genspider MyTestSpider www.baidu.com
Normally you should not include the http:// protocol in the address: genspider derives start_urls from allowed_domains, so if you pass a full URL with the protocol you will have to fix start_urls by hand.
import scrapy

class MytestSpider(scrapy.Spider):
    # the spider's name, the value used when running the crawl
    name = 'MyTestSpider'
    # domains the spider is allowed to visit
    allowed_domains = ['www.baidu.com']
    # the starting URLs, i.e. the first addresses to be requested
    start_urls = ['http://www.baidu.com/']

    def parse(self, response):
        pass
Attributes and methods of the Scrapy response
response.text                          the response body decoded as a string
response.body                          the raw response body as bytes
response.xpath(...)                    run an XPath expression directly against the response content; returns a SelectorList
response.xpath(...).extract()          extract the data values of the matched Selector objects as a list of strings
response.xpath(...).extract_first()    extract the data value of the first Selector in the list (or None if nothing matched)
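A minimal sketch of how these fit together inside a spider's parse method (www.baidu.com is reused from the example above; the XPath expressions are only illustrative):

import scrapy

class DemoSpider(scrapy.Spider):
    name = 'demo'
    start_urls = ['https://www.baidu.com/']

    def parse(self, response):
        html = response.text   # the full page as a decoded string
        raw = response.body    # the same payload as raw bytes
        # xpath() returns a SelectorList; extract_first() gives the first
        # matched value (or None), extract() gives a list of all values
        title = response.xpath('//title/text()').extract_first()
        links = response.xpath('//a/@href').extract()
        self.logger.info('title=%s, %d links', title, len(links))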
Starting the spider
scrapy crawl <spider-name>
scrapy crawl MyTestSpider
How Scrapy works
1. The engine asks the spiders for URLs to crawl.
2. The engine hands those URLs to the scheduler.
3. The scheduler wraps each URL in a request object, places it in its queue, and dispatches requests from the queue.
4. The engine passes each request to the downloader.
5. The downloader sends the request and fetches the data from the internet.
6. The internet returns the data to the downloader.
7. The downloader returns the data to the engine.
8. The engine hands the data to the spiders.
9. The spiders parse the data and return the results to the engine; any follow-up request goes back to the scheduler (see the sketch after this list).
10. The engine passes the extracted data to the pipelines.
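Most of this machinery never appears in user code: a spider only supplies start URLs (steps 1 and 2), parses responses (step 9), and yields items (step 10). A minimal sketch of how those three points look in code (quotes.toscrape.com is just a stand-in target, not from the original article):

import scrapy

class FlowSpider(scrapy.Spider):
    name = 'flow'
    # steps 1-2: handed to the engine, then queued by the scheduler
    start_urls = ['https://quotes.toscrape.com/']

    def parse(self, response):
        # step 8: the engine delivers the downloaded response here
        for href in response.xpath('//a/@href').extract():
            # step 9: a new request travels back through the engine to the scheduler
            yield scrapy.Request(response.urljoin(href), callback=self.parse)
        # step 10: a yielded item is routed by the engine to the pipelines
        yield {'url': response.url}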
Scrapy spider example
Create the project
scrapy startproject movie
Create the spider
scrapy genspider mv https://www.dytt8.net/html/gndy/china/index.html
import scrapy

class MvSpider(scrapy.Spider):
    name = "mv"
    allowed_domains = ["www.dytt8.net"]
    start_urls = ["https://www.dytt8.net/html/gndy/china/index.html"]

    def parse(self, response):
        pass
items.py
# Define here the models for your scraped items
#
# See documentation in:
# https://docs.scrapy.org/en/latest/topics/items.html
import scrapy


class MovieItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    name = scrapy.Field()
    src = scrapy.Field()
Write the pipeline
# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://docs.scrapy.org/en/latest/topics/item-pipeline.html

# useful for handling different item types with a single interface
from itemadapter import ItemAdapter


class MoviePipeline:
    # runs once, before the spider starts
    def open_spider(self, spider):
        self.fp = open('movie.json', 'w', encoding='utf-8')

    def process_item(self, item, spider):
        self.fp.write(str(item))
        return item

    # runs once, after the spider finishes
    def close_spider(self, spider):
        self.fp.close()
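One caveat: self.fp.write(str(item)) writes Python reprs back to back, so movie.json will not actually contain valid JSON. If proper JSON lines are wanted, one possible variant (a sketch, reusing the ItemAdapter already imported above) is:

import json
from itemadapter import ItemAdapter

class MoviePipeline:
    def open_spider(self, spider):
        self.fp = open('movie.json', 'w', encoding='utf-8')

    def process_item(self, item, spider):
        # asdict() converts the Item to a plain dict; write one JSON object per line
        self.fp.write(json.dumps(ItemAdapter(item).asdict(), ensure_ascii=False) + '\n')
        return item

    def close_spider(self, spider):
        self.fp.close()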
Enable the pipeline in settings.py
BOT_NAME = "movie"

SPIDER_MODULES = ["movie.spiders"]
NEWSPIDER_MODULE = "movie.spiders"

ROBOTSTXT_OBEY = True

ITEM_PIPELINES = {
    "movie.pipelines.MoviePipeline": 300,
}

REQUEST_FINGERPRINTER_IMPLEMENTATION = "2.7"
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
FEED_EXPORT_ENCODING = "utf-8"
Write the spider
import scrapy
from movie.items import MovieItem


class MvSpider(scrapy.Spider):
    name = "mv"
    allowed_domains = ["www.dytt8.net"]
    start_urls = ["https://www.dytt8.net/html/gndy/china/index.html"]

    def parse(self, response):
        a_list = response.xpath('//div[@class="co_content8"]//td[2]//a[2]')
        for a in a_list:
            name = a.xpath('./text()').extract_first()
            href = a.xpath('./@href').extract_first()
            # the detail (second) page lives at this address
            url = 'https://www.dytt8.net' + href
            # request the detail page, carrying the movie name along in meta
            yield scrapy.Request(url=url, callback=self.parse_second, meta={'name': name})

    def parse_second(self, response):
        src = response.xpath('//div[@id="Zoom"]//img/@src').extract_first()
        # read back the meta value attached to the request
        name = response.meta['name']
        movie = MovieItem(src=src, name=name)
        # hand the item over to the pipeline
        yield movie
Run and check the results
Change into the project's spider directory and run scrapy crawl mv
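Since the settings above already configure FEED_EXPORT_ENCODING, Scrapy's built-in feed exports are an alternative to the hand-written pipeline; something like the following should write well-formed JSON without any pipeline code (the output file name is arbitrary):

scrapy crawl mv -O movies.json

The -O flag (capital O) overwrites the output file and has existed since Scrapy 2.0; older releases only support -o, which appends.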