news 2025/7/13 8:17:32

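Contents

Preface
1. The pdfplumber module
- 1.1 Features of pdfplumber
- 1.2 The pdfplumber.PDF class
- 1.3 The pdfplumber.Page class
2. Using pdfplumber
- 2.1 Loading a PDF
- 2.2 The pdfplumber.PDF class
- 2.3 The pdfplumber.Page class
- 2.4 Reading a PDF
- 2.5 Reading PDF document information
- 2.6 Getting the total page count
- 2.7 Reading the first page's width, height, and other information
- 2.8 Extracting text
- 2.9 Extracting tables
3. Examples
- 3.1 Extracting table data with pdfplumber
- Example
  - Extracting text
  - Extracting tables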

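Preface

PDF (Portable Document Format) is a document format designed to make documents easy to share across operating systems. The open-source Python library pdfplumber makes it fairly convenient to extract information from a PDF, including basic metadata (author, creation time, modification time, and so on) as well as tables, text, and images. That is generally enough for simple format-conversion tasks.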

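1. The pdfplumber module

1.1 Features of pdfplumber

1. Easy access to detailed information about every object in a PDF.
2. Higher-level, customizable methods for extracting text and tables.
3. Tightly integrated visual debugging.
4. Other useful utilities, such as filtering objects with a crop box.

pdfplumber has two basic classes, PDF and Page. PDF represents the whole document; Page represents a single page.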

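1.2 The pdfplumber.PDF class

.metadata: the PDF's basic information, returned as a dictionary.
.pages: a list of pdfplumber.Page instances, one for each page of the PDF.

1.3 The pdfplumber.Page class

This class is the core of pdfplumber. Most operations on a PDF go through it, including extracting text and tables and reading page dimensions.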

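2. Using pdfplumber

2.1 Loading a PDF

Call pdfplumber.open(x) to load a PDF, where x can be any of the following: a. the path to a PDF file; b. a file object, loaded as bytes; c. a file-like object, loaded as bytes.

Code to read a PDF: pdfplumber.open("path/filename.pdf", password="test", laparams={"line_overlap": 0.7})
Explanation:
password: to load a password-protected PDF, pass the password keyword argument.
laparams: to set layout-analysis parameters for pdfminer.six's layout engine, pass the laparams keyword argument.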
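The three input forms listed above can be sketched as follows. This is a minimal illustration, not the author's code: the helper names load_pdf_bytes and count_pages are hypothetical, and pdfplumber is imported inside count_pages only so that the byte-loading helper can be used on its own.

```python
import io


def load_pdf_bytes(path):
    """Read a file fully into memory and wrap it in a file-like object."""
    with open(path, "rb") as fh:
        return io.BytesIO(fh.read())


def count_pages(source, password=None):
    """Open `source` (a path, file object, or file-like object) and count pages.

    Hypothetical helper: pdfplumber.open() accepts all three input forms.
    """
    import pdfplumber  # third-party; imported lazily for this sketch

    with pdfplumber.open(source, password=password) as pdf:
        return len(pdf.pages)
```

With a real file, count_pages("path/filename.pdf") and count_pages(load_pdf_bytes("path/filename.pdf")) should report the same number of pages.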

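2.2 The pdfplumber.PDF class

The pdfplumber.PDF class represents a single PDF file and has two main attributes.

Attribute	Description
.metadata	A dictionary of metadata key/value pairs taken from the PDF's Info section. It typically includes "CreationDate", "ModDate", "Producer" (the producing application), and so on.
.pages	A list of pdfplumber.Page instances, one per page.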

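2.3 The pdfplumber.Page class

pdfplumber.Page is the core of pdfplumber; most operations revolve around this class.

Attribute	Description
.page_number	Page number
.width	Page width
.height	Page height
.objects / .chars / .lines / .rects / .curves / .images	Each of these attributes is a list, and each list contains one dictionary for every object of that kind embedded on the page.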
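To give a feel for these per-object dictionaries, the sketch below groups character dictionaries into lines of text by vertical position. It assumes each dictionary has "text", "x0", and "top" keys, as pdfplumber's char objects do. The helper name chars_to_lines, the tolerance value, and the sample data are made up for illustration, and the bucketing is deliberately simple.

```python
from collections import defaultdict


def chars_to_lines(chars, y_tolerance=3):
    """Group char dicts into lines of text, top to bottom, left to right."""
    bands = defaultdict(list)
    for ch in chars:
        # Characters whose "top" values fall into the same band share a line
        bands[round(ch["top"] / y_tolerance)].append(ch)
    lines = []
    for key in sorted(bands):
        row = sorted(bands[key], key=lambda c: c["x0"])
        lines.append("".join(c["text"] for c in row))
    return lines


# Made-up data shaped like entries of page.chars
sample_chars = [
    {"text": "i", "x0": 16.0, "top": 10.2},
    {"text": "H", "x0": 10.0, "top": 10.0},
    {"text": "K", "x0": 16.0, "top": 30.1},
    {"text": "O", "x0": 10.0, "top": 30.0},
]
print(chars_to_lines(sample_chars))  # ['Hi', 'OK']
```

In practice page.extract_text() already does this kind of grouping; the sketch only shows what the raw lists contain.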

2.4 Reading a PDF

import pdfplumber
import pandas as pd

with pdfplumber.open("ag-energy-round-up-2017-02-24.pdf") as pdf:
    pass  # work with pdf here; the snippets in the following sections use this handle

2.5 Reading PDF document information

with pdfplumber.open("ag-energy-round-up-2017-02-24.pdf") as pdf:
    print(pdf.metadata)

Result

{'Title': 'National Ag Energy', 'Author': 'LGMN, Des Moines, IA', 'Keywords': 'National Ag Energy ethanol biodiesel bioenergy', 'CreationDate': "D:20170224133144-06'00'", 'ModDate': "D:20170224133144-06'00'", 'Producer': 'Microsoft® Excel® 2013', 'Creator': 'Microsoft® Excel® 2013'}
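The CreationDate and ModDate values in the output above are raw PDF date strings rather than datetime objects. The sketch below shows one way to convert them with the standard library. It assumes the common D:YYYYMMDDHHmmSS layout with an optional timezone suffix; the function name parse_pdf_date is hypothetical, and real-world files may contain dates that do not follow this layout.

```python
import re
from datetime import datetime, timedelta, timezone

# D:YYYY[MM[DD[HH[mm[SS]]]]] followed by an optional Z or +HH'mm' / -HH'mm'
_PDF_DATE = re.compile(
    r"^D:(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:(Z)|([+-])(\d{2})'?(\d{2})?'?)?$"
)


def parse_pdf_date(value):
    """Convert a PDF date string into a datetime (timezone-aware if possible)."""
    match = _PDF_DATE.match(value.strip())
    if match is None:
        raise ValueError("unrecognized PDF date: %r" % value)
    groups = match.groups()
    defaults = (0, 1, 1, 0, 0, 0)  # missing month/day default to 1, times to 0
    parts = [int(g) if g else d for g, d in zip(groups[:6], defaults)]
    utc_flag, sign, tz_hours, tz_minutes = groups[6:]
    if utc_flag:
        tz = timezone.utc
    elif sign:
        offset = timedelta(hours=int(tz_hours), minutes=int(tz_minutes or 0))
        tz = timezone(offset if sign == "+" else -offset)
    else:
        tz = None
    return datetime(*parts, tzinfo=tz)


print(parse_pdf_date("D:20170224133144-06'00'"))  # 2017-02-24 13:31:44-06:00
```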

2.6 Getting the total page count

len(pdf.pages)

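2.7 Reading the first page's width, height, and other information

# Run inside the with block, while pdf is still open
first_page = pdf.pages[0]
# Page number
print('Page number:', first_page.page_number)
# Page width
print('Page width:', first_page.width)
# Page height
print('Page height:', first_page.height)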
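The width and height values are expressed in PDF units rather than physical lengths. Assuming the default PDF user-space unit of 1/72 inch (a file can override this, so treat it as an assumption), the sketch below converts them to millimetres. The helper names are hypothetical.

```python
POINTS_PER_INCH = 72.0  # default PDF user-space unit: 1 point = 1/72 inch
MM_PER_INCH = 25.4


def points_to_mm(points):
    """Convert a length in PDF points to millimetres."""
    return points / POINTS_PER_INCH * MM_PER_INCH


def page_size_mm(width, height, ndigits=1):
    """Return (width, height) in millimetres, rounded to ndigits decimals."""
    return (round(points_to_mm(width), ndigits),
            round(points_to_mm(height), ndigits))


# 612 x 792 points is US Letter (8.5 x 11 inches)
print(page_size_mm(612, 792))  # (215.9, 279.4)
```

With an open page this would be called as page_size_mm(first_page.width, first_page.height).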

2.8 Extracting text

with pdfplumber.open("繼（吊巖坪）110-2018-05（都吊東線2區）.pdf") as pdf:
    # pdfplumber.Page instance for the first page
    first_page = pdf.pages[0]
    text = first_page.extract_text()
    print(text)

2.9 Extracting tables

import pdfplumber
import pandas as pd

with pdfplumber.open("繼（吊巖坪）110-2018-05（都吊東線2區）.pdf") as pdf:
    first_page = pdf.pages[0]
    # Returns a list of rows, or None if no table is found on the page
    table_1 = first_page.extract_table()
    # table_df = pd.DataFrame(table_1[1:], columns=table_1[0])
    print(table_1)
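The commented-out DataFrame line above treats the first row of the table as the header. The same idea can be sketched without pandas by turning the nested list into a list of dictionaries. The function name table_to_records and the sample table are hypothetical; empty cells arrive as None and are mapped to empty strings here.

```python
def table_to_records(table):
    """Turn a [header, row, row, ...] nested list into a list of dicts."""
    if not table:
        return []  # also covers the None returned when no table is found
    # Fall back to col_0, col_1, ... for blank header cells
    header = [(name or "").strip() or "col_%d" % i
              for i, name in enumerate(table[0])]
    records = []
    for row in table[1:]:
        cells = [(cell or "").replace("\n", " ").strip() for cell in row]
        records.append(dict(zip(header, cells)))
    return records


# Made-up data shaped like the result of page.extract_table()
sample_table = [
    ["Name", "Qty"],
    ["Corn\nOil", "12"],
    ["Ethanol", None],
]
print(table_to_records(sample_table))
# [{'Name': 'Corn Oil', 'Qty': '12'}, {'Name': 'Ethanol', 'Qty': ''}]
```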

3. Examples

3.1 Extracting table data with pdfplumber

Table extraction mainly relies on two methods, extract_tables() and extract_table(), which behave differently.
extract_tables(): returns all tables found on the page as a nested list (table, then row, then cell).

with pdfplumber.open(r'繼（吊巖坪）110-2018-05（都吊東線2區）.pdf') as pdf_info:  # open the PDF
    page_one = pdf_info.pages[0]
    page_one_tables = page_one.extract_tables()  # all tables on the first page
    for table in page_one_tables:
        print('Table data on the first page:', table)

extract_table(): does not return every table; it returns only the largest table on the page. If several tables are the same size, it returns the one closest to the top of the page. Each row of the table is a separate list whose elements are the contents of the original table's cells.
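To make the difference between the two methods concrete, the sketch below picks a single table out of a nested list shaped like the result of extract_tables(). It is only an approximation and not pdfplumber's implementation: it ranks tables by the number of extracted cells and breaks ties by list order, whereas the library works on the detected table geometry. The names cell_count and pick_largest are hypothetical.

```python
def cell_count(table):
    """Total number of cells across all rows of one extracted table."""
    return sum(len(row) for row in table)


def pick_largest(tables):
    """Return the table with the most cells, or None if there are no tables.

    On a tie, max() keeps the first candidate in the list.
    """
    if not tables:
        return None
    return max(tables, key=cell_count)


# Made-up data shaped like the result of page.extract_tables()
sample_tables = [
    [["a", "b"]],
    [["a", "b"], ["c", "d"]],
    [["w", "x"], ["y", "z"]],
]
print(pick_largest(sample_tables))  # [['a', 'b'], ['c', 'd']]
```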

Example

# Extract table data from a PDF and save it to Excel
import os

import pdfplumber
from openpyxl import Workbook


class PDF(object):
    def __init__(self, file_path):
        self.pdf_path = file_path
        # Open the PDF; re-raise on failure so later methods never run without a handle
        try:
            self.pdf_info = pdfplumber.open(self.pdf_path)
            print('File opened successfully!')
        except Exception as e:
            print('Failed to open file:', e)
            raise

    # Print the PDF's basic information (a dict with author, creation time,
    # modification time, and so on) and the total page count
    def get_pdf(self):
        pdf_info = self.pdf_info.metadata
        pdf_page = len(self.pdf_info.pages)
        print('The PDF has %s pages' % pdf_page)
        print("Basic PDF information:\n", pdf_info)
        # The file stays open here because get_table() still needs it

    # Extract table data and save it to Excel
    def get_table(self):
        wb = Workbook()  # create a workbook
        ws = wb.active  # get the first sheet
        con = 0
        try:
            # extract_tables() returns table -> row -> cell: [[[row1], [row2]]]
            for page in self.pdf_info.pages:
                for table in page.extract_tables():
                    for row in table:
                        # Light cleanup of each cell: replace embedded newlines
                        row_list = [cell.replace('\n', ' ') if cell else '' for cell in row]
                        ws.append(row_list)  # write the row
                con += 1
                print('--------------- separator, page %s ---------------' % con)
        except Exception as e:
            print('Error:', e)
        finally:
            # Save the workbook in the same directory as the PDF
            out_dir = os.path.dirname(os.path.abspath(self.pdf_path))
            wb.save(os.path.join(out_dir, 'pdf_excel.xlsx'))
            print('Write complete!')
            self.close_pdf()

    # Close the file
    def close_pdf(self):
        self.pdf_info.close()


if __name__ == "__main__":
    file_path = input('Enter the PDF file path: ')
    pdf_info = PDF(file_path)
    pdf_info.get_pdf()  # print basic PDF information
    # Extract table data and save it to Excel, next to the PDF
    pdf_info.get_table()

import pdfplumber

text_path = r"D:\Project\MyData\Study\GUI\6_GUI編程（第三部分）\第十一章GUI圖形用戶界面編程.pdf"

with pdfplumber.open(text_path) as pdf:
    print(pdf.pages)  # all pages of the PDF; the type is list
    total_pages = len(pdf.pages)
    print("total_pages: ", total_pages)
    page = pdf.pages[0]  # first page
    print(type(page))  # <class 'pdfplumber.page.Page'>
    # print(page.extract_text())  # text of the first page

    # Read the full text of the PDF
    content = ""
    for i in range(0, len(pdf.pages)):
        # Guard against pages that yield no text
        content += pdf.pages[i].extract_text() or ""
    # print(content)

Extracting text

import pdfplumber

with pdfplumber.open("E:\\600aaa_2.pdf") as pdf:
    page_count = len(pdf.pages)
    print(page_count)  # number of pages
    for page in pdf.pages:
        print('---------- Page [%d] ----------' % page.page_number)
        # All text on the current page, including text inside tables
        print(page.extract_text())

Extracting tables

import re

import pdfplumber

with pdfplumber.open("E:\\600aaa_1.pdf") as pdf:
    page_count = len(pdf.pages)
    print(page_count)  # number of pages
    for page in pdf.pages:
        print('---------- Page [%d] ----------' % page.page_number)
        for pdf_table in page.extract_tables(table_settings={
            "vertical_strategy": "text",
            "horizontal_strategy": "lines",
            "intersection_tolerance": 20,  # tolerance when matching edge intersections into cells
        }):
            # print(pdf_table)
            for row in pdf_table:
                # Strip newlines and other whitespace from each cell
                print([re.sub(r'\s+', '', cell) if cell is not None else None for cell in row])
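Instead of printing the cleaned rows, they can be written out with the standard csv module. The sketch below is a minimal illustration with hypothetical names (clean_cell, tables_to_csv) and made-up data. It applies the same whitespace stripping as the loop above and writes empty cells as empty strings.

```python
import csv
import io
import re


def clean_cell(cell):
    """Remove all whitespace from a cell; None becomes an empty string."""
    return "" if cell is None else re.sub(r"\s+", "", cell)


def tables_to_csv(tables, out):
    """Write every row of every table to `out`; return the number of rows."""
    writer = csv.writer(out)
    count = 0
    for table in tables:
        for row in table:
            writer.writerow([clean_cell(cell) for cell in row])
            count += 1
    return count


# Made-up data shaped like the result of page.extract_tables()
buffer = io.StringIO()
rows_written = tables_to_csv([[["Item", "Price"], ["Corn\n oil", None]]], buffer)
print(rows_written)  # 2
print(buffer.getvalue())
```

When writing to a real file, open it with newline="" so the csv module controls the line endings.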
