
news 2025/7/7 5:52:59


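🚩🚩🚩 Hugging Face Hands-On Series: Table of Contents

If you have any questions, feel free to leave a comment below.
All code in this article was run in PyCharm.
The code resources accompanying this article have been uploaded.

Building Your Own GPT from Scratch, Part 1: Data Preprocessing
Building Your Own GPT from Scratch, Part 2: Model Training 1
Building Your Own GPT from Scratch, Part 3: Model Training 2
Building Your Own GPT from Scratch, Part 4: Model Training 3
Building Your Own GPT from Scratch, Part 5: Model Deployment 1
Building Your Own GPT from Scratch, Part 6: Model Deployment 2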

1 Frontend Environment Setup

Install:

pip install streamlit

Test:

streamlit hello

Once installation is complete, running the test prints the following:

(Pytorch) C:\Users\admin>streamlit hello
Welcome to Streamlit. Check out our demo in your browser.
Local URL: http://localhost:8501
Network URL: http://192.168.1.187:8501
Ready to create your own Python apps super quickly?
Head over to https://docs.streamlit.io
May you create awesome apps!

The demo page then opens automatically in the browser.

2 Model Loading Function

This function loads the model and puts it into inference mode.

# Imports assumed here; the article does not show them
from transformers import CpmTokenizer, GPT2LMHeadModel

def get_model(device, model_path):
    tokenizer = CpmTokenizer(vocab_file="vocab/chinese_vocab.model")
    eod_id = tokenizer.convert_tokens_to_ids("<eod>")  # end-of-document token
    sep_id = tokenizer.sep_token_id
    unk_id = tokenizer.unk_token_id
    model = GPT2LMHeadModel.from_pretrained(model_path)
    model.to(device)
    model.eval()
    return tokenizer, model, eod_id, sep_id, unk_id

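The model loading function takes the device (cuda) and the path to the trained model
Load the tokenizer vocabulary file
ID of the end-of-document special token
ID of the separator special token
ID of the unknown-word special token
Load the model
Move the model to the GPU
Switch to inference mode
Return the tokenizer, the model, and the three special-token IDs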

import os
import torch

device_ids = 0
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = str(device_ids)
device = torch.device("cuda" if torch.cuda.is_available() and int(device_ids) >= 0 else "cpu")
tokenizer, model, eod_id, sep_id, unk_id = get_model(device, "model/zuowen_epoch40")

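Select the first GPU
Setting CUDA_DEVICE_ORDER to PCI_BUS_ID makes CUDA device numbering follow PCI bus order, so device numbers are consistent and predictable
Setting CUDA_VISIBLE_DEVICES to str(device_ids) ('0' in this case) restricts the process to seeing and using only GPU 0
Use the GPU as the device if one is available, otherwise fall back to the CPU
Call get_model to load the model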
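
The device-selection step above can be sketched as a small helper. This is only an illustration under stated assumptions: the name select_device and the fallback to "cpu" when torch is not installed are additions for this example, not part of the original script. One detail worth knowing is that CUDA_VISIBLE_DEVICES only takes effect if it is set before CUDA is initialized in the process, so the sketch sets it before importing torch.

```python
import os


def select_device(device_id: int) -> str:
    """Hypothetical helper: restrict the process to one GPU and pick a device name."""
    # Number GPUs by PCI bus position so the index is stable across runs
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    # Must be set before CUDA is initialized, hence before the torch import below
    os.environ["CUDA_VISIBLE_DEVICES"] = str(device_id)
    try:
        import torch
    except ImportError:
        # Assumption for this sketch: without torch we simply report the CPU
        return "cpu"
    return "cuda" if device_id >= 0 and torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(select_device(0))
```

The returned string can be passed to torch.device(...) and then to get_model as in the code above.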

3 Text Generation Function

Given the preceding text, generate the next token.

import torch
import torch.nn.functional as F

# model, unk_id, and top_k_top_p_filtering are assumed to be defined elsewhere in the script
def generate_next_token(input_ids, args):
    input_ids = input_ids[:, -200:]
    outputs = model(input_ids=input_ids)
    logits = outputs.logits
    next_token_logits = logits[0, -1, :]
    next_token_logits = next_token_logits / args.temperature
    next_token_logits[unk_id] = -float('Inf')
    filtered_logits = top_k_top_p_filtering(next_token_logits, top_k=args.top_k, top_p=args.top_p)
    next_token_id = torch.multinomial(F.softmax(filtered_logits, dim=-1), num_samples=1)
    return next_token_id

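Truncate the input to its last 200 tokens, which caps the input length
Run the model on the input to get its output; prediction proceeds one token at a time
Take the prediction scores (logits) from the output
next_token_logits holds the prediction_scores computed from the last token's hidden_state, i.e. the model's unnormalized scores for the next token (they become probabilities only after softmax)
Dividing by the temperature controls how diverse the generated text is: values above 1 flatten the distribution, values below 1 sharpen it
Set the score of the unknown-word token to negative infinity so it can never be sampled, which keeps abnormal output from appearing
Filter the scores with top_k_top_p_filtering()
Convert the filtered scores to probabilities with softmax and sample the actual token ID
Return the result

Every next token is predicted this way.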
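
The article calls top_k_top_p_filtering() without showing it. To make the temperature, filtering, and sampling steps above concrete, here is a minimal pure-Python sketch over a plain list of logits. It is not the function used in this series: the names softmax, filter_top_k_top_p, and sample_next_token are hypothetical, and the nucleus rule used here (keep the highest-ranked tokens until their cumulative probability first exceeds top_p) is one common convention, assumed for illustration.

```python
import math
import random


def softmax(logits):
    """Turn logits into probabilities; a logit of -inf gets probability 0."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def filter_top_k_top_p(logits, top_k=0, top_p=1.0):
    """Return a copy of logits with tokens outside top-k / top-p set to -inf."""
    filtered = list(logits)
    # Token indices ordered from highest to lowest logit
    order = sorted(range(len(filtered)), key=lambda i: filtered[i], reverse=True)
    if top_k > 0:
        for i in order[top_k:]:
            filtered[i] = -math.inf
    if top_p < 1.0:
        probs = softmax(filtered)
        cumulative = 0.0
        for i in order:
            # Drop a token once the tokens ranked above it already exceed top_p;
            # the highest-ranked token is therefore always kept
            if cumulative > top_p:
                filtered[i] = -math.inf
            cumulative += probs[i]
    return filtered


def sample_next_token(logits, temperature=1.0, top_k=0, top_p=1.0,
                      unk_id=None, rng=random):
    """Temperature-scale, mask the unknown token, filter, then sample an index."""
    scaled = [x / temperature for x in logits]
    if unk_id is not None:
        scaled[unk_id] = -math.inf  # the unknown token can never be sampled
    probs = softmax(filter_top_k_top_p(scaled, top_k, top_p))
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    toy_logits = [2.0, 1.0, 0.5, -1.0]
    print(filter_top_k_top_p(toy_logits, top_k=2))
    print(sample_next_token(toy_logits, temperature=0.8, top_k=3, top_p=0.9))
```

With top_k=1 the sketch always returns the highest-scoring token, which is a quick way to check that masking and filtering behave as expected before turning sampling back on.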

4 Multi-Token Text Generation Function

At this point we no longer predict just the next token; we keep predicting, one token after another.

def predict_one_sample(model, tokenizer, device, args, title, context):
    title_ids = tokenizer.encode(title, add_special_tokens=False)
    context_ids = tokenizer.encode(context, add_special_tokens=False)
    input_ids = title_ids + [sep_id] + context_ids
    cur_len = len(input_ids)
    last_token_id = input_ids[-1]
    input_ids = torch.tensor([input_ids], dtype=torch.long, device=device)
    while True:
        next_token_id = generate_next_token(input_ids, args)
        input_ids = torch.cat((input_ids, next_token_id.unsqueeze(0)), dim=1)
        cur_len += 1
        word = tokenizer.convert_ids_to_tokens(next_token_id.item())
        # last_token_id is never updated inside the loop, so this check
        # always compares against the last token of the prompt
        if cur_len >= args.generate_max_len and last_token_id == 8 and next_token_id == 3:
            break
        if cur_len >= args.generate_max_len and word in [".", "。", "！", "!", "?", "？", ",", "，"]:
            break
        if next_token_id == eod_id:
            break
    result = tokenizer.decode(input_ids.squeeze(0))
    content = result.split("<sep>")[1]  # the final generated content
    return content

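Function that generates text for one sample
Convert the title entered by the user to token IDs
Convert the body text entered by the user to token IDs
Join the title and the body text, with the separator token between them
Get the input length
Get the last element of the content so far
Convert the input data to a tensor
while loop
Generate the next token ID with the generation function
Append the new token ID to the existing data (if there were 5 tokens and the 6th is predicted, the 6th is concatenated onto the original 5)
Increase the input length by 1
Convert the token ID back to its text token
If the maximum length has been reached and a newline is generated
Stop generating
If the maximum length has been reached and a punctuation mark is generated
Stop generating
If the end-of-document token is generated
Stop generating
Convert the token IDs to text
Split the generated text on the separator
Return the generated content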
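
The stopping logic described above can be tried without a model. The following is a minimal sketch, not the article's code: generate_until_stop, next_token_fn, and id_to_token are hypothetical names, the toy vocabulary and scripted "model" are made up, and hard_limit is an extra safety cap added for this example (the original loop has no hard upper bound). It shows that max_len is a soft limit: once it is reached, generation continues until the next punctuation mark so the text does not end mid-sentence.

```python
STOP_TOKENS = (".", "。", "！", "!", "?", "？", ",", "，")


def generate_until_stop(prompt_ids, next_token_fn, id_to_token, eod_id,
                        max_len, hard_limit=None):
    """Append tokens until end-of-document, or punctuation once max_len is reached."""
    ids = list(prompt_ids)
    while True:
        next_id = next_token_fn(ids)
        ids.append(next_id)
        # The end-of-document token always stops generation
        if next_id == eod_id:
            break
        # Past the soft limit, stop at the first punctuation mark
        if len(ids) >= max_len and id_to_token(next_id) in STOP_TOKENS:
            break
        # Safety cap (an assumption of this sketch) so the loop always ends
        if hard_limit is not None and len(ids) >= hard_limit:
            break
    return ids


if __name__ == "__main__":
    vocab = ["<eod>", "今", "天", "。"]           # toy vocabulary
    script = iter([1, 2, 3, 1, 2, 3, 0])         # scripted stand-in for the model
    ids = generate_until_stop(
        prompt_ids=[1],
        next_token_fn=lambda _ids: next(script),
        id_to_token=lambda i: vocab[i],
        eod_id=0,
        max_len=5,
    )
    print(ids)  # [1, 1, 2, 3, 1, 2, 3]
```

In the toy run the first "。" arrives at length 4, below max_len, so generation continues and stops at the second "。" at length 7.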
