

news 2025/7/7 2:08:16


Contents

Preface
Introduction to self-querying
Code implementation
Summary

Preface

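The common RAG retrieval flow today works like this: an embedding model turns the data into vectors that are stored in a vector database; the user's query is then vectorized as well, similar records are recalled from the vector database, assembled into a context template, and passed to the LLM to answer the question.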

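Converting the user's query directly into a vector query does not always give good results. Suppose we have stored product vectors in the vector database and the user says: "I want a black wool sweater that costs less than 20 yuan". With a plain embedding approach the query can lose information when it is vectorized, and it effectively becomes a search for "black wool sweater" with the price constraint dropped.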

For cases like this we can optimize the retrieval query itself to improve RAG results, and langchain's self-querying is a good way to do that. Here I try it with Alibaba Cloud's DashVector vector database and the DashScope LLM, and the optimized queries work quite well.

Most material online uses OpenAI's Embedding and LLM, but I personally think Alibaba's LLM and vector database are already very good. OpenAI has also blocked API calls from mainland China, and the domestic cloud services are cheap and easy to use, so why not give them a try? I have written several hands-on posts about DashVector and DashScope before; if you are interested, see:

LLM: text chunking (langchain) and vectorized storage (Alibaba Cloud DashVector), embedding with an LLM in practice
LLM: Alibaba Cloud DashVector + ModelScope multimodal vectorization, real-time text-to-image search in practice
LLM: using langchain with Alibaba DashScope (Tongyi Qianwen LLM) and DashVector (vector database)

Prerequisites

Make sure you have activated a Tongyi Qianwen API key and a vector retrieval service (DashVector) API key.
Install the dependencies:
pip install langchain
pip install langchain-community
pip install dashvector
pip install dashscope
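
Before running anything, it can help to confirm that the credentials are actually available. The code later in this article uses three environment variables (DASHSCOPE_API_KEY, DASHVECTOR_API_KEY, DASHVECTOR_ENDPOINT); the sketch below is a minimal check that they are set. The helper name missing_env_vars is hypothetical and is not part of any library.

```python
import os

# Variable names taken from the code later in this article.
REQUIRED_VARS = ("DASHSCOPE_API_KEY", "DASHVECTOR_API_KEY", "DASHVECTOR_ENDPOINT")


def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return the required names that are unset or empty (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing configuration:", ", ".join(missing))
    else:
        print("All credentials are set.")
```

Exporting the keys in the shell and checking them this way avoids hard-coding secrets in the source file.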

Introduction to self-querying

Put simply, self-querying converts the user's natural-language query into a structured query with two parts:

Query: the text used for the semantic (vector similarity) search
Filter: conditions on the metadata of the documents being searched

Take the example in the figure above, where the user asks: "What did bar say about foo?" After the self-querying conversion, the request carries two meanings:

Search for data about foo
Restrict the results to those whose author is bar
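
The split into a query and a filter can be sketched in plain Python. This is only an illustration under stated assumptions: the names ParsedQuery, Condition and search are hypothetical, the structured query is written by hand (in the real retriever the LLM produces it), and word overlap stands in for vector similarity.

```python
import operator
from dataclasses import dataclass, field

# Comparators supported by this sketch (a small, hypothetical subset).
COMPARATORS = {"eq": operator.eq, "lt": operator.lt, "gt": operator.gt}


@dataclass
class Condition:
    attribute: str
    comparator: str
    value: object

    def matches(self, metadata):
        # A document without the attribute cannot satisfy the condition.
        if self.attribute not in metadata:
            return False
        return COMPARATORS[self.comparator](metadata[self.attribute], self.value)


@dataclass
class ParsedQuery:
    query: str                                      # text for the semantic search
    conditions: list = field(default_factory=list)  # metadata filter


def search(parsed, docs):
    """Apply the metadata filter first, then rank the survivors by word overlap."""
    kept = [d for d in docs if all(c.matches(d["metadata"]) for c in parsed.conditions)]
    terms = set(parsed.query.lower().split())
    return sorted(kept, key=lambda d: len(terms & set(d["text"].lower().split())), reverse=True)


docs = [
    {"text": "foo is easy to set up", "metadata": {"author": "bar"}},
    {"text": "foo is hard to debug", "metadata": {"author": "baz"}},
    {"text": "notes on cooking", "metadata": {"author": "bar"}},
]

# "What did bar say about foo?" -> query "foo" plus filter author == "bar"
parsed = ParsedQuery(query="foo", conditions=[Condition("author", "eq", "bar")])
results = search(parsed, docs)
print([d["text"] for d in results])
```

The same shape covers the sweater example from the preface: the query would be "black wool sweater" and the filter a Condition("price", "lt", 20), so the price constraint is enforced on metadata instead of being lost in the embedding.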

Code implementation

Replace DASHSCOPE_API_KEY, DASHVECTOR_API_KEY and DASHVECTOR_ENDPOINT with the values from your own Alibaba Cloud account.

import os

from langchain_core.documents import Document
from langchain_community.vectorstores.dashvector import DashVector
from langchain_community.embeddings.dashscope import DashScopeEmbeddings
from langchain.chains.query_constructor.base import AttributeInfo
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain_community.chat_models.tongyi import ChatTongyi
from langchain_core.vectorstores import VectorStore


class SelfQuerying:
    def __init__(self):
        # Both DASHSCOPE_API_KEY and DASHVECTOR_API_KEY must be activated
        os.environ["DASHSCOPE_API_KEY"] = ""
        os.environ["DASHVECTOR_API_KEY"] = ""
        os.environ["DASHVECTOR_ENDPOINT"] = ""
        self.llm = ChatTongyi(temperature=0)

    def handle_embeddings(self) -> 'VectorStore':
        docs = [
            Document(
                page_content="A bunch of scientists bring back dinosaurs and mayhem breaks loose",
                metadata={"year": 1993, "rating": 7.7, "genre": "science fiction"},
            ),
            Document(
                page_content="Leo DiCaprio gets lost in a dream within a dream within a dream within a ...",
                metadata={"year": 2010, "director": "Christopher Nolan", "rating": 8.2},
            ),
            Document(
                page_content="A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea",
                metadata={"year": 2006, "director": "Satoshi Kon", "rating": 8.6},
            ),
            Document(
                page_content="A bunch of normal-sized women are supremely wholesome and some men pine after them",
                metadata={"year": 2019, "director": "Greta Gerwig", "rating": 8.3},
            ),
            Document(
                page_content="Toys come alive and have a blast doing so",
                metadata={"year": 1995, "genre": "animated"},
            ),
            Document(
                page_content="Three men walk into the Zone, three men walk out of the Zone",
                metadata={
                    "year": 1979,
                    "director": "Andrei Tarkovsky",
                    "genre": "thriller",
                    "rating": 9.9,
                },
            ),
        ]
        # Specify the collection name in the vector database
        vectorstore = DashVector.from_documents(docs, DashScopeEmbeddings(), collection_name="langchain")
        return vectorstore

    def build_querying_retriever(self, vectorstore: 'VectorStore', enable_limit: bool = False) -> 'SelfQueryRetriever':
        """
        Build the self-querying retriever
        :param vectorstore: the vector database
        :param enable_limit: whether the query may request the top k results
        :return: the retriever
        """
        metadata_field_info = [
            AttributeInfo(
                name="genre",
                description="The genre of the movie. One of ['science fiction', 'comedy', 'drama', 'thriller', 'romance', 'action', 'animated']",
                type="string",
            ),
            AttributeInfo(
                name="year",
                description="The year the movie was released",
                type="integer",
            ),
            AttributeInfo(
                name="director",
                description="The name of the movie director",
                type="string",
            ),
            AttributeInfo(name="rating", description="A 1-10 rating for the movie", type="float"),
        ]
        document_content_description = "Brief summary of a movie"
        retriever = SelfQueryRetriever.from_llm(
            self.llm,
            vectorstore,
            document_content_description,
            metadata_field_info,
            enable_limit=enable_limit
        )
        return retriever

    def handle_query(self, query: str):
        """
        Return the retrieval results for the optimized query
        :param query: the user's natural-language query
        :return: the matching documents
        """
        # Let the LLM rewrite the query and build the optimized retriever.
        # Note: handle_embeddings() is called here, so the sample documents are embedded again on every query.
        retriever = self.build_querying_retriever(self.handle_embeddings())
        response = retriever.invoke(query)
        return response


if __name__ == '__main__':
    q = SelfQuerying()
    # Filter by metadata attribute only
    print(q.handle_query("I want to watch a movie rated higher than 8.5"))
    # Filter by metadata attribute and by semantic content
    print(q.handle_query("Has Greta Gerwig directed any movies about women"))
    # Composite filter
    print(q.handle_query("What's a highly rated (above 8.5) science fiction film?"))
    # Complex semantic query combined with a filter
    print(q.handle_query("What's a movie after 1990 but before 2005 that's all about toys, and preferably is animated"))

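The code above has three main steps:

Run the embedding, storing the Docs together with their metadata in DashVector
Build the self-querying retriever, which requires describing in advance the metadata fields our documents support, plus a short description of the document content
Run the query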

Running the code prints the following results:

# "I want to watch a movie rated higher than 8.5"
[Document(page_content='Three men walk into the Zone, three men walk out of the Zone', metadata={'director': 'Andrei Tarkovsky', 'genre': 'thriller', 'rating': 9.9, 'year': 1979}),
 Document(page_content='A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea', metadata={'director': 'Satoshi Kon', 'rating': 8.6, 'year': 2006})]

# "Has Greta Gerwig directed any movies about women"
[Document(page_content='A bunch of normal-sized women are supremely wholesome and some men pine after them', metadata={'director': 'Greta Gerwig', 'rating': 8.3, 'year': 2019})]

# "What's a highly rated (above 8.5) science fiction film?"
[Document(page_content='A bunch of normal-sized women are supremely wholesome and some men pine after them', metadata={'director': 'Greta Gerwig', 'rating': 8.3, 'year': 2019})]

# "What's a movie after 1990 but before 2005 that's all about toys, and preferably is animated"
[Document(page_content='Toys come alive and have a blast doing so', metadata={'genre': 'animated', 'year': 1995})]
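
Retriever output like this can be sanity-checked without calling any service by applying the same conditions to the sample metadata in plain Python. The sketch below copies the metadata from the sample documents in the code above; years_matching is a hypothetical helper, and each predicate is a strict reading of the corresponding query, not the filter the LLM actually generated.

```python
# Metadata copied from the sample documents in the code above.
MOVIES = [
    {"year": 1993, "rating": 7.7, "genre": "science fiction"},
    {"year": 2010, "director": "Christopher Nolan", "rating": 8.2},
    {"year": 2006, "director": "Satoshi Kon", "rating": 8.6},
    {"year": 2019, "director": "Greta Gerwig", "rating": 8.3},
    {"year": 1995, "genre": "animated"},
    {"year": 1979, "director": "Andrei Tarkovsky", "genre": "thriller", "rating": 9.9},
]


def years_matching(predicate, movies=MOVIES):
    """Return the sorted release years of the movies that satisfy the predicate."""
    return sorted(m["year"] for m in movies if predicate(m))


# Movies without a rating are treated as not matching a rating threshold.
rated_above_8_5 = years_matching(lambda m: m.get("rating", 0) > 8.5)
by_gerwig = years_matching(lambda m: m.get("director") == "Greta Gerwig")
scifi_above_8_5 = years_matching(
    lambda m: m.get("rating", 0) > 8.5 and m.get("genre") == "science fiction"
)
animated_1990_to_2005 = years_matching(
    lambda m: 1990 < m["year"] < 2005 and m.get("genre") == "animated"
)

print(rated_above_8_5, by_gerwig, scifi_above_8_5, animated_1990_to_2005)
```

Under this strict reading, no sample document is both science fiction and rated above 8.5, so whatever the retriever returns for that query depends on how the LLM translated it into a filter.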

Summary

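This article showed how to use langchain's self-query to optimize vector retrieval, with Alibaba Cloud's DashVector and the DashScope LLM used for the code demonstration. Readers can activate both services and try it out for themselves.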
