文章目录
- 关于 Weaviate
- 核心功能
- 部署方式
- 使用场景
- 快速上手 (Python)
- 1、创建 Weaviate 数据库
- 2、安装
- 3、连接到 Weaviate
- 4、定义数据集
- 5、添加对象
- 6、查询
- 1)Semantic search
- 2) Semantic search with a filter
- 使用示例
- Similarity search
- LLMs and search
- Classification
- Other use cases
关于 Weaviate
Weaviate is a cloud-native, open source vector database that is robust, fast, and scalable.
- 官网:https://weaviate.io
- github : https://github.com/weaviate/weaviate
- 官方文档:https://weaviate.io/developers/weaviate
核心功能
部署方式
Multiple deployment options are available to cater for different users and use cases.
All options offer vectorizer and RAG module integration.
使用场景
Weaviate is flexible and can be used in many contexts and scenarios.
快速上手 (Python)
参考:https://weaviate.io/developers/weaviate/quickstart
1、创建 Weaviate 数据库
你可以在 Weaviate Cloud Services (WCS). 创建一个免费的 cloud sandbox 实例
方式如:https://weaviate.io/developers/wcs/quickstart
从WCS 的Details
tab 拿到 API key 和 URL。
2、安装
使用 v4 client, Weaviate 1.23.7
及以上:
pip install -U weaviate-client
使用 v3
pip install "weaviate-client==3.*"
3、连接到 Weaviate
使用步骤一拿到的 API Key 和 URL,以及 OpenAI 的推理 API Key:https://platform.openai.com/signup
运行以下代码:
V4
import weaviate
import weaviate.classes as wvc
import os
import requests
import jsonclient = weaviate.connect_to_wcs(cluster_url=os.getenv("WCS_CLUSTER_URL"),auth_credentials=weaviate.auth.AuthApiKey(os.getenv("WCS_API_KEY")),headers={"X-OpenAI-Api-Key": os.environ["OPENAI_APIKEY"] # Replace with your inference API key}
)try:pass # Replace with your code. Close client gracefully in the finally block.finally:client.close() # Close client gracefully
V3
import weaviate
import jsonclient = weaviate.Client(url = "https://some-endpoint.weaviate.network", # Replace with your endpointauth_client_secret=weaviate.auth.AuthApiKey(api_key="YOUR-WEAVIATE-API-KEY"), # Replace w/ your Weaviate instance API keyadditional_headers = {"X-OpenAI-Api-Key": "YOUR-OPENAI-API-KEY" # Replace with your inference API key}
)
4、定义数据集
Next, we define a data collection (a “class” in Weaviate) to store objects in.
This is analogous to creating a table in relational (SQL) databases.
The following code:
- Configures a class object with:
- Name
Question
- Vectorizer module
text2vec-openai
- Generative module
generative-openai
- Name
- Then creates the class.
V4
questions = client.collections.create(name="Question",vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_openai(), # If set to "none" you must always provide vectors yourself. Could be any other "text2vec-*" also.generative_config=wvc.config.Configure.Generative.openai() # Ensure the `generative-openai` module is used for generative queries)
V3
class_obj = {"class": "Question","vectorizer": "text2vec-openai", # If set to "none" you must always provide vectors yourself. Could be any other "text2vec-*" also."moduleConfig": {"text2vec-openai": {},"generative-openai": {} # Ensure the `generative-openai` module is used for generative queries}
}client.schema.create_class(class_obj)
5、添加对象
You can now add objects to Weaviate. You will be using a batch import (read more) process for maximum efficiency.
The guide covers using the vectorizer
defined for the class to create a vector embedding for each object.
The above code:
- Loads objects, and
- Adds objects to the target class (
Question
) one by one.
V4
resp = requests.get('https://raw.githubusercontent.com/weaviate-tutorials/quickstart/main/data/jeopardy_tiny.json')data = json.loads(resp.text) # Load dataquestion_objs = list()for i, d in enumerate(data):question_objs.append({"answer": d["Answer"],"question": d["Question"],"category": d["Category"],})questions = client.collections.get("Question")questions.data.insert_many(question_objs) # This uses batching under the hood
V3
import requests
import json
resp = requests.get('https://raw.githubusercontent.com/weaviate-tutorials/quickstart/main/data/jeopardy_tiny.json')
data = json.loads(resp.text) # Load dataclient.batch.configure(batch_size=100) # Configure batch
with client.batch as batch: # Initialize a batch processfor i, d in enumerate(data): # Batch import dataprint(f"importing question: {i+1}")properties = {"answer": d["Answer"],"question": d["Question"],"category": d["Category"],}batch.add_data_object(data_object=properties,class_name="Question")
6、查询
1)Semantic search
Let’s start with a similarity search. A nearText
search looks for objects in Weaviate whose vectors are most similar to the vector for the given input text.
Run the following code to search for objects whose vectors are most similar to that of biology
.
V4
import weaviate
import weaviate.classes as wvc
import osclient = weaviate.connect_to_wcs(cluster_url=os.getenv("WCS_CLUSTER_URL"),auth_credentials=weaviate.auth.AuthApiKey(os.getenv("WCS_API_KEY")),headers={"X-OpenAI-Api-Key": os.environ["OPENAI_APIKEY"] # Replace with your inference API key}
)try:pass # Replace with your code. Close client gracefully in the finally block.questions = client.collections.get("Question")response = questions.query.near_text(query="biology",limit=2)print(response.objects[0].properties) # Inspect the first objectfinally:client.close() # Close client gracefully
V3
import weaviate
import jsonclient = weaviate.Client(url = "https://some-endpoint.weaviate.network", # Replace with your endpointauth_client_secret=weaviate.auth.AuthApiKey(api_key="YOUR-WEAVIATE-API-KEY"), # Replace w/ your Weaviate instance API keyadditional_headers = {"X-OpenAI-Api-Key": "YOUR-OPENAI-API-KEY" # Replace with your inference API key}
)response = (client.query.get("Question", ["question", "answer", "category"]).with_near_text({"concepts": ["biology"]}).with_limit(2).do()
)print(json.dumps(response, indent=4))
结果如下
{"data": {"Get": {"Question": [{"answer": "DNA","category": "SCIENCE","question": "In 1953 Watson & Crick built a model of the molecular structure of this, the gene-carrying substance"},{"answer": "Liver","category": "SCIENCE","question": "This organ removes excess glucose from the blood & stores it as glycogen"}]}}
}
2) Semantic search with a filter
You can add Boolean filters to searches. For example, the above search can be modified to only in objects that have a “category” value of “ANIMALS”. Run the following code to see the results:
V4
questions = client.collections.get("Question")response = questions.query.near_text(query="biology",limit=2,filters=wvc.query.Filter.by_property("category").equal("ANIMALS"))print(response.objects[0].properties) # Inspect the first object
V3
response = (client.query.get("Question", ["question", "answer", "category"]).with_near_text({"concepts": ["biology"]}).with_where({"path": ["category"],"operator": "Equal","valueText": "ANIMALS"}).with_limit(2).do()
)print(json.dumps(response, indent=4))
结果如下:
{"data": {"Get": {"Question": [{"answer": "Elephant","category": "ANIMALS","question": "It's the only living mammal in the order Proboseidea"},{"answer": "the nose or snout","category": "ANIMALS","question": "The gavial looks very much like a crocodile except for this bodily feature"}]}}
}
更多可见:https://weaviate.io/developers/weaviate/quickstart
使用示例
This page illustrates various use cases for vector databases by way of open-source demo projects. You can fork and modify any of them.
If you would like to contribute your own project to this page, please let us know by creating an issue on GitHub.
Similarity search
https://weaviate.io/developers/weaviate/more-resources/example-use-cases#similarity-search
A vector databases enables fast, efficient similarity searches on and across any modalities, such as text or images, as well as their combinations. Vector database’ similarity search capabilities can be used for other complex use cases, such as recommendation systems in classical machine learning applications.
Title | Description | Modality | Code |
---|---|---|---|
Plant search | Semantic search over plants. | Text | Javascript |
Wine search | Semantic search over wines. | Text | Python |
Book recommender system (Video, Demo) | Find book recommendations based on search query. | Text | TypeScript |
Movie recommender system (Blog) | Find similar movies. | Text | Javascript |
Multilingual Wikipedia Search | Search through Wikipedia in multiple languages. | Text | TypeScript |
Podcast search | Semantic search over podcast episodes. | Text | Python |
Video Caption Search | Find the timestamp of the answer to your question in a video. | Text | Python |
Facial Recognition | Identify people in images | Image | Python |
Image Search over dogs (Blog) | Find images of similar dog breeds based on uploaded image. | Image | Python |
Text to image search | Find images most similar to a text query. | Multimodal | Javascript |
Text to image and image to image search | Find images most similar to a text or image query. | Multimodal | Python |
LLMs and search
https://weaviate.io/developers/weaviate/more-resources/example-use-cases#llms-and-search
Vector databases and LLMs go together like cookies and milk!
Vector databases help to address some of large language models (LLMs) limitations, such as hallucinations, by helping to retrieve the relevant information to provide to the LLM as a part of its input.
Title | Description | Modality | Code |
---|---|---|---|
Verba, the golden RAGtriever (Video, Demo) | Retrieval-Augmented Generation (RAG) system to chat with Weaviate documentation and blog posts. | Text | Python |
HealthSearch (Blog, Demo) | Recommendation system of health products based on symptoms. | Text | Python |
Magic Chat | Search through Magic The Gathering cards | Text | Python |
AirBnB Listings (Blog) | Generation of customized advertisements for AirBnB listings with Generative Feedback Loops | Text | Python |
Distyll | Summarize text or video content. | Text | Python |
Learn more in our LLMs and Search blog post.
Classification
https://weaviate.io/developers/weaviate/more-resources/example-use-cases#classification
Weaviate can leverage its vectorization capabilities to enable automatic, real-time classification of unseen, new concepts based on its semantic understanding.
Title | Description | Modality | Code |
---|---|---|---|
Toxic Comment Classification | Clasify whether a comment is toxic or non-toxic. | Text | Python |
Audio Genre Classification | Classify the music genre of an audio file. | Image | Python |
Other use cases
https://weaviate.io/developers/weaviate/more-resources/example-use-cases#other-use-cases
Weaviate’s modular ecosystem unlocks many other use cases of the Weaviate vector database, such as Named Entity Recognition or spell checking.
Title | Description | Code |
---|---|---|
Named Entity Recognition (NER) | tbd | Python |
2024-03-27(三)