About Weaviate

Weaviate is a cloud-native, open source vector database that is robust, fast, and scalable.

  • Official website: https://weaviate.io
  • GitHub: https://github.com/weaviate/weaviate
  • Official documentation: https://weaviate.io/developers/weaviate




Multiple deployment options are available to cater for different users and use cases.

All options offer vectorizer and RAG module integration.



Weaviate is flexible and can be used in many contexts and scenarios.


Quickstart (Python)


1. Create a Weaviate database

You can create a free cloud sandbox instance on Weaviate Cloud Services (WCS).


From the Details tab in WCS, get the API key and the URL of your cluster.
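The connection code in step 3 reads the cluster URL and keys from the environment variables WCS_CLUSTER_URL, WCS_API_KEY and OPENAI_APIKEY. A minimal sketch, assuming you export those variables before running your script, that fails early with a clear message if any of them is missing or empty (the helper name load_settings is hypothetical, not part of the Weaviate client):

```python
import os
from typing import Dict, Mapping, Optional

# Environment variable names read by the connection code in this guide.
REQUIRED_VARS = ("WCS_CLUSTER_URL", "WCS_API_KEY", "OPENAI_APIKEY")


def load_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Hypothetical helper: return the required settings, or raise if any is missing or empty."""
    # Default to the real process environment; a plain dict can be passed for testing.
    source = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not source.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: source[name] for name in REQUIRED_VARS}
```

Calling load_settings() at the top of a script surfaces a missing key immediately, rather than as a less obvious error from the connection call later.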


2. Install the Weaviate client

Use the v4 client, which requires Weaviate 1.23.7 or later:

pip install -U weaviate-client
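The connection and collection code in this guide relies on the v4 API (connect_to_wcs, client.collections), which the older v3 client does not provide, so it can be worth confirming which major version pip installed. A small sketch using only the standard library's importlib.metadata; the function names are hypothetical:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def weaviate_client_major_version() -> Optional[int]:
    """Return the installed weaviate-client major version, or None if it is not installed."""
    try:
        # e.g. "4.5.1" -> 4
        return int(version("weaviate-client").split(".")[0])
    except PackageNotFoundError:
        return None


def is_v4_client() -> bool:
    """True if the installed client is at least major version 4 (the API used in this guide)."""
    major = weaviate_client_major_version()
    return major is not None and major >= 4
```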

3. Connect to Weaviate

Use the API key and URL from step 1, together with an OpenAI inference API key (sign up at https://platform.openai.com/signup):



import weaviate
import weaviate.classes as wvc
import os
import requests
import json

client = weaviate.connect_to_wcs(
    cluster_url=os.getenv("WCS_CLUSTER_URL"),
    auth_credentials=weaviate.auth.AuthApiKey(os.getenv("WCS_API_KEY")),
    headers={
        "X-OpenAI-Api-Key": os.environ["OPENAI_APIKEY"]  # Replace with your inference API key
    }
)

try:
    pass  # Replace with your code. Close client gracefully in the finally block.

finally:
    client.close()  # Close client gracefully


Next, we define a data collection (a “class” in Weaviate) to store objects in.

This is analogous to creating a table in relational (SQL) databases.

The following code:

  • Configures a class object with:
    • Name: Question
    • Vectorizer module: text2vec-openai
    • Generative module: generative-openai
  • Then creates the class.


    questions = client.collections.create(
        name="Question",
        vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_openai(),  # If set to "none" you must always provide vectors yourself. Could be any other "text2vec-*" also.
        generative_config=wvc.config.Configure.Generative.openai()  # Ensure the `generative-openai` module is used for generative queries
    )


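For comparison, the older v3 client expresses the same definition as a schema dictionary, which is passed to client.schema.create_class(class_obj):

class_obj = {
    "class": "Question",
    "vectorizer": "text2vec-openai",  # If set to "none" you must always provide vectors yourself. Could be any other "text2vec-*" also.
    "moduleConfig": {
        "text2vec-openai": {},
        "generative-openai": {}  # Ensure the `generative-openai` module is used for generative queries
    }
}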


You can now add objects to Weaviate. For maximum efficiency, use a batch import.

During import, the vectorizer defined for the class creates a vector embedding for each object.

The following code:

  • Loads the objects, and
  • Adds them to the target class (Question) using a batch import.


    resp = requests.get('https://raw.githubusercontent.com/weaviate-tutorials/quickstart/main/data/jeopardy_tiny.json')
    data = json.loads(resp.text)  # Load data

    question_objs = list()
    for d in data:
        question_objs.append({
            "answer": d["Answer"],
            "question": d["Question"],
            "category": d["Category"],
        })

    questions = client.collections.get("Question")
    questions.data.insert_many(question_objs)  # This uses batching under the hood
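The comment on insert_many notes that it batches under the hood: objects are sent in groups rather than with one request per object. A minimal, library-independent sketch of that idea, splitting a list like question_objs into fixed-size chunks; the batch size of 100 and the helper name chunked are assumptions for illustration, not the client's actual internals:

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], batch_size: int = 100) -> Iterator[List[T]]:
    """Yield consecutive slices of at most batch_size items (hypothetical helper)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Example: 250 objects become three batches.
objs = [{"question": f"q{i}"} for i in range(250)]
sizes = [len(batch) for batch in chunked(objs)]
print(sizes)  # [100, 100, 50]
```

Fewer, larger requests reduce per-request overhead, which is why batching is preferred for bulk imports.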


1) Semantic search

Let’s start with a similarity search. A nearText search looks for objects in Weaviate whose vectors are most similar to the vector for the given input text.

Run the following code to search for objects whose vectors are most similar to the vector for the text “biology”.


import weaviate
import weaviate.classes as wvc
import os

client = weaviate.connect_to_wcs(
    cluster_url=os.getenv("WCS_CLUSTER_URL"),
    auth_credentials=weaviate.auth.AuthApiKey(os.getenv("WCS_API_KEY")),
    headers={
        "X-OpenAI-Api-Key": os.environ["OPENAI_APIKEY"]  # Replace with your inference API key
    }
)

try:
    questions = client.collections.get("Question")

    response = questions.query.near_text(
        query="biology",
        limit=2
    )

    print(response.objects[0].properties)  # Inspect the first object

finally:
    client.close()  # Close client gracefully


Example results, shown as the equivalent GraphQL response (the v4 code above prints only the properties of the first object):

{
  "data": {
    "Get": {
      "Question": [
        {
          "answer": "DNA",
          "category": "SCIENCE",
          "question": "In 1953 Watson & Crick built a model of the molecular structure of this, the gene-carrying substance"
        },
        {
          "answer": "Liver",
          "category": "SCIENCE",
          "question": "This organ removes excess glucose from the blood & stores it as glycogen"
        }
      ]
    }
  }
}
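Conceptually, near_text embeds the query text with the class's vectorizer and returns the objects whose stored vectors are closest to it. A toy sketch of that ranking step using cosine similarity over hand-made 3-dimensional vectors; the vectors are invented for illustration (real text2vec-openai embeddings have far more dimensions), and Weaviate itself searches an approximate (HNSW) index rather than doing this brute-force scan:

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], vectors: Dict[str, Sequence[float]], k: int) -> List[Tuple[str, float]]:
    """Brute-force nearest neighbours: score every object, keep the k most similar."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Invented toy embeddings; dimensions loosely mean (science, animals, geography).
toy_vectors = {
    "DNA": [0.9, 0.1, 0.0],
    "Liver": [0.8, 0.3, 0.0],
    "Elephant": [0.2, 0.9, 0.1],
    "Sahara": [0.0, 0.1, 0.9],
}
query_biology = [0.85, 0.25, 0.0]  # invented stand-in for the embedding of "biology"
print([name for name, _ in top_k(query_biology, toy_vectors, k=2)])
```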

2) Semantic search with a filter

You can add Boolean filters to searches. For example, the search above can be modified to return only objects whose “category” value is “ANIMALS”. Run the following code to see the results:


    questions = client.collections.get("Question")

    response = questions.query.near_text(
        query="biology",
        limit=2,
        filters=wvc.query.Filter.by_property("category").equal("ANIMALS")
    )

    print(response.objects[0].properties)  # Inspect the first object


Example results, shown as the equivalent GraphQL response:

{
  "data": {
    "Get": {
      "Question": [
        {
          "answer": "Elephant",
          "category": "ANIMALS",
          "question": "It's the only living mammal in the order Proboseidea"
        },
        {
          "answer": "the nose or snout",
          "category": "ANIMALS",
          "question": "The gavial looks very much like a crocodile except for this bodily feature"
        }
      ]
    }
  }
}
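With the filter, only objects whose category equals “ANIMALS” are eligible, and those candidates are then ranked by vector similarity, so a closer object from another category (such as “DNA”) cannot appear. A self-contained toy sketch of that "filter, then rank" behaviour over invented 2-dimensional vectors; it mirrors the semantics of the query, not Weaviate's internal implementation, and the helper name filtered_near is hypothetical:

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def filtered_near(query: Sequence[float], objects: List[Dict], category: str, limit: int) -> List[Dict]:
    """Keep objects matching the category filter, then return the `limit` most similar ones."""
    candidates = [obj for obj in objects if obj["category"] == category]
    candidates.sort(key=lambda obj: cosine_similarity(query, obj["vector"]), reverse=True)
    return candidates[:limit]


# Invented toy data; dimensions loosely mean (science, animals).
objects = [
    {"answer": "DNA", "category": "SCIENCE", "vector": [0.9, 0.1]},
    {"answer": "Elephant", "category": "ANIMALS", "vector": [0.3, 0.9]},
    {"answer": "the nose or snout", "category": "ANIMALS", "vector": [0.4, 0.8]},
    {"answer": "Giraffe", "category": "ANIMALS", "vector": [0.0, 1.0]},
]
results = filtered_near([0.85, 0.25], objects, category="ANIMALS", limit=2)
print([obj["answer"] for obj in results])
```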



This page illustrates various use cases for vector databases by way of open-source demo projects. You can fork and modify any of them.

If you would like to contribute your own project to this page, please let us know by creating an issue on GitHub.

Similarity search


A vector database enables fast, efficient similarity search within and across modalities, such as text or images, as well as their combinations. Vector databases' similarity search capabilities can also power other complex use cases, such as recommendation systems in classical machine learning applications.

Project | Description | Modality | Language
Plant search | Semantic search over plants. | Text | JavaScript
Wine search | Semantic search over wines. | Text | Python
Book recommender system (Video, Demo) | Find book recommendations based on a search query. | Text | TypeScript
Movie recommender system (Blog) | Find similar movies. | Text | JavaScript
Multilingual Wikipedia Search | Search through Wikipedia in multiple languages. | Text | TypeScript
Podcast search | Semantic search over podcast episodes. | Text | Python
Video Caption Search | Find the timestamp of the answer to your question in a video. | Text | Python
Facial Recognition | Identify people in images. | Image | Python
Image Search over dogs (Blog) | Find images of similar dog breeds based on an uploaded image. | Image | Python
Text to image search | Find images most similar to a text query. | Multimodal | JavaScript
Text to image and image to image search | Find images most similar to a text or image query. | Multimodal | Python
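The recommendation use case can be built on the same similarity search: represent what a user liked as a vector, then look for the nearest items they have not seen yet. A minimal sketch, assuming a simple centroid (element-wise mean of the liked items' vectors) as the user profile; this is one common approach, not necessarily how any of the demos above work, and the titles and vectors are invented:

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def centroid(vectors: List[Sequence[float]]) -> List[float]:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(values) / n for values in zip(*vectors)]


def recommend(liked: List[str], catalog: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Rank items not yet liked by similarity to the centroid of the liked items."""
    profile = centroid([catalog[title] for title in liked])
    unseen = [title for title in catalog if title not in liked]
    unseen.sort(key=lambda title: cosine_similarity(profile, catalog[title]), reverse=True)
    return unseen[:k]


# Invented catalog; dimensions loosely mean (science fiction, mystery).
catalog = {
    "Space Opera A": [0.9, 0.1],
    "Space Opera B": [0.8, 0.2],
    "Hard SF C": [0.7, 0.3],
    "Cozy Mystery D": [0.1, 0.9],
}
print(recommend(["Space Opera A", "Space Opera B"], catalog, k=1))
```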

LLMs and search


Vector databases and LLMs go together like cookies and milk!

Vector databases help address some limitations of large language models (LLMs), such as hallucinations, by retrieving relevant information and providing it to the LLM as part of its input.

Project | Description | Modality | Language
Verba, the golden RAGtriever (Video, Demo) | Retrieval-Augmented Generation (RAG) system to chat with Weaviate documentation and blog posts. | Text | Python
HealthSearch (Blog, Demo) | Recommendation system of health products based on symptoms. | Text | Python
Magic Chat | Search through Magic: The Gathering cards. | Text | Python
AirBnB Listings (Blog) | Generation of customized advertisements for AirBnB listings with Generative Feedback Loops. | Text | Python
Distyll | Summarize text or video content. | Text | Python

Learn more in our LLMs and Search blog post.
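In retrieval-augmented generation, the retrieved objects are inserted into the prompt so the LLM answers from them rather than from its training data alone. A minimal sketch of that prompt-assembly step, assuming the retrieved objects are plain dicts shaped like the Question properties from the quickstart; the prompt wording and the helper name build_rag_prompt are hypothetical, and the actual LLM call (which the generative-openai module is meant to handle in this guide's setup) is left out:

```python
from typing import Dict, List


def build_rag_prompt(question: str, retrieved: List[Dict[str, str]]) -> str:
    """Combine retrieved objects into a grounded prompt for an LLM (hypothetical helper)."""
    # Number each retrieved object so the model (and a reader) can refer back to it.
    context_lines = [
        f"[{i}] ({obj['category']}) {obj['question']} Answer: {obj['answer']}"
        for i, obj in enumerate(retrieved, start=1)
    ]
    return (
        "Answer the question using only the context below. "
        "If the context is not sufficient, say so.\n\n"
        "Context:\n" + "\n".join(context_lines) + "\n\n"
        f"Question: {question}"
    )


retrieved = [
    {"answer": "DNA", "category": "SCIENCE",
     "question": "In 1953 Watson & Crick built a model of the molecular structure of this, the gene-carrying substance"},
    {"answer": "Liver", "category": "SCIENCE",
     "question": "This organ removes excess glucose from the blood & stores it as glycogen"},
]
print(build_rag_prompt("Which organ stores glycogen?", retrieved))
```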



Classification


Weaviate can use its vectorization capabilities to classify new, unseen concepts automatically and in real time, based on their semantic meaning.

Project | Description | Modality | Language
Toxic Comment Classification | Classify whether a comment is toxic or non-toxic. | Text | Python
Audio Genre Classification | Classify the music genre of an audio file. | Image | Python
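One way vector-based classification can work is k-nearest-neighbour (kNN) voting: an unlabeled object takes the majority label among the k labeled objects whose vectors are closest to it. A toy sketch of that idea with invented 2-dimensional vectors and labels, using Euclidean distance; Weaviate's own classification features and the demos listed above may work differently:

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def knn_classify(vector: Sequence[float], labeled: List[Tuple[Sequence[float], str]], k: int = 3) -> str:
    """Return the majority label among the k labeled vectors nearest to `vector`."""
    nearest = sorted(labeled, key=lambda item: math.dist(vector, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Invented labeled examples; dimensions loosely mean (hostile tone, neutral tone).
labeled = [
    ([0.9, 0.1], "toxic"),
    ([0.8, 0.2], "toxic"),
    ([0.85, 0.05], "toxic"),
    ([0.1, 0.9], "non-toxic"),
    ([0.2, 0.8], "non-toxic"),
]
print(knn_classify([0.75, 0.2], labeled))
print(knn_classify([0.15, 0.85], labeled))
```

An odd k avoids ties between two labels.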

Other use cases


Weaviate’s modular ecosystem unlocks many other use cases of the Weaviate vector database, such as Named Entity Recognition or spell checking.

Project | Description | Language
Named Entity Recognition (NER) | tbd | Python





