Weaviate



Contents

    • About Weaviate
      • Core features
      • Deployment options
      • Use cases
    • Quickstart (Python)
      • 1. Create a Weaviate database
      • 2. Installation
      • 3. Connect to Weaviate
      • 4. Define a data collection
      • 5. Add objects
      • 6. Query
        • 1) Semantic search
        • 2) Semantic search with a filter
    • Example use cases
      • Similarity search
      • LLMs and search
      • Classification
      • Other use cases


About Weaviate

Weaviate is a cloud-native, open source vector database that is robust, fast, and scalable.

  • Website: https://weaviate.io
  • GitHub: https://github.com/weaviate/weaviate
  • Official documentation: https://weaviate.io/developers/weaviate

Core features

[Figure: core features]


Deployment options

Multiple deployment options are available to cater for different users and use cases.

All options offer vectorizer and RAG module integration.

[Figure: deployment options]


Use cases

Weaviate is flexible and can be used in many contexts and scenarios.

[Figure: use cases]


Quickstart (Python)

Reference: https://weaviate.io/developers/weaviate/quickstart


1. Create a Weaviate database

You can create a free cloud sandbox instance on Weaviate Cloud Services (WCS).

For instructions, see: https://weaviate.io/developers/wcs/quickstart

From the Details tab in WCS, get the API key and the cluster URL.
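One way to keep these values out of source code is to read them from environment variables. The sketch below assumes the variable names used by the connection code in step 3 (WCS_CLUSTER_URL, WCS_API_KEY, OPENAI_APIKEY); the helper `load_settings` is hypothetical and not part of the Weaviate client.

```python
import os

# Variable names follow the connection code in step 3; adjust them if yours differ.
REQUIRED_VARS = ("WCS_CLUSTER_URL", "WCS_API_KEY", "OPENAI_APIKEY")


def load_settings(env=None):
    """Return the required settings as a dict, or raise if any are missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Calling `load_settings()` before connecting fails early with a readable message instead of a less obvious authentication error later.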


2. Installation

For the v4 client (Weaviate 1.23.7 and later):

pip install -U weaviate-client

For the v3 client:

pip install "weaviate-client==3.*"
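Because the v3 and v4 clients expose different APIs, it can help to confirm which one is installed before copying the snippets in the following steps. This is a minimal sketch using only the standard library; the helper names `major_version` and `installed_client_major` are made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def major_version(version_string):
    """Return the leading integer of a version string such as '4.5.1'."""
    return int(version_string.split(".")[0])


def installed_client_major(package="weaviate-client"):
    """Return the installed package's major version, or None if it is not installed."""
    try:
        return major_version(version(package))
    except PackageNotFoundError:
        return None


print("weaviate-client major version:", installed_client_major())
```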

3. Connect to Weaviate

Use the API key and URL obtained in step 1, together with an OpenAI inference API key: https://platform.openai.com/signup


Run the following code:

V4

import weaviate
import weaviate.classes as wvc
import os
import requests
import json

client = weaviate.connect_to_wcs(
    cluster_url=os.getenv("WCS_CLUSTER_URL"),
    auth_credentials=weaviate.auth.AuthApiKey(os.getenv("WCS_API_KEY")),
    headers={
        "X-OpenAI-Api-Key": os.environ["OPENAI_APIKEY"]  # Replace with your inference API key
    }
)

try:
    pass  # Replace with your code. Close client gracefully in the finally block.

finally:
    client.close()  # Close client gracefully

V3

import weaviate
import json

client = weaviate.Client(
    url = "https://some-endpoint.weaviate.network",  # Replace with your endpoint
    auth_client_secret=weaviate.auth.AuthApiKey(api_key="YOUR-WEAVIATE-API-KEY"),  # Replace w/ your Weaviate instance API key
    additional_headers = {
        "X-OpenAI-Api-Key": "YOUR-OPENAI-API-KEY"  # Replace with your inference API key
    }
)

4. Define a data collection

Next, we define a data collection (a “class” in Weaviate) to store objects in.

This is analogous to creating a table in relational (SQL) databases.


The following code:

  • Configures a class object with:
    • Name Question
    • Vectorizer module text2vec-openai
    • Generative module generative-openai
  • Then creates the class.

V4

    questions = client.collections.create(
        name="Question",
        vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_openai(),  # If set to "none" you must always provide vectors yourself. Could be any other "text2vec-*" also.
        generative_config=wvc.config.Configure.Generative.openai()  # Ensure the `generative-openai` module is used for generative queries
    )

V3

class_obj = {
    "class": "Question",
    "vectorizer": "text2vec-openai",  # If set to "none" you must always provide vectors yourself. Could be any other "text2vec-*" also.
    "moduleConfig": {
        "text2vec-openai": {},
        "generative-openai": {}  # Ensure the `generative-openai` module is used for generative queries
    }
}

client.schema.create_class(class_obj)
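In the v3 client the class definition is an ordinary dictionary, so it can be built and inspected before it is sent to Weaviate. The sketch below shows one way to do that; `build_class_definition` is a hypothetical helper, and it only reproduces the structure of the `class_obj` shown above.

```python
def build_class_definition(name, vectorizer="text2vec-openai", generative="generative-openai"):
    """Build a v3-style class definition dict with the same shape as class_obj above."""
    module_config = {}
    if vectorizer != "none":
        module_config[vectorizer] = {}  # Empty dict means "use the module's defaults"
    if generative:
        module_config[generative] = {}
    return {"class": name, "vectorizer": vectorizer, "moduleConfig": module_config}


class_obj = build_class_definition("Question")
print(class_obj)
```

Keeping the definition in a function like this makes it easy to create several collections with the same module setup.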

5. Add objects

You can now add objects to Weaviate. You will use a batch import process for maximum efficiency.

The guide covers using the vectorizer defined for the class to create a vector embedding for each object.


The following code:

  • Loads objects, and
  • Adds objects to the target class (Question) one by one.

V4

    resp = requests.get('https://raw.githubusercontent.com/weaviate-tutorials/quickstart/main/data/jeopardy_tiny.json')
    data = json.loads(resp.text)  # Load data

    question_objs = list()
    for i, d in enumerate(data):
        question_objs.append({
            "answer": d["Answer"],
            "question": d["Question"],
            "category": d["Category"],
        })

    questions = client.collections.get("Question")
    questions.data.insert_many(question_objs)  # This uses batching under the hood

V3

import requests
import json
resp = requests.get('https://raw.githubusercontent.com/weaviate-tutorials/quickstart/main/data/jeopardy_tiny.json')
data = json.loads(resp.text)  # Load data

client.batch.configure(batch_size=100)  # Configure batch
with client.batch as batch:  # Initialize a batch process
    for i, d in enumerate(data):  # Batch import data
        print(f"importing question: {i+1}")
        properties = {
            "answer": d["Answer"],
            "question": d["Question"],
            "category": d["Category"],
        }
        batch.add_data_object(
            data_object=properties,
            class_name="Question"
        )
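The import code does two things that are easy to try without a Weaviate instance: it renames the dataset's capitalized keys to lowercase property names, and it sends objects in groups rather than one request per object. The sketch below illustrates both ideas in plain Python; `to_properties` and `chunked` are hypothetical helpers, and the client libraries handle the actual batching themselves.

```python
def to_properties(record):
    """Map one raw dataset record to the property names used in the Question collection."""
    return {
        "answer": record["Answer"],
        "question": record["Question"],
        "category": record["Category"],
    }


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Three records in the dataset's shape (question text shortened for brevity).
raw = [
    {"Answer": "DNA", "Question": "In 1953 Watson & Crick built a model of ...", "Category": "SCIENCE"},
    {"Answer": "Liver", "Question": "This organ removes excess glucose ...", "Category": "SCIENCE"},
    {"Answer": "Elephant", "Question": "It's the only living mammal in the order ...", "Category": "ANIMALS"},
]

objects = [to_properties(r) for r in raw]
for batch_number, batch in enumerate(chunked(objects, 2), start=1):
    print(f"batch {batch_number}: {[obj['answer'] for obj in batch]}")
```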

6. Query

1) Semantic search

Let’s start with a similarity search. A nearText search looks for objects in Weaviate whose vectors are most similar to the vector for the given input text.

Run the following code to search for objects whose vectors are most similar to that of biology.


V4

import weaviate
import weaviate.classes as wvc
import os

client = weaviate.connect_to_wcs(
    cluster_url=os.getenv("WCS_CLUSTER_URL"),
    auth_credentials=weaviate.auth.AuthApiKey(os.getenv("WCS_API_KEY")),
    headers={
        "X-OpenAI-Api-Key": os.environ["OPENAI_APIKEY"]  # Replace with your inference API key
    }
)

try:
    pass  # Replace with your code. Close client gracefully in the finally block.
    questions = client.collections.get("Question")

    response = questions.query.near_text(
        query="biology",
        limit=2
    )

    print(response.objects[0].properties)  # Inspect the first object

finally:
    client.close()  # Close client gracefully

V3

import weaviate
import json

client = weaviate.Client(
    url = "https://some-endpoint.weaviate.network",  # Replace with your endpoint
    auth_client_secret=weaviate.auth.AuthApiKey(api_key="YOUR-WEAVIATE-API-KEY"),  # Replace w/ your Weaviate instance API key
    additional_headers = {
        "X-OpenAI-Api-Key": "YOUR-OPENAI-API-KEY"  # Replace with your inference API key
    }
)

response = (
    client.query
    .get("Question", ["question", "answer", "category"])
    .with_near_text({"concepts": ["biology"]})
    .with_limit(2)
    .do()
)

print(json.dumps(response, indent=4))

The result looks like this:

{
    "data": {
        "Get": {
            "Question": [
                {
                    "answer": "DNA",
                    "category": "SCIENCE",
                    "question": "In 1953 Watson & Crick built a model of the molecular structure of this, the gene-carrying substance"
                },
                {
                    "answer": "Liver",
                    "category": "SCIENCE",
                    "question": "This organ removes excess glucose from the blood & stores it as glycogen"
                }
            ]
        }
    }
}
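To build intuition for what "most similar vectors" means in a nearText search, the sketch below ranks a few stored vectors against a query vector by cosine similarity. The three-dimensional vectors are made up for illustration (real embeddings have far more dimensions), and cosine similarity is only one common measure; the metric an actual instance uses depends on its configuration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query_vector, stored, limit=2):
    """Return the keys of the `limit` stored vectors most similar to the query."""
    ranked = sorted(
        stored.items(),
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [key for key, _ in ranked[:limit]]


# Made-up vectors standing in for the embeddings of three stored objects.
stored = {
    "DNA": [0.9, 0.1, 0.0],
    "Liver": [0.8, 0.3, 0.1],
    "Elephant": [0.2, 0.9, 0.1],
}
query = [1.0, 0.2, 0.0]  # Made-up stand-in for the embedding of "biology"

print(nearest(query, stored, limit=2))  # → ['DNA', 'Liver']
```

A vector database performs the same kind of ranking, but uses an index so that it does not have to compare the query against every stored vector.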

2) Semantic search with a filter

You can add Boolean filters to searches. For example, the above search can be modified to only include objects that have a “category” value of “ANIMALS”. Run the following code to see the results:


V4

    questions = client.collections.get("Question")

    response = questions.query.near_text(
        query="biology",
        limit=2,
        filters=wvc.query.Filter.by_property("category").equal("ANIMALS")
    )

    print(response.objects[0].properties)  # Inspect the first object

V3

response = (
    client.query
    .get("Question", ["question", "answer", "category"])
    .with_near_text({"concepts": ["biology"]})
    .with_where({
        "path": ["category"],
        "operator": "Equal",
        "valueText": "ANIMALS"
    })
    .with_limit(2)
    .do()
)

print(json.dumps(response, indent=4))

The result looks like this:

{
    "data": {
        "Get": {
            "Question": [
                {
                    "answer": "Elephant",
                    "category": "ANIMALS",
                    "question": "It's the only living mammal in the order Proboseidea"
                },
                {
                    "answer": "the nose or snout",
                    "category": "ANIMALS",
                    "question": "The gavial looks very much like a crocodile except for this bodily feature"
                }
            ]
        }
    }
}
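The v3 client returns the result as a nested dictionary, so the matching objects sit under data → Get → Question. The sketch below pulls them out and checks that the filter held; `extract_objects` is a hypothetical helper, and the sample response is the one shown above with the question text shortened.

```python
def extract_objects(response, class_name="Question"):
    """Return the list of objects from a v3 Get response, or [] if the path is absent."""
    return response.get("data", {}).get("Get", {}).get(class_name) or []


# Same structure as the result above; question text shortened for brevity.
sample_response = {
    "data": {
        "Get": {
            "Question": [
                {"answer": "Elephant", "category": "ANIMALS", "question": "It's the only living mammal ..."},
                {"answer": "the nose or snout", "category": "ANIMALS", "question": "The gavial looks very much ..."},
            ]
        }
    }
}

objects = extract_objects(sample_response)
answers = [obj["answer"] for obj in objects]
all_animals = all(obj["category"] == "ANIMALS" for obj in objects)
print(answers, all_animals)  # → ['Elephant', 'the nose or snout'] True
```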

For more, see: https://weaviate.io/developers/weaviate/quickstart


Example use cases

This page illustrates various use cases for vector databases by way of open-source demo projects. You can fork and modify any of them.

If you would like to contribute your own project to this page, please let us know by creating an issue on GitHub.


Similarity search

https://weaviate.io/developers/weaviate/more-resources/example-use-cases#similarity-search

A vector database enables fast, efficient similarity searches on and across any modality, such as text or images, as well as their combinations. A vector database's similarity search capabilities can also be used for other complex use cases, such as recommendation systems in classical machine learning applications.

Title | Description | Modality | Code
Plant search | Semantic search over plants. | Text | JavaScript
Wine search | Semantic search over wines. | Text | Python
Book recommender system (Video, Demo) | Find book recommendations based on search query. | Text | TypeScript
Movie recommender system (Blog) | Find similar movies. | Text | JavaScript
Multilingual Wikipedia Search | Search through Wikipedia in multiple languages. | Text | TypeScript
Podcast search | Semantic search over podcast episodes. | Text | Python
Video Caption Search | Find the timestamp of the answer to your question in a video. | Text | Python
Facial Recognition | Identify people in images. | Image | Python
Image Search over dogs (Blog) | Find images of similar dog breeds based on uploaded image. | Image | Python
Text to image search | Find images most similar to a text query. | Multimodal | JavaScript
Text to image and image to image search | Find images most similar to a text or image query. | Multimodal | Python

LLMs and search

https://weaviate.io/developers/weaviate/more-resources/example-use-cases#llms-and-search

Vector databases and LLMs go together like cookies and milk!

Vector databases help to address some of the limitations of large language models (LLMs), such as hallucinations, by retrieving relevant information to provide to the LLM as part of its input.

Title | Description | Modality | Code
Verba, the golden RAGtriever (Video, Demo) | Retrieval-Augmented Generation (RAG) system to chat with Weaviate documentation and blog posts. | Text | Python
HealthSearch (Blog, Demo) | Recommendation system of health products based on symptoms. | Text | Python
Magic Chat | Search through Magic The Gathering cards. | Text | Python
AirBnB Listings (Blog) | Generation of customized advertisements for AirBnB listings with Generative Feedback Loops. | Text | Python
Distyll | Summarize text or video content. | Text | Python

Learn more in our LLMs and Search blog post.
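The retrieval step described above can be sketched without any LLM call: take the objects returned by a search and place them into the prompt as context. The helper `build_prompt` and the prompt wording are assumptions made for illustration; they are not taken from any of the listed projects.

```python
def build_prompt(question, retrieved, max_items=3):
    """Assemble a prompt that asks the model to answer from the retrieved objects only."""
    context_lines = [
        f"- {obj['question']} (answer: {obj['answer']})"
        for obj in retrieved[:max_items]
    ]
    return (
        "Answer using only the context below. If the context is not enough, say so.\n\n"
        "Context:\n" + "\n".join(context_lines) + "\n\n"
        f"Question: {question}\nAnswer:"
    )


# Objects in the shape returned by the quickstart queries (question text shortened).
retrieved = [
    {"answer": "DNA", "question": "In 1953 Watson & Crick built a model of ..."},
    {"answer": "Liver", "question": "This organ removes excess glucose ..."},
]

print(build_prompt("Which molecule carries genes?", retrieved))
```

The resulting string would then be sent to an LLM of your choice; grounding the answer in retrieved text is what helps reduce hallucinations.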


Classification

https://weaviate.io/developers/weaviate/more-resources/example-use-cases#classification

Weaviate can leverage its vectorization capabilities to enable automatic, real-time classification of unseen, new concepts based on its semantic understanding.

Title | Description | Modality | Code
Toxic Comment Classification | Classify whether a comment is toxic or non-toxic. | Text | Python
Audio Genre Classification | Classify the music genre of an audio file. | Image | Python
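One simple way to classify with vectors is a k-nearest-neighbours vote: find the labelled items whose vectors are closest to the new item and take the most common label. The sketch below illustrates that idea with made-up two-dimensional vectors; it is not a description of how Weaviate or the listed demos implement classification.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def classify(vector, labeled, k=3):
    """Return the majority label among the k labelled vectors most similar to `vector`."""
    ranked = sorted(labeled, key=lambda pair: cosine_similarity(vector, pair[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Made-up vectors standing in for embeddings of already-labelled comments.
labeled = [
    ([0.9, 0.1], "toxic"),
    ([0.8, 0.2], "toxic"),
    ([0.1, 0.9], "non-toxic"),
    ([0.2, 0.8], "non-toxic"),
    ([0.15, 0.85], "non-toxic"),
]

print(classify([0.85, 0.15], labeled))  # → toxic
print(classify([0.1, 0.95], labeled))   # → non-toxic
```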

Other use cases

https://weaviate.io/developers/weaviate/more-resources/example-use-cases#other-use-cases

Weaviate’s modular ecosystem unlocks many other use cases of the Weaviate vector database, such as Named Entity Recognition or spell checking.

Title | Description | Code
Named Entity Recognition (NER) | tbd | Python

2024-03-27 (Wed)
