Guides
Store & query vectors
֍
Store and query vectors using Substrate's built-in vector storage

Substrate comes with built-in vector storage, which you can use to store and query generated embeddings. It's performant, colocated with the rest of your workload, and much more cost-effective than alternative vector database providers, like Pinecone, Supabase, or Weaviate.

This guide embeds a set of common phrases used to enhance image generation prompts – and then queries the vector store, recommending phrases to enhance a given prompt.

To create a new vector store, use FindOrCreateVectorStore. Give your store a collection_name, and specify the embedding model.

Learn more: embedding models
  • jina-v2 is a popular model for embedding text.
  • clip is a popular model for embedding text and images.
Python
TypeScript

create = FindOrCreateVectorStore(
collection_name="image_prompt_enhancements",
model="jina-v2",
);
create_res = substrate.run(create)

To embed data, use EmbedText. Provide the data to embed, the collection_name, and the embedding model. Below, we create an array of embedding nodes – Substrate automatically runs these nodes in parallel because they have no upstream dependencies.

Python
TypeScript

enhancements = [
"highly detailed",
# ...
"sharp focus",
]
nodes = []
for e in enhancements:
embed = EmbedText(
text=e,
collection_name="image_prompt_enhancements",
model="jina-v2",
)
nodes.append(embed)
embed_res = substrate.run(*nodes)

To query a vector store, use QueryVectorStore. Provide the query string, collection_name, and embedding model.

Learn more: QueryVectorStore parameters
  • Set include_metadata to True to include metadata in the response. The metadata includes the embedded text in the doc field.
  • Set top_k to 3 to retrieve only the top 3 most similar results.
  • To query images using a multimodal embedding model like clip, provide query_image_uris.
  • Multiple queries can be run in a batch – simply pass multiple query strings or images.
Python
TypeScript

query = QueryVectorStore(
query_strings=["a towering shell the size of a city skyscraper"],
collection_name="image_prompt_enhancements",
model="jina-v2",
include_metadata=True,
top_k=3,
)
query_res = substrate.run(query)
query_out = query_res.get(query)

The output of QueryVectorStore has query results in the results field, which is a list of lists. In this example, it contains a single list of results. If we instead provided two query_strings, it would contain two lists of results, one for each query string.

Output

{
"results": [
[
{
"id": "079ee5765c8c4df98b50bdb7b5cbdd29",
"distance": -0.723642945289612,
"vector": null,
"metadata": {
"doc": "cell shaded cartoon",
"doc_id": "079ee5765c8c4df98b50bdb7b5cbdd29"
}
},
{
"id": "98ec8bb1da1243d88721645fc0a8899b",
"distance": -0.717301785945892,
"vector": null,
"metadata": {
"doc": "cinematic",
"doc_id": "98ec8bb1da1243d88721645fc0a8899b"
}
},
{
"id": "158f2fc695e648878d245fdf93fa2917",
"distance": -0.715586066246033,
"vector": null,
"metadata": {
"doc": "wide shot",
"doc_id": "158f2fc695e648878d245fdf93fa2917"
}
}
]
],
"collection_name": null,
"model": "jina-v2",
"metric": "inner"
}