Introduction

Substrate is a powerful SDK for building with AI, with batteries included: language models, image generation, built-in vector storage, sandboxed code execution, and more. To use Substrate, you connect tasks, and then run the workflow. With this approach, you can create AI systems (from RAG, to agents, to multi-modal generative experiences) simply by describing the computation, with zero additional abstractions.

Substrate is also a workflow execution and inference engine, optimized for running compound AI workloads. Wiring together multiple inference APIs is inherently slow – whether you do it yourself, or use a framework like LangChain. Substrate lets you ditch the framework, write less code, and run compound AI fast.

Why Substrate?

1. Simple abstractions that run fast.

Principled API design runs deep in our DNA: Ben was at Stripe for nearly a decade. At Substrate, we believe the new discipline of building AI-integrated software needs simpler abstractions. Compare the cognitive overhead of using Substrate vs. LangChain in this example:

Substrate:

from substrate import ComputeText, Substrate, sb

s = Substrate(api_key="SUBSTRATE_API_KEY")

topic1 = "a magical forest"
topic2 = "a futuristic city"

# story1 and story2 don't depend on each other, so Substrate runs them in parallel.
story1 = ComputeText(prompt=f"Tell me a story about {topic1}")
story2 = ComputeText(prompt=f"Tell me a story about {topic2}")

# summary depends on both stories, referenced through their future outputs.
summary = ComputeText(
    prompt=sb.format(
        "Summarize these two stories:\nStory 1: {story1}\nStory 2: {story2}",
        story1=story1.future.text,
        story2=story2.future.text,
    ),
)

response = s.run(summary)

Chaining a few LLM calls is surprisingly complex using LangChain. The equivalent LangChain example requires 3x more code and several extra abstractions:

  • ChatPromptTemplate – Substrate lets you simply use format strings, or even Jinja templates.
  • StrOutputParser – Substrate uses idiomatic tools (like format strings) to transform outputs, following the Unix philosophy: "Don't clutter output with extraneous information" (a short sketch follows this list).
  • RunnableParallel – Substrate automatically parallelizes your workflow. Because story1 and story2 don't depend on anything, Substrate runs them in parallel.
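
To make the first two points concrete, here is a minimal sketch of output handling, continuing the example above. The response.get accessor follows the pattern shown in the SDK example, but treat the exact accessor and field names as assumptions.

# Prompts are plain f-strings, and outputs are plain Python values.
# (response.get(...) is assumed here, following the SDK example above.)
question = ComputeText(prompt=f"List three facts about {topic1}")
response = s.run(question)
facts = response.get(question).text.splitlines()  # transform the output with ordinary Python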

Substrate's unique approach lets you work with simple abstractions at the conceptual layer, and trust Substrate to run your workload fast when you call substrate.run:

  • First, we analyze your workload as a directed acyclic graph and optimize the graph: for example, merging nodes that can be run in a batch.
  • Then, we schedule the graph with optimized parallelism. No more async programming: just connect nodes, and let Substrate parallelize your workload (see the sketch below).
  • Our infrastructure guarantees optimized data locality. Your entire workload runs in the same cluster (often on the same machine), so you won't spend fractions of a second per task on unnecessary data roundtrips and cross-region HTTP transport.
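
As a rough sketch of what this means in practice, continuing the example above: there is no async plumbing to write. Passing multiple nodes to run, and the response.get accessor, are assumptions based on the SDK pattern shown earlier.

# The three title nodes have no dependencies on each other, so Substrate's
# scheduler runs them concurrently; there is no asyncio.gather to write.
# (Passing multiple nodes to run, and response.get, are assumptions here.)
titles = [ComputeText(prompt=f"Invent a book title about {t}") for t in ("tea", "trains", "tides")]
response = s.run(*titles)
for node in titles:
    print(response.get(node).text)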

2. Unified platform to build fast.

Substrate is a unified platform for building compound AI systems, and hosts a comprehensive suite of high-performance tools. But you don't have to run everything on Substrate – we also let you connect external providers. Substrate includes:

Component | Replaces
Curated library of optimized AI models | Modal, Baseten, Replicate, Together, Fireworks, Fal
Simple abstractions for compound AI | LangGraph, LangChain, Martian, Baseten, Modal
Built-in vector storage | Pinecone, Weaviate, Vespa, Zilliz, Meilisearch
Built-in code interpreter | E2B, Modal

Our unified approach makes Substrate the best choice for building fast. And because everything is colocated, your workloads also run fast.
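
As one illustration of how these built-in pieces compose on a single platform, here is a hedged sketch that feeds one language-model node into both a caption node and an image-generation node. GenerateImage and its parameters are assumptions for illustration, not confirmed API.

from substrate import ComputeText, GenerateImage, Substrate, sb

s = Substrate(api_key="SUBSTRATE_API_KEY")

# One text node fans out to a caption node and an image node, all colocated
# on Substrate. (GenerateImage and its parameters are assumed for illustration.)
scene = ComputeText(prompt="Describe a dramatic mountain landscape in one paragraph.")
caption = ComputeText(
    prompt=sb.format("Write a one-line caption for: {scene}", scene=scene.future.text),
)
image = GenerateImage(prompt=scene.future.text)

response = s.run(caption, image)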

3. Zero infrastructure to manage.

Some inference APIs (Baseten, Replicate, Fal) are actually thin wrappers over serverless GPUs. The performance of serverless providers is unreliable: you'll run into unpredictable cold starts. And serverless compute is inherently more expensive: you'll pay for GPU time at the highest market rates, and you're billed for idle compute time (during spinup, and waiting for the idle timeout to trigger a scale down). When you're thinking about these things, you're still managing infra.

All inference API providers impose rate limits. You may have seen or implemented code like the snippet below to handle rate-limit errors with exponential backoff. Here we are, managing infra again.

Others:

import asyncio

import together

# Retry with exponential backoff when the provider returns a rate-limit error.
# (Client and model setup elided.)
for sleep_time in [1, 2, 4]:
    try:
        # ...
        response = await async_client.chat.completions.create(
            model=model,
            messages=messages,
        )
        break
    except together.error.RateLimitError as e:
        print(e)
        await asyncio.sleep(sleep_time)

Instead of rate limits, Substrate uses a concurrency limit – the number of nodes that you can run in parallel. Just call substrate.run: Substrate knows your concurrency limit, and manages scheduling accordingly.
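
For contrast with the snippet above, here is a hedged sketch of the Substrate side: there is no retry loop, because scheduling stays within your concurrency limit. Passing multiple nodes to run, and the response.get accessor, are assumptions following the examples earlier on this page.

from substrate import ComputeText, Substrate

substrate = Substrate(api_key="SUBSTRATE_API_KEY")

# No backoff loop: Substrate schedules these nodes within your concurrency
# limit, running as many in parallel as the limit allows.
nodes = [ComputeText(prompt=f"Summarize document {i}") for i in range(20)]
response = substrate.run(*nodes)

for node in nodes:
    print(response.get(node).text)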