The TranscribeSpeech node transcribes speech from audio or video files. Supported input types include:
- Base-64 encoded data strings (if your media is small enough to fit in a request payload). Be sure to include the data: prefix with a mime type.
- Hosted media URLs (with a wide range of supported formats)
- YouTube URLs
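As an illustration of the base-64 option, the sketch below builds a data URI with the data: prefix and a mime type from raw media bytes. The helper names to_data_uri and file_to_data_uri are hypothetical and not part of any SDK; only the Python standard library is used, and the mime type guess is based on the file extension alone.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_uri(data: bytes, mime_type: str) -> str:
    """Encode raw media bytes as a base-64 data URI with an explicit mime type."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def file_to_data_uri(path: str) -> str:
    """Read a local file and guess its mime type from the file extension."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None:
        # Refuse to guess further: the prefix must carry a real mime type.
        raise ValueError(f"cannot guess a mime type for {path!r}")
    return to_data_uri(Path(path).read_bytes(), mime_type)
```

Base-64 encoding grows the payload by roughly a third, so for larger files a hosted URL is likely the better fit.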
TranscribeSpeech also includes these built-in capabilities:
- segmentation by sentence
- diarization (speaker identification)
- alignment to word-level timestamps
- automatic chapter detection
To simply transcribe input without further processing, provide an audio_uri. This can be a publicly hosted audio or video file, base-64-encoded audio or video data, or a privately hosted external file. For best results, you may also provide a prompt that describes the content of the audio or video.
Output
{ "text": "language like that, the wounded inner child, the inner pain, is part of a kind of pop psychological movement in the United States that is a sort of popular Freudianism that ..."}
To enable additional capabilities, set:
- segment: True to return a list of sentence segments with start and end timestamps.
- align: True to return a list of aligned words within sentence segments.
- diarize: True to include speaker IDs within segments and words.
- suggest_chapters: True to return a list of suggested chapters with titles and start timestamps.
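One way to keep these options tidy is to collect them in a plain dictionary before creating the node. The sketch below uses only the argument names mentioned on this page (audio_uri, prompt, segment, align, diarize, suggest_chapters); the helper transcribe_args is hypothetical, and how the resulting arguments are handed to the node in a given SDK is an assumption that should be checked against that SDK's reference.

```python
from typing import Any, Dict, Optional


def transcribe_args(
    audio_uri: str,
    prompt: Optional[str] = None,
    *,
    segment: bool = False,
    align: bool = False,
    diarize: bool = False,
    suggest_chapters: bool = False,
) -> Dict[str, Any]:
    """Build TranscribeSpeech arguments, leaving out anything not requested."""
    args: Dict[str, Any] = {"audio_uri": audio_uri}
    if prompt is not None:
        args["prompt"] = prompt
    flags = {
        "segment": segment,
        "align": align,
        "diarize": diarize,
        "suggest_chapters": suggest_chapters,
    }
    # Only include the capabilities that were explicitly switched on.
    args.update({name: True for name, enabled in flags.items() if enabled})
    return args
```

Calling transcribe_args with only an audio_uri yields the plain transcription case, while passing segment=True and diarize=True adds just those two keys.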
Output
{ "text": "language like that, the wounded inner child, the inner pain, is part of a kind of pop psychological movement in the United States that is a sort of popular Freudianism that ...", "segments": [ { "start": 0.874, "end": 15.353, "speaker": "SPEAKER_00", "text": "language like that, the wounded inner child, the inner pain, is part of a kind of pop psychological movement in the United States that is a sort of popular Freudianism that", "words": [ { "word": "language", "start": 0.874, "end": 1.275, "speaker": "SPEAKER_00" }, { "word": "like", "start": 1.295, "end": 1.455, "speaker": "SPEAKER_00" } ] } ], "chapters": [ { "title": "Introduction to the Wounded Inner Child and Popular Psychology in US", "start": 0.794 }, { "title": "The Paradox of Popular Psychology and Anger in America", "start": 16.186 } ]}
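The output above is plain JSON, so it can be post-processed with ordinary dictionary access. The following sketch turns segments into SRT caption cues and chapters into a list of mm:ss lines. It assumes the response has already been parsed into Python dictionaries with the field names shown in the sample (segments, start, end, speaker, text, chapters, title); the function names are hypothetical.

```python
from typing import Any, Dict, List


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: List[Dict[str, Any]]) -> str:
    """Render sentence segments as numbered SRT cues."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        # The speaker field is only present when diarization was requested.
        speaker = seg.get("speaker")
        text = f"{speaker}: {seg['text']}" if speaker else seg["text"]
        timing = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        cues.append(f"{index}\n{timing}\n{text}\n")
    return "\n".join(cues)


def chapter_lines(chapters: List[Dict[str, Any]]) -> List[str]:
    """Render suggested chapters as 'm:ss Title' lines."""
    lines = []
    for chapter in chapters:
        minutes, secs = divmod(int(chapter["start"]), 60)
        lines.append(f"{minutes}:{secs:02d} {chapter['title']}")
    return lines
```

Chapter start times are truncated to whole seconds here, which is a presentation choice rather than anything the API requires.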
You can customize the chapter summarization feature by implementing your own pipeline. To learn how to do this, and to see an example of how to use text segments to create an animated captions experience, check out our runnable example on val.town. You can also find this example in the examples/descript directory of the substrate-python and substrate-typescript SDK repositories.
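To give a rough idea of how word-level timestamps can drive animated captions, the sketch below splits a segment's words into already spoken, currently active, and upcoming groups for a given playback time. This is an independent illustration and not the code from the val.town example; it assumes a segment dictionary with a words list shaped like the sample output (word, start, end), and caption_state is a hypothetical name.

```python
from typing import Any, Dict, List, Optional, Tuple


def caption_state(
    segment: Dict[str, Any], t: float
) -> Tuple[List[str], Optional[str], List[str]]:
    """Split a segment's words into (spoken, active, upcoming) at time t seconds."""
    spoken: List[str] = []
    active: Optional[str] = None
    upcoming: List[str] = []
    for word in segment["words"]:
        if word["end"] <= t:
            spoken.append(word["word"])      # finished before t
        elif word["start"] <= t:
            active = word["word"]            # t falls inside this word
        else:
            upcoming.append(word["word"])    # starts after t
    return spoken, active, upcoming
```

A renderer could call this on every animation frame and highlight the active word; in the gaps between words the active value is None, so the previous highlight can simply be cleared or held.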