Speech

Transcribe and align speech from any audio file, converting spoken content into text. This model is optimized for speech transcription. For singing or lyrics transcription, use the Lyrics Transcription module.

Documentation

Settings

Name         Type     Description
language     string   Expected spoken language of the audio. Defaults to Auto-detection.
alignment    string   The alignment that should be applied.
diarization  boolean  Apply speaker identification for each subtitle sentence based on input audio.

Input

Name          Type    Description
inputFileUrl  string  Audio to process.
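
As a rough illustration of how the settings and input fields fit together, the sketch below collects them into a single parameter dictionary and checks the documented types. The helper name `build_speech_params`, the flat dictionary layout, the example URL, and the `"en"` language value are all assumptions made for illustration. This page does not specify the request format or the accepted values for `language` and `alignment`, so `alignment` is only type-checked and left unset in the usage example.

```python
import json
from typing import Any, Dict, Optional


def build_speech_params(
    input_file_url: str,
    language: Optional[str] = None,
    alignment: Optional[str] = None,
    diarization: Optional[bool] = None,
) -> Dict[str, Any]:
    """Collect the documented fields into one dict (hypothetical helper).

    Settings left as None are omitted, so the service defaults apply
    (for example, language auto-detection).
    """
    if not isinstance(input_file_url, str) or not input_file_url:
        raise ValueError("inputFileUrl must be a non-empty string")

    params: Dict[str, Any] = {"inputFileUrl": input_file_url}

    # Documented type: string. Accepted values are not listed on this page.
    for name, value in (("language", language), ("alignment", alignment)):
        if value is not None:
            if not isinstance(value, str):
                raise TypeError(f"{name} must be a string")
            params[name] = value

    # Documented type: boolean.
    if diarization is not None:
        if not isinstance(diarization, bool):
            raise TypeError("diarization must be a boolean")
        params["diarization"] = diarization

    return params


if __name__ == "__main__":
    # Placeholder URL and language code, chosen only for the example.
    params = build_speech_params(
        "https://example.com/interview.wav",
        language="en",
        diarization=True,
    )
    print(json.dumps(params, indent=2))
```

Omitting unset settings keeps the dictionary minimal and avoids guessing at default values that are not documented here.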

Output

Name       Type    Description
alignment  string  Transcribed and aligned speech.

Related Modules

Beats

Transcribe beats and calculate the tempo (beats per minute) from an input audio file, providing you with useful information for reproducing the rhythm of the track.

Chords

Transcribe chords and root key from audio, providing a timeline of chord annotations in different classes (e.g., Complex Jazz, Simple Jazz, Complex Pop, and Simple Pop) and bass detection.

Lyrics

Transcribe and align lyrics from any audio file, converting sung content into textual form. This model is optimized for transcribing singing. For speech transcription, please use the Speech Transcription module.

Sections

Segment audio files into sections with annotated labels.

Translation

Translate transcriptions (speech or lyrics) to the desired language.
