The WhisperAPI.com transcription API is a high-quality speech-to-text API powered by the Whisper v3 model. It uses the same technology as our Speech-to-Text API but is hosted on WhisperAPI.com.
If you are starting from scratch, we recommend the Speech-to-Text API, as it is more feature-rich and has a more user-friendly interface.
If you haven't already, you will need to create an API key to authenticate your requests.
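// Runs in Node.js 18+ (global fetch, FormData, and Blob) or in a browser.
const body = new FormData();
body.append('url', 'https://output.lemonfox.ai/brownfox.mp3');
// Instead of providing a URL, you can also upload a file object.
// In Node.js this needs `import fs from 'node:fs/promises'` and an async context:
// body.append('file', new Blob([await fs.readFile('/path/to/audio.mp3')]));
body.append('language', 'english');

fetch('https://transcribe.whisperapi.com', {
  method: 'POST',
  headers: {
    // Do not set Content-Type yourself; fetch adds the multipart boundary.
    'Authorization': 'Bearer YOUR_API_KEY'
  },
  body: body
})
  .then(response => {
    if (!response.ok) throw new Error(`HTTP ${response.status}`);
    return response.json();
  })
  .then(data => {
    console.log(data);
  })
  .catch(error => console.error(error));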
Example response:
{
  "language": "en",
  "text": "The quick brown fox jumps over the lazy dog.",
  "segments": [
    {
      "start": 0,
      "end": 2.4,
      "text": " The quick brown fox jumps over the lazy dog.",
      "whole_word_timestamps": [
        {"word": "The", "start": 0, "end": 0.16, "timestamp": 0.16, "probability": 0.77197265625},
        {"word": "quick", "start": 0.16, "end": 0.34, "timestamp": 0.34, "probability": 0.90283203125},
        {"word": "brown", "start": 0.34, "end": 0.64, "timestamp": 0.64, "probability": 0.8623046875},
        {"word": "fox", "start": 0.64, "end": 0.98, "timestamp": 0.98, "probability": 0.982421875},
        {"word": "jumps", "start": 0.98, "end": 1.32, "timestamp": 1.32, "probability": 0.99658203125},
        {"word": "over", "start": 1.32, "end": 1.64, "timestamp": 1.64, "probability": 0.99951171875},
        {"word": "the", "start": 1.64, "end": 1.78, "timestamp": 1.78, "probability": 0.98974609375},
        {"word": "lazy", "start": 1.78, "end": 2.02, "timestamp": 2.02, "probability": 0.9716796875},
        {"word": "dog.", "start": 2.02, "end": 2.4, "timestamp": 2.4, "probability": 0.994140625}
      ]
    }
  ],
  "diarization": [
    {
      "startTime": 0.5034129692832765,
      "stopTime": 2.6194539249146755,
      "speaker": "SPEAKER_00"
    }
  ]
}
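The response can be post-processed directly in JavaScript. The sketch below assumes the JSON shape shown in the example response (segments with start, end, text, and whole_word_timestamps entries carrying word, start, end, and probability). The helper names lowConfidenceWords, srtTime, and toSrt and the 0.8 probability cutoff are hypothetical choices for illustration, not part of the API. It flags low-confidence words for review and turns each segment into an SRT subtitle cue.

```javascript
// Sketch: post-process a transcription response shaped like the example above.
// Assumes segments[].whole_word_timestamps entries carry word/start/end/probability.

// Collect words whose recognition probability falls below a threshold
// (0.8 is an arbitrary, hypothetical cutoff) so they can be flagged for review.
function lowConfidenceWords(response, threshold = 0.8) {
  const flagged = [];
  for (const segment of response.segments ?? []) {
    for (const w of segment.whole_word_timestamps ?? []) {
      if (w.probability < threshold) {
        flagged.push({ word: w.word, start: w.start, end: w.end, probability: w.probability });
      }
    }
  }
  return flagged;
}

// Format seconds as an SRT timestamp: HH:MM:SS,mmm
function srtTime(seconds) {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const pad = (n, width = 2) => String(n).padStart(width, '0');
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms, 3)}`;
}

// Turn each segment into one numbered SRT cue (index, time range, trimmed text).
function toSrt(response) {
  return (response.segments ?? [])
    .map((seg, i) => `${i + 1}\n${srtTime(seg.start)} --> ${srtTime(seg.end)}\n${seg.text.trim()}\n`)
    .join('\n');
}
```

In the request example, toSrt(data) could be called inside the second .then handler instead of logging the raw object.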
The endpoint POST https://transcribe.whisperapi.com accepts the following parameters, sent as multipart form data:
The audio file to transcribe, uploaded directly as a file object. The upload size is limited to 100MB. Only one of file or url can be provided. Supported audio and video file formats: mp3, wav, flac, aac, opus, ogg, m4a, mp4, mpeg, mov, webm, and more.
A public URL to an audio file. The API downloads the file and transcribes it. The maximum file size is 500MB. Only one of file or url can be provided. Supported audio and video file formats: mp3, wav, flac, aac, opus, ogg, m4a, mp4, mpeg, mov, webm, and more.
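Because only one of file or url may be sent, it can help to enforce that rule, along with the 100MB upload limit, before making the request. This is a minimal sketch assuming Node.js 18+ (global FormData and Blob) and that exactly one audio source is required. buildTranscriptionForm and MAX_UPLOAD_BYTES are hypothetical names, and the limit uses decimal megabytes because the docs do not say whether 100MB means 10^6 or 2^20 bytes.

```javascript
// Minimal sketch (Node.js 18+, global FormData/Blob): build the multipart body
// for POST https://transcribe.whisperapi.com with exactly one audio source.
// buildTranscriptionForm and MAX_UPLOAD_BYTES are hypothetical names.

// Documented upload limit is 100MB; decimal megabytes assumed (the stricter reading).
const MAX_UPLOAD_BYTES = 100 * 1000 * 1000;

function buildTranscriptionForm({ file, filename, url, language } = {}) {
  const hasFile = file !== undefined;
  const hasUrl = url !== undefined;
  // The docs allow only one of file or url; this sketch also assumes one is required.
  if (hasFile === hasUrl) {
    throw new Error('Provide exactly one of "file" or "url".');
  }
  const body = new FormData();
  if (hasFile) {
    // Accept a Blob directly, or wrap raw bytes (e.g. a Buffer from fs.readFile).
    const blob = file instanceof Blob ? file : new Blob([file]);
    if (blob.size > MAX_UPLOAD_BYTES) {
      throw new Error(`File is ${blob.size} bytes; uploads are limited to 100MB (URLs allow 500MB).`);
    }
    // Pass a filename when known so the upload carries the original extension.
    if (filename) {
      body.append('file', blob, filename);
    } else {
      body.append('file', blob);
    }
  } else {
    // The API downloads the file from this public URL itself.
    body.append('url', url);
  }
  if (language !== undefined) {
    body.append('language', language);
  }
  return body;
}
```

The returned FormData can be passed as the body of the fetch call in the request example. Local files over 100MB can instead be hosted at a public URL, which allows up to 500MB.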
Set this parameter to true to enable speaker diarization, which adds speaker labels to the transcript.
The number of speakers for diarization. If this parameter is left blank, the model automatically detects the number of speakers. This parameter is only used when diarization is set to true.
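With diarization enabled, the response contains a diarization array of speaker turns (startTime, stopTime, speaker) alongside the transcript segments, as in the example response. One way to get a speaker-labelled transcript is to give each segment the speaker whose turns overlap it the most. This overlap rule is an illustrative choice rather than documented API behavior, labelSegmentsBySpeaker is a hypothetical helper, and the request is assumed to send the flag as the string form field diarization=true.

```javascript
// Sketch: attach a speaker label to each transcript segment using the
// diarization turns in the response (startTime/stopTime/speaker, as in the example).
// Assumes the request included body.append('diarization', 'true').
// The overlap-based matching is an illustrative choice, not documented API behavior.

// Length of the overlap between two [start, end] intervals (0 if disjoint).
function overlap(aStart, aEnd, bStart, bEnd) {
  return Math.max(0, Math.min(aEnd, bEnd) - Math.max(aStart, bStart));
}

function labelSegmentsBySpeaker(response) {
  const turns = response.diarization ?? [];
  return (response.segments ?? []).map((seg) => {
    // Sum the overlapping time per speaker for this segment.
    const totals = new Map();
    for (const turn of turns) {
      const o = overlap(seg.start, seg.end, turn.startTime, turn.stopTime);
      if (o > 0) totals.set(turn.speaker, (totals.get(turn.speaker) ?? 0) + o);
    }
    // Pick the speaker with the largest total overlap; null if no turn overlaps.
    let speaker = null;
    let best = 0;
    for (const [label, total] of totals) {
      if (total > best) {
        best = total;
        speaker = label;
      }
    }
    return { speaker, start: seg.start, end: seg.end, text: seg.text.trim() };
  });
}
```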
The language of the input audio. If no language is provided, it is detected automatically. Supplying the input language can improve accuracy and latency.
Supported languages: english, chinese, german, spanish, russian, korean, french, japanese, portuguese, turkish, polish, catalan, dutch, arabic, swedish, italian, indonesian, hindi, finnish, vietnamese, hebrew, ukrainian, greek, malay, czech, romanian, danish, hungarian, tamil, norwegian, thai, urdu, croatian, bulgarian, lithuanian, latin, maori, malayalam, welsh, slovak, telugu, persian, latvian, bengali, serbian, azerbaijani, slovenian, kannada, estonian, macedonian, breton, basque, icelandic, armenian, nepali, mongolian, bosnian, kazakh, albanian, swahili, galician, marathi, punjabi, sinhala, khmer, shona, yoruba, somali, afrikaans, occitan, georgian, belarusian, tajik, sindhi, gujarati, amharic, yiddish, lao, uzbek, faroese, haitian creole, pashto, turkmen, nynorsk, maltese, sanskrit, luxembourgish, myanmar, tibetan, tagalog, malagasy, assamese, tatar, hawaiian, lingala, hausa, bashkir, javanese, sundanese, cantonese, burmese, valencian, flemish, haitian, letzeburgesch, pushto, panjabi, moldavian, moldovan, sinhalese, castilian, mandarin.
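Because the documented values are lowercase language names rather than codes (the request example sends english, while the response reports en), a small normalization step can catch typos before a request is made. This sketch assumes the API expects the lowercase names exactly as listed above. SUPPORTED_LANGUAGES copies only a handful of them for brevity, and normalizeLanguage is a hypothetical helper.

```javascript
// Sketch: normalize a user-supplied language name before sending it as the
// `language` form field. Assumes the API expects the lowercase names exactly as
// listed in the docs; this Set holds only a few entries, so extend it from the full list.
const SUPPORTED_LANGUAGES = new Set([
  'english', 'chinese', 'german', 'spanish', 'french', 'japanese', 'portuguese', 'haitian creole',
]);

// Returns the normalized name, or undefined so the field can be omitted
// and the API falls back to automatic language detection.
function normalizeLanguage(input) {
  if (input === undefined || input === null || input.trim() === '') return undefined;
  // Lowercase and collapse internal whitespace, e.g. "Haitian  Creole" -> "haitian creole".
  const name = input.trim().toLowerCase().replace(/\s+/g, ' ');
  if (!SUPPORTED_LANGUAGES.has(name)) {
    throw new Error(`Unsupported language "${input}"; omit it to use auto-detection.`);
  }
  return name;
}
```

A caller would then append the field only when a value is returned, for example: const lang = normalizeLanguage(userInput); if (lang) body.append('language', lang);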
The operation to perform: transcribe (default) or translate. If set to translate, the API translates the transcript into English.
A text to guide the transcript's style or to continue a previous audio transcript. The prompt should be in the same language as the audio.
Examples:
"The transcript is about blockchain technology, including terms like NFTs and DeFi." Alternatively, the prompt can be a simple list of words: "NFT, DeFi, DAO, DApp".
"Hello, welcome to the podcast."
"Umm, let's see, hmm... Okay, here's what I'm, like, thinking."
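A prompt can carry either a list of domain terms or the tail of a previous transcript. The sketch below builds both as plain strings. vocabularyPrompt and continuationPrompt are hypothetical helpers, and the 50-word cap is an arbitrary assumption motivated by Whisper considering only a limited number of prompt tokens. The form field name that carries the prompt is not shown in this section, so the sketch does not send anything.

```javascript
// Sketch: build prompt strings in the two styles described above.
// Both helpers are hypothetical; sending the prompt is left to the caller.

// 1) A plain vocabulary list, e.g. ['NFT', 'DeFi'] -> "NFT, DeFi".
function vocabularyPrompt(terms) {
  // Trim entries, drop blanks, and remove duplicates while preserving order.
  return [...new Set(terms.map((t) => t.trim()).filter(Boolean))].join(', ');
}

// 2) Continue a previous transcript: when audio is split into chunks, pass the
// last words of the previous chunk's text as the prompt for the next chunk.
// maxWords = 50 is an arbitrary cap; only the most recent words are kept.
function continuationPrompt(previousText, maxWords = 50) {
  if (maxWords <= 0) return '';
  const words = previousText.trim().split(/\s+/).filter(Boolean);
  return words.slice(-maxWords).join(' ');
}
```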
A URL to which the API will send a POST request with the transcription results when the transcription is complete.
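If you pass a callback URL, your server needs an endpoint that accepts that POST request. This minimal Node.js sketch assumes, without confirmation from this page, that the callback body is JSON shaped like the synchronous response (language, text, segments). parseCallbackPayload and handleCallback are hypothetical names, and replying with status 200 is an assumed acknowledgement convention.

```javascript
// Sketch: handle the POST request sent to your callback URL.
// Assumption (not confirmed by the docs): the body is JSON shaped like the
// synchronous response (language, text, segments, ...).

// Validate and extract the fields we care about from the raw request body.
function parseCallbackPayload(raw) {
  const data = JSON.parse(raw);
  if (typeof data.text !== 'string') {
    throw new Error('Callback payload has no "text" field');
  }
  return { language: data.language, text: data.text, segmentCount: (data.segments ?? []).length };
}

// Node.js request handler: buffer the body, parse it, and acknowledge with 200.
// Wire it up with: require('node:http').createServer(handleCallback).listen(3000);
function handleCallback(req, res) {
  if (req.method !== 'POST') {
    res.writeHead(405).end();
    return;
  }
  let raw = '';
  req.setEncoding('utf8');
  req.on('data', (chunk) => { raw += chunk; });
  req.on('end', () => {
    try {
      const result = parseCallbackPayload(raw);
      console.log(`Transcript received (${result.language}): ${result.text}`);
      res.writeHead(200).end();
    } catch (err) {
      res.writeHead(400, { 'Content-Type': 'text/plain' }).end(String(err.message));
    }
  });
}
```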