Languages

See which languages Speechmatics supports for transcription and translation, including bilingual packs.

To choose a transcription model, refer to Models.

The languages, packs, and options on this page apply to the Enhanced and Standard models. The Melia 1 model is multilingual: it transcribes the individual languages listed here and switches between them automatically, so you do not select a language. You can pass the codes of those individual languages as language hints. Melia 1 does not support the auto option, the bilingual and multi-language pack codes, or translation. For Melia 1, refer to Models.

Transcription languages

To automatically identify the language in an audio file, use the Language Identification feature.

To dynamically update your system with the latest languages and features offered by Speechmatics, use the Feature Discovery endpoint.

Speechmatics supports the following languages. Your ability to use any or all of them depends on the languages you are contracted to use.

Speechmatics takes a global-first approach to languages. A single language pack supports many accents and dialects, so you do not need to know which accent is in your audio before selecting a language. This approach achieves high accuracy compared to accent-specific language packs.

| Language | Language code | Description |
| --- | --- | --- |
| Automatic | auto | Automatically detect the language using the Language Identification feature. Currently supported with Batch transcription only. |
| Arabic | ar | Global Arabic gives high-accuracy transcription across many accents and dialects, including (but not limited to) Modern Standard Arabic (MSA) and Arabic spoken in the Gulf, Egypt, and the Levant. |
| Arabic & English bilingual | ar_en | Ideal when transcribing Arabic and English in the same media file or stream. Supports all accents and dialects listed under Arabic and English. |
| Bashkir | ba | |
| Basque | eu | |
| Belarusian | be | |
| Bengali | bn | |
| Bulgarian | bg | |
| Cantonese | yue | |
| Catalan | ca | |
| Croatian | hr | |
| Czech | cs | |
| Danish | da | |
| Dutch | nl | |
| English | en | Global English gives high-accuracy transcription across many accents, including (but not limited to) English spoken in the United Kingdom, United States, Australia, New Zealand, and by non-native speakers. To standardize spelling, specify the output locale. |
| Esperanto | eo | |
| Estonian | et | |
| Finnish | fi | |
| French | fr | Global French gives high-accuracy transcription across many accents, including (but not limited to) French spoken in France, Canada, and Belgium. |
| Galician | gl | |
| German | de | Global German gives high-accuracy transcription across many accents, including (but not limited to) German spoken in Germany, Austria, and Switzerland. |
| Greek | el | |
| Hebrew | he | |
| Hindi | hi | |
| Hungarian | hu | |
| Indonesian | id | |
| Interlingua | ia | |
| Irish | ga | |
| Italian | it | |
| Japanese | ja | |
| Korean | ko | |
| Latvian | lv | |
| Lithuanian | lt | |
| Malay | ms | |
| Malay & English bilingual | en_ms | Ideal when transcribing Malay and English in the same media file or stream. Supports all accents and dialects listed under Malay and English. |
| Maltese | mt | |
| Mandarin | cmn | Global Mandarin can output Traditional or Simplified characters and gives high-accuracy transcription across many accents, including (but not limited to) China, Taiwan, Singapore, and Malaysia. |
| Mandarin & English bilingual | cmn_en | Ideal when transcribing Mandarin and English in the same media file or stream. Supports all accents and dialects listed under Mandarin and English. |
| Mandarin Malay Tamil & English | cmn_en_ms_ta | Ideal when transcribing Mandarin, Malay, Tamil, and English in the same media file or stream. Supports all accents and dialects listed under Mandarin, Malay, Tamil, and English. |
| Marathi | mr | |
| Mongolian | mn | |
| Norwegian | no | |
| Persian | fa | |
| Polish | pl | |
| Portuguese | pt | Global Portuguese gives high-accuracy transcription across many accents, including (but not limited to) Portuguese spoken in Portugal and Brazil. |
| Romanian | ro | |
| Russian | ru | |
| Slovakian | sk | |
| Slovenian | sl | |
| Spanish | es | Global Spanish gives high-accuracy transcription across many accents, including (but not limited to) Spanish spoken in Spain, the US, Mexico, Colombia, Argentina, Venezuela, Chile, and Peru. |
| Spanish & English bilingual | es (with domain=bilingual-en) | Ideal when transcribing Spanish and English in the same media file or stream. Supports all accents and dialects listed under English and Spanish. Requires the domain config to be set. |
| Swahili | sw | |
| Swedish | sv | |
| Tagalog (Filipino) & English bilingual | tl | Ideal when transcribing Tagalog (Filipino) and English in the same media file or stream. Supports all accents and dialects listed under English. |
| Tamil | ta | |
| Tamil & English bilingual | en_ta | Ideal when transcribing Tamil and English in the same media file or stream. Supports all accents and dialects listed under Tamil and English. |
| Thai | th | |
| Turkish | tr | |
| Ukrainian | uk | |
| Urdu | ur | |
| Uyghur | ug | |
| Vietnamese | vi | |
| Welsh | cy | Welsh must be explicitly added to the expected languages list when using the Language Identification feature. Otherwise a language not supported for transcription error is returned. |
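
As a rough illustration of the Automatic and Welsh rows, the sketch below builds a Batch job config that sets the language to auto and lists the expected languages. The `language_identification_config` block and its `expected_languages` key are assumptions about how the Language Identification feature is configured and are not defined on this page, and `build_auto_config` is a hypothetical helper, so check the Language Identification documentation before relying on the names.

```python
import json


def build_auto_config(expected_languages):
    """Build a Batch config that asks for automatic language detection.

    Assumption: expected languages are passed under
    language_identification_config.expected_languages.
    """
    config = {
        "type": "transcription",
        "transcription_config": {"language": "auto"},
    }
    if expected_languages:
        # De-duplicate while keeping the caller's order.
        config["language_identification_config"] = {
            "expected_languages": list(dict.fromkeys(expected_languages))
        }
    return config


# Welsh (cy) is listed explicitly so that Welsh audio can be identified.
config = build_auto_config(["en", "cy", "en"])
print(json.dumps(config, indent=2))
```

Because automatic detection is listed as Batch only, a config like this would not apply to a Realtime session.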

Each language is uniquely identified by a two-letter code (ISO 639-1) or three-letter code (ISO 639-3) in API requests and responses.
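
A minimal sketch of how a client might check the shape of a language code before sending a request. The pattern only reflects the code shapes that appear in the table above (two or three lowercase letters, optionally joined by underscores for packs, plus `auto`); it does not confirm that a code is supported or contracted. `build_transcription_config` is a hypothetical helper, and the `output_locale` key with a value such as `en-GB` is an assumption about how the output locale mentioned in the English row is set.

```python
import re

# Shapes seen in the table above: a two- or three-letter code, or several
# such codes joined by underscores for a pack (for example "cmn_en").
_CODE_RE = re.compile(r"[a-z]{2,3}(_[a-z]{2,3})*")


def build_transcription_config(language, output_locale=None):
    """Return a job config for one language code after a format check."""
    if language != "auto" and not _CODE_RE.fullmatch(language):
        raise ValueError(f"unexpected language code format: {language!r}")
    transcription_config = {"language": language}
    if output_locale is not None:
        # Assumption: the output locale is set with an "output_locale" key.
        transcription_config["output_locale"] = output_locale
    return {"type": "transcription", "transcription_config": transcription_config}


cantonese = build_transcription_config("yue")
british_english = build_transcription_config("en", output_locale="en-GB")
```

Passing a language name such as "english" raises `ValueError`, which catches a common mistake before the request is sent.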

Translation languages

Translation is available with the Enhanced and Standard models. It is supported for most Speechmatics languages, with the supported translation pairs listed below. For more details, see Translation.

| Audio language | Translation target language |
| --- | --- |
| English (en) | Bulgarian (bg), Catalan (ca), Mandarin (cmn), Czech (cs), Danish (da), German (de), Greek (el), Spanish (es), Estonian (et), Finnish (fi), French (fr), Galician (gl), Hindi (hi), Croatian (hr), Hungarian (hu), Indonesian (id), Italian (it), Japanese (ja), Korean (ko), Lithuanian (lt), Latvian (lv), Malay (ms), Dutch (nl), Norwegian (no), Polish (pl), Portuguese (pt), Romanian (ro), Russian (ru), Slovakian (sk), Slovenian (sl), Swedish (sv), Turkish (tr), Ukrainian (uk), Vietnamese (vi) |
| Bulgarian (bg), Catalan (ca), Mandarin (cmn), Czech (cs), Danish (da), German (de), Greek (el), Spanish (es), Estonian (et), Finnish (fi), French (fr), Galician (gl), Hindi (hi), Croatian (hr), Hungarian (hu), Indonesian (id), Italian (it), Japanese (ja), Korean (ko), Lithuanian (lt), Latvian (lv), Malay (ms), Dutch (nl), Norwegian (no), Polish (pl), Portuguese (pt), Romanian (ro), Russian (ru), Slovakian (sk), Slovenian (sl), Swedish (sv), Turkish (tr), Ukrainian (uk), Vietnamese (vi) | English (en) |
| Norwegian Bokmål (no) | Norwegian Nynorsk (nn) |
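
The pair rules in the table can be sketched as a small lookup: English translates to and from the same 34 languages, and Norwegian Bokmål translates to Nynorsk. The helper names below are hypothetical, and the `translation_config` block with a `target_languages` list is an assumption about the request shape; see Translation for the authoritative format.

```python
# Languages paired with English, copied from the table above.
ENGLISH_PAIRS = frozenset(
    "bg ca cmn cs da de el es et fi fr gl hi hr hu id it ja ko lt lv ms "
    "nl no pl pt ro ru sk sl sv tr uk vi".split()
)


def is_supported_pair(source, target):
    """Return True if the table lists source -> target as a translation pair."""
    if source == "en":
        return target in ENGLISH_PAIRS
    if target == "en":
        return source in ENGLISH_PAIRS
    # The only pair that does not involve English.
    return (source, target) == ("no", "nn")


def build_translation_config(source, targets):
    """Build a job config, rejecting targets the table does not list."""
    unsupported = [t for t in targets if not is_supported_pair(source, t)]
    if unsupported:
        raise ValueError(f"unsupported targets for {source!r}: {unsupported}")
    return {
        "type": "transcription",
        "transcription_config": {"language": source},
        # Assumption: targets go under translation_config.target_languages.
        "translation_config": {"target_languages": list(targets)},
    }


config = build_translation_config("en", ["fr", "ja"])
```

Under this reading of the table, a request such as French to German is rejected because neither side is English.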

Bilingual and multi-language packs

These packs handle a fixed set of languages that you select in advance. To transcribe audio without selecting languages, including spontaneous switching across all supported languages, use the Melia 1 multilingual model. Refer to Models.

The Enhanced and Standard models can transcribe a selected combination of languages in one media file or stream, including speakers who switch between the languages in that pack. Each pack covers a fixed set of languages that you select with the language property.

Supported packs are:

| Language pack | Transcription config |
| --- | --- |
| Arabic and English | {"language": "ar_en"} |
| Malay and English | {"language": "en_ms"} |
| Mandarin and English | {"language": "cmn_en"} |
| Mandarin Malay Tamil and English | {"language": "cmn_en_ms_ta"} |
| Spanish and English | {"language": "es", "domain": "bilingual-en"} |
| Tamil and English | {"language": "en_ta"} |
| Tagalog (Filipino) and English | {"language": "tl"} |

This config selects the Mandarin and English pack:

{
  "type": "transcription",
  "transcription_config": {
    "language": "cmn_en"
  }
}

This config selects the Spanish and English pack, which requires the domain property:

{
  "type": "transcription",
  "transcription_config": {
    "language": "es",
    "domain": "bilingual-en"
  }
}
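
Since each pack covers a fixed set of languages, one way to pick a config in client code is a lookup keyed by that set. This is a sketch only: `PACKS` and `pack_config` are hypothetical names, the single-language codes used as keys (including `tl` for Tagalog) are taken from the pack codes on this page, and the config values are copied from the supported packs table.

```python
# Pack configs from the table above, keyed by the set of languages covered.
PACKS = {
    frozenset({"ar", "en"}): {"language": "ar_en"},
    frozenset({"ms", "en"}): {"language": "en_ms"},
    frozenset({"cmn", "en"}): {"language": "cmn_en"},
    frozenset({"cmn", "en", "ms", "ta"}): {"language": "cmn_en_ms_ta"},
    frozenset({"es", "en"}): {"language": "es", "domain": "bilingual-en"},
    frozenset({"ta", "en"}): {"language": "en_ta"},
    frozenset({"tl", "en"}): {"language": "tl"},
}


def pack_config(languages):
    """Return the job config for the pack covering exactly these languages."""
    key = frozenset(languages)
    if key not in PACKS:
        raise KeyError(f"no pack covers exactly {sorted(key)}")
    # Copy so callers cannot mutate the shared table.
    return {"type": "transcription", "transcription_config": dict(PACKS[key])}


spanish_english = pack_config(["en", "es"])
```

The lookup matches sets exactly rather than choosing the smallest covering pack, so a combination such as Mandarin and Tamil raises `KeyError` instead of silently selecting the four-language pack.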

Healthcare domain

Speechmatics offers a medical domain that provides high accuracy for healthcare use cases such as ambient scribes and dictation tools. The medical domain is available with the Enhanced model only. It does not apply to the Standard or Melia 1 models.

The medical domain is kept up to date using officially maintained data sources. This improves recognition of medical terminology such as procedures, medications, conditions, and anatomy.

For languages without medical domain support, the Enhanced model still gives high accuracy in the healthcare domain.

Set the domain property to medical:

{
  "type": "transcription",
  "transcription_config": {
    "model": "enhanced",
    "language": "en",
    "domain": "medical"
  }
}

The medical domain is available for the following languages:

| Language | Realtime | Batch |
| --- | --- | --- |
| Arabic English | Available | Available |
| Danish | Available | Available |
| Dutch | Available | Available |
| English | Available | Available |
| Finnish | Available | Available |
| French | Available | Available |
| German | Available | Available |
| Norwegian | Available | Available |
| Spanish | Available | Available |
| Swedish | Available | Available |

Additional languages: contact us.
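
A minimal sketch of applying the medical domain only where the availability table lists it, and otherwise falling back to the Enhanced model without a domain. `MEDICAL_LANGUAGES` and `healthcare_config` are hypothetical names, and mapping the "Arabic English" row to the `ar_en` pack code is an assumption.

```python
# Language codes for the rows in the availability table above.
# Assumption: the "Arabic English" row refers to the ar_en bilingual pack.
MEDICAL_LANGUAGES = frozenset(
    {"ar_en", "da", "nl", "en", "fi", "fr", "de", "no", "es", "sv"}
)


def healthcare_config(language):
    """Build an Enhanced-model config, adding the medical domain if listed."""
    transcription_config = {"model": "enhanced", "language": language}
    if language in MEDICAL_LANGUAGES:
        transcription_config["domain"] = "medical"
    return {"type": "transcription", "transcription_config": transcription_config}


german = healthcare_config("de")   # medical domain applied
italian = healthcare_config("it")  # Enhanced model only, no domain
```

The fallback follows the statement that the Enhanced model still gives high accuracy for languages without medical domain support.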