
Speech-to-Text (STT, ASR)

AI/ML

Automatic conversion of speech to text, also known as Automatic Speech Recognition.

Speech-to-Text (STT), also called Automatic Speech Recognition (ASR), is the automatic conversion of spoken audio into a text transcript. Modern systems are built on end-to-end deep learning.

Leading models and services: (1) **OpenAI Whisper**: open-source multilingual model (large-v3, turbo), excellent quality, runs locally or through an API; (2) **Google Speech-to-Text** (Cloud Speech API): Chirp and Chirp 2 models; (3) **Azure AI Speech**: Microsoft's enterprise STT; (4) **Amazon Transcribe** (AWS): medical and call analytics variants; (5) **AssemblyAI**: strong speaker diarization; (6) **Deepgram**: low-latency streaming; (7) **Rev.ai**, **Speechmatics**; (8) **NVIDIA Parakeet, Canary**: open models; (9) **Distil-Whisper**: a faster, distilled version of Whisper.

Advanced features: (1) **Speaker diarization**: identifies who said what ("speaker 1", "speaker 2"); (2) **Timestamps**: word or segment level; (3) **Punctuation and casing**; (4) **Real-time streaming** vs. batch; (5) **Custom vocabulary / phrase boosting** for domain-specific terms; (6) automatic **language identification**; (7) built-in **translation** (Whisper natively translates non-English audio into English text); (8) **Profanity filtering**; (9) **Emotion/sentiment** detection; (10) **Speech recognition for media** (subtitles, captions).
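To make diarization and word-level timestamps more concrete, here is a minimal Python sketch that merges consecutive words from the same speaker into readable segments. The `Word` and `Segment` classes and the `group_by_speaker` function are hypothetical names chosen for illustration; they do not correspond to the response schema of any particular provider, and real services each return their own format.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One recognized word (hypothetical schema, not a real provider format)."""
    text: str
    start: float  # seconds from the beginning of the audio
    end: float
    speaker: str  # diarization label such as "speaker 1"


@dataclass
class Segment:
    """A run of consecutive words attributed to a single speaker."""
    speaker: str
    start: float
    end: float
    text: str


def group_by_speaker(words: list[Word]) -> list[Segment]:
    """Merge consecutive words that share a speaker label into segments."""
    segments: list[Segment] = []
    for word in words:
        if segments and segments[-1].speaker == word.speaker:
            # Same speaker keeps talking: extend the current segment.
            last = segments[-1]
            last.end = word.end
            last.text += " " + word.text
        else:
            # Speaker change (or first word): open a new segment.
            segments.append(Segment(word.speaker, word.start, word.end, word.text))
    return segments


if __name__ == "__main__":
    words = [
        Word("hello", 0.0, 0.4, "speaker 1"),
        Word("there", 0.4, 0.8, "speaker 1"),
        Word("hi", 1.0, 1.2, "speaker 2"),
    ]
    for seg in group_by_speaker(words):
        print(f"[{seg.start:.1f}-{seg.end:.1f}] {seg.speaker}: {seg.text}")
```

This assumes the words arrive in chronological order, which is the usual case for transcription output but is worth checking against the service actually in use.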

Use cases: meeting transcription (Otter.ai, Fireflies, Granola), YouTube and Zoom captions, voice assistants, call center analytics, medical dictation (Nuance Dragon Medical), accessibility (deaf users), podcast transcripts, voice search.

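Metrics: WER (Word Error Rate) is the number of word substitutions, deletions, and insertions divided by the number of words in the reference transcript; lower is better. The best systems in 2024 reach roughly 3-5% WER on clear English audio, 8-15% on noisy or accented speech, and 10-25% on specific domains or languages. Covered by the AI-102 and AIF-C01 certifications.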
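As an illustration of the metric, WER is commonly computed as (S + D + I) / N, where S, D, and I are the word substitutions, deletions, and insertions needed to turn the reference into the hypothesis, and N is the number of reference words. The sketch below computes it with a word-level edit distance. The function name `word_error_rate` is hypothetical, and the normalization (lowercasing and whitespace splitting only) is a simplifying assumption; real evaluations usually also normalize punctuation and numbers.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference words."""
    # Simplified normalization (assumption): lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "the quick brown fox"
    hyp = "the quick brown dog jumps"
    # One substitution (fox -> dog) plus one insertion (jumps) over 4 words.
    print(f"WER = {word_error_rate(ref, hyp):.2f}")  # WER = 0.50
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.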

Certifications covering this concept
AI-102 AIF-C01
Related terms
Text-to-Speech (TTS) Multi-modal AI GPT (Generative Pre-trained Transformer)
