Posts by Tags

ASR

A Deep Dive into Automatic Speech Recognition Technology

4 minute read

Published:

A Deep Dive into Automatic Speech Recognition Technology

ASR, or automatic speech recognition, is a technology that aims to convert spoken utterances into a textual representation such as words, syllables, or phonemes. Speech recognition technology involves three models: the lexicon model which understands how words are pronounced, the acoustic model which analyzes speech patterns, and the language model which predicts word sequences. These models work together in decoding to produce accurate transcriptions of spoken language.

LLM

A Deep Dive into Automatic Speech Recognition Technology

4 minute read

Published:

A Deep Dive into Automatic Speech Recognition Technology

ASR, or automatic speech recognition, is a technology that aims to convert spoken utterances into a textual representation such as words, syllables, or phonemes. Speech recognition technology involves three models: the lexicon model which understands how words are pronounced, the acoustic model which analyzes speech patterns, and the language model which predicts word sequences. These models work together in decoding to produce accurate transcriptions of spoken language.

LM

UNDERSTANDING KENLM et MOSES

3 minute read

Published:

====== I was checking out few repositories for Language translation and came across set of following keywords which got me more interested towards checking these out …

llm

Getting started with Whisper by OPENAI

3 minute read

Published:

Getting started with Whisper by OPENAI

Whisper is a cutting-edge speech recognition model developed by OpenAI in October 2022. Its primary purpose is to convert audio files into text with remarkable accuracy, supporting up to 99 languages, including Japanese. The model’s encoder was trained through a technique called weakly supervised learning, leveraging a vast dataset of over 68,000 hours of speech. This approach enabled the model to surpass the accuracy of traditional academic data sets.

openai

Getting started with Whisper by OPENAI

3 minute read

Published:

Getting started with Whisper by OPENAI

Whisper is a cutting-edge speech recognition model developed by OpenAI in October 2022. Its primary purpose is to convert audio files into text with remarkable accuracy, supporting up to 99 languages, including Japanese. The model’s encoder was trained through a technique called weakly supervised learning, leveraging a vast dataset of over 68,000 hours of speech. This approach enabled the model to surpass the accuracy of traditional academic data sets.

speech

A Deep Dive into Automatic Speech Recognition Technology

4 minute read

Published:

A Deep Dive into Automatic Speech Recognition Technology

ASR, or automatic speech recognition, is a technology that aims to convert spoken utterances into a textual representation such as words, syllables, or phonemes. Speech recognition technology involves three models: the lexicon model which understands how words are pronounced, the acoustic model which analyzes speech patterns, and the language model which predicts word sequences. These models work together in decoding to produce accurate transcriptions of spoken language.

subword

UNDERSTANDING KENLM et MOSES

3 minute read

Published:

====== I was checking out few repositories for Language translation and came across set of following keywords which got me more interested towards checking these out …

translation

UNDERSTANDING KENLM et MOSES

3 minute read

Published:

====== I was checking out few repositories for Language translation and came across set of following keywords which got me more interested towards checking these out …

whisper

Getting started with Whisper by OPENAI

3 minute read

Published:

Getting started with Whisper by OPENAI

Whisper is a cutting-edge speech recognition model developed by OpenAI in October 2022. Its primary purpose is to convert audio files into text with remarkable accuracy, supporting up to 99 languages, including Japanese. The model’s encoder was trained through a technique called weakly supervised learning, leveraging a vast dataset of over 68,000 hours of speech. This approach enabled the model to surpass the accuracy of traditional academic data sets.