A Deep Dive into Automatic Speech Recognition Technology

4 minute read

Published: February 21, 2024

A Deep Dive into Automatic Speech Recognition Technology

ASR, or automatic speech recognition, is a technology that converts spoken utterances into a textual representation such as words, syllables, or phonemes. A conventional speech recognition pipeline combines three models: the lexicon model, which describes how words are pronounced; the acoustic model, which scores how well the audio matches speech sounds; and the language model, which predicts likely word sequences. During decoding, these models work together to find the most probable transcription of the spoken input.
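
As a rough illustration of how the three models can combine during decoding, here is a toy sketch. The phoneme labels, the probabilities, and the names `LEXICON`, `ACOUSTIC_FRAMES`, `LANGUAGE_MODEL`, `acoustic_log_score`, and `decode` are all made-up assumptions for this example. It also assumes exactly one phoneme per audio frame, which real systems do not; they align variable-length audio and search with beam search over far larger vocabularies.

```python
import math

# Hypothetical lexicon model: word -> phoneme sequence (pronunciation).
LEXICON = {
    "two": ["T", "UW"],
    "too": ["T", "UW"],
    "tea": ["T", "IY"],
}

# Hypothetical acoustic model output: per-frame phoneme probabilities.
ACOUSTIC_FRAMES = [
    {"T": 0.9, "UW": 0.05, "IY": 0.05},
    {"T": 0.1, "UW": 0.6, "IY": 0.3},
]

# Hypothetical unigram language model: P(word).
LANGUAGE_MODEL = {"two": 0.5, "too": 0.3, "tea": 0.2}


def acoustic_log_score(phonemes, frames):
    """Log-probability of a phoneme sequence, assuming one phoneme per frame."""
    if len(phonemes) != len(frames):
        return float("-inf")
    return sum(math.log(frame[p]) for p, frame in zip(phonemes, frames))


def decode(frames):
    """Return the word maximizing log P(audio | word) + log P(word)."""
    scores = {}
    for word, phonemes in LEXICON.items():
        acoustic = acoustic_log_score(phonemes, frames)
        scores[word] = acoustic + math.log(LANGUAGE_MODEL[word])
    best = max(scores, key=scores.get)
    return best, scores


best_word, all_scores = decode(ACOUSTIC_FRAMES)
print(best_word)
```

In this toy setup "two" and "too" share a pronunciation, so the acoustic model cannot tell them apart; the language model prior is what breaks the tie.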

How AI Turns Noise into Art

8 minute read

Published: February 09, 2025

Welcome to the wonderfully wacky world of AI image generation, where noise isn’t just unwanted static—it’s the secret sauce that transforms raw data into breathtaking visuals.
If you’ve ever wondered how an algorithm goes from a garbled mess of pixels to a photorealistic masterpiece, buckle up: we’re about to diffuse some serious knowledge!

Foundations of the Future: Trying to Get Behind Research Papers That Changed the Game

3 minute read

Published: September 23, 2025

CNNs, RNNs and LSTM

Foundations of the Future: The Papers That Built ABCs of LLMs

18 minute read

Published: September 25, 2025

The Papers That Built ABCs of LLMs

Understanding Automatic Speech Recognition Technology (HuBERT) (contd.)

2 minute read

Published: August 09, 2024

I’ve been looking into using transformers for speech. From some reading and a talk I attended, I learned that most articles use HuBERT as the encoder.

Understanding KenLM and Moses

3 minute read

Published: April 11, 2024

I was checking out a few repositories for language translation and came across the following set of keywords, which got me interested in looking into them further …

Is My RAG System Over-Engineered? I Tested It.

5 minute read

Published: October 15, 2025

Is My RAG System Over-Engineered? I Tested It.

I Benchmarked 6 Vector Databases for RAG — Here’s What Surprised Me Most

5 minute read

Published: October 09, 2025

I Benchmarked 6 Vector Databases for RAG — Here’s What Surprised Me Most

Getting started with Whisper by OpenAI

3 minute read

Published: December 12, 2023

Getting started with Whisper by OpenAI

Whisper is a speech recognition model released by OpenAI in September 2022. Its primary purpose is to convert audio into text, and it supports 99 languages, including Japanese. The model was trained with weak supervision on a large dataset of 680,000 hours of speech collected from the web. According to OpenAI, training at this scale lets the model generalize well to standard academic benchmarks without fine-tuning on them.

Somil Jain

Posts by Tags

ASR

A Deep Dive into Automatic Speech Recognition Technology

CLIP

CNN

CNNs, RNNs and LSTM

DALLE

GPT

The Papers That Built ABCs of LLMs

LLM

A Deep Dive into Automatic Speech Recognition Technology

LLMs

The Papers That Built ABCs of LLMs

LM

LSTM

CNNs, RNNs and LSTM

RNN

CNNs, RNNs and LSTM

SEQ2SEQ

The Papers That Built ABCs of LLMs

TRANSFORMERS

The Papers That Built ABCs of LLMs

VectorDB

Is My RAG System Over-Engineered? I Tested It.

I Benchmarked 6 Vector Databases for RAG — Here’s What Surprised Me Most

llm

Getting started with Whisper by OpenAI

openai

Getting started with Whisper by OpenAI

speech

A Deep Dive into Automatic Speech Recognition Technology

subword

translation

whisper

Getting started with Whisper by OpenAI