Archives
All the articles I've archived.
-
Why Embeddings Matter
A deep dive into what embeddings are, why they matter, and how they power modern AI, semantic search, and RAG-based systems.
-
What LLMs Do at Inference: A Deep Dive Under the Hood
A step-by-step, reference-backed explanation of what happens during LLM inference: tokenization, embeddings, prefill & decode phases, KV caching, decoding strategies, bottlenecks, and optimizations like quantization, FlashAttention, and speculative decoding.
-
Transformers in AI
The Architecture That Revolutionized Machine Learning
-
Top-k vs. Nucleus Sampling
Decoding the Secrets of AI Text Generation
-
GPU vs TPU
Decoding the Battle of AI Accelerators in 2025
-
Why Does Retrieval-Augmented Generation (RAG) Exist?
How large language models (LLMs) like GPT-4 and Grok fall short on fresh or private knowledge, and why retrieval-augmented generation emerged to fill that gap.
-
Understanding Tokenizers in AI — A Deep Dive into ChatGPT, Grok, and Gemini
A complete guide to tokenizers in modern LLMs, covering BPE, WordPiece, SentencePiece, Unigram, and how ChatGPT, Grok, and Gemini tokenize text. Includes examples, real-world impact, and why tokenization is the foundation of AI.
-
KV Cache Explained
A Deep Dive into Transformer Optimization