Tools: How Tokenization, Embeddings & Attention Work in LLMs (Part 2)

Source: Dev.to

In Part 1, we learned what an LLM is and how it generates text. Now let's go deeper into how models like ChatGPT actually process language internally. In this part we'll cover:

- What a token really is
- How tokenization works
- Encoding & decoding with Python
- Vector embeddings
- Positional encoding
- Self-attention & multi-head attention

## 1. What Is a Token?

A token is a piece of text converted into a number that the model understands. LLMs don't understand words. They understand numbers.

Imagine a simple mapping:

```
A → 1
B → 2
C → 3
```

So if you type `B D E`, it becomes `2 4 5`.

## 2. What Is Tokenization?

This process of converting text → numbers is called tokenization: converting user input into a sequence of numbers that the model can process.

```
Text → Tokens → Model → Tokens → Text
```

These numbers go into the transformer, which predicts the next token again and again.

👉 Note: every model has its own tokenizer, so the same text produces different token IDs in different models.

For example, the input:

```
"Hey there, my name is Piyush"
```

might become:

```
[20264, 1428, 225216, 3274, ...]
```

## 3. Encoding & Decoding Tokens in Python

Using the tiktoken library:

```python
import tiktoken

# Get the tokenizer used by a specific model
encoder = tiktoken.encoding_for_model("gpt-4o")

text = "Hey there, my name is Prabhas Kumar"

# encode() → converts text → tokens
tokens = encoder.encode(text)
print(tokens)

# decode() → converts tokens → readable text
decoded = encoder.decode(tokens)
print(decoded)
```

This is exactly how ChatGPT handles your input internally.

## 4. Vector Embeddings – Giving Words Meaning

Tokens alone are just numbers. Embeddings give them meaning.

An embedding is a vector (a list of numbers) that represents the semantic meaning of a word. Words with similar meanings are placed near each other in vector space:

- Dog and Cat → close together
- Paris and India → close together
- Eiffel Tower and India Gate → close together

That's how LLMs understand relationships like:

```
Paris → Eiffel Tower
India → Taj Mahal
```

This is called semantic similarity.

## 5. Positional Encoding – Order Matters

Compare:

- "Dog ate cat"
- "Cat ate dog"

Same words. Different meaning. Embeddings alone don't know position, so the model adds positional encoding, which tells the model:

- This word is first
- This word is second
- This word is third

So the model understands order and structure.

## 6. Self-Attention – Words Talking to Each Other

Self-attention lets tokens influence each other. Take the same word, bank, with two different meanings:

- "river bank"
- "ICICI bank"

Self-attention allows:

- "river" → changes meaning of "bank"
- "ICICI" → changes meaning of "bank"

So context decides meaning.

## 7. Multi-Head Attention – Looking at Many Angles

Multi-head attention means the model looks at the sentence from several angles at once, such as:

- Relationships between words

Like a human observing many things at once. This gives the model a deep understanding of the sentence.

## 8. Final Flow of an LLM

```
Tokenization → numbers
Embeddings → meaning
Positional encoding → order
Self + Multi-head attention → context
Linear + Softmax → probability of next token
Decode → readable output
```

## Final Thoughts

LLMs don't know language. They predict tokens based on probability and patterns. Yet the result feels intelligent because:

- Tokens carry meaning (embeddings)
- Order is preserved (positional encoding)
- Context is understood (attention)

And that's the magic behind ChatGPT.
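The "close together in vector space" idea from the embeddings section can be made concrete with cosine similarity. Here is a minimal sketch using invented 3-dimensional vectors — real embeddings are learned during training and have hundreds or thousands of dimensions:

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: closer to 1.0 = more similar
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Invented toy embeddings -- real models learn these values
embeddings = {
    "dog":   [0.90, 0.80, 0.10],
    "cat":   [0.85, 0.75, 0.15],
    "paris": [0.10, 0.20, 0.95],
}

print(cosine_similarity(embeddings["dog"], embeddings["cat"]))    # high (similar meaning)
print(cosine_similarity(embeddings["dog"], embeddings["paris"]))  # low (unrelated)
```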
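For the positional-encoding step, here is a sketch of the sinusoidal scheme from the original Transformer paper. The small `d_model` is a toy choice for readability, not what production models use:

```python
import math

def positional_encoding(position, d_model=8):
    # Each position gets a unique pattern of sines and cosines at
    # different frequencies, so "first word" differs from "second word".
    pe = []
    for i in range(0, d_model, 2):
        angle = position / (10000 ** (i / d_model))
        pe.append(math.sin(angle))
        pe.append(math.cos(angle))
    return pe

# Same token embedding + different positional encoding = different input vector
for pos in range(3):
    print(pos, [round(v, 3) for v in positional_encoding(pos)])
```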
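The "words talking to each other" step is, mechanically, scaled dot-product attention. This toy sketch skips the learned query/key/value projections that real models apply, and the token vectors are invented for illustration:

```python
import math

def softmax(xs):
    # Turn raw scores into weights that sum to 1
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    # Each token's output is a weighted mix of ALL tokens' vectors;
    # weights come from dot-product similarity. (Simplification:
    # query = key = value, no learned projections.)
    d = len(vectors[0])
    outputs = []
    for q in vectors:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in vectors]
        weights = softmax(scores)
        mixed = [sum(w * v[i] for w, v in zip(weights, vectors))
                 for i in range(d)]
        outputs.append(mixed)
    return outputs

# Invented vectors for "river" and "bank": after attention, "bank"'s
# vector is pulled toward "river" -- context reshapes its meaning.
tokens = [[1.0, 0.0, 0.2, 0.1],   # "river"
          [0.0, 1.0, 0.1, 0.3]]   # "bank"
print(self_attention(tokens))
```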
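Multi-head attention can be sketched as running several such attention computations in parallel, each on its own slice of the vector, then concatenating the results — which is how each head gets to focus on a different "angle". Again, all sizes and vectors here are illustrative only:

```python
import math

def attention_head(vectors):
    # One attention head: weighted mix of all token vectors
    # (simplified: no learned projections, query = key = value).
    d = len(vectors[0])
    out = []
    for q in vectors:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in vectors]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([sum(w * v[i] for w, v in zip(weights, vectors))
                    for i in range(d)])
    return out

def multi_head_attention(vectors, num_heads=2):
    # Split each vector into num_heads slices, run attention per slice,
    # then concatenate the heads back together per token.
    d = len(vectors[0])
    size = d // num_heads
    heads = []
    for h in range(num_heads):
        sliced = [v[h * size:(h + 1) * size] for v in vectors]
        heads.append(attention_head(sliced))
    return [sum((head[t] for head in heads), []) for t in range(len(vectors))]

tokens = [[1.0, 0.0, 0.2, 0.1],
          [0.0, 1.0, 0.1, 0.3]]
print(multi_head_attention(tokens))
```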
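Finally, the end-to-end flow (model → Linear + Softmax → pick most likely token → decode) can be sketched with a fake stand-in model. The vocabulary and scoring here are entirely invented — the point is only to show the predict-a-token-again-and-again loop:

```python
import math

# Toy vocabulary: real models have on the order of 100k tokens
vocab = {0: "the", 1: "cat", 2: "sat", 3: "."}

def fake_model(tokens):
    # Stand-in for the transformer: returns one score (logit) per
    # vocab entry. Hard-coded to favour "the next id in sequence".
    nxt = (tokens[-1] + 1) % len(vocab)
    return [3.0 if i == nxt else 0.0 for i in range(len(vocab))]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

tokens = [0]                               # start from the token for "the"
for _ in range(3):
    probs = softmax(fake_model(tokens))    # Linear + Softmax → probabilities
    tokens.append(probs.index(max(probs))) # pick the most likely next token
print(" ".join(vocab[t] for t in tokens))  # Decode → "the cat sat ."
```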