# How I Explained LLMs, SLMs & VLMs at Microsoft
2026-01-25
## Why This Talk Mattered

I recently had the opportunity to present my thoughts on LLMs, SLMs, and VLMs at the Microsoft office during a community event. This wasn’t just another AI talk filled with buzzwords and hype. The goal was simple but powerful: help students and professionals understand why not all AI models are built the same—and why that’s actually a good thing.

This blog is a written walkthrough of that presentation. I’ll be embedding the same slides I used and expanding on the thinking behind them—what I wanted the audience to feel, question, and take back with them.

Key idea: AI diversity is a feature, not a flaw.

## Slide 1: Not All AI Models Are Built the Same — And That’s the Point

Most AI conversations start with the wrong question:

> Which model is the best?

I wanted to flip that narrative early and replace it with a better one:

> Which model fits the problem we are actually trying to solve?

That framing sets the foundation for everything that follows.

## Slide 2: Who Am I and Why This Perspective Matters

Before diving into models, I briefly introduced myself and my background—working as an AI Data Scientist, building SaaS products, publishing research, and deploying production-grade AI systems. This mattered because the talk wasn’t theoretical. It was grounded in real-world AI, where cost, latency, privacy, and infrastructure constraints are non-negotiable.

## Slide 3: What Even Are LLMs?

Large Language Models (LLMs) are neural networks trained on massive datasets. They represent the most powerful and versatile AI systems available today. In simple terms, LLMs:

- Contain billions to trillions of parameters
- Use transformer architectures with attention mechanisms
- Can reason, generate text, write code, translate languages, and analyze data

Examples include GPT-style models, Claude, Gemini, and LLaMA-based systems.

## Slide 4: Why the AI Landscape Is Not One-Size-Fits-All

I used a smartphone analogy to make this intuitive. Just like we have flagship phones, budget phones, and specialized devices, AI models exist for different needs:

- LLMs are the heavyweights
- SLMs are the efficiency experts
- VLMs are the multimodal specialists

Different tools exist because different problems demand different trade-offs.

## Slide 5–6: What LLMs Can Do Really Well

LLMs are incredibly versatile. They can:

- Generate long-form content
- Perform complex reasoning
- Write and debug code across multiple languages
- Translate between dozens of languages
- Hold natural, human-like conversations
- Assist with deep research and analysis

This is where most of the AI hype comes from—and rightly so.

## Slide 7: The LLM Trade-Offs No One Talks About Enough

All that power comes at a cost. Running LLMs is like driving a Ferrari. It’s impressive, but not always practical. Real-world limitations include:

- High computational requirements
- Expensive inference costs
- Higher latency
- Heavy cloud dependency
- Significant energy consumption

This is where many production systems start to struggle.

## Slide 8: Enter SLMs — The Efficiency Experts

Small Language Models (SLMs) are often underestimated, but they are having their moment. They are designed for practicality, not bragging rights. SLMs are:

- Smaller and more focused
- Optimized for specific tasks
- Fast and cost-efficient
- Capable of running on phones, laptops, and edge devices

## Slide 9: SLMs Are Not Weak, They Are Strategic

I highlighted several modern SLMs to make this concrete:

- Phi-3 (Microsoft)
- Gemini Nano

These models prove that intelligence is not just about size—it’s about optimization.

## Slide 10: When Should You Use SLMs?

SLMs shine in scenarios where:

- Speed matters
- Privacy is critical
- Offline capability is required
- Budgets are limited
- Edge deployment is necessary
- Tasks are domain-specific

SLMs are not budget LLMs—they are the right choice for the right job.

## Slide 11: VLMs — When AI Learns to See

Vision-Language Models (VLMs) take things a step further. They don’t just read text—they understand images as well. VLMs can:

- Process images and text together
- Understand visual context
- Answer questions about photos
- Generate descriptions from images

This is where AI becomes truly multimodal.

## Slide 12: How VLMs Actually Work

Under the hood, VLMs combine:

- A vision encoder for images
- A language model for text
- A fusion layer to connect meaning across modalities

This allows AI systems to see and reason at the same time.

## Slide 13: VLMs in the Real World

VLMs are already transforming industries such as:

- Medical imaging and diagnostics
- Autonomous vehicles
- Accessibility tools
- Visual search engines
- Content moderation
- AR and VR experiences

Multimodal AI is no longer optional—it’s becoming standard.

## Slide 14: Comparing LLMs, SLMs, and VLMs

There is no single best model. The right choice depends entirely on context:

- LLMs excel at reasoning and versatility
- SLMs excel at efficiency and speed
- VLMs excel at multimodal understanding

## Slide 15: Speed and Latency Reality Check

Response time matters more than ever. Approximate latency expectations:

- LLMs: 1–5 seconds
- SLMs: under 0.5 seconds
- VLMs: 2–8 seconds depending on image complexity

For real-time applications, speed is not optional.

## Slide 16: Choosing the Right Tool

Choosing a model is like choosing a vehicle. A monster truck is overkill for city driving. The mapping is simple:

- Need complex reasoning → LLM
- Need speed and efficiency → SLM
- Need visual understanding → VLM
- Need offline capability → SLM

## Final Takeaways

The most important lessons from this talk:

- There is no universally best AI model
- Context beats capability
- Efficiency matters as much as intelligence
- Hybrid systems often outperform single-model setups

Choose models like an engineer, not like a fan.

## Closing Thoughts

Presenting this at the Microsoft office was special—not because of the venue, but because the audience asked implementation-focused questions, not hype-driven ones. If you’re building AI systems today, understanding LLMs, SLMs, and VLMs isn’t optional—it’s foundational.

## Let’s Continue the Conversation

If this resonated with you, feel free to connect with me on LinkedIn or reach out directly. I’d love to hear how you’re thinking about model selection in your own AI stack.

https://www.linkedin.com/in/jaskiratai