Tools: Gemini 3.1 Flash-Lite: Developer guide and use cases

Source: Dev.to

Gemini 3.1 Flash-Lite is the high-volume, affordable powerhouse of the Gemini family. It's purpose-built for large-scale tasks where speed and cost-efficiency are the main priorities, making it the ideal engine for background processing. Whether you're handling a constant stream of user interactions or need to process massive datasets with tasks like translation, transcription, or extraction, Flash-Lite provides the optimal balance of speed and capability.

This guide walks through seven practical use cases for Flash-Lite using the google-genai Python SDK. Install the SDK and configure your API key:

```python
# pip install -U google-genai
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")
```

## 1. Translation

If you're processing user-generated content at scale, such as chat messages, reviews, or support tickets, you need fast, cheap translation. Flash-Lite handles high-volume translation well, and you can use system instructions to constrain it to output only the translated text with no extra commentary.

```python
text = "Hey, are you down to grab some pizza later? I'm starving!"

response = client.models.generate_content(
    model="gemini-3.1-flash-lite-preview",
    config={
        "system_instruction": "Only output the translated text"
    },
    contents=f"Translate the following text to German: {text}"
)

print(response.text)
# Hey, hast du Lust, später eine Pizza essen zu gehen? Ich habe riesigen Hunger!
```

## 2. Transcription

Flash-Lite supports multimodal inputs and handles speech-to-text tasks quickly and at scale, allowing you to pass audio files such as recordings, memos, or voice inputs directly for transcription. You can also use prompting in the same step to get the transcript in a specific format, making it ready for downstream tasks like agent hand-offs or other workflows.

```python
# URL = "https://storage.googleapis.com/generativeai-downloads/data/State_of_the_Union_Address_30_January_1961.mp3"

# Upload the audio file to the GenAI File API
uploaded_file = client.files.upload(file="sample.mp3")

prompt = "Generate a transcript of the audio."
# prompt = "Generate a transcript of the audio. Remove filler words such as 'um', 'uh', 'like'."

response = client.models.generate_content(
    model="gemini-3.1-flash-lite-preview",
    contents=[prompt, uploaded_file]
)

print(response.text)
```

## 3. Lightweight Agentic Tasks and Data Extraction

Flash-Lite supports structured JSON output, which makes it a good fit for entity extraction, classification, and lightweight data processing pipelines. You define your output schema (here using Pydantic) and the model returns valid JSON that conforms to it. In this example, we extract structured data from an e-commerce customer review, including the specific product aspect mentioned, a summary quote, a sentiment score, and the customer's likelihood of returning the item.

```python
from pydantic import BaseModel, Field

prompt = "Analyze the user review and determine the aspect, sentiment score, summary quote, and return risk"
input_text = "The boots look amazing and the leather is high quality, but they run way too small. I'm sending them back."

class ReviewAnalysis(BaseModel):
    aspect: str = Field(description="The feature mentioned (e.g., Price, Comfort, Style, Shipping)")
    summary_quote: str = Field(description="The specific phrase from the review about this aspect")
    sentiment_score: int = Field(description="1 to 5 (1=worst, 5=best)")
    is_return_risk: bool = Field(description="True if the user mentions returning the item")

response = client.models.generate_content(
    model="gemini-3.1-flash-lite-preview",
    contents=[prompt, input_text],
    config={
        "response_mime_type": "application/json",
        "response_json_schema": ReviewAnalysis.model_json_schema(),
    },
)

print(response.text)
# {
#   "aspect": "Size",
#   "summary_quote": "they run way too small",
#   "sentiment_score": 2,
#   "is_return_risk": true
# }
```

## 4. Document Processing & Summarization

Flash-Lite handles high-volume document tasks with ease, from parsing PDFs for concise summaries to performing cross-source comparisons. It is also a good fit for document processing pipelines that require quick triage, enabling you to categorize incoming files, run simple pass/fail checks, or perform standard data extraction.

```python
import httpx

# Download PDF document
doc_url = "https://storage.googleapis.com/generativeai-downloads/data/med_gemini.pdf"
doc_data = httpx.get(doc_url).content

prompt = "Summarize this document"

response = client.models.generate_content(
    model="gemini-3.1-flash-lite-preview",
    contents=[
        types.Part.from_bytes(
            data=doc_data,
            mime_type='application/pdf',
        ),
        prompt
    ]
)

print(response.text)
```

## 5. Model Routing

You don't want to send every request to your most expensive model. A common pattern is to use a fast, cheap model as a classifier that routes queries to the appropriate model based on task complexity. Flash-Lite works well for this because the routing call itself needs to be low-latency and low-cost. A real-world example of this pattern is the open-source Gemini CLI, which uses Flash-Lite to classify task complexity and route to Gemini Flash or Pro. The following example is adapted from the CLI's classifier strategy.

```python
FLASH_MODEL = 'flash'
PRO_MODEL = 'pro'

CLASSIFIER_SYSTEM_PROMPT = f"""
You are a specialized Task Routing AI. Your sole function is to analyze the user's request and classify its complexity.
Choose between `{FLASH_MODEL}` (SIMPLE) or `{PRO_MODEL}` (COMPLEX).

1. `{FLASH_MODEL}`: A fast, efficient model for simple, well-defined tasks.
2. `{PRO_MODEL}`: A powerful, advanced model for complex, open-ended, or multi-step tasks.

A task is COMPLEX if it meets ONE OR MORE of the following criteria:
1. High Operational Complexity (Est. 4+ Steps/Tool Calls)
2. Strategic Planning and Conceptual Design
3. High Ambiguity or Large Scope
4. Deep Debugging and Root Cause Analysis

A task is SIMPLE if it is highly specific, bounded, and has Low Operational Complexity (Est. 1-3 tool calls).
"""

user_input = "I'm getting an error 'Cannot read property 'map' of undefined' when I click the save button. Can you fix it?"

response_schema = {
    "type": "object",
    "properties": {
        "reasoning": {
            "type": "string",
            "description": "A brief, step-by-step explanation for the model choice, referencing the rubric."
        },
        "model_choice": {
            "type": "string",
            "enum": [FLASH_MODEL, PRO_MODEL]
        }
    },
    "required": ["reasoning", "model_choice"]
}

response = client.models.generate_content(
    model="gemini-3.1-flash-lite-preview",
    contents=user_input,
    config={
        "system_instruction": CLASSIFIER_SYSTEM_PROMPT,
        "response_mime_type": "application/json",
        "response_json_schema": response_schema
    },
)

print(response.text)
# {
#   "reasoning": "The user is reporting an error symptom without a known cause. This requires investigation to identify the root cause, which falls under 'Deep Debugging & Root Cause Analysis'.",
#   "model_choice": "pro"
# }
```

## 6. Thinking with Gemini Flash-Lite

Flash-Lite supports configurable thinking levels, allowing the model to allocate additional compute to internal reasoning before producing a final response. This is ideal for tasks that benefit from step-by-step logic, such as math, coding, or multi-constraint problems, where you need higher accuracy while maintaining the efficiency of the Flash-Lite model. By default, Flash-Lite's thinking level is set to minimal, but it can be adjusted to low, medium, or high depending on the complexity of your task. For more on configuring thinking levels, see the Gemini API docs.

```python
response = client.models.generate_content(
    model="gemini-3.1-flash-lite-preview",
    contents="How does AI work?",
    config={
        "thinking_config": {"thinking_level": "high"}
    },
)

print(response.text)
```

## 7. Batch API

If you have large volumes of data to process and low latency isn't a priority, the Gemini Batch API is the perfect companion for Flash-Lite. It is designed specifically for asynchronous, high-throughput tasks at 50% of the standard cost. The target turnaround time is 24 hours, but in the majority of cases it is much quicker. You can implement the Batch API in your workflow using the following pattern:

```python
import time

# Create a JSONL file with your requests and upload it
uploaded_batch_requests = client.files.upload(file="batch_requests.json")

# Create the batch job
batch_job = client.batches.create(
    model="gemini-3.1-flash-lite-preview",
    src=uploaded_batch_requests.name,
    config={'display_name': "batch_job-1"}
)
print(f"Created batch job: {batch_job.name}")

# Poll until the job finishes (target turnaround is up to 24 hours)
while batch_job.state.name in ('JOB_STATE_PENDING', 'JOB_STATE_RUNNING'):
    time.sleep(60)
    batch_job = client.batches.get(name=batch_job.name)

if batch_job.state.name == 'JOB_STATE_SUCCEEDED':
    result_file_name = batch_job.dest.file_name
    file_content_bytes = client.files.download(file=result_file_name)
    file_content = file_content_bytes.decode('utf-8')
    for line in file_content.splitlines():
        print(line)
```

## Conclusion

Gemini 3.1 Flash-Lite excels at the "boring but big" tasks that define high-scale production. It serves as a versatile workhorse for everything from data extraction to agentic routing, enabling you to build more balanced and efficient AI architectures. By leveraging Flash-Lite for high-volume background processing, you can maximize your impact while keeping operational costs in check.

See the following resources to learn more:

- Gemini 3.1 Flash-Lite model card
- Gemini 3 developer guide
- AI Studio model playground
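The Batch API example above assumes a `batch_requests.json` file already exists but doesn't show how to produce one. As a minimal sketch: each line of the file is one standalone request, pairing a caller-chosen `key` with a `request` body. The `key`/`request` field names follow the Batch API's inline-request format as I understand it; verify them against the current Gemini API docs, and note the input texts here are purely illustrative.

```python
import json

# Illustrative inputs to translate in bulk (hypothetical data)
texts = [
    "Hey, are you down to grab some pizza later?",
    "The package arrived two days late.",
]

# Write one JSON object per line: a caller-chosen "key" plus a
# GenerateContent-style "request" body with the prompt in its parts.
with open("batch_requests.json", "w", encoding="utf-8") as f:
    for i, text in enumerate(texts):
        row = {
            "key": f"request-{i}",
            "request": {
                "contents": [
                    {"parts": [{"text": f"Translate the following text to German: {text}"}]}
                ]
            },
        }
        f.write(json.dumps(row) + "\n")
```

The per-line `key` is what lets you match each result in the output file back to its original request once the job completes.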