# Tools: Using AutoGen to automate wiki content review

Source: Dev.to

This post shows how to use AI to review documentation wikis, identify inconsistencies, and suggest structural improvements, all locally and without needing API keys. We'll use AutoGen and Ollama to analyze a documentation wiki, examining both its content and hierarchy, and then ask AI agents to propose improvements.

## What Are AutoGen and Ollama?

AutoGen is an open-source multi-agent framework developed by Microsoft that simplifies the creation and orchestration of applications powered by Large Language Models (LLMs). It enables developers to build AI agent systems in which multiple specialized agents communicate with each other, use tools, and incorporate human feedback to solve complex tasks.

Ollama is an open-source tool for running and managing LLMs directly on your local machine (computer or server). It acts as a bridge between powerful open-source models (such as Llama, Mistral, and Gemma) and your hardware, making it easy to use AI without deep technical expertise.

## Requirements

To follow this tutorial, you will need:

- Ollama installed on your local machine. You can download it from Ollama's official website.
- A sample documentation repository (or use your own). In my case, I used the Kubernetes official documentation.
- Python 3.13 (note: Python 3.14 may not yet be fully supported by all dependencies)

Once you have these prerequisites, set up your Python environment:

```bash
pip install autogen ag2[openai]
```

The ag2[openai] dependency is needed only because autogen raises runtime errors without it.

## Starting Ollama

First, download and install a model in Ollama. For this tutorial, we'll use the gemma3:4b model:

```bash
ollama pull gemma3:4b
```

Next, start the Ollama server. This step is essential; the Python script will connect to this server at http://localhost:11434/v1:

```bash
ollama serve
```

Important: Ensure the Ollama server is running before executing your Python script. You should see output confirming the server is listening.

## Setting Up the AutoGen Agents

Now, let's create a Python script to set up the AutoGen agents that will analyze the documentation.

## Step 1: Configure the LLM

First, configure the LLM settings:

```python
OLLAMA_MODEL = "gemma3:4b"
OLLAMA_BASE_URL = "http://localhost:11434/v1"

llm_config = {
    "model": OLLAMA_MODEL,
    "base_url": OLLAMA_BASE_URL,
    "api_key": "ollama",
    "temperature": 0,  # Set to 0 for deterministic output
}
```

Setting temperature to 0 ensures deterministic, consistent responses from the model.

## Step 2: Create the Content Evaluation Agent

Next, create an agent to evaluate the quality of individual documentation files:

```python
from autogen import AssistantAgent

DOC_TYPE = "setup guide"
DOC_LANGUAGE = "English"

content_agent = AssistantAgent(
    name="ContentAgent",
    llm_config=llm_config,
    system_message=f"""
You evaluate individual markdown files as follows:
- document type is {DOC_TYPE}
- language is {DOC_LANGUAGE}
- the evaluation should return a score between 0 and 1, where 1 is best
- this is an evaluation task; do not suggest rewrites
"""
)
```

The system prompt makes it clear that this agent should evaluate content, not rewrite it.

## Step 3: Execute the Evaluation Prompt

Now, execute the evaluation prompt for each file. The prompt explicitly requires JSON output, which makes the results easy to parse programmatically:

````python
...
for path in files:
    with open(path, "r", encoding="utf-8") as f:
        content = f.read()

    content_prompt = f"""
You are a documentation-quality evaluator.

Evaluate this markdown file and return ONLY valid JSON (either a raw JSON object or a fenced ```json block).
Do NOT include any extra text, commentary, or explanations.

Output requirements (MANDATORY):
- Reply with exactly one JSON object with these top-level keys and types:
  - path (string): must equal the provided path.
  - score (number): 0.00 to 1.00 (float). Holistic quality combining clarity, correctness, and completeness. Round to two decimal places.
  - status (string): one of "OK", "WARN", or "FAIL" determined by score as follows:
    - score >= 0.70 -> "OK"
    - 0.50 <= score < 0.70 -> "WARN"
    - score < 0.50 -> "FAIL"
  - notes (string, optional): up to 300 characters with concise diagnostic observations (do NOT include rewritten text or long examples).

Validation rules:
- The 'path' value must exactly match the provided path.
- Numeric fields must be within [0.00, 1.00] and formatted with two decimal places.
- Do not include any additional top-level keys beyond path, score, status, notes.

Example valid response:
{{"path":"{path}","score":0.78,"status":"WARN","notes":"Clear structure but missing prerequisites section."}}

Input (do not modify):
- path: {path}
- content: {content}
"""

    reply = content_agent.generate_reply(
        messages=[{"role": "user", "content": content_prompt}]
    )
...
````

Note on large files: if you're evaluating large documentation files, consider truncating the content to avoid exceeding token limits. Add this before sending the prompt:

```python
MAX_CONTENT_LENGTH = 4000
if len(content) > MAX_CONTENT_LENGTH:
    content = content[:MAX_CONTENT_LENGTH] + "\n... [content truncated] ..."
    # Note this in your prompt so the evaluator knows
```

## Analyzing the Results

The full code that processes results and generates a markdown report is available in my GitHub repository: documentation-advises. You can find the complete implementation in doc_review_agents.py.

The script generates a markdown report with:

- Folder & File Moves: structural improvements recommended by the AI
- Document Quality Scores: individual file assessments with status (OK/WARN/FAIL)

## Problems Encountered and Solutions

During my tests, I faced several challenges:

## 1. Prompt Debugging Difficulty

Problem: there's no easy way to debug the prompts sent to the LLM. If the output is unexpected, testing becomes tedious.

Solutions:

- Use Ollama's desktop app to test prompts interactively before integrating them
- Log all prompts and responses to a file for analysis
- Start with simple, single-purpose prompts before adding complexity

## 2. Unreliable JSON Output

Problem: the LLM sometimes returns invalid JSON or mixes JSON with explanatory text, ignoring the prompt's instructions.

Solutions:

- Implement validation: check for required fields before processing
- Set temperature to 0 for deterministic output

## 3. Conditional Instructions Fail

Problem: conditional instructions like "if content is truncated, do X, else do Y" are often ignored by LLMs.

Solutions:

- Avoid conditionals; use explicit, imperative instructions instead
- Pre-process data before sending (truncate files yourself rather than asking the model to)
- Keep prompts focused on a single task

## Future Improvements

Here are some enhancements I'm considering for this approach:

## 1. Vector Database Integration

Store document embeddings in a local vector database (e.g. ChromaDB) to enable semantic comparison across files without using the LLM. This would help detect duplicate content or similar documentation that could be consolidated.

## 2. Purpose-Aware Evaluation

Create evaluation prompts that understand document purpose:

- index.md files should provide an overview of the folder's documentation
- Setup guides should explain installation and initial configuration
- Tutorial pages should include step-by-step instructions with expected outputs

This would improve the accuracy of quality assessments.

## Conclusion

Using AutoGen and Ollama provides a practical way to automate documentation quality checks and structural analysis. While LLMs have limitations (non-deterministic output, occasional errors), these can be mitigated with careful prompt design, validation, and error handling. The approach is particularly valuable for teams maintaining large documentation repositories where manual review is impractical.

Start small, validate results, and gradually expand the scope of automation.

## Resources

- AutoGen Documentation
- Ollama Official Website
- Full Code Repository
- Complete Implementation
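
As a supplement to the "Unreliable JSON Output" problem discussed above, here is a minimal sketch of the recommended validation step. It assumes the agent's reply arrives as a plain string that may wrap the JSON in a fenced block or stray commentary; the helper name `extract_report` is my own, not from the article's repository:

```python
import json
import re

REQUIRED_KEYS = {"path", "score", "status"}
VALID_STATUSES = {"OK", "WARN", "FAIL"}

def extract_report(reply: str) -> dict:
    """Pull one JSON object out of an LLM reply and validate it.

    The reply may be raw JSON, or JSON wrapped in a ```json fence,
    possibly surrounded by extra commentary the prompt forbade.
    """
    # Prefer a fenced block if present, else fall back to the first {...} span.
    fenced = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", reply, re.DOTALL)
    raw = fenced.group(1) if fenced else reply[reply.find("{"): reply.rfind("}") + 1]
    data = json.loads(raw)  # raises ValueError on invalid JSON

    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing required keys: {sorted(missing)}")
    if not 0.0 <= float(data["score"]) <= 1.0:
        raise ValueError(f"score out of range: {data['score']}")
    if data["status"] not in VALID_STATUSES:
        raise ValueError(f"unknown status: {data['status']}")
    return data
```

A failed extraction raises immediately, so a caller can log the offending reply and retry rather than silently writing bad rows into the report.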
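
The advice under "Conditional Instructions Fail" (do the branching yourself rather than asking the model to) also applies to the score-to-status mapping: instead of trusting the LLM to apply the thresholds, you could recompute the status band client-side from the returned score. A sketch using the same thresholds the prompt defines:

```python
def status_from_score(score: float) -> str:
    """Map a quality score to a status band, mirroring the prompt's thresholds:
    >= 0.70 -> OK, 0.50-0.69 -> WARN, < 0.50 -> FAIL."""
    if score >= 0.70:
        return "OK"
    if score >= 0.50:
        return "WARN"
    return "FAIL"
```

Comparing this deterministic value against the model's own `status` field is also a cheap consistency check on the reply.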
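
On the vector-database idea from Future Improvements: before reaching for embeddings, a rough duplicate screen can be done with plain token overlap from the standard library. This is a much cruder signal than semantic similarity via ChromaDB, offered only as a stopgap sketch; the 0.6 threshold is arbitrary:

```python
import itertools
import re

def jaccard(a: str, b: str) -> float:
    """Jaccard similarity over lowercase word sets: 1.0 means identical vocabulary."""
    ta = set(re.findall(r"[a-z0-9]+", a.lower()))
    tb = set(re.findall(r"[a-z0-9]+", b.lower()))
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

def near_duplicates(docs: dict[str, str], threshold: float = 0.6) -> list[tuple[str, str, float]]:
    """Return (path1, path2, similarity) for document pairs above the threshold."""
    pairs = []
    for (p1, t1), (p2, t2) in itertools.combinations(docs.items(), 2):
        sim = jaccard(t1, t2)
        if sim >= threshold:
            pairs.append((p1, p2, round(sim, 2)))
    return sorted(pairs, key=lambda x: -x[2])
```

Word-set overlap misses paraphrased duplication, which is exactly what embeddings would catch, but it flags copy-pasted pages at zero model cost.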