Applying Sidecar 🏎️ pattern to OpenLLMetry using Bob!

Source: Dev.to

Building a sidecar OpenLLMetry implementation using Bob's talents 🤖

## TL;DR — What is traceloop's OpenLLMetry

Traceloop OpenLLMetry is an open-source observability framework built on top of OpenTelemetry, specifically designed to provide deep visibility into the execution of Large Language Model (LLM) applications. It enables developers to monitor and debug their AI systems by automatically instrumenting popular LLM providers (like OpenAI, Anthropic, and Azure) and vector databases (such as Pinecone, Milvus, or Chroma). By integrating OpenLLMetry, you gain access to high-fidelity distributed tracing, allowing you to visualize the entire lifecycle of a request — from the initial prompt and retrieval-augmented generation (RAG) steps to the final model response — so you can pinpoint bottlenecks, evaluate model performance, and track token usage across your infrastructure.

## Second TL;DR — Sidecar pattern

The sidecar design pattern works by deploying a secondary "sidecar" container alongside a primary application container within the same execution environment, such as a Kubernetes pod or a shared network namespace. The core logic relies on separation of concerns: the application container remains focused exclusively on its business logic while the sidecar handles cross-cutting tasks like distributed tracing, logging, or proxying traffic. A fundamental prerequisite for this pattern is the use of container images; both the application and the sidecar must be packaged as independent images so that they can be "plugged" together. This modularity enables the sidecar to intercept requests — such as LLM API calls — and add OpenTelemetry instrumentation without requiring any code changes to the primary application image.

## Evolving Observability: Moving to a Sidecar Pattern

In one of my previous explorations, I demonstrated a standalone implementation of OpenLLMetry. We saw how straightforward it is to integrate into a Python application, requiring just a few lines of code to unlock deep visibility into LLM calls. While powerful, that approach requires modifying the core application code. Basically, doing this (excerpt from the traceloop documentation) ⬇️
```
#######################
pip install traceloop-sdk
#######################
#...

import os
from openai import OpenAI
from traceloop.sdk import Traceloop
from traceloop.sdk.decorators import workflow

Traceloop.init(app_name="joke_generation_service")

@workflow(name="joke_creation")
def create_joke():
    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    completion = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Tell me a joke about opentelemetry"}],
    )
    return completion.choices[0].message.content
```

## The Vision: Pluggable Observability

This time, I wanted to push the architecture further by decoupling the monitoring logic from the business logic. The goal was to implement OpenLLMetry using a "sidecar" design pattern. This approach makes the observability layer almost entirely pluggable, allowing it to be attached to virtually any application container without cluttering the primary codebase.

## Building with Bob

To bring this modular architecture to life, I teamed up with my new AI partner, IBM Bob. I tasked Bob with building a robust application to serve as the primary service, while I focused on engineering the sidecar to capture traces, monitor performance, and manage the OpenTelemetry export pipeline.

## The Logic and Implementation

The project demonstrates a sidecar design pattern for implementing LLM observability using OpenLLMetry, allowing you to add distributed tracing to applications without modifying any core code. By deploying an independent TraceLoop sidecar proxy alongside a primary service — such as the one built by IBM Bob — all HTTP traffic to the LLM engine (e.g., Ollama) is intercepted and instrumented with OpenTelemetry spans. This architecture ensures a clean separation of concerns: the application remains focused on business logic while the sidecar captures high-fidelity metadata, including prompts, responses, and token usage, before forwarding traces to a collector and a visualization tool like Jaeger. The main idea of this logic is the separation of concerns! 🪂

```
Application Code:       Observability Infrastructure:
┌──────────────┐        ┌──────────────────┐
│   Business   │        │    TraceLoop     │
│  Logic Only  │───────▶│     Sidecar      │
│              │        │ (Tracing Proxy)  │
└──────────────┘        └────────┬─────────┘
                                 │
                                 ▼
                        ┌──────────────────┐
                        │  OTel Collector  │
                        └────────┬─────────┘
                                 │
                                 ▼
                        ┌──────────────────┐
                        │ Jaeger (Storage) │
                        └──────────────────┘
```
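The only thing that changes at deployment time is where the application sends its Ollama traffic: instead of talking to Ollama directly, it talks to the sidecar, which forwards (and traces) every call. Below is a minimal sketch of that wiring, assuming the sidecar is published under a hypothetical `traceloop-sidecar` hostname on the usual Ollama port:

```python
# Minimal sketch of the "zero code change" wiring: the application keeps using
# the plain Ollama client; only the OLLAMA_HOST environment variable changes so
# that traffic flows through the tracing sidecar (hostname/port are assumptions).
import os

import ollama

# Direct call (untraced):        OLLAMA_HOST=http://ollama:11434
# Through the sidecar (traced):  OLLAMA_HOST=http://traceloop-sidecar:11434
client = ollama.Client(host=os.getenv("OLLAMA_HOST", "http://traceloop-sidecar:11434"))

response = client.chat(
    model=os.getenv("OLLAMA_MODEL", "granite3:latest"),
    messages=[{"role": "user", "content": "What is OpenTelemetry?"}],
)
print(response["message"]["content"])
```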
Let's jump into a practical example. The main sample application is a very basic chat application using Ollama and Granite:

```python
#!/usr/bin/env python3
"""
Simple Ollama Application - No Tracing Code

This application uses Ollama for LLM inference without any built-in tracing.
Tracing will be handled by the TraceLoop sidecar.
"""

import os
import sys
import time
import logging
from datetime import datetime
from pathlib import Path

import ollama
from flask import Flask, request, jsonify

app = Flask(__name__)

# Configuration
OLLAMA_HOST = os.getenv("OLLAMA_HOST", "http://ollama:11434")
OLLAMA_MODEL = os.getenv("OLLAMA_MODEL", "granite3:latest")
LOG_LEVEL = os.getenv("LOG_LEVEL", "INFO")

# Setup logging directory
LOG_DIR = Path("./logs")
LOG_DIR.mkdir(exist_ok=True)

# Create timestamped log file
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
log_file = LOG_DIR / f"ollama_app_{timestamp}.log"

# Configure logging
logging.basicConfig(
    level=getattr(logging, LOG_LEVEL.upper()),
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler(log_file),
        logging.StreamHandler(sys.stdout)
    ]
)
logger = logging.getLogger(__name__)

# Log startup information
logger.info("=" * 60)
logger.info("Simple Ollama Application Starting")
logger.info("=" * 60)
logger.info(f"Ollama Host: {OLLAMA_HOST}")
logger.info(f"Model: {OLLAMA_MODEL}")
logger.info(f"Log Level: {LOG_LEVEL}")
logger.info(f"Log File: {log_file}")
logger.info("=" * 60)


@app.route('/health', methods=['GET'])
def health():
    """Health check endpoint"""
    logger.debug("Health check requested")
    return jsonify({
        "status": "healthy",
        "model": OLLAMA_MODEL,
        "ollama_host": OLLAMA_HOST,
        "log_file": str(log_file)
    }), 200


@app.route('/chat', methods=['POST'])
def chat():
    """
    Chat endpoint - accepts a prompt and returns a response

    Request body: {
        "prompt": "Your question here",
        "model": "optional-model-override"
    }
    """
    request_id = datetime.now().strftime("%Y%m%d_%H%M%S_%f")
    logger.info(f"[{request_id}] Chat request received")

    try:
        data = request.get_json()
        if not data or 'prompt' not in data:
            logger.warning(f"[{request_id}] Missing 'prompt' in request body")
            return jsonify({"error": "Missing 'prompt' in request body"}), 400

        prompt = data['prompt']
        model = data.get('model', OLLAMA_MODEL)

        logger.info(f"[{request_id}] Prompt: {prompt[:100]}{'...' if len(prompt) > 100 else ''}")
        logger.info(f"[{request_id}] Model: {model}")
        logger.info(f"[{request_id}] Ollama Host: {OLLAMA_HOST}")

        # Create Ollama client
        try:
            client = ollama.Client(host=OLLAMA_HOST)
            logger.debug(f"[{request_id}] Ollama client created successfully")
        except Exception as e:
            logger.error(f"[{request_id}] Failed to create Ollama client: {e}")
            raise

        # Send the prompt
        start_time = time.time()
        logger.info(f"[{request_id}] Sending request to Ollama...")
        try:
            response = client.chat(
                model=model,
                messages=[
                    {
                        'role': 'user',
                        'content': prompt,
                    },
                ],
            )
            logger.debug(f"[{request_id}] Received response from Ollama")
        except Exception as e:
            logger.error(f"[{request_id}] Ollama request failed: {e}")
            raise

        response_text = response['message']['content']
        duration = time.time() - start_time

        logger.info(f"[{request_id}] Response received in {duration:.2f}s")
        logger.info(f"[{request_id}] Response length: {len(response_text)} characters")
        logger.debug(f"[{request_id}] Response preview: {response_text[:100]}...")

        return jsonify({
            "prompt": prompt,
            "response": response_text,
            "model": model,
            "duration_seconds": duration,
            "request_id": request_id
        }), 200

    except Exception as e:
        logger.error(f"[{request_id}] Error processing chat request: {str(e)}", exc_info=True)
        return jsonify({
            "error": str(e),
            "request_id": request_id
        }), 500


@app.route('/batch', methods=['POST'])
def batch_chat():
    """
    Batch chat endpoint - accepts multiple prompts

    Request body: {
        "prompts": ["Question 1", "Question 2", ...],
        "model": "optional-model-override"
    }
    """
    request_id = datetime.now().strftime("%Y%m%d_%H%M%S_%f")
    logger.info(f"[{request_id}] Batch request received")

    try:
        data = request.get_json()
        if not data or 'prompts' not in data:
            logger.warning(f"[{request_id}] Missing 'prompts' in request body")
            return jsonify({"error": "Missing 'prompts' in request body"}), 400

        prompts = data['prompts']
        model = data.get('model', OLLAMA_MODEL)

        if not isinstance(prompts, list):
            logger.warning(f"[{request_id}] 'prompts' is not a list")
            return jsonify({"error": "'prompts' must be a list"}), 400

        logger.info(f"[{request_id}] Processing {len(prompts)} prompts")
        logger.info(f"[{request_id}] Model: {model}")

        # Create Ollama client
        try:
            client = ollama.Client(host=OLLAMA_HOST)
            logger.debug(f"[{request_id}] Ollama client created successfully")
        except Exception as e:
            logger.error(f"[{request_id}] Failed to create Ollama client: {e}")
            raise

        results = []
        total_start = time.time()

        for i, prompt in enumerate(prompts, 1):
            logger.info(f"[{request_id}] Batch {i}/{len(prompts)} - Prompt: {prompt[:50]}...")
            try:
                start_time = time.time()
                response = client.chat(
                    model=model,
                    messages=[
                        {
                            'role': 'user',
                            'content': prompt,
                        },
                    ],
                )
                response_text = response['message']['content']
                duration = time.time() - start_time
                results.append({
                    "prompt": prompt,
                    "response": response_text,
                    "duration_seconds": duration
                })
                logger.info(f"[{request_id}] Batch {i}/{len(prompts)} completed in {duration:.2f}s")
            except Exception as e:
                logger.error(f"[{request_id}] Batch {i}/{len(prompts)} failed: {e}")
                results.append({
                    "prompt": prompt,
                    "error": str(e),
                    "duration_seconds": 0
                })

        total_duration = time.time() - total_start
        logger.info(f"[{request_id}] Batch request completed in {total_duration:.2f}s")

        return jsonify({
            "results": results,
            "model": model,
            "total_duration_seconds": total_duration,
            "count": len(results),
            "request_id": request_id
        }), 200

    except Exception as e:
        logger.error(f"[{request_id}] Error processing batch request: {str(e)}", exc_info=True)
        return jsonify({
            "error": str(e),
            "request_id": request_id
        }), 500


def run_sample_queries():
    """Run some sample queries on startup"""
    logger.info("=" * 60)
    logger.info("Running sample queries...")
    logger.info("=" * 60)

    sample_prompts = [
        "What is OpenTelemetry?",
        "Explain distributed tracing in one sentence.",
        "What are the benefits of observability?",
    ]

    try:
        client = ollama.Client(host=OLLAMA_HOST)
        logger.info(f"Connected to Ollama at {OLLAMA_HOST}")
    except Exception as e:
        logger.error(f"Failed to connect to Ollama: {e}")
        return

    for i, prompt in enumerate(sample_prompts, 1):
        logger.info(f"Sample {i}/{len(sample_prompts)} - Prompt: {prompt}")
        try:
            start_time = time.time()
            response = client.chat(
                model=OLLAMA_MODEL,
                messages=[{'role': 'user', 'content': prompt}],
            )
            duration = time.time() - start_time
            response_text = response['message']['content']
            logger.info(f"Sample {i}/{len(sample_prompts)} - Completed in {duration:.2f}s")
            logger.debug(f"Sample {i}/{len(sample_prompts)} - Response: {response_text[:100]}...")
        except Exception as e:
            logger.error(f"Sample {i}/{len(sample_prompts)} - Error: {e}", exc_info=True)
        time.sleep(1)

    logger.info("=" * 60)
    logger.info("Sample queries completed!")
    logger.info("=" * 60)


if __name__ == "__main__":
    # Run sample queries if in standalone mode
    if os.getenv("RUN_SAMPLES", "true").lower() == "true":
        try:
            run_sample_queries()
        except Exception as e:
            logger.error(f"Sample queries failed: {e}", exc_info=True)

    # Start Flask server
    port = int(os.getenv("PORT", "8080"))
    logger.info(f"Starting Flask server on port {port}...")
    logger.info(f"Logs are being written to: {log_file}")
    app.run(host='0.0.0.0', port=port, debug=False)

# Made with Bob
```
""" import os import sys import time import logging from datetime import datetime from pathlib import Path import ollama from flask import Flask, request, jsonify app = Flask(__name__) # Configuration OLLAMA_HOST = os.getenv("OLLAMA_HOST", "http://ollama:11434") OLLAMA_MODEL = os.getenv("OLLAMA_MODEL", "granite3:latest") LOG_LEVEL = os.getenv("LOG_LEVEL", "INFO") # Setup logging directory LOG_DIR = Path("./logs") LOG_DIR.mkdir(exist_ok=True) # Create timestamped log file timestamp = datetime.now().strftime("%Y%m%d_%H%M%S") log_file = LOG_DIR / f"ollama_app_{timestamp}.log" # Configure logging logging.basicConfig( level=getattr(logging, LOG_LEVEL.upper()), format='%(asctime)s - %(name)s - %(levelname)s - %(message)s', handlers=[ logging.FileHandler(log_file), logging.StreamHandler(sys.stdout) ] ) logger = logging.getLogger(__name__) # Log startup information logger.info("=" * 60) logger.info("Simple Ollama Application Starting") logger.info("=" * 60) logger.info(f"Ollama Host: {OLLAMA_HOST}") logger.info(f"Model: {OLLAMA_MODEL}") logger.info(f"Log Level: {LOG_LEVEL}") logger.info(f"Log File: {log_file}") logger.info("=" * 60) @app.route('/health', methods=['GET']) def health(): """Health check endpoint""" logger.debug("Health check requested") return jsonify({ "status": "healthy", "model": OLLAMA_MODEL, "ollama_host": OLLAMA_HOST, "log_file": str(log_file) }), 200 @app.route('/chat', methods=['POST']) def chat(): """ Chat endpoint - accepts a prompt and returns a response Request body: { "prompt": "Your question here", "model": "optional-model-override" } """ request_id = datetime.now().strftime("%Y%m%d_%H%M%S_%f") logger.info(f"[{request_id}] Chat request received") try: data = request.get_json() if not data or 'prompt' not in data: logger.warning(f"[{request_id}] Missing 'prompt' in request body") return jsonify({"error": "Missing 'prompt' in request body"}), 400 prompt = data['prompt'] model = data.get('model', OLLAMA_MODEL) logger.info(f"[{request_id}] Prompt: {prompt[:100]}{'...' 
if len(prompt) > 100 else ''}") logger.info(f"[{request_id}] Model: {model}") logger.info(f"[{request_id}] Ollama Host: {OLLAMA_HOST}") # Create Ollama client try: client = ollama.Client(host=OLLAMA_HOST) logger.debug(f"[{request_id}] Ollama client created successfully") except Exception as e: logger.error(f"[{request_id}] Failed to create Ollama client: {e}") raise # Send the prompt start_time = time.time() logger.info(f"[{request_id}] Sending request to Ollama...") try: response = client.chat( model=model, messages=[ { 'role': 'user', 'content': prompt, }, ], ) logger.debug(f"[{request_id}] Received response from Ollama") except Exception as e: logger.error(f"[{request_id}] Ollama request failed: {e}") raise response_text = response['message']['content'] duration = time.time() - start_time logger.info(f"[{request_id}] Response received in {duration:.2f}s") logger.info(f"[{request_id}] Response length: {len(response_text)} characters") logger.debug(f"[{request_id}] Response preview: {response_text[:100]}...") return jsonify({ "prompt": prompt, "response": response_text, "model": model, "duration_seconds": duration, "request_id": request_id }), 200 except Exception as e: logger.error(f"[{request_id}] Error processing chat request: {str(e)}", exc_info=True) return jsonify({ "error": str(e), "request_id": request_id }), 500 @app.route('/batch', methods=['POST']) def batch_chat(): """ Batch chat endpoint - accepts multiple prompts Request body: { "prompts": ["Question 1", "Question 2", ...], "model": "optional-model-override" } """ request_id = datetime.now().strftime("%Y%m%d_%H%M%S_%f") logger.info(f"[{request_id}] Batch request received") try: data = request.get_json() if not data or 'prompts' not in data: logger.warning(f"[{request_id}] Missing 'prompts' in request body") return jsonify({"error": "Missing 'prompts' in request body"}), 400 prompts = data['prompts'] model = data.get('model', OLLAMA_MODEL) if not isinstance(prompts, list): logger.warning(f"[{request_id}] 'prompts' is not a list") return jsonify({"error": "'prompts' must be a list"}), 400 logger.info(f"[{request_id}] Processing {len(prompts)} prompts") logger.info(f"[{request_id}] Model: {model}") # Create Ollama client try: client = ollama.Client(host=OLLAMA_HOST) logger.debug(f"[{request_id}] Ollama client created successfully") except Exception as e: logger.error(f"[{request_id}] Failed to create Ollama client: {e}") raise results = [] total_start = time.time() for i, prompt in enumerate(prompts, 1): logger.info(f"[{request_id}] Batch {i}/{len(prompts)} - Prompt: {prompt[:50]}...") try: start_time = time.time() response = client.chat( model=model, messages=[ { 'role': 'user', 'content': prompt, }, ], ) response_text = response['message']['content'] duration = time.time() - start_time results.append({ "prompt": prompt, "response": response_text, "duration_seconds": duration }) logger.info(f"[{request_id}] Batch {i}/{len(prompts)} completed in {duration:.2f}s") except Exception as e: logger.error(f"[{request_id}] Batch {i}/{len(prompts)} failed: {e}") results.append({ "prompt": prompt, "error": str(e), "duration_seconds": 0 }) total_duration = time.time() - total_start logger.info(f"[{request_id}] Batch request completed in {total_duration:.2f}s") return jsonify({ "results": results, "model": model, "total_duration_seconds": total_duration, "count": len(results), "request_id": request_id }), 200 except Exception as e: logger.error(f"[{request_id}] Error processing batch request: {str(e)}", exc_info=True) return jsonify({ 
"error": str(e), "request_id": request_id }), 500 def run_sample_queries(): """Run some sample queries on startup""" logger.info("=" * 60) logger.info("Running sample queries...") logger.info("=" * 60) sample_prompts = [ "What is OpenTelemetry?", "Explain distributed tracing in one sentence.", "What are the benefits of observability?", ] try: client = ollama.Client(host=OLLAMA_HOST) logger.info(f"Connected to Ollama at {OLLAMA_HOST}") except Exception as e: logger.error(f"Failed to connect to Ollama: {e}") return for i, prompt in enumerate(sample_prompts, 1): logger.info(f"Sample {i}/{len(sample_prompts)} - Prompt: {prompt}") try: start_time = time.time() response = client.chat( model=OLLAMA_MODEL, messages=[{'role': 'user', 'content': prompt}], ) duration = time.time() - start_time response_text = response['message']['content'] logger.info(f"Sample {i}/{len(sample_prompts)} - Completed in {duration:.2f}s") logger.debug(f"Sample {i}/{len(sample_prompts)} - Response: {response_text[:100]}...") except Exception as e: logger.error(f"Sample {i}/{len(sample_prompts)} - Error: {e}", exc_info=True) time.sleep(1) logger.info("=" * 60) logger.info("Sample queries completed!") logger.info("=" * 60) if __name__ == "__main__": # Run sample queries if in standalone mode if os.getenv("RUN_SAMPLES", "true").lower() == "true": try: run_sample_queries() except Exception as e: logger.error(f"Sample queries failed: {e}", exc_info=True) # Start Flask server port = int(os.getenv("PORT", "8080")) logger.info(f"Starting Flask server on port {port}...") logger.info(f"Logs are being written to: {log_file}") app.run(host='0.0.0.0', port=port, debug=False) # Made with Bob COMMAND_BLOCK: #!/usr/bin/env python3 """ Simple Ollama Application - No Tracing Code This application uses Ollama for LLM inference without any built-in tracing. Tracing will be handled by the TraceLoop sidecar. 
""" import os import sys import time import logging from datetime import datetime from pathlib import Path import ollama from flask import Flask, request, jsonify app = Flask(__name__) # Configuration OLLAMA_HOST = os.getenv("OLLAMA_HOST", "http://ollama:11434") OLLAMA_MODEL = os.getenv("OLLAMA_MODEL", "granite3:latest") LOG_LEVEL = os.getenv("LOG_LEVEL", "INFO") # Setup logging directory LOG_DIR = Path("./logs") LOG_DIR.mkdir(exist_ok=True) # Create timestamped log file timestamp = datetime.now().strftime("%Y%m%d_%H%M%S") log_file = LOG_DIR / f"ollama_app_{timestamp}.log" # Configure logging logging.basicConfig( level=getattr(logging, LOG_LEVEL.upper()), format='%(asctime)s - %(name)s - %(levelname)s - %(message)s', handlers=[ logging.FileHandler(log_file), logging.StreamHandler(sys.stdout) ] ) logger = logging.getLogger(__name__) # Log startup information logger.info("=" * 60) logger.info("Simple Ollama Application Starting") logger.info("=" * 60) logger.info(f"Ollama Host: {OLLAMA_HOST}") logger.info(f"Model: {OLLAMA_MODEL}") logger.info(f"Log Level: {LOG_LEVEL}") logger.info(f"Log File: {log_file}") logger.info("=" * 60) @app.route('/health', methods=['GET']) def health(): """Health check endpoint""" logger.debug("Health check requested") return jsonify({ "status": "healthy", "model": OLLAMA_MODEL, "ollama_host": OLLAMA_HOST, "log_file": str(log_file) }), 200 @app.route('/chat', methods=['POST']) def chat(): """ Chat endpoint - accepts a prompt and returns a response Request body: { "prompt": "Your question here", "model": "optional-model-override" } """ request_id = datetime.now().strftime("%Y%m%d_%H%M%S_%f") logger.info(f"[{request_id}] Chat request received") try: data = request.get_json() if not data or 'prompt' not in data: logger.warning(f"[{request_id}] Missing 'prompt' in request body") return jsonify({"error": "Missing 'prompt' in request body"}), 400 prompt = data['prompt'] model = data.get('model', OLLAMA_MODEL) logger.info(f"[{request_id}] Prompt: {prompt[:100]}{'...' 
if len(prompt) > 100 else ''}") logger.info(f"[{request_id}] Model: {model}") logger.info(f"[{request_id}] Ollama Host: {OLLAMA_HOST}") # Create Ollama client try: client = ollama.Client(host=OLLAMA_HOST) logger.debug(f"[{request_id}] Ollama client created successfully") except Exception as e: logger.error(f"[{request_id}] Failed to create Ollama client: {e}") raise # Send the prompt start_time = time.time() logger.info(f"[{request_id}] Sending request to Ollama...") try: response = client.chat( model=model, messages=[ { 'role': 'user', 'content': prompt, }, ], ) logger.debug(f"[{request_id}] Received response from Ollama") except Exception as e: logger.error(f"[{request_id}] Ollama request failed: {e}") raise response_text = response['message']['content'] duration = time.time() - start_time logger.info(f"[{request_id}] Response received in {duration:.2f}s") logger.info(f"[{request_id}] Response length: {len(response_text)} characters") logger.debug(f"[{request_id}] Response preview: {response_text[:100]}...") return jsonify({ "prompt": prompt, "response": response_text, "model": model, "duration_seconds": duration, "request_id": request_id }), 200 except Exception as e: logger.error(f"[{request_id}] Error processing chat request: {str(e)}", exc_info=True) return jsonify({ "error": str(e), "request_id": request_id }), 500 @app.route('/batch', methods=['POST']) def batch_chat(): """ Batch chat endpoint - accepts multiple prompts Request body: { "prompts": ["Question 1", "Question 2", ...], "model": "optional-model-override" } """ request_id = datetime.now().strftime("%Y%m%d_%H%M%S_%f") logger.info(f"[{request_id}] Batch request received") try: data = request.get_json() if not data or 'prompts' not in data: logger.warning(f"[{request_id}] Missing 'prompts' in request body") return jsonify({"error": "Missing 'prompts' in request body"}), 400 prompts = data['prompts'] model = data.get('model', OLLAMA_MODEL) if not isinstance(prompts, list): logger.warning(f"[{request_id}] 'prompts' is not a list") return jsonify({"error": "'prompts' must be a list"}), 400 logger.info(f"[{request_id}] Processing {len(prompts)} prompts") logger.info(f"[{request_id}] Model: {model}") # Create Ollama client try: client = ollama.Client(host=OLLAMA_HOST) logger.debug(f"[{request_id}] Ollama client created successfully") except Exception as e: logger.error(f"[{request_id}] Failed to create Ollama client: {e}") raise results = [] total_start = time.time() for i, prompt in enumerate(prompts, 1): logger.info(f"[{request_id}] Batch {i}/{len(prompts)} - Prompt: {prompt[:50]}...") try: start_time = time.time() response = client.chat( model=model, messages=[ { 'role': 'user', 'content': prompt, }, ], ) response_text = response['message']['content'] duration = time.time() - start_time results.append({ "prompt": prompt, "response": response_text, "duration_seconds": duration }) logger.info(f"[{request_id}] Batch {i}/{len(prompts)} completed in {duration:.2f}s") except Exception as e: logger.error(f"[{request_id}] Batch {i}/{len(prompts)} failed: {e}") results.append({ "prompt": prompt, "error": str(e), "duration_seconds": 0 }) total_duration = time.time() - total_start logger.info(f"[{request_id}] Batch request completed in {total_duration:.2f}s") return jsonify({ "results": results, "model": model, "total_duration_seconds": total_duration, "count": len(results), "request_id": request_id }), 200 except Exception as e: logger.error(f"[{request_id}] Error processing batch request: {str(e)}", exc_info=True) return jsonify({ 
"error": str(e), "request_id": request_id }), 500 def run_sample_queries(): """Run some sample queries on startup""" logger.info("=" * 60) logger.info("Running sample queries...") logger.info("=" * 60) sample_prompts = [ "What is OpenTelemetry?", "Explain distributed tracing in one sentence.", "What are the benefits of observability?", ] try: client = ollama.Client(host=OLLAMA_HOST) logger.info(f"Connected to Ollama at {OLLAMA_HOST}") except Exception as e: logger.error(f"Failed to connect to Ollama: {e}") return for i, prompt in enumerate(sample_prompts, 1): logger.info(f"Sample {i}/{len(sample_prompts)} - Prompt: {prompt}") try: start_time = time.time() response = client.chat( model=OLLAMA_MODEL, messages=[{'role': 'user', 'content': prompt}], ) duration = time.time() - start_time response_text = response['message']['content'] logger.info(f"Sample {i}/{len(sample_prompts)} - Completed in {duration:.2f}s") logger.debug(f"Sample {i}/{len(sample_prompts)} - Response: {response_text[:100]}...") except Exception as e: logger.error(f"Sample {i}/{len(sample_prompts)} - Error: {e}", exc_info=True) time.sleep(1) logger.info("=" * 60) logger.info("Sample queries completed!") logger.info("=" * 60) if __name__ == "__main__": # Run sample queries if in standalone mode if os.getenv("RUN_SAMPLES", "true").lower() == "true": try: run_sample_queries() except Exception as e: logger.error(f"Sample queries failed: {e}", exc_info=True) # Start Flask server port = int(os.getenv("PORT", "8080")) logger.info(f"Starting Flask server on port {port}...") logger.info(f"Logs are being written to: {log_file}") app.run(host='0.0.0.0', port=port, debug=False) # Made with Bob COMMAND_BLOCK: #!/usr/bin/env python3 """ TraceLoop Sidecar - OpenLLMetry Tracing Proxy This sidecar uses OpenLLMetry (Traceloop SDK) to automatically instrument Ollama API calls. It acts as a transparent proxy that adds LLM-specific tracing. """ import os import json import requests from flask import Flask, request, Response from traceloop.sdk import Traceloop from traceloop.sdk.decorators import workflow, task from opentelemetry import trace app = Flask(__name__) # Configuration OLLAMA_UPSTREAM = os.getenv("OLLAMA_UPSTREAM", "http://ollama:11434") OTEL_ENDPOINT = os.getenv("OTEL_EXPORTER_OTLP_ENDPOINT", "http://otel-collector:4317") SERVICE_NAME = os.getenv("OTEL_SERVICE_NAME", "traceloop-sidecar") TRACED_SERVICE_NAME = os.getenv("TRACED_SERVICE_NAME", "ollama-app") print("=" * 70) print("TraceLoop Sidecar - OpenLLMetry Tracing Proxy") print("=" * 70) print(f"Upstream Ollama: {OLLAMA_UPSTREAM}") print(f"OTEL Endpoint: {OTEL_ENDPOINT}") print(f"Service Name: {SERVICE_NAME}") print(f"Traced Service: {TRACED_SERVICE_NAME}") print("=" * 70) def init_tracing(): """Initialize OpenLLMetry (Traceloop SDK)""" Traceloop.init( app_name=TRACED_SERVICE_NAME, # Use the application name, not sidecar name disable_batch=False, exporter_otlp_endpoint=OTEL_ENDPOINT, # Enable LLM-specific instrumentation should_enrich_metrics=True, ) print("βœ“ OpenLLMetry (Traceloop SDK) initialized successfully") # Initialize tracing on startup init_tracing() tracer = trace.get_tracer(__name__) @app.route('/health', methods=['GET']) def health(): """Health check endpoint""" return {"status": "healthy", "service": SERVICE_NAME}, 200 @task(name="ollama_api_call") def proxy_ollama_request(method, path, headers, data, query_string): """ Proxy request to Ollama with OpenLLMetry tracing. The @task decorator automatically creates spans and adds LLM attributes. 
""" # Build upstream URL upstream_url = f"{OLLAMA_UPSTREAM}/{path}" if query_string: upstream_url += f"?{query_string.decode()}" # Parse request data for logging request_data = None if data: try: request_data = json.loads(data) except: request_data = data.decode('utf-8', errors='ignore') # Get current span to add custom attributes current_span = trace.get_current_span() # Add custom attributes current_span.set_attribute("http.method", method) current_span.set_attribute("http.url", upstream_url) current_span.set_attribute("http.target", f"/{path}") current_span.set_attribute("llm.system", "ollama") # Extract and add LLM-specific attributes if request_data and isinstance(request_data, dict): if "model" in request_data: current_span.set_attribute("llm.model", request_data["model"]) # For chat API if "messages" in request_data: messages = request_data["messages"] if messages and len(messages) > 0: last_message = messages[-1] if "content" in last_message: prompt = last_message["content"] current_span.set_attribute("llm.prompts", prompt[:1000]) current_span.set_attribute("llm.request.type", "chat") # For generate API if "prompt" in request_data: current_span.set_attribute("llm.prompts", request_data["prompt"][:1000]) current_span.set_attribute("llm.request.type", "completion") # Log the request print(f"\n[PROXY] {method} /{path}") if request_data and isinstance(request_data, dict): if "model" in request_data: print(f"[PROXY] Model: {request_data['model']}") if "messages" in request_data and request_data["messages"]: print(f"[PROXY] Prompt: {request_data['messages'][-1].get('content', '')[:100]}...") elif "prompt" in request_data: print(f"[PROXY] Prompt: {request_data['prompt'][:100]}...") try: # Forward request to upstream Ollama upstream_response = requests.request( method=method, url=upstream_url, headers={k: v for k, v in headers if k.lower() != 'host'}, data=data, allow_redirects=False, timeout=300 # 5 minutes for model operations ) # Parse response response_data = None try: response_data = upstream_response.json() except: response_data = upstream_response.text # Add response attributes current_span.set_attribute("http.status_code", upstream_response.status_code) # Extract response content if response_data and isinstance(response_data, dict): # For chat API if "message" in response_data: message = response_data["message"] if "content" in message: response_text = message["content"] current_span.set_attribute("llm.responses", response_text[:1000]) current_span.set_attribute("llm.response_length", len(response_text)) print(f"[PROXY] Response: {response_text[:100]}...") # For generate API elif "response" in response_data: response_text = response_data["response"] current_span.set_attribute("llm.responses", response_text[:1000]) current_span.set_attribute("llm.response_length", len(response_text)) print(f"[PROXY] Response: {response_text[:100]}...") print(f"[PROXY] Status: {upstream_response.status_code}") # Return response return Response( upstream_response.content, status=upstream_response.status_code, headers=dict(upstream_response.headers) ) except Exception as e: current_span.set_attribute("error", True) current_span.set_attribute("error.message", str(e)) current_span.record_exception(e) print(f"[PROXY ERROR] {str(e)}") raise @workflow(name="ollama_proxy") @app.route('/', defaults={'path': ''}, methods=['GET', 'POST', 'PUT', 'DELETE', 'PATCH']) @app.route('/<path:path>', methods=['GET', 'POST', 'PUT', 'DELETE', 'PATCH']) def proxy(path): """ Main proxy endpoint. 
The @workflow decorator creates a parent span for the entire request. """ try: return proxy_ollama_request( method=request.method, path=path, headers=request.headers, data=request.data, query_string=request.query_string ) except Exception as e: return {"error": str(e)}, 500 if __name__ == "__main__": port = int(os.getenv("PORT", "11434")) print(f"\nStarting TraceLoop Sidecar on port {port}...") print(f"Proxying to: {OLLAMA_UPSTREAM}") print(f"Using OpenLLMetry for automatic LLM tracing") print("=" * 70 + "\n") app.run(host='0.0.0.0', port=port, debug=False) # Made with Bob Enter fullscreen mode Exit fullscreen mode COMMAND_BLOCK: #!/usr/bin/env python3 """ TraceLoop Sidecar - OpenLLMetry Tracing Proxy This sidecar uses OpenLLMetry (Traceloop SDK) to automatically instrument Ollama API calls. It acts as a transparent proxy that adds LLM-specific tracing. """ import os import json import requests from flask import Flask, request, Response from traceloop.sdk import Traceloop from traceloop.sdk.decorators import workflow, task from opentelemetry import trace app = Flask(__name__) # Configuration OLLAMA_UPSTREAM = os.getenv("OLLAMA_UPSTREAM", "http://ollama:11434") OTEL_ENDPOINT = os.getenv("OTEL_EXPORTER_OTLP_ENDPOINT", "http://otel-collector:4317") SERVICE_NAME = os.getenv("OTEL_SERVICE_NAME", "traceloop-sidecar") TRACED_SERVICE_NAME = os.getenv("TRACED_SERVICE_NAME", "ollama-app") print("=" * 70) print("TraceLoop Sidecar - OpenLLMetry Tracing Proxy") print("=" * 70) print(f"Upstream Ollama: {OLLAMA_UPSTREAM}") print(f"OTEL Endpoint: {OTEL_ENDPOINT}") print(f"Service Name: {SERVICE_NAME}") print(f"Traced Service: {TRACED_SERVICE_NAME}") print("=" * 70) def init_tracing(): """Initialize OpenLLMetry (Traceloop SDK)""" Traceloop.init( app_name=TRACED_SERVICE_NAME, # Use the application name, not sidecar name disable_batch=False, exporter_otlp_endpoint=OTEL_ENDPOINT, # Enable LLM-specific instrumentation should_enrich_metrics=True, ) print("βœ“ OpenLLMetry (Traceloop SDK) initialized successfully") # Initialize tracing on startup init_tracing() tracer = trace.get_tracer(__name__) @app.route('/health', methods=['GET']) def health(): """Health check endpoint""" return {"status": "healthy", "service": SERVICE_NAME}, 200 @task(name="ollama_api_call") def proxy_ollama_request(method, path, headers, data, query_string): """ Proxy request to Ollama with OpenLLMetry tracing. The @task decorator automatically creates spans and adds LLM attributes. 
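Because the sidecar forwards any path to the upstream Ollama server, you can sanity-check it by talking to it exactly as you would talk to Ollama itself; every request that passes through is traced. A small sketch, assuming the sidecar is exposed on `localhost:11434` (hostname and port are assumptions for a local run):

```python
# Sketch: verify the sidecar proxy by sending an Ollama-style chat request
# through it. The /health route is served by the sidecar itself; /api/chat is
# proxied to the upstream Ollama server and instrumented along the way.
import requests

print(requests.get("http://localhost:11434/health", timeout=10).json())

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "granite3:latest",
        "messages": [{"role": "user", "content": "Tell me a joke about opentelemetry"}],
        "stream": False,  # ask Ollama for a single JSON response
    },
    timeout=300,
)
print(resp.json()["message"]["content"])
```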
""" # Build upstream URL upstream_url = f"{OLLAMA_UPSTREAM}/{path}" if query_string: upstream_url += f"?{query_string.decode()}" # Parse request data for logging request_data = None if data: try: request_data = json.loads(data) except: request_data = data.decode('utf-8', errors='ignore') # Get current span to add custom attributes current_span = trace.get_current_span() # Add custom attributes current_span.set_attribute("http.method", method) current_span.set_attribute("http.url", upstream_url) current_span.set_attribute("http.target", f"/{path}") current_span.set_attribute("llm.system", "ollama") # Extract and add LLM-specific attributes if request_data and isinstance(request_data, dict): if "model" in request_data: current_span.set_attribute("llm.model", request_data["model"]) # For chat API if "messages" in request_data: messages = request_data["messages"] if messages and len(messages) > 0: last_message = messages[-1] if "content" in last_message: prompt = last_message["content"] current_span.set_attribute("llm.prompts", prompt[:1000]) current_span.set_attribute("llm.request.type", "chat") # For generate API if "prompt" in request_data: current_span.set_attribute("llm.prompts", request_data["prompt"][:1000]) current_span.set_attribute("llm.request.type", "completion") # Log the request print(f"\n[PROXY] {method} /{path}") if request_data and isinstance(request_data, dict): if "model" in request_data: print(f"[PROXY] Model: {request_data['model']}") if "messages" in request_data and request_data["messages"]: print(f"[PROXY] Prompt: {request_data['messages'][-1].get('content', '')[:100]}...") elif "prompt" in request_data: print(f"[PROXY] Prompt: {request_data['prompt'][:100]}...") try: # Forward request to upstream Ollama upstream_response = requests.request( method=method, url=upstream_url, headers={k: v for k, v in headers if k.lower() != 'host'}, data=data, allow_redirects=False, timeout=300 # 5 minutes for model operations ) # Parse response response_data = None try: response_data = upstream_response.json() except: response_data = upstream_response.text # Add response attributes current_span.set_attribute("http.status_code", upstream_response.status_code) # Extract response content if response_data and isinstance(response_data, dict): # For chat API if "message" in response_data: message = response_data["message"] if "content" in message: response_text = message["content"] current_span.set_attribute("llm.responses", response_text[:1000]) current_span.set_attribute("llm.response_length", len(response_text)) print(f"[PROXY] Response: {response_text[:100]}...") # For generate API elif "response" in response_data: response_text = response_data["response"] current_span.set_attribute("llm.responses", response_text[:1000]) current_span.set_attribute("llm.response_length", len(response_text)) print(f"[PROXY] Response: {response_text[:100]}...") print(f"[PROXY] Status: {upstream_response.status_code}") # Return response return Response( upstream_response.content, status=upstream_response.status_code, headers=dict(upstream_response.headers) ) except Exception as e: current_span.set_attribute("error", True) current_span.set_attribute("error.message", str(e)) current_span.record_exception(e) print(f"[PROXY ERROR] {str(e)}") raise @workflow(name="ollama_proxy") @app.route('/', defaults={'path': ''}, methods=['GET', 'POST', 'PUT', 'DELETE', 'PATCH']) @app.route('/<path:path>', methods=['GET', 'POST', 'PUT', 'DELETE', 'PATCH']) def proxy(path): """ Main proxy endpoint. 
The @workflow decorator creates a parent span for the entire request. """ try: return proxy_ollama_request( method=request.method, path=path, headers=request.headers, data=request.data, query_string=request.query_string ) except Exception as e: return {"error": str(e)}, 500 if __name__ == "__main__": port = int(os.getenv("PORT", "11434")) print(f"\nStarting TraceLoop Sidecar on port {port}...") print(f"Proxying to: {OLLAMA_UPSTREAM}") print(f"Using OpenLLMetry for automatic LLM tracing") print("=" * 70 + "\n") app.run(host='0.0.0.0', port=port, debug=False) # Made with Bob COMMAND_BLOCK: #!/usr/bin/env python3 """ TraceLoop Sidecar - OpenLLMetry Tracing Proxy This sidecar uses OpenLLMetry (Traceloop SDK) to automatically instrument Ollama API calls. It acts as a transparent proxy that adds LLM-specific tracing. """ import os import json import requests from flask import Flask, request, Response from traceloop.sdk import Traceloop from traceloop.sdk.decorators import workflow, task from opentelemetry import trace app = Flask(__name__) # Configuration OLLAMA_UPSTREAM = os.getenv("OLLAMA_UPSTREAM", "http://ollama:11434") OTEL_ENDPOINT = os.getenv("OTEL_EXPORTER_OTLP_ENDPOINT", "http://otel-collector:4317") SERVICE_NAME = os.getenv("OTEL_SERVICE_NAME", "traceloop-sidecar") TRACED_SERVICE_NAME = os.getenv("TRACED_SERVICE_NAME", "ollama-app") print("=" * 70) print("TraceLoop Sidecar - OpenLLMetry Tracing Proxy") print("=" * 70) print(f"Upstream Ollama: {OLLAMA_UPSTREAM}") print(f"OTEL Endpoint: {OTEL_ENDPOINT}") print(f"Service Name: {SERVICE_NAME}") print(f"Traced Service: {TRACED_SERVICE_NAME}") print("=" * 70) def init_tracing(): """Initialize OpenLLMetry (Traceloop SDK)""" Traceloop.init( app_name=TRACED_SERVICE_NAME, # Use the application name, not sidecar name disable_batch=False, exporter_otlp_endpoint=OTEL_ENDPOINT, # Enable LLM-specific instrumentation should_enrich_metrics=True, ) print("βœ“ OpenLLMetry (Traceloop SDK) initialized successfully") # Initialize tracing on startup init_tracing() tracer = trace.get_tracer(__name__) @app.route('/health', methods=['GET']) def health(): """Health check endpoint""" return {"status": "healthy", "service": SERVICE_NAME}, 200 @task(name="ollama_api_call") def proxy_ollama_request(method, path, headers, data, query_string): """ Proxy request to Ollama with OpenLLMetry tracing. The @task decorator automatically creates spans and adds LLM attributes. 
""" # Build upstream URL upstream_url = f"{OLLAMA_UPSTREAM}/{path}" if query_string: upstream_url += f"?{query_string.decode()}" # Parse request data for logging request_data = None if data: try: request_data = json.loads(data) except: request_data = data.decode('utf-8', errors='ignore') # Get current span to add custom attributes current_span = trace.get_current_span() # Add custom attributes current_span.set_attribute("http.method", method) current_span.set_attribute("http.url", upstream_url) current_span.set_attribute("http.target", f"/{path}") current_span.set_attribute("llm.system", "ollama") # Extract and add LLM-specific attributes if request_data and isinstance(request_data, dict): if "model" in request_data: current_span.set_attribute("llm.model", request_data["model"]) # For chat API if "messages" in request_data: messages = request_data["messages"] if messages and len(messages) > 0: last_message = messages[-1] if "content" in last_message: prompt = last_message["content"] current_span.set_attribute("llm.prompts", prompt[:1000]) current_span.set_attribute("llm.request.type", "chat") # For generate API if "prompt" in request_data: current_span.set_attribute("llm.prompts", request_data["prompt"][:1000]) current_span.set_attribute("llm.request.type", "completion") # Log the request print(f"\n[PROXY] {method} /{path}") if request_data and isinstance(request_data, dict): if "model" in request_data: print(f"[PROXY] Model: {request_data['model']}") if "messages" in request_data and request_data["messages"]: print(f"[PROXY] Prompt: {request_data['messages'][-1].get('content', '')[:100]}...") elif "prompt" in request_data: print(f"[PROXY] Prompt: {request_data['prompt'][:100]}...") try: # Forward request to upstream Ollama upstream_response = requests.request( method=method, url=upstream_url, headers={k: v for k, v in headers if k.lower() != 'host'}, data=data, allow_redirects=False, timeout=300 # 5 minutes for model operations ) # Parse response response_data = None try: response_data = upstream_response.json() except: response_data = upstream_response.text # Add response attributes current_span.set_attribute("http.status_code", upstream_response.status_code) # Extract response content if response_data and isinstance(response_data, dict): # For chat API if "message" in response_data: message = response_data["message"] if "content" in message: response_text = message["content"] current_span.set_attribute("llm.responses", response_text[:1000]) current_span.set_attribute("llm.response_length", len(response_text)) print(f"[PROXY] Response: {response_text[:100]}...") # For generate API elif "response" in response_data: response_text = response_data["response"] current_span.set_attribute("llm.responses", response_text[:1000]) current_span.set_attribute("llm.response_length", len(response_text)) print(f"[PROXY] Response: {response_text[:100]}...") print(f"[PROXY] Status: {upstream_response.status_code}") # Return response return Response( upstream_response.content, status=upstream_response.status_code, headers=dict(upstream_response.headers) ) except Exception as e: current_span.set_attribute("error", True) current_span.set_attribute("error.message", str(e)) current_span.record_exception(e) print(f"[PROXY ERROR] {str(e)}") raise @workflow(name="ollama_proxy") @app.route('/', defaults={'path': ''}, methods=['GET', 'POST', 'PUT', 'DELETE', 'PATCH']) @app.route('/<path:path>', methods=['GET', 'POST', 'PUT', 'DELETE', 'PATCH']) def proxy(path): """ Main proxy endpoint. 
## Architecting for Flexibility: Project Structure and Multi-Platform Support

The project is engineered with a modular structure to support various deployment strategies, ensuring that observability remains "pluggable" regardless of the environment. Because my personal development workflow relies on Podman and Minikube, I collaborated with Bob to design several implementation types. Bob helped architect a structure that separates the pure business logic of the application from the tracing infrastructure. The result is a comprehensive setup in which OpenLLMetry operates as a transparent proxy, intercepting traffic between the application and the LLM engine. Whether deploying via Docker Compose for quick local testing or using Kubernetes (Minikube) for a production-grade simulation, the sidecar pattern remains consistent. The application can be deployed and tested using Docker/Podman or Minikube, and is adaptable to other Kubernetes flavors! Several scripts are provided to start and stop the applications and to trace logs in case of errors. Thorough Podman documentation was also generated by Bob at my request. 😉

```
.
├── ollama-simple-app/          # Application WITHOUT tracing code
│   ├── app.py                  # Pure Flask app using Ollama
│   ├── requirements.txt        # No OpenTelemetry dependencies!
│   └── Dockerfile
├── traceloop-sidecar/          # Independent tracing sidecar
│   ├── proxy.py                # Transparent tracing proxy
│   ├── requirements.txt        # OpenTelemetry dependencies here
│   └── Dockerfile
├── ollama-app/                 # (Optional) App with built-in tracing
│   └── ...                     # For comparison purposes
├── collector/                  # OpenTelemetry Collector
│   ├── otel-collector-config.yaml
│   └── Dockerfile
├── k8s/                        # Kubernetes manifests
│   ├── 00-namespace.yaml
│   ├── 01-otel-collector.yaml
│   ├── 02-ollama.yaml
│   ├── 04-jaeger.yaml
│   └── 05-ollama-simple-app.yaml   # Sidecar deployment
├── docker-compose/
│   └── docker-compose.yaml
├── start-all.sh                # Utility: Start all services
├── stop-all.sh                 # Utility: Stop all services
├── push-to-github.sh           # Utility: Push to GitHub
└── README.md
```
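Once the stack is up (whichever deployment flavor you choose), it is worth confirming that spans actually reach the backend. A small sketch using the HTTP query API exposed by the Jaeger UI, assuming the query port 16686 is reachable on `localhost` and that the traced service was registered as `ollama-app` (both are assumptions; adjust to your deployment):

```python
# Sketch: ask Jaeger whether traces arrived for the instrumented service.
# Host, port, and service name are assumptions; adjust them to your deployment.
import requests

JAEGER_QUERY = "http://localhost:16686"
SERVICE = "ollama-app"  # value of TRACED_SERVICE_NAME used by the sidecar

resp = requests.get(
    f"{JAEGER_QUERY}/api/traces",
    params={"service": SERVICE, "limit": 5},
    timeout=10,
)
traces = resp.json().get("data", [])
print(f"Found {len(traces)} recent trace(s) for '{SERVICE}'")
for t in traces:
    print(f"  trace {t['traceID']} with {len(t.get('spans', []))} span(s)")
```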
print("All queries completed. Check your tracing backend for traces!") print("=" * 60) # Keep the application running to allow traces to be exported print("\nKeeping application alive for trace export...") time.sleep(10) if __name__ == "__main__": main() # Made with Bob Enter fullscreen mode Exit fullscreen mode COMMAND_BLOCK: #!/usr/bin/env python3 """ Sample Ollama Application with OpenLLMetry Tracing This application demonstrates how to use Ollama with OpenTelemetry tracing """ import os import time from opentelemetry import trace from opentelemetry.sdk.trace import TracerProvider from opentelemetry.sdk.trace.export import BatchSpanProcessor from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter from opentelemetry.sdk.resources import Resource from traceloop.sdk import Traceloop import ollama # Initialize OpenTelemetry with Traceloop def init_tracing(): """Initialize OpenTelemetry tracing with OTLP exporter""" # Get configuration from environment variables otlp_endpoint = os.getenv("OTEL_EXPORTER_OTLP_ENDPOINT", "http://otel-collector:4317") service_name = os.getenv("OTEL_SERVICE_NAME", "ollama-app") print(f"Initializing tracing with endpoint: {otlp_endpoint}") print(f"Service name: {service_name}") # Initialize Traceloop SDK Traceloop.init( app_name=service_name, disable_batch=False, exporter_otlp_endpoint=otlp_endpoint ) print("Tracing initialized successfully") def chat_with_ollama(model: str, prompt: str) -> str: """ Send a prompt to Ollama and get a response Args: model: The model to use (e.g., 'granite3:latest') prompt: The prompt to send to the model Returns: The model's response """ tracer = trace.get_tracer(__name__) with tracer.start_as_current_span("ollama_chat") as span: span.set_attribute("llm.model", model) span.set_attribute("llm.prompt", prompt) try: # Get Ollama host from environment ollama_host = os.getenv("OLLAMA_HOST", "http://ollama:11434") print(f"\nSending prompt to Ollama ({model})...") print(f"Prompt: {prompt}") # Create Ollama client client = ollama.Client(host=ollama_host) # Send the prompt response = client.chat( model=model, messages=[ { 'role': 'user', 'content': prompt, }, ], ) response_text = response['message']['content'] span.set_attribute("llm.response", response_text) span.set_attribute("llm.response_length", len(response_text)) print(f"Response: {response_text}\n") return response_text except Exception as e: span.set_attribute("error", True) span.set_attribute("error.message", str(e)) print(f"Error: {e}") raise def main(): """Main application loop""" print("=" * 60) print("Ollama Application with OpenLLMetry Tracing") print("=" * 60) # Initialize tracing init_tracing() # Get model from environment model = os.getenv("OLLAMA_MODEL", "granite3:latest") # Sample prompts to demonstrate tracing prompts = [ "What is OpenTelemetry?", "Explain distributed tracing in one sentence.", "What are the benefits of observability?", ] print(f"\nUsing model: {model}") print(f"Running {len(prompts)} sample queries...\n") # Run sample queries for i, prompt in enumerate(prompts, 1): print(f"Query {i}/{len(prompts)}") try: chat_with_ollama(model, prompt) time.sleep(2) # Small delay between requests except Exception as e: print(f"Failed to process query: {e}") print("\n" + "=" * 60) print("All queries completed. 
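On Kubernetes (Minikube in my case), the same idea becomes a two-container pod, which is the role of k8s/05-ollama-simple-app.yaml. Again, this is just a sketch of what such a manifest can look like rather than the actual file; the image names, namespace, and environment values are placeholders:

```yaml
# Illustrative sketch only - not the actual k8s/05-ollama-simple-app.yaml.
# Image names, namespace and env values are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama-simple-app
  namespace: openllmetry-sidecar          # placeholder namespace
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama-simple-app
  template:
    metadata:
      labels:
        app: ollama-simple-app
    spec:
      containers:
        # Primary container: business logic only, no tracing dependencies
        - name: ollama-simple-app
          image: ollama-simple-app:local
          env:
            - name: OLLAMA_HOST
              # Same pod, so the sidecar is reachable on localhost
              value: "http://localhost:11434"
        # Sidecar container: intercepts LLM traffic and emits OTLP spans
        - name: traceloop-sidecar
          image: traceloop-sidecar:local
          ports:
            - containerPort: 11434
          env:
            - name: OLLAMA_UPSTREAM
              value: "http://ollama:11434"           # the real Ollama service
            - name: OTEL_EXPORTER_OTLP_ENDPOINT
              value: "http://otel-collector:4317"
```

Because both containers share the pod's network namespace, the app reaches the sidecar on localhost and never learns that tracing is happening at all.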
| Aspect | Sidecar (This Project) | Built-in Tracing |
| ---------------- | ---------------------- | -------------------- |
| Code changes | ❌ None | ✅ Required |
| Dependencies | ❌ None in app | ✅ OpenTelemetry libs |
| Language support | ✅ Any | ⚠️ Language-specific |
| Maintenance | ✅ Centralized | ⚠️ Per application |
| Performance | ⚠️ Extra hop | ✅ Direct |
| Flexibility | ⚠️ HTTP only | ✅ Any protocol |

- ollama-simple-app/: The core application built by Bob, containing pure business logic with zero tracing code or OpenTelemetry dependencies.
- traceloop-sidecar/: The independent tracing proxy that provides the "sidecar" functionality.
- k8s/ & docker-compose/: Deployment manifests tailored to different container engines, including dedicated support for Podman users.
- Utility Scripts: A suite of automation scripts (such as deploy-podman.sh) to streamline building, loading, and deploying images across these environments.

Because the application and the sidecar are packaged as independent container images, the tracing layer can be updated, scaled, or swapped out without ever modifying the original application code.

- GitHub Code Repository: https://github.com/aairom/OpenLLMetry-SideCar
- Traceloop OpenLLMetry: https://www.traceloop.com/docs/openllmetry/introduction
- IBM Project Bob: https://www.ibm.com/products/bob
- Sidecar Pattern: https://learn.microsoft.com/en-us/azure/architecture/patterns/sidecar
- Sidecar containers: https://kubernetes.io/docs/concepts/workloads/pods/sidecar-containers/