# Applying the Sidecar Pattern to OpenLLMetry Using Bob!
2026-01-02
Building a sidecar OpenLLMetry implementation using Bob's talents.

## TL;DR: What is Traceloop's OpenLLMetry?

Traceloop OpenLLMetry is an open-source observability framework built on top of OpenTelemetry, specifically designed to provide deep visibility into the execution of Large Language Model (LLM) applications. It enables developers to monitor and debug their AI systems by automatically instrumenting popular LLM providers (such as OpenAI, Anthropic, and Azure) and vector databases (such as Pinecone, Milvus, or Chroma). By integrating OpenLLMetry, you gain high-fidelity distributed tracing and can visualize the entire lifecycle of a request, from the initial prompt and retrieval-augmented generation (RAG) steps to the final model response, so you can pinpoint bottlenecks, evaluate model performance, and track token usage across your infrastructure.

## Second TL;DR: The Sidecar Pattern

The sidecar design pattern deploys a secondary "sidecar" container alongside a primary application container within the same execution environment, such as a Kubernetes pod or a shared network namespace. The core idea is separation of concerns: the application container remains focused exclusively on its business logic, while the sidecar handles cross-cutting tasks like distributed tracing, logging, or proxying traffic. A fundamental prerequisite for this pattern is the use of container images; both the application and the sidecar must be packaged as independent images so they can be "plugged" together. This modularity enables the sidecar to intercept requests, such as LLM API calls, and add OpenTelemetry instrumentation without requiring any code changes to the primary application image.
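To make the pattern concrete, here is a minimal, hypothetical Kubernetes pod sketch. The container names, image tags, and ports are illustrative assumptions, not the exact manifests shipped with this project. The key point is that the application talks to localhost, where the sidecar listens, and the sidecar forwards traffic to the real Ollama service:

```yaml
# Illustrative sketch only: names, images and ports are assumptions,
# not the exact manifests from the k8s/ folder of this project.
apiVersion: v1
kind: Pod
metadata:
  name: ollama-simple-app
spec:
  containers:
    - name: app                           # business logic only, no tracing code
      image: ollama-simple-app:latest
      env:
        - name: OLLAMA_HOST
          value: "http://localhost:11434" # talk to the sidecar, not Ollama directly
    - name: traceloop-sidecar             # transparent tracing proxy
      image: traceloop-sidecar:latest
      ports:
        - containerPort: 11434
      env:
        - name: OLLAMA_UPSTREAM
          value: "http://ollama:11434"    # the real LLM engine
        - name: OTEL_EXPORTER_OTLP_ENDPOINT
          value: "http://otel-collector:4317"
```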
## Evolving Observability: Moving to a Sidecar Pattern

In one of my previous explorations, I demonstrated a standalone implementation of OpenLLMetry. We saw how straightforward it is to integrate into a Python application, requiring just a few lines of code to unlock deep visibility into LLM calls. While powerful, that approach requires modifying the core application code. Basically, doing this (excerpt from the Traceloop documentation):
```python
#######################
# pip install traceloop-sdk
#######################
# ...

import os

from openai import OpenAI
from traceloop.sdk import Traceloop
from traceloop.sdk.decorators import workflow

Traceloop.init(app_name="joke_generation_service")


@workflow(name="joke_creation")
def create_joke():
    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    completion = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Tell me a joke about opentelemetry"}],
    )
    return completion.choices[0].message.content
```
## The Vision: Pluggable Observability

This time, I wanted to push the architecture further by decoupling the monitoring logic from the business logic. The goal was to implement OpenLLMetry using a "sidecar" design pattern. This approach makes the observability layer almost entirely pluggable, allowing it to be attached to virtually any application container without cluttering the primary codebase.

## Building with Bob

To bring this modular architecture to life, I teamed up with my new AI partner, IBM Bob. I tasked Bob with building a robust application to serve as the primary service, while I focused on engineering the sidecar to capture traces, monitor performance, and manage the OpenTelemetry export pipeline. The project demonstrates a sidecar design pattern for LLM observability with OpenLLMetry, adding distributed tracing to applications without modifying any core code.

## The Logic and Implementation

By deploying an independent TraceLoop sidecar proxy alongside a primary service, such as the one built by IBM Bob, all HTTP traffic to the LLM engine (e.g., Ollama) is intercepted and instrumented with OpenTelemetry spans. This architecture ensures a clean separation of concerns: the application remains focused on business logic while the sidecar captures high-fidelity metadata, including prompts, responses, and token usage, before forwarding traces to a collector and a visualization tool like Jaeger. The main idea behind this design really is separation of concerns!

The resulting architecture looks like this:
```text
Application Code:          Observability Infrastructure:
┌────────────────┐         ┌────────────────────┐
│   Business     │         │     TraceLoop      │
│   Logic Only   │────────▶│      Sidecar       │
│                │         │  (Tracing Proxy)   │
└────────────────┘         └─────────┬──────────┘
                                     │
                                     ▼
                           ┌────────────────────┐
                           │   OTel Collector   │
                           └─────────┬──────────┘
                                     │
                                     ▼
                           ┌────────────────────┐
                           │  Jaeger (Storage)  │
                           └────────────────────┘
```
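Wiring the sidecar in is purely a configuration concern. The snippet below is a hypothetical Docker Compose excerpt (service names and ports are my assumptions, not necessarily the exact docker-compose.yaml in the repository) showing the only "change" the application needs: its OLLAMA_HOST points at the sidecar, and the sidecar forwards to the real Ollama service:

```yaml
# Hypothetical excerpt, for illustration only.
services:
  ollama-simple-app:
    build: ./ollama-simple-app
    environment:
      OLLAMA_HOST: "http://traceloop-sidecar:11434"    # app talks to the proxy

  traceloop-sidecar:
    build: ./traceloop-sidecar
    environment:
      OLLAMA_UPSTREAM: "http://ollama:11434"           # proxy forwards to Ollama
      OTEL_EXPORTER_OTLP_ENDPOINT: "http://otel-collector:4317"

  ollama:
    image: ollama/ollama:latest
```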
Let's jump into a practical example. First, the main sample application: a very basic chat application using Ollama and the Granite model, with no tracing code whatsoever.
```python
#!/usr/bin/env python3
"""
Simple Ollama Application - No Tracing Code

This application uses Ollama for LLM inference without any built-in tracing.
Tracing will be handled by the TraceLoop sidecar.
"""
import os
import sys
import time
import logging
from datetime import datetime
from pathlib import Path

import ollama
from flask import Flask, request, jsonify

app = Flask(__name__)

# Configuration
OLLAMA_HOST = os.getenv("OLLAMA_HOST", "http://ollama:11434")
OLLAMA_MODEL = os.getenv("OLLAMA_MODEL", "granite3:latest")
LOG_LEVEL = os.getenv("LOG_LEVEL", "INFO")

# Setup logging directory
LOG_DIR = Path("./logs")
LOG_DIR.mkdir(exist_ok=True)

# Create timestamped log file
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
log_file = LOG_DIR / f"ollama_app_{timestamp}.log"

# Configure logging
logging.basicConfig(
    level=getattr(logging, LOG_LEVEL.upper()),
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler(log_file),
        logging.StreamHandler(sys.stdout)
    ]
)

logger = logging.getLogger(__name__)

# Log startup information
logger.info("=" * 60)
logger.info("Simple Ollama Application Starting")
logger.info("=" * 60)
logger.info(f"Ollama Host: {OLLAMA_HOST}")
logger.info(f"Model: {OLLAMA_MODEL}")
logger.info(f"Log Level: {LOG_LEVEL}")
logger.info(f"Log File: {log_file}")
logger.info("=" * 60)


@app.route('/health', methods=['GET'])
def health():
    """Health check endpoint"""
    logger.debug("Health check requested")
    return jsonify({
        "status": "healthy",
        "model": OLLAMA_MODEL,
        "ollama_host": OLLAMA_HOST,
        "log_file": str(log_file)
    }), 200


@app.route('/chat', methods=['POST'])
def chat():
    """
    Chat endpoint - accepts a prompt and returns a response

    Request body:
    {
        "prompt": "Your question here",
        "model": "optional-model-override"
    }
    """
    request_id = datetime.now().strftime("%Y%m%d_%H%M%S_%f")
    logger.info(f"[{request_id}] Chat request received")

    try:
        data = request.get_json()
        if not data or 'prompt' not in data:
            logger.warning(f"[{request_id}] Missing 'prompt' in request body")
            return jsonify({"error": "Missing 'prompt' in request body"}), 400

        prompt = data['prompt']
        model = data.get('model', OLLAMA_MODEL)

        logger.info(f"[{request_id}] Prompt: {prompt[:100]}{'...' if len(prompt) > 100 else ''}")
        logger.info(f"[{request_id}] Model: {model}")
        logger.info(f"[{request_id}] Ollama Host: {OLLAMA_HOST}")

        # Create Ollama client
        try:
            client = ollama.Client(host=OLLAMA_HOST)
            logger.debug(f"[{request_id}] Ollama client created successfully")
        except Exception as e:
            logger.error(f"[{request_id}] Failed to create Ollama client: {e}")
            raise

        # Send the prompt
        start_time = time.time()
        logger.info(f"[{request_id}] Sending request to Ollama...")

        try:
            response = client.chat(
                model=model,
                messages=[
                    {
                        'role': 'user',
                        'content': prompt,
                    },
                ],
            )
            logger.debug(f"[{request_id}] Received response from Ollama")
        except Exception as e:
            logger.error(f"[{request_id}] Ollama request failed: {e}")
            raise

        response_text = response['message']['content']
        duration = time.time() - start_time

        logger.info(f"[{request_id}] Response received in {duration:.2f}s")
        logger.info(f"[{request_id}] Response length: {len(response_text)} characters")
        logger.debug(f"[{request_id}] Response preview: {response_text[:100]}...")

        return jsonify({
            "prompt": prompt,
            "response": response_text,
            "model": model,
            "duration_seconds": duration,
            "request_id": request_id
        }), 200

    except Exception as e:
        logger.error(f"[{request_id}] Error processing chat request: {str(e)}", exc_info=True)
        return jsonify({
            "error": str(e),
            "request_id": request_id
        }), 500


@app.route('/batch', methods=['POST'])
def batch_chat():
    """
    Batch chat endpoint - accepts multiple prompts

    Request body:
    {
        "prompts": ["Question 1", "Question 2", ...],
        "model": "optional-model-override"
    }
    """
    request_id = datetime.now().strftime("%Y%m%d_%H%M%S_%f")
    logger.info(f"[{request_id}] Batch request received")

    try:
        data = request.get_json()
        if not data or 'prompts' not in data:
            logger.warning(f"[{request_id}] Missing 'prompts' in request body")
            return jsonify({"error": "Missing 'prompts' in request body"}), 400

        prompts = data['prompts']
        model = data.get('model', OLLAMA_MODEL)

        if not isinstance(prompts, list):
            logger.warning(f"[{request_id}] 'prompts' is not a list")
            return jsonify({"error": "'prompts' must be a list"}), 400

        logger.info(f"[{request_id}] Processing {len(prompts)} prompts")
        logger.info(f"[{request_id}] Model: {model}")

        # Create Ollama client
        try:
            client = ollama.Client(host=OLLAMA_HOST)
            logger.debug(f"[{request_id}] Ollama client created successfully")
        except Exception as e:
            logger.error(f"[{request_id}] Failed to create Ollama client: {e}")
            raise

        results = []
        total_start = time.time()

        for i, prompt in enumerate(prompts, 1):
            logger.info(f"[{request_id}] Batch {i}/{len(prompts)} - Prompt: {prompt[:50]}...")
            try:
                start_time = time.time()
                response = client.chat(
                    model=model,
                    messages=[
                        {
                            'role': 'user',
                            'content': prompt,
                        },
                    ],
                )
                response_text = response['message']['content']
                duration = time.time() - start_time
                results.append({
                    "prompt": prompt,
                    "response": response_text,
                    "duration_seconds": duration
                })
                logger.info(f"[{request_id}] Batch {i}/{len(prompts)} completed in {duration:.2f}s")
            except Exception as e:
                logger.error(f"[{request_id}] Batch {i}/{len(prompts)} failed: {e}")
                results.append({
                    "prompt": prompt,
                    "error": str(e),
                    "duration_seconds": 0
                })

        total_duration = time.time() - total_start
        logger.info(f"[{request_id}] Batch request completed in {total_duration:.2f}s")

        return jsonify({
            "results": results,
            "model": model,
            "total_duration_seconds": total_duration,
            "count": len(results),
            "request_id": request_id
        }), 200

    except Exception as e:
        logger.error(f"[{request_id}] Error processing batch request: {str(e)}", exc_info=True)
        return jsonify({
            "error": str(e),
            "request_id": request_id
        }), 500


def run_sample_queries():
    """Run some sample queries on startup"""
    logger.info("=" * 60)
    logger.info("Running sample queries...")
    logger.info("=" * 60)

    sample_prompts = [
        "What is OpenTelemetry?",
        "Explain distributed tracing in one sentence.",
        "What are the benefits of observability?",
    ]

    try:
        client = ollama.Client(host=OLLAMA_HOST)
        logger.info(f"Connected to Ollama at {OLLAMA_HOST}")
    except Exception as e:
        logger.error(f"Failed to connect to Ollama: {e}")
        return

    for i, prompt in enumerate(sample_prompts, 1):
        logger.info(f"Sample {i}/{len(sample_prompts)} - Prompt: {prompt}")
        try:
            start_time = time.time()
            response = client.chat(
                model=OLLAMA_MODEL,
                messages=[{'role': 'user', 'content': prompt}],
            )
            duration = time.time() - start_time
            response_text = response['message']['content']
            logger.info(f"Sample {i}/{len(sample_prompts)} - Completed in {duration:.2f}s")
            logger.debug(f"Sample {i}/{len(sample_prompts)} - Response: {response_text[:100]}...")
        except Exception as e:
            logger.error(f"Sample {i}/{len(sample_prompts)} - Error: {e}", exc_info=True)
        time.sleep(1)

    logger.info("=" * 60)
    logger.info("Sample queries completed!")
    logger.info("=" * 60)


if __name__ == "__main__":
    # Run sample queries if in standalone mode
    if os.getenv("RUN_SAMPLES", "true").lower() == "true":
        try:
            run_sample_queries()
        except Exception as e:
            logger.error(f"Sample queries failed: {e}", exc_info=True)

    # Start Flask server
    port = int(os.getenv("PORT", "8080"))
    logger.info(f"Starting Flask server on port {port}...")
    logger.info(f"Logs are being written to: {log_file}")
    app.run(host='0.0.0.0', port=port, debug=False)

# Made with Bob
```
Now we build the "proxy" application, which is the sidecar implementation itself:
```python
#!/usr/bin/env python3
"""
TraceLoop Sidecar - OpenLLMetry Tracing Proxy

This sidecar uses OpenLLMetry (Traceloop SDK) to automatically instrument Ollama API calls.
It acts as a transparent proxy that adds LLM-specific tracing.
"""
import os
import json

import requests
from flask import Flask, request, Response
from traceloop.sdk import Traceloop
from traceloop.sdk.decorators import workflow, task
from opentelemetry import trace

app = Flask(__name__)

# Configuration
OLLAMA_UPSTREAM = os.getenv("OLLAMA_UPSTREAM", "http://ollama:11434")
OTEL_ENDPOINT = os.getenv("OTEL_EXPORTER_OTLP_ENDPOINT", "http://otel-collector:4317")
SERVICE_NAME = os.getenv("OTEL_SERVICE_NAME", "traceloop-sidecar")
TRACED_SERVICE_NAME = os.getenv("TRACED_SERVICE_NAME", "ollama-app")

print("=" * 70)
print("TraceLoop Sidecar - OpenLLMetry Tracing Proxy")
print("=" * 70)
print(f"Upstream Ollama: {OLLAMA_UPSTREAM}")
print(f"OTEL Endpoint: {OTEL_ENDPOINT}")
print(f"Service Name: {SERVICE_NAME}")
print(f"Traced Service: {TRACED_SERVICE_NAME}")
print("=" * 70)


def init_tracing():
    """Initialize OpenLLMetry (Traceloop SDK)"""
    Traceloop.init(
        app_name=TRACED_SERVICE_NAME,  # Use the application name, not the sidecar name
        disable_batch=False,
        exporter_otlp_endpoint=OTEL_ENDPOINT,
        # Enable LLM-specific instrumentation
        should_enrich_metrics=True,
    )
    print("OpenLLMetry (Traceloop SDK) initialized successfully")


# Initialize tracing on startup
init_tracing()
tracer = trace.get_tracer(__name__)


@app.route('/health', methods=['GET'])
def health():
    """Health check endpoint"""
    return {"status": "healthy", "service": SERVICE_NAME}, 200


@task(name="ollama_api_call")
def proxy_ollama_request(method, path, headers, data, query_string):
    """
    Proxy request to Ollama with OpenLLMetry tracing.
    The @task decorator automatically creates spans and adds LLM attributes.
    """
    # Build upstream URL
    upstream_url = f"{OLLAMA_UPSTREAM}/{path}"
    if query_string:
        upstream_url += f"?{query_string.decode()}"

    # Parse request data for logging
    request_data = None
    if data:
        try:
            request_data = json.loads(data)
        except Exception:
            request_data = data.decode('utf-8', errors='ignore')

    # Get current span to add custom attributes
    current_span = trace.get_current_span()

    # Add custom attributes
    current_span.set_attribute("http.method", method)
    current_span.set_attribute("http.url", upstream_url)
    current_span.set_attribute("http.target", f"/{path}")
    current_span.set_attribute("llm.system", "ollama")

    # Extract and add LLM-specific attributes
    if request_data and isinstance(request_data, dict):
        if "model" in request_data:
            current_span.set_attribute("llm.model", request_data["model"])

        # For chat API
        if "messages" in request_data:
            messages = request_data["messages"]
            if messages and len(messages) > 0:
                last_message = messages[-1]
                if "content" in last_message:
                    prompt = last_message["content"]
                    current_span.set_attribute("llm.prompts", prompt[:1000])
                    current_span.set_attribute("llm.request.type", "chat")

        # For generate API
        if "prompt" in request_data:
            current_span.set_attribute("llm.prompts", request_data["prompt"][:1000])
            current_span.set_attribute("llm.request.type", "completion")

    # Log the request
    print(f"\n[PROXY] {method} /{path}")
    if request_data and isinstance(request_data, dict):
        if "model" in request_data:
            print(f"[PROXY] Model: {request_data['model']}")
        if "messages" in request_data and request_data["messages"]:
            print(f"[PROXY] Prompt: {request_data['messages'][-1].get('content', '')[:100]}...")
        elif "prompt" in request_data:
            print(f"[PROXY] Prompt: {request_data['prompt'][:100]}...")

    try:
        # Forward request to upstream Ollama
        upstream_response = requests.request(
            method=method,
            url=upstream_url,
            headers={k: v for k, v in headers if k.lower() != 'host'},
            data=data,
            allow_redirects=False,
            timeout=300  # 5 minutes for model operations
        )

        # Parse response
        response_data = None
        try:
            response_data = upstream_response.json()
        except Exception:
            response_data = upstream_response.text

        # Add response attributes
        current_span.set_attribute("http.status_code", upstream_response.status_code)

        # Extract response content
        if response_data and isinstance(response_data, dict):
            # For chat API
            if "message" in response_data:
                message = response_data["message"]
                if "content" in message:
                    response_text = message["content"]
                    current_span.set_attribute("llm.responses", response_text[:1000])
                    current_span.set_attribute("llm.response_length", len(response_text))
                    print(f"[PROXY] Response: {response_text[:100]}...")
            # For generate API
            elif "response" in response_data:
                response_text = response_data["response"]
                current_span.set_attribute("llm.responses", response_text[:1000])
                current_span.set_attribute("llm.response_length", len(response_text))
                print(f"[PROXY] Response: {response_text[:100]}...")

        print(f"[PROXY] Status: {upstream_response.status_code}")

        # Return response
        return Response(
            upstream_response.content,
            status=upstream_response.status_code,
            headers=dict(upstream_response.headers)
        )

    except Exception as e:
        current_span.set_attribute("error", True)
        current_span.set_attribute("error.message", str(e))
        current_span.record_exception(e)
        print(f"[PROXY ERROR] {str(e)}")
        raise


# Note: the route decorators are outermost so that Flask registers the
# workflow-wrapped view; otherwise the parent span would never be created.
@app.route('/', defaults={'path': ''}, methods=['GET', 'POST', 'PUT', 'DELETE', 'PATCH'])
@app.route('/<path:path>', methods=['GET', 'POST', 'PUT', 'DELETE', 'PATCH'])
@workflow(name="ollama_proxy")
def proxy(path):
    """
    Main proxy endpoint.
    The @workflow decorator creates a parent span for the entire request.
    """
    try:
        return proxy_ollama_request(
            method=request.method,
            path=path,
            headers=request.headers,
            data=request.data,
            query_string=request.query_string
        )
    except Exception as e:
        return {"error": str(e)}, 500


if __name__ == "__main__":
    port = int(os.getenv("PORT", "11434"))
    print(f"\nStarting TraceLoop Sidecar on port {port}...")
    print(f"Proxying to: {OLLAMA_UPSTREAM}")
    print("Using OpenLLMetry for automatic LLM tracing")
    print("=" * 70 + "\n")
    app.run(host='0.0.0.0', port=port, debug=False)

# Made with Bob
```
## Architecting for Flexibility: Project Structure and Multi-Platform Support

The project is engineered with a modular structure to support various deployment strategies, ensuring that observability remains pluggable regardless of the environment. Because my personal development workflow relies on Podman and Minikube, I collaborated with Bob to design several implementation types. Bob helped architect a structure that separates the pure business logic of the application from the tracing infrastructure. The result is a comprehensive setup where OpenLLMetry operates as a transparent proxy, intercepting traffic between the application and the LLM engine. Whether you deploy via Docker Compose for quick local testing or use Kubernetes (Minikube) for a production-grade simulation, the sidecar pattern remains consistent.

The application can be deployed and tested with Docker/Podman or Minikube, and it is adaptable to other Kubernetes flavors. Several scripts are provided to start and stop the services and to inspect logs in case of errors. Thorough Podman documentation was also generated by Bob at my request.

## Project Overview

Here is the overall layout of the project:
```text
.
├── ollama-simple-app/              # Application WITHOUT tracing code
│   ├── app.py                      # Pure Flask app using Ollama
│   ├── requirements.txt            # No OpenTelemetry dependencies!
│   └── Dockerfile
├── traceloop-sidecar/              # Independent tracing sidecar
│   ├── proxy.py                    # Transparent tracing proxy
│   ├── requirements.txt            # OpenTelemetry dependencies here
│   └── Dockerfile
├── ollama-app/                     # (Optional) App with built-in tracing
│   └── ...                         # For comparison purposes
├── collector/                      # OpenTelemetry Collector
│   ├── otel-collector-config.yaml
│   └── Dockerfile
├── k8s/                            # Kubernetes manifests
│   ├── 00-namespace.yaml
│   ├── 01-otel-collector.yaml
│   ├── 02-ollama.yaml
│   ├── 04-jaeger.yaml
│   └── 05-ollama-simple-app.yaml   # Sidecar deployment
├── docker-compose/
│   └── docker-compose.yaml
├── start-all.sh                    # Utility: Start all services
├── stop-all.sh                     # Utility: Stop all services
├── push-to-github.sh               # Utility: Push to GitHub
└── README.md
```
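The collector/ directory holds the OpenTelemetry Collector configuration that receives the sidecar's spans and forwards them to Jaeger. The actual otel-collector-config.yaml is not reproduced in this post; a minimal sketch of such a configuration could look like the following (the endpoints and exporter name are assumptions):

```yaml
# Hypothetical minimal collector configuration, for illustration only.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317     # where the sidecar sends its spans

processors:
  batch: {}

exporters:
  otlp/jaeger:
    endpoint: jaeger:4317          # Jaeger accepts OTLP natively
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/jaeger]
```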
Last but not least, as a bonus, a standalone Python sample application that uses OpenLLMetry directly (built-in tracing) is provided as well, so we can weigh the pros and cons of that approach:
```python
#!/usr/bin/env python3
"""
Sample Ollama Application with OpenLLMetry Tracing

This application demonstrates how to use Ollama with OpenTelemetry tracing.
"""
import os
import time

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from traceloop.sdk import Traceloop
import ollama


# Initialize OpenTelemetry with Traceloop
def init_tracing():
    """Initialize OpenTelemetry tracing with OTLP exporter"""
    # Get configuration from environment variables
    otlp_endpoint = os.getenv("OTEL_EXPORTER_OTLP_ENDPOINT", "http://otel-collector:4317")
    service_name = os.getenv("OTEL_SERVICE_NAME", "ollama-app")

    print(f"Initializing tracing with endpoint: {otlp_endpoint}")
    print(f"Service name: {service_name}")

    # Initialize Traceloop SDK
    Traceloop.init(
        app_name=service_name,
        disable_batch=False,
        exporter_otlp_endpoint=otlp_endpoint
    )

    print("Tracing initialized successfully")


def chat_with_ollama(model: str, prompt: str) -> str:
    """
    Send a prompt to Ollama and get a response

    Args:
        model: The model to use (e.g., 'granite3:latest')
        prompt: The prompt to send to the model

    Returns:
        The model's response
    """
    tracer = trace.get_tracer(__name__)

    with tracer.start_as_current_span("ollama_chat") as span:
        span.set_attribute("llm.model", model)
        span.set_attribute("llm.prompt", prompt)

        try:
            # Get Ollama host from environment
            ollama_host = os.getenv("OLLAMA_HOST", "http://ollama:11434")

            print(f"\nSending prompt to Ollama ({model})...")
            print(f"Prompt: {prompt}")

            # Create Ollama client
            client = ollama.Client(host=ollama_host)

            # Send the prompt
            response = client.chat(
                model=model,
                messages=[
                    {
                        'role': 'user',
                        'content': prompt,
                    },
                ],
            )

            response_text = response['message']['content']
            span.set_attribute("llm.response", response_text)
            span.set_attribute("llm.response_length", len(response_text))

            print(f"Response: {response_text}\n")
            return response_text

        except Exception as e:
            span.set_attribute("error", True)
            span.set_attribute("error.message", str(e))
            print(f"Error: {e}")
            raise


def main():
    """Main application loop"""
    print("=" * 60)
    print("Ollama Application with OpenLLMetry Tracing")
    print("=" * 60)

    # Initialize tracing
    init_tracing()

    # Get model from environment
    model = os.getenv("OLLAMA_MODEL", "granite3:latest")

    # Sample prompts to demonstrate tracing
    prompts = [
        "What is OpenTelemetry?",
        "Explain distributed tracing in one sentence.",
        "What are the benefits of observability?",
    ]

    print(f"\nUsing model: {model}")
    print(f"Running {len(prompts)} sample queries...\n")

    # Run sample queries
    for i, prompt in enumerate(prompts, 1):
        print(f"Query {i}/{len(prompts)}")
        try:
            chat_with_ollama(model, prompt)
            time.sleep(2)  # Small delay between requests
        except Exception as e:
            print(f"Failed to process query: {e}")

    print("\n" + "=" * 60)
    print("All queries completed. Check your tracing backend for traces!")
    print("=" * 60)

    # Keep the application running to allow traces to be exported
    print("\nKeeping application alive for trace export...")
    time.sleep(10)


if __name__ == "__main__":
    main()

# Made with Bob
```
## Comparison: Sidecar vs Built-in Tracing

So how do the two approaches compare?
| Aspect | Sidecar (This Project) | Built-in Tracing |
| ---------------- | ---------------------- | -------------------- |
| Code changes | ❌ None | ✅ Required |
| Dependencies | ❌ None in app | ✅ OpenTelemetry libs |
| Language support | ✅ Any | ⚠️ Language-specific |
| Maintenance | ✅ Centralized | ⚠️ Per application |
| Performance | ⚠️ Extra hop | ✅ Direct |
| Flexibility | ⚠️ HTTP only | ✅ Any protocol |
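The first two rows are easiest to see from the application's point of view: with the sidecar in place, the app keeps using the plain Ollama client and simply points `OLLAMA_HOST` at the proxy instead of at Ollama directly. The sketch below is illustrative only; the `traceloop-sidecar` hostname and port 8080 are assumptions, not values taken from the repository.

```python
# Untraced application code: no OpenTelemetry or Traceloop imports at all.
import os

import ollama

# Without the sidecar: OLLAMA_HOST=http://ollama:11434
# With the sidecar:    OLLAMA_HOST=http://traceloop-sidecar:8080  (assumed values)
client = ollama.Client(host=os.getenv("OLLAMA_HOST", "http://traceloop-sidecar:8080"))

response = client.chat(
    model=os.getenv("OLLAMA_MODEL", "granite3:latest"),
    messages=[{"role": "user", "content": "What is OpenTelemetry?"}],
)
print(response["message"]["content"])
```

The last two rows are the mirror image of that convenience: every call now makes an extra HTTP hop through the proxy, and any traffic that does not go over HTTP is invisible to the sidecar.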
- ollama-simple-app/: The core application built by Bob, containing pure business logic with zero tracing code or OpenTelemetry dependencies.
- traceloop-sidecar/: The independent tracing proxy that provides the "sidecar" functionality (a minimal sketch of this interception logic follows right after this list).
- k8s/ & docker-compose/: Deployment manifests specifically tailored for different container engines, including specialized support for Podman users.
- Utility Scripts: A suite of automated tools (like deploy-podman.sh) to streamline building, loading, and deploying images across these diverse environments.
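To make the traceloop-sidecar/ component more concrete, here is a minimal sketch of what such an interception proxy can look like: an HTTP service that forwards requests to Ollama and wraps each call in an OpenTelemetry span. This is not the repository's implementation; the use of Flask and requests, the port 8080, the `OLLAMA_UPSTREAM` and `SIDECAR_PORT` variables, the span attribute names, and the Ollama response fields it reads are all assumptions made for the example, and it only handles non-streaming responses.

```python
# Minimal, illustrative sidecar proxy sketch (assumptions noted above).
import os

import requests
from flask import Flask, Response, request
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

OLLAMA_URL = os.getenv("OLLAMA_UPSTREAM", "http://ollama:11434")          # assumed variable name
OTLP_ENDPOINT = os.getenv("OTEL_EXPORTER_OTLP_ENDPOINT", "http://otel-collector:4317")

# Standard OpenTelemetry SDK setup: resource -> tracer provider -> OTLP exporter
provider = TracerProvider(resource=Resource.create({"service.name": "traceloop-sidecar"}))
provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter(endpoint=OTLP_ENDPOINT, insecure=True)))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer(__name__)

app = Flask(__name__)


@app.route("/<path:path>", methods=["GET", "POST"])
def proxy(path: str) -> Response:
    """Forward the request to Ollama and record it as a span."""
    payload = request.get_json(silent=True) or {}

    with tracer.start_as_current_span("ollama.proxy") as span:
        span.set_attribute("http.method", request.method)
        span.set_attribute("http.target", f"/{path}")
        if "model" in payload:
            span.set_attribute("llm.model", payload["model"])
        if payload.get("messages"):
            span.set_attribute("llm.prompt", payload["messages"][-1].get("content", ""))

        upstream = requests.request(
            request.method,
            f"{OLLAMA_URL}/{path}",
            json=payload if request.method == "POST" else None,
            timeout=300,
        )
        span.set_attribute("http.status_code", upstream.status_code)

        # For non-streaming chat replies, capture the response text and token counts
        # (prompt_eval_count / eval_count are fields Ollama typically returns).
        try:
            data = upstream.json()
            if isinstance(data, dict):
                message = data.get("message", {})
                if message.get("content"):
                    span.set_attribute("llm.response", message["content"])
                if "prompt_eval_count" in data:
                    span.set_attribute("llm.usage.prompt_tokens", data["prompt_eval_count"])
                if "eval_count" in data:
                    span.set_attribute("llm.usage.completion_tokens", data["eval_count"])
        except ValueError:
            pass  # streaming or non-JSON body: forward it untouched

        return Response(
            upstream.content,
            status=upstream.status_code,
            content_type=upstream.headers.get("Content-Type", "application/json"),
        )


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=int(os.getenv("SIDECAR_PORT", "8080")))
```

Whatever framework the repository actually uses, the shape is the same: the application talks to the proxy exactly as if it were Ollama, and every exported span carries the model, prompt, response, and status code.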
By leveraging independent container images for both the application and the sidecar, the tracing layer can be updated, scaled, or swapped out without ever modifying the "original" application code.

- GitHub Code Repository: https://github.com/aairom/OpenLLMetry-SideCar
- Traceloop OpenLLMetry: https://www.traceloop.com/docs/openllmetry/introduction
- IBM Project Bob: https://www.ibm.com/products/bob
- Sidecar Pattern: https://learn.microsoft.com/en-us/azure/architecture/patterns/sidecar
- Sidecar containers: https://kubernetes.io/docs/concepts/workloads/pods/sidecar-containers/