Tools: Complete Guide to Dhrishti Part 1 - Building Runtime Observability for Distributed Systems
Written by a human!
Recently at work, I worked on a major project: multitenancy. Initially, we provided one virtual machine to every customer that we acquired. This meant a lot of manual configuration, multiple deployments for a small hot-fix, and, more importantly, a lot of time spent connecting to remote SSH sessions and debugging network issues. Multitenancy would fix this by allotting all customers to a single machine.
This didn’t sound bad, but now think about the legacy code - all the MongoDB connections, for example, or my .env files - everything was customized to an individual instance, and I had to make it so that the application for each customer worked within the scope of their own organization. In short, I did not want data from one organization to be visible in another.
The code itself was difficult to conceptualize, but not impossible. What I felt was harder were the migrations themselves. My team and I spent countless hours poring over connection errors, debugging Docker containerization issues, pointing our code to the correct env files - we almost gave up on this massive undertaking multiple times!
Once we pulled through and this project was done, I began to wonder - what if there was some way to make this process easier? What if, through some coding magic, I could ACTUALLY make a graph to visualize all the network connections in an application? I could simply point my program at a Docker container, and it would dive into the kernel and reverse engineer the container's architecture, from system calls to network events.
I began doing some research, and I found the main character in this story - eBPF. eBPF is a technology that lets me run sandboxed programs inside the Linux KERNEL, without modifying kernel sources or loading potentially unsafe kernel modules. The kernel handles all the cool stuff - TCP connections, process start-up, memory allocation, etc.
eBPF would allow me to send a small “probe” into Linux kernel space and observe what happens around it. Any significant information would then be emitted back to me. I like to think of it like Voyager 1 (I love reading about space exploration!). This space probe is the FARTHEST human-made object from us - and we can still communicate with it! So, all I had to do was create a probe, send it out on an adventure into kernel space, and have it emit events back to me. Simple.
How would I capture the events it sent? Well, Claude suggested using a receiver, which I would write in Go, to collect these events.
So I started. I opened up Zed and made 2 files - a server.py and a client.py. The client would simply send a request to the server every 3 seconds, and the server would return a Hello, world! response. Next, I put each of them into its own Docker container, with the client container depending on the server container. And boom, I had just created a sandbox environment in which TCP connections were being made and a real application was running.
Now, I had to build a probe to venture out into the vast expanse of (Linux kernel) space and emit discoveries! For this, I used the help of ChatGPT. I asked it to make me a probe that would run and collect TCP events. It made a probe using C, and also said:
I always knew that space exploration could be dangerous, and I would never understand everything fully. But, at a high level, the code did the following: my probe would hook onto the kernel, look for tcp_connect events, extract the metadata and emit it out.
Also, to follow CO-RE principles (Compile Once, Run Everywhere), I had to generate a vmlinux.h file containing my kernel’s actual type definitions, extracted from its BTF metadata, for my BPF programs to include.
For those like me who didn’t understand a word of the above: I knew my probes would run on MY kernel, but I could not guarantee that they would run on another Linux kernel version, or that they would not break if the libraries I was using got updated. So, I had to generate a file describing the kernel’s data structures, so that the probe could be matched against whichever kernel it ended up running on.
I compiled the probe and loaded the resulting probe.o file. Now, I had a probe sent into the Docker sandbox application, and it was already emitting events.
I now needed a receiver for these events. I wanted to collect the telemetry in a language that was fast, efficient and easily compiled, so that listening to the probe did not slow down the application I was supposed to observe. Hence, I selected Go. Go is truly a beautiful language, and I really wanted to use it in a project after having learnt it a little while ago. I also came across some really cool quirks of Go which I had to work around (stay tuned, this is for Part 2!).
In Go, I built a struct to collect the events that were being sent by the probe:
I also built a resolver that would resolve a Docker Client and return the Client:
Now, I was ready to connect to my probe and get some data! The concept was as follows:
After a lot of experimentation, referring to docs and to ChatGPT, I managed to code out the steps exactly like this. My code was being orchestrated by a file called loader.go, so I turned it into an executable. Then, I ran my Docker service and also my executable.
Oh. My. God. I could talk to my PROBE!! I sat there for a good 15 minutes just looking at my telemetry logs. This was beautiful.
It was also insufficient. This did not tell me EVERYTHING I wanted to know about my containers. But now, the basic idea was built. All I had to do was send out multiple probes that specialized in multiple types of data gathering, and make sure I collected ALL of that data.
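To give a feel for the receiving side, here is a minimal sketch of turning the raw bytes a probe emits into a Go struct. It is not the actual Dhrishti code: the tcpEvent layout, the field names, and the decodeEvent helper are all assumptions made up for illustration, and a real layout has to match the C struct in the probe byte for byte. The sketch also assumes a little-endian host, with the addresses and the port left in network byte order, which is how the kernel stores them.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"net"
)

// tcpEvent mirrors a HYPOTHETICAL C struct written by the probe.
// The layout (32 bytes in total) is an assumption for illustration only.
type tcpEvent struct {
	PID   uint32   // process that opened the connection (host byte order)
	SAddr [4]byte  // source IPv4 address, network byte order
	DAddr [4]byte  // destination IPv4 address, network byte order
	DPort [2]byte  // destination port, network byte order
	_     [2]byte  // explicit padding so the layout lines up with C
	Comm  [16]byte // process name, NUL-padded
}

// decodeEvent parses one raw sample into a tcpEvent.
// It returns an error if the sample is shorter than the struct.
func decodeEvent(raw []byte) (tcpEvent, error) {
	var ev tcpEvent
	err := binary.Read(bytes.NewReader(raw), binary.LittleEndian, &ev)
	return ev, err
}

// Source formats the source address in dotted form.
func (e tcpEvent) Source() string { return net.IP(e.SAddr[:]).String() }

// Dest formats the destination as "ip:port".
func (e tcpEvent) Dest() string {
	port := binary.BigEndian.Uint16(e.DPort[:])
	return fmt.Sprintf("%s:%d", net.IP(e.DAddr[:]), port)
}

// Command returns the process name up to the first NUL byte.
func (e tcpEvent) Command() string {
	if i := bytes.IndexByte(e.Comm[:], 0); i >= 0 {
		return string(e.Comm[:i])
	}
	return string(e.Comm[:])
}

func main() {
	// Hand-built sample standing in for bytes received from a probe.
	raw := make([]byte, 32)
	binary.LittleEndian.PutUint32(raw[0:4], 4242)
	copy(raw[4:8], []byte{172, 18, 0, 3})
	copy(raw[8:12], []byte{172, 18, 0, 2})
	binary.BigEndian.PutUint16(raw[12:14], 8000)
	copy(raw[16:], "python3")

	ev, err := decodeEvent(raw)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Printf("pid=%d comm=%s %s -> %s\n", ev.PID, ev.Command(), ev.Source(), ev.Dest())
}
```

In a real loader, the raw slice would come from whatever buffer the probe writes to rather than being built by hand. Keeping the addresses and port as byte arrays avoids any byte swapping: the bytes are read exactly as the kernel laid them out.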
When this was done, all I had to do was make a beautiful (AI-generated) front-end to show this graph by polling an API repeatedly. With the help of ChatGPT, I built the following probes:
I wanted to start with this, but first, I needed to improve my code architecture. I had a loader.go that was basically handling everything, and that would not scale as I added more probes. So I had to come up with a better architecture for my code, but the project wasn’t just an idea anymore!
Stay tuned for the second part, or feel free to check out my full project here! https://github.com/IdiotCoffee/dhrishti