kubeshark — When AI agents can finally query your cluster network directly

An incident hits, and your AI can't see the network

The pager goes off. Checkout has been failing since 2:15 PM. You dig through logs, open dashboards, and kubectl exec into pods. Yet the one thing that matters, what the service-to-service traffic actually looked like at that moment, is nowhere to be found. Where did the request drop? Which service timed out? Those packets are already gone.

These days you want to point an AI assistant at the investigation. But while you can feed it logs, there's rarely a clean way to show it the live network traffic flowing inside the cluster. kubeshark fills exactly this gap.

The problem it solves

kubeshark bills itself, in the README's own words, as "Network Observability for SREs & AI Agents." The core idea: it indexes cluster-wide network traffic at the kernel level using eBPF. No code instrumentation required. Without touching your applications, it captures every conversation crossing nodes and turns it into something queryable.

Traditionally there were two options: instrument every application with tracing code, or run sidecars to intercept traffic. Both carry operational weight, and both struggle with encrypted traffic. kubeshark pushes that weight down into the kernel instead.

How it works

kubeshark parses traffic according to protocol specifications and indexes it — HTTP, gRPC, Redis, Kafka, DNS, and more. A single KFL (Kubeshark Filter Language) query can combine three semantic layers at once: Kubernetes identity, API context, and network attributes.

# One KFL query spans three semantic layers
# Kubernetes identity + API context + network attributes
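
To make the three layers concrete, here is a minimal Python sketch of the idea. It is not KFL syntax and it is not kubeshark's data model: the Flow record, its field names, and the sample values are all assumptions made up for illustration. It only shows what it means for one filter to constrain Kubernetes identity, API context, and network attributes together.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Flow:
    """One indexed API call. Every field name here is hypothetical."""
    # Kubernetes identity
    namespace: str
    src_workload: str
    dst_workload: str
    # API context
    protocol: str
    path: str
    status: int
    # Network attributes
    dst_port: int
    latency_ms: float


Predicate = Callable[[Flow], bool]


def all_of(*preds: Predicate) -> Predicate:
    """Combine predicates so a flow must satisfy every layer at once."""
    return lambda flow: all(p(flow) for p in preds)


def select(flows: Iterable[Flow], pred: Predicate) -> List[Flow]:
    """Return the flows that match the combined predicate."""
    return [f for f in flows if pred(f)]


# Invented sample data, not real capture output.
flows = [
    Flow("shop", "frontend", "checkout", "http", "/pay", 503, 8080, 1200.0),
    Flow("shop", "frontend", "checkout", "http", "/pay", 200, 8080, 35.0),
    Flow("shop", "frontend", "cart", "http", "/items", 500, 8080, 80.0),
    Flow("ops", "prober", "checkout", "http", "/healthz", 503, 8080, 5.0),
]

failed_checkout_calls = all_of(
    # Layer 1: Kubernetes identity
    lambda f: f.namespace == "shop" and f.dst_workload == "checkout",
    # Layer 2: API context
    lambda f: f.protocol == "http" and f.status >= 500,
    # Layer 3: network attributes
    lambda f: f.dst_port == 8080,
)

matches = select(flows, failed_checkout_calls)
print(matches)
```

Only the first sample record passes all three layers. The probe from the ops namespace also got a 503 on the same port, but it fails the identity check, which is the point of combining layers in a single query.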

On top of that, it automatically decrypts TLS/mTLS traffic via eBPF — with no key management and no sidecars. Traffic that was opaque because it was encrypted now reads in plain text.

Setup

Getting started takes three commands: two for Helm and one kubectl port-forward.

helm repo add kubeshark https://helm.kubeshark.com
helm install kubeshark kubeshark/kubeshark
kubectl port-forward svc/kubeshark-front 8899:80

Open http://localhost:8899 and you're already capturing traffic. For production, the README recommends an ingress controller over port-forward.
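
If you'd rather script that check than open a browser, a small Python sketch like the one below could confirm that something answers on the forwarded port. The function name and the timeout are assumptions; only the URL comes from the steps above. It does not verify that kubeshark itself is healthy, only that an HTTP server responds.

```python
import http.client
import urllib.error
import urllib.request


def dashboard_reachable(url: str = "http://localhost:8899",
                        timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers at url with a 2xx/3xx status.

    Hypothetical helper: it knows nothing about kubeshark, it just
    issues a plain GET against the port-forwarded address.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, HTTP error status, or a malformed reply.
        return False


if __name__ == "__main__":
    print("dashboard reachable:", dashboard_reachable())
```

A False result usually just means the port-forward isn't running yet, so it's worth re-checking that command before suspecting the install.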

Connecting an AI agent

The real differentiator is that kubeshark exposes this data over MCP (Model Context Protocol). AI agents can query traffic, investigate API calls, and run root cause analysis through natural language.

brew install kubeshark
claude mcp add kubeshark -- kubeshark mcp

That unlocks questions like these — all straight from the README:

"Why did checkout fail at 2:15 PM?"
"Which services have error rates above 1%?"
"Show TCP retransmission rates across all node-to-node paths"
"Trace request abc123 through all services"

It works with Claude Code, Cursor, and any MCP-compatible AI. There are also open-source AI skills: a Network RCA skill for retrospective root cause analysis — snapshots, dissection, PCAP extraction, trend comparison — and a KFL skill that writes and debugs traffic filters.
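
As a rough illustration of the arithmetic behind the second question above, here is a Python sketch that computes per-service error rates and flags those above 1%. The (service, status) pairs are invented sample data, and counting only 5xx responses as errors is an assumption. The source does not say how kubeshark or an agent actually computes this.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def error_rates(calls: Iterable[Tuple[str, int]]) -> Dict[str, float]:
    """Map each service to errors / total, treating 5xx as an error."""
    totals: Counter = Counter()
    errors: Counter = Counter()
    for service, status in calls:
        totals[service] += 1
        if status >= 500:
            errors[service] += 1
    return {svc: errors[svc] / totals[svc] for svc in totals}


def above_threshold(rates: Dict[str, float],
                    threshold: float = 0.01) -> List[str]:
    """Return services whose error rate is strictly above threshold."""
    return sorted(svc for svc, rate in rates.items() if rate > threshold)


# Invented sample: (destination service, HTTP status) per observed call.
calls = (
    [("checkout", 200)] * 95 + [("checkout", 503)] * 5
    + [("cart", 200)] * 199 + [("cart", 500)] * 1
    + [("search", 200)] * 50
)

rates = error_rates(calls)
noisy = above_threshold(rates)
print(rates)
print(noisy)
```

With this sample, checkout fails 5 of 100 calls and is flagged, while cart fails 1 of 200 and stays under the 1% line. The value of asking an agent is that it runs this kind of aggregation over real indexed traffic instead of hand-built lists.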

When not to use it

kubeshark assumes a Kubernetes cluster. If you're on a single VM or a non-Kubernetes setup, it isn't the right fit. Because eBPF relies on kernel features, heavily restricted environments and some managed ones may limit where its workers can be deployed. Cluster-wide capture is powerful, but it raises a policy question about what to retain and for how long.

And if your goal is precise, business-transaction-level tracing, an instrumentation-based APM may serve you better. kubeshark's strength is observability from the network's vantage point.

Alternatives in the same category

Wireshark is the standard for packet analysis, but it doesn't index a whole cluster in real time. kubeshark complements it — you can export PCAPs scoped by time, nodes, workloads, and IPs and hand them to Wireshark. Many service-mesh observability tools require sidecars; kubeshark aims for the same visibility via eBPF without them.

Wrap-up

kubeshark's one-line value: it turns cluster network traffic into data your AI can question. It's 100% on-premises with air-gapped support, so it runs without external dependencies. If you're already attaching AI to incident response, it's worth trying a setup where network traffic sits alongside logs in the AI's field of view. Start on a demo cluster and watch encrypted traffic resolve into plain text. Seeing that once tends to make the adoption decision easier.

🐦 Faster updates on X: @baegseungh7061
📚 More in this series: AI Insights

Seunghyeon's Agentic Lab
