Kubernetes Networking with Cilium: From CNI to Service Mesh

Kubernetes networking has a dirty secret: on a default cluster, every Service is implemented by kube-proxy writing iptables rules, and on a busy cluster that becomes thousands of rules the kernel evaluates sequentially for each new connection. It works, but it does not scale gracefully, and it gives you almost no visibility into what is actually talking to what.

Cilium throws that out and builds the dataplane on eBPF. The practical results: Service load balancing without the iptables tax, network policy that follows workloads instead of IP addresses, and flow logs that finally answer “who is connecting to this pod.”

Why eBPF Changes the Model

kube-proxy thinks in IPs and iptables rules. Pods churn, IPs recycle, and rules pile up. Cilium attaches eBPF programs at the socket and driver level and thinks in identities — a label-derived identity assigned to each workload. Policy is expressed against identities, so a rule like “frontend may talk to backend” keeps holding as pods are rescheduled and IPs change underneath.

That is the conceptual leap: stop writing policy about where a workload is (IP) and write it about what it is (labels/identity).
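
As a rough illustration of why identity-based policy survives rescheduling, the sketch below derives a key from a workload's labels alone. It is a toy model, not Cilium's allocator: the label_key helper and its comma-joined output are assumptions made for this example, and Cilium itself hands out numeric identities. The point is only that the pod IP never enters the computation.

```shell
# Toy model: the key depends on the label set, never on the pod IP.
# label_key is a hypothetical helper, not part of any Cilium tooling.
label_key() {
  # Sort so that label order does not change the result, then join with commas.
  printf '%s\n' "$@" | sort | paste -sd, -
}

# The same labels, listed in a different order, produce the same key.
label_key app=backend env=prod
label_key env=prod app=backend
```

A pod that is rescheduled keeps its labels, so it keeps its key; a pod with a different label set gets a different one.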

Installing as the CNI

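Cilium is a CNI plugin; install it on a cluster with no other CNI (or migrate carefully). Install it via Helm from the cilium chart repository (https://helm.cilium.io/), turning on the two features that matter most: kube-proxy replacement and Hubble. With kube-proxy gone, k8sServiceHost and k8sServicePort must point directly at your API server; the address below is a placeholder: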

helm install cilium cilium/cilium --namespace kube-system \
  --set kubeProxyReplacement=true \
  --set k8sServiceHost=10.0.0.1 \
  --set k8sServicePort=6443 \
  --set hubble.relay.enabled=true \
  --set hubble.ui.enabled=true

kubeProxyReplacement=true means Cilium handles Service load balancing in eBPF and you can remove kube-proxy entirely. Check status:

cilium status
cilium status --verbose | grep -i kubeproxy # should report "True"

Network Policy That Follows Workloads

Standard Kubernetes NetworkPolicy is L3/L4 and namespace/label scoped. CiliumNetworkPolicy extends it to L7 (HTTP, DNS, Kafka) and richer selectors. A baseline L3/L4 policy — backend accepts only from frontend, only on 8080:

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: backend-allow-frontend
  namespace: shop
spec:
  endpointSelector:
    matchLabels:
      app: backend
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: frontend
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP

The L7 layer is where Cilium pulls ahead — restrict not just the port but the HTTP methods and paths:

ingress:
  - fromEndpoints:
      - matchLabels:
          app: frontend
    toPorts:
      - ports:
          - port: "8080"
            protocol: TCP
        rules:
          http:
            - method: "GET"
              path: "/api/v1/.*"
            - method: "POST"
              path: "/api/v1/orders"

Now frontend can GET /api/v1/* and POST only to orders, with no sidecar: the eBPF dataplane redirects traffic on that port to a node-local Envoy proxy that Cilium manages. This is “service mesh” L7 policy without running Envoy next to every pod.
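
To make the effect of those two HTTP rules concrete, here is a small local simulation. It is a sketch under stated assumptions: http_allowed is a hypothetical helper, and it assumes the path field is a regular expression that must match the whole request path, which is why the patterns are anchored with ^ and $. It does not talk to Cilium.

```shell
# http_allowed METHOD PATH
# Succeeds if the request would pass the two rules from the policy above.
# Hypothetical helper: a local simulation, not a Cilium command.
http_allowed() {
  method=$1
  path=$2
  case "$method" in
    # GET is allowed on anything under /api/v1/
    GET)  printf '%s\n' "$path" | grep -Eq '^/api/v1/.*$' ;;
    # POST is allowed on exactly /api/v1/orders
    POST) printf '%s\n' "$path" | grep -Eq '^/api/v1/orders$' ;;
    # Any other method matches no rule
    *)    return 1 ;;
  esac
}

http_allowed GET /api/v1/items    && echo "GET /api/v1/items: allowed"
http_allowed POST /api/v1/items   || echo "POST /api/v1/items: denied"
http_allowed DELETE /api/v1/orders || echo "DELETE /api/v1/orders: denied"
```

Note that under the whole-path assumption a POST to /api/v1/orders/123 is also denied, because the second rule has no trailing wildcard.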

DNS-Aware Egress

A perennial problem: locking down egress to external services that live behind changing IPs. Cilium policies can match DNS names, and it learns the IPs by snooping the pod’s own DNS lookups:

egress:
  - toFQDNs:
      - matchName: "api.stripe.com"
    toPorts:
      - ports:
          - port: "443"
            protocol: TCP
  - toEndpoints:
      - matchLabels:
          "k8s:io.kubernetes.pod.namespace": kube-system
          k8s-app: kube-dns
    toPorts:
      - ports:
          - port: "53"
            protocol: UDP
        rules:
          dns:
            - matchPattern: "*"

You must allow DNS to kube-dns, with a rules.dns section on that port 53 rule, for toFQDNs to work: Cilium's DNS proxy watches those lookups to populate the allowed IP set. Forgetting the DNS rule is the most common “my FQDN policy does nothing” bug.

Hubble: Finally, Visibility

The piece that justifies the migration on its own. Hubble taps the eBPF dataplane and shows real flows, including which policy verdict each got:

# Live flows for a namespace
hubble observe --namespace shop
# Only DROPPED flows — what is policy blocking right now?
hubble observe --verdict DROPPED --namespace shop
# Flows to a specific workload
hubble observe --to-label app=backend

hubble observe --verdict DROPPED is how you debug a policy that is too tight: you see the exact source, destination, and port being denied, instead of guessing why a pod cannot reach a service. The Hubble UI renders the same data as a live service map.
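
On a busy namespace the raw drop stream is long, so it can help to collapse it into counts per source and destination. The sketch below does that with awk. Treat it as an assumption-heavy example: summarize_drops is a hypothetical helper, and the line layout it expects ("SRC:PORT (ID:n) -> DST:PORT ...") is my reading of the compact text output and may differ between Hubble versions.

```shell
# summarize_drops: count dropped flows per "source -> destination:port" pair.
# Reads hubble's compact text output on stdin. The assumed line layout is
#   <timestamp> SRC:PORT (ID:n) -> DST:PORT (ID:n) <verdict details>
# which may not hold for every Hubble version.
summarize_drops() {
  awk '{
    for (i = 2; i < NF; i++) {
      if ($i != "->") continue
      src = $(i - 1); dst = $(i + 1)
      # Skip an "(ID:n)" annotation sitting between the source and the arrow.
      if (src ~ /^\(/ && i > 2) src = $(i - 2)
      # Drop the ephemeral source port so repeated attempts group together.
      sub(/:[0-9]+$/, "", src)
      print src " -> " dst
      break
    }
  }' | sort | uniq -c | sort -rn
}

# Intended use against a live cluster (not run here):
#   hubble observe --verdict DROPPED --namespace shop | summarize_drops
```

The most frequent pair lands at the top, which is usually the allow rule you are missing.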

Default-Deny, the Safe Way

Policies are additive and default-allow until any policy selects an endpoint, at which point that endpoint is default-deny for the covered direction. The trap: apply a default-deny before you have allow rules and you cut traffic instantly. Stage it:

  1. Run with Hubble and no enforcement; learn the real flows from hubble observe.
  2. Write allow policies matching those flows.
  3. Apply a namespace default-deny once the allows are in place.

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: default-deny
  namespace: shop
spec:
  endpointSelector: {}
  ingress:
    - {}
  egress:
    - {}

That empty-rule policy denies everything in the namespace — apply it last, never first.
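
The staging steps above amount to a set difference: flows you observed minus flows your allow policies cover. A minimal sketch of that check follows. Both helper names and the one-flow-per-line "src -> dst:port" file format are hypothetical, chosen for this example; how you produce the two files is up to you.

```shell
# uncovered_flows OBSERVED ALLOWED
# Prints every line of OBSERVED that has no exact match in ALLOWED.
# Both files hold one flow per line in a hypothetical "src -> dst:port" format.
uncovered_flows() {
  grep -vxFf "$2" "$1"
}

# ready_for_default_deny OBSERVED ALLOWED
# Succeeds only when every observed flow is covered by an allow entry.
ready_for_default_deny() {
  ! uncovered_flows "$1" "$2" > /dev/null
}

# Example gate (file names are placeholders):
#   ready_for_default_deny observed.txt allowed.txt \
#     && kubectl apply -f default-deny.yaml
```

If uncovered_flows prints anything, those are the flows the default-deny would cut.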

Debugging When Hubble Isn’t Enough

Hubble answers “what flowed and what verdict.” When a policy should allow traffic that is still dropped, you drop to the endpoint and identity layer. Every pod is a Cilium endpoint with a numeric identity derived from its labels; policy is compiled against those identities, so a stale or wrong identity is a frequent root cause:

# List endpoints with their identity and enforcement state
cilium endpoint list
# ENDPOINT IDENTITY LABELS INGRESS EGRESS
# 1423 12influx k8s:app=backend Enabled Disabled
# What policy is actually loaded for one endpoint
cilium endpoint get 1423
# Identity to labels mapping — confirm two pods you expect to share identity do
cilium identity list

If two frontend pods landed on different identities, their label sets differ. When the difference is a typo in the app label, your fromEndpoints: app=frontend rule matches only one of them. When the verdict itself is the mystery, cilium monitor is the dataplane trace: it shows policy decisions per packet with the identities involved:

# Drops only, with the L3/L4 detail and the policy verdict
cilium monitor --type drop
# xx drop (Policy denied) flow 0x0 to endpoint 1423, identity 53713->12influx: \
# 10.0.2.7:51234 -> 10.0.4.9:8080 tcp SYN

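identity 53713->12influx is the actual evaluation: source identity 53713 to destination 12influx was denied. Cross-reference 53713 with cilium identity list and you usually find the caller is not the workload you assumed (a different workload whose labels do not match your selector, the node itself with the host identity, or traffic arriving from outside the cluster with the world identity). Containers in the same pod, including sidecars and init containers, share the pod's identity, so they are not the explanation.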
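
When there are many drops, pulling out just the identity pairs makes the cross-referencing faster. The sketch below extracts them with sed. It assumes each drop line carries an "identity SRC->DST" fragment with numeric IDs, as in the sample output above; drop_identities is a hypothetical helper, and the identity numbers in the test data are invented.

```shell
# drop_identities: print "SRC_ID DST_ID" for each drop line read on stdin.
# Assumes the line contains a fragment of the form "identity <num>-><num>".
# Lines without that fragment are ignored.
drop_identities() {
  sed -n 's/.*identity \([0-9][0-9]*\)->\([0-9][0-9]*\).*/\1 \2/p'
}

# Intended use against a live agent (not run here):
#   cilium monitor --type drop | drop_identities | sort | uniq -c | sort -rn
```

Each distinct pair in the output is one source identity to look up with cilium identity list.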

The toFQDNs Gotcha and How It Actually Resolves

FQDN policy is the most common source of “it works for an hour then breaks.” The mechanism: Cilium proxies the pod’s DNS request, records the returned IPs, and inserts them into the allowed set with a TTL. Two failure modes follow directly from that.

First, if the application caches DNS longer than Cilium’s recorded TTL, the pod keeps using an IP Cilium has since expired, and traffic drops. Cilium enforces a minimum TTL that you can raise with the agent's tofqdns-min-ttl option rather than trusting the upstream record. On the policy side, write both the DNS rule and the FQDN rule explicitly:

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: egress-stripe
  namespace: shop
spec:
  endpointSelector:
    matchLabels:
      app: backend
  egress:
    - toEndpoints:
        - matchLabels:
            "k8s:io.kubernetes.pod.namespace": kube-system
            k8s-app: kube-dns
      toPorts:
        - ports:
            - port: "53"
              protocol: UDP
          rules:
            dns:
              - matchPattern: "*.stripe.com"
    - toFQDNs:
        - matchPattern: "*.stripe.com"
      toPorts:
        - ports:
            - port: "443"
              protocol: TCP

The rules.dns.matchPattern on the port 53 rule is the part people omit. Without it Cilium forwards DNS but does not learn the answers, so toFQDNs never populates — the egress to kube-dns must itself carry the DNS-matching rule for the FQDN intercept to fire. Confirm what Cilium actually learned:

cilium fqdn cache list | grep stripe
# 10.0.3.4 default/backend *.stripe.com api.stripe.com 34.x.x.x TTL 86s

If that cache is empty while the pod resolves the name fine, your DNS visibility rule is wrong, not the FQDN rule.
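
One more thing worth checking before blaming the cache is whether the name the pod resolves matches the pattern at all. The sketch below approximates matchPattern locally. It rests on an assumption about the semantics: that "*" stands for zero or more DNS-label characters and never crosses a dot, so *.stripe.com covers api.stripe.com but neither stripe.com nor a.b.stripe.com. Verify that against the documentation for your Cilium version; fqdn_matches is a hypothetical helper.

```shell
# fqdn_matches PATTERN NAME
# Succeeds when NAME matches PATTERN under the assumed matchPattern semantics:
# "*" is zero or more of [-a-zA-Z0-9_] and therefore never matches a dot.
fqdn_matches() {
  # Turn the pattern into an anchored regular expression:
  # literal dots first, then each "*" into a DNS-label character class.
  re=$(printf '%s\n' "$1" | sed -e 's/\./[.]/g' -e 's/\*/[-a-zA-Z0-9_]*/g')
  printf '%s\n' "$2" | grep -Eq "^${re}\$"
}

fqdn_matches '*.stripe.com' api.stripe.com && echo "api.stripe.com: match"
fqdn_matches '*.stripe.com' stripe.com     || echo "stripe.com: no match"
fqdn_matches '*.stripe.com' a.b.stripe.com || echo "a.b.stripe.com: no match"
```

If the application talks to the apex domain or to a deeper subdomain, the policy needs an extra matchName or matchPattern entry for it.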

Second failure mode: a CDN behind the FQDN returns dozens of rotating IPs with short TTLs. The allowed set churns, and a connection opened to an IP that has since aged out gets cut mid-flight. For high-cardinality endpoints, prefer a wildcard matchPattern over per-host matchName so that every hostname under the domain, and every IP those names resolve to, is covered. Then confirm the policy compiled by checking it landed in the dataplane rather than just the API server:

# Verify the toFQDNs selector appears in the loaded policy
cilium policy get
# The revision should increment after applying; if it doesn't, the policy was rejected
cilium endpoint get 1423 -o jsonpath='{[0].status.policy.realized.policy-revision}'

A policy that validates against the CRD schema but references a selector matching no identity loads silently and does nothing — the revision incrementing without the expected Enabled enforcement on the endpoint is the tell.

What You Get and What It Costs

The wins are real: Service load balancing that scales past iptables, identity-based and L7 policy without sidecars, DNS-aware egress, and Hubble visibility. The cost is operational depth — Cilium is a bigger, more opinionated component than a basic CNI, and the eBPF dataplane means debugging sometimes drops to cilium monitor and BPF map dumps rather than reading iptables.

For a small cluster running a few apps, a simple CNI is less to operate. For a cluster where you need real network policy, Service scale, and the ability to answer “what talked to what,” Cilium is the current default for good reason — and Hubble alone will change how you debug the cluster.