Chapter 11: Kubernetes Production Practices

Metadata Card

Prerequisites: Chapter 7 (Docker & K8s Basics), Chapter 10 (IaC & GitOps)
Estimated Time: 45 minutes
Core Difficulty: Intermediate
Reading Mode: High focus
Completion Milestone: Understand production usage of Ingress, NetworkPolicy, StatefulSet, and PV/PVC; design Pod resource quotas and HPA strategies; master multi-cluster isolation and the production checklist

Your Progress

The Expeditionary Army's command system has been running on K8s for a while. Stateless services are running well — Deployment rolling updates, HPA auto-scaling, Service load balancing are stable. But more complex scenarios are emerging.

Intelligence Analysis Service needs persistent storage: it must save raw reconnaissance data for subsequent analysis, but today that data is lost whenever its Pod restarts. New requirements also arrive: communication between stations needs firewall policies, external users need domain-based API access to the command platform instead of IP addresses, and you need to isolate a new K8s namespace for a new theater.

Your Task

Production K8s: Ingress (external traffic management), NetworkPolicy (fine-grained access control), StatefulSet + PV/PVC (stateful workloads), resource quotas, multi-cluster patterns.

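Ingress: Application-layer (L7) traffic entry point. Routes requests by host name and path, with TLS termination at the Ingress controller. Rate limiting, CORS, and IP allowlists are configured through annotations, which are specific to the Ingress controller in use.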
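
As a minimal sketch of the routing and TLS setup described above, the Python below builds an Ingress manifest as a plain dict and prints it as JSON (kubectl accepts JSON as well as YAML). The host name, service name, port, and secret name are hypothetical, and the annotations assume the ingress-nginx controller; other controllers use different annotation keys.

```python
import json


def build_ingress(name, host, service_name, service_port, tls_secret):
    """Return a networking.k8s.io/v1 Ingress manifest as a plain dict."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {
            "name": name,
            "annotations": {
                # Assumption: ingress-nginx is the controller. These keys are
                # controller-specific and ignored by other controllers.
                "nginx.ingress.kubernetes.io/limit-rps": "20",
                "nginx.ingress.kubernetes.io/enable-cors": "true",
                "nginx.ingress.kubernetes.io/whitelist-source-range": "10.0.0.0/8",
            },
        },
        "spec": {
            "ingressClassName": "nginx",
            # TLS is terminated at the controller using this Secret.
            "tls": [{"hosts": [host], "secretName": tls_secret}],
            "rules": [
                {
                    "host": host,
                    "http": {
                        "paths": [
                            {
                                "path": "/",
                                "pathType": "Prefix",
                                "backend": {
                                    "service": {
                                        "name": service_name,
                                        "port": {"number": service_port},
                                    }
                                },
                            }
                        ]
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical values for the command platform API.
    manifest = build_ingress(
        "command-api", "api.command.example.com", "command-api", 8080, "command-api-tls"
    )
    print(json.dumps(manifest, indent=2))
```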

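NetworkPolicy: Kubernetes-native firewall. By default, all pods can communicate freely, which is insecure. A NetworkPolicy defines an allow-list: once a pod is selected by any policy, traffic that no policy explicitly allows is denied. Enforcement requires a CNI plugin that implements NetworkPolicy (Calico, Cilium, Weave); on a CNI without support, the objects are accepted but have no effect.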
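
The allow-list model can be sketched as two manifests: a default-deny policy for a namespace, then a narrow rule that re-opens one path. This is an illustrative sketch only; the namespace, labels, and port are hypothetical, and the manifests are built as Python dicts for printing as JSON.

```python
import json


def default_deny_ingress(namespace):
    """Deny all inbound traffic to every pod in the namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-ingress", "namespace": namespace},
        # An empty podSelector selects all pods; no ingress rules means
        # nothing is allowed in.
        "spec": {"podSelector": {}, "policyTypes": ["Ingress"]},
    }


def allow_ingress(name, namespace, target_labels, source_labels, port):
    """Allow TCP traffic on one port from source pods to target pods."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": target_labels},
            "policyTypes": ["Ingress"],
            "ingress": [
                {
                    "from": [{"podSelector": {"matchLabels": source_labels}}],
                    "ports": [{"protocol": "TCP", "port": port}],
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical names: only the command API may reach the intel service.
    policies = [
        default_deny_ingress("theater-a"),
        allow_ingress(
            "allow-command-to-intel",
            "theater-a",
            {"app": "intel-analysis"},
            {"app": "command-api"},
            8080,
        ),
    ]
    for policy in policies:
        print(json.dumps(policy, indent=2))
```

Policies are additive, so the second manifest opens one path without weakening the default deny for everything else.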

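Service Mesh (Istio/Linkerd): In the classic model, a sidecar proxy runs alongside every pod. Provides traffic management, mTLS, observability. Adds latency (on the order of 2-5ms, varying with mesh, configuration, and workload) and operational complexity.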

StatefulSet: Stable pod identities (name-0, name-1, ...), ordered deployment and scaling, and a dedicated PVC per pod created from volumeClaimTemplates. Uses a Headless Service (clusterIP: None) so that each pod gets its own DNS record for discovery.
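
A minimal sketch of the StatefulSet pattern: a Headless Service, a StatefulSet with one volumeClaimTemplate, and a helper that lists the stable pod names, DNS names, and PVC names that result. The service name, image, mount path, and storage size are hypothetical, and the DNS names assume the default cluster domain cluster.local.

```python
def headless_service(name, namespace, port):
    """Service with clusterIP None: DNS returns per-pod records."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "clusterIP": "None",
            "selector": {"app": name},
            "ports": [{"port": port}],
        },
    }


def stateful_set(name, namespace, image, replicas, port, storage):
    """StatefulSet whose pods each get a PVC from the 'data' template."""
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "serviceName": name,  # must match the Headless Service name
            "replicas": replicas,
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "ports": [{"containerPort": port}],
                            "volumeMounts": [
                                {"name": "data", "mountPath": "/var/lib/data"}
                            ],
                        }
                    ]
                },
            },
            "volumeClaimTemplates": [
                {
                    "metadata": {"name": "data"},
                    "spec": {
                        "accessModes": ["ReadWriteOnce"],
                        "resources": {"requests": {"storage": storage}},
                    },
                }
            ],
        },
    }


def pod_identities(name, namespace, replicas, cluster_domain="cluster.local"):
    """Stable names for each ordinal: pod, per-pod DNS record, and PVC."""
    return [
        {
            "pod": f"{name}-{i}",
            "dns": f"{name}-{i}.{name}.{namespace}.svc.{cluster_domain}",
            "pvc": f"data-{name}-{i}",  # <template name>-<pod name>
        }
        for i in range(replicas)
    ]


if __name__ == "__main__":
    for identity in pod_identities("intel-analysis", "theater-a", 3):
        print(identity)
```

Because the PVC name is derived from the pod name, a rescheduled intel-analysis-0 reattaches to the same claim and finds its data again.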

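PV/PVC: Storage lifecycle decoupled from pod lifecycle. StorageClass enables dynamic provisioning. The reclaim policy decides what happens to a PV after its PVC is deleted: Retain (volume and data kept, manual cleanup), Delete (volume removed automatically), Recycle (deprecated).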
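
The relationship between StorageClass, reclaim policy, and PVC can be sketched as follows. The provisioner string, class name, and sizes are placeholders chosen for illustration; a real cluster would use the provisioner of its own CSI driver.

```python
import json


def storage_class(name, provisioner, reclaim_policy="Retain"):
    """StorageClass for dynamic provisioning with an explicit reclaim policy."""
    if reclaim_policy not in ("Retain", "Delete"):
        raise ValueError("reclaim_policy must be 'Retain' or 'Delete'")
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": provisioner,
        # Dynamically provisioned PVs inherit this policy.
        "reclaimPolicy": reclaim_policy,
        # Delay binding until a pod is scheduled, so the volume is created
        # in the same zone as the pod.
        "volumeBindingMode": "WaitForFirstConsumer",
    }


def persistent_volume_claim(name, namespace, storage_class_name, size):
    """PVC that requests storage from the named StorageClass."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class_name,
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    # Hypothetical provisioner name; substitute the CSI driver in use.
    sc = storage_class("retained-ssd", "csi.example.com", "Retain")
    claim = persistent_volume_claim("recon-data", "theater-a", "retained-ssd", "50Gi")
    print(json.dumps(sc, indent=2))
    print(json.dumps(claim, indent=2))
```

With Retain, deleting the claim leaves the underlying volume in place for manual recovery, which is usually the safer default for data such as raw reconnaissance records.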

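Resource management: requests (what the scheduler reserves on a node) must be <= limits (the runtime cap: CPU above the limit is throttled, memory above the limit gets the container OOM-killed). HPA scales the replica count based on CPU/memory or custom metrics. VPA adjusts requests/limits based on historical usage.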
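
To make the HPA behaviour concrete, here is a small sketch of its core scaling rule as documented by Kubernetes: desired = ceil(current replicas x current metric / target metric), skipped when the ratio is within a tolerance (0.1 by default), then clamped to the configured bounds. The function name and parameters are illustrative, and the real controller also accounts for unready pods, missing metrics, and stabilization windows, which this sketch omits.

```python
import math


def desired_replicas(
    current_replicas,
    current_metric,
    target_metric,
    min_replicas,
    max_replicas,
    tolerance=0.1,
):
    """Simplified HPA rule: scale in proportion to metric / target."""
    ratio = current_metric / target_metric
    # Within tolerance of the target: leave the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    # Multiply before dividing to limit floating-point drift before ceil().
    desired = math.ceil(current_replicas * current_metric / target_metric)
    # Clamp to the HPA's minReplicas / maxReplicas.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 4 pods averaging 90% CPU against a 60% target.
    print(desired_replicas(4, 90, 60, min_replicas=2, max_replicas=10))
```

In the example call, the ratio is 1.5, so four replicas become six. A reading of 63% against the same target falls inside the tolerance band and triggers no change.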

Multi-cluster: Hub-and-Spoke (central management), Federation (deprecated), or Independent clusters + unified GitOps.

Production checklist: etcd backup, control plane HA, CNI with NetworkPolicy support, pod requests/limits, liveness/readiness probes, PDB (PodDisruptionBudget), immutable image tags, NetworkPolicy between critical services, Ingress TLS, Prometheus monitoring, log collection, HPA configured, PV reclaim policy set for stateful services, cross-AZ/region disaster recovery.
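
A few of the per-container checklist items (requests/limits, probes, immutable image tags) lend themselves to an automated check. The function below is a hypothetical sketch of such an audit over a container spec expressed as a dict; it is not an existing tool, and the finding messages are made up for illustration.

```python
def audit_container(container):
    """Return a list of checklist findings for one container spec."""
    findings = []

    resources = container.get("resources", {})
    for kind in ("requests", "limits"):
        if not resources.get(kind):
            findings.append(f"missing resources.{kind}")

    for probe in ("livenessProbe", "readinessProbe"):
        if probe not in container:
            findings.append(f"missing {probe}")

    # Inspect only the last path segment so a registry port such as
    # registry:5000/app is not mistaken for a tag.
    image = container.get("image", "")
    last_segment = image.rsplit("/", 1)[-1]
    pinned_by_digest = "@" in last_segment
    has_tag = ":" in last_segment
    if not pinned_by_digest and (not has_tag or last_segment.endswith(":latest")):
        findings.append("mutable or missing image tag")

    return findings


if __name__ == "__main__":
    # Hypothetical container spec with several gaps.
    risky = {"name": "intel", "image": "registry:5000/intel:latest"}
    for finding in audit_container(risky):
        print(finding)
```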

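Common Pitfalls: Setting pod requests > limits (the API server rejects the pod). Cluster Autoscaler and HPA oscillating against each other. Hardcoding pod IPs (they change when a pod is rescheduled; use Service DNS names). Treating K8s Secrets as encrypted: values are only base64 encoded, so enable encryption at rest for etcd or use External Secrets Operator. Deleting a PVC whose volume has reclaim policy Delete, which destroys the data.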
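
The Secret pitfall is easy to demonstrate: base64 is an encoding, not encryption, so anyone who can read the Secret object can recover the value. The snippet below is a minimal illustration using only the Python standard library; the sample value is made up.

```python
import base64


def encode_secret_value(plaintext):
    """Encode a value the way it appears under 'data' in a Secret."""
    return base64.b64encode(plaintext.encode("utf-8")).decode("ascii")


def decode_secret_value(encoded):
    """Reverse the encoding; no key or password is needed."""
    return base64.b64decode(encoded).decode("utf-8")


if __name__ == "__main__":
    stored = encode_secret_value("db-password-123")  # made-up sample value
    print(stored)
    print(decode_secret_value(stored))
```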

Traveler's Notes

Production K8s isn't about learning more API objects. The real production challenges come from managing stateful data, isolating network traffic, allocating resource quotas, and establishing cross-cluster governance. Ingress manages external traffic, NetworkPolicy provides internal micro-segmentation, StatefulSet gives stateful services stable identity and storage, and PV/PVC decouples storage from pod lifecycle.

Next: Observability Deep Dive (Chapter 12).