9-Week Advanced Kubernetes Bootcamp on AWS (EKS) + AIOps
Bootcamp Details
- Fee: ₹25K ($275)
- Total Classes: 18
- Duration: 150 minutes per class
- Format: Live Classes
- Starting On: 21st March 2026
- Classes on: Saturday & Sunday
- Timings: 6.00 PM - 8.30 PM IST
- Language: English
The Only Kubernetes (EKS) Bootcamp That Takes You From Zero to Production Engineer in 9 Weeks, with Advanced AIOps Implementation
This 9-week Kubernetes on AWS (EKS) bootcamp walks you through hardcore, real-world Kubernetes projects with production-level context. It does not just teach you k8s; it also covers AIOps implementation in modern systems.
- Production-grade Kubernetes cluster with Terraform
- CI/CD with advanced DevSecOps implementation
- GitOps implementation with ArgoCD
- Microservices and StatefulSets on k8s, and a Gateway API ingress controller implementation
- Service mesh, network policies, Operators, and CRDs
- Karpenter for node scaling
- AIOps with a local LLM, a custom RAG solution, and AI-automated k8s troubleshooting
- Production-grade observability setup with Prometheus, Grafana, Loki, and other modern tooling
- Real incident simulations
Prerequisites:
- Basic AWS
- Basic Docker
- Basic CI/CD (preferably GitHub Actions)
Important points:
- All classes are live and taught by Akhilesh Mishra
- You will also get the recordings, code, notes, and other resources
- This bootcamp is taught in English
9-Week Kubernetes on AWS (EKS) Bootcamp + AIOps
Week 1: Kubernetes Fundamentals
Core concepts, architecture, and your first real cluster
Topics Covered
- The story behind Kubernetes — the why before the how
- Kubernetes architecture deep dive (Control Plane, Worker Nodes, etcd)
- Core concepts: Pod, Service, Deployment, ReplicaSet
- Setting up a local cluster with Minikube
- Getting comfortable with kubectl commands
- ConfigMaps and Secrets management
- Running a 2-tier app (App + DB) on Kubernetes
- Using a Kubernetes IDE: Lens (Freelens)
- Pulling private images using ImagePullSecrets
- Namespaces and resource organisation
- Labels, Selectors, and Annotations
- Resource Requests and Limits
- Understanding YAML manifests in depth
- Kubernetes DNS and service discovery internals
🏗 Project
Running a proper 2-tier e-commerce app on Minikube with Secrets, ConfigMaps, and a private image registry
Week 2: Advanced Minikube with CI/CD + GitOps
GitOps, pipelines, observability, and resilience patterns
Topics Covered
- Basic logging and monitoring fundamentals
- Implementing GitOps with ArgoCD on Minikube
- End-to-end CI/CD pipeline — build, push, deploy
- Prometheus and Grafana — building basic dashboards
- Rolling upgrades and rollback strategies
- Pod autoscaling with HPA and VPA
- Live troubleshooting techniques
- Init containers and sidecar patterns
- Pod Disruption Budgets for high availability
- Liveness, Readiness, and Startup probes
- CrashLoopBackOff and OOMKilled debugging
- Deployment strategies — Recreate vs RollingUpdate vs Blue-Green
- Resource quotas and LimitRanges per namespace
- Understanding Kubernetes events and how to read them
🏗 Project
GitOps deployment of e-commerce app on Minikube with CI/CD pipeline, HPA, and basic Prometheus + Grafana monitoring
Week 3: Production-Grade EKS: 3-Tier Application
Real AWS infrastructure, IAM, networking, security, and TLS
Topics Covered
- Setting up an EKS cluster via the AWS Console
- EKS add-ons: VPC CNI, CoreDNS, EBS CSI Driver
- Helm charts — writing, packaging, and deploying
- IRSA (IAM Roles for Service Accounts): linking Kubernetes ServiceAccounts to AWS IAM roles via OIDC
- Running a 3-tier app: Frontend + Backend + RDS PostgreSQL
- Database migrations using Kubernetes Jobs
- Init containers for DB connection readiness checks
- Services with Ingress for internal and external networking
- AWS annotations for ELB and target group configuration
- AWS Secrets Manager for credential management
- AWS Load Balancer Controller with Helm
- Domain, DNS, and SSL/TLS termination
- EKS managed node groups vs self-managed nodes
- Kubernetes RBAC hardening — ServiceAccounts, ClusterRoles, RoleBindings, least privilege
- aws-auth ConfigMap and RBAC for cluster access control
- ExternalDNS for automatic Route53 record management
🏗 Project
Production-grade e-commerce app on EKS with IRSA, RDS, Secrets Manager, Load Balancer Controller, custom domain, SSL, and RBAC hardening
Week 4: Microservices, GitOps & Infrastructure as Code
Terraform EKS, production microservices, Gateway API, and ArgoCD patterns
Topics Covered
- Production-grade EKS cluster with Terraform
- Running microservices on Kubernetes with best practices
- Gateway API for advanced ingress routing
- AWS Load Balancer Controller architecture deep dive
- Terraform deployment of AWS Load Balancer Controller
- SSL termination strategies
- Terraform module structure for EKS — VPC, node groups, add-ons
- Managing multiple environments with Terraform workspaces — dev, staging, prod
- ArgoCD App-of-Apps pattern for multi-service GitOps
- ArgoCD ApplicationSet for environment promotion
- Network Policies for microservice traffic isolation
- Inter-service communication — ClusterIP vs headless vs service mesh
- Kubecost or OpenCost — namespace-level cloud cost attribution
🏗 Project
EKS cluster with Terraform, e-commerce microservices with production-grade GitOps via ArgoCD App-of-Apps, Gateway API ingress with AWS LBC, multi-environment strategy, and cost visibility dashboard
Week 5: Production Logging & Monitoring + SRE
Observability at scale — metrics, logs, dashboards, and alerting
Topics Covered
- How logging and monitoring work in real companies
- Different scenarios of logging and monitoring strategy
- Implementing observability for microservices
- Monitoring differences: Fargate vs managed node groups
- Prometheus for metrics collection
- Loki for log storage and querying
- Grafana dashboards for Kubernetes and cloud resources (RDS, Lambda)
- Prometheus Operator and ServiceMonitor CRDs
- Alertmanager: routing alerts to Slack and PagerDuty
- Log aggregation with Fluent Bit on EKS
- OpenTelemetry for distributed tracing across microservices
- Implementing SRE practices
- SLO and SLI definitions — error budget dashboards in Grafana
- AWS CloudWatch Container Insights integration
- Cost visibility dashboard — RDS, Lambda, EKS node costs in Grafana
- Agentic Kubernetes troubleshooting with AI tools
🏗 Project
Full observability stack for e-commerce microservices: Prometheus + Loki + Grafana with SLO dashboards, Alertmanager Slack integration, distributed tracing, and cloud cost visibility
Week 6: StatefulSets, Persistent Storage, DevSecOps, Image Optimisation
Stateful workloads, storage management, DevSecOps, and container efficiency
Topics Covered
- Persistent Volume (PV), PVC, and StorageClass concepts
- Running StatefulSets on Kubernetes
- Docker image optimisation techniques
- Troubleshooting multi-attach volume errors
- Debugging common StatefulSet failures
- Dynamic vs static volume provisioning on EKS
- EBS vs EFS — choosing the right storage for the right workload
- Multi-stage Docker builds for production images
- Distroless and minimal base images for security
- Trivy for container image vulnerability scanning
- Volume snapshots and backup strategies
- Headless Services for StatefulSet DNS resolution
- Agentic Kubernetes troubleshooting with AI tools
🏗 Project
Running Elasticsearch + MinIO on Kubernetes as StatefulSets with persistent storage, optimised multi-stage Docker images, and Trivy image scanning integrated into the CI pipeline
Week 7: Service Mesh, Network Policies, Karpenter & EKS Auto Mode, Custom Resource Definitions + Operators
Advanced networking, intelligent node scaling, cost optimisation, and extending the Kubernetes API
Topics Covered
- Service mesh fundamentals — why it exists and when to use it
- Istio or Linkerd — installation, traffic management, mTLS
- Network Policies for zero-trust pod-to-pod communication
- Egress controls and namespace isolation
- Karpenter architecture — node provisioner vs Cluster Autoscaler
- Karpenter NodePool and EC2NodeClass configuration
- Cost optimisation with Spot + On-Demand mixed fleets
- EKS Auto Mode — what it is and when to use it over Karpenter
- Istio traffic splitting for canary deployments
- Visualising service mesh traffic with Kiali
- Pod topology spread constraints for multi-AZ resilience
- Custom Resource Definitions (CRDs) — extending the Kubernetes API with your own object types
- How Karpenter itself is built on CRDs — NodePool and EC2NodeClass as real-world examples
- Kubernetes Operators: encoding operational knowledge into controllers
- The Operator pattern: the watch, reconcile, act loop explained
- Building a simple Operator with the Operator SDK
- Agentic Kubernetes troubleshooting with AI tools
🏗 Project
Service mesh with mTLS and canary deployments, network policies for zero-trust isolation, Karpenter Spot node scaling, and Kyverno policy enforcement — all on the e-commerce app
Week 8: AIOps on Kubernetes + RAG Implementation
Running AI workloads on Kubernetes and building intelligent infrastructure
Topics Covered
- How AI fits into real DevOps workflows — automated error detection, log triage, anomaly detection
- Deploying Ollama on Kubernetes — GPU vs CPU node scheduling, resource limits for AI workloads
- Running local LLMs (Gemma 2B) as a Kubernetes Deployment with persistent model storage
- Building a RAG pipeline on Kubernetes — Qdrant vector database as a StatefulSet
- Document ingestion pipeline — chunking, embedding with nomic-embed-text, storing in Qdrant
- Semantic search — querying vectors to retrieve relevant context before LLM inference
- Connecting RAG to real infrastructure data — feeding live logs, metrics, and incident history
- AI-powered health checker as a Kubernetes CronJob — automated failure detection and root cause suggestions
- Exposing the AI chat interface via Kubernetes Ingress with proper auth
- Resource-aware scheduling for AI pods — nodeSelectors, tolerations, and priority classes
- Observability for AI workloads — tracking inference latency, query quality, and model health in Grafana
🏗 Project
Full AIOps monitoring system on Minikube: a Spring Boot app with failure injection, a Python health checker CronJob, Ollama + Gemma 2B for log analysis, a Qdrant RAG pipeline with an uploadable knowledge base, and a React dashboard with live AI chat. Everything runs locally, with zero external API calls.
Week 9: Production Incidents, War Rooms, and On-Call Experience
Real incidents, live simulations, RCAs, and interview readiness
Topics Covered
- SRE principles — SLO, SLI, SLA, error budgets
- Discussing multiple real production incidents
- Live war room simulation — Incident 1 (OOMKill cascade on order service)
- Live war room simulation — Incident 2 (DB connection pool exhaustion under load)
- Writing RCAs and postmortems for both incidents
- Real-world SRE implementations
- On-call runbook writing and documentation standards
- Chaos engineering basics — pod failure injection with LitmusChaos
- Kubernetes system design interview questions — “Design a deployment pipeline for an e-commerce platform”
- How to present the capstone project on your resume and in interviews
- Answering scenario-based DevOps interview questions around Kubernetes
🏗 Project
Two full war room simulations on the e-commerce app with live troubleshooting, written RCAs, LitmusChaos experiments, and a complete resume-ready project documentation package
Reach out for queries and part-payment requests
- Email: livingdevops@gmail.com
- WhatsApp: +91 9259681620
