22-Week Real-World, Project-Based AWS DevOps Bootcamp

Bootcamp Details

A real-world DevOps bootcamp by Akhilesh Mishra, with real-world projects, live troubleshooting, and live incident management in a production-level context.

This bootcamp is designed to show you what a real-world DevOps project looks like. You will learn everything by building real-world projects, troubleshooting live issues, and navigating live production incidents. For the next 5-6 months, you will live the day-to-day experience of a DevOps engineer.

My bootcamps will give you enough hands-on DevOps experience to help you crack any DevOps interview.

  • Bootcamp level: Beginner to advanced
  • Start date: 11th April 2026
  • Total classes: 60
  • Class days: Saturday and Sunday
  • Timings: 9:30 AM to 12:30 PM IST
  • Class duration: 2.5 to 3 hours each
  • Language: English
  • Instructor: Akhilesh Mishra
  • All classes are recorded, and students get lifetime access to the recordings, code, notes, and resources
  • All classes use real-world projects with production-level context and detail
  • All classes are live, with real-time troubleshooting

Module: AWS + Containers


Class 1: Cloud Computing + AWS + Custom VPC

  • History of computing — physical → virtualisation → cloud
  • Cloud service models (IaaS, PaaS, SaaS), AWS global infrastructure, and the shared responsibility model
  • VPC, subnets (public/private), Internet Gateway, route tables
  • Security Groups vs NACLs — stateful vs stateless, when to use each
  • NAT Gateway — outbound internet access for private subnets without inbound exposure
  • VPC Endpoints — S3 Gateway endpoint, why it saves cost and improves security
  • Bastion host pattern — why you never SSH directly to private instances
  • VPC Peering vs Transit Gateway — connecting VPCs

Project: Production VPC from Scratch

  • 6 subnets (2 public, 2 private app, 2 private DB) across 2 AZs
  • NAT Gateway, S3 Gateway endpoint, route tables for each tier
  • Security Groups per tier with least-privilege rules, NACLs on DB subnets
  • Bastion host — SSH to private instance through it, direct access blocked
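The six-subnet layout can be planned on paper before anything is created in the console. The sketch below uses Python's standard ipaddress module and assumes a 10.0.0.0/16 VPC, /24 subnets, and the AZ names ap-south-1a and ap-south-1b; all of these are illustrative choices, not requirements of the project. It hands out one non-overlapping block per tier per AZ and reports the usable addresses left after the five that AWS reserves in every subnet.

```python
import ipaddress

# AWS reserves the first four and the last IP address in every subnet.
AWS_RESERVED_PER_SUBNET = 5


def plan_subnets(vpc_cidr, tiers, azs, new_prefix=24):
    """Assign one subnet per (tier, AZ) pair, in order, from the VPC CIDR."""
    vpc = ipaddress.ip_network(vpc_cidr)
    blocks = vpc.subnets(new_prefix=new_prefix)  # consecutive, non-overlapping blocks
    plan = {}
    for tier in tiers:
        for az in azs:
            plan[(tier, az)] = next(blocks)
    return plan


def usable_hosts(subnet):
    """Addresses left for instances after AWS takes its reserved five."""
    return subnet.num_addresses - AWS_RESERVED_PER_SUBNET


if __name__ == "__main__":
    layout = plan_subnets(
        "10.0.0.0/16",
        tiers=["public", "private-app", "private-db"],
        azs=["ap-south-1a", "ap-south-1b"],  # hypothetical AZ choice
    )
    for (tier, az), subnet in layout.items():
        print(f"{tier:<12} {az:<12} {subnet}  usable={usable_hosts(subnet)}")
```

With these inputs the public tier gets 10.0.0.0/24 and 10.0.1.0/24, the app tier 10.0.2.0/24 and 10.0.3.0/24, and the DB tier 10.0.4.0/24 and 10.0.5.0/24, each with 251 usable addresses.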

Class 2: EC2 + Elastic IP + DNS + Free SSL

  • EC2 instance types, AMIs, purchase options (On-Demand, Spot, Reserved)
  • EBS volumes, snapshots, instance store vs EBS
  • User data scripts — bootstrapping EC2 at launch
  • IAM Instance Profiles — EC2 permissions without access keys
  • Elastic IP, Route 53 hosted zones, A records, alias records
  • ACM free SSL cert — DNS validation, attaching to resources

Project: Web App on EC2 with Custom Network, DNS + HTTPS

  • EC2 in private subnet, Nginx reverse proxy, app auto-starts via systemd
  • Elastic IP, Route 53 A record, ACM cert — HTTPS end to end
  • IAM Instance Profile for S3 access, SSH only via Bastion

Class 3: S3 + CloudFront + Cross-Region Replication

  • S3 storage classes, lifecycle policies, versioning, bucket policies
  • Static website hosting, pre-signed URLs, S3 event notifications
  • CloudFront — CDN, Origin Access Control, cache invalidation
  • S3 replication — SRR vs CRR, IAM roles, delete marker replication

Project 1: Static App on S3 + CloudFront + Custom Domain + SSL

  • S3 with OAC, CloudFront distribution, ACM cert in us-east-1, Route 53 alias record

Project 2: Real-Time Cross-Region Replication

  • Source bucket ap-south-1 → destination us-east-1, versioning on both, lifecycle policy on destination
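A lifecycle policy like the one on the destination bucket is just a JSON document. The sketch below builds one in the shape that boto3's put_bucket_lifecycle_configuration accepts as LifecycleConfiguration. The rule ID, the 30-day and 90-day transitions, and the 365-day expiry for noncurrent versions are placeholder values, not recommendations from the course.

```python
import json


def replica_lifecycle(transition_ia_days=30, transition_glacier_days=90, noncurrent_expire_days=365):
    """Build a lifecycle configuration for a versioned replication destination bucket."""
    return {
        "Rules": [
            {
                "ID": "replica-tiering",   # hypothetical rule name
                "Filter": {"Prefix": ""},  # empty prefix = whole bucket
                "Status": "Enabled",
                "Transitions": [
                    {"Days": transition_ia_days, "StorageClass": "STANDARD_IA"},
                    {"Days": transition_glacier_days, "StorageClass": "GLACIER"},
                ],
                # Versioning is on, so old versions pile up unless they expire.
                "NoncurrentVersionExpiration": {"NoncurrentDays": noncurrent_expire_days},
            }
        ]
    }


if __name__ == "__main__":
    config = replica_lifecycle()
    print(json.dumps(config, indent=2))
    # With credentials configured, it could be applied with:
    # boto3.client("s3").put_bucket_lifecycle_configuration(
    #     Bucket="my-replica-bucket", LifecycleConfiguration=config)
```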

Class 4: ALB + Auto Scaling Groups + 2-Tier App

  • ALB vs NLB — when to use which, listeners, rules, target groups, health checks
  • ALB path-based routing, HTTPS termination, HTTP → HTTPS redirect
  • Launch Templates, ASG — desired/min/max, AZ balancing
  • Scaling policies — target tracking, step scaling, scheduled
  • User data in ASG — pulling config from SSM on launch
  • Connection draining, instance refresh for zero-downtime AMI updates

Project: 2-Tier App with ALB + ASG + RDS + Route 53 + ACM

  • Launch Template with user data, ASG across 2 private subnets (min 2, max 6)
  • ALB with HTTPS, health checks, target tracking on CPU
  • RDS PostgreSQL in private DB subnets, credentials in SSM Parameter Store
  • Simulate — terminate instance, scale under load, verify zero downtime

Class 5: RDS Patterns + Disaster Recovery

  • RDS Multi-AZ — synchronous replication, automatic failover
  • Read Replicas — async replication, read scaling, cross-region DR
  • Multi-AZ vs Read Replica — the difference most engineers get wrong
  • RDS snapshots, point-in-time recovery, Aurora basics, RDS Proxy
  • DR patterns — Backup & Restore, Pilot Light, Warm Standby, Multi-Site Active-Active
  • RTO vs RPO — how to calculate, how to pick the right strategy

Project 1: RDS Multi-AZ Failover Simulation

  • Trigger manual failover, measure downtime, verify app reconnects automatically

Project 2: Cross-Region Read Replica + DR

  • Promote Read Replica in second region, update Route 53, measure RTO and RPO end to end
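RTO and RPO fall straight out of timestamps recorded during the drill: RTO runs from the failure until traffic is served again, and RPO runs from the last write that survived on the replica until the failure. A minimal sketch with hypothetical timestamps and targets; in the project these values would come from application logs and the promoted replica's data.

```python
from datetime import datetime, timedelta


def measure_rto_rpo(failure_at, restored_at, last_recoverable_write_at):
    """Return (RTO, RPO) as timedeltas for one DR drill.

    RTO: how long the service was down (failure -> traffic served again).
    RPO: how much data was lost (last write that survived -> failure).
    """
    if not (last_recoverable_write_at <= failure_at <= restored_at):
        raise ValueError("expected last_write <= failure <= restored")
    return restored_at - failure_at, failure_at - last_recoverable_write_at


def meets_targets(rto, rpo, rto_target, rpo_target):
    return rto <= rto_target and rpo <= rpo_target


if __name__ == "__main__":
    # Hypothetical timestamps from a drill log.
    failure = datetime(2026, 5, 2, 10, 0, 0)
    restored = datetime(2026, 5, 2, 10, 7, 30)    # replica promoted, DNS updated
    last_write = datetime(2026, 5, 2, 9, 59, 45)  # newest row present on the replica
    rto, rpo = measure_rto_rpo(failure, restored, last_write)
    print(f"RTO={rto}, RPO={rpo}")
    print("within targets:", meets_targets(rto, rpo, timedelta(minutes=15), timedelta(minutes=1)))
```

Here the drill yields an RTO of 7 minutes 30 seconds and an RPO of 15 seconds, which fits the assumed 15-minute and 1-minute targets.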

Class 6: IAM Deep Dive + Production Security Hardening

  • IAM users, groups, roles, policies — why roles always beat users for apps
  • Policy structure — Effect, Action, Resource, Condition
  • Permission boundaries, cross-account role assumption, trust policies
  • SSM Session Manager — modern zero-trust alternative to Bastion/SSH
  • Secrets Manager vs SSM Parameter Store — when to use each
  • CloudTrail, AWS Config, IAM Access Analyzer — audit and compliance

Project 1: Lock Down the Stack with IAM + Session Manager

  • Remove SSH, enable SSM Session Manager, least-privilege Instance Profile
  • Secrets moved to Secrets Manager, CloudTrail logging to locked S3 bucket
  • IAM Access Analyzer — fix every flagged overly permissive policy

Project 2: Cross-Account Role Assumption

  • IAM role in prod, trust policy for dev account, developer assumes role via CLI
  • IAM Identity Center — federated login for both accounts, no IAM users for humans
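The Effect/Action/Resource structure is easier to reason about once you see how statements combine: everything is denied by default, a matching Allow grants access, and an explicit Deny always wins. The sketch below is a deliberately simplified evaluator that ignores Condition, NotAction/NotResource, permission boundaries, SCPs, and resource policies. It approximates IAM's * and ? wildcards with fnmatchcase, and the policy and bucket name are hypothetical.

```python
from fnmatch import fnmatchcase


def _as_list(value):
    return value if isinstance(value, list) else [value]


def _matches(value, patterns, case_sensitive=True):
    if not case_sensitive:
        value = value.lower()
        patterns = [p.lower() for p in patterns]
    return any(fnmatchcase(value, p) for p in patterns)


def is_allowed(policy, action, resource):
    """Simplified IAM check: default deny, any matching Allow grants, explicit Deny wins."""
    allowed = False
    for stmt in policy["Statement"]:
        if not _matches(action, _as_list(stmt["Action"]), case_sensitive=False):
            continue  # IAM action names are case-insensitive
        if not _matches(resource, _as_list(stmt["Resource"])):
            continue
        if stmt["Effect"] == "Deny":
            return False  # an explicit Deny overrides every Allow
        allowed = True
    return allowed


# Hypothetical policy for an app role that reads config but must never read secrets.
APP_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "s3:GetObject", "Resource": "arn:aws:s3:::app-config/*"},
        {"Effect": "Deny", "Action": "s3:*", "Resource": "arn:aws:s3:::app-config/secrets/*"},
    ],
}

if __name__ == "__main__":
    for key in ["settings.json", "secrets/db.json"]:
        arn = f"arn:aws:s3:::app-config/{key}"
        print(key, is_allowed(APP_POLICY, "s3:GetObject", arn))
```

The second object is matched by both statements, and the Deny wins; s3:PutObject is denied everywhere simply because nothing allows it.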

Week 4: Containerization using Docker and Docker Compose

  • Evolution of Docker, its architecture, and everyday commands
  • Building, sharing, and running custom Docker images
  • Container networking and volume management, with meaningful use cases
  • Docker Compose for multi-container applications: simulating a real-time application, load testing, and alerting

Project

  • Containerize a full-stack application with proper optimization
  • Real-time container monitoring with a visual dashboard and interactive charts for performance visualization
  • Alert system with AWS SES email notifications and stress testing for different load scenarios

Module: Running Containers in Production (AWS ECS) + Infrastructure as Code (Terraform) + CI/CD (GitHub Actions)


Class 1: Two-Tier App on ECS (Console)

  • ECS fundamentals — clusters, services, task definitions, Fargate vs EC2 launch type
  • Task definition anatomy — container definitions, CPU/memory, environment variables, secrets
  • ECS service — desired count, deployment config, circuit breaker
  • ALB integration with ECS — target group, dynamic port mapping, health checks
  • ECS service discovery — AWS Cloud Map, how containers find each other
  • CloudWatch Container Insights — logs, metrics, container-level visibility
  • ECS IAM — task role vs task execution role, what each one does
  • Secrets Manager + ECS — injecting secrets into containers without hardcoding

Project: Deploy a Database-Backed 2-Tier App on ECS

  • ECS Fargate cluster, task definition for Node.js app connecting to RDS PostgreSQL
  • ALB with HTTPS, health check on /health, CloudWatch log group per container
  • Secrets Manager for DB credentials injected at runtime, task role for S3 access
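The difference between the task role and the task execution role is clearest inside a task definition. The sketch below builds one as a Python dict whose keys match boto3's ecs register_task_definition parameters. The account ID, region, family name, role names, port 3000, and secret ARN are all placeholders, not values from the course.

```python
import json

ACCOUNT = "123456789012"  # placeholder account ID
REGION = "ap-south-1"     # assumed region


def task_definition(image, db_secret_arn):
    """Fargate task definition showing where each IAM role and the secret plug in."""
    return {
        "family": "orders-api",   # hypothetical service name
        "networkMode": "awsvpc",  # required for Fargate
        "requiresCompatibilities": ["FARGATE"],
        "cpu": "256",
        "memory": "512",
        # Execution role: used by ECS itself to pull the image, write logs
        # and fetch the secret below before the container starts.
        "executionRoleArn": f"arn:aws:iam::{ACCOUNT}:role/orders-execution-role",
        # Task role: assumed by the application code (e.g. its S3 calls).
        "taskRoleArn": f"arn:aws:iam::{ACCOUNT}:role/orders-task-role",
        "containerDefinitions": [
            {
                "name": "app",
                "image": image,
                "essential": True,
                "portMappings": [{"containerPort": 3000, "protocol": "tcp"}],
                "environment": [{"name": "NODE_ENV", "value": "production"}],
                # Injected as an env var at start-up; the value never lives in the task definition.
                "secrets": [{"name": "DB_PASSWORD", "valueFrom": db_secret_arn}],
                "logConfiguration": {
                    "logDriver": "awslogs",
                    "options": {
                        "awslogs-group": "/ecs/orders-api",
                        "awslogs-region": REGION,
                        "awslogs-stream-prefix": "app",
                    },
                },
            }
        ],
    }


if __name__ == "__main__":
    td = task_definition(
        image=f"{ACCOUNT}.dkr.ecr.{REGION}.amazonaws.com/orders-api:3f9c2ab",
        db_secret_arn=f"arn:aws:secretsmanager:{REGION}:{ACCOUNT}:secret:orders/db-AbCdEf",
    )
    print(json.dumps(td, indent=2))
    # Could be registered with: boto3.client("ecs").register_task_definition(**td)
```

If the app gets AccessDenied from S3, look at the task role; if the task never starts because the image or secret cannot be fetched, look at the execution role.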

Class 2: Terraform Fundamentals

  • Terraform architecture — providers, resources, state, plan, apply lifecycle
  • Writing your first resources — VPC, EC2, S3 in Terraform
  • Variables, locals, outputs — making configs reusable
  • terraform.tfvars and environment-specific variable files
  • Remote state — S3 backend with DynamoDB locking, why local state breaks teams
  • Terraform workspaces — managing dev and prod from the same codebase
  • terraform import — bringing existing resources under Terraform management
  • Terraform plan best practices — what to review before every apply
  • Common mistakes — hardcoded values, missing state locking, giant single files

Project: Convert Manual AWS Setup to Terraform

  • Recreate the Week 2 VPC (subnets, NAT, security groups) entirely in Terraform
  • Remote state in S3 with DynamoDB lock, workspace for dev and prod
  • terraform import the manually created RDS from Week 3 into state
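Part of the "what to review before every apply" habit can be scripted. terraform show -json on a saved plan emits a resource_changes list in which each entry's change.actions is ["no-op"], ["create"], ["read"], ["update"], ["delete"], or a two-element replace (["delete", "create"] or ["create", "delete"]). The sketch below counts actions and flags anything destructive, which is exactly what you want to catch after an import. SAMPLE_PLAN is hand-written and trimmed for illustration, and plan.json is a placeholder file name.

```python
import json
from collections import Counter

DESTRUCTIVE = {"delete"}


def summarize_plan(plan):
    """Count planned actions and list resources Terraform would delete or replace."""
    counts = Counter()
    risky = []
    for rc in plan.get("resource_changes", []):
        actions = rc["change"]["actions"]
        label = "replace" if len(actions) == 2 else actions[0]  # two actions = replace
        counts[label] += 1
        if DESTRUCTIVE & set(actions):
            risky.append((rc["address"], label))
    return counts, risky


def load_plan(path):
    """Load the JSON produced by: terraform plan -out=tfplan && terraform show -json tfplan > plan.json"""
    with open(path) as fh:
        return json.load(fh)


SAMPLE_PLAN = {  # trimmed, hand-written example of `terraform show -json` output
    "resource_changes": [
        {"address": "aws_vpc.main", "change": {"actions": ["no-op"]}},
        {"address": "aws_subnet.private_app[0]", "change": {"actions": ["create"]}},
        {"address": "aws_security_group.db", "change": {"actions": ["update"]}},
        {"address": "aws_db_instance.main", "change": {"actions": ["delete", "create"]}},
    ]
}

if __name__ == "__main__":
    counts, risky = summarize_plan(SAMPLE_PLAN)  # or summarize_plan(load_plan("plan.json"))
    print(dict(counts))
    for address, label in risky:
        print(f"REVIEW: {address} will be {label}d")
```

A replace on an imported RDS instance usually means the Terraform arguments do not yet match the real resource, which is the signal to fix the code before applying.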

Class 3: Full ECS Infrastructure with Terraform

  • Terraform module structure for ECS — VPC, ALB, ECS, RDS as separate modules
  • ECS service and task definition in Terraform — every config option explained
  • ALB in Terraform — listeners, rules, target groups, ACM cert attachment
  • RDS in Terraform — subnet groups, parameter groups, Multi-AZ toggle per environment
  • CloudWatch log groups, metric alarms, and dashboards in Terraform
  • Terraform for_each and count — creating multiple similar resources cleanly
  • depends_on and resource ordering — avoiding race conditions on apply
  • Terraform module versioning — pinning modules for stability

Project: Dev and Production ECS Environments with Terraform

  • Single module codebase, two workspaces — dev (single AZ, smaller instances) and prod (Multi-AZ, larger)
  • Full stack — VPC, ALB, ECS Fargate, RDS, ACM, CloudWatch — all in Terraform
  • terraform plan output reviewed and applied cleanly for both environments

Class 4: Git, GitHub + CI/CD Fundamentals

  • Git fundamentals — commits, branches, merge vs rebase, resolving conflicts
  • Branching strategies — Gitflow vs trunk-based, what real teams actually use
  • GitHub Actions architecture — workflows, jobs, steps, runners
  • Triggers — push, pull_request, workflow_dispatch, schedule
  • Secrets and environment variables in GitHub Actions — repo secrets vs environment secrets
  • GitHub Environments — approval gates before deploying to production
  • Writing reusable workflows — workflow_call, composite actions
  • GitHub Actions matrix builds — testing across multiple versions in parallel
  • SDLC and Jira — how tickets flow from backlog to deployed feature in real teams

Project: Multi-Stage CI Pipeline with Automated Testing

  • GitHub Actions pipeline — lint → unit test → build Docker image → push to ECR
  • Pull request check — pipeline must pass before merge allowed
  • Matrix build testing across Node 18 and Node 20
  • Environment secrets for dev vs prod ECR repositories

Class 5: Automated ECS Deployments with GitHub Actions

  • Container image versioning — git SHA tagging, semantic versioning, latest anti-pattern
  • ECR lifecycle policies — cleaning up old images automatically
  • ECS deployment strategies — rolling update, blue-green via CodeDeploy
  • GitHub Actions ECS deploy — aws-actions/amazon-ecs-deploy-task-definition
  • Environment-specific workflows — dev deploys on merge to main, prod requires approval
  • Rollback strategy — redeploying previous task definition on failure
  • Deployment notifications — Slack alerts on success and failure
  • Testing in CI — running integration tests against a staging ECS environment before prod

Project: Automated Build and Deployment Pipeline for ECS

  • GitHub Actions builds image on every merge, tags with git SHA, pushes to ECR
  • Automatic rollback if health checks fail within 5 minutes
  • Slack notification on deployment success, failure, and rollback
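The "roll back if health checks fail within 5 minutes" rule comes down to a polling loop with a deadline. The sketch below is a hypothetical helper, not part of any GitHub Action or AWS API: check_healthy stands in for whatever probe the pipeline uses (for example an HTTP GET on /health), and the clock and sleep functions are injectable so the logic can be tested without waiting. ECS's deployment circuit breaker can also roll back on its own; this only illustrates the decision.

```python
import time


def health_gate(check_healthy, timeout_s=300, interval_s=15, required_passes=3,
                clock=time.monotonic, sleep=time.sleep):
    """Return "promote" once `required_passes` consecutive checks succeed,
    or "rollback" if that has not happened before `timeout_s` elapses."""
    deadline = clock() + timeout_s
    streak = 0
    while clock() < deadline:
        try:
            ok = check_healthy()
        except Exception:
            ok = False  # a probe that errors counts as a failed check
        streak = streak + 1 if ok else 0
        if streak >= required_passes:
            return "promote"
        sleep(interval_s)
    return "rollback"


class FakeClock:
    """Test double: time only moves when sleep() is called."""

    def __init__(self):
        self.t = 0.0

    def now(self):
        return self.t

    def sleep(self, seconds):
        self.t += seconds


if __name__ == "__main__":
    clock = FakeClock()
    probes = iter([False, True, True, True])  # one failed probe, then healthy
    print(health_gate(lambda: next(probes), clock=clock.now, sleep=clock.sleep))
```

Requiring several consecutive passes avoids promoting a release on a single lucky probe; on "rollback", the pipeline would redeploy the previous task definition revision.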

Class 6: ECS Auto Scaling + Load Testing + Monitoring

  • ECS service auto scaling — Application Auto Scaling, target tracking on CPU and ALB request count
  • ECS task-level scaling vs service-level scaling — understanding the difference
  • CloudWatch custom metrics — pushing app-level metrics from containers
  • CloudWatch dashboards — ECS service health, ALB latency, RDS connections in one view
  • CloudWatch alarms — composite alarms, alarm actions (SNS → email/Slack)
  • AWS X-Ray for distributed tracing on ECS — enabling, reading traces
  • Load testing with k6 or hey — simulating real traffic, finding bottlenecks
  • Reading CloudWatch Container Insights under load — what to look for

Project: Load Test + Auto Scaling + Monitoring Dashboard

  • Target tracking policy — scale ECS tasks when ALB request count per target exceeds threshold
  • Run k6 load test, watch tasks scale out, verify ALB distributes traffic
  • CloudWatch dashboard showing ECS CPU, ALB 5xx rate, RDS connection count, p99 latency
  • Composite alarm — fires Slack alert when both high CPU and high error rate occur together

Class 7: Three-Tier App with Advanced Terraform

  • Advanced Terraform modules — public registry vs private, module composition patterns
  • Data sources — referencing existing resources without hardcoding ARNs
  • Multi-environment strategy — dev, staging, prod with shared modules and separate state files
  • CloudFront in Terraform — distribution, origins, cache behaviours, OAC for S3
  • RDS in Terraform — automated backups, snapshot retention, Multi-AZ for prod
  • Disaster recovery in Terraform — cross-region Read Replica, automated snapshot copy
  • Terraform lifecycle blocks — prevent_destroy, create_before_destroy, ignore_changes
  • Terraform drift detection — terraform plan in CI to catch manual changes

Project: Three-Tier App with DR Strategy in Terraform

  • Full three-tier stack — CloudFront → ALB → ECS → RDS across dev, staging, prod
  • Cross-region RDS Read Replica in Terraform, automated snapshot copy to second region
  • CloudFront + S3 for static assets, ECS for API, RDS for data — all managed in Terraform
  • prevent_destroy on RDS and S3, create_before_destroy on ECS task definitions

Class 8: OIDC + Keyless Authentication + Security Hardening

  • Why access keys in CI/CD are dangerous — rotation burden, leak risk, audit gaps
  • OIDC fundamentals — how GitHub proves its identity to AWS without a password
  • Setting up OIDC provider in AWS IAM — thumbprint, audience, provider URL
  • Keyless Terraform in CI — aws-actions/configure-aws-credentials with OIDC
  • Fine-grained OIDC conditions — locking roles to specific repos, branches, environments
  • Separate IAM roles per environment — dev pipeline cannot touch prod resources
  • Permission boundaries on CI roles — hard ceiling on what any pipeline can do
  • Checkov and tfsec — scanning Terraform for misconfigurations in CI
  • DevSecOps for pipelines — secret scanning, dependency auditing, SAST in GitHub Actions

Project: Fully Keyless CI/CD with OIDC + Security Scanning

  • Remove all AWS access keys from GitHub secrets — replace with OIDC role assumption
  • Separate OIDC roles for dev and prod — prod role requires environment approval gate
  • Checkov runs on every PR — blocks merge if critical Terraform misconfiguration found
  • tfsec integrated into pipeline, secret scanning enabled on the repo
  • Full end-to-end deploy — code push → OIDC auth → Terraform plan → ECS deploy — zero static credentials anywhere

Module: Python for DevOps


Class 1: Python Fundamentals for DevOps

  • Virtual environments, pip, dependency pinning with requirements.txt
  • Data structures: lists, dicts, tuples, sets with real DevOps use cases
  • Loops, conditionals, list/dict comprehensions
  • os, pathlib — filesystem ops, env vars, path handling
  • subprocess — running shell commands, capturing output, return codes
  • Error handling — try/except/finally, custom exceptions, graceful failures
  • File I/O — reading/writing JSON, YAML config files
  • argparse — building CLI tools with flags and subcommands
  • Writing reusable functions, modules, and packages

Project: Production-grade DevOps CLI — a multi-command tool that generates deployment reports, parses logs for errors and warnings, monitors disk usage with threshold alerts, and outputs JSON or a formatted table
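A multi-command tool like this maps naturally onto argparse subcommands. A minimal skeleton, assuming three hypothetical subcommands (report, logs, disk) and a shared --output flag; only disk is implemented, using shutil.disk_usage from the standard library.

```python
import argparse
import json
import shutil


def disk_check(path, threshold):
    """Return usage stats for `path` and whether it crossed `threshold` percent."""
    usage = shutil.disk_usage(path)
    percent = round(usage.used / usage.total * 100, 1)
    return {"path": path, "used_percent": percent, "alert": percent >= threshold}


def build_parser():
    parser = argparse.ArgumentParser(prog="devops-cli", description="Everyday DevOps helpers")
    parser.add_argument("--output", choices=["json", "table"], default="table")
    sub = parser.add_subparsers(dest="command", required=True)

    report = sub.add_parser("report", help="generate a deployment report")
    report.add_argument("--env", default="dev")

    logs = sub.add_parser("logs", help="scan a log file for errors/warnings")
    logs.add_argument("logfile")

    disk = sub.add_parser("disk", help="check disk usage against a threshold")
    disk.add_argument("--path", default="/")
    disk.add_argument("--threshold", type=float, default=80.0)
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    if args.command == "disk":
        result = disk_check(args.path, args.threshold)
    else:
        result = {"command": args.command, "status": "not implemented in this sketch"}
    if args.output == "json":
        print(json.dumps(result))
    else:
        for key, value in result.items():
            print(f"{key:<15}{value}")
    return 1 if result.get("alert") else 0  # non-zero exit code lets cron/CI react
```

Called as main(["--output", "json", "disk", "--threshold", "90"]), it prints one JSON line and returns 1 if the threshold is crossed. A real script would end with raise SystemExit(main()) under an if __name__ == "__main__" guard. Because --output belongs to the top-level parser, it must come before the subcommand.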


Class 2: Boto3 Deep Dive + API Automation

  • requests module — GET, POST, PUT, DELETE, auth, retries with backoff
  • boto3 architecture — sessions, clients vs resources, regions, profiles
  • Paginating through AWS APIs with get_paginator
  • Python logging best practices — levels, formatters, rotating file handlers
  • JSON parsing and validation for API responses
  • Environment-based config management — .env, os.environ, secrets handling

Project 1: REST API CRUD script — full create/read/update/delete against a real API, with auth, error handling, and a rotating log file

Project 2: AWS Cloud Usage Report — pulls EC2 state, type, and cost estimates, S3 bucket sizes, and IAM users with key age; flags unused or stale resources; outputs CSV plus a terminal table
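Retries with backoff show up in both projects. The sketch below is a generic, standard-library-only version of exponential backoff with "full jitter": each wait is a random value between 0 and base_delay * 2^attempt, capped at max_delay. The function name, the TransientError class, and the defaults are assumptions. In practice, requests can get similar behaviour from urllib3's Retry class mounted through an HTTPAdapter, and boto3 has its own retry settings in botocore's Config.

```python
import random
import time


class TransientError(Exception):
    """Raised by the wrapped call for failures worth retrying (e.g. HTTP 429/503)."""


def retry_with_backoff(func, max_attempts=5, base_delay=0.5, max_delay=10.0,
                       sleep=time.sleep, rng=random.random):
    """Call func(); on TransientError wait a jittered, exponentially growing delay and retry."""
    for attempt in range(max_attempts):
        try:
            return func()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(rng() * cap)  # "full jitter": uniform in [0, cap)
```

A hypothetical fetch function could raise TransientError when the API answers 429 or 503 and let other errors propagate, so retry_with_backoff(fetch) absorbs short outages without hammering the API. The jitter spreads retries from many clients apart instead of having them all retry in lockstep.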


Class 3: Lambda Fundamentals + Event-Driven Architecture

  • Lambda execution model — cold starts, warm starts, concurrency limits
  • Function anatomy — handler, event object, context object
  • IAM roles and least-privilege security for Lambda
  • Triggers — S3, SQS, SNS, EventBridge (cron and event-based)
  • Environment variables and secrets management in Lambda
  • Lambda deployment — zip packaging, console vs CLI vs Terraform
  • Dead letter queues for failed invocation handling
  • CloudWatch Logs integration and structured logging from Lambda

Project 1: IAM key rotation automation — EventBridge cron triggers Lambda, detects keys older than N days, rotates them, emails report via SES

Project 2: Daily cloud cost report emailer — Lambda pulls billing metrics + resource summary, formats HTML email, sends via SES on schedule
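The handler, event, and context anatomy is easiest to see with an S3 trigger. The sketch below reads the standard S3 event notification fields (Records[].s3.bucket.name and Records[].s3.object.key), URL-decodes the key (S3 encodes spaces as "+"), and logs one JSON line per object so CloudWatch Logs Insights can query it. The returned dict and the log field names are assumptions; a real function would go on to call boto3.

```python
import json
import logging
from urllib.parse import unquote_plus

logger = logging.getLogger()
logger.setLevel(logging.INFO)


def lambda_handler(event, context):
    """Entry point configured as <module>.lambda_handler; receives one S3 event batch."""
    request_id = getattr(context, "aws_request_id", "local")  # context is None in local tests
    objects = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])  # keys arrive URL-encoded
        size = record["s3"]["object"].get("size", 0)
        # Structured log line: one JSON object per processed file.
        logger.info(json.dumps({"request_id": request_id, "bucket": bucket, "key": key, "size": size}))
        objects.append({"bucket": bucket, "key": key})
    return {"processed": len(objects), "objects": objects}


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    sample_event = {
        "Records": [
            {"eventSource": "aws:s3", "eventName": "ObjectCreated:Put",
             "s3": {"bucket": {"name": "uploads-bucket"},
                    "object": {"key": "reports/daily+summary.csv", "size": 2048}}}
        ]
    }
    print(lambda_handler(sample_event, None))
```

Passing None as context and a hand-written event is the same idea as testing locally with mocked events.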


Class 4: Lambda Advanced — Layers, Terraform, Multi-Lambda Workflows

  • Lambda Layers — packaging dependencies, sharing code across functions
  • Reserved and provisioned concurrency — when and why
  • Terraform for Lambda — packaging, deploying, versioning, IAM, triggers all in code
  • Multi-Lambda orchestration patterns — chaining, fan-out, fan-in
  • API Gateway + Lambda integration — proxy vs non-proxy, request/response mapping
  • Lambda testing locally with python-lambda-local and mocked events

Project: Multi-stage image processing pipeline — API Gateway receives upload → Lambda 1 validates and stores to S3 → Lambda 2 resizes/transforms → Lambda 3 sends notification. Full Terraform deployment with Layers for shared dependencies (Pillow, boto3 utils)


Class 5: Multilevel Image Processing Pipeline — Production Deep Dive

  • Designing multi-Lambda architectures for real workloads
  • S3 event triggers chained across multiple Lambda functions
  • Error handling between Lambda stages — retries, DLQ, alerting
  • Lambda concurrency management for high-throughput pipelines
  • Monitoring pipeline health with CloudWatch metrics and alarms
  • Cost optimization — right-sizing memory, reducing cold starts

Project: Production image processing API — client uploads image → S3 trigger → validate format/size → transform (resize, watermark, convert format) → store to clean bucket → SES notification with processed image link. Full error handling, DLQ for failed images, CloudWatch dashboard


Class 6: ClamAV File Scanning Automation for S3 Security

  • subprocess for running ClamAV scans from Python
  • ClamAV setup, freshclam for virus DB updates, scan result parsing (return codes)
  • S3 event notification → SQS → Python consumer pattern
  • Downloading files from landing bucket, scanning locally, routing based on result
  • S3 object tagging — Clean/Infected with put_object_tagging
  • Multi-account AWS architecture — landing account vs clean account
  • SES email alerts for infected files with file metadata
  • Production error handling — what happens if ClamAV crashes, S3 download fails, SQS message is malformed

Project: Banking Compliance File Scanner — S3 upload triggers SQS message → Python consumer downloads file → ClamAV scans → tags object Clean/Infected → routes clean files to processing bucket → blocks infected files → SES alert to security team with filename, bucket, timestamp, and scan output. Full logging, retry logic, and dead letter queue for failed scans
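The scan step maps clamscan's exit code to an action: clamscan exits 0 when no virus is found, 1 when at least one is found, and 2 when an error occurs. The sketch below treats an error or a missing or hung scanner as "ScanError" rather than "Clean", so a crashed scanner never waves a file through. It also builds the TagSet payload that S3's put_object_tagging expects. The tag key "scan-status", the routing labels, and the injectable run parameter are assumptions made so the logic can be tested without ClamAV installed.

```python
import subprocess

CLAMSCAN_VERDICTS = {0: "Clean", 1: "Infected"}  # 2 (or anything else) = scan error


def scan_file(path, run=subprocess.run):
    """Run clamscan on one file and return (verdict, scanner_output)."""
    try:
        proc = run(["clamscan", "--no-summary", path], capture_output=True, text=True, timeout=300)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return "ScanError", str(exc)  # clamscan missing or hung
    verdict = CLAMSCAN_VERDICTS.get(proc.returncode, "ScanError")
    return verdict, (proc.stdout or "") + (proc.stderr or "")


def tagging_payload(verdict):
    """Shape accepted by s3.put_object_tagging(Bucket=..., Key=..., Tagging=...)."""
    return {"TagSet": [{"Key": "scan-status", "Value": verdict}]}


def next_step(verdict):
    """Hypothetical routing: clean files move on, infected are blocked, errors are retried."""
    return {"Clean": "copy-to-processing-bucket",
            "Infected": "block-and-alert"}.get(verdict, "retry-via-sqs")
```

Leaving a ScanError message undeleted in SQS lets it be redelivered, and after the configured number of receives it lands in the dead letter queue.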


Class 7: RDS Cost Analysis + Migration Planning Automation

  • RDS cost breakdown — instance type, storage, IOPS, multi-AZ, snapshots
  • boto3 for pulling RDS metrics — CPU, connections, storage utilization via CloudWatch
  • Identifying migration candidates — underutilized instances, oversized storage
  • pg_dump and pg_restore from Python subprocess — full and schema-only dumps
  • pgsync for live data migration with minimal downtime
  • Data validation post-migration — row counts, checksum comparison, schema diff
  • Migration rollback strategy — when to cut back, how to keep source alive
  • Python script for pre-migration health check and post-migration validation report

Project: RDS Cost Analysis + Migration Readiness Report — connects to AWS account, pulls all RDS instances with cost estimates, CloudWatch utilization metrics, flags over/under-provisioned instances, generates migration candidate report with recommended target instance types and estimated savings. Outputs HTML report + CSV
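Post-migration validation compares source and target table by table. The sketch below computes a row count plus an order-independent checksum: the sum, modulo 2^256, of each row's SHA-256 digest. The rows can therefore be fetched in any order, and duplicate rows still count, unlike a plain XOR. The rows are in-memory tuples for illustration. In the project they would come from queries against both databases, and both sides must be fetched with the same Python types, since a Decimal and a float repr differently. The checksum scheme is one possible choice, not a standard.

```python
import hashlib


def row_digest(row):
    """Stable SHA-256 of one row; repr() keeps None distinct from the string 'None'."""
    return int.from_bytes(hashlib.sha256(repr(tuple(row)).encode()).digest(), "big")


def table_fingerprint(rows):
    """(row_count, checksum) that does not depend on row order."""
    count, total = 0, 0
    for row in rows:
        count += 1
        total = (total + row_digest(row)) % (1 << 256)  # modular sum: duplicates still count
    return count, total


def compare_tables(source_rows, target_rows):
    src, tgt = table_fingerprint(source_rows), table_fingerprint(target_rows)
    return {
        "row_count_match": src[0] == tgt[0],
        "checksum_match": src[1] == tgt[1],
        "source_rows": src[0],
        "target_rows": tgt[0],
    }
```

Matching counts with a mismatched checksum points to changed values rather than missing rows, which narrows down where to look.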


Class 8: Live RDS Migration with ECS, Lambda, and Terraform — Full Implementation

  • Containerizing the migration script with Docker — Dockerfile, entrypoint, env vars
  • Deploying migration job on AWS ECS (Fargate task, not a long-running service)
  • Lambda trigger for the ECS task — one-click or scheduled migration kick-off
  • Terraform for the complete stack — ECS cluster, task definition, IAM roles, Lambda, EventBridge, VPC networking, security groups, RDS parameter groups
  • pgsync live migration — running inside ECS container with source and target RDS connections
  • CloudWatch log streaming from ECS task for real-time migration monitoring
  • SNS alerts for migration start, completion, failure
  • Post-migration validation Lambda — runs row count checks, sends final report via SES
  • Live demo — trigger migration from Lambda, watch ECS task run, validate data, confirm zero data loss

Project: End-to-end RDS Migration Platform — full Terraform-provisioned infrastructure, ECS Fargate runs the migration container with pgsync, Lambda triggers and monitors the job, CloudWatch dashboard shows live progress, SNS/SES sends status updates at each stage, post-migration validation script confirms data integrity. Production-ready, reusable for any PostgreSQL migration on AWS


Module: Kubernetes (Weeks 12–19, 16 Classes)


Class 1: Kubernetes Architecture + Core Concepts

  • The why behind Kubernetes — what broke before it existed
  • Control plane deep dive — API server, etcd, scheduler, controller manager
  • Worker node components — kubelet, kube-proxy, container runtime
  • Core objects — Pod, ReplicaSet, Deployment, Service
  • Setting up Minikube locally, kubectl basics and everyday commands
  • YAML manifests in depth — apiVersion, kind, metadata, spec
  • ConfigMaps and Secrets — creating, mounting as env vars and volumes
  • Namespaces and resource organisation
  • Labels, selectors, annotations — how Kubernetes finds things
  • Resource requests and limits — why they matter in production
  • Kubernetes DNS and service discovery internals
  • ImagePullSecrets for private registries
  • Lens (Freelens) — Kubernetes IDE for visual cluster management

Project: Deploy a 2-tier e-commerce app (frontend + PostgreSQL) on Minikube — wired together with Services, ConfigMaps, Secrets, and a private image registry


Class 2: Resilience Patterns, Autoscaling + Live Debugging

  • Liveness, readiness, and startup probes — real failure scenarios
  • Rolling upgrades and rollback strategies
  • HPA and VPA — pod autoscaling based on CPU/memory/custom metrics
  • Init containers and sidecar patterns
  • Pod Disruption Budgets for zero-downtime deployments
  • Deployment strategies — Recreate vs RollingUpdate vs Blue-Green
  • CrashLoopBackOff, OOMKilled — live debugging techniques
  • Resource quotas and LimitRanges per namespace
  • Reading Kubernetes events to diagnose failures fast
  • StatefulSets intro — ordered deployment, stable network identity
  • DaemonSets and Jobs — when to use each
  • Persistent Volumes, PVCs, StorageClass — concepts and local demo

Project: Add HPA, probes, and PodDisruptionBudget to the e-commerce app. Simulate CrashLoopBackOff and OOMKilled failures live and debug them. Add a PostgreSQL StatefulSet with persistent local storage
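The HPA's core calculation is documented by Kubernetes as desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), with no change when the ratio is within a tolerance (0.1 by default). The sketch below reproduces only that formula plus min/max clamping. It ignores stabilization windows, scaling policies, and not-yet-ready pods, all of which the real controller also considers.

```python
import math


def hpa_desired_replicas(current_replicas, current_value, target_value,
                         min_replicas=1, max_replicas=10, tolerance=0.1):
    """Replica count the HPA formula would request for a single metric."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: don't scale
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # e.g. 4 pods averaging 90% CPU utilisation against a 60% target
    print(hpa_desired_replicas(4, 90, 60))
```

For the example, 4 * 90 / 60 = 6, so the HPA would ask for 6 pods. At 63% against a 60% target the ratio of 1.05 falls inside the tolerance and nothing changes.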


Class 3: GitOps with ArgoCD + CI/CD Pipeline on Minikube

  • GitOps fundamentals — why GitOps over push-based deployments
  • ArgoCD setup on Minikube — apps, sync policies, health checks
  • End-to-end CI/CD pipeline — GitHub Actions builds image, ArgoCD deploys
  • ArgoCD app-of-apps pattern intro
  • Branching strategy for GitOps — app repo vs config repo separation
  • Rollback with ArgoCD — one-click vs automated
  • Basic Prometheus + Grafana on Minikube — request rate, pod health dashboards
  • Kubernetes events and alerts with AlertManager basics
  • Debugging failed ArgoCD syncs — common causes and fixes
  • Multi-environment GitOps intro — dev vs prod namespaces on same cluster

Project: GitHub Actions pipeline builds and pushes e-commerce image on every commit, ArgoCD auto-deploys to Minikube, basic Grafana dashboard showing pod health and request rate, rollback demonstrated live


Class 4: Production EKS Setup + Networking + Security Foundations

  • EKS cluster setup via eksctl and the AWS console
  • EKS add-ons — VPC CNI, CoreDNS, EBS CSI Driver, kube-proxy
  • Helm — writing, packaging, deploying charts, values management
  • IRSA — Kubernetes to AWS IAM with OIDC, no hardcoded credentials
  • AWS Load Balancer Controller with Helm — architecture and annotations
  • Ingress for internal and external traffic routing
  • ExternalDNS for automatic Route53 record management
  • Custom domain and SSL/TLS termination with ACM
  • EKS managed node groups vs self-managed nodes — when to use which
  • Kubernetes RBAC — ServiceAccounts, ClusterRoles, RoleBindings, least privilege
  • aws-auth ConfigMap and RBAC for cluster access control
  • EKS managed add-on vs self-managed — upgrade strategies

Project: EKS cluster up with eksctl, AWS Load Balancer Controller and ExternalDNS deployed via Helm, RBAC hardened, custom domain with SSL termination working


Class 5: Running 3-Tier App on EKS + AWS Integrations

  • Running 3-tier app — frontend + backend + RDS PostgreSQL on EKS
  • Database migrations using Kubernetes Jobs
  • Init containers for DB connection readiness checks
  • IRSA in practice — backend pod accessing Secrets Manager without credentials
  • AWS Secrets Manager integration — External Secrets Operator pattern
  • Ingress rules for routing traffic to frontend vs backend
  • Health checks at load balancer level vs pod level
  • Blue-Green deployment on EKS with weighted routing
  • Namespace strategy for multi-tier apps
  • Real troubleshooting — ImagePullBackOff, pending pods, service not reachable

Project: Full 3-tier e-commerce app on EKS — frontend + Node.js backend + RDS PostgreSQL, IRSA for Secrets Manager, DB migration Job, custom domain, SSL, live troubleshooting of staged failures


Class 6: StatefulSets, Persistent Storage + Image Optimisation

  • StatefulSets deep dive — production patterns and failure recovery
  • PersistentVolume, PVC, StorageClass — static vs dynamic provisioning on EKS
  • EBS vs EFS — choosing the right storage for the workload
  • Headless Services for StatefulSet DNS resolution
  • Troubleshooting multi-attach volume errors and common StatefulSet failures
  • Volume snapshots and backup strategies on EKS
  • Multi-stage Docker builds — drastically smaller production images
  • Distroless and minimal base images for attack surface reduction
  • Trivy — container image vulnerability scanning in CI pipeline
  • Docker image optimisation — layer caching, build context, .dockerignore

Project: Add MinIO as a StatefulSet with persistent EBS storage to the e-commerce app for product image uploads. Rebuild all images with multi-stage builds, integrate Trivy in GitHub Actions, reduce image sizes by 60%+


Class 7: Production EKS with Terraform

  • Production EKS cluster with Terraform — VPC, subnets, node groups, add-ons
  • Terraform module structure for EKS — separation of concerns
  • Managing dev/staging/prod with Terraform workspaces
  • Deploying AWS Load Balancer Controller and ExternalDNS via Terraform
  • IRSA setup via Terraform — no manual console steps
  • Terraform state management for EKS — remote backend, locking
  • Importing existing EKS resources into Terraform state
  • Terraform drift detection on EKS infrastructure
  • Node group configuration — instance types, spot vs on-demand, taints and tolerations
  • EKS upgrade strategy with Terraform — node group rotation

Project: Rebuild the entire EKS cluster from scratch with Terraform — VPC, node groups, add-ons, IRSA, Load Balancer Controller, ExternalDNS all provisioned via code. Zero manual console steps


Class 8: Microservices on EKS + Advanced ArgoCD GitOps

  • Microservices design principles — bounded context, single responsibility
  • Splitting monolith into microservices — frontend, order, inventory, user service
  • Inter-service communication — ClusterIP vs headless vs service mesh
  • Network Policies for microservice traffic isolation between namespaces
  • Gateway API — advanced ingress routing vs traditional Ingress
  • ArgoCD App-of-Apps pattern — managing many services cleanly
  • ArgoCD ApplicationSet for environment promotion across dev/staging/prod
  • Helm chart per microservice — templating, values per environment
  • Matrix builds in GitHub Actions for multiple microservices
  • Reusable GitHub Actions workflows with composite actions
  • Kubecost/OpenCost — namespace-level cost attribution per service

Project: E-commerce app split into 4 microservices each with own Helm chart and ArgoCD Application, App-of-Apps managing all services, Gateway API routing, matrix CI/CD builds, OpenCost showing per-service spend


Class 9: Metrics, Logs + Dashboards

  • How observability works in real production companies
  • Prometheus — metrics collection, PromQL, scrape configs
  • Prometheus Operator and ServiceMonitor CRDs
  • Loki for log storage and querying — LogQL basics
  • Fluent Bit on EKS — log aggregation, filtering, routing to Loki
  • Grafana dashboards — Kubernetes cluster, app metrics, AWS resource metrics
  • AlertManager — routing alerts to Slack and PagerDuty with grouping and silencing
  • CloudWatch Container Insights integration alongside Prometheus
  • Monitoring differences — Fargate vs managed node groups
  • Cost visibility dashboard — RDS, Lambda, EKS node costs in Grafana

Project: Prometheus + Loki + Grafana + Fluent Bit deployed on EKS, a Grafana dashboard showing order volume, error rates, and DB query latency, AlertManager firing a Slack alert when the order service error rate crosses 1%, and CloudWatch Container Insights running alongside


Class 10: Distributed Tracing, SLOs + Advanced Alerting

  • OpenTelemetry for distributed tracing — instrumentation, collectors, exporters
  • Tracing a request across frontend → order service → inventory service → DB
  • Jaeger or Tempo as tracing backend — setup and querying
  • SLO and SLI definitions — what they mean in practice
  • Error budget dashboards in Grafana — how teams use them day to day
  • Multi-window, multi-burn-rate alerting for SLOs
  • AlertManager advanced — inhibition rules, routing trees, deduplication
  • Runbook links in alerts — connecting alert to action
  • Log-based alerting in Grafana with Loki rules
  • Observability for stateful services — what’s different about monitoring databases

Project: OpenTelemetry tracing across all e-commerce microservices, Tempo as backend, Grafana trace explorer showing end-to-end request flow, SLO dashboards with error budget burn rate, multi-window AlertManager rules for order service
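The arithmetic behind error budgets and burn-rate alerts is small enough to check by hand. For an availability SLO target S over a window, the error budget is (1 - S) of that window. Burn rate is the observed error ratio divided by (1 - S), so a burn rate of 1 spends the budget exactly over the window. The 14.4 threshold below is the commonly cited paging value from the Google SRE Workbook: it corresponds to burning 2% of a 30-day budget in one hour, checked over a 1-hour and a 5-minute window. Treat it and the sample numbers as assumptions to tune.

```python
def error_budget_minutes(slo, window_days=30):
    """Minutes of full unavailability the SLO allows over the window."""
    return (1 - slo) * window_days * 24 * 60


def burn_rate(error_ratio, slo):
    """How many times faster than 'exactly on budget' errors are being spent."""
    return error_ratio / (1 - slo)


def should_page(long_window_ratio, short_window_ratio, slo, threshold=14.4):
    """Multi-window rule: both the long (e.g. 1h) and short (e.g. 5m) windows must burn fast.

    The short window stops the alert from firing long after a spike has already ended.
    """
    return (burn_rate(long_window_ratio, slo) > threshold
            and burn_rate(short_window_ratio, slo) > threshold)


if __name__ == "__main__":
    slo = 0.999  # hypothetical order-service availability target
    print(f"budget: {error_budget_minutes(slo):.1f} min / 30 days")
    print("page?", should_page(long_window_ratio=0.02, short_window_ratio=0.03, slo=slo))
```

A 99.9% target leaves about 43.2 minutes per 30 days. A 2% error ratio is a burn rate of 20, fast enough to page only if the 5-minute window confirms the problem is still happening.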


Class 11: Service Mesh + Network Policies + Zero Trust

  • Service mesh fundamentals — why it exists, what problems it actually solves
  • Istio installation and architecture — control plane, data plane, sidecars
  • mTLS between all microservices — automatic, no code changes
  • Traffic management — VirtualService, DestinationRule, Gateway
  • Canary deployments with Istio traffic splitting — 10% to new version
  • Visualising service mesh traffic with Kiali
  • Network Policies for zero-trust pod-to-pod communication
  • Egress controls and namespace isolation
  • Pod topology spread constraints for multi-AZ resilience
  • Istio observability — built-in metrics, tracing integration with Jaeger

Project: Istio deployed on e-commerce EKS cluster, mTLS enforced between all microservices, canary deployment for order service routing 10% traffic to v2, Kiali showing live traffic topology, network policies blocking all non-essential pod communication


Class 12: Karpenter, EKS Auto Mode + Cost Optimisation

  • Karpenter architecture — how it differs from Cluster Autoscaler
  • NodePool and EC2NodeClass configuration
  • Cost optimisation with Spot + On-Demand mixed node fleets
  • Karpenter bin packing and consolidation policies — removing underutilised nodes
  • Taints, tolerations, and node selectors with Karpenter
  • EKS Auto Mode — what it is, when to use it over Karpenter
  • Pod topology spread constraints across AZs with Karpenter
  • Kyverno policy enforcement — blocking deployments without resource limits
  • OPA Gatekeeper vs Kyverno — when to use which
  • Pod security admission — restricted, baseline, privileged modes
  • Security contexts and pod security standards in practice

Project: Karpenter deployed on EKS replacing managed node group for inventory service, Spot instances with on-demand fallback, Kyverno policies enforced blocking any deployment without resource limits and liveness probes, pod security standards applied cluster-wide
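The Kyverno policy in this project rejects Deployments whose containers lack resource limits or liveness probes. The Python sketch below mirrors only that decision logic so it can be read and tested outside a cluster; it is not Kyverno policy syntax, and the `violations` helper is hypothetical. It assumes "resource limits" means both CPU and memory limits on every container in the pod template.

```python
"""Sketch of the 'limits + liveness probe required' admission check (illustrative only)."""

from typing import Any


def violations(deployment: dict[str, Any]) -> list[str]:
    """Return the reasons a Deployment manifest should be rejected.

    An empty list means the manifest passes. Assumption: both CPU and memory
    limits are required on every container, plus a livenessProbe.
    """
    problems: list[str] = []
    pod_spec = deployment.get("spec", {}).get("template", {}).get("spec", {})
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        limits = container.get("resources", {}).get("limits", {})
        for resource in ("cpu", "memory"):
            if resource not in limits:
                problems.append(f"container {name}: missing resources.limits.{resource}")
        if "livenessProbe" not in container:
            problems.append(f"container {name}: missing livenessProbe")
    return problems


if __name__ == "__main__":
    # Hypothetical inventory-service Deployment with a memory limit only
    manifest = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "inventory"},
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {
                            "name": "inventory",
                            "image": "inventory:1.0",
                            "resources": {"limits": {"memory": "256Mi"}},
                        }
                    ]
                }
            }
        },
    }
    for reason in violations(manifest):
        print(reason)
```

Returning every reason at once, rather than stopping at the first, makes the rejection message more useful to whoever is deploying.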


Class 13: DevSecOps Pipeline + Runtime Security

  • DevSecOps on Kubernetes — shifting security left in the pipeline
  • Trivy in CI — blocking deployments with critical CVEs before they reach EKS
  • Checkov for Kubernetes manifest scanning — misconfiguration detection
  • Falco for runtime threat detection on EKS — rules, alerts, responses
  • SAST, DAST, SCA scanning in GitHub Actions for Kubernetes workloads
  • Secret scanning in Git — pre-commit hooks, GitHub secret scanning
  • OIDC with AWS and GitHub for keyless authentication in CI/CD
  • Image signing with Cosign — verifying images before deployment
  • Kyverno policy — block unsigned images from running on cluster
  • Security audit of the full e-commerce stack — what to look for, how to fix it

Project: Complete DevSecOps pipeline — Trivy image scan blocks critical CVEs, Checkov lints manifests before apply, Falco fires alerts on suspicious runtime activity, Cosign signs all images, Kyverno rejects unsigned images. All enforced in GitHub Actions before anything reaches EKS


Class 14: Advanced Live Troubleshooting + Kubernetes Interview Scenarios

  • Advanced kubectl — exec, debug, port-forward, ephemeral containers
  • Node-level debugging — SSH into node, containerd, crictl commands
  • Common production failure patterns — node pressure, evictions, DNS failures, RBAC misconfiguration, webhook timeouts
  • Debugging networking — pod-to-pod, pod-to-service, ingress, CNI issues
  • etcd health checks and control plane debugging on EKS
  • Interpreting kubectl describe, events, and pod logs together
  • Live debugging 10 real Kubernetes interview scenarios — staged failures on the e-commerce cluster
  • Kubernetes system design questions — walk through 5 real scenarios
  • How to think through and answer K8s design questions in interviews

Project: 10 staged production failures on the live e-commerce cluster — node pressure eviction, DNS resolution failure, RBAC misconfiguration, webhook timeout, CNI misconfiguration, OOMKill cascade, ImagePullBackOff at scale, HPA not scaling, PVC stuck in pending, ArgoCD sync loop. Debug and fix each live


Class 15: SRE Principles + War Room Simulations

  • SRE principles — SLO, SLI, SLA, error budgets in plain terms
  • On-call culture, runbook writing, and escalation paths
  • DORA metrics — deployment frequency, lead time, MTTR, change failure rate
  • Postmortem culture — blameless analysis, root cause identification
  • Chaos engineering basics with LitmusChaos — pod failure, network delay, CPU stress injection
  • War room simulation 1 — OOMKill cascade on order service, live diagnosis with Grafana + Loki, live fix, written RCA
  • War room simulation 2 — DB connection pool exhaustion under load, live fix, written postmortem
  • War room simulation 3 — Karpenter fails to provision nodes during traffic spike, cluster degraded, live recovery
  • Writing proper RCAs — structure, timeline, contributing factors, action items
  • Sprint planning and retrospectives for DevOps teams

Project: Three live war room simulations on the e-commerce cluster, all diagnosed using the full observability stack built in Week 16. Written RCA for each. LitmusChaos experiment injecting pod failure and network delay with documented blast radius and recovery steps
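The four DORA metrics listed above can be computed from plain deployment records. The Python sketch below assumes a hypothetical `Deployment` record holding commit time, deploy time, a failure flag, and restore time; real teams usually derive these from CI/CD and incident tooling, and the exact definitions (for example, what counts as a failed change) vary between organisations.

```python
"""Sketch: the four DORA metrics computed from deployment records (illustrative only)."""

from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    # Hypothetical record shape; field names are assumptions for this sketch.
    commit_time: datetime
    deploy_time: datetime
    failed: bool = False
    restored_time: Optional[datetime] = None  # when service was restored after a failed deploy


def deployment_frequency(deploys: list[Deployment], period_days: int) -> float:
    """Average number of production deployments per day."""
    return len(deploys) / period_days


def lead_time_for_changes(deploys: list[Deployment]) -> timedelta:
    """Median time from commit to running in production."""
    return median(d.deploy_time - d.commit_time for d in deploys)


def change_failure_rate(deploys: list[Deployment]) -> float:
    """Share of deployments that caused a failure in production."""
    return sum(d.failed for d in deploys) / len(deploys)


def mean_time_to_restore(deploys: list[Deployment]) -> Optional[timedelta]:
    """Mean time from a failed deployment to restored service (None if no failures)."""
    restores = [d.restored_time - d.deploy_time
                for d in deploys if d.failed and d.restored_time is not None]
    if not restores:
        return None
    return sum(restores, timedelta()) / len(restores)


if __name__ == "__main__":
    start = datetime(2026, 4, 1, 9, 0)
    week = [
        Deployment(start, start + timedelta(hours=2)),
        Deployment(start + timedelta(days=1), start + timedelta(days=1, hours=4),
                   failed=True, restored_time=start + timedelta(days=1, hours=4, minutes=30)),
        Deployment(start + timedelta(days=2), start + timedelta(days=2, hours=3)),
        Deployment(start + timedelta(days=3), start + timedelta(days=3, hours=1)),
    ]
    print(f"deploys/day: {deployment_frequency(week, 7):.2f}")
    print(f"lead time:   {lead_time_for_changes(week)}")
    print(f"CFR:         {change_failure_rate(week):.0%}")
    print(f"MTTR:        {mean_time_to_restore(week)}")
```

Using the median for lead time keeps one slow, long-lived branch from skewing the number; teams that prefer the mean can swap it in.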

Class 16: DevSecOps Advanced

  • Create a comprehensive security scanning pipeline
  • Create pre-merge PR checks (production-style pipelines)
  • Build an enterprise-grade CI/CD pipeline
  • Shift-left security principles: SAST, DAST, and SCA integration in CI/CD
  • Compliance workflows alongside security scanning

Module: AIOps

AI Tools for DevOps Engineers

  • Effective use of ChatGPT, GitHub Copilot, and Claude for DevOps tasks and prompt engineering
  • AI-assisted code generation, debugging, documentation, and infrastructure design
  • Project: Create AI-generated documentation and automation scripts

Intelligent Monitoring and Infrastructure Optimization

  • Machine learning for anomaly detection and intelligent alerting systems
  • AI for resource usage prediction, automated scaling decisions, and cost optimization
  • Project: Implement an ML-based monitoring and optimization system

Advanced AIOps Integration and Self-Healing Infrastructure

  • Self-healing infrastructure with AI decision making and intelligent deployment strategies
  • AI-powered performance analysis and advanced ChatOps with incident management
  • Handling live incidents with production-grade troubleshooting
  • Troubleshooting scenarios for interviews

Project

  • Build a comprehensive AIOps platform demonstration
  • Handling live incidents and documenting RCA

Resume, Portfolio + Final Interview Prep

  • Day-to-day work of a DevOps engineer and an SRE
  • Project requirements gathering and task allocation
  • How to present the bootcamp work on a resume — what to include, what to cut
  • Walking through the different projects in an interview — structure and talking points
  • Kubernetes scenario-based interview questions
  • System design interview
  • How senior engineers think about Kubernetes — trade-off questions, cost vs reliability vs complexity
  • Common mistakes candidates make in Kubernetes interviews and how to avoid them
  • GitHub portfolio cleanup — README structure, architecture diagrams, decision logs
  • Resume review framework — action verbs, metrics, project impact statements
  • What hiring managers actually look for in a DevOps/SRE candidate in 2026
  • Next steps after this bootcamp — what to learn, what to build, how to keep growing

Course Completion and Certification

Upon completion of all classes and associated projects, students will receive:

  • Advanced DevOps Practitioner Certificate with AIOps specialization
  • Portfolio of 15+ real-world projects, including microservices on k8s and DevSecOps
  • GitHub repository showcasing all implemented solutions
  • Reference architecture diagrams and best practices documentation
  • Interview preparation and job placement assistance

This curriculum follows a logical, incremental learning path from Linux fundamentals to advanced Kubernetes projects, with each concept building on the previous ones.

Reach out for Queries

  • Email: livingdevops@gmail.com
  • WhatsApp: +91 9259681620

Reach out for queries and part-payment requests

Fee: 50,000 INR

Testimonials

Akhilesh has provided structured DevOps course details right from the beginning. I could see the detail-oriented approach and his sincerity throughout those sessions. He was able to show what to expect and how to troubleshoot. The additional resources were also very helpful.
Your structure of topics and teaching method are really great. This helped us understand real-world infrastructure and day-to-day DevOps activities well. Thank you, Akhilesh, for sharing your knowledge and experience.
One of the best DevOps project courses. Thanks, Akhilesh. I loved the real-time troubleshooting part; I have never seen anyone do this.
I gained valuable hands-on experience and built confidence working with various DevOps tools, real-world projects, and practical implementations. He has been amazing, always supportive, and continues to guide me even now. His guidance and deep technical knowledge have made a huge difference in my learning journey. I couldn’t have asked for a better mentor.
The best knowledge has been shared and taught by Akhilesh sir, which will definitely help crack interviews for DevOps profiles.
Akhilesh's DevOps Boot Camp delivered a genuine real-world experience that other platforms lack. It strengthened my practical skills and made me job-ready for real DevOps environments. This bootcamp really helped me understand real-world production environments, especially the live troubleshooting part. I was able to crack the interview and move to DevOps from a cloud support role.
I really liked the way you scheduled the calls and presented things. I also learned some new topics. I have to credit you for debugging live, in real time, whenever things broke during the sessions; it was much needed. Totally appreciate your work, Akhilesh. Thank you so much.
Akhilesh’s bootcamp was an excellent learning experience. Unlike others that only cover basic app deployments, he focused on real-world scenarios and practical implementations, which gave me a deeper understanding of how real projects are handled.