Stop Over-Complicating Terraform: This is How You Learn IaC for DevOps

If you’re manually creating cloud resources in 2025, you’re doing it wrong.

I see too many DevOps engineers clicking around cloud consoles, creating resources manually, and wondering why their infrastructure is a mess.

Most DevOps folks come from an operations background, and code of any kind (not that I consider Terraform a full programming language) sounds overwhelming to them at first.

They know they should be using Infrastructure as Code, but they’re intimidated by Terraform’s syntax and concepts.


If you want to learn DevOps by building an advanced, real-world project, consider joining my upcoming Advanced DevOps Bootcamp.

20-Week Real-World Project-based AWS DevOps Bootcamp


You’re not ready for production deployments if you don’t understand Terraform.

Why Terraform Matters in DevOps

Every production infrastructure should be version-controlled. Every deployment should be repeatable. Every change should be reviewable. Terraform makes all of this possible.

Back in the day, I was managing AWS infrastructure through the console. Everything worked fine until I needed to recreate the entire setup in a new region.

I spent three days clicking through consoles, missing configurations, and debugging why things didn’t work the same way. One environment was subtly different from another, and I couldn’t even tell you how.

Don’t be that person.

The Terraform Skills That Actually Matter

Forget about memorizing every resource type and argument. Focus on the core concepts and workflows that you’ll use daily in production environments. Here are the 10 essential areas every DevOps engineer must master:

1. Understanding Infrastructure as Code: The Mindset Shift

Before you write a single line of Terraform, you need to understand what problem it solves. Infrastructure as Code means treating your infrastructure like software code.

Why This Matters

Traditional approach: Click through the console, create resources, and hope you remember what you did.

IaC approach: Write declarative code, version-control it, review changes, and apply consistently.

# This simple block creates an EC2 instance
resource "aws_instance" "web_server" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t2.micro"

  tags = {
    Name = "WebServer"
  }
}

This code is:

  • Version controlled (you can see every change)
  • Reviewable (team can review before applying)
  • Repeatable (creates identical infrastructure every time)
  • Documented (the code IS the documentation)

2. Terraform Workflow: Plan, Apply, Destroy

Terraform has three core commands that you’ll use constantly. Understanding this workflow prevents costly mistakes.

The Holy Trinity

# 1. Initialize - download providers and modules
terraform init

# 2. Plan - preview what will change
terraform plan

# 3. Apply - make the changes
terraform apply

Never skip the plan step. I’ve seen engineers go straight to apply and accidentally delete production databases.

Real-World Workflow

# Start every project with init
terraform init

# Always plan before applying
terraform plan -out=tfplan

# Review the plan carefully
# Look for:
# - Resources being destroyed (red minus signs)
# - Unexpected changes
# - Resource replacements

# Only then apply
terraform apply tfplan

# When you need to clean up
terraform destroy

Production story: I was updating a database instance configuration. Ran plan and noticed Terraform wanted to destroy and recreate the database (force replacement). That would have deleted all data. Instead, I modified the code to avoid replacement, ran plan again, and applied safely.
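
A related safety net worth knowing (a minimal sketch, not what I did in that story): the lifecycle block with prevent_destroy makes any plan that would destroy the resource fail outright, which is exactly the guard you want on a production database.

resource "aws_db_instance" "postgres" {
  # ... rest of the configuration ...

  lifecycle {
    prevent_destroy = true # plan/apply errors out if this resource would be destroyed
  }
}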

Understanding Plan Output

Terraform will perform the following actions:

  # aws_instance.web will be created
  + resource "aws_instance" "web" {
      + ami           = "ami-12345"
      + instance_type = "t2.micro"
    }

  # aws_instance.db will be destroyed
  - resource "aws_instance" "db" {
      - ami           = "ami-67890"
      - instance_type = "t2.small"
    }

Plan: 1 to add, 0 to change, 1 to destroy.

Learn to read this output:

  • + means the resource will be created
  • - means it will be destroyed
  • ~ means it will be modified/updated in place
  • -/+ means destroy and recreate (DANGEROUS)

3. Provider Configuration: Connecting to Cloud Platforms

Providers are Terraform’s way of talking to cloud platforms. Every Terraform project starts with provider configuration.

AWS Provider Basics

# Required provider configuration
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

# Configure the AWS provider
provider "aws" {
  region = "us-east-1"
}

Version pinning is critical. The ~> 5.0 means “any version 5.x.x but not 6.0.0”. This prevents breaking changes from auto-updating.
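
You can pin the Terraform CLI version the same way with required_version (a minimal sketch; the version bounds here are just examples):

terraform {
  required_version = ">= 1.5.0, < 2.0.0" # fail fast if someone runs an unsupported CLI version

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}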

Multi-Region Setup

# Default provider
provider "aws" {
  region = "us-east-1"
}

# Secondary region
provider "aws" {
  alias  = "west"
  region = "us-west-2"
}

You can then choose which provider each resource uses:

# Use the aliased provider
resource "aws_instance" "west_server" {
  provider      = aws.west
  ami           = "ami-12345"
  instance_type = "t2.micro"
}

Real scenario: Building a disaster recovery setup. Primary infrastructure in us-east-1, backup in us-west-2. Used provider aliases to manage both regions from one codebase.

Other Common Providers

Terraform has providers for almost anything that exposes an API:

# Azure
provider "azurerm" {
  features {}
}

# Google Cloud
provider "google" {
  project = "my-project"
  region  = "us-central1"
}

# Kubernetes
provider "kubernetes" {
  config_path = "~/.kube/config"
}

4. Resources: The Building Blocks

Resources are actual infrastructure components. This is where you define what gets created.

Basic Resource Structure

resource "resource_type" "resource_name" {
argument1 = "value1"
argument2 = "value2"

# Nested blocks
nested_block {
nested_arg = "value"
}
}

Let me show you how to create some of the most common resources.

AWS EC2 Instance, Security Group, and RDS Instance

resource "aws_instance" "web_server" {
ami = "ami-0c55b159cbfafe1f0"
instance_type = "t2.micro"
key_name = "my-keypair"

vpc_security_group_ids = [aws_security_group.web_sg.id]
subnet_id = aws_subnet.public.id

user_data = <<-EOF
#!/bin/bash
yum update -y
yum install -y httpd
systemctl start httpd
systemctl enable httpd
echo "Hello from Terraform" > /var/www/html/index.html
EOF

tags = {
Name = "WebServer"
Environment = "Production"
ManagedBy = "Terraform"
Repo = "one-odd-gh-repo"
}
}

Pro tip: Always tag resources with ManagedBy = "Terraform" so you know what's managed by code versus created manually. Also add Repo = "one-odd-gh-repo" so you know which resources came from which codebase (you will be using multiple Terraform repos in real life). A sketch of applying such tags automatically follows.
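
If you want those tags on every resource without repeating them, the AWS provider supports a default_tags block (a minimal sketch; the tag values are just examples):

provider "aws" {
  region = "us-east-1"

  # Merged into the tags of every taggable resource this provider creates
  default_tags {
    tags = {
      ManagedBy = "Terraform"
      Repo      = "one-odd-gh-repo"
    }
  }
}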

RDS Instance

resource "aws_db_instance" "postgres" {
identifier = "myapp-db"
engine = "postgres"
engine_version = "14.7"
instance_class = "db.t3.micro"
allocated_storage = 20

db_name = "myappdb"
username = "admin"
password = var.db_password # Never hardcode passwords!

vpc_security_group_ids = [aws_security_group.db_sg.id]
db_subnet_group_name = aws_db_subnet_group.main.name

backup_retention_period = 7
skip_final_snapshot = false
final_snapshot_identifier = "myapp-db-final-snapshot"

tags = {
Name = "MyApp-Database"
}
}

Security Group Configuration

resource "aws_security_group" "web_sg" {
name = "web-server-sg"
description = "Security group for web server"
vpc_id = aws_vpc.main.id

# Allow HTTP
ingress {
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}

# Allow HTTPS
ingress {
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}

tags = {
Name = "web-server-sg"
}
}

Security tip: Never use 0.0.0.0/0 for SSH ingress in production. Always restrict SSH to known IP ranges or other security groups, as in the sketch below.
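
For example, you could allow SSH only from a bastion host's security group (a minimal sketch; aws_security_group.bastion is assumed to exist elsewhere in your code):

resource "aws_security_group" "web_ssh_sg" {
  name   = "web-server-ssh-sg"
  vpc_id = aws_vpc.main.id

  # SSH allowed only from instances in the bastion security group
  ingress {
    from_port       = 22
    to_port         = 22
    protocol        = "tcp"
    security_groups = [aws_security_group.bastion.id]
  }
}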

5. Variables: Making Code Reusable

Variables make your Terraform code flexible and reusable across environments. This is how you avoid hardcoding values.

Variable Declaration

The most common way to declare variables (a convention, not a requirement) is to keep them in a file called variables.tf.

variable "instance_type" {
description = "EC2 instance type"
type = string
default = "t2.micro"
}

variable "instance_count" {
description = "Number of instances to create"
type = number
default = 1
}
variable "enable_monitoring" {
description = "Enable detailed monitoring"
type = bool
default = false
}

# List type variable
variable "availability_zones" {
description = "List of availability zones"
type = list(string)
default = ["us-east-1a", "us-east-1b"]
}

# Map type
variable "tags" {
description = "Common tags for resources"
type = map(string)
default = {
ManagedBy = "Terraform"
}
}

Using Variables

# main.tf
resource "aws_instance" "web" {
  ami           = "ami-12345"
  instance_type = var.instance_type

  tags = merge(
    var.tags,
    {
      Name        = "web-${var.environment}"
      Environment = var.environment
    }
  )
}

Providing Variable Values

Method 1: Command line

terraform apply -var="environment=production" -var="instance_type=t2.medium"

Method 2: terraform.tfvars file

# terraform.tfvars
environment = "production"
instance_type = "t2.medium"
instance_count = 3

Method 3: Environment-specific files

# prod.tfvars
environment    = "production"
instance_type  = "t2.large"
instance_count = 5

# dev.tfvars
environment    = "development"
instance_type  = "t2.micro"
instance_count = 1

Then apply with a specific file:

terraform apply -var-file=prod.tfvars

I maintain separate tfvars files for each environment (dev.tfvars, staging.tfvars, prod.tfvars) with the same variable structure but different values. Single codebase, multiple environments.

Sensitive Variables

variable "db_password" {
description = "Database password"
type = string
sensitive = true
}
# Terraform will hide this value in output and logs

Never commit sensitive values to version control. Use environment variables or secret management tools.
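
One common pattern: Terraform automatically reads any environment variable named TF_VAR_<variable_name>. A minimal sketch, assuming the secret lives in AWS Secrets Manager under a hypothetical name myapp/db-password:

# Populate var.db_password without ever writing it to disk
export TF_VAR_db_password="$(aws secretsmanager get-secret-value \
  --secret-id myapp/db-password \
  --query SecretString \
  --output text)"

terraform apply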

6. Outputs: Extracting Information

Outputs let you extract information from Terraform to use elsewhere. Essential for connecting different parts of your infrastructure.

Basic Output Syntax

# outputs.tf
output "instance_public_ip" {
  description = "Public IP of the web server"
  value       = aws_instance.web.public_ip
}

output "database_endpoint" {
  description = "Database connection endpoint"
  value       = aws_db_instance.postgres.endpoint
  sensitive   = true
}

output "security_group_id" {
  description = "ID of the web security group"
  value       = aws_security_group.web_sg.id
}

Viewing Outputs

# Apply and see outputs automatically
terraform apply

# Query specific output
terraform output instance_public_ip

# Get outputs in JSON format
terraform output -json

# Use in scripts
PUBLIC_IP=$(terraform output -raw instance_public_ip)
echo "Connect to server: ssh ubuntu@$PUBLIC_IP"

Complex Output Examples

# List of instance IPs
output "instance_ips" {
  value = aws_instance.web[*].public_ip
}

# Map of instance names to IPs
output "instance_map" {
  value = {
    for instance in aws_instance.web :
    instance.tags.Name => instance.public_ip
  }
}

# Formatted connection string
output "db_connection_string" {
  value     = "postgresql://${aws_db_instance.postgres.username}@${aws_db_instance.postgres.endpoint}/${aws_db_instance.postgres.db_name}"
  sensitive = true
}

I once created VPC infrastructure with Terraform and exposed the subnet IDs as outputs. Another Terraform project deploying applications then consumed those outputs as input values.

This is how you build modular infrastructure.
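
The standard way to wire this up is the terraform_remote_state data source, which reads another project's outputs straight from its state backend (a minimal sketch; the bucket, key, and output names are examples):

# In the application project: read outputs from the VPC project's state
data "terraform_remote_state" "vpc" {
  backend = "s3"

  config = {
    bucket = "my-terraform-state"
    key    = "network/terraform.tfstate"
    region = "us-east-1"
  }
}

# Use an output defined in the VPC project (assumes it defines "public_subnet_ids")
resource "aws_instance" "app" {
  ami           = "ami-12345"
  instance_type = "t2.micro"
  subnet_id     = data.terraform_remote_state.vpc.outputs.public_subnet_ids[0]
}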

7. State Management: The Most Critical Concept

State is how Terraform tracks what it has created. Understanding state prevents infrastructure disasters.

What is State?

Terraform stores a mapping between your code and real infrastructure in a state file (terraform.tfstate). This file contains:

  • Resource IDs and attributes
  • Dependencies between resources
  • Metadata about your infrastructure

The state file is the source of truth. Lose it, and Terraform can’t manage your infrastructure anymore.

Local State (Don’t Use in Production)

Terraform stores the state file locally by default. You should never work that way in production.

Note: The state file is created after the first apply.

Problems with local state:

  • Not shared between team members
  • No locking (two people can run Terraform simultaneously)
  • Easy to lose or corrupt
  • Contains sensitive data in plain text

Remote State (Always Use This)

S3 now also supports native state locking (Terraform 1.10+), so you no longer need a DynamoDB table just for locking (not sure why S3 didn't have it earlier). A variant using it follows the example below.

# backend.tf
terraform {
  backend "s3" {
    bucket  = "my-terraform-state"
    key     = "prod/terraform.tfstate"
    region  = "us-east-1"
    encrypt = true
  }
}
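
To opt in to S3-native locking, add use_lockfile (a minimal sketch; requires Terraform 1.10 or newer):

terraform {
  backend "s3" {
    bucket       = "my-terraform-state"
    key          = "prod/terraform.tfstate"
    region       = "us-east-1"
    encrypt      = true
    use_lockfile = true # S3-native state locking, no DynamoDB table needed
  }
}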

Benefits of remote state:

  • Shared across the team
  • Automatic locking prevents concurrent modifications
  • Encrypted storage
  • Version history

Setting Up S3 Backend

# Create S3 bucket for state
aws s3api create-bucket \
  --bucket my-terraform-state \
  --region us-east-1

# Enable versioning
aws s3api put-bucket-versioning \
  --bucket my-terraform-state \
  --versioning-configuration Status=Enabled

State Commands You’ll Need

# List resources in state
terraform state list

# Show specific resource
terraform state show aws_instance.web

# Move resource to different name
terraform state mv aws_instance.web aws_instance.web_server

# Remove resource from state (keeps real resource)
terraform state rm aws_instance.old

# Pull current state
terraform state pull > backup.tfstate

# Import existing resource
terraform import aws_instance.web i-1234567890abcdef0

Emergency scenario: Someone manually deleted a resource in the console. Terraform still thinks it exists. Run terraform state rm to remove it from state, then terraform apply to recreate it.
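
The recovery, as a concrete sketch (aws_instance.web stands in for whatever resource was deleted):

# Drop the stale entry from state, then let Terraform recreate the resource
terraform state rm aws_instance.web
terraform plan  # should now show the instance as a new create
terraform apply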

8. Data Sources: Using Existing Resources

Data sources let you reference resources that already exist but aren’t managed by your Terraform code.

Fetching Existing Resources

# Get the latest Amazon Linux AMI
data "aws_ami" "amazon_linux" {
most_recent = true
owners = ["amazon"]

filter {
name = "name"
values = ["amzn2-ami-hvm-*-x86_64-gp2"]
}
}

You can then reference this data source from other resource blocks:

# Use the AMI in an instance
resource "aws_instance" "web" {
  ami           = data.aws_ami.amazon_linux.id
  instance_type = "t2.micro"
}

No more hardcoding AMI IDs that become outdated!

Common Data Sources

# Get current AWS account ID
data "aws_caller_identity" "current" {}

output "account_id" {
  value = data.aws_caller_identity.current.account_id
}

# Get existing VPC
data "aws_vpc" "existing" {
  tags = {
    Name = "production-vpc"
  }
}

# Get availability zones
data "aws_availability_zones" "available" {
  state = "available"
}

# Use in resource
resource "aws_subnet" "public" {
  count             = 2
  vpc_id            = data.aws_vpc.existing.id
  availability_zone = data.aws_availability_zones.available.names[count.index]
  cidr_block        = cidrsubnet(data.aws_vpc.existing.cidr_block, 8, count.index)
}

Real scenario: you need to deploy an application into an existing VPC created by another team. Use data sources to fetch the VPC and subnet IDs instead of hardcoding them or asking around for values — see the sketch below.
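
A minimal sketch of that lookup (the Name and Tier tag values are assumptions about how the other team tagged their network):

# Find the shared VPC by its Name tag
data "aws_vpc" "shared" {
  tags = {
    Name = "production-vpc"
  }
}

# Find all private subnets inside that VPC
data "aws_subnets" "private" {
  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.shared.id]
  }

  tags = {
    Tier = "private"
  }
}

# Deploy into the first private subnet
resource "aws_instance" "app" {
  ami           = "ami-12345"
  instance_type = "t2.micro"
  subnet_id     = data.aws_subnets.private.ids[0]
}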

Conditional Resource Creation

# Only create a VPC if one doesn't exist
data "aws_vpcs" "existing" {
  filter {
    name   = "tag:Name"
    values = ["my-vpc"]
  }
}

resource "aws_vpc" "main" {
  count      = length(data.aws_vpcs.existing.ids) == 0 ? 1 : 0
  cidr_block = "10.0.0.0/16"

  tags = {
    Name = "my-vpc"
  }
}

9. Modules: Building Reusable Components

Modules are reusable Terraform configurations. Instead of copying code, you package it as a module and use it multiple times.

Basic Module Structure

modules/
└── web-server/
    ├── main.tf
    ├── variables.tf
    └── outputs.tf

Creating a Module

# modules/web-server/main.tf
resource "aws_instance" "web" {
  ami           = var.ami_id
  instance_type = var.instance_type

  vpc_security_group_ids = [aws_security_group.web.id]

  tags = {
    Name = "${var.name_prefix}-web-server"
  }
}

resource "aws_security_group" "web" {
  name        = "${var.name_prefix}-web-sg"
  description = "Security group for web server"

  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# modules/web-server/variables.tf
variable "ami_id" {
  type = string
}

variable "instance_type" {
  type = string
}

variable "name_prefix" {
  type = string
}

# modules/web-server/outputs.tf
output "instance_id" {
  value = aws_instance.web.id
}

output "public_ip" {
  value = aws_instance.web.public_ip
}

Using a Module

# main.tf
module "production_web" {
  source = "./modules/web-server"

  ami_id        = "ami-12345"
  instance_type = "t2.medium"
  name_prefix   = "prod"
}

module "staging_web" {
  source = "./modules/web-server"

  ami_id        = "ami-12345"
  instance_type = "t2.micro"
  name_prefix   = "staging"
}

# Access module outputs
output "prod_web_ip" {
  value = module.production_web.public_ip
}

output "staging_web_ip" {
  value = module.staging_web.public_ip
}

One module definition, multiple uses with different configurations!

Using Public Modules

# Using Terraform Registry modules
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "5.0.0"

  name = "my-vpc"
  cidr = "10.0.0.0/16"

  azs             = ["us-east-1a", "us-east-1b"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24"]

  enable_nat_gateway = true
  enable_vpn_gateway = false

  tags = {
    Environment = "production"
  }
}

Pro tip: Browse the Terraform Registry (registry.terraform.io) for well-maintained modules instead of writing everything from scratch. My favourites are the AWS EKS and Lambda modules.

10. Count and For_Each: Creating Multiple Resources

When you need multiple similar resources, count and for_each prevent repetitive code.

Using Count

# Create 3 identical instances
resource "aws_instance" "web" {
  count         = 3
  ami           = "ami-12345"
  instance_type = "t2.micro"

  tags = {
    Name = "web-server-${count.index}"
  }
}

# Reference with index
output "instance_ips" {
  value = aws_instance.web[*].public_ip
}

Using For_Each (More Flexible)

# Create instances with specific names
resource "aws_instance" "servers" {
  for_each = toset(["web", "api", "worker"])

  ami           = "ami-12345"
  instance_type = "t2.micro"

  tags = {
    Name = "${each.key}-server"
  }
}

# Reference by key
output "web_server_ip" {
  value = aws_instance.servers["web"].public_ip
}

Advanced For_Each with Maps

variable "instances" {
type = map(object({
instance_type = string
ami = string
}))
default = {
web = {
instance_type = "t2.medium"
ami = "ami-12345"
}
api = {
instance_type = "t2.small"
ami = "ami-67890"
}
}
}

resource "aws_instance" "servers" {
for_each = var.instances

ami = each.value.ami
instance_type = each.value.instance_type

tags = {
Name = "${each.key}-server"
}
}

If you need to create different EC2 instances for web, API, and worker roles, each with different instance types and AMIs, you can use for_each with a map to manage all three from one resource block.

Conditional Resources

You can use count as a conditional. Terraform doesn't have if statements like other programming languages; it uses the ternary operator:

condition ? value_if_true : value_if_false

With count, count = 0 means the resource is not created and count = 1 means it is.

# Create resource only in production
resource "aws_instance" "bastion" {
  count = var.environment == "production" ? 1 : 0

  ami           = "ami-12345"
  instance_type = "t2.micro"

  tags = {
    Name = "bastion-server"
  }
}

The Commands You’ll Use Every Day

Here are the Terraform commands I use most frequently in production:

Daily Operations

# Initialize and validate
terraform init
terraform fmt      # Format code
terraform validate # Check syntax

# Planning and applying
terraform plan -out=tfplan    # Create execution plan
terraform apply tfplan        # Apply saved plan
terraform apply -auto-approve # Skip confirmation (use carefully!)

# Specific resource targeting
terraform plan -target=aws_instance.web
terraform apply -target=aws_instance.web

# Different environments
terraform plan -var-file=prod.tfvars
terraform apply -var-file=prod.tfvars

State Investigation

# Explore state
terraform state list
terraform state show aws_instance.web
terraform show # Show entire state in readable format

# State management
terraform state mv aws_instance.old aws_instance.new
terraform state rm aws_instance.temporary
terraform import aws_instance.existing i-1234567890

Workspace Management

# Work with multiple environments
terraform workspace list
terraform workspace new production
terraform workspace select production
terraform workspace show

Workspaces let you manage multiple states from the same code. Production and staging can share code but have separate states.
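
Inside your code, terraform.workspace gives you the current workspace name, so one configuration can adapt per environment (a minimal sketch):

resource "aws_instance" "web" {
  ami = "ami-12345"

  # Bigger instances in production, cheap ones everywhere else
  instance_type = terraform.workspace == "production" ? "t2.medium" : "t2.micro"

  tags = {
    Name        = "web-${terraform.workspace}"
    Environment = terraform.workspace
  }
}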

Emergency Commands

# Refresh state without making changes
# (newer Terraform versions prefer: terraform apply -refresh-only)
terraform refresh

# Unlock state if it gets stuck
terraform force-unlock LOCK_ID

# Taint resource to force recreation
# (newer Terraform versions prefer: terraform apply -replace=aws_instance.web)
terraform taint aws_instance.web
terraform apply

# Untaint if you changed your mind
terraform untaint aws_instance.web

# Show dependency graph
terraform graph | dot -Tpng > graph.png

Best Practices That Will Save Your Career

Always Use Version Control

# Initialize git repository
git init
echo "*.tfstate*" >> .gitignore
echo "*.tfvars" >> .gitignore
echo ".terraform/" >> .gitignore
git add .
git commit -m "Initial Terraform configuration"

Never commit:

  • State files (.tfstate)
  • Variable files with secrets (*.tfvars)
  • .terraform directory

Structure Your Code

project/
├── main.tf           # Main resources
├── variables.tf      # Variable declarations
├── outputs.tf        # Outputs
├── backend.tf        # Backend configuration
├── provider.tf       # Provider configuration
├── terraform.tfvars  # Variable values (gitignored)
└── modules/          # Custom modules
    └── network/
        ├── main.tf
        ├── variables.tf
        └── outputs.tf

Use Meaningful Names

# Bad - unclear names
resource "aws_instance" "i1" {
  # ...
}

# Good - descriptive names
resource "aws_instance" "production_web_server" {
  # ...
}

Always Plan Before Applying

# Wrong workflow
terraform apply -auto-approve # DANGEROUS!

# Correct workflow
terraform plan -out=tfplan # Review changes
# Review the plan output carefully
terraform apply tfplan # Apply reviewed plan

Use Remote State

# Never do this in production
# (local state by default)

# Always do this
terraform {
  backend "s3" {
    bucket         = "company-terraform-state"
    key            = "project/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}

Separate Environments

infrastructure/
├── dev/
│   ├── main.tf
│   └── dev.tfvars
├── staging/
│   ├── main.tf
│   └── staging.tfvars
└── prod/
    ├── main.tf
    └── prod.tfvars

Or use workspaces:

terraform workspace new dev
terraform workspace new staging
terraform workspace new prod

Document with Comments

# Security group for web servers
# Allows HTTP/HTTPS from anywhere
# Allows SSH only from office IP
resource "aws_security_group" "web_sg" {
  name        = "web-sg"
  description = "Security group for web tier"

  # HTTP access
  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
    description = "Allow HTTP from internet"
  }

  # SSH access - restricted
  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["203.0.113.0/24"]
    description = "Allow SSH from office"
  }
}

Common Mistakes to Avoid

Mistake 1: Not Using Variables

# Wrong - hardcoded values everywhere
resource "aws_instance" "web" {
  ami           = "ami-12345"
  instance_type = "t2.micro"
}

resource "aws_instance" "api" {
  ami           = "ami-12345"
  instance_type = "t2.micro"
}

# Right - use variables
variable "ami_id" {
  default = "ami-12345"
}

variable "instance_type" {
  default = "t2.micro"
}

resource "aws_instance" "web" {
  ami           = var.ami_id
  instance_type = var.instance_type
}

Mistake 2: Ignoring State File Security

# State files contain:
# - Resource IDs
# - IP addresses
# - Passwords and secrets in plain text
# Always:
# - Use remote encrypted backend
# - Never commit state to git
# - Limit access to state bucket

Mistake 3: Not Using Modules

# Wrong - repeating code everywhere
# 500 lines of repeated VPC configuration
# in every project

# Right - create module once, use everywhere
module "vpc" {
  source = "../modules/vpc"

  environment = var.environment
  cidr_block  = var.vpc_cidr
}

Mistake 4: Applying Without Planning

# This is how you accidentally delete production databases
terraform apply -auto-approve

# Always review first
terraform plan
# Read the output carefully
# Understand what will change
terraform apply

What’s Next?

Master these Terraform fundamentals, and you can manage any infrastructure as code. But here’s what most people miss: Terraform is just one piece of the DevOps puzzle. The real power comes from combining tools.

# Complete DevOps workflow
1. Write Terraform code
2. Store in Git
3. Create CI/CD pipeline (GitHub Actions, GitLab CI)
4. Pipeline runs terraform plan on PRs
5. Approve and merge
6. Pipeline runs terraform apply
7. Monitor infrastructure with CloudWatch/Prometheus

And remember: everything builds on these fundamentals. When your infrastructure deployment fails on a Friday evening (while you're dreaming of sipping beers with friends), you'll debug it using these same core concepts.

Connect with me:

LinkedIn: https://www.linkedin.com/in/akhilesh-mishra-0ab886124/ Twitter: https://x.com/livingdevops

Thanks for reading, see you at the next one.

If you found this article useful, do clap, comment, share, follow, and subscribe.

Akhilesh Mishra

I am Akhilesh Mishra, a self-taught DevOps engineer with 11+ years of experience working with private and public cloud (GCP & AWS) technologies.

I also mentor DevOps aspirants on their journey into DevOps through guided learning and mentorship.

Topmate: https://topmate.io/akhilesh_mishra/