Terraform Best Practices: Writing Code That Won’t Make You Cry Later – T12

Hey there, Terraform warriors! Welcome to the ultimate guide on Terraform best practices.

Whether you’re managing a handful of resources or orchestrating enterprise-scale infrastructure, this guide will help you write Terraform code that your future self will actually thank you for.

I’ve spent years building, breaking, and rebuilding Terraform configurations. Today, I’m sharing everything I’ve learned so you can skip the painful parts and jump straight to writing beautiful, maintainable infrastructure code.

Why Most Terraform Projects Turn Into Nightmares

Let’s be honest – we’ve all been there. Your Terraform project starts as a beautiful, simple configuration. Three months later? It’s a sprawling monster that no one dares to touch. Here’s what typically goes wrong:

  • Simple changes take days because the code is a tangled mess
  • Teams are afraid to deploy because something always breaks
  • New developers need weeks just to understand what’s going on
  • Production incidents happen due to preventable configuration mistakes
  • Module reusability is a myth because everything is hardcoded

But here’s the thing – it doesn’t have to be this way! With the right practices from day one, your Terraform code can be a joy to work with, even years down the line.

The 7 Deadly Sins of Terraform (And How to Avoid Them)

Sin #1: The 3,000-Line Monster File

The Problem: Everything crammed into one massive main.tf file.

Imagine opening a file with 3,000+ lines where VPC configuration blends into database setup, which morphs into Lambda functions. Finding anything becomes a treasure hunt, and God help you if two developers need to work on different parts simultaneously – merge conflicts will become your daily nightmare.

Why This Happens: It usually starts innocently. You begin with a simple VPC and a few EC2 instances. Then you add RDS. Then Lambda. Before you know it, you’re scrolling for minutes just to find that one security group rule.

The Better Way: Think of your infrastructure like a well-organized kitchen. You wouldn’t throw all your utensils, plates, and food in one drawer, would you? Apply the same principle:

  • Keep your main.tf under 100 lines – it should orchestrate, not implement
  • Create separate files for logical components (networking.tf, compute.tf, database.tf)
  • Use modules for reusable patterns
  • Follow the single responsibility principle

Golden Rule: If you’re scrolling to find something, it’s time to split the file.
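As a sketch of what an orchestration-only main.tf can look like (the module names and paths here are illustrative, not a prescribed layout):

```hcl
# main.tf — orchestrates; the implementation lives in per-concern modules
module "network" {
  source     = "./modules/network" # hypothetical local module
  cidr_block = var.vpc_cidr
}

module "compute" {
  source     = "./modules/compute"
  subnet_ids = module.network.private_subnet_ids
  ami_id     = var.ami_id
}

module "database" {
  source     = "./modules/database"
  subnet_ids = module.network.database_subnet_ids
}
```

Each concern stays in its own small file, and two developers working on networking and compute no longer collide in the same file.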

Sin #2: Copy-Paste Environment Hell

The Problem: Separate folders with duplicated code for each environment.

This approach seems logical at first – “Let’s keep dev separate from prod!” But what happens when you need to add a new security group rule? You update dev, test it, then copy to staging, then to prod. Miss one? Congratulations, your environments are now different, and you won’t know until something breaks.

Why This Is Dangerous:

  • Configuration drift is inevitable
  • Security patches become a multi-day ordeal
  • Testing loses its meaning when environments differ
  • “But it works in staging!” becomes your team’s anthem

The Better Way: One codebase, multiple configurations. Think of it like a recipe – you don’t write three different recipes for small, medium, and large portions. You have one recipe and adjust the quantities.

Use variable files (tfvars) to handle environment-specific values while keeping your infrastructure definition DRY (Don’t Repeat Yourself). This ensures that when you fix a bug or add a feature, it’s automatically available across all environments.
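A minimal sketch of the one-codebase approach (variable names and values are illustrative):

```hcl
# variables.tf — a single definition shared by all environments
variable "environment" {
  type = string
}

variable "instance_type" {
  type    = string
  default = "t3.micro"
}

# environments/dev.tfvars
#   environment   = "dev"
#   instance_type = "t3.micro"

# environments/prod.tfvars
#   environment   = "prod"
#   instance_type = "t3.large"
```

Then `terraform plan -var-file=environments/prod.tfvars` runs the exact same code with production values, so a fix lands everywhere at once.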

Sin #3: Hardcoding Everything

The Problem: Values baked directly into resources like concrete.

When you hardcode values, you’re essentially writing a love letter to technical debt. That t3.medium instance type? What happens when you need t3.large in production? That AMI ID? It’s region-specific and will break the moment you deploy elsewhere.

Why This Kills Scalability:

  • Environment promotions require code changes
  • Multi-region deployments become impossible
  • Cost optimization requires touching every resource
  • You can’t share modules between projects

The Better Way: Variables are your friends, but smart variables are your best friends. Don’t just extract values into variables – add validation, descriptions, and sensible defaults. Think about:

  • What might change between environments?
  • What might change between regions?
  • What might change as the application scales?
  • What would a new team member need to know?

Use variable validation to catch errors early. If someone tries to deploy a t2.nano for a production database, wouldn’t you rather catch that during plan than at 3 AM when the site is down?
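For example, a validation block that rejects burstable t2 classes for databases (the variable name and rule are illustrative; `startswith()` requires Terraform 1.3+):

```hcl
variable "db_instance_class" {
  type        = string
  description = "RDS instance class for the application database"

  validation {
    condition     = !startswith(var.db_instance_class, "t2.")
    error_message = "t2 instance classes are not allowed for databases; use t3 or larger."
  }
}
```

An invalid value now fails at `terraform plan`, long before anything reaches production.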

Sin #4: Local State Files (The Collaboration Killer)

The Problem: State files living on developer laptops.

This is like keeping the only copy of your house keys in your pocket while going swimming in the ocean. Local state files are disasters waiting to happen:

  • Laptop dies? Your infrastructure is now orphaned
  • Two developers run apply simultaneously? State corruption
  • Need to rollback? Hope you have that old state file somewhere
  • Audit requirements? Good luck explaining your “process”

Why Remote State Is Non-Negotiable: Remote state isn’t just about backup – it’s about collaboration, consistency, and confidence. With remote state:

  • State locking prevents concurrent modifications
  • Versioning allows rollbacks
  • Encryption keeps sensitive data secure
  • Team members can collaborate without fear

Implementation Tips:

  • Use S3 with DynamoDB for AWS (built-in locking)
  • Enable versioning on your state bucket
  • Implement proper IAM policies
  • Consider separate state files for different components
  • Always encrypt state at rest and in transit
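Putting those tips together, a typical S3 backend block looks like this, assuming the bucket and lock table already exist (the names are placeholders):

```hcl
terraform {
  backend "s3" {
    bucket         = "my-org-terraform-state"        # versioned, encrypted bucket
    key            = "network/prod/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"               # provides state locking
  }
}
```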

Sin #5: The Count Trap

The Problem: Using count for creating multiple resources.

The count parameter seems innocent enough. Need three web servers? Just use count = 3. But here’s the trap: Terraform identifies counted resources by their index. Remove the middle server, and Terraform thinks the third server is now the second, triggering a destroy and recreate.

Real-World Horror Story: A team used count for their microservices. They removed one service from the middle of the list. Result? Half their production services were recreated, causing 30 minutes of downtime.

Why for_each Is Superior:

  • Resources are identified by keys, not position
  • Add, remove, or reorder without affecting others
  • More expressive and self-documenting
  • Works beautifully with maps and sets

When to Use What:

  • Use count only for truly identical resources (like read replicas)
  • Use for_each when resources have any unique properties
  • Use for_each when the list might change over time
  • Default to for_each when in doubt
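A for_each sketch (the resource type and variables are illustrative):

```hcl
variable "services" {
  type    = set(string)
  default = ["auth", "billing", "search"]
}

resource "aws_instance" "service" {
  for_each      = var.services
  ami           = var.ami_id
  instance_type = var.instance_type

  tags = {
    Name = each.key # addressed as aws_instance.service["auth"], etc.
  }
}
```

Removing "billing" from the set destroys exactly one instance; the others are untouched because they are identified by name, not by position.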

Sin #6: Module Monoliths

The Problem: Creating “god modules” that do everything.

It’s tempting to create one module that handles your entire application stack. VPC, EKS, RDS, ElastiCache, S3, CloudFront – why not put it all together? Because you’ve just created an unmaintainable, inflexible monster that no one can reuse.

Why This Fails:

  • Can’t reuse parts of the module
  • Testing becomes nearly impossible
  • Version updates affect everything
  • 147 variables that no one understands
  • One size fits nobody

The Module Philosophy: Think of modules like LEGO blocks, not like pre-built castles. Each module should:

  • Do one thing well
  • Be composable with other modules
  • Have a clear interface (inputs/outputs)
  • Be testable in isolation
  • Be versioned independently

Module Best Practices:

  • Separate Repository: Each module gets its own repo for independent versioning
  • Semantic Versioning: Use vMajor.Minor.Patch for clear upgrade paths
  • Examples Included: Show how to use the module
  • Comprehensive Outputs: Expose what others might need
  • Optional Variables: Use Terraform’s optional() for backwards compatibility
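As an example of that last point, `optional()` lets you add a field to an existing object variable without breaking current callers (this sketch assumes Terraform 1.3+, where `optional()` accepts a default):

```hcl
variable "logging" {
  type = object({
    enabled        = bool
    retention_days = optional(number, 30) # new field; existing callers need no changes
  })
}
```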

Sin #7: Zero Documentation

The Problem: “The code is self-documenting!” (Narrator: It wasn’t.)

Six months later, no one knows why the RDS backup window is at 3:47 AM, why there are exactly 7 subnets, or what that weird IAM policy is for. The original developer has left, and now every change is a risky adventure.

What Documentation Should Answer:

  • Why did we make this choice? (not what – the code shows that)
  • What are the dependencies and prerequisites?
  • How does this integrate with other systems?
  • When should settings be changed?
  • Who should be contacted for questions?
  • How much will this cost to run?

Documentation Best Practices:

  • Document at the point of decision
  • Include cost estimates
  • Explain the “why” behind non-obvious choices
  • Add links to relevant documentation
  • Include recovery procedures
  • Document known limitations

Building Production-Ready Terraform Modules

Now that we’ve covered what NOT to do, let’s dive deep into building modules that are actually reusable, maintainable, and production-ready.

The Philosophy of Great Modules

Great modules aren’t just about organizing code – they’re about creating abstractions that make sense for your organization. Think of them as building blocks that encode your best practices, security requirements, and operational knowledge.

Module Design Principles

1. Single Responsibility Each module should have one clear purpose. A VPC module creates networking infrastructure. An RDS module creates databases. Don’t mix concerns.

2. Composability Over Completeness It’s better to have several focused modules that work together than one module that tries to do everything. This allows teams to mix and match based on their needs.

3. Convention Over Configuration Establish conventions and make them defaults. If your organization always uses specific tag names, encryption settings, or network configurations, build these into your modules.

4. Progressive Disclosure Make simple things simple and complex things possible. Use sensible defaults but allow overrides for advanced use cases.

Essential Module Components

Every production-ready module needs:

  1. Clear Interface – Well-defined inputs and outputs
  2. Validation – Catch errors early with variable validation
  3. Documentation – README with examples and explanations
  4. Testing – Automated tests to ensure functionality
  5. Versioning – Semantic versioning for safe upgrades

Working with Dynamic Blocks

Dynamic blocks are powerful but can make code harder to read. Use them when you need flexibility, but document their behavior clearly.

When to Use Dynamic Blocks:

  • Optional features (like logging or monitoring)
  • Variable numbers of similar configurations
  • Environment-specific settings

When to Avoid:

  • Simple on/off features (use ternary operators instead)
  • When it makes the code significantly harder to understand
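As an illustration, here is a dynamic block that renders access logging only when a bucket is supplied (the variable and resource names are illustrative):

```hcl
variable "access_logs_bucket" {
  type    = string
  default = null # logging off by default
}

resource "aws_lb" "this" {
  name    = "example"
  subnets = var.subnet_ids

  dynamic "access_logs" {
    # zero or one iteration: the block disappears entirely when no bucket is set
    for_each = var.access_logs_bucket == null ? [] : [var.access_logs_bucket]
    content {
      bucket  = access_logs.value
      enabled = true
    }
  }
}
```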

Leveraging Terraform Functions

Terraform’s built-in functions are powerful tools for creating flexible modules:

  • lookup() – Safe map access with defaults
  • try() – Graceful handling of optional values
  • can() – Validation and conditional logic
  • merge() – Combining maps for tag strategies
  • optional() – Backwards-compatible variable schemas

Understanding these functions is crucial for building robust modules that handle edge cases gracefully.
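A few of these functions in action (all variable names are illustrative):

```hcl
locals {
  # merge(): organization-wide base tags plus caller-supplied overrides
  tags = merge(
    {
      ManagedBy   = "terraform"
      Environment = var.environment
    },
    var.extra_tags,
  )

  # lookup(): per-environment sizing with a safe fallback
  instance_type = lookup(var.instance_types, var.environment, "t3.micro")

  # try(): tolerate an optional attribute that may be unset
  log_bucket = try(var.logging.bucket, null)
}
```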

Module Development Workflow

1. Start with the Interface

Before writing any resources, design your module’s interface:

  • What inputs does it need?
  • What outputs should it provide?
  • What are sensible defaults?

2. Build Incrementally

Start with the minimum viable module and add features gradually. This helps maintain simplicity and ensures each addition is necessary.

3. Test Early and Often

Write tests alongside your module:

  • Unit tests for logic
  • Integration tests for resource creation
  • Example configurations for documentation

4. Version Thoughtfully

Use semantic versioning:

  • Major: Breaking changes
  • Minor: New features (backwards compatible)
  • Patch: Bug fixes
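In practice, consumers then pin modules with version constraints. For example, using the public terraform-aws-modules VPC module from the registry:

```hcl
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0" # accept 5.x minor and patch releases, never a new major
}
```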

5. Document as You Go

Documentation written after the fact is often incomplete. Document decisions when you make them.

Practical Module Patterns

The VPC Module Pattern

A good VPC module should:

  • Create consistent network layouts
  • Handle multiple availability zones
  • Provide flexible CIDR allocation
  • Output subnet IDs for other modules
  • Include sensible security defaults

The Application Module Pattern

Application modules should:

  • Accept VPC information as inputs
  • Create all necessary resources for one application
  • Handle environment differences through variables
  • Include monitoring and alerting
  • Provide connection information as outputs

The Data Store Module Pattern

Database modules should:

  • Enforce encryption at rest
  • Handle backup configurations
  • Manage security groups
  • Provide connection strings
  • Support high availability options

Security Best Practices That Actually Matter

Security in Terraform isn’t just about not hardcoding passwords. It’s about building security into every aspect of your infrastructure code.

Secrets Management Strategy

Never Store Secrets in Code or State

  • Use AWS Secrets Manager or HashiCorp Vault
  • Generate random passwords within Terraform
  • Use IAM roles instead of access keys
  • Mark sensitive outputs appropriately
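A sketch combining these points (the secret name is a placeholder; note the generated value still lands in state, which is one more reason to encrypt state):

```hcl
# Generate the password inside Terraform — it never appears in code
resource "random_password" "db" {
  length  = 32
  special = true
}

# Store it in Secrets Manager so applications fetch it at runtime
resource "aws_secretsmanager_secret" "db" {
  name = "prod/app/db-password" # placeholder name
}

resource "aws_secretsmanager_secret_version" "db" {
  secret_id     = aws_secretsmanager_secret.db.id
  secret_string = random_password.db.result
}

output "db_secret_arn" {
  value = aws_secretsmanager_secret.db.arn # expose the ARN, never the value
}
```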

State File Security

Your state file contains sensitive information. Protect it like you would protect your production database:

  • Encryption at Rest: Always encrypt state files
  • Encryption in Transit: Use HTTPS/TLS for state operations
  • Access Control: Limit who can read/write state
  • Audit Logging: Track all state access
  • Backup Strategy: Regular backups with retention policies

Network Security Patterns

Build security into your modules:

  • Default to least privilege
  • Use security group rules that reference other security groups
  • Implement network segmentation
  • Enable VPC Flow Logs
  • Use AWS WAF for public-facing applications

Testing Your Terraform Code

Testing isn’t optional for production infrastructure. Here’s a practical approach:

Static Analysis

Start with the basics:

  • terraform fmt – Consistent formatting
  • terraform validate – Syntax checking
  • tflint – Linting for best practices
  • tfsec or checkov – Security scanning

Integration Testing

Test that your modules actually create working infrastructure:

  • Use Terratest for automated testing
  • Create temporary test environments
  • Verify resources are created correctly
  • Test connectivity and functionality
  • Clean up after tests

Pre-commit Hooks

Catch issues before they enter your repository:

repos:
  - repo: https://github.com/antonbabenko/pre-commit-terraform
    rev: <release-tag>  # required by pre-commit; pin to a released tag
    hooks:
      - id: terraform_fmt
      - id: terraform_validate
      - id: terraform_docs
      - id: terraform_tflint
      - id: terraform_tfsec

CI/CD Pipeline Essentials

A good CI/CD pipeline for Terraform should include multiple stages. Let me break down each component:

Stage 1: Validation

First, validate your code syntax and formatting:

- name: Terraform Format Check
  run: terraform fmt -check -recursive

- name: Terraform Init
  run: terraform init -backend=false

- name: Terraform Validate
  run: terraform validate

This catches basic errors before moving forward.

Stage 2: Security Scanning

Next, scan for security issues:

- name: TFSec Security Scan
  uses: aquasecurity/tfsec-action@v1.0.3
  with:
    soft_fail: true

- name: Checkov Scan
  uses: bridgecrewio/checkov-action@master
  with:
    directory: .

These tools catch common security misconfigurations.

Stage 3: Planning

Generate and review the plan:

- name: Terraform Plan
  run: |
    terraform plan \
      -var-file=environments/${{ matrix.environment }}/terraform.tfvars \
      -out=tfplan \
      -no-color

Stage 4: Apply (with Approval)

For production, always require manual approval:

- name: Wait for Approval
  uses: trstringer/manual-approval@v1
  with:
    approvers: platform-team
    minimum-approvals: 1

- name: Terraform Apply
  if: github.ref == 'refs/heads/main'
  run: terraform apply tfplan

Common Pitfalls to Avoid

  • Over-engineering simple infrastructure
  • Under-documenting complex decisions
  • Ignoring costs until the bill arrives
  • Skipping tests to save time (you won’t)
  • Not versioning modules properly
  • Mixing concerns in modules
  • Forgetting about disaster recovery
  • Not planning for multi-region
  • Hardcoding account IDs or regions
  • Using latest for module versions

Real-World Migration Strategies

Migrating from Manual Infrastructure

When importing existing infrastructure:

  1. Inventory First: Document what exists before importing
  2. Import Gradually: Start with stateless resources
  3. Verify Continuously: Run plans after each import
  4. Add Management Features: Like tagging and monitoring
  5. Document Differences: Note any manual configurations

Migrating from Other IaC Tools

When moving from CloudFormation, CDK, or other tools:

  1. Run in Parallel: Keep both systems during migration
  2. Match Functionality: Ensure feature parity
  3. Migrate by Service: Don’t try to do everything at once
  4. Test Thoroughly: Especially data stores and stateful services
  5. Plan Rollback: Have a clear rollback strategy

What’s Next?

Congratulations! You now have a comprehensive playbook for writing Terraform code that scales. But the journey doesn’t end here.

Keep Learning:

  • Join the Terraform community (Discord, Reddit, HashiCorp forums)
  • Contribute to open-source Terraform modules
  • Share your own modules and learnings
  • Stay updated with new Terraform features

Advanced Topics to Explore:

  • Terraform Cloud/Enterprise for team collaboration
  • Policy as Code with Sentinel or OPA
  • GitOps with Flux or ArgoCD
  • Cost Management with Infracost
  • Multi-cloud strategies and patterns

Remember: Great infrastructure code isn’t about being clever – it’s about being clear, consistent, and maintainable. Start with the basics, improve incrementally, and always prioritize clarity over cleverness.

Happy Terraforming!


Found this guide helpful? Share it with your team and let me know what Terraform challenges you’re facing. I’d love to hear about your infrastructure journey!

Akhilesh Mishra

I am Akhilesh Mishra, a self-taught DevOps engineer with 11+ years of experience working on private and public cloud (GCP & AWS) technologies.

I also mentor DevOps aspirants on their journey by providing guided learning and mentorship.

Topmate: https://topmate.io/akhilesh_mishra/