Click. Click. Click. That’s how I used to create AWS resources. Log into the console, pick a region, scroll through a dozen dropdown menus, type in a name, click “Create.” Need a subnet? Click, click, click some more. Security group? You already know. I had a pretty solid routine going — twenty minutes of careful clicking, a mental checklist I’d memorized, and the quiet confidence of a man who believed he had everything under control.
Then I deleted the wrong one.
Not a test resource. Not a sandbox experiment. A production VPC that had three EC2 instances, two RDS databases, and a NAT gateway costing us about 14,000 rupees a month. Gone in a single click. No confirmation dialog (or maybe there was one and I just muscle-memoried through it). My manager in Pune called me within four minutes. Four minutes. I didn’t even know he monitored the Slack alerts that closely.
I spent the next six hours rebuilding everything by hand. Console tab after console tab. CIDR blocks scribbled on the back of a notebook. Security group rules copied from screenshots a colleague had taken weeks earlier for an unrelated reason. By 11 PM, we were back online, but I’d aged about five years. And somewhere in the middle of all that frantic clicking, a thought crystallized: there has to be a better way to do this.
That better way, it turned out, was something called Infrastructure as Code. And the tool that made it click for me — pun intended — was Terraform.
What Broke Inside My Head That Night
Before I explain what Terraform actually does, let me tell you what changed in my thinking. Because the tool is just a tool. What matters is the idea behind it.
When you create infrastructure manually through a web console, you’re performing a series of actions. Each action changes the state of your cloud environment, and the only record of what you did lives in your memory, maybe some notes, and the console’s own audit logs (which, honestly, most people never check). If something goes wrong, you have to remember what you did. If someone else needs to recreate the same setup, you have to explain it step by step. If you need to build the exact same thing in a different region — say, migrating from Mumbai to Singapore for a client — you’re starting from scratch.
Infrastructure as Code flips all of that. Instead of clicking buttons, you write configuration files that describe what you want. A VPC with this CIDR block. Two public subnets. An internet gateway. A security group allowing HTTP and SSH. An EC2 instance running Ubuntu. You write it down in files, and a tool reads those files and makes it real. Want the same setup in another region? Change one variable. Want to destroy everything? One command. Want to know what changed between Tuesday and Friday? Run a diff on the files, same as you would with application code.
I remember the first time I showed this to a junior developer on our team — she’d been doing AWS console work for about three months and looked physically relieved when she realized she didn’t have to memorize subnet configurations anymore. “So it’s like a recipe?” she asked. Yeah. Sort of. Except the recipe also cooks itself.
Why Terraform, Specifically
Now, Infrastructure as Code isn’t a single product. AWS has CloudFormation. Azure has ARM templates (and now Bicep). Google Cloud has Deployment Manager. Pulumi lets you write infrastructure in Python or TypeScript. Each of these has strengths. But I picked Terraform, and most of the startups I’ve talked to in Bengaluru and Hyderabad have done the same.
Why? Because Terraform works with almost everything. AWS, Azure, GCP, DigitalOcean, Cloudflare, even on-prem VMware setups, each through its own provider plugin. One language. One workflow. You don’t lock yourself into a single cloud provider’s tooling. HashiCorp (the company behind Terraform) created a language called HCL, short for HashiCorp Configuration Language, and while the name sounds enterprise-y, the syntax is actually quite readable. Maybe even pleasant, once you get used to it.
Terraform is declarative. You don’t tell it “create a VPC, then create a subnet inside that VPC, then attach an internet gateway.” You just describe the end state: “I want a VPC, subnets, and a gateway, and here’s how they should be connected.” Terraform figures out the order of operations on its own. It maintains a state file that tracks what currently exists, compares it to what you’ve declared in your code, and applies only the changes needed. Changed the instance type from t3.micro to t3.small? Terraform sees the difference, plans the change, and asks you to confirm before doing anything.
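Here’s a rough sketch of what that looks like in practice. It isn’t from my project: the resource names and CIDR values are made up for illustration, and it assumes the AWS provider is already configured. The subnet is deliberately written before the VPC to show that file order doesn’t matter.

```hcl
# The subnet refers to the VPC's ID. That reference is an implicit
# dependency, so Terraform creates the VPC first even though it is
# declared second in the file.
resource "aws_subnet" "example" {
  vpc_id     = aws_vpc.example.id
  cidr_block = "10.0.1.0/24"
}

resource "aws_vpc" "example" {
  cidr_block = "10.0.0.0/16"
}
```

On destroy, the same dependency graph runs in reverse: subnet first, then the VPC.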
I think that’s the part that won me over. After deleting that production VPC, the idea of a tool that says “here’s what I’m about to do — do you approve?” felt like a safety net I desperately needed.
Getting Terraform on My Machine
Alright, story time is fun but let’s actually build something. Here’s where I started, and where you probably should too.
First, install Terraform. On my Ubuntu workstation at the office, this was the process:
# Install Terraform (Debian/Ubuntu, via HashiCorp's apt repository)
wget -O- https://apt.releases.hashicorp.com/gpg | sudo gpg --dearmor -o /usr/share/keyrings/hashicorp-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/hashicorp.list
sudo apt update && sudo apt install terraform
# Verify installation
terraform version
# Configure AWS CLI (Terraform uses these credentials)
aws configure
# Enter: AWS Access Key ID, Secret Key, Region (ap-south-1 for Mumbai)
Took about three minutes. Terraform printed its version number. I set up my AWS credentials pointing at ap-south-1 (Mumbai), which is probably where you want to be if you’re serving Indian users and want the lowest latency. My AWS bill already had about 2,500 rupees of free-tier stuff on it, and I wanted to keep costs down while experimenting.
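One thing I wish someone had told me upfront: Terraform itself doesn’t cost anything. The CLI is free to download and use (since version 1.6 it has shipped under HashiCorp’s Business Source License, so it’s source-available rather than open source in the strict sense). You pay for the infrastructure it creates, not the tool. There’s a hosted product with paid tiers for team collaboration, HCP Terraform (formerly Terraform Cloud), but for learning and even small production setups, the CLI is all you need.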
How I Organized My First Real Project
My first attempt at Terraform was a single massive file called main.tf with everything jammed inside it. VPC, subnets, security groups, EC2 instances, outputs — 400 lines in one file. It worked. But it was miserable to scroll through, and when I tried to show it to a colleague, he winced. “Split it up,” he said. “One concern per file.”
Here’s the structure we settled on, and the one I’ve used on every project since:
infra/
  main.tf            # Provider and backend configuration
  variables.tf       # Input variable declarations
  vpc.tf             # VPC, subnets, route tables
  security.tf        # Security groups
  compute.tf         # EC2 instances
  outputs.tf         # Output values
  terraform.tfvars   # Variable values (DO NOT commit secrets)
Simple. Each file has a job. When something breaks in your networking layer, you open vpc.tf. When you need to change instance sizes, you go to compute.tf. When your team lead asks “what variables does this project accept?” you hand them variables.tf. No hunting through a monolith.
And that last file — terraform.tfvars — deserves a warning in bold. Never commit it to Git if it contains secrets. I’ve seen AWS keys in public GitHub repos. I’ve seen them scraped and exploited within minutes. Put terraform.tfvars in your .gitignore on day one. Not day two. Day one.
Writing the Foundation: Provider and State
Every Terraform project starts with telling it two things: which cloud provider to talk to, and where to store the record of what it’s built. Here’s what my main.tf looked like:
terraform {
  required_version = ">= 1.7.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }

  # Remote state storage (recommended for teams)
  backend "s3" {
    bucket         = "my-terraform-state-bucket"
    key            = "prod/infrastructure.tfstate"
    region         = "ap-south-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}

provider "aws" {
  region = var.aws_region

  default_tags {
    tags = {
      Environment = var.environment
      ManagedBy   = "terraform"
      Project     = var.project_name
    }
  }
}
Let me walk through what happened when I first wrote this. The required_version line pins the minimum Terraform version. I learned this the hard way when a colleague running an older version tried to apply my code and got cryptic errors. Pin your version. Always.
The required_providers block tells Terraform which provider plugin to download, and the ~> 5.0 version constraint means “any 5.x version but not 6.0.” You want this flexibility for minor and patch updates but not major breaking changes. The provider block then configures that plugin, here with a region and default_tags. The default_tags section is something I discovered on a Stack Overflow thread maybe six months in: it automatically applies tags to every taggable resource this provider creates. Tagging matters more than you think, especially when your AWS bill shows a line item and you can’t figure out which project spawned it.
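If the ~> operator looks cryptic, this annotated fragment may help. It’s a comparison sketch rather than something to paste next to the main.tf above (declaring the aws provider requirement twice in one module is an error), and the 5.40.0 figure is only an example version.

```hcl
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"

      # "~> 5.0"    allows >= 5.0 and < 6.0       (minor and patch updates)
      # "~> 5.40.0" allows >= 5.40.0 and < 5.41.0 (patch updates only)
      version = "~> 5.0"
    }
  }
}
```

The rule of thumb: only the rightmost number you write is allowed to increase.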
And then there’s the backend. Ah, the backend. My first project stored state locally — a file called terraform.tfstate sitting on my laptop. Fine for solo experiments. Terrible for teams. If two people run terraform apply at the same time, both working off their own local state files, you get collisions. Resources created twice. Drift. Chaos.
Remote state in S3 with DynamoDB locking solves this. One state file, one source of truth, and a lock that prevents concurrent modifications. Setting up the S3 bucket and DynamoDB table is a bit of a chicken-and-egg problem (you need infrastructure to store the state of your infrastructure), but I just created those two resources manually in the console. Ironic? Maybe. Practical? Definitely. (Newer Terraform releases, 1.10 onward, can also take the lock in S3 itself through the backend’s use_lockfile argument, so check the S3 backend docs for your version before you build the table.)
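If you’d rather not click those two resources into existence, one common workaround is a tiny bootstrap configuration that you apply once with local state. What follows is a minimal sketch of that idea, not what I actually did: the bootstrap/ directory name is hypothetical, and the bucket and table names are simply the ones from the backend block above.

```hcl
# bootstrap/main.tf (hypothetical directory, applied once with local state)
provider "aws" {
  region = "ap-south-1"
}

resource "aws_s3_bucket" "state" {
  # Must match the backend block; S3 bucket names are globally unique,
  # so you will need to pick your own.
  bucket = "my-terraform-state-bucket"
}

# Versioning keeps older copies of the state file around for recovery.
resource "aws_s3_bucket_versioning" "state" {
  bucket = aws_s3_bucket.state.id

  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_dynamodb_table" "locks" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID" # the S3 backend expects this exact attribute name

  attribute {
    name = "LockID"
    type = "S"
  }
}
```

Once these exist, terraform init in the main project can connect to the backend.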
Variables: Making the Whole Thing Flexible
Hard-coding values is tempting. Quick. Gets things working. And then you need a staging environment that’s slightly different from production, and suddenly you’re copy-pasting files and changing strings by hand, which is exactly the sort of thing Terraform is supposed to save you from.
So I learned to use variables. Here’s my variables.tf:
variable "aws_region" {
  description = "AWS region for resources"
  type        = string
  default     = "ap-south-1"
}

variable "environment" {
  description = "Environment name (dev, staging, prod)"
  type        = string
  default     = "dev"

  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "Environment must be dev, staging, or prod."
  }
}

variable "project_name" {
  description = "Project name used for resource naming"
  type        = string
  default     = "byteyogi"
}

variable "vpc_cidr" {
  description = "CIDR block for the VPC"
  type        = string
  default     = "10.0.0.0/16"
}

variable "instance_type" {
  description = "EC2 instance type"
  type        = string
  default     = "t3.micro"
}

# No default: Terraform will require a value (see terraform.tfvars)
variable "key_pair_name" {
  description = "Name of the SSH key pair"
  type        = string
}
Notice the validation block on environment. I added that after someone on the team ran a deploy with environment = "production" instead of "prod". Everything deployed fine, but every resource got tagged with the wrong environment name, and our monitoring dashboards went blank because they filtered on “prod.” Tiny mistake. Big headache. Validation catches it at plan time, before anything touches AWS.
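The same trick works for other inputs. As a rough sketch, here is how the vpc_cidr declaration could be extended to reject malformed CIDR strings. This version is not in my project; it would replace the existing vpc_cidr declaration rather than sit next to it. It leans on Terraform’s built-in can() and cidrhost() functions.

```hcl
variable "vpc_cidr" {
  description = "CIDR block for the VPC"
  type        = string
  default     = "10.0.0.0/16"

  validation {
    # cidrhost() raises an error on a malformed CIDR string;
    # can() converts that error into false so the validation fails cleanly.
    condition     = can(cidrhost(var.vpc_cidr, 0))
    error_message = "vpc_cidr must be a valid CIDR block, for example 10.0.0.0/16."
  }
}
```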
And here’s the companion file, terraform.tfvars, where actual values live:
aws_region    = "ap-south-1"
environment   = "dev"
project_name  = "byteyogi"
key_pair_name = "my-ssh-key"
instance_type = "t3.micro"
Short. Clean. If you wanted to switch to a production config, you’d create prod.tfvars with different values and run terraform plan -var-file="prod.tfvars". Same code, different inputs. I probably could have figured this pattern out earlier if I’d read the docs more carefully, but honestly, I learn by breaking things. Most of us do, I think.
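For illustration, a prod.tfvars along those lines might look like this. The values are assumptions, not my real production settings, and the key pair name is hypothetical.

```hcl
# prod.tfvars (illustrative values only)
aws_region    = "ap-south-1"
environment   = "prod"
project_name  = "byteyogi"
key_pair_name = "my-prod-ssh-key" # hypothetical key pair name
instance_type = "t3.small"
```

Because environment feeds into default_tags and resource names, switching files changes the tags and names without touching any .tf file.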
Building a VPC From Scratch (and Understanding Why)
Right, here’s where it gets real. A VPC — Virtual Private Cloud — is basically your own isolated network inside AWS. Every EC2 instance, every database, every Lambda function (if it needs VPC access) lives inside a VPC. AWS gives you a default one, and for quick experiments, it’s fine. But for anything resembling production? You want your own. Custom CIDR ranges. Public and private subnets. Route tables you control. An internet gateway you explicitly created and can explicitly destroy.
My vpc.tf grew into the largest file in the project. Here’s what I ended up with:
# Look up the availability zones in the configured region
data "aws_availability_zones" "available" {
  state = "available"
}

resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name = "${var.project_name}-${var.environment}-vpc"
  }
}

resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name = "${var.project_name}-${var.environment}-igw"
  }
}

resource "aws_subnet" "public" {
  count                   = 2
  vpc_id                  = aws_vpc.main.id
  cidr_block              = cidrsubnet(var.vpc_cidr, 8, count.index)
  availability_zone       = data.aws_availability_zones.available.names[count.index]
  map_public_ip_on_launch = true

  tags = {
    Name = "${var.project_name}-public-${count.index + 1}"
  }
}

resource "aws_subnet" "private" {
  count             = 2
  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet(var.vpc_cidr, 8, count.index + 10)
  availability_zone = data.aws_availability_zones.available.names[count.index]

  tags = {
    Name = "${var.project_name}-private-${count.index + 1}"
  }
}

# Default route to the internet gateway: this is what makes a subnet public
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main.id
  }

  tags = {
    Name = "${var.project_name}-public-rt"
  }
}

resource "aws_route_table_association" "public" {
  count          = 2
  subnet_id      = aws_subnet.public[count.index].id
  route_table_id = aws_route_table.public.id
}
Let me unpack what’s happening here, because when I first read HCL code like this, it looked like gibberish. Took me a weekend with the HashiCorp docs and a lot of terraform plan runs before the pieces fell into place.
First, the data block. Two block types do most of the work in this file: resource (things you’re creating) and data (things that already exist and you’re looking up). Here, we’re asking AWS “which availability zones are available in my region?” In ap-south-1, that gives us ap-south-1a, ap-south-1b, and ap-south-1c. We only use the first two, but the data source figures out the names dynamically. No hard-coding region-specific values.
Then the VPC itself. cidr_block = var.vpc_cidr gives it the 10.0.0.0/16 address range we defined in variables, which is 65,536 IP addresses. Way more than you’ll need for a beginner project, but a VPC’s primary CIDR block can’t be changed after creation (you can attach secondary ranges later, but resizing the original means destroying and recreating the VPC), so I always go bigger than I think I’ll need.
Subnets are where it gets interesting. See that count = 2? Terraform’s way of saying “make two of these.” Each one gets a different CIDR block (calculated by cidrsubnet, which splits the VPC’s range into smaller chunks) and a different availability zone. Public subnets get map_public_ip_on_launch = true, meaning any instance launched into them gets a public IP automatically. Private subnets don’t. That’s the core distinction, and it matters for security: your web servers go in public subnets (they need to be reachable from the internet), while databases and internal services go in private ones (reachable only from within the VPC).
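To make the cidrsubnet arithmetic concrete, here is a small standalone sketch using locals. The local names are made up for this example; the CIDR is the default vpc_cidr from variables.tf. You can evaluate the same expressions interactively in terraform console.

```hcl
locals {
  example_vpc_cidr = "10.0.0.0/16"

  # Adding 8 bits to a /16 prefix yields /24 networks.
  # The third argument selects which /24 within the range.
  example_public_cidrs  = [for i in range(2) : cidrsubnet(local.example_vpc_cidr, 8, i)]
  example_private_cidrs = [for i in range(2) : cidrsubnet(local.example_vpc_cidr, 8, i + 10)]
}

# example_public_cidrs  evaluates to ["10.0.0.0/24", "10.0.1.0/24"]
# example_private_cidrs evaluates to ["10.0.10.0/24", "10.0.11.0/24"]
```

The + 10 offset on the private subnets just leaves a gap, so public and private ranges are easy to tell apart at a glance and there is room to add more public subnets later.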
The route table and its association are what actually make a public subnet “public.” Without a route to the internet gateway, a subnet is private by default, regardless of the public IP setting. I missed this detail on my first attempt and spent an embarrassing amount of time on a Stack Overflow thread titled “EC2 instance has public IP but can’t reach internet” before realizing I hadn’t created the route table association. Classic.
Locking Down the Network and Launching a Server
With the VPC in place, I needed two more things: security rules (who can talk to my server and on which ports) and an actual server to talk to. In AWS, firewall rules live in security groups, and servers are EC2 instances. Both went into their own files.
Here’s security.tf and compute.tf combined. In my actual project they’re separate files, but the logic flows better when you read them together:
# security.tf
resource "aws_security_group" "web" {
  name        = "${var.project_name}-web-sg"
  description = "Allow HTTP, HTTPS, and SSH"
  vpc_id      = aws_vpc.main.id

  ingress {
    description = "HTTP"
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    description = "HTTPS"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    description = "SSH"
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["YOUR_IP/32"] # Replace with your IP
  }

  # Allow all outbound traffic
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# compute.tf
data "aws_ami" "ubuntu" {
  most_recent = true
  owners      = ["099720109477"] # Canonical

  filter {
    name = "name"
    # Canonical publishes its 24.04 (noble) images under hvm-ssd-gp3, not hvm-ssd
    values = ["ubuntu/images/hvm-ssd-gp3/ubuntu-noble-24.04-amd64-server-*"]
  }
}

resource "aws_instance" "web" {
  ami                    = data.aws_ami.ubuntu.id
  instance_type          = var.instance_type
  key_name               = var.key_pair_name
  subnet_id              = aws_subnet.public[0].id
  vpc_security_group_ids = [aws_security_group.web.id]

  root_block_device {
    volume_size = 20
    volume_type = "gp3"
    encrypted   = true
  }

  # Runs once on first boot; <<- strips the leading indentation
  user_data = <<-EOF
    #!/bin/bash
    apt-get update -y
    apt-get install -y nginx
    systemctl start nginx
    systemctl enable nginx
  EOF

  tags = {
    Name = "${var.project_name}-web-server"
  }
}
A few things worth calling out, because they tripped me up or surprised me.
SSH access first. See that cidr_blocks = ["YOUR_IP/32"]? It’s a placeholder, so replace it with your actual public IP followed by /32. Not 0.0.0.0/0. I know it’s tempting (I’ve done it myself during late-night debugging sessions), but leaving SSH open to the entire internet is an invitation for brute-force attacks. If you’re working from a coworking space in Koramangala or a cafe in Connaught Place, your IP changes. Fine. Update the security group. Takes ten seconds with Terraform.
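One way to make that ten-second update cleaner is to lift the address into a variable. This is a sketch under assumptions: ssh_allowed_cidr is a name I’m inventing here, and it is not part of the variables.tf shown earlier.

```hcl
# Hypothetical addition to variables.tf
variable "ssh_allowed_cidr" {
  description = "CIDR allowed to reach port 22, e.g. your public IP followed by /32"
  type        = string

  validation {
    # Reject malformed CIDRs and refuse to open SSH to the whole internet.
    condition     = can(cidrhost(var.ssh_allowed_cidr, 0)) && var.ssh_allowed_cidr != "0.0.0.0/0"
    error_message = "Provide a valid CIDR block other than 0.0.0.0/0."
  }
}

# Then, in the SSH ingress rule of aws_security_group.web:
#   cidr_blocks = [var.ssh_allowed_cidr]
```

With that in place, changing location means editing one line in terraform.tfvars, for example ssh_allowed_cidr = "203.0.113.25/32" (an address from the documentation range, not a real one), and running plan and apply again.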
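The AMI data source is clever. Instead of hard-coding an AMI ID (which changes across regions and gets deprecated), we’re asking AWS “give me the latest Ubuntu 24.04 server image from Canonical.” Dynamic. Always fresh. One less thing to manually track. One caveat: when Canonical publishes a newer image, the AMI ID changes, and the next plan will propose replacing the instance. Read the plan carefully, or pin a specific AMI for servers you want to keep.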
And then there’s user_data. A bash script that runs once when the instance first boots. Ours installs nginx, starts it, and enables it on boot. When Terraform finishes and gives you a public IP, you can paste it into a browser and see the nginx welcome page. That moment — seeing a real web server you built entirely from code files — is probably when Terraform stops feeling academic and starts feeling powerful.
The root_block_device block specifies a 20 GB gp3 volume with encryption. Left alone, the root volume takes its size from the AMI, and for these Ubuntu images that is only 8 GB, which fills up faster than you’d expect once you start installing packages and logging things. I bumped it to 20 after running out of disk space on a staging server at 3 AM. (Yes, a 3 AM story. DevOps has a way of finding you at the worst hours.)
Getting Output and Running the Whole Thing
By this point, you’ve written a lot of HCL. Six files. Multiple resources wired together. But nothing has happened yet. Everything is just text on disk. Terraform hasn’t touched AWS. I remember staring at my project directory, all these .tf files neatly organized, and thinking: “okay, now what?”
First, let’s define what we want Terraform to tell us after it’s done. Outputs. These go in outputs.tf:
# outputs.tf
output "vpc_id" {
  value = aws_vpc.main.id
}

output "public_subnet_ids" {
  value = aws_subnet.public[*].id
}

output "web_server_public_ip" {
  value = aws_instance.web.public_ip
}

output "web_server_public_dns" {
  value = aws_instance.web.public_dns
}
After apply finishes, Terraform prints these values. The public IP is the one you’ll paste into your browser. VPC ID and subnet IDs are useful if other Terraform projects or scripts need to reference this infrastructure later.
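As a sketch of that “other Terraform projects” case: a second configuration can read these outputs through the built-in terraform_remote_state data source. The bucket, key, and region below are the ones from the backend block earlier; the security group is a made-up consumer, only there to show the reference.

```hcl
# In a different Terraform project that needs this VPC's ID
data "terraform_remote_state" "network" {
  backend = "s3"

  config = {
    bucket = "my-terraform-state-bucket"
    key    = "prod/infrastructure.tfstate"
    region = "ap-south-1"
  }
}

resource "aws_security_group" "app" {
  name   = "example-app-sg" # hypothetical
  vpc_id = data.terraform_remote_state.network.outputs.vpc_id
}
```

Only values declared as outputs are visible this way, which is one more reason to keep outputs.tf deliberate.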
Now the workflow itself. Here’s the sequence I run every single time, and I do mean every time, even if I’ve only changed one line:
# Initialize Terraform (downloads providers, sets up backend)
terraform init
# Format code consistently
terraform fmt -recursive
# Validate syntax
terraform validate
# Preview changes (ALWAYS do this before apply)
terraform plan -out=tfplan
# Apply the plan
terraform apply tfplan
# View current state
terraform state list
# Destroy everything when done
terraform destroy
terraform init is the first command you run in any new project. It downloads the AWS provider plugin, connects to your S3 backend, and sets up the working directory. Takes a few seconds. You only need to re-run it if you change the backend configuration, add providers or modules, or change their version constraints.
terraform fmt reformats your code to match HCL conventions. Consistent indentation, aligned equals signs. (It does not reorder your arguments; that part is still on you.) Run it before every commit. Your future self and your teammates will be grateful. I once opened a pull request where the only feedback was “run fmt.” Embarrassing, but now it’s muscle memory.
terraform validate checks syntax without touching AWS. Catches typos, missing required arguments, type mismatches. Fast. Free. No reason not to run it.
And then terraform plan. If there’s one command I could tattoo on every DevOps engineer’s forearm, it’s this one. Plan shows you exactly what Terraform will create, modify, or destroy. Green plus signs for new resources. Yellow tildes for modifications. Red minus signs for deletions. You read the plan. You verify it makes sense. And only then do you run apply.
I save plans to a file (-out=tfplan) so that apply executes exactly what I reviewed. Without the saved plan, Terraform recalculates at apply time, and if someone else pushed a change between your plan and apply… well. You might not get what you expected.
When I ran apply for the first time on this project, Terraform created eleven resources in about ninety seconds. VPC. Internet gateway. Two public subnets. Two private subnets. Route table. Two route table associations. Security group. EC2 instance. (The Ubuntu AMI and availability zone lookups are data sources, so Terraform reads them but doesn’t count them as created.) Eleven resources, all described in code, all reproducible, all destroyable with a single terraform destroy.
What I’d Do Differently If I Started Over
Looking back, there are things I wish I’d known from the start. Not technical things, really — the docs cover those well enough. More like mindset things.
First: don’t try to Terraform everything on day one. I see people on r/devops asking how to manage their entire AWS organization, all accounts, all regions, IAM policies, CloudWatch alarms, S3 lifecycle rules — everything — in Terraform from the start. That way lies madness. Start with one project. One VPC. One instance. Get comfortable with the workflow. Break some things in dev. Learn what terraform state mv does and why you’ll probably need it eventually. Grow from there.
Second: state is sacred. Treat your state file the way you’d treat a production database backup. Don’t edit it by hand. Don’t delete it. Don’t ignore the lock table. A corrupted state file means Terraform loses track of what it’s managing, and recovering from that involves manual imports or, in the worst case, recreating resources. I’ve been there once. Once was enough.
Third: modules. I didn’t mention them in this walkthrough because I think they’re a second-week concept, not a first-day one. But once you’re comfortable with the basics, you’ll want to package your VPC code, your security group patterns, and your compute setups into reusable modules. The Terraform Registry has community modules for almost everything. The widely used community AWS VPC module (terraform-aws-modules/vpc/aws), for instance, handles edge cases I never would have thought of, and it’s battle-tested by thousands of users.
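To give a feel for it, here is roughly what the VPC from this walkthrough might look like with that community module. Treat it as a sketch: the version pin is an assumption (check the registry for the current release), the availability zones are hard-coded for brevity, and I haven’t shown the module’s many optional inputs.

```hcl
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0" # assumed; check the registry for the current release

  name = "${var.project_name}-${var.environment}"
  cidr = var.vpc_cidr

  azs             = ["ap-south-1a", "ap-south-1b"]
  public_subnets  = ["10.0.0.0/24", "10.0.1.0/24"]
  private_subnets = ["10.0.10.0/24", "10.0.11.0/24"]

  enable_nat_gateway = false # a NAT gateway costs money; enable only when private subnets need outbound access
}
```

The module creates the internet gateway, route tables, and associations for you, which is most of what vpc.tf did by hand.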
Fourth: terraform import. If you already have AWS resources you created manually (I certainly did), you can import them into Terraform’s state so it starts managing them as code. Hard to say exactly how smooth the process is; it depends on the resource type and complexity. Some imports are painless. The classic terraform import command expects you to write the matching HCL first and then import, which feels backwards but works. (Since Terraform 1.5 you can also declare imports in import blocks and have terraform plan -generate-config-out draft the HCL for you.) I spent a whole Saturday importing a legacy VPC from our staging account, and by the end I had full Terraform coverage of infrastructure that had been click-built over two years.
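For reference, the declarative form of an import (available in Terraform 1.5 and later) looks roughly like this. The VPC ID and CIDR are placeholders I made up, and aws_vpc.legacy is a hypothetical resource name.

```hcl
# Tell Terraform that an existing VPC should be tracked as aws_vpc.legacy
import {
  to = aws_vpc.legacy
  id = "vpc-0123456789abcdef0" # hypothetical VPC ID
}

resource "aws_vpc" "legacy" {
  # Must match the real VPC; a different CIDR would make the plan
  # propose replacing it instead of importing it cleanly.
  cidr_block = "10.1.0.0/16"
}
```

Run terraform plan afterwards and read it closely: the goal is a plan that shows the import and no changes. Once the import has been applied, the import block can be removed.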
Fifth, and maybe most important: version control your .tf files the same way you version control application code. Pull requests. Code reviews. CI pipelines that run terraform plan on every PR so reviewers can see what the change will do before approving it. A colleague at an e-commerce company in Delhi told me they caught a misconfigured security group in PR review that would have opened port 3306 (MySQL) to the public internet. Code review saved them from a potential data breach. You can’t code-review a console click.
The Moment It All Connected
About three weeks after my VPC deletion disaster, I had our entire dev environment in Terraform. Seven files. Around 350 lines of HCL. One terraform apply brought up everything we needed: networking, security, compute, DNS entries, even the S3 buckets for static assets. And one terraform destroy tore it all down when we didn’t need it — saving us roughly 8,000 rupees a month on dev resources that used to run 24/7 because nobody wanted to recreate them manually.
My manager — the same one who called me four minutes after the VPC incident — started requesting Terraform for everything. New staging environment for a client demo? Terraform. Disaster recovery setup in a second region? Terraform. Quick proof-of-concept for a product pitch? Terraform up, demo it, Terraform down.
But here’s the thing that’s been gnawing at me, and I don’t have a clean answer for it. Terraform is powerful, but it’s also one more layer of abstraction between you and your infrastructure. When something goes wrong at 3 AM (it’s always 3 AM), you still need to understand what a VPC is, how route tables work, what CIDR notation means, why a security group rule isn’t behaving the way you expected. Terraform won’t debug that for you. It just executes your declarations. Garbage in, garbage out — except the garbage is production infrastructure.
So I wonder. If you’re reading this and you’ve never built a VPC by hand — never clicked through the AWS console subnet by subnet, never fat-fingered a CIDR block and spent an hour figuring out why nothing could talk to anything — should you skip the manual experience entirely and go straight to Terraform? Or is there something valuable in the clicking, the struggling, the deleting-the-wrong-thing-and-rebuilding-it? Does the pain teach you something the abstraction can’t?
I honestly don’t know. I know my mistakes made me better at Terraform, because I understood what the code represented. But maybe you could get there faster, cleaner, without the scars. What do you think — should you learn the console first, or just start with code and never look back?