
Terraform state: Divide and conquer

Milan Ogrizović

Site Reliability Engineer

When setting up Terraform for managing cloud infrastructure, many teams have one big Terraform state per environment. While it may seem like the simplest approach, things can quickly become messy as more teams start contributing.

  • Engineers have to wait in line to apply their changes. If someone is updating networking, another engineer changing an AKS setting has to wait.
  • Small mistakes can take down critical resources because everything is in one big state file.
  • Terraform plan and apply times become painfully slow as the infrastructure grows.
  • Developers can’t own their infra since everything is tangled together.

So, you might want to break the Terraform state into smaller, independent states. It isn’t an overnight fix, but once you get it working, it is a game-changer.

Why we moved to multi-state Terraform

1. No more waiting to apply changes

With one big state file, Terraform's state lock permits only one apply at a time, so engineers had to take turns. If networking updates were in progress, anyone trying to deploy a new storage account had to wait.

By splitting the state into separate parts – such as networking, shared infra, application infra, and monitoring – the issue was resolved. Now, multiple teams can deploy in parallel without stepping on each other’s toes.
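Concretely, each area becomes its own root module with its own backend configuration. A minimal sketch, with illustrative storage account, container, and resource group names:

```hcl
# backend.tf in the networking stack (names are illustrative)
terraform {
  backend "azurerm" {
    resource_group_name  = "rg-terraform-state"
    storage_account_name = "terraformstate"
    container_name       = "networking"
    key                  = "networking.tfstate"
  }
}

# backend.tf in the application-infra stack: same storage account,
# different container and key, so its blob lease never blocks networking
terraform {
  backend "azurerm" {
    resource_group_name  = "rg-terraform-state"
    storage_account_name = "terraformstate"
    container_name       = "app-infra"
    key                  = "app-infra.tfstate"
  }
}
```

Each stack acquires its own blob lease on apply, so runs in different stacks never contend for the same lock.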

2. Breaking stuff is less scary

When everything is in one state file, even the smallest mistake can have big consequences. One wrong terraform apply, and suddenly your production database is gone.

Now that we have smaller, isolated states, a mistake in one area won’t affect everything else. If someone misconfigures AKS, it won’t accidentally delete networking resources or monitoring setups.

3. Letting developers own their infra

We wanted to shift left and enable developers to provision their own resources. However, Terraform can be tricky, and we didn’t want every team reinventing the wheel.

The solution? Pre-built Terraform modules for common use cases. 

Some examples include:

  • A standard AKS module with best practices baked in.
  • A Postgres module that handles backups and security by default.
  • A storage account module that enforces encryption and access policies.

This approach allowed developers to spin up infrastructure without needing deep Terraform knowledge. They just use the modules, tweak some configurations, and deploy.
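From a developer's point of view, an entire stack then reduces to a short module call. The module source URL and input names below are hypothetical, just to show the shape:

```hcl
# A developer's entire AKS footprint: pick a module, tweak a few knobs
module "aks" {
  source = "git::https://dev.azure.com/example/terraform-modules.git//aks?ref=v1.2.3"

  cluster_name = "team-a-aks"
  node_count   = 3
  # networking, logging, and security defaults come baked into the module
}
```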

4. Automating CI/CD for Terraform

Managing multiple states manually would have been a nightmare, so we automated the whole process with CI/CD. Our pipeline:

  • Enforces best practices – pipeline templates set up the right backend and state locking automatically.
  • Adds approval steps for higher environments, so production changes get reviewed.
  • Uses Azure Blob Storage for state locking, so we don’t have to worry about state conflicts.

For dev environments, we kept things flexible. Developers can experiment freely, as the smaller states mean the blast radius is small.

Lessons learned

1. Version your modules, or you’ll regret it

When multiple states rely on the same Terraform module, versioning becomes critical. We learned this the hard way – someone updated a shared module, and suddenly, half of our infrastructure was broken.

Now, we strictly version all modules:

  • Only use tagged versions (e.g., v1.2.3), never main.
  • New versions are tested in lower environments before being deployed to staging or production.
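In source references, pinning looks like this (the git URL is made up for illustration):

```hcl
# Pinned to a tagged release — upgrading is an explicit, reviewable change,
# not something that happens silently on the next plan
module "postgres" {
  source = "git::https://dev.azure.com/example/terraform-modules.git//postgres?ref=v1.2.3"

  # module inputs go here
}
```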

2. Automate bootstrapping new states

Manually setting up a new Terraform state is tedious. We automated the process, so spinning up a new state is seamless:

  • Creates the Azure Blob Storage backend for state storage.
  • Creates a Resource Group.
  • Sets up IAM permissions (service principal or managed identities).
  • Creates the Azure DevOps Pipelines configuration.

Now, setting up a new state is quick and painless.
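The first three steps can themselves be expressed in Terraform and run once from a small bootstrap stack; the resource names and the pipeline identity variable here are illustrative:

```hcl
# One-time bootstrap for a new state's backing infrastructure
resource "azurerm_resource_group" "state" {
  name     = "rg-terraform-state"
  location = "East US"
}

resource "azurerm_storage_account" "state" {
  name                     = "terraformstate"
  resource_group_name      = azurerm_resource_group.state.name
  location                 = azurerm_resource_group.state.location
  account_tier             = "Standard"
  account_replication_type = "LRS"
}

resource "azurerm_storage_container" "networking" {
  name                  = "networking"
  storage_account_name  = azurerm_storage_account.state.name
  container_access_type = "private"
}

# Grant the pipeline identity (service principal or managed identity)
# access to the state container
resource "azurerm_role_assignment" "pipeline" {
  scope                = azurerm_storage_account.state.id
  role_definition_name = "Storage Blob Data Contributor"
  principal_id         = var.pipeline_principal_id
}
```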

3. Cross-state dependencies need a strategy

Splitting state comes with a downside: how do you handle dependencies? For example, the application infra state needs to reference networking resources from the networking state.

Our fix? Terraform remote state:

data "terraform_remote_state" "networking" {
  backend = "azurerm"
  config = {
    resource_group_name  = "rg-terraform-state"
    storage_account_name = "terraformstate"
    container_name       = "networking"
    key                  = "networking.tfstate"
  }
}

resource "azurerm_kubernetes_cluster" "aks" {
  name                = "aks-cluster"
  resource_group_name = "rg-aks"
  location            = "East US"
  dns_prefix          = "myaks"

  default_node_pool {
    name           = "agentpool"
    node_count     = 3
    vm_size        = "Standard_DS2_v2"
    vnet_subnet_id = data.terraform_remote_state.networking.outputs.aks_subnet_id
  }

  #…
}

This lets us dynamically reference values from other states without hardcoding them.
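The flip side is that a value is only visible across states if the owning state explicitly exports it. For the example above, the networking stack would need something like (subnet resource name is illustrative):

```hcl
# In the networking stack — only values declared as outputs
# are reachable through terraform_remote_state
output "aks_subnet_id" {
  value = azurerm_subnet.aks.id
}
```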

Final thoughts

Splitting Terraform state made a huge difference for us. Now, we can:

  • Deploy changes in parallel without waiting.
  • Reduce risk – mistakes only affect a small part of the infra.
  • Empower developers to manage their own resources.
  • Standardize best practices through automated CI/CD and pre-built modules.

But this approach isn’t perfect – you need good module versioning, automation for bootstrapping, and a solid strategy for cross-state dependencies.

If you’re dealing with Terraform bottlenecks, breaking up state might be worth it.

ProductDock | Milan Ogrizović


Milan is a software engineer with over eight years of experience in the IT industry and a background in backend development, having transitioned into an SRE/Cloud Engineer role over the past three years. He specializes in cloud infrastructure, Kubernetes, Terraform, and CI/CD, focusing on building scalable, resilient, and developer-friendly platforms. Passionate about DevOps culture and observability, he believes good infrastructure should be invisible when it works right.

