Infrastructure as Code

Ansible Documentation

Ansible automates configuration and provisioning over SSH.

Level: Intermediate

What Is Ansible?

Ansible is an agentless automation tool: it connects to managed hosts over SSH and applies tasks described in YAML playbooks. This guide treats it as an intermediate-level DevOps tool. It helps teams standardize workflows and reduce manual effort.

Why We Use It

Teams use Ansible to improve speed, reliability, and consistency. It reduces repetitive manual work, lowers failure risk, and makes collaboration easier across development and operations.

Where It Fits In DevOps

It converts infrastructure changes into versioned code, making cloud operations safer, repeatable, and auditable.

From Beginner To End-to-End

1. Foundations

Start with core Ansible concepts and basic setup so you can use it safely in day-to-day work.

- Understand Ansible fundamentals

- Set up local/dev environment

- Run first working example

2. Team Workflow

Integrate Ansible into real team practices with repeatable conventions and collaboration patterns.

- Adopt standards and naming conventions

- Integrate with repositories and CI/CD

- Create reusable templates

3. Production Operations

Use Ansible in production with observability, security, and rollback plans.

- Monitor behavior and failures

- Secure access and secrets

- Define incident and rollback flow

4. Scale and Optimization

Continuously improve reliability, performance, and cost while standardizing usage across services.

- Improve performance and cost

- Automate compliance checks

- Document best practices for the team

Key Concepts

- Inventory

- Playbooks

- Roles

Learning Path

- Inventory design

- Playbook authoring

- Role-based reuse

Real Use Cases

- Provisioning infrastructure

- Configuring multi-environment stacks

- Automated change management

Beginner Learning Plan

- Read the Ansible basics and terminology

- Run at least one hands-on mini project

- Break and fix a small setup to build confidence

- Document your first repeatable workflow

Advanced / Production Plan

- Integrate Ansible with your full delivery pipeline

- Add security and policy checks

- Add observability and incident playbooks

- Define reusable standards for multiple services

Common Mistakes

- Using defaults in production without security hardening

- Skipping monitoring and post-deployment validation

- No rollback strategy for failed changes

- Over-complex setup before mastering fundamentals

Production Readiness Checklist

- Access control and least privilege applied

- Secrets managed securely

- Monitoring and alerting enabled

- Rollback and recovery process tested

- Documentation updated for team onboarding

Installation Guide

Install Ansible on a Debian or Ubuntu host (the commands below use apt), create a minimal inventory, and verify the installation.

Install Ansible

sudo apt update && sudo apt install -y ansible

Create inventory and ping

printf '[local]\nlocalhost ansible_connection=local\n' > hosts.ini
ansible -i hosts.ini all -m ping

Verify install

ansible --version

Quick Start

Ping hosts (assumes an inventory is already configured; otherwise pass one with -i, for example -i hosts.ini)

ansible all -m ping

Run playbook

ansible-playbook site.yml

Check syntax

ansible-playbook --syntax-check site.yml
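
The commands above assume that a playbook named site.yml exists. A minimal sketch of such a file is shown here; the play name and the single task are illustrative assumptions, not content from this guide.

```yaml
# site.yml (illustrative minimal playbook)
- name: Verify connectivity to every host in the inventory
  hosts: all
  gather_facts: false        # skip fact gathering to keep the first run fast
  tasks:
    - name: Ping each host
      ansible.builtin.ping:  # checks login and a usable Python, not ICMP
```

With this file in place, `ansible-playbook --syntax-check site.yml` should parse it without running any task.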

Common Commands

Simple command list with short descriptions.

ansible --version

Show Ansible version.

ansible all -m ping

Connectivity check via the ping module (verifies login and a usable Python on each host; it is not an ICMP ping).

ansible all -a 'uptime'

Run ad-hoc command on all hosts.

ansible web -a 'df -h'

Run command on group `web`.

ansible-inventory -i inventory.ini --list

Print parsed inventory.

ansible-inventory -i inventory.ini --graph

Show inventory graph.

ansible-playbook site.yml

Run a playbook.

ansible-playbook site.yml --check

Dry run playbook changes.

ansible-playbook site.yml --diff

Show diffs for changed templates/files.

ansible-playbook site.yml --limit web

Run playbook on a group.

ansible-playbook site.yml -t nginx

Run only specific tags.

ansible-playbook site.yml --start-at-task "Install packages"

Resume from a task.

ansible-doc copy

Show module docs.

ansible-galaxy collection list

List installed collections.

ansible-galaxy collection install community.general

Install a collection.

ansible-galaxy init role_name

Create a new role structure.

Reference

Official documentation:

https://docs.ansible.com/

Complete Guide

A full, structured guide for this tool (with commands, diagrams, best practices, and learning path).

Ansible

A complete DevOpsLabX guide for Ansible: what it is, why we use it, key concepts, commands, best practices, and how to learn it.

At A Glance

  • Category: Infrastructure as Code
  • Difficulty: Intermediate
  • Outcome: learn the fundamentals, then build real workflows, then make it production-ready

Prerequisites

  • Basic cloud concepts (networking, compute, IAM)
  • Git basics for versioning infrastructure
  • Basic CLI usage and environment variables

Glossary

  • Declarative: You define desired state; tooling makes it real.
  • Drift: Real infra differs from the code/state.
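  • Idempotent: Running again yields the same result (no unintended changes).
  • Remote state: Shared state storage with locking, used by state-file-based IaC tools. Ansible keeps no state file; it inspects each host at run time.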
  • Promotion: Moving changes from dev to stage to prod.

Architecture Diagram

A real, visual mental model of how Ansible fits into a typical workflow.

Ansible Workflow

[Diagram: IaC Code (repo) → Validate (fmt + lint) → Plan (diff preview) → State (lock + drift) → Apply (protected) → Cloud (resources)]

This diagram is a practical mental model, not vendor-specific.

Reference Architecture (Production)

A production-oriented view: guardrails, checks, and the parts that matter when it breaks.

Production Reference Flow

[Diagram: IaC Code (repo) → Validate (fmt + lint) → Plan (diff preview) → State (lock + drift) → Apply (protected) → Cloud (resources)]

This diagram is a practical mental model, not vendor-specific.

Concept Deep Dive

Inventory

The inventory lists the hosts Ansible manages and organizes them into groups. It can also carry per-host and per-group variables.

Why it matters: Understanding Inventory helps you design safer workflows and troubleshoot issues faster.

Practice:

  • Explain Inventory in your own words (1 minute rule).
  • Find where Inventory appears in real docs/configs for Ansible.
  • Create a small example that uses Inventory, then break it and fix it.
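
As a starting point for the practice items above, here is a small inventory sketch in Ansible's YAML inventory format. The host names, the `db` group, and the variables are hypothetical; only the `web` group name comes from the commands elsewhere in this guide.

```yaml
# inventory.yml (hypothetical hosts and variables)
all:
  children:
    web:
      hosts:
        web1.example.com:
        web2.example.com:
      vars:
        http_port: 8080          # group variable shared by all web hosts
    db:
      hosts:
        db1.example.com:
          ansible_user: deploy   # host variable: SSH user for this host only
```

Running `ansible-inventory -i inventory.yml --graph` against a file like this should show both groups under `all`.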

Playbooks

A playbook is a YAML file containing one or more plays. Each play maps a set of hosts to an ordered list of tasks.

Why it matters: Understanding Playbooks helps you design safer workflows and troubleshoot issues faster.

Practice:

  • Explain Playbooks in your own words (1 minute rule).
  • Find where Playbooks appear in real docs/configs for Ansible.
  • Create a small example that uses Playbooks, then break it and fix it.
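
A playbook that exercises tasks, tags, and handlers might look like the sketch below. It assumes Debian-family hosts in a `web` group and a template file at templates/site.conf.j2; both are assumptions for illustration. The task name "Install packages" and the `nginx` tag match the command examples elsewhere in this guide.

```yaml
# site.yml (illustrative; assumes Debian-family hosts and an existing template)
- name: Configure web servers
  hosts: web
  become: true
  tasks:
    - name: Install packages
      ansible.builtin.apt:
        name: nginx
        state: present
        update_cache: true
      tags: [nginx]

    - name: Deploy nginx site config
      ansible.builtin.template:
        src: templates/site.conf.j2      # hypothetical template path
        dest: /etc/nginx/conf.d/site.conf
        mode: "0644"
      notify: Reload nginx               # handler runs only if the file changed
      tags: [nginx]

    - name: Ensure nginx is running
      ansible.builtin.service:
        name: nginx
        state: started
        enabled: true
      tags: [nginx]

  handlers:
    - name: Reload nginx
      ansible.builtin.service:
        name: nginx
        state: reloaded
```

Breaking it on purpose (for example, misspelling a module name) and then running `ansible-playbook --syntax-check site.yml` is one way to do the break-and-fix exercise.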

Roles

A role packages related tasks, handlers, variables, templates, and files in a standard directory layout so they can be reused across playbooks.

Why it matters: Understanding Roles helps you design safer workflows and troubleshoot issues faster.

Practice:

  • Explain Roles in your own words (1 minute rule).
  • Find where Roles appear in real docs/configs for Ansible.
  • Create a small example that uses Roles, then break it and fix it.
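
The sketch below shows how a small role could be wired together. It combines three separate files in one listing, separated by `---`; the role name `nginx` and the variable `nginx_package` are hypothetical. The directory skeleton itself can be generated with `ansible-galaxy init`.

```yaml
# roles/nginx/defaults/main.yml  (lowest-precedence defaults, easy to override)
nginx_package: nginx
---
# roles/nginx/tasks/main.yml  (entry point for the role's tasks)
- name: Install nginx
  ansible.builtin.apt:
    name: "{{ nginx_package }}"
    state: present
---
# site.yml  (a play that applies the role to the web group)
- name: Apply nginx role
  hosts: web
  become: true
  roles:
    - role: nginx
```

Keeping defaults in defaults/main.yml lets each environment override them from inventory variables without editing the role.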

Tutorial Series

A tutorial-style sequence (like a handbook). Do these in order to build skill from beginner to production.

Tutorial 1: Your First Check and Run

Goal: Apply one small change to a test host and see how check mode previews changes and reveals drift.

Steps:

  1. Verify you understand what the tool does and what problem it solves.
  2. Install or enable it on your machine (or in a sandbox environment).
  3. Run the smallest working example and write down what happened.
  4. Create a minimal playbook, then run --syntax-check and a --check --diff dry run.
  5. Run the playbook for real, then run it again to confirm the second run changes nothing.

Checkpoints:

  • You understand check mode (--check) vs a real run
  • You keep playbooks and inventory versioned in Git

Exercises:

  • Introduce a change and preview it with --check --diff
  • Write a naming convention for plays, tasks, and roles

Tutorial 2: Roles and Environments

Goal: Structure code so it scales across dev/stage/prod.

Steps:

  1. Extract repeated tasks into a role.
  2. Parameterize variables for each environment.

Checkpoints:

  • You can reuse code safely
  • You can promote changes across environments

Exercises:

  • Set up a separate inventory for each environment
  • Write a runbook for recovering from a failed or partial run
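
One way to parameterize variables per environment is a group_vars directory next to each environment's inventory. The sketch below shows two separate files in one listing, separated by `---`; the directory layout, the variable names, and the values are assumptions for illustration.

```yaml
# inventories/dev/group_vars/web.yml  (applies to the web group in dev)
http_port: 8080
app_env: dev
---
# inventories/prod/group_vars/web.yml  (same keys, production values)
http_port: 80
app_env: prod
```

The same playbook can then be pointed at either environment, for example `ansible-playbook -i inventories/dev/hosts.yml site.yml`, assuming an inventory file named hosts.yml exists in each directory.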

Beginner To Advanced Path

Beginner Path (Foundations)

What to learn:

  • Learn Ansible terminology and the “why” behind it
  • Install/setup and run a first working example
  • Understand the main components and the default workflow
  • Learn safe debugging: where to look when something fails
  • Build a small checklist for your own repeatable setup
  • Write notes (commands, errors, fixes) while learning

Hands-on labs:

  • Follow a hello-world style tutorial and document every step
  • Break one config intentionally and fix it (learn error patterns)
  • Write a 10-command cheat sheet you can reuse later
  • Create a simple diagram of the tool’s flow in your own words

Milestones:

  • You can explain the tool in 2 minutes
  • You can reproduce a working setup from scratch
  • You can troubleshoot the top 3 common failures
  • You can share a clean quick-start with someone else

Intermediate Path (Real Workflows)

What to learn:

  • Use the tool inside a realistic DevOps workflow
  • Create reusable templates/configs and standard naming conventions
  • Add security basics: secrets handling and least privilege
  • Reduce toil: automate repeated steps and build confidence
  • Make the workflow faster and safer (cache, validations, checks)
  • Document the workflow as if onboarding a new teammate

Hands-on labs:

  • Integrate it with a CI pipeline (lint/build/test/deploy style flow)
  • Parameterize config for dev/stage/prod environments
  • Create a runbook: steps to validate and roll back a change
  • Add a preflight validation step that blocks unsafe changes
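
For the CI integration lab, a pipeline can start with nothing more than a syntax check and a lint pass. The sketch below assumes GitHub Actions as the CI system and a playbook named site.yml at the repository root; the workflow file name, job name, and Python version are hypothetical, and other CI systems would express the same steps differently.

```yaml
# .github/workflows/ansible-ci.yml (hypothetical file; GitHub Actions assumed)
name: ansible-ci
on: [pull_request]
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - name: Install Ansible and ansible-lint
        run: pip install ansible ansible-lint
      - name: Syntax check
        run: ansible-playbook --syntax-check site.yml
      - name: Lint
        run: ansible-lint site.yml
```

Neither step touches real hosts, so this stage is safe to run on every pull request.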

Milestones:

  • You can onboard another person with your docs
  • You can run the tool consistently across environments
  • You can explain tradeoffs (speed vs safety, flexibility vs complexity)
  • You can debug failures using logs/outputs without guesswork

Advanced Path (Production & Scale)

What to learn:

  • Operate the tool safely in production with guardrails
  • Add observability: metrics/logs/traces and meaningful alerts
  • Optimize performance/cost and standardize across multiple services
  • Design failure modes and recovery (rollback, restore, incident flow)
  • Create upgrade strategy and test it (versioning, compatibility)
  • Create ownership: docs, alerts, dashboards, and operational SLAs

Hands-on labs:

  • Add policy checks (security scans, approvals, protected environments)
  • Load test or scale test the workflow and measure bottlenecks
  • Create an incident simulation and write a postmortem template
  • Automate audits: drift checks, compliance checks, and reports

Milestones:

  • You can detect failures quickly and recover safely
  • You can maintain the setup long-term (upgrade strategy, docs, ownership)
  • You can explain architecture decisions and alternatives
  • You can standardize patterns across multiple services/teams

Hands-On Labs

Beginner Labs

  • Install/setup and verify version
  • Run the smallest working example
  • Change one parameter and observe the behavior
  • Cause a safe failure and document the fix

Intermediate Labs

  • Integrate into a realistic workflow (pipeline, deploy, or automation)
  • Parameterize configuration for two environments
  • Add validation and rollback steps
  • Write a runbook (steps + commands) for common failures
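
A validation step can be sketched as a preflight play built on the assert module, which stops the run when a condition is false. The supported OS family and the `app_env` variable below are assumptions chosen for illustration.

```yaml
# preflight.yml (illustrative checks; adjust conditions to your environment)
- name: Preflight checks
  hosts: all
  gather_facts: true
  tasks:
    - name: Refuse to run on unsupported distributions
      ansible.builtin.assert:
        that:
          - ansible_facts['os_family'] == "Debian"
        fail_msg: "This playbook only supports Debian-family hosts"

    - name: Require an explicit target environment
      ansible.builtin.assert:
        that:
          - app_env is defined
          - app_env in ['dev', 'stage', 'prod']
        fail_msg: "Set app_env to dev, stage, or prod"
```

Placing this play first in a playbook means a failed check stops that host before any change is attempted.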

Advanced Labs

  • Add guardrails (policy checks, approvals, least privilege)
  • Add observability and meaningful alerts
  • Load/scale test and identify bottlenecks
  • Create an upgrade + rollback plan and test it

Advanced Topics

  • Role and collection design patterns and versioning strategy
  • Inventory sources and recovery from failed or partial runs
  • Policy as code and guardrails (deny unsafe changes)
  • Drift detection and remediation around Ansible
  • Change management: phased rollouts and blast-radius control
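
Phased rollouts and blast-radius control map onto the play keywords `serial` and `max_fail_percentage`. The sketch below assumes Debian-family hosts in a `web` group serving HTTP on localhost; the package and the health-check URL are illustrative.

```yaml
# rolling-update.yml (illustrative; one host at a time, stop on first failure)
- name: Rolling update of web servers
  hosts: web
  become: true
  serial: 1                 # batch size; a percentage such as "25%" also works
  max_fail_percentage: 0    # abort the play if any host in a batch fails
  tasks:
    - name: Upgrade nginx
      ansible.builtin.apt:
        name: nginx
        state: latest
        update_cache: true

    - name: Verify the site responds before moving to the next batch
      ansible.builtin.uri:
        url: http://localhost/
        status_code: 200
```

Because the health check runs inside each batch, a bad upgrade is caught after the first host rather than after the whole group.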

Production Patterns

  • Scheduled check-mode runs (--check --diff) for drift detection
  • Small PRs with check-mode output attached
  • Environment separation (dev/stage/prod) and promotions
  • Least privilege credentials for Ansible automation
  • Change windows and rollback/restore planning for risky infra changes

Real-World Scenarios

  • Use Ansible to provision and configure infrastructure with reviewable code and check-mode previews.
  • Prevent outages with small, reversible changes rolled out in batches.
  • Detect and fix drift between code and the real environment.

Troubleshooting

  • Reproduce the issue with the smallest possible example
  • Check logs/output first, then configuration, then permissions/credentials
  • Validate inputs (versions, environment variables, file paths, network access)
  • Rollback to last known-good state if production is affected
  • Write down the root cause and add a guardrail so it does not repeat

Runbook Templates

Use these templates to make your docs feel like real production documentation.

Deploy Runbook

  • Purpose
  • Preconditions (secrets, access, approvals)
  • Steps to deploy (exact commands)
  • Post-deploy verification (health checks)
  • Rollback steps
  • Owner and escalation

Incident Triage Runbook

  • Impact assessment (who is impacted?)
  • Current signals (errors, latency, saturation)
  • Recent changes (deploys, config, infra)
  • First checks (logs, health endpoints, dependencies)
  • Mitigation steps (rate limiting, rollback, scale)
  • Follow-up actions (postmortem, guardrails)

Checklist (Copy/Paste)

  • What changed since it last worked?
  • What do logs say at the exact failure time?
  • Is the service reachable on the expected port and DNS?
  • Are credentials/permissions valid?
  • Is disk full, memory exhausted, or CPU pegged?
  • Do we have a safe rollback plan and is it tested?

Security & Best Practices

  • Never hardcode secrets in code or commits
  • Use least privilege (roles, scopes, minimal permissions)
  • Prefer reproducible builds/configs over manual steps
  • Add validations before applying changes (lint, --syntax-check, --check dry run)
  • Keep documentation and runbooks updated
  • Version pin critical dependencies and plan upgrades
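
To keep secrets out of plain-text code, variables can live in a file encrypted with ansible-vault and be loaded at run time. This is a minimal sketch: the file vars/secrets.yml, the variable `db_password`, and the destination path are hypothetical, and the destination directory is assumed to exist.

```yaml
# site.yml (illustrative; vars/secrets.yml is encrypted with
# `ansible-vault encrypt vars/secrets.yml` and defines db_password)
- name: Configure application with a secret
  hosts: web
  become: true
  vars_files:
    - vars/secrets.yml
  tasks:
    - name: Write database password file
      ansible.builtin.copy:
        content: "{{ db_password }}"
        dest: /etc/myapp/db_password   # hypothetical path
        mode: "0600"
      no_log: true                     # keep the secret out of task output
```

Run it with `ansible-playbook site.yml --ask-vault-pass` so the vault password is supplied interactively instead of being stored in the repository.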

Common Error Patterns

Symptom

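A task reports "changed" on every run

Likely cause: Drift, unstable values, or non-idempotent tasks (for example, command or shell tasks with no change condition)

Fix steps:

  • Stop manual changes or reflect them in the playbooks
  • Prefer purpose-built modules over command/shell; otherwise set creates or changed_when
  • Review the module documentation for how it decides whether something changed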
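
When changes are reported on every run, one common Ansible-specific culprit is a command task, which reports "changed" each time unless told otherwise. The sketch below shows two ways to make such tasks idempotent; the `myapp` binary and its paths are hypothetical.

```yaml
# idempotent-commands.yml (illustrative)
- name: Idempotent command examples
  hosts: all
  tasks:
    - name: Initialize app data directory only once
      ansible.builtin.command:
        cmd: /usr/local/bin/myapp init --data-dir /var/lib/myapp
        creates: /var/lib/myapp/.initialized   # skip if this file already exists

    - name: Read kernel version without reporting a change
      ansible.builtin.command: uname -r
      register: kernel_version
      changed_when: false                      # read-only command never "changes"
```

The `creates` approach only works if the command actually produces the marker file, so that is worth confirming for any real tool.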

Symptom

Playbook run fails midway, leaving hosts or resources partially configured

Likely cause: Quota limits, ordering issues, or transient API failures

Fix steps:

  • Re-run after fixing the root cause (idempotent design)
  • Split changes into smaller batches
  • Add explicit dependencies when required
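
For failures in the middle of a change, a block with a rescue section can restore the previous state on the affected host. This is a sketch under assumptions: the file paths are hypothetical, and it relies on the copy module's `backup` option recording a `backup_file` path when an existing file is replaced.

```yaml
# deploy-with-rescue.yml (illustrative; paths are hypothetical)
- name: Deploy config with a rescue path
  hosts: web
  become: true
  tasks:
    - name: Deploy new config and validate it
      block:
        - name: Install new config, keeping a backup of the old file
          ansible.builtin.copy:
            src: files/site.conf
            dest: /etc/nginx/conf.d/site.conf
            backup: true
          register: config_copy

        - name: Validate nginx configuration
          ansible.builtin.command: nginx -t
          changed_when: false
      rescue:
        - name: Restore previous config from the backup
          ansible.builtin.copy:
            src: "{{ config_copy.backup_file }}"
            dest: /etc/nginx/conf.d/site.conf
            remote_src: true
          when: config_copy.backup_file is defined

        - name: Fail the play so the problem stays visible
          ansible.builtin.fail:
            msg: "nginx config validation failed; previous file restored if a backup existed"
```

The explicit fail at the end matters: without it, a successful rescue would mark the host as recovered and hide the original error.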

FAQ

What is Ansible used for?

Ansible is used to standardize and automate parts of delivery and operations so teams can ship faster and more reliably.

How long does it take to learn Ansible?

You can get productive in days with fundamentals, but production mastery comes from building workflows, debugging failures, and operating it over time.

What should I learn before Ansible?

Learn basic Linux + Git first, then follow the prerequisites section. Fundamentals make every advanced topic easier.

How do I use Ansible safely in production?

Add guardrails: least privilege, validation before apply/deploy, monitoring, and a tested rollback plan.

Mini Projects

  • Build a small project that uses Ansible in a realistic workflow
  • Write a checklist for production usage
  • Create a troubleshooting runbook for common failures
  • Create a one-page internal doc: setup, usage, debugging, rollback

Interview Questions

  • Explain what Ansible is and where it fits in DevOps.
  • Describe a real problem you solved using Ansible.
  • What can go wrong in production, and how do you detect and recover?
  • Ansible keeps no state file; how does it decide what to change on each run?
  • How do you handle drift and manual changes?
  • How do you structure roles and environments?
