Akash (Guest)
For years, Infrastructure as Code (IaC) has been the gold standard in DevOps. We write templates, define states, run plans, and apply changes. It's a reliable system, but in today's fast-paced world, it can feel rigid. Engineers often spend more time learning domain-specific languages (DSLs) and memorizing commands than actually solving problems.
But what if you could just describe what you want, and an intelligent system took care of the rest? That's the promise of Agentic AI in DevOps, and it's poised to revolutionize how we manage infrastructure.
The Current State of AI in DevOps
Right now, the reality of AI in DevOps is still in its early stages. Most engineers use Generative AI to do little more than generate Terraform modules. The workflow remains the same: run terraform plan, then terraform apply, and the resources are created. It's a helpful starting point, but it's far from the fully autonomous systems we envision.

Some teams have experimented with AI agents in various scenarios, but the results have often fallen short of expectations. This has led to a predictable reaction: a return to the old way of doing things. But here's the key: even with fully autonomous agents, we wouldn't jump straight to one-click infrastructure. For the foreseeable future, we'll still need the "plan and act" cycle, not just for safety, but for certainty. Engineers want to see what's about to change before it happens.
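That plan-and-act gate can survive even when an agent is the one proposing changes. Here is a minimal sketch of the idea; the plan structure and resource names are hypothetical, and the confirmation hook stands in for an interactive prompt:

```python
# Minimal "plan and act" gate: render the proposed diff, then require
# explicit approval before anything is applied.
# The change records and resource names below are hypothetical.

def render_plan(changes):
    """Format proposed changes the way an engineer would review them."""
    lines = []
    for change in changes:
        lines.append(f"{change['action']:>8}  {change['resource']}")
    return "\n".join(lines)

def apply_with_confirmation(changes, confirm):
    """Apply changes only if the reviewer approves the rendered plan."""
    plan_text = render_plan(changes)
    if not confirm(plan_text):
        return "aborted"
    for change in changes:
        # In a real system this would call the cloud provider's API.
        print(f"applying: {change['action']} {change['resource']}")
    return "applied"

proposed = [
    {"action": "create", "resource": "aws_s3_bucket.logs"},
    {"action": "update", "resource": "aws_instance.web"},
]

# Auto-approved here for demonstration; interactively you would read
# the reviewer's answer from input() instead.
result = apply_with_confirmation(proposed, confirm=lambda plan: True)
```

Swapping the `confirm` callback for a real prompt, a chat-ops approval, or a policy check is what keeps the engineer in the loop.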
A Realistic Approach to Agentic AI
The transition from traditional DevOps to fully autonomous AI agents is not a single step; it's a multi-phase, multi-year journey. Large Language Models (LLMs) are already excellent at Natural Language Processing (NLP) and producing structured outputs. They can take a plain English request and translate it into something machine-readable. That's a real strength, and it's where the first phase of this journey begins.
However, when it comes to execution (the "act" phase), these models can be unreliable. They might generate a plan and then fail to follow their own instructions, sometimes skipping steps or even hallucinating new ones. You wouldn't want that kind of behavior in a production environment.
The realistic approach, then, is to lean on LLMs for what they do best: planning, reasoning, and generating structured outputs. We hold back from letting them execute directly until the chances of mistakes are close to zero.
A Smarter, More Efficient Workflow
Consider this example: you could give a high-level prompt to a reasoning LLM with a web search tool and hope for the best. But a smarter approach is to craft a low-level prompt with clear instructions and use a cheaper, non-reasoning LLM that excels at generating structured output. By feeding it the exact documentation it needs, you get two major benefits:
- Lower Costs: You're not burning expensive reasoning cycles.
- Faster, More Accurate Responses: You're not relying on unpredictable search results.
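One way to set this up is to assemble a low-level prompt that embeds the exact documentation and pins down a fixed JSON shape, then validate whatever comes back before trusting it. The prompt wording, the JSON schema, and the simulated model reply below are all assumptions for illustration, not a specific vendor API:

```python
import json

# Hypothetical low-level prompt: embed the exact docs the model needs
# and demand a fixed JSON shape, so a cheap non-reasoning model only
# has to transcribe intent into structured output.
PROMPT_TEMPLATE = """You are a provisioning assistant.
Use ONLY the documentation below. Reply with JSON matching:
{{"action": "...", "resource_type": "...", "parameters": {{...}}}}

Documentation:
{docs}

Request: {request}
"""

REQUIRED_KEYS = {"action", "resource_type", "parameters"}

def build_prompt(docs, request):
    return PROMPT_TEMPLATE.format(docs=docs, request=request)

def parse_structured_output(raw):
    """Reject anything that is not the exact shape we asked for."""
    data = json.loads(raw)
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data

# Simulated model reply; in practice this string would come from the
# LLM call.
reply = '{"action": "create", "resource_type": "s3_bucket", "parameters": {"name": "logs"}}'
plan = parse_structured_output(reply)
```

The validation step is the point: a malformed or hallucinated reply fails loudly here, before anything touches real infrastructure.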
Once you have that structured output, you don't even need the LLM to execute it. A simple Python program can handle the provisioning directly, with no risk of hallucinations in the act phase.
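A sketch of that act phase, assuming the structured plan has already been produced and validated. The handler functions and resource types are hypothetical stand-ins for real provider API calls:

```python
# Deterministic act phase: plain Python dispatches the validated plan.
# No LLM is involved here, so there is nothing left to hallucinate.
# Handlers and resource types are hypothetical stand-ins for real
# cloud provider API calls.

def create_s3_bucket(parameters):
    return f"created bucket {parameters['name']}"

def create_instance(parameters):
    return f"launched instance {parameters['name']}"

HANDLERS = {
    ("create", "s3_bucket"): create_s3_bucket,
    ("create", "instance"): create_instance,
}

def execute(plan):
    """Look up a handler for the plan; unknown plans fail loudly."""
    key = (plan["action"], plan["resource_type"])
    handler = HANDLERS.get(key)
    if handler is None:
        raise ValueError(f"no handler for {key}")
    return handler(plan["parameters"])

result = execute({"action": "create",
                  "resource_type": "s3_bucket",
                  "parameters": {"name": "logs"}})
# result == "created bucket logs"
```

Because the dispatch table is closed, the agent can only ever perform actions an engineer explicitly wrote a handler for.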
Overcoming the Rigidity of IaC
This is where Agentic AI truly shines. Traditional tools like Terraform are rigid. If you want to make a small change, like adding a tag to a resource, Terraform still has to refresh and compare every resource in the configuration against the state file. This can be time-consuming, especially in large environments.
An AI agent, on the other hand, can be more flexible. It can run a quick git diff to see what's changed and focus only on the delta. You can even give it a simple prompt like, "Add 'tag1' to Resource 1," and it will update the desired state and apply the change directly to the target resource, without the overhead of a full state comparison.

A New Paradigm for DevOps
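A sketch of that delta-first approach, using git diff --name-only to scope work to changed files. The assumption that desired state lives in one YAML file per resource is hypothetical:

```python
import subprocess

def changed_files(base="HEAD~1", head="HEAD"):
    """List files that changed between two commits, so the agent can
    restrict itself to the delta instead of scanning every resource."""
    out = subprocess.run(
        ["git", "diff", "--name-only", base, head],
        capture_output=True, text=True, check=True,
    )
    return [line for line in out.stdout.splitlines() if line]

def resources_to_update(files, suffix=".yaml"):
    """Hypothetical layout: desired state lives in one YAML file per
    resource, so changed files map directly to resources to touch."""
    return [f for f in files if f.endswith(suffix)]
```

With this, adding a tag touches exactly the files the diff names, not the whole state.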
Agentic AI also offers a new way of interacting with your infrastructure. Instead of juggling multiple CLIs with different syntaxes, you can use a single, unified interface. You can provide instructions in natural language, YAML, or even an architecture diagram. The agent can then handle everything from provisioning and scaling to querying resources and calculating costs.
This multimodal approach, combined with asynchronous execution, creates a "fire and forget" workflow. You issue a request, and the agent takes care of the rest, notifying you when the task is complete. This frees you up to focus on what really matters: solving problems and delivering value.
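The "fire and forget" part can be as simple as running the request on a background thread and invoking a notification callback when it finishes. The notifier here just collects the result; in practice it might post to chat or email:

```python
import threading

def fire_and_forget(task, notify):
    """Run a task asynchronously and notify on completion, so the
    caller can move on immediately instead of watching a terminal."""
    def runner():
        result = task()
        notify(result)
    thread = threading.Thread(target=runner, daemon=True)
    thread.start()
    return thread

results = []
t = fire_and_forget(
    task=lambda: "3 resources provisioned",  # stand-in for real work
    notify=results.append,                   # stand-in for chat/email
)
t.join()  # only for demonstration; normally you would not wait
# results == ["3 resources provisioned"]
```

In a real agent the task would be the act phase itself, and the notifier would carry the plan summary along with the outcome.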
Staying Relevant in the AI Era
The shift to Agentic AI is already happening. Employers are increasingly looking for engineers who can leverage AI in their workflows. Sticking to traditional IaC tools alone could put you at risk of falling behind. By embracing AI-native DevOps, you can reduce cognitive load, work smarter, and become a more valuable asset to your team.
The future of DevOps is not about replacing engineers with AI; it's about empowering them. By integrating Agentic AI into your workflows, you can automate the tedious tasks and focus on the high-level, strategic work that truly makes a difference.
This is just the beginning of a larger journey. In upcoming posts, we'll dive deeper into how you can start building real-world DevOps workflows powered by AI and what that means for the future of engineering. Stay tuned.