Terraforming Datadog Workflows
The Datadog Terraform provider recently added the datadog_workflow_automation resource. This resource lets teams provision and manage Datadog workflows with Terraform, bringing the ease and flexibility of infrastructure-as-code to observability automation.
At Runa, we have found workflows to be especially useful in operations management, alert optimization, and incident flow automation. However, managing these workflows solely via the Datadog UI quickly becomes difficult. Changes are hard to track, reviews are not centralized, and small edits can easily get lost. Combined with the challenge of maintaining workflows that serve multiple teams with different priorities, these gaps make workflow management painful and error-prone.
This is why I have spent the past few months migrating two of our existing operations management workflows to Terraform. In this post, I will take you through the key technical challenges, the solutions, and the pitfalls to avoid when adopting Terraform for Datadog workflows.
Key Technical Challenges and Solutions
Complex JSON Structure Management
The most significant challenge was managing the complex JSON structure required for the spec_json parameter. Datadog workflows have intricate step definitions with nested parameters, outbound edges, and display configurations.
Solution: Export the workflow from the Datadog console as Terraform code.
The Datadog console provides an “Export” option for existing workflows that outputs a complete Terraform representation of the workflow. This gives you the exact JSON format that Datadog expects, minimizing trial-and-error. From there, you can refactor the spec_json using jsonencode() and Terraform locals to make it more modular and maintainable.
Steps to export:
- Go to your workflow in the Datadog console.
- Click on the Export option in the menu bar.
- Select the Terraform option.
- Copy the generated spec_json content.
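As a rough sketch of the refactor described above, an exported step can be moved into a local and referenced from jsonencode(). The resource label, workflow name, and the empty parameters list are placeholders of mine, not values from a real export; the actual step contents should be pasted from the exported spec_json.

```hcl
locals {
  # Hypothetical step: in practice, copy the step object from the exported spec_json.
  check_monitor_step = {
    name          = "CheckMonitorStatus"
    actionId      = "com.datadoghq.core.if"
    parameters    = [] # placeholder: fill in from the export
    outboundEdges = []
  }
}

resource "datadog_workflow_automation" "exported" {
  name        = "Exported-workflow" # placeholder name
  description = "Workflow recreated from a Datadog console export"
  tags        = []
  published   = false

  # jsonencode() turns the HCL object into the JSON string spec_json expects,
  # so steps can live in locals instead of one long string literal.
  spec_json = jsonencode({
    triggers = [
      { startStepNames = [local.check_monitor_step.name], monitorTrigger = {} }
    ]
    steps = [local.check_monitor_step]
  })
}
```

Keeping each step in its own local makes diffs smaller in code review, since a change to one step no longer touches the whole JSON document.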
Dynamic Step References and Dependencies
One of the trickiest aspects was handling dynamic references between steps that are conditionally included. For example, one workflow includes a call to another workflow only when a specific criterion is met, which means that the outbound edges (the “links” between steps) must reference steps that may or may not exist.
Solution: Use Terraform locals to define conditional outbound edge maps. This approach allows you to model branching logic without duplicating JSON structures.
locals {
  check_monitor_outbound_edges = var.enable_notifications ? [
    {
      "nextStepName" : "SendSlackNotification",
      "branchName" : "true"
    }
  ] : []
}
---
{
  "name" : "CheckMonitorStatus",
  "actionId" : "com.datadoghq.core.if",
  "parameters" : [ ... ],
  "outboundEdges" : local.check_monitor_outbound_edges,
}
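Because edges and steps are toggled separately, it is easy for an edge to point at a step that was not included. One possible guard, which is my own addition rather than part of the original setup, is to compute the list of edge targets that have no matching step. The steps below are trimmed to the two fields the check needs, and the local names are hypothetical.

```hcl
variable "enable_notifications" {
  type    = bool
  default = false
}

locals {
  # Trimmed-down steps: only the fields this check needs.
  base_steps = [
    {
      name = "CheckMonitorStatus"
      outboundEdges = var.enable_notifications ? [
        { nextStepName = "SendSlackNotification", branchName = "true" }
      ] : []
    }
  ]

  notification_steps = var.enable_notifications ? [
    { name = "SendSlackNotification", outboundEdges = [] }
  ] : []

  all_steps  = concat(local.base_steps, local.notification_steps)
  step_names = [for s in local.all_steps : s.name]

  # Every step name that some edge points to.
  edge_targets = flatten([
    for s in local.all_steps : [for e in s.outboundEdges : e.nextStepName]
  ])

  # Edge targets with no matching step; this should stay empty.
  dangling_edges = [
    for t in local.edge_targets : t if !contains(local.step_names, t)
  ]
}

output "dangling_edges" {
  value = local.dangling_edges
}
```

Since the step and its edge share the same flag here, the list should be empty for either value of enable_notifications. The same expression could also feed a precondition so that a plan fails when an edge is left dangling.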
AWS Services Integration and Connection Management
Both workflows integrate with multiple AWS services, which require authentication. In the Datadog console, AWS connections are easy to manage: you create an appropriate connection for each environment. Terraform, however, is environment-agnostic, so you must supply the AWS connection that matches the environment being deployed.
Solution: Maintain a mapping of pre-existing environment-specific AWS connections:
locals {
  aws_connection_map = {
    "sandbox" = { connection_id : "12345", label : "SAMPLE_CONNECTION_1" }
    "dev"     = { connection_id : "67890", label : "SAMPLE_CONNECTION_2" }
  }
}
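One way to pick the right entry from this map is to key it on an environment variable. The sketch below assumes a var.env input and the local names aws_connection and connection_envs, none of which come from the original configuration; the connection IDs are the same sample values as above.

```hcl
variable "env" {
  type        = string
  description = "Target environment; must match a key in aws_connection_map"

  validation {
    # Keep this list in sync with the keys of aws_connection_map.
    condition     = contains(["sandbox", "dev"], var.env)
    error_message = "The env value must be one of: sandbox, dev."
  }
}

locals {
  aws_connection_map = {
    "sandbox" = { connection_id = "12345", label = "SAMPLE_CONNECTION_1" }
    "dev"     = { connection_id = "67890", label = "SAMPLE_CONNECTION_2" }
  }

  # Select the connection for the environment being deployed.
  aws_connection = local.aws_connection_map[var.env]

  # Shape of the connectionEnvs entry that goes into spec_json.
  connection_envs = [
    {
      env = "default"
      connections = [
        {
          connectionId = local.aws_connection.connection_id
          label        = local.aws_connection.label
        }
      ]
    }
  ]
}
```

The validation block makes an unknown environment fail with a readable message at plan time, rather than surfacing as an invalid index error on the map lookup.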
Dynamic Workflow Naming
This is not a challenge as such, but it is worth mentioning: without it, every provisioned workflow ends up with the same name.
Solution: Use Terraform variables to parameterize workflow names.
resource "datadog_workflow_automation" "my_workflow" {
  name = "My-workflow-${var.team_name}-${var.env}"
}
JavaScript Code in Workflow Steps
Many workflow steps include JavaScript code for data transformation. Managing this code within Terraform strings can quickly become messy and hard to read.
Best Practices:
- Write and test your code directly in the Datadog workflow JavaScript step editor or any IDE.
- Maintain proper indentation and commenting.
- Export the working code as JSON from Datadog or IDE once validated.
{
  "name" : "ParseOutput",
  "actionId" : "com.datadoghq.datatransformation.func",
  "parameters" : [
    {
      "name" : "script",
      "value" : "// parse JSON string \nlet parsedFaults = $.Steps.DescribeOutput;\nlet trigger_monitor_url = $.Source.monitor.url;\n\n ... // rest of logic here"
    }
  ]
}
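An alternative to embedding the script as an escaped string, not part of the approach described above, is to keep the JavaScript in its own file and load it with Terraform's file() function. The path scripts/parse_output.js and the local names are hypothetical.

```hcl
locals {
  # Hypothetical path. file() returns the file contents verbatim,
  # without applying Terraform template interpolation.
  parse_output_script = file("${path.module}/scripts/parse_output.js")

  parse_output_step = {
    name     = "ParseOutput"
    actionId = "com.datadoghq.datatransformation.func"
    parameters = [
      # jsonencode() escapes newlines and quotes when this step is
      # placed inside spec_json, so no manual "\n" is needed.
      { name = "script", value = local.parse_output_script }
    ]
  }
}
```

This keeps indentation and comments intact and lets an editor treat the script as ordinary JavaScript. If a script is kept inline in a quoted Terraform string instead, any literal `${` sequence has to be written as `$${` to avoid being interpreted as interpolation.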
Complete Example: Putting It All Together
Here’s a sample workflow that combines the patterns discussed above.
locals {
  # Create conditional notification steps
  notification_steps = var.enable_notifications ? [
    {
      "name" : "SendSlackNotification",
      "actionId" : "com.datadoghq.slack.send_simple_message",
      "parameters" : [
        { "name" : "teamId", "value" : "MYSAMPLETEAM" },
        { "name" : "channel", "value" : "#alerts" },
        { "name" : "text", "value" : "Alert triggered: " }
      ],
      "display" : { "bounds" : { "x" : 0, "y" : 432 } }
    }
  ] : []

  # Create conditional outbound edges
  # If notifications are enabled, route to notification step; otherwise, workflow ends
  check_monitor_outbound_edges = var.enable_notifications ? [
    { "nextStepName" : "SendSlackNotification", "branchName" : "true" }
  ] : []

  sample_workflow_name = "Sample-workflow-${var.team_name}"

  aws_connection_map = {
    # AWS connection IDs for different environments. Already created on datadog console
    "sandbox" = { connection_id : "abcd", label : "MY_SAMPLE_AWS_CONNECTION_1" }
    "dev"     = { connection_id : "abcdefg", label : "MY_SAMPLE_AWS_CONNECTION_2" }
  }
  aws_connection_id    = local.aws_connection_map["sandbox"].connection_id
  aws_connection_label = local.aws_connection_map["sandbox"].label
}
variable "enable_notifications" {
  type        = bool
  description = "Enable Slack notifications in the workflow"
  default     = false
}
resource "datadog_workflow_automation" "sample_workflow" {
  name        = local.sample_workflow_name
  description = "Sample workflow demonstrating Terraform patterns for Datadog workflow automation"
  tags        = var.workflow_tags
  published   = true

  spec_json = jsonencode(
    {
      "triggers" : [
        { "startStepNames" : [ "CheckMonitorStatus" ], "monitorTrigger" : {} }
      ],
      "steps" : concat([
        {
          "name" : "CheckMonitorStatus",
          "actionId" : "com.datadoghq.core.if",
          "parameters" : [
            { "name" : "joinOperator", "value" : "or" },
            {
              "name" : "conditions",
              "value" : [
                { "comparisonOperator" : "eq", "leftValue" : "", "rightValue" : "Alert" },
                { "comparisonOperator" : "eq", "leftValue" : "", "rightValue" : "Warn" }
              ]
            }
          ],
          "outboundEdges" : local.check_monitor_outbound_edges,
          "display" : { "bounds" : { "x" : 0, "y" : 0 } }
        }
      ], local.notification_steps),
      "handle" : local.sample_workflow_name,
      "connectionEnvs" : [
        {
          "env" : "default",
          "connections" : [
            { "connectionId" : local.aws_connection_id, "label" : local.aws_connection_label }
          ]
        }
      ],
      "inputSchema" : {
        "parameters" : [
          { "name" : "MonitorURL", "type" : "STRING", "defaultValue" : "sample_workflow" }
        ]
      }
    }
  )
}
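The example references var.team_name and var.workflow_tags and relies on a configured Datadog provider, none of which are shown. A minimal sketch of that supporting configuration could look like the following; the variable names datadog_api_key and datadog_app_key, the descriptions, and the defaults are assumptions of mine.

```hcl
terraform {
  required_providers {
    datadog = {
      source = "DataDog/datadog"
    }
  }
}

# Hypothetical credential variables; supply them via your usual secrets mechanism.
variable "datadog_api_key" {
  type      = string
  sensitive = true
}

variable "datadog_app_key" {
  type      = string
  sensitive = true
}

provider "datadog" {
  api_key = var.datadog_api_key
  app_key = var.datadog_app_key
}

variable "team_name" {
  type        = string
  description = "Team name used to build the workflow name"
}

variable "workflow_tags" {
  type        = list(string)
  description = "Tags applied to the workflow"
  default     = []
}
```

With these in place, values such as team_name can be supplied per deployment, for example through a terraform.tfvars file or -var flags.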
Critical Pitfalls to Avoid
1. Permissions & Ownership
Pitfall: Workflows created via Terraform are owned by Terraform, not by individual users. Standard-role users cannot publish or edit them directly. While this is not necessarily a problem, it could become a blocker in some situations.
Impact: Teams may be unable to manage workflows after they’re deployed.
Solution: Have an admin grant users permission to publish/unpublish workflows.
2. Step Function Limitations
Pitfall: Datadog workflows currently have limited support for AWS Step Functions Express workflows.
Impact: Workflows may fail on steps that involve Step Functions.
Solution: Prefer Standard Step Functions and document this limitation for your team.
3. Datastore Management
Pitfall: Datastores cannot be provisioned via Terraform, and workflows need explicit permissions to access them.
Impact: Workflow may fail due to missing datastore access.
Solution:
- Create your datastore in the Datadog console and reference it by ID in Terraform.
- Grant Terraform a Manager role in the datastore:
- Go to the Datastore page in Datadog.
- Click the Settings icon.
- Select Edit Permissions.
- Add Terraform with Manager role.
4. AWS Connection Management
Pitfall: AWS connections are environment-specific and must be pre-created in Datadog.
Impact: Workflow will fail if connections are missing or incorrect.
Solution:
- Create AWS connections for each environment in the Datadog console and reference them by ID in Terraform.
- Click Actions in the Datadog sidebar.
- Select Connections.
- Click on the + New Connection button at the top of the connections page.
- Select AWS from the list of possible integrations.
- Complete the form, then copy the generated IAM policy statement (the statement is shown only when the connection is created).
- In the AWS console, create a new role and attach the generated policy statement to it.
- Click
Benefits Achieved
- Team Autonomy: Each team can own and modify their workflow variations independently.
- Environment Consistency: Identical workflows can be deployed across environments with environment-specific configurations.
- Infrastructure as Code: Workflows are now part of the infrastructure, enabling automated deployments and rollbacks.
- Documentation: The Terraform code serves as living documentation of the workflow logic.
Terraform support for Datadog workflows is a game-changer. It brings version control, consistency, and automation to what used to be a manual, UI-driven process. While the JSON structures can be complicated to set up, the long-term benefits outweigh the initial setup cost.
Further Reading
For more details on Datadog workflows, here are some resources: