Add best-effort cleanup to EmrCreateJobFlowOperator on post-creation failure #61010

SameerMesiah97 · 2026-01-24T17:26:03Z

Description

Added best-effort cleanup to EmrCreateJobFlowOperator to terminate EMR clusters when failures occur after successful cluster creation.

In certain failure modes, the operator could previously create a cluster via create_job_flow and then fail during later execution steps (for example, while waiting for completion when DescribeCluster permissions are missing). In these cases, the task failed while leaving the cluster running. The operator now attempts to terminate the created job flow if an exception is raised after creation. Cleanup is best-effort and does not override or mask the original exception.
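The control flow described above can be sketched roughly as follows. This is a minimal illustration, not the operator's actual code: `create_job_flow_with_cleanup`, `FakeEmrClient`, and the `wait_for_completion` callable are hypothetical names introduced here, and the fake client only mimics the shape of the boto3 EMR client calls `run_job_flow` and `terminate_job_flows`.

```python
import logging

log = logging.getLogger(__name__)


class FakeEmrClient:
    """Hypothetical stand-in for an EMR client, used so the sketch runs offline."""

    def __init__(self):
        self.terminated = []

    def run_job_flow(self, **kwargs):
        return {"JobFlowId": "j-EXAMPLE"}

    def terminate_job_flows(self, JobFlowIds):
        self.terminated.extend(JobFlowIds)


def create_job_flow_with_cleanup(client, job_flow_overrides, wait_for_completion):
    """Create a job flow; if a later step raises, attempt to terminate it."""
    job_flow_id = client.run_job_flow(**job_flow_overrides)["JobFlowId"]
    try:
        # Any step after creation may fail, e.g. a waiter lacking permissions.
        wait_for_completion(job_flow_id)
    except Exception:
        log.warning("Post-creation failure; attempting to terminate %s", job_flow_id)
        try:
            client.terminate_job_flows(JobFlowIds=[job_flow_id])
        except Exception:
            # Best-effort: log the cleanup error and fall through.
            log.exception("Failed to terminate %s during cleanup", job_flow_id)
        # Bare raise re-raises the original post-creation exception.
        raise
    return job_flow_id
```

The bare `raise` sits outside the inner `try`, so the exception that reaches the caller is always the original one, whether or not termination succeeded.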

This change applies the same failure-handling approach recently introduced for EC2CreateInstanceOperator in PR #60904.

Rationale

EmrCreateJobFlowOperator is responsible for provisioning and coordinating an external, stateful service whose lifecycle extends beyond task execution. If the task fails after cluster creation, Airflow can no longer reliably manage or observe the cluster’s state. Adding opportunistic cleanup in these scenarios reduces the risk of orphaned EMR clusters and unexpected infrastructure costs, while preserving existing failure semantics. Cleanup errors are logged and do not affect the task’s final failure state.

Tests

Added a unit test covering failure after cluster creation and verifying that termination is attempted.
Added a unit test ensuring cleanup failures do not mask the original exception.
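
The two test cases could look roughly like the sketch below. It is not the PR's test code: `run_with_cleanup` and `make_client` are hypothetical helpers standing in for the operator's execute path and a mocked EMR client, and the job flow ID is a placeholder.

```python
from unittest import mock


def run_with_cleanup(client, wait):
    # Hypothetical stand-in for the operator's execute path, not the PR's code.
    job_flow_id = client.run_job_flow()["JobFlowId"]
    try:
        wait(job_flow_id)
    except Exception:
        try:
            client.terminate_job_flows(JobFlowIds=[job_flow_id])
        except Exception:
            pass  # the description says cleanup errors are logged; the sketch swallows them
        raise
    return job_flow_id


def make_client():
    client = mock.MagicMock()
    client.run_job_flow.return_value = {"JobFlowId": "j-EXAMPLE"}
    return client


def test_termination_attempted_after_post_create_failure():
    client = make_client()
    wait = mock.Mock(side_effect=RuntimeError("describe denied"))
    try:
        run_with_cleanup(client, wait)
    except RuntimeError as err:
        assert str(err) == "describe denied"
    else:
        raise AssertionError("expected the original error to propagate")
    client.terminate_job_flows.assert_called_once_with(JobFlowIds=["j-EXAMPLE"])


def test_cleanup_failure_does_not_mask_original_error():
    client = make_client()
    client.terminate_job_flows.side_effect = ValueError("terminate denied")
    wait = mock.Mock(side_effect=RuntimeError("describe denied"))
    try:
        run_with_cleanup(client, wait)
    except RuntimeError as err:
        # The cleanup ValueError must not replace the original RuntimeError.
        assert str(err) == "describe denied"
    else:
        raise AssertionError("expected the original error to propagate")
```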

Backwards Compatibility

No changes to the public API or operator parameters.

Reproducibility

The failure scenario could not be reproduced directly due to personal AWS account permissions. However, based on the current control flow of EmrCreateJobFlowOperator, cluster creation can succeed while a later step fails, leaving the EMR cluster running without cleanup. This change defensively addresses that case. Anyone who is able to reproduce this failure mode is welcome to share a reproduction.

Attempt best-effort termination of EMR clusters when failures occur after successful job flow creation. Cleanup does not mask the original exception and aligns EMR behavior with existing EC2 operator semantics.

Add post-create cleanup to EmrCreateJobFlowOperator

9968272


SameerMesiah97 requested a review from o-nikolas as a code owner January 24, 2026 17:26

boring-cyborg bot added area:providers provider:amazon AWS/Amazon - related issues labels Jan 24, 2026
