nvme_driver: admin aer handler should respond with proper failure and status messages when in a failed state (#2658) #2688
+94
−14
Clean cherry pick of PR #2658
When the `AdminAerHandler` enters a failed state, it stops responding to `get_next_aen` requests. As a result, the driver's `handle_asynchronous_events` loop becomes stuck: no errors are surfaced, the function never fails, and we lose both the original failure context and any indication that AER processing has halted.

This PR introduces clearer failure semantics for the `AdminAerHandler`. When the handler fails, it now records the most recent error and returns that error for any in-flight or future AEN requests. This ensures that the `handle_asynchronous_events` loop receives a definitive failure signal instead of waiting indefinitely.

With this change, it becomes the responsibility of the `handle_asynchronous_events` loop to avoid repeatedly issuing AEN requests once a failure is reported. Longer term, this structure sets us up for a more robust AER-restart strategy, where the handler may fail but can be restarted under specific conditions, with the async event task deciding when a restart is appropriate.

Current:

Updated:

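For readers skimming the description, here is a minimal, synchronous sketch of the failure semantics described in the summary. It is an illustration only and not the code in this diff: `SketchAerHandler`, `AerError`, `Aen`, `push_event`, and `fail` are hypothetical names, and the signatures given to `get_next_aen` and `handle_asynchronous_events` are assumptions. The real handler is asynchronous and also completes in-flight requests with the error, which this simplified model does not show.

```rust
use std::collections::VecDeque;

/// Hypothetical error type; the driver's real error type is not shown here.
#[derive(Debug, Clone, PartialEq)]
pub struct AerError(pub String);

/// Hypothetical stand-in for an asynchronous event notification.
#[derive(Debug, Clone, PartialEq)]
pub struct Aen(pub u32);

/// Simplified model of a handler whose failed state is sticky.
#[derive(Default)]
pub struct SketchAerHandler {
    ready: VecDeque<Aen>,
    /// Most recent error; once set, every request returns it.
    last_error: Option<AerError>,
}

impl SketchAerHandler {
    /// Queue an event for delivery (assumption: ignored after a failure).
    pub fn push_event(&mut self, aen: Aen) {
        if self.last_error.is_none() {
            self.ready.push_back(aen);
        }
    }

    /// Record the failure so later requests can report it.
    pub fn fail(&mut self, err: AerError) {
        self.last_error = Some(err);
    }

    /// `Ok(Some(_))` is an event, `Ok(None)` means nothing is queued,
    /// and `Err(_)` means the handler has failed. The recorded error
    /// takes precedence over any events still queued.
    pub fn get_next_aen(&mut self) -> Result<Option<Aen>, AerError> {
        if let Some(err) = &self.last_error {
            return Err(err.clone());
        }
        Ok(self.ready.pop_front())
    }
}

/// Drain queued events, stopping at the first reported failure instead of
/// issuing further requests against a failed handler.
pub fn handle_asynchronous_events(
    handler: &mut SketchAerHandler,
) -> Result<Vec<Aen>, AerError> {
    let mut seen = Vec::new();
    while let Some(aen) = handler.get_next_aen()? {
        seen.push(aen);
    }
    Ok(seen)
}
```

The point of the sketch is that a failed handler answers every request with the stored error rather than going silent, so the caller's loop exits through `?` with the original failure context intact. A restart strategy, as mentioned above, would presumably clear the stored error under specific conditions; that is left out here.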