Workflows not triggering
Incident Report for Alloy
Postmortem

Postmortem details for the Major Incident affecting Workflows on 8 April 2024.

Incident summary: Customer Workflows failed to trigger due to the Task Executor failing.

Root cause: The problem started on Saturday 6th April. The Task Executor failed due to high frequency layer build errors. By Sunday, this had escalated to 100% memory usage, which then caused some tasks to fail.

Resolution and recovery: We identified the root cause and resolved it via an emergency patch on Monday 8th April. All Workflows that did not trigger were then requeued for processing. A permanent fix is in progress; however, the patch on Monday fully resolved the Major Incident.

Corrective and preventative measures: We have created additional alerts that will notify us if this issue occurs again; this will allow us to resolve any memory problems quickly, before the threshold is breached and tasks start to fail.
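
For illustration only, the idea of alerting before the failure threshold is breached can be sketched as a two-level check. This is a minimal sketch, not the alerting we deployed: the names `MemoryAlertPolicy` and `classify_memory_usage` and the 80% and 95% levels are hypothetical, chosen only to show a warning level sitting below the level at which tasks start to fail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryAlertPolicy:
    """Hypothetical alert levels; the real thresholds are not stated in this report."""
    warn_percent: float = 80.0      # assumed early-warning level
    critical_percent: float = 95.0  # assumed level close to where tasks fail


def classify_memory_usage(used_percent: float,
                          policy: MemoryAlertPolicy = MemoryAlertPolicy()) -> str:
    """Return 'ok', 'warning', or 'critical' for a memory usage reading."""
    if not 0.0 <= used_percent <= 100.0:
        raise ValueError(f"memory usage must be within 0-100, got {used_percent}")
    if used_percent >= policy.critical_percent:
        return "critical"
    if used_percent >= policy.warn_percent:
        # Fires while there is still headroom, so the problem can be
        # investigated before tasks begin to fail.
        return "warning"
    return "ok"


if __name__ == "__main__":
    for reading in (42.0, 85.0, 100.0):
        print(reading, classify_memory_usage(reading))
```

The point of the lower level is lead time: a notification at the warning level arrives while the service is still processing tasks normally.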

Posted Apr 11, 2024 - 18:16 UTC

Resolved
An issue with workflows not triggering has been resolved as of 09:45 this morning. All workflows that did not trigger have been requeued and are currently being processed. We will add the full postmortem within 5 working days.
Posted Apr 08, 2024 - 09:30 UTC
This incident affected: UK (Task Executor).