Transcoder service: Jobs stalling
Incident Report for Xvid MediaHub
Resolved
This incident has been resolved.
Posted Nov 30, 2021 - 12:46 EET
Monitoring
With the changes we applied, the situation has stabilized and performance is close to normal. We continue to closely monitor the service.
Posted Nov 30, 2021 - 08:54 EET
Update
Our upstream provider informed us that they suffered multiple simultaneous fiber cuts affecting the Washington-Paris, Washington-Newark and Washington-Chicago routes. There is no ETA for the repair yet.

We are continuing to make configuration changes to avoid data transfers over the impacted routes and improve processing speed.
Posted Nov 30, 2021 - 06:48 EET
Identified
We are seeing very slow and unstable network transfers between Europe and North America, apparently due to an issue in our cloud provider's network backbone. Transfers from the Amazon S3 us-east-1 region to Europe are especially slow, at under 20 KB/s. These slow transfers caused tasks to hang for a long time, which blocked other tasks from getting a processing slot and decreased overall performance.

We have reported the issue to our upstream provider and made configuration changes to minimize US-EU network transfers for now. Jobs are progressing faster again, and most of the backlog has now been processed.
Posted Nov 30, 2021 - 06:26 EET
Update
We are still investigating the issue. Network transfers are randomly stalling indefinitely, which blocks tasks on the queue from progressing. We have successfully processed about half of the backlog, but we have not yet identified the root cause, so the issue is ongoing.
Posted Nov 30, 2021 - 05:29 EET
Investigating
We are seeing an increasing number of pending jobs on the queue. We are investigating.
Posted Nov 30, 2021 - 03:34 EET
This incident affected: Transcoder Service.