Major cloud outage impacting Transcoder Service
Incident Report for Xvid MediaHub
Resolved
This incident has been resolved.
Posted Apr 06, 2021 - 17:52 EEST
Update
The Transcoding service is currently operating normally. However, we are not yet back at the capacity we had before. During load spikes, performance may be lower than normal, so please allow more time than usual for your jobs to be processed.

To reiterate: due to this incident, the IP addresses of our cloud workers that fetch content from your own external storage systems (such as your FTP server or S3 bucket) have changed. If you have set up any IP-based access control rules on your end, please make sure that you also allow our new IP range, which is:

149.202.160.0/19
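
If you enforce IP-based access rules in application code rather than in a firewall, the check against the new range could look roughly like the following. This is only a minimal sketch using Python's standard `ipaddress` module; the names `NEW_WORKER_RANGE` and `is_transcoder_worker` are hypothetical and not part of any Xvid MediaHub API. The /19 range covers 149.202.160.0 through 149.202.191.255.

```python
import ipaddress

# New worker range as announced in this update.
# The constant and function names are illustrative assumptions.
NEW_WORKER_RANGE = ipaddress.ip_network("149.202.160.0/19")


def is_transcoder_worker(client_ip: str) -> bool:
    """Return True if client_ip lies inside the announced worker range."""
    return ipaddress.ip_address(client_ip) in NEW_WORKER_RANGE


if __name__ == "__main__":
    # Print the first and last address of the range for a quick sanity check.
    print(NEW_WORKER_RANGE[0], NEW_WORKER_RANGE[-1])
    # Example addresses chosen for illustration only.
    for ip in ("149.202.175.10", "149.202.192.1"):
        print(ip, is_transcoder_worker(ip))
```

In most setups the range would simply be added to the existing firewall or server allow-list; the snippet is only meant to make the extent of the range concrete.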
Posted Mar 12, 2021 - 01:34 EET
Update
Datacenters that were not destroyed by the fire will be restarted between 15 and 19 March, so we will likely have to run on our emergency setup in our alternative cloud for quite a while.

This also means that if you specifically whitelisted the IP range of our Transcoder service in order to limit access to your FTP, SFTP, or similar services, you will need to temporarily whitelist our new IP range, 149.202.160.0/19, to ensure continued service.
Posted Mar 11, 2021 - 01:04 EET
Update
The backlog of jobs has been processed, and newly posted jobs have been processing normally again for a few hours. However, we are still having problems obtaining enough compute resources in our backup cloud and scaling the system up adequately. As a result, performance may unfortunately still be lower than usual.
Posted Mar 10, 2021 - 12:14 EET
Monitoring
We have deployed an emergency setup in another cloud and jobs are processing again. However, this emergency setup has only limited compute resources, and a large backlog of jobs has built up in the queue in the meantime, so performance is unfortunately still rather slow. We continue to work on improving the situation.
Posted Mar 10, 2021 - 07:37 EET
Identified
We have received updated information about the major incident at our upstream provider: there is a fire in one of their datacenter buildings. Firefighters were immediately on the scene, but they were not able to control the fire. Four adjacent datacenters are also affected by the fire and are now offline, so five datacenters are affected altogether. Due to the nature of this incident, it is unlikely that any of our data or services in the affected datacenters will come back online anytime soon. We will activate our disaster recovery plan.
Posted Mar 10, 2021 - 05:02 EET
Update
The situation at our upstream provider has worsened: several of their datacenters are now entirely down, and the issue now affects not only the cloud services but also all bare-metal servers and storage systems we have with the same provider. Because we have redundancy with other vendors, only the Transcoder service is impacted by this major outage, and all our other services (including the CDN service) still operate normally.
Posted Mar 10, 2021 - 04:52 EET
Update
We are continuing to investigate this issue.
Posted Mar 10, 2021 - 03:07 EET
Investigating
Our main upstream cloud service provider is experiencing an outage. It looks like the network is entirely down, so cloud instances can no longer communicate with the internet. This is affecting our Transcoder Service, and jobs may currently be processed unreliably or stall for a long time. We are investigating the issue and will make sure that all affected jobs resume once the problem with our upstream provider is resolved.
Posted Mar 10, 2021 - 03:04 EET
This incident affected: Transcoder Service.