Transcoder service: Jobs stalling
Incident Report for Xvid MediaHub
Resolved
This incident has been resolved.
Posted Oct 01, 2021 - 06:00 EEST
Update
We are continuing to monitor for any further issues.
Posted Oct 01, 2021 - 04:39 EEST
Monitoring
The cause turned out not to be CentOS 7: Python ships its own CA bundle via certifi, and that bundle did not trust the new Let's Encrypt root. We have implemented a fix, but it will take time to reach all running workers. In the meantime a large backlog of jobs has queued up, so transcoding is still slower than usual. We are monitoring and expect speeds to recover once the backlog is processed.
Posted Oct 01, 2021 - 03:17 EEST
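For readers who want to check their own workers for the condition described in the Monitoring update above, the following is a minimal sketch, not the check we ran. It uses only Python's standard ssl module to load a PEM bundle and look up a root by common name. The function names (load_bundle, find_roots, is_expired) are hypothetical and chosen for illustration.

```python
import ssl
import time


def load_bundle(cafile):
    """Load a PEM CA bundle and return its certificates as dicts."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.load_verify_locations(cafile=cafile)
    return ctx.get_ca_certs()


def find_roots(ca_certs, common_name):
    """Return the entries whose subject commonName equals common_name."""
    matches = []
    for cert in ca_certs:
        # 'subject' is a tuple of RDNs, each a tuple of (key, value) pairs.
        for rdn in cert.get("subject", ()):
            for key, value in rdn:
                if key == "commonName" and value == common_name:
                    matches.append(cert)
    return matches


def is_expired(cert, now=None):
    """True if the certificate's notAfter timestamp lies before 'now'."""
    now = time.time() if now is None else now
    return ssl.cert_time_to_seconds(cert["notAfter"]) < now
```

Assuming certifi is installed, find_roots(load_bundle(certifi.where()), "ISRG Root X1") would return an empty list for a bundle that lacks that root. ISRG Root X1 is an assumption here, since the updates do not name the new root.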
Identified
The IdenTrust DST Root CA X3 certificate, the root certificate used by Let's Encrypt, expired today. The new root certificate that our upstream cloud provider uses is not trusted by CentOS 7, which caused all our workers to disconnect. We are manually adding the new CA to re-establish a valid trust chain.
Posted Oct 01, 2021 - 01:13 EEST
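As an illustration of what manually adding a CA can look like on the Python side, the sketch below builds a client TLS context that trusts one extra PEM file in addition to the defaults. It is not the change made on our workers. The function name build_context and the extra_cafile parameter are hypothetical.

```python
import ssl


def build_context(extra_cafile=None):
    """Default client context, optionally trusting one additional PEM file."""
    # create_default_context() loads the system trust store and enables
    # certificate and hostname verification.
    ctx = ssl.create_default_context()
    if extra_cafile is not None:
        # Adds to the CAs already loaded; it does not replace them.
        ctx.load_verify_locations(cafile=extra_cafile)
    return ctx
```

The returned context can be passed to standard-library clients, for example urllib.request.urlopen(url, context=ctx). Libraries that use the certifi bundle by default need to be pointed at the updated bundle through their own settings.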
Investigating
Many jobs are currently hanging and not completing. Our upstream cloud provider has logged an incident related to today's expiry of the Let's Encrypt root CA certificate. The upstream API is unavailable. We are investigating.
Posted Oct 01, 2021 - 00:53 EEST
This incident affected: Transcoder Service.