Timeline of events
- Around 1:00 am CEST, our monitoring system began recording devices that could not reconnect to the server after losing their connection.
- At about 6:30 am CEST, our technical team began investigating. At that time, only around 40% of devices were still connected.
- As connections kept dropping, a reboot was initiated at around 7:30 am CEST.
- This reboot did not resolve the issue, and our technical team continued searching for the root cause.
- The problem was traced to a service with a suboptimal configuration, and a hotfix was applied at around 8:20 am CEST.
- A second reboot was initiated at 8:30 am CEST.
- Devices re-established their connections over the following hours.
- At around 1:30 pm CEST, normal operation was restored.
Impact
This issue was confined to the RL/mbCONNECT24 EU V2 Server.
Follow-up actions
Possible side effects of the hotfix are being tested and monitored. Afterward, the changes will be incorporated into the next RSP release.