Sunday, August 7, 2011

Service failures 09 Aug (Back Online)

We experienced a major outage on our servers caused by a failure of both the main and backup power supplies at our host. All services have since been recovered: http://www.theregister.co.uk/2011/08/08/bpos_amazon_power_outages/

Update 10.20am, 10 AUG 2011 – All services are back online including the OMS.

Update 11:14pm, 09 AUG 2011 – All websites are now back online. The OMS will be back by tomorrow morning, 10 Aug, at the latest.

Update 05.00pm, 09 AUG 2011 – Our one affected mail server is now back online. The outage affected 10% of our email users (other email servers were fine).

Update 08.28pm, 09 AUG 2011 – senior engineers from Amazon AWS are currently assisting on the recovery process. We will provide an update in the next hour.

From Amazon AWS where our servers are hosted:

“We understand at this point that a lightning strike hit a transformer from a utility provider to one of our Availability Zones in Dublin, sparking an explosion and fire. Normally, upon dropping the utility power provided by the transformer, electrical load would be seamlessly picked up by backup generators. The transient electric deviation caused by the explosion was large enough that it propagated to a portion of the phase control system that synchronizes the backup generator plant, disabling some of them. Power sources must be phase-synchronized before they can be brought online to load. Bringing these generators online required manual synchronization. We’ve now restored power to the Availability Zone and are bringing EC2 instances up. We’ll be carefully reviewing the isolation that exists between the control system and other components. The event began at 10:41 AM PDT with instances beginning to recover at 1:47 PM PDT.”
