Details of the Hotmail / Outlook.com outage on March 12th

Outlook.com left preview a few weeks ago, and as part of that, we shared that we’d start to upgrade the hundreds of millions of people using Hotmail to the new, modern Outlook.com experience. We ran multiple pilots during the preview period and learned a great deal. Overall, the upgrade has been going very well: people have upgraded much faster than we had expected. The vast majority of people using our services have had a smooth experience during this time and are enjoying the new Outlook.com experience. That said, we had an issue yesterday and want to give you a deeper look at what happened.

Before we dive into the details, we want to sincerely apologize to anyone who was unable to access their email during the interruption. We take outages very seriously and invest a significant amount of our time and energy in preventing them.

Root cause analysis

At 1:35 PM PDT on March 12th, 2013, there was a service interruption that affected some people’s access to a small part of the SkyDrive service, but primarily Hotmail.com and Outlook.com. Availability was restored over the course of the afternoon and evening, and fully restored by 5:43 AM PDT on March 13th, 2013.

On the afternoon of the 12th, in one physical region of one of our datacenters, we performed our regular process of updating the firmware on a core part of our physical plant. This is an update that had been done successfully previously, but failed in this specific instance in an unexpected way. This failure resulted in a rapid and substantial temperature spike in the datacenter. This spike was significant enough before it was mitigated that it caused our safeguards to take effect for a large number of servers in this part of the datacenter.

These safeguards prevented access to mailboxes housed on these servers and also prevented other pieces of our infrastructure from automatically failing over to allow continued access. This area of the datacenter houses parts of the Hotmail.com, Outlook.com, and SkyDrive infrastructure, so some people trying to access those services were affected.

Details of impact and restoration

Once the safeguards kicked in on these systems, the team was alerted instantly and immediately began working to restore access. Given the failure scenario, a mix of infrastructure software and human intervention was needed to bring the core infrastructure back online. Requiring this kind of human intervention is not the norm for our services, and it added significant time to the restoration.

From that point onward, the team brought back access in waves throughout the evening. The majority of the impacted mailboxes were fully restored before midnight, and the rest were completed by 5:30 AM.

Conclusion

We hope this helps provide an understanding of the incident. Again, we sincerely apologize for the impact this outage had on all of you. Now that we’re through the resolution, we’re hard at work on ensuring this doesn’t happen again.

https://status.live.com is always the best and most reliable way to get real time information specific to any service issues that we are encountering, and when you are signed in, it is customized based on the health of your specific account.

Arthur de Haan, Vice President

 

28 comments
  1. There was the old Microsoft, and here is the new Microsoft. Acknowledge, update and apologize. You’ve got a lifer here!

    It’s great to not be left in the proverbial darkness! :-) I believe other companies should follow suit!

    • Tariq – with all due respect – I wish you all the best.

  2. I think it is great that you give information on the outage, but what concerns me is that your redundancy level seems a little low, since you provide services for more than a million users.

    • Johannes – two planes are on the tarmac – one with Microsoft software one with Linux … which one are you going to fly in ?

  3. I’ve been a CIO/CTO for almost 30 years and I’ve run a service bureau that handled millions of customer accounts. The notion that a hardware failure in a datacenter can render a critical service unreachable seems like a throwback to the 1990s. NO single infrastructure item should be a single point of failure for both sides of a ‘2n’ cluster. And online systems should be mirrored across multiple locations to prevent a catastrophic local incident from causing an outage.

    I’d love to see a white paper on ‘datacenter design principles for high-availability online services’ from Microsoft. I know I’ve had Microsoft account teams/TAMs lecturing me for years on the right way to deploy your tools.

    • Wayne

      You are asking too much – look at their flagship messaging product Exchange. How can a simple device like an iPhone bring down an entire messaging system? Not to mention calendar issues and ActiveSync devices – allowing a device to change the owner of a meeting !??????? – insane – now if someone could just tell me why people willingly pay money for this. MS should simply license a messaging backend from IBM (Lotus Domino) – reliable replication was introduced in 1996 (if I recall) and clustering technology like in Domino may be introduced in Exchange 2050. My advice to IBM is to write a good interface to Outlook and we will all be happy.

      • num m, iPhones were able to take down Exchange servers due to no administrator setting throttling limits. Exchange servers had the option to protect themselves from the incident. Administrators did not turn on this protection. Result: The servers did not throttle misbehaving customers. There was no bug in Exchange that allowed this, the bug was Apple’s alone to fix.

  4. "This is an update that had been done successfully previously, but failed in this specific instance in an unexpected way. This failure resulted in a rapid and substantial temperature spike in the datacenter."

    The headline promises details. How about some actual details?

  5. And who is hosting https://status.live.com ?
    I hope you outsource this to more reliable service provider.
    Another beta testing on general public, nothing new.
    As far as "details" you can only publish this information if you actually know it.
    Let’s see what develops.

  6. "status.live.com is always the best and most reliable way to get real time information specific to any service issues that we are encountering,"

    THIS IS NOT TRUE my status is all is working normally and

    Let’s see: I haven’t been able to access my email since 23rd Feb. When it does log in, it’s so slow it’s unusable and times out before I can forward a single email.

    Answers.Microsoft.com claim its because Microsoft haven’t invested enough in the hardware and nothing can be done
    UK Telephone customer service can’t deal with it as it isn’t a UK issue.
    Chat CS merely send you back to the UK tel no despite telling them the UK doesn’t deal with it.
    Live support don’t deal with it and sent me to chat CS.
    Tech support escalation (2 weeks so far) can’t forward email, can’t export contacts, in fact can’t do anything unless I send them an email from the Hotmail account… Shame I can’t access that since it was "upgraded".
    Complete refusal from all MS departments to provide the contact details for anyone in charge or capable of doing 2nd line tech work.

    I could go on about the pop-ups along the bottom, the constantly on messenger service, the print command hidden 3 levels down, inability to change inbox FONT or page layout, inability to list more than 5 contacts, avatar pictures! and on..

    To anyone still on Hotmail I suggest you move your data before you lose access to it, Currently it looks like I will need to take legal action to get mine back

  7. What exactly is the ‘core part of our physical plant’ that requires a firmware update and can have an effect on data center cooling? Is it referring to a CRAC unit? I doubt a firmware upgrade of a server would cause the temperature to shoot up, unless the earlier version was using CPU core parking and the new version disabled it.

  8. Appreciate the openness of the posting – always good to know the reason of an outage.

  9. Some of us were really hoping the outage was indicative of a Metro upgrade for the calendar module, or implementation of EAS support for Mac mail clients!

  10. Not a big deal for my Hotmail account. But if this had been my Exchange server…….

    I’ll probably never move my Exchange to the cloud.

    JamesNT

  11. New Outlook is not bad, but it automatically opens Messenger. We don’t always want to be disturbed when signing in to Hotmail. Can this be turned off manually when we want to?
    I don’t see where to turn it on or off.

  12. Still no access to my email! despite it being with support for 3 weeks

  13. Re Outlook.com, as of Mar 23: So what’s new? I was just getting ready to make the jump from Gmail, but I held off with problems. Is it fixed? Should I jump?

    • No it isn’t, stay as far away from Outlook.com as possible.

  14. I couldn’t retrieve my emails on my phone! oh that’s because hotmail was updated to this new STUPID outlook! I don’t see the difference other than the graphics are basic and I didn’t get to choose if I wanted to update. I know yahoo gives you a choice. THE most annoying thing hotmail does is make me change my password every once in a while; without warning they make me change my password. And I mean THEY MAKE ME! they won’t accept my password and I have to reset it thru my husband’s email. If I had a security breach I’d understand, but I can’t use an old password and my husband has never had to do that. I don’t have time to keep track of new random passwords and I HATE when they say convenient and it’s not; at least for me. I don’t have an office job so I rely on my emails to schedule my day and if I’m not near my desktop at home I’m screwed so thank you once again hotmail for making an extremely important week extra hard by not allowing me to use my password. hotmail do more research when making changes because you are taking a huge step back and let outlook go it’s not useful. I have an app that does outlook’s job and more!

  15. I have a hotmail account that today i was locked out of..and according to the security bot you seem to be using i didn’t have enough information to access my account. this WAS my main email account. My main way of online communication. there is no option but for me to wait 24 hours to attempt again. No other way to access help..and considering the size of your company i find this beyond POOR customer service. I log in my email daily…. no i cannot remember the subject matter of my sent emails..generally i don’t slap a subject on them. I will try again tomorrow…but after that i WILL be transferring my email to another company and using their services and will never use an outlook account again unless required by an employer….YOU need to change the protocol for retrieving account information on this service. I understand trying to weed out bots and hackers…but not at the expense of your loyal customers.

  16. Why have all my filtered folders gone ??? VERY important files and work in them and now thanks to the stupid and ugly change over they have all gone… FUMING is not the word.. 80k+ emails / files GONE… how do I get them back??? If not, who is gonna pay for all the work lost ??

    • Micky, can you ping me at outhelp@ at the microsoft.com domain?

  17. This Outlook Express is terrible. I cannot read any of my email. I get a message saying Outlook.com cannot connect to the internet at this time and always saying try later, error 8, and a bunch of other stuff. what a waste of my time. Here I come, Yahoo!!!!!!!

  18. Won’t run on my vista home prem. with ie explorer 8. explorer 9 won’t run. Outlook does work using Chrome. And don’t tell to upgrade to that kinderschool interface windows 8. Certainly as bad as Bob if you remember. Microsoft, you should hire some adults with common sense or go bankrupt.

  19. The upgrade to the Outlook product really is awful. How does Microsoft continue to survive? I tried cutting and pasting word text into an email and discovered there’s no way to keep the font size consistent. Emails start in 14 pt, but there’s no 14 pt choice in the font size selection. Stupid mistakes like that tell us this company is in major dysfunction and this product is a joke. Time to migrate to gmail like everyone else.

  20. I am having the same problem as Micky, where have all my folders gone to, I also have VERY important information in them. I know I have emails but they are gone too?????? Come on Outlook get your act together !!!!!!!!!!!!!!!!!!!!!

    • Rocketronnie, can you ping me at outhelp@ at the microsoft.com domain?

Comments are closed.