Microsoft has been having problems keeping Xbox Live running, and with protecting its own systems.
Several Microsoft online services are suffering from problems this evening, with both Xbox Live and the Windows Azure cloud service in the midst of extended outages. According to Microsoft's status page, Xbox Live users may have issues accessing data and saved games in the cloud. SmartGlass is also affected, along with the Xbox Live Karaoke and ESPN apps. The problem has been ongoing for several hours and appears to be getting worse, with the company recently adding Xbox Live functionality for Halo 4 to the list of affected services. In an update posted at 9:03PM ET, Microsoft states that "We are still working as fast as we can to fix the issue."
The problem may be related to another service interruption, this one affecting the company's Windows Azure cloud platform. At the time of this writing, 52 services are listed as either offline or suffering from degraded performance. Microsoft explains that Azure's storage service is currently down worldwide "due to an expired certificate." The company goes on to say that it is "validating the recovery options before implementing them," and will keep users apprised with further updates. Microsoft hasn't said that the Azure outage is responsible for the Xbox Live problems, but given that both revolve around storage issues, the timing is certainly suspect.
THIS ISN'T THE FIRST TIME
This isn't the first time Windows Azure has suffered an outage due to certificate problems. Key parts of the service also went down for 12 hours last February. In that case, Microsoft cited "a time calculation that was incorrect for the leap year"; the bug prevented digital certificates from being processed properly. It's not clear what led to the expired certificate behind today's outage.
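Microsoft's description of the leap-year bug is brief, but one well-known way a leap-year time calculation goes wrong is naive "add one year" date arithmetic: a certificate issued on Feb. 29, 2012 with a one-year lifetime would need an expiry date of Feb. 29, 2013, which doesn't exist. The Python sketch below is a generic illustration of that pitfall, not Microsoft's code; the function names `naive_one_year_later` and `safe_one_year_later` are hypothetical.

```python
from datetime import date

def naive_one_year_later(d: date) -> date:
    # Bump the year and keep month/day unchanged.
    # Raises ValueError when d is Feb. 29 and the next year is not a leap year.
    return d.replace(year=d.year + 1)

def safe_one_year_later(d: date) -> date:
    # Same idea, but fall back to Feb. 28 when Feb. 29 doesn't exist next year.
    # (Feb. 29 is the only date that can fail here, so day=28 stays in February.)
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        return d.replace(year=d.year + 1, day=28)

issued = date(2012, 2, 29)  # leap day
try:
    naive_one_year_later(issued)
except ValueError as err:
    print("naive calculation failed:", err)

print("safe calculation:", safe_one_year_later(issued))  # safe calculation: 2013-02-28
```

Clamping to Feb. 28 is only one possible policy; rolling forward to March 1 is another, and either is safer than an unhandled exception in code that issues certificates.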
Update: Xbox Music and Video services are now affected as well. According to a 10:31PM ET update, users may be unable to browse, stream, or buy audio and video content.
Update 2: As of 2:17AM ET, all Xbox Live services except Music and Video are listed as working normally. Windows Azure appears to be slowly coming back online, but according to Microsoft's status page, 43 services are still affected by the outage.
Update 3: Microsoft's latest update, at 4:00AM ET, states that it has recovered "99 percent availability across all sub-regions" for Azure, while adding that "customers may experience intermittent failures during this period." All Xbox Live services — including Music and Video — are back up and running as well.
The Azure failure also affected Microsoft's Xbox game, Halo 4, Microsoft confirmed.
The highest-profile incident may have had the least effect: "a small number" of Microsoft PCs were penetrated by an unknown intruder. No user data was compromised, Microsoft said in a blog post.
"Consistent with our security response practices, we chose not to make a statement during the initial information gathering process," Matt Thomlinson, general manager of Microsoft's Trustworthy Computing Security unit, wrote. "During our investigation, we found a small number of computers, including some in our Mac business unit, that were infected by malicious software using techniques similar to those documented by other organizations. We have no evidence of customer data being affected and our investigation is ongoing."
The attacks were consistent with other efforts to penetrate computers at Apple and Facebook, Microsoft said. Facebook discovered its attack last week; it was carried out through an unpatched Java exploit and followed attacks on The Wall Street Journal and The New York Times that experts believe were the work of the Chinese military.
Separately, Zendesk reported Friday that it, too, was hacked, exposing emails that its clients Tumblr, Twitter and Pinterest used to communicate with it for service-related requests.
At press time Friday night, Microsoft still had not implemented a fix for the Azure issue, which was caused by a failure to obtain a new SSL certificate. The lapse took Azure storage down in all of its worldwide regions, along with the services that depend on it.
At 9:30 PM UTC (4:30 PM ET), Microsoft discovered that "HTTPS operations (SSL transactions) on Storage accounts worldwide are impacted," the company said. By 9:45 PM UTC, the management portal, WindowsAzure.com, the Service Bus, and the websites that Azure serves were also down. By 10:15 PM UTC, the company had begun validating steps to repair the problem, but hadn't formally announced a fix. After users began circulating screenshots of what appeared to be an expired SSL certificate, the company acknowledged its error.
"Windows Azure Storage has been affected by an expired certificate," a spokesman said in an emailed statement. "We are working to complete the restoration as quickly as possible. We apologize for any inconvenience this has caused our customers. For more information please go to http://www.windowsaz...ce-dashboard/." Microsoft also apologized to customers via Twitter.
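An expired certificate fails bluntly: once the current time passes the certificate's "notAfter" date, TLS clients that validate certificates refuse the HTTPS connection, which is why every storage region went down at once. Below is a minimal Python sketch, under stated assumptions, of the kind of expiry check a monitoring job might run. It parses a notAfter timestamp in the text format Python's `ssl` module reports and flags certificates close to expiry. The 30-day window, the function names, and the dates in the usage example are hypothetical and are not taken from Azure's actual certificate.

```python
import ssl

RENEWAL_WINDOW_DAYS = 30  # hypothetical alert threshold

def days_until_expiry(not_after: str, now: float) -> float:
    # not_after uses the OpenSSL text format, e.g. "Jun  1 12:00:00 2030 GMT".
    # ssl.cert_time_to_seconds converts it to seconds since the epoch (UTC).
    expires_at = ssl.cert_time_to_seconds(not_after)
    return (expires_at - now) / 86400  # 86,400 seconds per day

def needs_renewal(not_after: str, now: float) -> bool:
    # True when the certificate has already expired or expires inside the window.
    return days_until_expiry(not_after, now) < RENEWAL_WINDOW_DAYS

# Usage with hypothetical dates: a certificate expiring 31 days from "now".
now = ssl.cert_time_to_seconds("May  1 12:00:00 2030 GMT")
print(days_until_expiry("Jun  1 12:00:00 2030 GMT", now))  # 31.0
print(needs_renewal("Jun  1 12:00:00 2030 GMT", now))      # False
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the check deterministic and easy to test; a real job would pass `time.time()`.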
Microsoft also reported problems with its Compute services that prevented users from creating new virtual machines. That left users who needed new virtual machines to host new apps scratching their heads. "Most of our apps are screwed up now!" one commenter, pinvoke.in, complained. "WHATS NEXT? All compute instances die because someone at the data center switched them off?"
Unfortunately for Microsoft, this sort of thing has happened before. At the end of February 2012, Microsoft failed to account for the leap day, Feb. 29. As a result, Azure services were down for more than 12 hours before Microsoft could issue a fix. Microsoft hasn't said whether the latest outage resulted from an oversight or a more serious technical error.
Oddly enough, Netflix began reporting problems of its own on Friday night, raising the possibility that two cloud services were failing at the same time. But although Netflix has gone down before when Amazon Web Services (AWS) failed, Amazon's own AWS service dashboard didn't indicate any problems.