Amazon failure takes down other sites
New York - Amazon.com struggled on Friday morning to restore computers used by other major websites such as Reddit as an outage stretched beyond 24 hours.
Though better known for selling books, DVDs and other consumer goods, Amazon also rents out space on huge computer servers that run many websites and other online services.
The problems began at an Amazon data centre near Dulles Airport outside Washington early on Thursday.
On Friday morning, Amazon's status page said the recovery effort was making progress, but it couldn't say when all affected computers would be restored.
Most of the sites that were brought down by the outage on Thursday were back up on Friday, but news-sharing site Reddit was still in "emergency read-only mode", and smaller sites were still reporting trouble.
Location-sharing social network Foursquare and HootSuite, which lets users monitor Twitter and other social networks more easily, appeared to have recovered.
Many other companies that use Amazon Web Services, like Netflix Inc and Zynga Inc, which runs Facebook games, were unscathed by the outage. Amazon has at least one other major US data centre that stayed up, in California.
It's not uncommon for internet services to become inaccessible due to technical problems, sometimes for hours or even days.
But the outage is notable because Amazon's servers are so commonly used, meaning many sites went down at once.
Amazon, which has not responded to requests for comment, has not revealed how many companies use its web services or how many were affected by the outage.
No one knows for sure how many people were inconvenienced, but the affected services are used by millions.
Amazon Web Services provides "cloud" or utility-style computing, in which customers rent computing power and storage on remote computers and pay only for what they need.
Seattle-based Amazon has big plans for AWS. Although it now makes up just a few percent of the company's revenue, CEO Jeff Bezos said last year that it could eventually be as large as Amazon's retail business.
Competitors include Rackspace Hosting Inc and Microsoft Corp's Azure platform.
Some people consider cloud computing more reliable than conventional hosting services in which a small company might rent a handful of computers in a data centre.
If one of them malfunctions, the failure can take down a website.
But "clouds" like AWS use vast banks of computers. If one fails, the tasks that it performs, such as running a website or a game, can immediately be taken over by others.
When a company needs more capacity, maybe because of a surge in visitors to its website, it only takes minutes to rent more computers from Amazon.
But cloud computing isn't immune to failure, either.
Lydia Leong, an analyst for the tech research firm Gartner, said that judging by details posted on Amazon's AWS status page, a network connection failed on Thursday morning, triggering an automatic recovery mechanism that then also failed.
Amazon's computers are divided into groups that are supposed to be independent of each other.
If one group fails, others should stay up. And customers are encouraged to spread the computers they rent over several groups to ensure reliable service. But Thursday's problem took out many groups simultaneously.
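The reasoning behind that advice can be illustrated with a rough calculation. The sketch below is not Amazon's model: the function name and the 1 percent failure figure are hypothetical, and it assumes that groups fail independently of one another, which is exactly the assumption Thursday's outage broke.

```python
def chance_all_groups_fail(p_group_fails: float, groups_used: int) -> float:
    """Probability that every group a customer uses is down at the same time.

    Assumes each group fails independently with the same probability.
    Both the name and the inputs are illustrative, not Amazon figures.
    """
    if not 0.0 <= p_group_fails <= 1.0:
        raise ValueError("p_group_fails must be between 0 and 1")
    if groups_used < 1:
        raise ValueError("groups_used must be at least 1")
    # Independent events: multiply the individual failure probabilities.
    return p_group_fails ** groups_used


if __name__ == "__main__":
    # Hypothetical 1 percent chance that any single group is down.
    for groups in (1, 2, 3):
        risk = chance_all_groups_fail(0.01, groups)
        print(f"{groups} group(s): chance of total outage = {risk:.6%}")
```

Under these assumptions each extra group cuts the risk of a total outage a hundredfold. When a single fault spreads across groups, the failures are no longer independent and the calculation overstates the protection.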
Outages with Amazon's services are rare but not unprecedented. In 2008, several companies lost access to their own files for about two hours when one of Amazon's data centres failed.
The companies included DigitalChalk Inc, which delivers multimedia training over the web.
In general, Amazon Web Services have been more reliable and, above all, cheaper than many other hosting systems, said Josh Cochrane, vice president of product development at Palo Alto Software in Eugene, Oregon.
But the firm's websites and web-based applications that create business plans were all brought down by Thursday's crash.
"It's a pretty vulnerable feeling," he said. "This is a really big message to us that we need to revisit our strategy."
That might include spreading the applications more widely over Amazon's network, so that problems at one data centre won't bring down everything, he said.
Amazon engineers struggled throughout the day to rectify the problem. Leong said the problems are of a type that's not covered by Amazon's money-back guarantees.