There was a major outage in one of Amazon’s regions affecting several availability zones last Thursday.
– For a summary of the events and their impact, see this RightScale blog entry (I guess, but I am not sure, that it was written by Thorsten). The RightScale blog has since been updated with more details of the event.
– George Reese, the grand homme of Cloud Computing, calls this event a shining moment for clouds. Don’t get me wrong. I am a big fan of George, not only because he is following me on Twitter :). He gave a podcast interview repeating that you need to design for the cloud by designing for failure instead of sticking with your traditional architecture.
– Amazon did a poor job communicating what happened. Failures are a part of business, but they have to be dealt with accordingly. Add this to your lessons-learned list about Clouds. At least I did. Here is their summary.
– In my Cloud Computing book there is a whole chapter about RightScale (who provided the best analysis so far), as well as a section about disaster recovery and another one on designing for clouds (“why it is not enough to simply run WebLogic on AWS”). There is also a free chapter available for download at Oracle’s Archbeat site.
IMHO, this event teaches us that it is not enough to know how to simply run WebLogic on AWS or on any other IaaS cloud provider such as Rackspace. By the way, this is one of the reasons why my book has more than the initially planned 120 pages …