
Jason Bloomberg




Failure Is the Only Option

A core cloud best practice is to expect and plan for failure

I ran a computer department for a small private school back in 1991. I remember rolling out our first Macintosh computers, the ones with one megabyte of memory and no hard drive (the hard drives were too expensive for us). I managed to get them up and running with no disk in the floppy drive, so that students could load and save their work to their own floppy.

This diskless approach was not particularly stable, however, and the computers would crash on a regular basis. As a result, my mantra was SAVE YOUR WORK! Expect your computer to crash, and plan accordingly!

Schoolkids being schoolkids, however, they frequently ignored my admonition. Sure enough, periodically one would come up to me with a tear in her eye. “Mr. Bloom! I just spent all period writing my English paper and the computer crashed! Please help!” But of course, there was nothing I could do at that point.

Two decades later, the computers are bigger, faster, and cheaper, we have the Internet and all it has done for us, and today we even have the Cloud. But in some ways nothing has changed. Failure is still unpredictable, yet it is around every corner. The core best practice, expect and plan for failure, is still as important as it was in the floppy days.

The Cloud: Built to Fail
It was actually my research into the causes of last month’s spectacular Amazon Cloud flameout that reminded me of my time in the school computer lab. One of the core architectural principles of Amazon’s Cloud—or any Cloud for that matter, public or private—is our old friend, expect and plan for failure. After all, each individual node in the Cloud, whether it be a server, hard drive, or piece of network equipment, consists of cheap commodity hardware. Each Cloud provider architects their Cloud environment so that any such piece of equipment—or even several pieces at once—can fail, and the environment should recover automatically.

In fact, fault tolerance and elasticity go hand in hand. Elasticity, which is the dynamic, automatic provisioning and deprovisioning of Cloud resources on demand, requires the same bootstrapping that fault tolerance calls for. The reason to bootstrap a box may be to meet additional demand (elasticity) or to replace some other box that is having issues (fault tolerance). Either way, when you need a new box, it boots and asks what it’s supposed to do, and then it finds and installs the appropriate resources to become the piece of equipment it needs to be.
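To make the shared bootstrapping path concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not how any particular Cloud provider works: the `Coordinator` class, the `ROLE_RESOURCES` catalog, and the role and resource names are all hypothetical. The point it shows is that scaling up and replacing a failed box both go through the same `bootstrap` call.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical catalog: the resources a box installs to take on each role.
ROLE_RESOURCES: Dict[str, List[str]] = {
    "web": ["http-server", "app-bundle"],
    "worker": ["queue-client", "job-runner"],
}


@dataclass
class Node:
    node_id: str
    role: Optional[str] = None
    installed: List[str] = field(default_factory=list)


class Coordinator:
    """Hypothetical control plane that tells each new box what to become."""

    def __init__(self, desired: Dict[str, int]) -> None:
        self.desired = dict(desired)      # role -> number of boxes wanted
        self.nodes: Dict[str, Node] = {}  # boxes currently in service

    def _count(self, role: str) -> int:
        return sum(1 for n in self.nodes.values() if n.role == role)

    def _next_role(self) -> Optional[str]:
        # The new box "asks what it's supposed to do": pick the first
        # role that currently has fewer boxes than desired.
        for role, wanted in self.desired.items():
            if self._count(role) < wanted:
                return role
        return None

    def bootstrap(self, node_id: str) -> Node:
        node = Node(node_id)
        role = self._next_role()
        if role is not None:
            node.role = role
            node.installed = list(ROLE_RESOURCES[role])  # "install" resources
            self.nodes[node_id] = node
        return node

    def mark_failed(self, node_id: str) -> None:
        # Fault tolerance: dropping a box opens a gap the next bootstrap fills.
        self.nodes.pop(node_id, None)

    def scale(self, role: str, count: int) -> None:
        # Elasticity: raising the target opens a gap the next bootstrap fills.
        self.desired[role] = count
```

In this sketch, calling `mark_failed("a")` and calling `scale("worker", 2)` have the same effect from the new box's point of view: the next `bootstrap` call finds an unfilled role and assigns it.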

Architecting for Failure
However, just because your Cloud provider architected their internal infrastructure to be elastic and fault tolerant doesn’t mean that your app will automatically inherit these traits once you move it to the Cloud. When an organization wants to run an application in the Cloud, it is important to architect the application to take advantage of the elasticity and fault tolerance the Cloud provides. Moving a big chunk of legacy spaghetti code into the Cloud is asking for trouble, as is trying to migrate a tightly coupled set of objects. Where you might have been able to get away with such design antipatterns for an in-house app, the Cloud forces you to clean up your act and deploy modular, loosely coupled apps that can take advantage of the inherently elastic, fault tolerant nature of the Cloud.

There is an important story here: the internal architecture of the Cloud is forcing organizations to rearchitect their apps so they can take advantage of the Cloud, and as a welcome side effect, it leaves them with better-architected apps. But that’s not the only story.

The broader story is especially ironic. The irony, of course, is that a core Cloud best practice is to expect and plan for failure, not only within the Cloud, but of the Cloud itself. The Cloud brokering we discussed in the last ZapFlash is no longer a “nice to have,” but rather a core architectural best practice. If you tell yourself that Cloud architectures are inherently fault tolerant, and therefore it’s sufficient to count on a single Cloud provider, you’re fooling yourself.
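As a rough illustration of not counting on a single Cloud provider, the Python sketch below saves the same piece of work to several providers and reads it back from whichever one still answers. Everything here is an assumption made for the example: `InMemoryProvider` stands in for a real storage service, and `save_work`, `load_work`, and the `min_copies` parameter are hypothetical names, not part of any provider's API.

```python
from typing import Dict, List


class ProviderUnavailable(Exception):
    """Raised when a (simulated) Cloud provider is down."""


class InMemoryProvider:
    """Hypothetical stand-in for one Cloud provider's object storage."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.available = True
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        if not self.available:
            raise ProviderUnavailable(self.name)
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        if not self.available:
            raise ProviderUnavailable(self.name)
        return self._objects[key]


def save_work(providers: List[InMemoryProvider], key: str, data: bytes,
              min_copies: int = 2) -> List[str]:
    """Write to every reachable provider; fail loudly if too few copies land."""
    saved = []
    for provider in providers:
        try:
            provider.put(key, data)
            saved.append(provider.name)
        except ProviderUnavailable:
            continue  # expect failure: move on to the next provider
    if len(saved) < min_copies:
        raise RuntimeError(f"only {len(saved)} copies saved, need {min_copies}")
    return saved


def load_work(providers: List[InMemoryProvider], key: str) -> bytes:
    """Read from the first provider that is up and has the object."""
    for provider in providers:
        try:
            return provider.get(key)
        except (ProviderUnavailable, KeyError):
            continue
    raise RuntimeError(f"no provider could return {key!r}")
```

The design choice worth noting is that `save_work` refuses to report success with a single copy by default, so one provider being the only holder of the data is treated as a failure rather than as good enough.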

Architecting for the Cloud doesn’t mean sticking your app into the Cloud as though it were a black box. There’s far more to the story. On the one hand, it means rearchitecting your app to take advantage of the Cloud, and on the other hand, it means considering each Cloud provider instance as one element in your broader enterprise architecture. And if you want that enterprise architecture to be fault tolerant, avoid any single point of failure, even if that point of failure is the Cloud itself. Bottom line: SAVE YOUR WORK. I don’t want you coming up to me at the end of class because you lost your data. I won’t be able to do anything about your data, but I will be able to tell you I told you so!

More Stories By Jason Bloomberg

Jason Bloomberg is the leading expert on architecting agility for the enterprise. As president of Intellyx, Mr. Bloomberg brings his years of thought leadership in the areas of Cloud Computing, Enterprise Architecture, and Service-Oriented Architecture to a global clientele of business executives, architects, software vendors, and Cloud service providers looking to achieve technology-enabled business agility across their organizations and for their customers. His latest book, The Agile Architecture Revolution (John Wiley & Sons, 2013), sets the stage for Mr. Bloomberg’s groundbreaking Agile Architecture vision.

Mr. Bloomberg is perhaps best known for his twelve years at ZapThink, where he created and delivered the Licensed ZapThink Architect (LZA) SOA course and associated credential, certifying over 1,700 professionals worldwide. He is one of the original Managing Partners of ZapThink LLC, the leading SOA advisory and analysis firm, which was acquired by Dovel Technologies in 2011. He now runs the successor to the LZA program, the Bloomberg Agile Architecture Course, around the world.

Mr. Bloomberg is a frequent conference speaker and prolific writer. He has published over 500 articles, spoken at over 300 conferences, Webinars, and other events, and has been quoted in the press over 1,400 times as the leading expert on agile approaches to architecture in the enterprise.

Mr. Bloomberg’s previous book, Service Orient or Be Doomed! How Service Orientation Will Change Your Business (John Wiley & Sons, 2006, coauthored with Ron Schmelzer), is recognized as the leading business book on Service Orientation. He also co-authored the books XML and Web Services Unleashed (SAMS Publishing, 2002), and Web Page Scripting Techniques (Hayden Books, 1996).

Prior to ZapThink, Mr. Bloomberg built a diverse background in eBusiness technology management and industry analysis, including serving as a senior analyst in IDC’s eBusiness Advisory group, as well as holding eBusiness management positions at USWeb/CKS (later marchFIRST) and WaveBend Solutions (now Hitachi Consulting).