Jason Bloomberg

Data Remanence: Cloud Computing Shell Game

Hackers will be quietly stealing your data before you know what happened

Everybody knows that dragging a file into the trash and then emptying the trash doesn't actually erase the file. It simply indicates to the file system that the file is deleted, but the data in the file remain on the hard drive until the file system eventually overwrites the file. If you require the actual erasure of deleted files, then you must take an active step to erase the portion of the drive that contained the file, perhaps by explicitly overwriting each bit of the original file. Even then, it may be possible (although generally quite difficult) to recover parts of the original file, due to the magnetic properties of the storage medium. We call this problem data remanence.
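As a rough illustration of the "active step" described above, here is a minimal Python sketch that overwrites a file's bytes in place before unlinking it. The function names `overwrite_and_delete` and `demo_overwrite` are hypothetical, and the sketch assumes a file system that writes in place on a conventional drive. On SSDs with wear leveling, or on journaling and copy-on-write file systems, the overwrite may land on different physical blocks, so treat this as an illustration rather than a guarantee of erasure.

```python
import os
import tempfile


def overwrite_and_delete(path, passes=1, chunk=65536):
    """Overwrite a file with random bytes in place, then remove it.

    Assumption: the file system rewrites the same physical blocks.
    """
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            remaining = size
            while remaining > 0:
                n = min(chunk, remaining)
                f.write(os.urandom(n))
                remaining -= n
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the bytes to the device
    os.remove(path)


def demo_overwrite():
    """Create a throwaway file, scrub it, and report whether it is gone."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"sensitive record")
    overwrite_and_delete(path)
    return not os.path.exists(path)
```

Note that nothing comparable is available to a Cloud tenant, who never sees the physical device at all.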

Cloud Computing complicates the data remanence issue enormously. You typically have no visibility into the physical location of your data in the Cloud, so overwriting the physical media is virtually impossible. The Cloud infrastructure may distribute your storage or virtual machine instance across multiple physical drives. Furthermore, deprovisioning that instance is similar to dragging it to the trash: the data that your instance wrote to the various drives remain until the Cloud provider eventually gets around to reallocating the sectors you were using to other instances. Even then, an enterprising hacker might be able to read your data by looking at the bits in their newly provisioned instance.

Unfortunately, the current state of the art for dealing with data remanence in the Cloud is a shell game: applications relegate the solution to the infrastructure level, while the infrastructure considers the problem to be at the application level. To make matters worse, no one seems to be focusing on the data remanence problem in the Cloud. That is, except for the hackers, who will be quietly stealing your data before you know what happened.

Encryption: Necessary but not Sufficient
Encryption is the obvious first line of defense against the data remanence problem. Make sure all the data you store in the Cloud are encrypted. Manage your keys locally, rather than putting them in the Cloud. In this way, not only are your data confidential, but all you have to do to securely delete your data is to delete (or expire) the key.
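The "delete the key to delete the data" idea can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `LocalKeyStore` class and its methods are hypothetical, and the XOR-with-SHAKE-256 keystream is only a standard-library stand-in for a real authenticated cipher such as AES-GCM. Do not use it to protect actual data.

```python
import hashlib
import secrets


def _keystream_xor(key, nonce, data):
    # Illustrative stand-in for a real cipher: XOR with a SHAKE-256 keystream.
    stream = hashlib.shake_256(key + nonce).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


class LocalKeyStore:
    """Keys stay on premises; only ciphertext is sent to the Cloud."""

    def __init__(self):
        self._keys = {}  # record id -> 32-byte key, never uploaded

    def encrypt(self, record_id, plaintext):
        key = self._keys.setdefault(record_id, secrets.token_bytes(32))
        nonce = secrets.token_bytes(16)  # fresh nonce for every message
        return nonce + _keystream_xor(key, nonce, plaintext)

    def decrypt(self, record_id, blob):
        key = self._keys[record_id]  # raises KeyError once the key is shredded
        return _keystream_xor(key, blob[:16], blob[16:])

    def shred(self, record_id):
        # Deleting the key leaves every remaining copy of the ciphertext unreadable.
        self._keys.pop(record_id, None)
```

After `shred("ehr-1")`, any ciphertext for that record that lingers on a Cloud provider's drives can no longer be decrypted by anyone, which is the whole point of keeping the keys local.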

Problem solved, right? Not so fast.

Such application-level encryption has a major limitation. There's simply not much you can do with encrypted data unless you decrypt them, other than simply store them or move them around. If you decrypt your data in the Cloud, then the data remanence problem once again rears its ugly head. As a result, application-level encryption can only solve the data remanence problem when you're using the Cloud for storage only. If you want to process your data in the Cloud, the approach is insufficient.

Perhaps we should handle encryption below the application level, say, at the media layer. With media encryption, you essentially have an encrypted volume in the Cloud. You must present the appropriate credential to mount the volume, just as you would a local hard drive that has media encryption. Media encryption protects you from stolen hard drives (or your Cloud provider going bankrupt and putting the drives on eBay), but it is still insufficient for dealing with the data remanence issue.

The limitation of media encryption in the Cloud is that it only protects read/write operations to the file systems or databases that are physically present on the encrypted media. Other operations, such as message queuing, data caching, and logging, may not have adequate protection. In a traditional on-premises server environment, your systems people are in full control of how and where they handle such operational or transitory data. In the Cloud, however, you have no such control. The Cloud provider's underlying provisioning infrastructure may use a caching scheme as part of its elastic load balancing, and you'd be none the wiser. Remember, you may believe queues or caches are inherently temporary, but the data remanence issue centers on situations where "temporary" really means "unpredictably persistent."

One approach to addressing this problem that is gaining in popularity is "Virtual Private Storage," or VPS. With VPS, encryption and decryption (among other capabilities) take place transparently on an intermediary that negotiates all interactions with the Cloud. For example, buy one of the new generation of Cloud appliances, put it in your DMZ, and configure it to encrypt everything going from your network up to the Cloud, while decrypting in the other direction. From the user's perspective such security measures are entirely transparent; they don't have to worry about confidentiality or data remanence in the Cloud. From the perspective of the Cloud, none of your data are ever unencrypted, whether written to a hard drive or temporarily stored in a queue or a cache somewhere.
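To make the intermediary idea concrete, here is a minimal Python sketch of a gateway that encrypts on the way up and decrypts on the way down. `FakeCloud` and `EncryptingGateway` are hypothetical names, not any vendor's API, and the SHAKE-256 keystream is again only a standard-library placeholder for the authenticated encryption a real appliance would presumably use.

```python
import hashlib
import secrets


class FakeCloud:
    """Hypothetical stand-in for a Cloud storage service."""

    def __init__(self):
        self.blobs = {}

    def put(self, name, blob):
        self.blobs[name] = blob

    def get(self, name):
        return self.blobs[name]


class EncryptingGateway:
    """Sits between the local network and the Cloud; the key never leaves it."""

    def __init__(self, cloud, key=None):
        self._cloud = cloud
        self._key = key or secrets.token_bytes(32)

    def _xor(self, nonce, data):
        # Placeholder cipher for illustration only.
        stream = hashlib.shake_256(self._key + nonce).digest(len(data))
        return bytes(a ^ b for a, b in zip(data, stream))

    def upload(self, name, plaintext):
        nonce = secrets.token_bytes(16)
        self._cloud.put(name, nonce + self._xor(nonce, plaintext))

    def download(self, name):
        blob = self._cloud.get(name)
        return self._xor(blob[:16], blob[16:])
```

Callers work only with plaintext through `upload` and `download`, while the Cloud side holds nothing but ciphertext, whether it ends up on a drive, in a queue, or in a cache.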

The Missing Piece: Meaningful Use
Unfortunately, neither VPS nor media encryption is a complete solution, because they both limit what you can do in the Cloud environment. In essence, all of the encryption approaches we've discussed treat the Cloud as a storage option. It's true that Cloud storage is an essential part of the Infrastructure-as-a-Service (IaaS) story. But what if you want to do more with the Cloud than IaaS?

A wonderful example of this question comes from the healthcare industry. And even if you're not in healthcare, the same challenges may apply to your organization. As you might expect, there are stringent, heavily regulated standards for the confidentiality of Electronic Health Records (EHRs). Encryption techniques traditionally provide sufficient confidentiality for these sensitive data. As solution providers build Cloud-based EHR applications, however, the data remanence issue rears its ugly head.

Cloud storage itself isn't the issue. Put EHRs in the Cloud, move them around, and bring them back from the Cloud: no problem there. But the regulations require more than storage. In the US, for example, the HITECH Act "promotes the adoption and meaningful use of health information technology." It then goes into quite a bit of detail as to what "meaningful use" means, and it's a lot more than IaaS can provide. For example, e-prescribing (eRx) and clinical decision support are two obvious meaningful uses of EHRs that the healthcare industry requires from Cloud-based solutions.

The challenge is that both eRx and clinical decision support necessitate actually doing something interesting with EHRs in the Cloud, and that means decrypting EHRs in the Cloud, which brings us back to the data remanence issue. IaaS simply cannot fully solve this problem, because it's at the application level. Software-as-a-Service (SaaS) also cannot fully resolve the problem, because SaaS solutions alone cannot deal with the remanence issues inherent in having decrypted data in the Cloud.

The ZapThink Take
Fortunately, there is a third Cloud service model: Platform-as-a-Service (PaaS). ZapThink has lambasted PaaS as warmed-over middleware in the Cloud, and truth be told, many PaaS solutions are still little more than thinly veiled middleware. The fact remains that it's up to the PaaS vendors to solve the Cloud data remanence problem, since all of the gaps in media encryption and application-level encryption are within the realm of PaaS.

It's not clear, however, that any PaaS vendor has fully solved this problem yet. There are many moving parts to a platform, after all: messaging, transactionality, data storage and caching, framework APIs, and more. Place those capabilities into the dynamically provisioned Cloud environment. Then, ensure the platform never writes unencrypted data to physical media, even for data in transit.

Essentially, the PaaS vendors must rise to this challenge and build their offerings from the ground up with data remanence in mind. Until they do, no organization should trust them with EHRs or data of similar sensitivity. Of course, with challenge comes opportunity. Are you a vendor who is working on a solution to the Cloud data remanence problem, or a Cloud user who is struggling to find such a solution? Drop us a line, or better yet, check out our new online Cloud Security Fundamentals course.

More Stories By Jason Bloomberg

Jason Bloomberg is a leading IT industry analyst, Forbes contributor, keynote speaker, and globally recognized expert on multiple disruptive trends in enterprise technology and digital transformation. He is ranked #5 on Onalytica’s list of top Digital Transformation influencers for 2018 and #15 on Jax’s list of top DevOps influencers for 2017, the only person to appear on both lists.

As founder and president of Agile Digital Transformation analyst firm Intellyx, he advises, writes, and speaks on a diverse set of topics, including digital transformation, artificial intelligence, cloud computing, devops, big data/analytics, cybersecurity, blockchain/bitcoin/cryptocurrency, no-code/low-code platforms and tools, organizational transformation, internet of things, enterprise architecture, SD-WAN/SDX, mainframes, hybrid IT, and legacy transformation, among other topics.

Mr. Bloomberg’s articles in Forbes are often viewed by more than 100,000 readers. During his career, he has published over 1,200 articles (over 200 for Forbes alone), spoken at over 400 conferences and webinars, and he has been quoted in the press and blogosphere over 2,000 times.

Mr. Bloomberg is the author or coauthor of four books: The Agile Architecture Revolution (Wiley, 2013), Service Orient or Be Doomed! How Service Orientation Will Change Your Business (Wiley, 2006), XML and Web Services Unleashed (SAMS Publishing, 2002), and Web Page Scripting Techniques (Hayden Books, 1996). His next book, Agile Digital Transformation, is due within the next year.

At SOA-focused industry analyst firm ZapThink from 2001 to 2013, Mr. Bloomberg created and delivered the Licensed ZapThink Architect (LZA) Service-Oriented Architecture (SOA) course and associated credential, certifying over 1,700 professionals worldwide. He is one of the original Managing Partners of ZapThink LLC, which was acquired by Dovel Technologies in 2011.

Prior to ZapThink, Mr. Bloomberg built a diverse background in eBusiness technology management and industry analysis, including serving as a senior analyst in IDC’s eBusiness Advisory group, as well as holding eBusiness management positions at USWeb/CKS (later marchFIRST) and WaveBend Solutions (now Hitachi Consulting), and several software and web development positions.