From the Author of The Agile Architecture Revolution

Jason Bloomberg

Anomaly Detection: The Big Data Whack-a-Mole

Wikipedia defines anomaly detection as the identification of items, events or observations which do not conform to an expected pattern or other items in a dataset. Anomalies in operational data can indicate urgent problems, while anomalies in business data might also reflect positive events (like a spike in sales due to a new promotion).

There’s plenty of established science for detecting anomalies in traditional data sets. However, as the volume, variety, and velocity of big data increase, especially for time series data, anomaly detection faces an entirely new set of challenges.

At the root of such challenges: the fact that today’s data are always in flux.

Data sets themselves are dynamic to be sure. But the problem is worse than that.

The very nature of anomalies is also constantly in flux – and thus, any tools you might use to find them must be able to deal with such change.

We’ve just taken the anomaly detection Whack-a-Mole game to the next level. The moles aren’t simply jumping up and down. They’re multiplying – and every new mole is different.

Looks like we’re gonna need a better hammer.

The Anomaly Detection Squeeze

If you try to build your own anomaly detection tool, you’ll quickly find that you have to navigate between two undesirable extremes: detecting only obvious anomalies at one end, and detecting numerous false positives at the other. Unfortunately, both ends of this spectrum present Whack-a-Mole issues.
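To make the squeeze concrete, here is a minimal sketch in Python. It assumes a deliberately naive detector (flag anything more than k standard deviations from the mean) and a small made-up series with two labeled anomalies, one obvious and one subtle. The function names, the data, and the labels are all hypothetical illustrations, not taken from any particular tool.

```python
from statistics import mean, pstdev

def flag(values, k):
    """Return the indices lying more than k standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    return {i for i, v in enumerate(values) if abs(v - mu) > k * sigma}

def score(flagged, true_anomalies):
    """Return (hits, false_positives, missed) for a set of flagged indices."""
    hits = len(flagged & true_anomalies)
    false_positives = len(flagged - true_anomalies)
    missed = len(true_anomalies - flagged)
    return hits, false_positives, missed

# Hypothetical series: index 9 is an obvious spike, index 13 a subtle one.
values = [10, 11, 9, 10, 12, 10, 9, 11, 10, 30,
          10, 11, 9, 14, 10, 10, 11, 9, 10, 10]
truth = {9, 13}

strict = score(flag(values, 3), truth)    # catches only the obvious spike
loose = score(flag(values, 0.5), truth)   # catches both, plus false positives
print("strict:", strict)
print("loose:", loose)
```

With the strict setting the subtle anomaly slips through; with the loose setting both anomalies are caught, but ordinary points get flagged as well. No single value of k in this sketch escapes both problems.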

Just what it means for an anomaly to be obvious continues to shift, as technologies get better at recognizing increasingly subtle anomalies within increasingly noisy and unpredictable data sets.

Other challenges stem from the data sets themselves. Missing data in a time series, for example, can throw off the whole anomaly detection algorithm. In other cases, some of the data aren’t reliable, for reasons ranging from miscalibrated instruments to poor data entry techniques.
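One common way to keep missing data from silently skewing a detector is to make the gaps explicit before any analysis runs. The sketch below assumes readings arrive as (timestamp, value) pairs at a fixed interval, with timestamps as integer seconds aligned to that interval; the function names and sample readings are hypothetical.

```python
def reindex(points, step):
    """Lay readings out on a regular grid, using None for missing slots.

    points: list of (timestamp, value) pairs sorted by timestamp.
    step:   expected spacing between readings, in the same units.
    """
    if not points:
        return []
    by_time = dict(points)
    start, end = points[0][0], points[-1][0]
    return [(t, by_time.get(t)) for t in range(start, end + 1, step)]

def missing_slots(points, step):
    """Return the timestamps where a reading was expected but absent."""
    return [t for t, v in reindex(points, step) if v is None]

# Hypothetical one-minute readings with the 120-second sample missing.
readings = [(0, 5.0), (60, 5.2), (180, 5.1)]
grid = reindex(readings, 60)
gaps = missing_slots(readings, 60)
print(grid)
print(gaps)
```

Whether to interpolate, skip, or alert on the None entries is a separate decision. The point is only that the algorithm should see the gap rather than treat the neighboring readings as adjacent.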

Seasonality also presents a common but subtly complex set of challenges. Most tools understand daily and weekly patterns. Some annual patterns are also straightforward, like the Black Friday and Cyber Monday spikes all retailers lust after.

However, in many other cases, seasonal patterns are far more arbitrary, depending on business decisions like when to hold clearance sales, or notorious big data challenges like understanding patterns in weather data.
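For the simple, regular kind of seasonality, one basic approach is to compare each point with the typical value for its position in the cycle instead of with the series as a whole. The sketch below assumes a fixed, known period and uses the per-slot median as the baseline. The function names and the data are hypothetical, and the arbitrary patterns described above would need considerably more than this.

```python
from statistics import median

def seasonal_residuals(values, period):
    """Subtract each slot's median, where a point's slot is index % period."""
    if len(values) < period:
        raise ValueError("need at least one full period of data")
    slots = [[] for _ in range(period)]
    for i, v in enumerate(values):
        slots[i % period].append(v)
    baseline = [median(s) for s in slots]
    return [v - baseline[i % period] for i, v in enumerate(values)]

# Hypothetical metric with a repeating four-step cycle that peaks at 50.
# In the second cycle the expected peak is missing (index 6 reads 20).
series = [10, 20, 50, 20,
          10, 20, 20, 20,
          10, 20, 50, 20]

residuals = seasonal_residuals(series, period=4)
unusual = [i for i, r in enumerate(residuals) if abs(r) > 10]
print(residuals)
print(unusual)
```

The value 20 is perfectly ordinary for this series overall, so a detector that ignores the cycle would wave it through, and might flag the regular peaks of 50 instead. Measured against its own slot, the missing peak stands out.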

In any case, detecting obvious anomalies is nothing more than table stakes – and the minimum bet keeps going up. Anomaly detection tools must continually detect less and less obvious anomalies over time.

Similarly, the battle to reduce false positives continues to rage unabated. The more varied and dynamic the data sets become, the more careful the detection algorithm must be.

The good old days when you’d simply set a threshold and treat any data point that exceeded it as an alert are long gone.
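A small sketch shows why the fixed threshold falls short. It contrasts a static limit with a simple trailing-window check that flags points far from the recent mean. The window size, the multiplier k, the function names, and the data are all assumptions chosen for illustration, and the rolling check is only one basic alternative, not a recommended production method.

```python
from statistics import mean, pstdev

def static_alerts(values, limit):
    """Old-style alerting: flag every point above a fixed limit."""
    return [i for i, v in enumerate(values) if v > limit]

def rolling_alerts(values, window, k):
    """Flag points more than k standard deviations from the trailing mean."""
    alerts = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu, sigma = mean(history), pstdev(history)
        if sigma > 0 and abs(values[i] - mu) > k * sigma:
            alerts.append(i)
    return alerts

# Hypothetical metric that hovers around 10, with one jump to 25 at index 7.
metric = [10, 11, 10, 9, 10, 11, 10, 25, 10, 11]

high_limit = static_alerts(metric, 50)   # limit sized for some other metric
low_limit = static_alerts(metric, 10)    # limit lowered to catch the jump
adaptive = rolling_alerts(metric, window=5, k=3)
print(high_limit, low_limit, adaptive)
```

The high limit misses the jump entirely, and the low limit catches it along with three ordinary readings. The rolling check isolates index 7 here, though it has its own blind spots: the spike inflates the window that follows it, which can mask a second anomaly arriving soon after.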

Open Source Tools or Data Scientists to the Rescue?

If you’re struggling to solve the anomaly detection problem, you have two basic options: build or buy. The build option usually begins with an open source tool.

Using available open source tools like AnomalyDetection from Twitter or Weka out of the University of Waikato in New Zealand still typically requires custom development on your part, because these tools are more packages of do-it-yourself algorithms and components than usable applications.

So you take an available open source tool – or if you have a particular excess of chutzpah, start from scratch – and assign your crack team of data scientists to synthesize the perfect hammer for whacking all your moles.

Just one problem: good data scientists are virtually impossible to find, unless you’re a Google or a Facebook. And even if you’re lucky enough to hire one, they may or may not have the skills or predilection to work on that anomaly detection challenge you’ve been struggling with.

Remember, good data scientists have their pick of employers and their pick of interesting projects. Whacking your moles is unlikely to be high on their list.

Anodot is the Answer

Anomaly detection is so tough and such a dynamic challenge that the only practical way to address it is to find a vendor who has already invested the numerous person-years it takes to build such a tool. Anodot is just such a company.

Anodot automatically learns your data’s normal behavior and then identifies any deviations from that behavior in real time, even for vast quantities of time series data. The tool can then detect subtle anomalies within many different patterns of data, at any level of granularity.

Anodot is thus able to automatically discover anomalies in vast amounts of data and turn them into business insights. Anomaly detection, after all, isn’t a carnival game. It is the key to squeezing the most value out of the flood of data organizations deal with every day.

Copyright © Intellyx LLC. Anodot is an Intellyx client. At the time of writing, none of the other organizations mentioned in this paper are Intellyx clients. Intellyx retains full editorial control over the content of this paper. Image credit: Valerie Hinojosa.

More Stories By Jason Bloomberg

Jason Bloomberg is the leading expert on architecting agility for the enterprise. As president of Intellyx, Mr. Bloomberg brings his years of thought leadership in the areas of Cloud Computing, Enterprise Architecture, and Service-Oriented Architecture to a global clientele of business executives, architects, software vendors, and Cloud service providers looking to achieve technology-enabled business agility across their organizations and for their customers. His latest book, The Agile Architecture Revolution (John Wiley & Sons, 2013), sets the stage for Mr. Bloomberg’s groundbreaking Agile Architecture vision.

Mr. Bloomberg is perhaps best known for his twelve years at ZapThink, where he created and delivered the Licensed ZapThink Architect (LZA) SOA course and associated credential, certifying over 1,700 professionals worldwide. He is one of the original Managing Partners of ZapThink LLC, the leading SOA advisory and analysis firm, which was acquired by Dovel Technologies in 2011. He now runs the successor to the LZA program, the Bloomberg Agile Architecture Course, around the world.

Mr. Bloomberg is a frequent conference speaker and prolific writer. He has published over 500 articles, spoken at over 300 conferences, Webinars, and other events, and has been quoted in the press over 1,400 times as the leading expert on agile approaches to architecture in the enterprise.

Mr. Bloomberg’s previous book, Service Orient or Be Doomed! How Service Orientation Will Change Your Business (John Wiley & Sons, 2006, coauthored with Ron Schmelzer), is recognized as the leading business book on Service Orientation. He also co-authored the books XML and Web Services Unleashed (SAMS Publishing, 2002), and Web Page Scripting Techniques (Hayden Books, 1996).

Prior to ZapThink, Mr. Bloomberg built a diverse background in eBusiness technology management and industry analysis, including serving as a senior analyst in IDC’s eBusiness Advisory group, as well as holding eBusiness management positions at USWeb/CKS (later marchFIRST) and WaveBend Solutions (now Hitachi Consulting).