
Jason Bloomberg


Anomaly Detection: The Big Data Whack-a-Mole

Wikipedia defines anomaly detection as the identification of items, events or observations which do not conform to an expected pattern or other items in a dataset. Anomalies in operational data can indicate urgent problems, while anomalies in business data might also reflect positive events (like a spike in sales due to a new promotion).

There’s plenty of available science for detecting anomalies for traditional data sets. However, as the volume, variety, and velocity of big data increase, especially for time series data, anomaly detection faces an entirely new set of challenges.

At the root of such challenges: the fact that today’s data are always in flux.

Data sets themselves are dynamic to be sure. But the problem is worse than that.

The very nature of anomalies is also constantly in flux – and thus, any tools you might use to find them must be able to deal with such change.

We’ve just taken the anomaly detection Whack-a-Mole game to the next level. The moles aren’t simply jumping up and down. They’re multiplying – and every new mole is different.

Looks like we’re gonna need a better hammer.

The Anomaly Detection Squeeze

If you try to build your own anomaly detection tool, you’ll quickly find that you have to navigate between two undesirable extremes: detecting only obvious anomalies at one end, and detecting numerous false positives at the other. Unfortunately, both ends of this spectrum present Whack-a-Mole issues.

Just what it means for an anomaly to be obvious continues to shift, as technologies get better at recognizing increasingly subtle anomalies within increasingly noisy and unpredictable data sets.

Other challenges arise from the data sets themselves. Missing data in a time series, for example, can throw off the entire anomaly detection algorithm. In other cases, some of the data are unreliable, for reasons ranging from miscalibrated instruments to sloppy data entry.
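To make the missing-data problem concrete, here is a minimal sketch of one common mitigation, assuming linear interpolation is acceptable for the series in question (the function name and gap-handling policy are illustrative, not any particular tool's API):

```python
def fill_gaps(series):
    """Linearly interpolate missing (None) values in a time series.

    Interior gaps are filled by linear interpolation between the
    nearest known neighbors; leading/trailing gaps are filled with
    the nearest known value.
    """
    filled = list(series)
    known = [i for i, v in enumerate(filled) if v is not None]
    if not known:
        raise ValueError("series contains no known values")
    for i, v in enumerate(filled):
        if v is not None:
            continue
        # nearest known neighbors on each side of the gap
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = filled[right]
        elif right is None:
            filled[i] = filled[left]
        else:
            frac = (i - left) / (right - left)
            filled[i] = filled[left] + frac * (filled[right] - filled[left])
    return filled
```

Of course, interpolation papers over gaps rather than explaining them; whether that is acceptable depends on why the data went missing in the first place.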

Seasonality also presents a common, but subtly complex set of challenges. Most tools understand daily and weekly patterns. Some annual patterns are also straightforward, like the Black Friday and Cyber Monday spikes all retailers lust after.

However, in many other cases, seasonal patterns are far more arbitrary, depending on business decisions like when to hold clearance sales, or notorious big data challenges like understanding patterns in weather data.

In any case, detecting obvious anomalies is nothing more than table stakes – and the minimum bet keeps going up. Anomaly detection tools must continually detect less and less obvious anomalies over time.

Similarly, the battle to reduce false positives continues to rage unabated. The more varied and dynamic the data sets become, the more careful the detection algorithm must be.

The good old days where you’d simply set a threshold and consider any data point that exceeded the threshold to be an alert are long gone.
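To see why, contrast the fixed threshold with even the simplest adaptive alternative: a rolling z-score that learns its notion of "normal" from a trailing window. This is a sketch with hypothetical window and threshold choices, not anyone's production detector:

```python
from collections import deque
from statistics import mean, stdev

def rolling_zscore_alerts(values, window=20, threshold=3.0):
    """Flag points that deviate from the trailing window's mean by
    more than `threshold` standard deviations.

    Unlike a fixed threshold, the definition of "normal" adapts as
    the recent data shifts.
    """
    recent = deque(maxlen=window)  # trailing window of recent points
    alerts = []
    for i, v in enumerate(values):
        if len(recent) >= 2:
            mu, sigma = mean(recent), stdev(recent)
            if sigma > 0 and abs(v - mu) > threshold * sigma:
                alerts.append(i)
        recent.append(v)
    return alerts
```

Even this modest step up from a static threshold trades one set of tuning knobs (the threshold) for two more (window size and sensitivity), which hints at why the problem compounds so quickly at scale.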

Open Source Tools or Data Scientists to the Rescue?

If you’re struggling to solve the anomaly detection problem, you have two basic options: build vs. buy. The build option usually begins with an open source tool.

Using available open source tools like AnomalyDetection from Twitter or Weka out of the University of Waikato in New Zealand still typically requires custom development on your part, because these tools are more packages of do-it-yourself algorithms and components than usable applications.

So you take an available open source tool – or if you have a particular excess of chutzpah, start from scratch – and assign your crack team of data scientists to synthesize the perfect hammer for whacking all your moles.

Just one problem: good data scientists are virtually impossible to find, unless you’re a Google or a Facebook. And even if you’re lucky enough to hire one, they may or may not have the skills or predilection to work on that anomaly detection challenge you’ve been struggling with.

Remember, good data scientists have their pick of employers and their pick of interesting projects. Whacking your moles is unlikely to be high on their list.

Anodot is the Answer

Anomaly detection is so tough and such a dynamic challenge that the only practical way to address it is to find a vendor who has already invested the numerous person-years it takes to build such a tool. Anodot is just such a company.

Anodot automatically learns your data’s normal behavior and then identifies any deviations from that behavior in real-time, even for vast quantities of time series data. The tool is then able to detect subtle anomalies within many different patterns of data, at any level of granularity.

Anodot is thus able to automatically discover anomalies in vast amounts of data and turn them into business insights. Anomaly detection, after all, isn’t a carnival game. It is the key to squeezing the most value out of the flood of data organizations deal with every day.

Copyright © Intellyx LLC. Anodot is an Intellyx client. At the time of writing, none of the other organizations mentioned in this paper are Intellyx clients. Intellyx retains full editorial control over the content of this paper. Image credit: Valerie Hinojosa.


More Stories By Jason Bloomberg

Jason Bloomberg is a leading IT industry analyst, Forbes contributor, keynote speaker, and globally recognized expert on multiple disruptive trends in enterprise technology and digital transformation. He is ranked #5 on Onalytica’s list of top Digital Transformation influencers for 2018 and #15 on Jax’s list of top DevOps influencers for 2017, the only person to appear on both lists.

As founder and president of Agile Digital Transformation analyst firm Intellyx, he advises, writes, and speaks on a diverse set of topics, including digital transformation, artificial intelligence, cloud computing, devops, big data/analytics, cybersecurity, blockchain/bitcoin/cryptocurrency, no-code/low-code platforms and tools, organizational transformation, internet of things, enterprise architecture, SD-WAN/SDX, mainframes, hybrid IT, and legacy transformation, among other topics.

Mr. Bloomberg’s articles in Forbes are often viewed by more than 100,000 readers. During his career, he has published over 1,200 articles (over 200 for Forbes alone), spoken at over 400 conferences and webinars, and he has been quoted in the press and blogosphere over 2,000 times.

Mr. Bloomberg is the author or coauthor of four books: The Agile Architecture Revolution (Wiley, 2013), Service Orient or Be Doomed! How Service Orientation Will Change Your Business (Wiley, 2006), XML and Web Services Unleashed (SAMS Publishing, 2002), and Web Page Scripting Techniques (Hayden Books, 1996). His next book, Agile Digital Transformation, is due within the next year.

At SOA-focused industry analyst firm ZapThink from 2001 to 2013, Mr. Bloomberg created and delivered the Licensed ZapThink Architect (LZA) Service-Oriented Architecture (SOA) course and associated credential, certifying over 1,700 professionals worldwide. He is one of the original Managing Partners of ZapThink LLC, which was acquired by Dovel Technologies in 2011.

Prior to ZapThink, Mr. Bloomberg built a diverse background in eBusiness technology management and industry analysis, including serving as a senior analyst in IDC’s eBusiness Advisory group, as well as holding eBusiness management positions at USWeb/CKS (later marchFIRST) and WaveBend Solutions (now Hitachi Consulting), and several software and web development positions.