Part Two: The Alchemy of Fast Data
Welcome to part two of our series on the transformation of the Corporate Data Architecture. In part one I described what we mean when we talk about the Corporate Data Architecture, and most notably why there are two distinct ways to interact with data – Fast and Big. Let’s dive into Fast.
When we started VoltDB, it was hard to imagine the sheer number of fast data sources that are all around us today. In four short years new data sources have proliferated, spreading from niche applications to nearly every industry and every customer I talk to. A good deal of research documents this proliferation of data, the Internet of Things and the implications mobile computing has for data generation. Some examples: EMC’s Digital Universe study, Mary Meeker’s annual Internet Trends slide deck, and Pew Research’s studies on mobile trends and the effects of 25 years of the Internet.
I think of the growth in data as the natural outcome of a few major forces: processing power has grown, processors have been miniaturized, and costs have declined. Add these together and the result is processing capability in nearly everything, from our watches to our refrigerators. It’s instant Fast Data.
But why is it fast? Two reasons I can think of are: 1) millions of end points streaming data, and 2) if data updates every minute are good, every second is better. Tremendous value in these applications comes from the data that devices generate. Data enriches the user experience, optimizes interactions and drives value. People who are pushing the limits on ‘fast data’ are those who are building the next great application. This holds true whether we’re talking about sensor applications, log record management or website interactions.
As I will try to do throughout this series, let’s illustrate the point with specific customer use cases. We have been working with a partner that builds systems to manage physical assets in precious metal mines. The company has sensors on several hundred thousand “things” that are in the mine at any given time. If they are looking for a lost shovel, minutes or hours of reporting latency may be just fine. But if a sensor on a person indicates a stopped heart, I’d want to know instantly, if not sooner. If I’m developing a system to manage this data, I’m building it to receive data fast, very fast.
But data events don’t exist in isolation from other things. To continue the example above, I may not care if an expensive piece of equipment wanders outside its “authorized zone” if there is a work order on it and it is moving to the repair depot. In this case, I am able to make a smart decision on that sensor event because it is based on other data in my system. (Here’s a little industry secret: We used to call these “transactions”.)
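To make the idea concrete, here is a minimal Python sketch of a per-event decision that consults other data already in the system. Everything in it is hypothetical and chosen only for illustration: the `LocationEvent` type, the `should_alert` function, the asset and zone names, and the rule that any open work order suppresses the alert. A real system would run this check inside a transaction against live data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocationEvent:
    """One sensor reading: which asset was seen in which zone (hypothetical shape)."""
    asset_id: str
    zone: str


def should_alert(event, authorized_zones, open_work_orders):
    """Decide whether a location event deserves an alert.

    authorized_zones: dict mapping asset_id -> set of zones it may be in.
    open_work_orders: set of asset_ids that currently have a work order.
    Both are stand-ins for state the system already holds.
    """
    # Inside an authorized zone: nothing to report.
    if event.zone in authorized_zones.get(event.asset_id, set()):
        return False
    # Outside its zone, but a work order explains the movement.
    if event.asset_id in open_work_orders:
        return False
    # Outside its zone with no explanation: raise the alert.
    return True


# Illustrative reference data, not taken from any real deployment.
zones = {"drill-17": {"shaft-a"}, "truck-03": {"shaft-b"}}
work_orders = {"drill-17"}

moved_for_repair = LocationEvent("drill-17", "repair-depot")
wandered_off = LocationEvent("truck-03", "shaft-a")
```

With this data, `should_alert(moved_for_repair, zones, work_orders)` is false because the work order accounts for the move, while `should_alert(wandered_off, zones, work_orders)` is true.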
Data is also valuable when it is counted, aggregated, trended and so forth, i.e. real-time analytics. I see the need to analyze data in real-time for two distinct purposes.
- A human wants to see a real-time representation of the mine, via a dashboard, e.g., how many sensors are active, how many are outside of their zone, what is the utilization efficiency, etc.
- Those real-time analytics are used in the automated decision-making process. For example, if a reading from a sensor on a human shows low oxygen for an instant, it is possible the sensor had an anomalous reading. But, if the system detects a rapid drop in ambient oxygen over the past five minutes for six workers in the same area, it’s an emergency that needs immediate attention.
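The oxygen example can be sketched as a sliding-window check. This is only an illustration under stated assumptions: the `OxygenMonitor` class, the 2.0-point drop threshold, and the area and worker names are invented here, and readings are assumed to arrive in timestamp order. The five-minute window and the six-worker count come from the example above.

```python
from collections import defaultdict, deque


class OxygenMonitor:
    """Flags an area when enough workers show an oxygen drop within a time window."""

    def __init__(self, window_seconds=300, min_workers=6, drop_threshold=2.0):
        self.window_seconds = window_seconds   # five minutes, as in the example
        self.min_workers = min_workers         # six workers, as in the example
        self.drop_threshold = drop_threshold   # hypothetical threshold
        # area -> worker_id -> deque of (timestamp, level), oldest first
        self._readings = defaultdict(lambda: defaultdict(deque))

    def record(self, area, worker_id, timestamp, level):
        """Store one reading; return True if the area now looks like an emergency."""
        self._readings[area][worker_id].append((timestamp, level))
        cutoff = timestamp - self.window_seconds
        dropping = 0
        for series in self._readings[area].values():
            # Evict readings that have fallen out of the window.
            while series and series[0][0] < cutoff:
                series.popleft()
            # Compare the oldest and newest readings still in the window.
            if series and series[0][1] - series[-1][1] >= self.drop_threshold:
                dropping += 1
        return dropping >= self.min_workers


monitor = OxygenMonitor()

# One worker with a sudden low reading: possibly an anomaly, not an emergency.
monitor.record("tunnel-1", "w0", 0, 20.9)
single_worker = monitor.record("tunnel-1", "w0", 10, 18.0)

# Six workers in the same area all dropping within two minutes.
for i in range(6):
    monitor.record("tunnel-2", f"w{i}", 0, 20.9)
results = [monitor.record("tunnel-2", f"w{i}", 120, 18.5) for i in range(6)]
```

Here `single_worker` stays false, and `results` turns true only on the sixth worker's reading. The same aggregate that drives this automated decision could also feed the dashboard view described in the first bullet.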
Physical asset management in a mine is a real-world use case to illustrate what is needed from all the systems that manage Fast Data. But remember, it is just representative. The same pattern exists for DDoS detection, log file management, or optimizing ad placement. When dealing with fast data you need to:
- Ingest the data quickly, in a way that keeps it accessible.
- Make a decision on each event at the point of its highest value – as soon as it arrives.
- Analyze the data in real-time to enable automated decision-making and create human-readable dashboards.
If you do these things, you are making your Fast Data work for you. You are making it Smart Data, fast.
Building a Corporate Data Architecture that thrives in the face of smart data fast and is able to get the most from deep analytics requires a new architectural approach. In the next post, we’ll look at the architectures being deployed to address this growing need of Fast + Big Data.