Part One: The Imminent Fracture in Corporate Data Architecture: Fast + Big

Written by Scott Jarr on May 28, 2014

Fast + Big Data

I am kicking off a blog series on the structural transformation of the Corporate Data Architecture. I have been struck by the rate of growth of new data AND the rate of growth of the perceived value of data. With new innovations, one typically lags the other. This unique dynamic is driving innovation and adoption of new technologies at an unbelievable pace and causing a huge change in the way companies manage data.

Technologists are realizing that the whole “data economy” is transforming with a very important distinction between the two major ways in which we interact with data. The real win will come when we bridge these two forms of interaction and start to realize the value we all dream will come from all this data!

People often ask if I think the impact of data is over-hyped. I do not. Based on my experience with real customers, I know it is vastly underestimated; you’ll see why in this series.

Throughout these posts I will try to ground the discussion in fact and share the experiences we have gained from our years in the market. However, there is no getting around the fact that I have some strong opinions on the topic. My hope is those opinions come across as respectful, and that they create a discussion in which many will join.

Corporate Data Architecture – An Introduction

This post introduces a new, transformed Corporate Data Architecture and the major technology components that make up what I believe will be ubiquitous in the not-so-distant future.

I’m excited to see the market is starting to get clearer than it was just a few years ago. We are starting to see segmentation of technologies into the big-bucket problems they solve, and as a result the whole picture is coming into better focus. Our Database Universe concepts were certainly not alone in describing the way things are shaking out; the content remains very relevant as an introduction to this series.

VoltDB has been building data management software for going on six years now. In that time, we have seen an undeniable trend in the way data applications are being thought of, designed and developed. Here’s a central learning from our work:

The value in Big Data is not purely from historical information.

Over this period of time we have seen what used to be two separate functions – the application and the analytics – begin to merge. Customers of all sizes are now examining how they build new applications and new analytics capabilities. More often than not, they are merging these two concepts. This natural progression quickly takes people to the point at which they realize they need a unifying architecture to serve as the basis for how data-heavy applications will be built across the company. As a result of this work, the modern Corporate Data Architecture will be born.

Data is Fast Before it’s Big

So, enough with the abstract concepts. I believe all serious data growth in the future will come from Fast Data. There is no magic to this statement. It is pretty common sense when you think about it. For us to achieve the data growth rates everyone seems to predict, data will have to come from sources that produce it at amazing rates. And this, it turns out, aligns with the reality we see all around us: Internet, mobile, social, sensor, the digitization of the world… all are generating vast amounts of fast-moving data.

Fast Data often comes into data systems in streams. They are fire hoses. (I’m going to ignore batch processing right now, because my belief is that the existence of batch data processes is often an admission that something is not working right; more on that in a coming post.) These streams look like observations, log records, interactions, sensor readings, clicks, game play, and so forth: things happening hundreds to millions of times a second.

Interacting with Fast Data is a fundamentally different process from interacting with Big Data that is at rest. These fundamentally different ways of interacting require systems that are architected differently. (Incidentally, this is the theory we founded VoltDB upon and, of course, it isn’t my co-founder Mike’s first time proving this theory is correct.)

The things we do with Fast Data are fairly well described as:

  • Ingest – get millions of events per second into the system
  • Decide – make a data-driven decision on each event
  • Analyze in real time – provide visibility into operational trends of the events
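
To make those three steps concrete, here is a minimal single-process sketch in Python. It is an illustration only, not VoltDB's API: the `Event` and `FastDataPipeline` names, the per-user spending limit, and the 60-second window are all hypothetical choices made for the example, and a real system would have to do this at millions of events per second across many machines.

```python
from collections import Counter, deque
from dataclasses import dataclass


@dataclass
class Event:
    """One item from the stream (hypothetical shape for this sketch)."""
    user_id: str
    kind: str
    amount: float
    ts: float  # event time in seconds


class FastDataPipeline:
    """Toy ingest / decide / analyze loop over a stream of events."""

    def __init__(self, limit_per_user=100.0, window_seconds=60.0):
        self.limit_per_user = limit_per_user  # assumed business rule
        self.window_seconds = window_seconds  # assumed reporting window
        self.spend = Counter()                # accepted total per user
        self.recent = deque()                 # (ts, decision) inside the window

    def decide(self, event):
        # Decide: a per-event, data-driven decision using current state.
        if self.spend[event.user_id] + event.amount > self.limit_per_user:
            return "reject"
        return "accept"

    def ingest(self, event):
        # Ingest: take the event in, decide on it, and update state.
        decision = self.decide(event)
        if decision == "accept":
            self.spend[event.user_id] += event.amount
        self.recent.append((event.ts, decision))
        self._expire(event.ts)
        return decision

    def _expire(self, now):
        # Drop entries that have fallen out of the sliding window.
        while self.recent and self.recent[0][0] <= now - self.window_seconds:
            self.recent.popleft()

    def analyze(self):
        # Analyze in real time: counts of decisions in the current window.
        return Counter(decision for _, decision in self.recent)


if __name__ == "__main__":
    pipeline = FastDataPipeline(limit_per_user=100.0, window_seconds=60.0)
    stream = [
        Event("a", "purchase", 60.0, 0.0),
        Event("a", "purchase", 50.0, 1.0),  # would push user "a" past the limit
        Event("b", "purchase", 10.0, 2.0),
    ]
    for event in stream:
        print(event.user_id, pipeline.ingest(event))
    print(dict(pipeline.analyze()))
```

The point of the sketch is that the decision and the real-time view are both computed as each event arrives, from state the pipeline already holds, rather than by querying data at rest after the fact.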

Building high-performance applications that can do these things with Fast Data is tough. Combining these capabilities with Big Data analytics into a Corporate Data Architecture is increasingly becoming table stakes. But not everyone is prepared to play.

This series will explore these challenges. In the next post, we’ll look at how to make Fast Data smart.

In the interim, I welcome your thoughts, here in the comments or on Twitter. Use the hashtag #CorporateDataArchitecture to join the conversation.