

Big Data Camp LA 2014 Wrap-Up

Monday, June 30, 2014 - 6:00pm

Written by Scott Jarr

Amazing. That’s about all I could say all day the Saturday before Father’s Day. I was asked to speak at the LA Big Data Camp (#BigDataCampLA) that day, which I had happily agreed to do before I realized it was _that_ weekend. But I went down as promised, thinking there couldn’t possibly be a big audience in LA on a Saturday morning to talk about data. Amazing!

I heard from the organizers (who did a really great job) that they had over 900 registrants! As of 10 am, they had already exceeded the typical attendee-to-registrant ratio.

Connecting VoltDB to the Big Data Ecosystem

Monday, June 23, 2014 - 9:30pm

With its ability to act on data coming in at the rate of hundreds of thousands to millions of events a second, VoltDB is ideally suited as an operational database to process fast data. Fast data creates Big Data, so moving data out of VoltDB becomes very important once that data has been processed and is no longer of immediate value. To move data out of VoltDB, use VoltDB Export. VoltDB Export allows you to transactionally push data from VoltDB into another system, similar to an ETL (extract, transform, load) process.

VoltDB lets you automate the export process by declaring certain tables as export tables, so that every committed insert into them is pushed to the configured connector.
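
A minimal sketch of what the write side of that pipeline can look like with the VoltDB Java client appears below. The table name EVENTS, its columns, and the single-host connection are illustrative assumptions rather than details from the post, and it presumes EVENTS has already been declared for export in the schema.

    // Hedged sketch: assumes a VoltDB server on localhost and a table
    // EVENTS (id BIGINT, ts BIGINT, action VARCHAR) that the schema has
    // already declared as an export table. All names are illustrative.
    import org.voltdb.client.Client;
    import org.voltdb.client.ClientFactory;

    public class ExportFeed {
        public static void main(String[] args) throws Exception {
            Client client = ClientFactory.createClient();
            client.createConnection("localhost");

            // Each committed insert into an export table is queued,
            // transactionally, to the configured export connector.
            client.callProcedure("EVENTS.insert",
                    1L, System.currentTimeMillis(), "click");

            client.drain();  // let outstanding invocations finish
            client.close();
        }
    }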

Cost accounting for SSDs – it’s RAM, not disk

Thursday, July 10, 2014 - 4:15pm

Most discussions I have seen about choosing SSDs vs. spinning disk arrays for databases tend to focus on SSDs as a replacement for disk. SSDs don’t replace disk; they replace the RAM you would be using to cache enough disk pages to make up for the terrible random IO performance of spinning disk arrays.

When you add a new disk to a disk array, you get hundreds of IOPS, max. When you add an SSD to a RAID array – or better yet, packaged along with a new node in your cluster – you get hundreds of thousands more IOPS.
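
To put rough numbers on that cost accounting, here is a back-of-the-envelope comparison in Java; every price and IOPS rating in it is an assumed round figure for illustration, not a measurement from this post.

    // Back-of-the-envelope cost-per-IOPS math. All numbers are assumed,
    // round 2014-era figures used only to illustrate the comparison.
    public class IopsCost {
        public static void main(String[] args) {
            double diskPrice = 150.0;      // one spinning disk (assumed)
            double diskIops  = 200.0;      // random IOPS per spindle (assumed)
            double ssdPrice  = 400.0;      // one SATA SSD (assumed)
            double ssdIops   = 100_000.0;  // random IOPS per SSD (assumed)

            System.out.printf("Disk: $%.3f per IOPS%n", diskPrice / diskIops);
            System.out.printf("SSD:  $%.5f per IOPS%n", ssdPrice / ssdIops);

            // Matching one SSD's random IOPS with spindles alone would take
            // about 500 disks, which is why the realistic alternative is
            // caching hot pages in RAM, not adding more disks.
            System.out.printf("Spindles to match one SSD: %.0f%n",
                    ssdIops / diskIops);
        }
    }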

Five Things You Didn’t Know About VoltDB V4.0

Tuesday, August 12, 2014 - 2:30pm

#1) Fast Data Integrations

As we like to say at VoltDB, Big Data is created by Fast Data. Because VoltDB is able to ingest and transact on data at phenomenal rates, VoltDB often finds itself at the front end of Fast Data applications.

At the front end of Fast, VoltDB needs to get data in quickly, make decisions, and as the data ages and is no longer needed, get data out just as quickly as it arrived. To smoothly integrate into these Fast applications, VoltDB introduced several new messaging importers and exporters this year. They include (a brief Kafka sketch follows the list):

  • Kafka Export
  • Kafka Import
  • RabbitMQ
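
As one hedged example of how the import side fits together: an application publishes events to a Kafka topic, and VoltDB’s Kafka loader or importer consumes that topic into a table. The broker address, topic name, and CSV row format below are illustrative assumptions, not details from the post.

    // Hedged sketch of the producing side of a Kafka-to-VoltDB pipeline.
    // Broker, topic, and row format are assumed for illustration only.
    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class ClickProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer =
                    new KafkaProducer<>(props)) {
                // One CSV row per event; the VoltDB side maps these
                // columns onto the target table.
                producer.send(new ProducerRecord<>("clicks",
                        "42,1407853800000,homepage"));
            }
        }
    }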

Integrating VoltDB into the Hadoop ecosystem with Hive and Pig

Tuesday, October 28, 2014 - 11:30am

Imagine an online ad broker that has an extremely low latency decisioning system for bidding for online ad space and recording visitor ad views and click-throughs. The company also has a large Hive warehouse it analyzes to fashion ad campaigns for target consumers; this relies on the decisioning system data. In such a use case, using VoltDB is ideal because it excels at ingesting and processing data in real time, and it also features a high-performance data conduit to Hadoop via its export feature.

To enable this feature, you need to configure the http export module in the deployment file.
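
A hedged sketch of what that deployment-file stanza can look like follows. The element layout, property names, and WebHDFS URL pattern track the general shape of the 4.x-era http export connector, but the host, port, path, and placeholders are illustrative; consult the release documentation for exact syntax.

    <!-- Illustrative only: enables the http export connector and points it
         at a WebHDFS endpoint. In that era's connector, %t expanded to the
         table name and %p to the partition id; hosts and ports are assumed. -->
    <export enabled="true" target="http">
      <configuration>
        <property name="endpoint">
          http://hadoop-namenode:50070/webhdfs/v1/voltdb/%t/data%p.csv
        </property>
        <property name="type">csv</property>
      </configuration>
    </export>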

Part Four: You’d Better Do Fast Data Right – A Five-Step Approach

Wednesday, July 30, 2014 - 2:30pm

The last post defined what the Corporate Data Architecture of the future will look like and how “Fast” and “Big” will work together. This one will delve into the details of how to do Fast Data right.


Many solutions are popping onto the scene from some serious tech companies, a testament to the fact that a huge problem is looming. Unfortunately, these solutions miss a huge part of the value you can get from Fast Data. If you go down these paths, you will be rewriting your systems far sooner than you thought.

I am fully convinced that Fast Data is a new frontier.

Part Three: Designing a Data Architecture to Support Both Fast and Big Data

Monday, July 7, 2014 - 4:30pm

In post one of this series, we introduced the ideas that a Corporate Data Architecture was taking shape and that working with Fast Data is different from working with Big Data. In the second post we looked at examples of Fast Data and what is required of applications that interact with Fast Data. In this post, I will illustrate how I envision the corporate architecture that will enable companies to achieve the data dream that integrates Fast and Big.


The following diagram depicts a basic view of how the “Big” side of the picture is starting to fill out.

Simplifying the (complex) Lambda architecture

Monday, December 1, 2014 - 2:00pm

The Lambda Architecture defines a robust framework for ingesting streams of fast data while providing efficient real-time and historical analytics. In Lambda, immutable data flows in one direction: into the system. The architecture’s main goal is to execute OLAP-type processing faster; in essence, it reduces columnar analytics latency from every couple of seconds to roughly 100 ms, without actually enabling interesting new applications such as real-time application of user segments and scoring.

The rise of NoSQL is an opportunity for new RDBMS solutions

Friday, July 18, 2014 - 3:00pm

Written by Francis Pelland

It should come as no surprise that NoSQL has become popular over the past few years. This popularity has been driven in large part by the app revolution. Many new apps are hitting millions of users in less than a week, some in a day. This presents a scaling problem for app developers who seek a large audience.


Scaling a typical RDBMS system like MySQL or MS SQL Server from 0 to 1 million users has never been easy.  You have to setup master and slave servers, shard and balance the data, and ensure you have resources in place for any unexpected events.