Most Popular Big Data (Bases) and Tools

The following is a description of current and emerging big data database technologies. The discussion is based on the objectives of these technologies as well as the type of data involved.

The key idea behind most Big Data systems is that the value of an individual piece of data falls over time, while the value of a collection of data rises. Shortening the time it takes to extract, transform, and load each data item lets that aggregate value build more quickly and moves the system closer to the ideal of real-time decision making.

So how do we achieve our Big Data decision making objectives given the tools available today? By selecting the proper database management tool.

In the world of database management systems used today for processing Big Data we have the following solutions:

1. RDBMS/SQL - These are the traditional relational database management systems, built on the relational tables and indexes that we're used to. Examples include Microsoft SQL Server, Oracle, and MySQL.

 

Benefits:

A well-understood and consistent model, meaning an application that runs on MySQL can be altered to run on Oracle without changing its basic assumptions.

Maintains relational integrity and provides ACID guarantees. ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties that guarantee that database transactions are processed reliably.

Comprehensive OLTP/transaction support. Strong OLAP/analysis tools, often built in (MS Analysis Services, Oracle OLAP).
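To make the atomicity part of ACID concrete, here is a minimal sketch using Python's built-in sqlite3 module. The accounts table, the names in it, and the transfer and balances helpers are all hypothetical, invented for illustration. The credit is applied first and the debit second, so a failed debit shows the earlier credit being rolled back.

```python
import sqlite3

# Hypothetical schema: the CHECK constraint forbids negative balances.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst))
            # If this debit violates the CHECK constraint, the credit
            # above is undone as part of the same rollback.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

Calling transfer(conn, "alice", "bob", 500) returns False and leaves both balances untouched, which is the behaviour an application would otherwise have to implement by hand.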

 

Problems:

Most solutions are expensive.

Scales up (i.e. bigger servers) but struggles to scale out (i.e. lots of servers), and scaling up is itself expensive.

Not 'natural' for developers: mapping program objects to tables adds translation overhead and invites common mistakes such as N+1 queries.
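The N+1 mistake mentioned above is easiest to see side by side with its fix. The sketch below uses Python's sqlite3 module with a hypothetical authors/books schema; the function names are invented for illustration. Each function returns the same result along with the number of queries it issued.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
INSERT INTO authors VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO books VALUES (1, 1, 'A1'), (2, 1, 'A2'), (3, 2, 'B1');
""")


def titles_n_plus_one(conn):
    """One query for the authors, then one more query per author."""
    queries = 1
    result = {}
    authors = conn.execute(
        "SELECT id, name FROM authors ORDER BY id").fetchall()
    for author_id, name in authors:
        rows = conn.execute(
            "SELECT title FROM books WHERE author_id = ? ORDER BY id",
            (author_id,)).fetchall()
        queries += 1
        result[name] = [title for (title,) in rows]
    return result, queries


def titles_single_join(conn):
    """The same result from a single JOIN query."""
    result = {}
    rows = conn.execute(
        "SELECT a.name, b.title FROM authors a "
        "LEFT JOIN books b ON b.author_id = a.id "
        "ORDER BY a.id, b.id")
    for name, title in rows:
        result.setdefault(name, [])
        if title is not None:  # authors without books still get an entry
            result[name].append(title)
    return result, 1
```

With three authors the first version issues four queries and the second issues one; the gap grows with the number of parent rows.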

 

2. NoSQL - Non-relational databases, often in-memory

Most of these don't support the SQL language (hence the name), but more significantly they typically don't offer full ACID guarantees or relationships between tables. Instead they're designed to query document or key-value data very quickly.

Examples: Hadoop, MongoDB, CouchDB, Riak, Redis, Cassandra, Neo4j, Membase, HBase, etc.

Benefits:

Cheap, mostly open source implementations.

Systems can scale out very easily: tables can be readily sharded/federated across servers.

Most store native programmer objects, so no translation to tables.

Very fast at finding individual records in massive datasets.
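As a rough illustration of sharding and of storing native objects without translation to tables, the sketch below spreads documents across several in-process dictionaries by hashing the key. ShardedStore and its methods are hypothetical names, and plain Python dicts stand in for separate servers; this is not the API of any product listed above.

```python
import hashlib


class ShardedStore:
    """Toy key-value store that hash-partitions documents across shards."""

    def __init__(self, num_shards):
        # Each dict stands in for one server in the cluster.
        self.shards = [{} for _ in range(num_shards)]

    def _shard_for(self, key):
        # A stable hash of the key decides which shard owns it.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.shards)
        return self.shards[index]

    def put(self, key, document):
        # The document is stored as-is, with no mapping to rows and columns.
        self._shard_for(key)[key] = document

    def get(self, key):
        # A lookup touches exactly one shard, however many shards exist.
        return self._shard_for(key).get(key)
```

Simple modulo hashing like this reassigns most keys when the number of shards changes, so treat it only as a picture of the idea rather than a production partitioning scheme.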

Problems:

No common model: there are significant differences between the many solutions.

No ACID guarantees; instead, high fault tolerance must be built into the application.

Transactions are at the row level only (if supported at all).

Poor at aggregation: where an RDBMS solution would use SUM, AVG and GROUP BY, a NoSQL solution has map-reduce, which (some minor optimisations aside) has to do the equivalent of a table scan.

Poor at complex joins, although arguably this is something you'd design differently for.
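To show what the aggregation point above means in practice, here is a minimal pure-Python sketch of a map-reduce job that computes what SELECT customer, SUM(total) FROM orders GROUP BY customer would return in SQL. The orders data and the function names are made up for illustration; real map-reduce frameworks distribute these phases across machines.

```python
from collections import defaultdict

# Hypothetical documents, as they might sit in a document store.
orders = [
    {"customer": "ann", "total": 20},
    {"customer": "bob", "total": 15},
    {"customer": "ann", "total": 30},
]


def map_phase(records):
    """Emit one (key, value) pair per record; every record is visited."""
    for record in records:
        yield record["customer"], record["total"]


def shuffle(pairs):
    """Group the emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Collapse each group to a single value, here a sum."""
    return {key: sum(values) for key, values in groups.items()}


def total_by_customer(records):
    return reduce_phase(shuffle(map_phase(records)))
```

Note that map_phase walks every record, which is the "equivalent of a table scan" cost described above.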

 

3. NewSQL - In-memory relational databases

NewSQL is a class of modern relational database management systems that seek to provide the same scalable performance as NoSQL systems for online transaction processing (read-write) workloads while still maintaining the ACID guarantees of a traditional single-node database system.

These maintain ACID and relational integrity, but are in memory (like NoSQL) and readily scalable. They support SQL syntax. These are relatively new implementations and many traditional database vendors have rolled out their own solutions with the same capabilities. Think Oracle, Sybase, and even SAP with their in-memory HANA solution.

 

The most popular NewSQL systems are designed to run on a distributed cluster of shared-nothing nodes, where each node typically owns a subset of the data. SQL queries are split into query fragments and sent to the nodes that own the relevant data. These databases are designed to scale close to linearly as additional nodes are added.
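The fragment-and-merge idea can be sketched in a few lines. In the toy example below, three independent in-memory SQLite databases stand in for shared-nothing nodes, rows are routed by order_id, and a coordinator function sends the same SQL fragment to every node and merges the partial sums. The schema, the routing rule, and all names are assumptions made for illustration; this is not how any particular NewSQL product is implemented.

```python
import sqlite3
from collections import defaultdict


def make_node():
    """Create one 'node': a separate database sharing nothing with others."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "order_id INTEGER PRIMARY KEY, region TEXT, amount INTEGER)")
    return conn


nodes = [make_node() for _ in range(3)]


def insert_order(order_id, region, amount):
    # Each row lives on exactly one node, chosen from its key.
    node = nodes[order_id % len(nodes)]
    with node:
        node.execute("INSERT INTO orders VALUES (?, ?, ?)",
                     (order_id, region, amount))


# The fragment each node runs against only the rows it owns.
FRAGMENT = "SELECT region, SUM(amount) FROM orders GROUP BY region"


def total_by_region():
    """Coordinator: scatter the fragment, then merge the partial sums."""
    totals = defaultdict(int)
    for node in nodes:
        for region, partial_sum in node.execute(FRAGMENT):
            totals[region] += partial_sum
    return dict(totals)
```

Sums merge cleanly because partial sums can simply be added; an average would need each node to return a sum and a count instead.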

 

Examples: Clustrix, VoltDB, GenieDB, etc.

 

 

 

 

 
