Why is VoltDB So Fast?
How is VoltDB This Fast?
VoltDB is based on three big concepts. None of them is a new idea on its own, but VoltDB builds all three into the core of the product.
Concept 1: Exploit repeatable workloads.
VoltDB exclusively uses a stored procedure interface. It expects an application’s complete set of procedures to be known in advance. This allows it to pre-optimize execution paths for incoming transactions. Since most OLTP applications perform the same set of operations over and over, this model is a good fit for OLTP.
That’s not to say VoltDB is inflexible. When the application changes, the set of stored procedures can be amended or updated. VoltDB also supports some ad-hoc access for administrative tasks.
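To make the "known in advance" idea concrete, here is a minimal Python sketch of a procedure catalog. It is not VoltDB's API: `ProcedureCatalog`, `add_customer`, and `get_customer` are hypothetical names invented for illustration, and a plain dictionary stands in for the database.

```python
class ProcedureCatalog:
    """Hypothetical registry: every procedure is declared before any call arrives."""

    def __init__(self):
        self._procedures = {}

    def register(self, name, fn):
        # Registration is the point where a real system could plan and
        # optimize the procedure, since its full body is known up front.
        self._procedures[name] = fn

    def call(self, name, db, *args):
        # Clients invoke procedures by name; unknown names are rejected.
        if name not in self._procedures:
            raise KeyError(f"unknown procedure: {name}")
        return self._procedures[name](db, *args)


def add_customer(db, customer_id, name):
    db[customer_id] = name
    return 1  # rows affected


def get_customer(db, customer_id):
    return db.get(customer_id)


catalog = ProcedureCatalog()
catalog.register("AddCustomer", add_customer)
catalog.register("GetCustomer", get_customer)
```

Amending the application then amounts to registering a new or updated procedure under a name, which mirrors the flexibility described above.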
Concept 2: Partition data to horizontally scale.
VoltDB divides data among a set of machines (or nodes) in a cluster to achieve parallelization of work and near linear scale-out. Unlike custom sharding or partitioning solutions, this functionality is a native and fundamental feature. VoltDB manages consistency and distribution across cluster nodes and replicates data to multiple nodes to ensure smooth operation during many types of failure.
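As a rough illustration of partitioning with replication, the sketch below hashes a key to a partition and picks which nodes hold its copies. The hash function (CRC32), the placement rule, and the names `partition_for` and `replicas_for` are assumptions made for this example, not a description of how VoltDB actually places data.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a partitioning key to a partition with a stable hash."""
    # zlib.crc32 is deterministic across processes, unlike Python's
    # built-in hash() for strings.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def replicas_for(partition: int, nodes: list, copies: int) -> list:
    """Place `copies` replicas of a partition on consecutive nodes (assumed rule)."""
    if copies > len(nodes):
        raise ValueError("cannot place more copies than there are nodes")
    return [nodes[(partition + i) % len(nodes)] for i in range(copies)]
```

With this scheme, every row that shares a partitioning key lands on the same partition, so a transaction that touches one key can run entirely in one place.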
Concept 3: Build a SQL executor that’s specialized for the problem you’re trying to solve.
OLTP means high throughput of write-heavy transactions, each operating on a relatively small subset of data and returning relatively small result sets. Unlike analytical workloads, few transactions scan more than a few thousand rows, and most scan only a handful. Since our stored procedures a) never wait for disk access, b) never wait for external input in the middle of a transaction and c) never read or modify huge amounts of data, they can be executed on modern processors in microseconds. If they take microseconds, why interleave their execution with a complex system of row and table locks and thread synchronization? It’s much faster and simpler just to execute work serially.
To run single-threaded in a multi-core world, VoltDB partitions data by core, not by cluster node. A three-node cluster with 8 cores per node will run about 24 lean, mean, single-threaded SQL machines.
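The serial execution model can be sketched in Python as one worker thread per partition, each draining its own queue. This is a toy under stated assumptions: the `Partition` class, the `increment` procedure, and the modulo routing are hypothetical, and Python threads stand in for the per-core executors.

```python
import queue
import threading

NODES = 3
CORES_PER_NODE = 8
NUM_PARTITIONS = NODES * CORES_PER_NODE  # 3 * 8 = 24 executors


class Partition:
    """Owns its data outright; exactly one thread ever touches it."""

    def __init__(self):
        self.data = {}
        self.inbox = queue.Queue()
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def _run(self):
        # Procedures run one at a time, in arrival order, so the data
        # needs no row locks, table locks, or latches.
        while True:
            item = self.inbox.get()
            if item is None:
                break
            procedure, args, result = item
            result.put(procedure(self.data, *args))

    def submit(self, procedure, *args):
        result = queue.Queue(maxsize=1)
        self.inbox.put((procedure, args, result))
        return result

    def stop(self):
        self.inbox.put(None)
        self.thread.join()


def increment(data, key):
    data[key] = data.get(key, 0) + 1
    return data[key]


def run_demo():
    partitions = [Partition() for _ in range(NUM_PARTITIONS)]
    keys = [5, 5, 5, 29]  # 29 % 24 == 5, so all four calls share one partition
    pending = [partitions[k % NUM_PARTITIONS].submit(increment, k) for k in keys]
    results = [r.get() for r in pending]
    for p in partitions:
        p.stop()
    return results
```

The only synchronization in the sketch is the queue hand-off between the caller and the partition thread; the procedure body itself runs without any locking.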
Bonus: Leverage the 3 concepts together.
Because workloads are known in advance, replication can be done by performing the same procedure twice in two places. This is simpler and faster than log-shipping changed tuples.
Because data is replicated synchronously, VoltDB doesn’t need a write-ahead log at each SQL executor to maintain consistency on a single node.
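The first point, replicating by running the same procedure in two places, can be sketched as follows. The `transfer` procedure and `execute_on_all` helper are hypothetical, and the approach only works if procedures are deterministic (no random numbers or wall-clock reads inside the procedure body).

```python
def transfer(state, src, dst, amount):
    """A deterministic procedure: same inputs and state give the same result."""
    if state.get(src, 0) < amount:
        return False
    state[src] -= amount
    state[dst] = state.get(dst, 0) + amount
    return True


PROCEDURES = {"Transfer": transfer}


def execute_on_all(copies, name, *args):
    """Send the procedure name and arguments to every copy, not the changed rows."""
    results = [PROCEDURES[name](copy, *args) for copy in copies]
    if any(r != results[0] for r in results):
        raise RuntimeError("replicas diverged")
    return results[0]


primary = {"alice": 100, "bob": 0}
replica = dict(primary)
ok = execute_on_all([primary, replica], "Transfer", "alice", "bob", 30)
```

What crosses the wire here is a procedure name and three arguments, regardless of how many rows the procedure modifies.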
So How Fast is VoltDB?
Let’s look at three kinds of workloads:
The TPC-C benchmark is meant to represent an almost canonical OLTP workload: a commerce database containing customers, orders, inventory, and so on. We made some minor changes to it, so we call our benchmark “TPC-C-like”, but the workload is nearly identical and still averages 26 SQL statements per transaction. VoltDB runs this workload at tens of thousands of transactions per second on modern commodity servers. On our cluster of dual Xeon 5500s, it runs about 40 million SQL statements per minute per machine while maintaining redundancy; at 26 statements per transaction, that is roughly 25,000 transactions per second per machine. To go faster, simply add more machines.
Our Voter benchmark simulates an “American Idol”-style voting contest where a flood of votes comes in and each one needs to be transactionally verified against some set of rules. For example, each voter is allowed to vote no more than 10 times. Note that this kind of workload is much more difficult to get right in a system without transactions. While many view cloud-hosted services as a way to handle elastic demand, a commodity VoltDB server could handle tens of millions of votes in a single commercial break. With more servers, you can achieve millions of transactions per second, like we did in this benchmark with SGI. VoltDB is fantastic for these short-term workloads where it’s easy to spend more time tuning a system than actually running it.
VoltDB is also useful for streaming data. We run into many use cases where a user has a stream of incoming messages that each trigger a transactional update to the database’s state. VoltDB can be used as a filter, a processor, a dashboard generator, or all of the above. Whether data is human or machine generated, there are few streams too large for VoltDB. If you need to process a million updates per second and update a dashboard with simple analytics several times per second, then VoltDB may be a good choice.
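The per-voter limit from the Voter workload can be sketched as a single procedure that checks the rule and records the vote in one step. This is an illustrative Python toy, not the benchmark's actual code: the `vote` function, the contestant set, and the state layout are assumptions, and in-memory dictionaries stand in for tables.

```python
MAX_VOTES_PER_VOTER = 10  # the rule quoted in the text
CONTESTANTS = {1, 2, 3}   # hypothetical contestant numbers


def new_state():
    return {"votes_by_voter": {}, "totals": {}}


def vote(state, voter_id, contestant):
    """Check and update together; run serially, no other vote can interleave."""
    if contestant not in CONTESTANTS:
        return "INVALID_CONTESTANT"
    used = state["votes_by_voter"].get(voter_id, 0)
    if used >= MAX_VOTES_PER_VOTER:
        return "OVER_LIMIT"
    state["votes_by_voter"][voter_id] = used + 1
    state["totals"][contestant] = state["totals"].get(contestant, 0) + 1
    return "ACCEPTED"
```

Without transactions, the read of the voter's count and the write of the new count could interleave with another vote from the same voter, which is why this workload is harder to get right in a non-transactional system.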
John Hugg
Software Engineer
VoltDB
Tags: Benchmarking, Performance, VoltDB Internals