The second aspect of database sizing is capacity. Capacity describes the maximum volume of data that the database can hold.
Since VoltDB is an in-memory database, capacity is constrained by the total memory of all of the nodes in the cluster. Of course, it is never wise to size servers too exactly. It is important to allow for growth over time and to account for other parts of the database server that use memory.
Chapter 4, Sizing Memory explains in detail how memory is assigned by the VoltDB server for database content. Use that chapter to perform accurate sizing when you have a known schema. However, as a rough estimate, you can use the following table to approximate the space required for each column. By adding up the columns for each table and index (including index pointers) and then multiplying by the expected number of rows, you can determine the total amount of memory required to store the database contents.
Table 3.1. Quick Estimates for Memory Usage By Datatype
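As an illustration of the procedure described above, the following sketch sums per-column estimates for one table and multiplies by the expected row count. The byte counts shown are placeholders for illustration only; use the actual estimates from Table 3.1 for real sizing.

```python
# Rough content-size estimate for a single table.
# Per-column byte counts are illustrative placeholders, NOT values
# from Table 3.1 -- substitute the real estimates for your schema.
columns = {
    "id": 8,        # e.g. an integer-style column
    "name": 64,     # e.g. a short string column
    "balance": 8,   # e.g. a floating-point column
}
index_pointer_bytes = 40  # placeholder per-row allowance for index pointers

row_bytes = sum(columns.values()) + index_pointer_bytes
expected_rows = 10_000_000

table_bytes = row_bytes * expected_rows
print(f"~{table_bytes / 2**30:.1f} GB for this table")
```

Repeating this calculation for every table and index, then summing the results, yields the total content size used in the formulas that follow.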
Having calculated the total storage required for the database content, you can estimate the memory required for the server overall by adding the memory required by the server process itself (one gigabyte is recommended) and adding 30% as a buffer.
Server memory = (total content size + 1GB) + 30%
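In code form, the single-server estimate might be sketched as follows; this is a minimal illustration, where the 1 GB process allowance and the 30% buffer are the figures from the text:

```python
GB = 2**30

def server_memory(total_content_bytes: int) -> float:
    """Estimate total server memory needed:
    database content plus 1 GB for the server process, plus a 30% buffer."""
    return (total_content_bytes + 1 * GB) * 1.3

# Example: 8 GB of database content -> (8 + 1) * 1.3 = 11.7 GB
print(f"~{server_memory(8 * GB) / GB:.1f} GB")
```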
When sizing for a cluster, where the content is distributed across the servers, the calculation for the memory required for content on each server is the total content size divided by the number of servers, plus some percentage for replicated tables. For example, if 20% of the tables are replicated, a rough estimate of the space required for each server is given by the following equation:
Per server memory = ( (total content size / number of servers) + 20% + 1GB ) + 30%
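The cluster formula can be sketched the same way. The 20% replication allowance below is the figure from the example in the text; adjust it to match the actual share of replicated tables in your schema:

```python
GB = 2**30

def per_server_memory(total_content_bytes: int, servers: int,
                      replicated_fraction: float = 0.20) -> float:
    """Per-server estimate: content share plus a replication allowance,
    plus 1 GB for the server process, plus a 30% buffer."""
    share = total_content_bytes / servers
    return (share * (1 + replicated_fraction) + 1 * GB) * 1.3

# Example: 60 GB of content across 5 servers, 20% replicated:
# (12 * 1.2 + 1) * 1.3 = 20.02 GB per server
need = per_server_memory(60 * GB, 5)
print(f"~{need / GB:.1f} GB per server")
```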
When sizing memory for VoltDB servers, it is important to keep in mind the following points:
Memory usage includes not only storage for the data, but also temporary storage for processing transactions, managing queues, and the server processes themselves.
Even in the best partitioning schemes, partitioning is never completely balanced. Make allowances for variations in load across the servers.
If memory usage exceeds approximately 70% of total memory, the operating system can start paging and swapping, severely impacting performance.
As a rule of thumb, keep memory usage per server between 50% and 70% of total memory.
Memory technology and density are advancing so rapidly (much like the increase in processor cores per server) that it is feasible to configure a small number of servers with extremely large memory capacities, providing capacity and performance equivalent to a larger number of smaller servers. However, the amount of memory in use can impact the performance of other aspects of database management, such as snapshots and failure recovery. The next section discusses some of the trade-offs to consider when sizing for these features.