Friday, March 6, 2015

Cassandra Compaction and Tombstone Behavior: Leveled vs. SizeTiered Compaction

Compactions in Cassandra can be contentious because they add I/O load and temporarily require extra free disk space. This post gives a primer on compaction, then discusses how Cassandra's data organization and tombstone handling differ between the Leveled and SizeTiered compaction strategies.

What is compaction?

Compaction is a maintenance process that reorganizes SSTables to optimize the on-disk data structures and reclaim unused space. To see why compaction matters so much to Cassandra's performance and health, it helps to understand how Cassandra handles writes to the datastore.

When writing to Cassandra, the following steps take place:

  1. The write is appended to the commit log on disk and inserted into an in-memory table (the memtable)
  2. Once the memtable reaches its configured size threshold, it is flushed to disk
  3. The flushed memtable is written to disk as a new, immutable SSTable in the column family
  4. If compaction thresholds are reached, a compaction is run
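The steps above can be sketched as a toy model in Python. This is only an illustration under stated assumptions, not Cassandra's implementation: the class ToyNode and its parameters are hypothetical, the flush limit counts rows rather than bytes, timestamps are a simple counter, tombstones are ignored, and "compaction" just merges every SSTable into one.

```python
class ToyNode:
    """Hypothetical toy model of the write path for a single column family."""

    def __init__(self, memtable_limit=2, compaction_threshold=4):
        self.commit_log = []   # stand-in for the on-disk commit log
        self.memtable = {}     # key -> {column: (timestamp, value)}
        self.sstables = []     # each element is written once, never modified
        self.memtable_limit = memtable_limit              # assumption: row count
        self.compaction_threshold = compaction_threshold  # assumption: SSTable count
        self.clock = 0         # logical write timestamp

    def write(self, key, columns):
        self.clock += 1
        # Step 1: log the write, then apply it to the memtable.
        self.commit_log.append((self.clock, key, dict(columns)))
        row = self.memtable.setdefault(key, {})
        for name, value in columns.items():
            row[name] = (self.clock, value)
        # Step 2: flush once the memtable reaches its limit.
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        if not self.memtable:
            return
        # Step 3: the memtable becomes a new SSTable; older ones are untouched.
        self.sstables.append(self.memtable)
        self.memtable = {}
        self.commit_log.clear()
        # Step 4: compact when the threshold is reached.
        if len(self.sstables) >= self.compaction_threshold:
            self.compact()

    @staticmethod
    def _merge(sources):
        # Keep, for every column, the cell with the newest timestamp.
        merged = {}
        for source in sources:
            for key, row in source.items():
                target = merged.setdefault(key, {})
                for name, cell in row.items():
                    if name not in target or cell[0] > target[name][0]:
                        target[name] = cell
        return merged

    def compact(self):
        self.sstables = [self._merge(self.sstables)]

    def read(self, key):
        # A read has to consult every SSTable plus the memtable.
        row = self._merge(self.sstables + [self.memtable]).get(key, {})
        return {name: value for name, (_, value) in row.items()}


node = ToyNode(memtable_limit=100)
node.write(7, {"a": "first", "b": "first"})
node.flush()
node.write(7, {"b": "second"})   # partial update after the flush
node.flush()
print(len(node.sstables))        # 2
print(node.read(7))              # {'a': 'first', 'b': 'second'}
```

In this sketch the older value of column b is still present in the first SSTable after the update. It only disappears when compact() merges the two, which is the space compaction reclaims.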

The key takeaway is that writes only ever add new data; nothing already on disk is modified in place. Since SSTables are immutable, a row in an SSTable cannot be changed once written. For example, a simple schema for a column family might look like:

CREATE TABLE simple_cf (
 id int,
 text1 text,
 text2 text,
 PRIMARY KEY (id)
);

Some initial data is populated into the column family:

cqlsh:test> INSERT INTO simple_cf (id, text1, text2) VALUES (1, 'This is a test 1', NULL);
cqlsh:test> UPDATE simple_cf SET text2='This is a test 2' WHERE id=1;

The node's memtables are then flushed to disk (nodetool flush), and a partial update is performed after the flush:

cqlsh:test> UPDATE simple_cf SET text2='This is a test 3' WHERE id=1;