Storage Capacity Planning

I have two tables in my database with these schemas:

CREATE TABLE 'historian' (
  t_stamp TIMESTAMP,
  tag_name SYMBOL capacity 14000 CACHE,
  float_val DOUBLE,
  qual INT
) timestamp (t_stamp) PARTITION BY DAY WAL;
CREATE TABLE 'historian_str' (
  t_stamp TIMESTAMP,
  tag_name SYMBOL capacity 14000 CACHE,
  str_val VARCHAR,
  qual INT
) timestamp (t_stamp) PARTITION BY DAY WAL;

The first one has 38,324,466,335 rows, and the second one has 72,188 rows.

The DB takes up 1.53 TB on disk. Is this normal?
(It just seems like a lot compared to other databases storing the same type of data.)

Hi @djs ,

That does seem like a lot. I see you're using DAY partitioning? If you run table_partitions('historian'), you should see a breakdown of the metadata about your partitions.
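For example, to see the rows-to-size ratio per partition (numRows and diskSizeHuman should be among the columns in the table_partitions() output; adjust the names if your version differs):

SELECT name, numRows, diskSizeHuman
FROM table_partitions('historian')
ORDER BY numRows ASC;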

Are many partitions sparse, i.e. containing just a couple of rows?

It could be that you have very few rows in each partition; each column file still occupies at least a page of space, so many small partitions waste room and your rows-to-disk-space ratio gets worse.
If this is the issue, you could use larger partitions, e.g. switching from DAY to MONTH.
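One way to do that is to create a new table with the coarser partitioning and copy the data across, roughly like this (the new table name is just an example, and for a table this size you'd likely want to insert in chunks by time range rather than in one go):

CREATE TABLE historian_monthly (
  t_stamp TIMESTAMP,
  tag_name SYMBOL capacity 14000 CACHE,
  float_val DOUBLE,
  qual INT
) timestamp (t_stamp) PARTITION BY MONTH WAL;

INSERT INTO historian_monthly SELECT * FROM historian;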

Also, are you able to deploy on Linux with ZFS? ZFS will compress your data and help to mitigate this issue.

We are also releasing Parquet support soon, which will allow you to store your data in Parquet files; Parquet supports compression.

I ran table_partitions('historian'). All the partitions (except for the last one) are between 102,813,479 and 111,099,512 rows (or 4.2 and 4.6 GiB). Doesn’t look like I have any sparse partitions…

Linux with zfs can be an option. Do you have any estimate of how much space that can save?

We normally see about a 5x reduction in size for most people’s data, so it's certainly worth a try!

LZ4 is your best bet for a first go: Enable compression with ZFS | QuestDB

What options do we have if installing ZFS on the machine is not possible or not permitted? The current storage model is exploding to ~2 TB even on sparse data (10 million rows per DAY partition with 50 columns, with around 30% of cells being null). Any roadmap for table-based compression? Thanks.

QuestDB's native format uses sentinel values for nulls, so sparse data takes as much room to store as dense data.

Work is in progress to support Parquet format for table partitions. This will mean that cold data (i.e. older than the latest partition, so with DAY partitions, anything before today()) will be converted to Parquet. This will compress the data, especially nulls, most likely with around a 5x reduction (or more, depending on the specific data).

The aim is to release this in Q4 2024/Q1 2025.

If you have unused partitions, you could detach and compress them manually.

Then re-attach them later when you need them. That way, you keep your data but free up some room!
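For example (the partition dates are just placeholders):

ALTER TABLE historian DETACH PARTITION LIST '2023-01-01', '2023-01-02';
-- compress or archive the detached partition folders, then later restore them and:
ALTER TABLE historian ATTACH PARTITION LIST '2023-01-01', '2023-01-02';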

Great, appreciate the quick response and the details. This will help me plan accordingly for the short term until the support is released. Looking forward to the release.

Unfortunately, no unused partitions, given active historical trend views / summarizations.


If it fits your use case, you can also down-sample the data ahead of time, e.g. with SAMPLE BY 1d, store that in another table, and then detach/drop the original data. Of course, this only works if you only use the historical data in summarised form!

But in this case, it would help to reduce storage size and speed up queries, since less data is involved and it's already sampled.
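A rough sketch of what the roll-up could look like (the table name, the choice of aggregates, and MONTH partitioning are just an example):

CREATE TABLE historian_1d AS (
  SELECT
    t_stamp,
    tag_name,
    avg(float_val) AS avg_val,
    min(float_val) AS min_val,
    max(float_val) AS max_val,
    count() AS samples
  FROM historian
  SAMPLE BY 1d
) timestamp (t_stamp) PARTITION BY MONTH;

You could then point the trend queries at historian_1d and drop or detach the raw partitions once they are no longer needed.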

Hopefully you can manage it in the meantime, and we’ll ship improvements ASAP!

Agreed, thanks for the suggestion. The summarization queries are dynamic (investigative), at least for now.