Storage Capacity Planning

I have two tables in my database with these schemas:

CREATE TABLE 'historian' (
  t_stamp TIMESTAMP,
  tag_name SYMBOL capacity 14000 CACHE,
  float_val DOUBLE,
  qual INT
) timestamp (t_stamp) PARTITION BY DAY WAL;
CREATE TABLE 'historian_str' (
  t_stamp TIMESTAMP,
  tag_name SYMBOL capacity 14000 CACHE,
  str_val VARCHAR,
  qual INT
) timestamp (t_stamp) PARTITION BY DAY WAL;

The first one has 38,324,466,335 rows, and the second one has 72,188 rows.

The DB takes up 1.53 TB on disk. Is this normal?
(It just seems like a lot compared to other databases storing the same type of data.)

Hi @djs ,

That does seem like a lot. I see you're using DAY partitioning? If you run table_partitions('historian'), you should see a breakdown of the metadata about your partitions.
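For example, to see the rows-to-size ratio per partition (numRows and diskSizeHuman should be among the columns in the table_partitions() output; adjust the names if your version differs):

SELECT name, numRows, diskSizeHuman
FROM table_partitions('historian')
ORDER BY numRows ASC;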

Are many partitions sparse, i.e. containing just a couple of rows?

It could be that you have very few rows in each partition; each column file still occupies at least a page of space, so many small partitions waste room and your rows-to-disk-space ratio gets worse.
If this is the issue, you could use larger partitions, e.g. switching from DAY to MONTH.
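One way to do that is to create a new table with the coarser partitioning and copy the data across, roughly like this (the new table name is just an example, and for a table this size you'd likely want to insert in chunks by time range rather than in one go):

CREATE TABLE historian_monthly (
  t_stamp TIMESTAMP,
  tag_name SYMBOL capacity 14000 CACHE,
  float_val DOUBLE,
  qual INT
) timestamp (t_stamp) PARTITION BY MONTH WAL;

INSERT INTO historian_monthly SELECT * FROM historian;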

Also, are you able to deploy on Linux with ZFS? ZFS will compress your data and help to mitigate this issue.

We are also releasing Parquet support soon, which will allow you to store your data in Parquet files; Parquet supports compression.

I ran table_partitions('historian'). All the partitions (except for the last one) are between 102,813,479 and 111,099,512 rows (or 4.2 and 4.6 GiB). Doesn’t look like I have any sparse partitions…

Linux with zfs can be an option. Do you have any estimate of how much space that can save?

We normally see about a 5x reduction in size for most people’s data, so it's certainly worth a try!

LZ4 is your best bet for a first go: Enable compression with ZFS | QuestDB

What options do we have if installing ZFS on the machine is not possible or not permitted? The current storage model is exploding to ~2 TB even on sparse data (10 million rows per DAY partition with 50 columns, with around 30% of cells being null). Any roadmap for table-based compression? Thanks.

QuestDB's native format uses sentinel values for nulls, so sparse data takes as much room to store as dense data.

Work is in progress to support Parquet format for table partitions. This will mean that cold data (i.e. older than the latest partition, so with DAY partitions, anything before today()) will be converted to Parquet. This will compress the data, especially nulls, most likely with around a 5x reduction (or more, depending on the specific data).

The aim is to release this in Q4 2024/Q1 2025.

If you have unused partitions, you could detach and compress them manually.

Then re-attach them later when you need them. That way, you keep your data but free up some room!
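For example (the partition dates are just placeholders):

ALTER TABLE historian DETACH PARTITION LIST '2023-01-01', '2023-01-02';
-- compress or archive the detached partition folders, then later restore them and:
ALTER TABLE historian ATTACH PARTITION LIST '2023-01-01', '2023-01-02';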

Great, appreciate the quick response and the details. This will help me plan accordingly for the short term until the support is released. Looking forward to the release.

Unfortunately, no unused partitions, given active historical trend views / summarizations.


If it fits your use case, you can also down-sample the data ahead of time, e.g. with SAMPLE BY 1d, store that in another table, and then detach/drop the original data. Of course, this only works if you only use the historical data in summarised form!

But in this case, it would help to reduce storage size and speed up queries, since less data is involved and it's already sampled.
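A rough sketch of what the roll-up could look like (the table name, the choice of aggregates, and MONTH partitioning are just an example):

CREATE TABLE historian_1d AS (
  SELECT
    t_stamp,
    tag_name,
    avg(float_val) AS avg_val,
    min(float_val) AS min_val,
    max(float_val) AS max_val,
    count() AS samples
  FROM historian
  SAMPLE BY 1d
) timestamp (t_stamp) PARTITION BY MONTH;

You could then point the trend queries at historian_1d and drop or detach the raw partitions once they are no longer needed.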

Hopefully you can manage it in the meantime, and we’ll ship improvements ASAP!

Agreed, thanks for the suggestion. The summarization queries are dynamic (investigative), at least for now.