Hi all! I’m new to QuestDB, and am setting up a test to understand how much space per record it actually uses. Not sure if I’m doing things correctly, but the result I’m getting is
100.7 bytes per record (which seems to be a bit too much)
For this schema:
CREATE TABLE 'ohlc_data' (
timestamp TIMESTAMP,
symbol SYMBOL capacity 256 CACHE,
open DOUBLE,
high DOUBLE,
low DOUBLE,
close DOUBLE,
volume LONG,
trades INT
) timestamp (timestamp) PARTITION BY DAY WAL;
I’ve inserted 3,001,000 records into the DB, and here are the stats I get using the following query:
SELECT sum(diskSize) as total_size
FROM table_partitions('ohlc_data')
Storage Metrics:
- Total records: 3,001,000
- Total database size: 0.28 GB
- Bytes per record: 100.7
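For reference, the bytes-per-record figure is just total disk size divided by row count. A minimal sketch of that arithmetic; the 302,200,700-byte total is back-computed from the rounded figures above, and I’m assuming the “GB” is binary (GiB), which makes the two numbers consistent:

```python
# Hypothetical total: back-computed from the reported 100.7 bytes/record;
# the real value comes from sum(diskSize) in table_partitions().
total_disk_bytes = 302_200_700
total_records = 3_001_000

bytes_per_record = total_disk_bytes / total_records
gib = total_disk_bytes / 1024**3  # binary GB (GiB)

print(f"{bytes_per_record:.1f} bytes per record")  # → 100.7
print(f"{gib:.2f} GB total")                       # → 0.28
```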
Am I doing anything wrong? Does compression improve the more data is inserted into the DB?
Thanks!
Hey,
Can you please run table_partitions('ohlc_data') and post the result here?
Storage sizes are here: Data types | QuestDB
Per-row, I’d expect at minimum:
| column    | per-record | for all     |
|-----------|-----------:|------------:|
| timestamp |          8 |  24_008_000 |
| symbol    |          4 |  12_004_000 |
| open      |          8 |  24_008_000 |
| high      |          8 |  24_008_000 |
| low       |          8 |  24_008_000 |
| close     |          8 |  24_008_000 |
| volume    |          8 |  24_008_000 |
| trades    |          4 |  12_004_000 |
| total     |         56 | 168_056_000 |
Therefore, at the minimum, uncompressed, it should be around 168 MB.
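As a quick sanity check, here’s that arithmetic in Python (fixed-width column sizes taken from the table above):

```python
# Per-row storage in bytes for each column (SYMBOL is stored as a
# 4-byte index into the symbol table; DOUBLE/LONG/TIMESTAMP are 8 bytes).
column_bytes = {
    "timestamp": 8, "symbol": 4, "open": 8, "high": 8,
    "low": 8, "close": 8, "volume": 8, "trades": 4,
}
rows = 3_001_000

per_record = sum(column_bytes.values())
total = per_record * rows

print(per_record)  # → 56
print(total)       # → 168056000, i.e. ~168 MB uncompressed
```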
Then you need to tag on maybe 5-10% extra for metadata, symbol values, etc. We also oversize the latest partition, which gets truncated at rollover, so there is likely a fair bit of extra (wasted) space in the partitions until then.
If you want to compress the files in NATIVE format, you should deploy on a filesystem that supports compression, such as ZFS. That can give you anywhere from 2x-10x compression ratios, depending on the data. For the schema you posted above, I’d expect perhaps 3x.
In the long run, you’ll convert your older partitions to PARQUET, with varying compression ratios.