Hello. I recently tried to ingest financial data into a single table via the influx line protocol using the C++ client. The table is partitioned by hour. Most of the data is sparse: of the 15 fields (all integer types), only about 4 carry values in a typical record, and according to the docs QuestDB should handle such cases with a minimal footprint. But I was surprised: about 2.7 GB of data was ingested in roughly half an hour (around 600 million records), and the resulting disk usage was about 90 GB! I expected the columnar model to optimise storage, yet on my filesystem every column file in a partition has the same size. For example, a column that contains almost no data still occupies about 512 MB, just like the others. Could this be a bug? I tried to minimise out-of-order writes (the timestamps fluctuate rather than increasing monotonically), but some still occur. Is there a good article explaining this behaviour? Thanks in advance.
Hi @alex-aparin ,
There is some info on the storage model here: Storage model | QuestDB
We use sentinel (null) values to keep the columns aligned, and those sentinels still take disk space.
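As a rough sanity check (assuming your integer fields ended up as 8-byte LONG columns, which is the default type for ILP integers): 600 million rows × 15 value columns × 8 bytes ≈ 72 GB, plus roughly 5 GB for the designated timestamp column. That already lands in the same ballpark as the ~90 GB you are seeing, so this looks like expected behaviour rather than a bug.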
If you expect to have very sparse data, you should deploy on ZFS, so the filesystem can compress those mostly-empty column files. You can also convert partitions to Parquet format, which will compress them, but this is a beta feature, so I would hold off for a little bit.
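For example, once you are comfortable with the beta, converting older partitions would look roughly like this (table and column names are placeholders, and please check the docs for the exact syntax your version supports):

```sql
-- Convert partitions older than a given date to Parquet (currently a beta feature).
-- 'trades' and 'timestamp' are placeholders for your table and designated
-- timestamp column; verify the exact syntax against the docs for your version.
ALTER TABLE trades CONVERT PARTITION TO PARQUET
WHERE timestamp < '2025-01-01';
```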
Please could you share your schema? Perhaps we can alter it a little. It may be better to store your data in a row-modelled (narrow) layout and then convert it into an appropriately dense format later.
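To illustrate the row-modelled idea, here is a rough sketch of a narrow layout (all names are made up) where only the handful of populated fields is written per tick:

```sql
-- Narrow (one row per populated field) layout: you only write the ~4 fields
-- that actually have data, instead of 15 mostly-null fixed-width columns.
-- Table and column names are placeholders.
CREATE TABLE ticks_narrow (
  ts    TIMESTAMP,
  sym   SYMBOL,   -- instrument identifier
  field SYMBOL,   -- which of the 15 fields this row carries
  value LONG      -- the field's value
) TIMESTAMP(ts) PARTITION BY HOUR;
```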
You could also normalise the data and then join it back together, so you don’t write so many unnecessary values.
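As a sketch of the join-back step (assuming the narrow table above and a single instrument per table, with made-up field names), you could rebuild a wide view like this:

```sql
-- Rebuild a wide shape for two of the fields from the narrow table.
-- 'bid_size' and 'ask_size' are placeholder field names.
SELECT b.ts, b.value AS bid_size, a.value AS ask_size
FROM (SELECT ts, value FROM ticks_narrow WHERE field = 'bid_size') b
JOIN (SELECT ts, value FROM ticks_narrow WHERE field = 'ask_size') a
ON b.ts = a.ts;
```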
P.S. We will hopefully be releasing PIVOT in the next release, which will help you re-map your data from a narrow to a wide schema.