Hello QuestDB team,
I am using QuestDB to store data from a Siemens PLC (IPC227G). I have set up a data pipeline that routes data to QuestDB via MQTT and Telegraf (and it is running fine). Now I want to estimate the storage consumption of the QuestDB data. I believe QuestDB does not apply any special compression of its own; it simply stores the data in its columnar format.
My questions:
- If I want to estimate the storage consumption, can I do it by simply multiplying the number of rows by the storage consumed by each column's data type? (I am aware that I also need to add some overhead for indexes, symbol dictionaries, etc.; see the rough sketch after this list.)
- I will also have data being ingested continuously and queried by my application (via the Python API). If I enable ZFS compression, will that compression step put additional load on RAM/CPU?
- As storage consumption is a prime factor for us, I was wondering whether Parquet file storage will be introduced in the near future? (I believe Parquet files take far less storage than the current file format.)
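For reference, this is roughly how I am estimating it right now, a minimal sketch in Python. The per-value sizes are the ones I found for QuestDB's fixed-size types, while the 10% overhead factor and the example row shape (timestamp + symbol + double) are just my own assumptions:

```python
# Rough estimate (my own sketch): rows * bytes-per-row for fixed-size types,
# times an overhead factor for symbol dictionaries, indexes and partially
# filled partition files. The overhead factor is an assumption, not a
# QuestDB-provided figure.

TYPE_BYTES = {
    "byte": 1,
    "short": 2,
    "int": 4,
    "float": 4,
    "long": 8,
    "double": 8,
    "date": 8,
    "timestamp": 8,
    "symbol": 4,  # stored as an int per row; the dictionary goes into the overhead
}

def estimate_bytes(rows: int, column_types: list[str], overhead: float = 1.10) -> float:
    """Estimate on-disk size in bytes for `rows` rows of the given column types."""
    bytes_per_row = sum(TYPE_BYTES[t.lower()] for t in column_types)
    return rows * bytes_per_row * overhead

# Example: 10 tags sampled once per second for 30 days,
# each row = designated timestamp + symbol + double value.
rows = 10 * 86_400 * 30
print(f"~{estimate_bytes(rows, ['timestamp', 'symbol', 'double']) / 1024**2:.1f} MiB")
```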
TIA and regards!
Thanks for the quick feedback! I checked with the Siemens device currently in use, and it does not yet support ZFS compression. Also, most of our server machines run on Windows (which makes ZFS more complex). So it seems wise to wait for the Parquet version.
Is there a fixed date when a stable version of the Parquet-based storage is expected to be released? A quick PoC with such a stable version would be great to have!
Many thanks and regards
We will ship 9.2.0 with a new Parquet export API imminently. This will allow you to test what kind of compression ratios you can get.
In already-released OSS versions it is possible to convert native table partitions to Parquet. Querying them is a little slower than querying native partitions, but that may matter less than storage size for your use case.
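For example, something along these lines lets you trigger the conversion and inspect partition sizes from Python via the HTTP /exec endpoint. The table name and partition cut-off are placeholders, and the exact ALTER TABLE ... CONVERT PARTITION syntax can vary between releases, so please double-check against the docs for the version you run:

```python
# Sketch: convert older partitions to Parquet and check the resulting sizes.
# Uses QuestDB's HTTP /exec endpoint (default port 9000). Table name and the
# partition cut-off below are placeholders for your own schema.
import requests

EXEC_URL = "http://localhost:9000/exec"

def run(sql: str) -> dict:
    resp = requests.get(EXEC_URL, params={"query": sql})
    resp.raise_for_status()
    return resp.json()

# Convert everything older than a given day to Parquet (syntax may differ by version).
run("ALTER TABLE plc_metrics CONVERT PARTITION TO PARQUET WHERE timestamp < '2025-01-01'")

# Compare per-partition disk sizes before/after to see the compression ratio you actually get.
result = run("SELECT * FROM table_partitions('plc_metrics')")
cols = [c["name"] for c in result["columns"]]
for row in result["dataset"]:
    print(dict(zip(cols, row)))
```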
Please feel free to report back any issues you encounter. We are sequencing optimisations for Parquet reads/writes over the next couple of months.
With QuestDB’s native columnar format, ingestion and query performance are excellent, no question at all! But with no compression possible on our end (e.g. ZFS), storage consumption becomes a problem.
I would be more interested in an “in-place” conversion where the actual data resides as Parquet (and not in QuestDB’s columnar format). A quick test shows the data being compressed by about 2.6x after conversion to Parquet. I know this compression ratio is further configurable.
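For context, this is roughly how I measured that figure: just summing the file sizes under the table’s directory before and after the conversion. The db root path and table name below are placeholders for my setup:

```python
# Sketch: sum file sizes under the table directory before/after conversion.
# The db root is an assumption; adjust it to wherever your instance keeps its data.
import os

def dir_size(path: str) -> int:
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total

table_dir = "/var/lib/questdb/db/plc_metrics"  # placeholder path and table name
print(f"{dir_size(table_dir) / 1024**2:.1f} MiB on disk")
```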
With the version I currently use (v8.3.3), this conversion has to be performed externally every time (as an SQL command). It would be interesting to see future QuestDB versions store data directly as *.parquet (as InfluxDB V3 Enterprise does).
I believe QuestDB Enterprise might have options to automate this process, along with optimized query/write performance? As you said, future versions could perform much more efficiently than the current ones.
It will be tied to a TTL setting. The current TTL only drops data; we have not released the TTL syntax to auto-convert partitions to Parquet files yet.
The most recent partition will always be in native format, for the best real-time read/write performance.
Enterprise is separated from OSS by further extensions that move data away from local disk.
A quick test shows the data being compressed by about 2.6x after conversion to Parquet. I know this compression ratio is further configurable.
In 8.3.3, the conversion will probably write uncompressed files. You should move to a newer version and reconfigure.
In 9.2.0, the default is ZSTD with level 9, but LZ4_RAW is also a good option.