Hi all, I'm running QuestDB on ZFS and noticed extremely high disk write activity (close to 1 GB/s) from a kernel thread, [flush-zfs-2], because of which my disk utilization sits at 100%.
Is this level of ZFS background flushing expected?
Hi Aditya,
Usually, ZFS doesn't write an unexpectedly high volume of data to disk.
What's your ingestion scenario? Are you inserting lots of rows? Which client/protocol do you use to insert data? Knowing your load might help us figure out what's going on with ZFS.
We are inserting approximately 2 MB of data (~7.5k rows) every 7 to 25 millis using ILP over TCP, with 18 concurrent connections writing in parallel to different tables.
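For context, each connection streams plain ILP text lines over TCP, roughly like the illustrative ones below (the table, symbol, and column names are made up for this example), about 7.5k of them per batch:

```
trades_01,venue=XNAS price=101.25,qty=7i 1718000000000000000
trades_01,venue=XNAS price=101.26,qty=3i 1718000000000000500
```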
We've also disabled ZFS sync on the dataset (sync=disabled), but it hasn't had any noticeable impact on performance or flushing behavior.
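For reference, this is how we disabled it (the pool/dataset name is illustrative):

```
# Assumed pool/dataset name; check yours with `zfs list`.
zfs set sync=disabled tank/questdb
zfs get sync tank/questdb    # verify the property is applied
```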
That's a relatively high ingestion rate. In our experience, ZFS becomes a bottleneck under high ingestion load. How important is compression in your case? If it's not terribly important, you should try ext4 or XFS. And if you're using a network-attached disk, like AWS EBS, make sure it's configured for the maximum possible throughput and IOPS.
I’m okay with using XFS for real-time ingestion to avoid performance issues, but I plan to migrate the data to a ZFS disk afterward. If compression is enabled on the ZFS dataset, will the data be compressed during migration? Will this setup work reliably — ingesting on XFS, then storing long-term on a separate compressed ZFS disk?
Since the streaming database can stay on XFS for performance, but the historical data needs to go on ZFS due to large storage requirements, what would be the best and most efficient way to handle the migration?
Also, what are the recommended settings for the XFS filesystem in a high-throughput write workload?
Using a 4 TB NVMe M.2 Gen4 drive (model: CT4000P3PSSD8).
With a single database instance this may be tricky, as you'd have to detach partitions, copy them to the ZFS volume, and attach them via a symlink:
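Very roughly, that flow would look something like the sketch below. Table, partition, and path names are illustrative, and you should double-check the exact directory suffixes against the ALTER TABLE DETACH/ATTACH PARTITION docs:

```sql
-- 1. Detach the partition from the hot table (illustrative table/partition).
ALTER TABLE trades DETACH PARTITION LIST '2024-06-01';
```

```
# 2. Move the detached folder to the ZFS volume, then symlink it back into the
#    table directory under the suffix the server expects for attachable partitions.
mv /var/lib/questdb/db/trades/2024-06-01.detached /zfs-pool/questdb/trades/2024-06-01
ln -s /zfs-pool/questdb/trades/2024-06-01 /var/lib/questdb/db/trades/2024-06-01.attachable
```

```sql
-- 3. Re-attach the partition; the data now lives on ZFS behind the symlink.
ALTER TABLE trades ATTACH PARTITION LIST '2024-06-01';
```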
We're working on native Apache Parquet support, which will provide a better compression ratio than ZFS. The idea is to support SQL statements for converting older partitions to/from Apache Parquet format, with encoding and compression enabled. It's hard to name an ETA for this feature, but hopefully it's ready in a few months.
As for special XFS tuning, so far the defaults have worked fine for us.
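In practice that means something as plain as the commands below (the device path and mount point are assumptions, and noatime is an optional nicety rather than a requirement):

```
# Default mkfs.xfs options; only the device and mount point are assumptions.
mkfs.xfs /dev/nvme0n1p1
mount -o noatime /dev/nvme0n1p1 /var/lib/questdb
```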
We do have IN VOLUME, but this is for entire tables only, not just some partitions. It also has some bugs when dropping and re-creating tables.
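For completeness, this is roughly what that feature looks like (the volume alias and paths are illustrative):

```
# server.conf: map a volume alias to a directory on the ZFS mount
cairo.volumes=zfs_cold->/zfs-pool/questdb-volumes
```

```sql
-- Create the historical table on the ZFS volume via the alias.
CREATE TABLE trades_hist (
    ts TIMESTAMP,
    venue SYMBOL,
    price DOUBLE,
    qty LONG
) TIMESTAMP(ts) PARTITION BY DAY
IN VOLUME 'zfs_cold';
```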
That being said, you could have a separate table on a ZFS volume with historical data, and another on XFS with real-time data. Then you periodically copy data across in bulk via INSERT INTO SELECT, and drop it from your real-time XFS table.
Then you can JOIN across both tables.
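A sketch of that periodic roll-over, reusing the illustrative trades (XFS) and trades_hist (ZFS) tables from above, with daily partitions:

```sql
-- Copy yesterday's data from the hot XFS table into the ZFS table in bulk.
INSERT INTO trades_hist
SELECT * FROM trades
WHERE ts IN '2024-06-01';

-- Then free the space on the XFS side by dropping the migrated partition.
ALTER TABLE trades DROP PARTITION LIST '2024-06-01';

-- Queries can still span both tables, e.g. via UNION ALL.
SELECT * FROM trades_hist WHERE ts IN '2024-06-01'
UNION ALL
SELECT * FROM trades WHERE ts >= '2024-06-02';
```

Whether you stitch the two together with UNION ALL or a join depends on the query; for identical schemas UNION ALL is usually the simpler option.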
These are the bugs I know of related to that feature:
Whether they would be dealbreakers, or worth working around in this case, is up to you!