QuestDB import with out of order data

Hello,

we have the requirement to import an amount of 4.800.000.000 rows as fast as possible into questdb. Source of the data are 50 million Kafka events. Each event has 96 15-minutes values of a energy meter (a whole day). The timestamps in each event are ordered but the events are not sorted on the Kafka broker so it should be mostly out of order data.

We are using a java service to read the data from Kafka and submitting it to questdb via the questdb-java-client and http+ilp.

The questdb server has 32 cpu cores and ~396 GB RAM. QuestDB and Service are running on the same machine. Java service never used more than four cores and 4 GB of RAM

Our first attempt started okay with ~ 850000 rows/s but the perfromance decreased fast and ended below 40000 rows/s. The java service wrote the data very fast to WAL, so the pending rows grew temporarily up to 3.000.000.000 rows. The WAL to table process seemed to run in intervals with gaps getting larger with the time.

We are desparate looking out for some informations to get the import faster. Or is it hopeless with this kind of data. Maybe there are more options that we do not see. If you need more informations please ask.

Kind regards
Sebastian

This is out table:

CREATE TABLE ‘import_values’ (
id SYMBOL,
value DECIMAL(15,6),
timestamp TIMESTAMP
) timestamp(timestamp) PARTITION BY HOUR
DEDUP UPSERT KEYS(id,timestamp);

We have adjusted the following config parameters, all others are on default:

shared.network.worker.count=10
cairo.max.uncommitted.rows=10000000
cairo.o3.min.lag=5s
cairo.o3.max.lag=60s
cairo.o3.column.memory.size=32M
cairo.writer.data.append.page.size=64M
cairo.iouring.enabled=true

Hi @Sebastian_Kiontke ,

These are the settings to tune:

CAIRO_O3_LAST_PARTITION_MAX_SPLITS("cairo.o3.last.partition.max.splits"),
CAIRO_O3_MID_PARTITION_MAX_SPLITS("cairo.o3.mid.partition.max.splits"),
CAIRO_O3_PARTITION_SPLIT_MIN_SIZE("cairo.o3.partition.split.min.size"),

The first setting defaults to 20, the middle 1, and the last to 50 MB.

This means that the last partition of the table can have up to 20 sub-partitions, with a minimum size of 50 MB each.

Any other partition will have effectively no splits.

Increasing the number of splits and decreasing their size could potentially help to reduce the write amplification.

Please revert the other settings you tried.

In general, it would be better to vibe code a k-way merge of your data to convert it into a global timestamp ordering, before sending it to QuestDB. You could do this and create parquet files, one per day, and then load them using INSERT INTO tbl SELECT * FROM read_parquet('file.parquet');

Please note that it is not required that each batch of data you send be internally ordered. However, you should try to ensure that subsequent batches are chronologically ordered in time.

i.e.

Batch 0: [2, 7, 3, 15] Batch 1: [18, 21, 20, 35] ← ok

Batch 0: [4,5,6,7] Batch 1: [0,1,2,3] ← not so good

Performing this load on a faster disk and then migrating to a cheaper, slower disk, might be a sensible option too.

Hi @nwoolmer

Thank you very much for your advices. We will test the settings you mentioned.

The pre-sort of the values for the batches would be a challenge. The values are all submitted individual by energy meters and there is absolute no order in the Kafka partitions. I fear the sorting will eat up some time we save on the import.

We are using not the slowest disks they come up to 1,2 GB/s.

If there is anything other we can do, it would be great if you could send it in a folow up post! But I assume from your answer, that sorting the batches would have the most impact.

You can try setting them to 100 splits in last partition and 5 MB minimum as a starting point. You can do a sweep of these parameters to figure out what gives you the best throughput.

If you are writing very heavily out of order, then you can change the mid partition setting too. But ideally, you should start from an empty table and go forward as much as possible.

You can also try working with parquet partitions, in some cases O3 writes can be cheaper.