Streaming to a table with the following definition:
CREATE TABLE %s (timestamp TIMESTAMP, tick INT, symbol SYMBOL, volume INT) TIMESTAMP(timestamp) PARTITION BY DAY WAL
The actual data sent to QuestDB can only be queried minutes(!) later. We log the data for a specific symbol in the client right before the “Table” call on the qdbClient, and that data only shows up in QuestDB several minutes (more like 15-20 minutes) later. This is far from real-time.
How should we reduce the ILP ingestion delay?
The max number of uncommitted rows is set to 500000. However, we insert around 100k rows per second, so this should not matter?
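For reference, the per-table form of that setting looks roughly like this (a sketch; “trades” stands in for our actual table name, and the syntax assumes a recent QuestDB version):
-- Per-table override of the maximum number of uncommitted rows.
ALTER TABLE trades SET PARAM maxUncommittedRows = 500000;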
Maybe inserting “older” data (now - 3 days) is causing trouble here? However, when the stream reaches “now”, the situation is still the same: commits are delayed by several minutes.
You can call wal_tables() and check writerTxn and sequencerTxn. The delta between these numbers is how many transactions have been committed to the WAL but not yet applied to the table, i.e. not yet visible for reads.
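For example (a sketch; column names assume a recent QuestDB version and “trades” is a placeholder table name):
-- Transactions committed to the WAL but not yet applied to the table.
SELECT name, suspended, writerTxn, sequencerTxn, sequencerTxn - writerTxn AS pending_txns
FROM wal_tables()
WHERE name = 'trades';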
Check your server logs for ingestion info and look for the number of rows per transaction. If you are using ILP over HTTP (the default for the Go client), this should be relatively large, i.e. 50-100k rows per transaction.
Very small transactions will slow down progress - there is a fixed per-transaction overhead which, for small or single-row transactions, starts to dominate.
How out-of-order is the data? Inserts behind ‘current time’ will initiate a copy-on-write process to split and merge partitions. In this case, you will be rewriting the partition from 3 days ago multiple times!
This should be avoided as much as possible, or minimised to only lightly out-of-order data. If you need to backfill, you either need much more IOPS/CPU headroom to keep up with the live feed, or you should backfill into a separate table and JOIN the two in your queries.
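A rough sketch of the second option (table and column names are placeholders; whether a JOIN or a UNION ALL fits best depends on the query shape):
-- Separate table for replayed data, same schema as the live table.
CREATE TABLE trades_backfill (timestamp TIMESTAMP, tick INT, symbol SYMBOL, volume INT) TIMESTAMP(timestamp) PARTITION BY DAY WAL;
-- At query time, read both tables together.
SELECT * FROM trades UNION ALL SELECT * FROM trades_backfill;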
Hi Chris, sometimes the shape of the data causes excessive write amplification, which is likely what is causing the delay here. The best course of action would be to share database logs, perhaps as a DM on Slack, and we can work out what’s happening.
Additionally, what asset class is this bench for and how many distinct instrumentID values are there?
This is a relatively small stream of real-time stock prices - only ~ 1200 distinct instrumentID values. We could persist and share the logs (currently only stdout). The commit delay seems to be related to rather old data being backfilled until we reach the “real-time” data.
To explain that in more detail - our incoming data is stored in RabbitMQ streams (Streams and Super Streams (Partitioned Streams) | RabbitMQ). From there on, we have consumers that ingest the data into QuestDB. The RabbitMQ setup is clustered and fault tolerant. The consumers and QuestDB instances could crash and are not fault tolerant. In such crash situations we need to backfill / replay data from RabbitMQ to QuestDB. This data can be several hours old when the consumers and/or QuestDB come up again. Exactly in such an edge case, we noticed the heavy commit delay.
Thanks for your quick response. We checked the delta between writerTxn and sequencerTxn, and it was huge when we started backfilling, e.g. writerTxn = 131267 and sequencerTxn = 2617857.
The data itself is not out of order. Use case is - we need to backfill/insert historic data if our ingesting nodes crash. Data is inserted with ascending timestamps, although this can be several hours old. The thing is, we would like to backfill as fast as possible after a possible downtime.
Are there any other ways? Let’s imagine we have the data on another QuestDB instance - can we somehow copy the “files” or send the table via remote calls?
Hi Chris, thanks for your response. Logs from around the slowdown would help us learn the exact events that cause it.
How much data do you keep in QuestDB (in hours or days) relative to your backfill size? Also, do you have dedup enabled, and if so, what column types are involved?
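For context, dedup is enabled per table on the designated timestamp plus the upsert key columns, roughly like this (a sketch; “trades” and the key columns are placeholders matching the schema above):
-- Deduplicate on the designated timestamp and the symbol column.
ALTER TABLE trades DEDUP ENABLE UPSERT KEYS(timestamp, symbol);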
It is hard for me to reconcile the transaction numbers you provided. The way I read them is that you had 130k transactions and the backfill introduced another 2.6m. If we assume that the whole table is republished, the batch sizes for the normal and backfill flows are dramatically different. This is something that would be obvious from the logs.
In any case, logs will help us give an accurate response to the problem. There is also a release coming out in 4 weeks that deals with small transactions, assuming that is the cause.
Thanks - I guess the slowdown was related to an error on our end. We didn’t properly “truncate” the table when starting the benchmark, so lots of data had to be rewritten.
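For anyone following along, resetting the benchmark table between runs is a one-liner (assuming “trades” as the table name):
TRUNCATE TABLE trades;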
However, we are still evaluating the characteristics of QuestDB. Here you can find the simple benchmark setup we made for this case: GitHub - godmodelabs/questdb-benchmark.
We are ingesting 100k updates per second spread across 2.5 million unique symbols (that’s one of the feeds we have to process). When you run the benchmark via docker-compose, you’ll notice the writerTxn/sequencerTxn delta increasing every second. This also means that data queried from the benchmark table is no longer real-time (we report this delta in the output as well).
Maybe you can suggest something for this scenario? The QuestDB logs should be easily consumable when running the docker setup. Can we tune anything for high symbol counts?
That did the trick in this case - perfect! We’re now able to stream 400k ticks/second spread across 2.5 million symbols with minimal delay. Thanks @quest db team for your help!
Our roadmap for the next few months covers Materialised Views, Arrays, and Parquet. We do try to bring usability and integration improvements alongside this, but RabbitMQ is not a priority at the moment. We also integrate with Telegraf, so many users choose that as a workaround for sinking data into QuestDB. In the meantime, however, you could open a GitHub issue for RabbitMQ support.
We do accept open-source contributions, so if you or your team benefit from it (or were otherwise sufficiently motivated), we could figure something out!