Streaming to a table with the following definition:
CREATE TABLE %s (timestamp TIMESTAMP, tick INT, symbol SYMBOL, volume INT) TIMESTAMP(timestamp) PARTITION BY DAY WAL
The actual data sent to QuestDB can only be queried minutes(!) later. We log the data for a specific symbol in the client right before the “Table” call on the qdbClient, and that data only shows up in QuestDB several minutes (more like 15-20 minutes) later. This is far from real-time.
How should we reduce the ILP ingestion delay?
The max number of uncommitted rows is set to 500000. However, we insert around 100k rows per second, so this should not matter?
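For reference, the per-table form of that setting looks roughly like this (a sketch; “trades” stands in for our actual table name, and the syntax assumes a recent QuestDB version):
-- Per-table override of the maximum number of uncommitted rows.
ALTER TABLE trades SET PARAM maxUncommittedRows = 500000;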
Maybe inserting “older” data (now - 3 days) is causing trouble here? However, when the stream reaches “now”, the situation is still the same: commits are delayed by several minutes.
You can call wal_tables() and check writerTxn and sequencerTxn. The delta between these numbers is how many transactions have been committed to the WAL but not yet applied to the table, i.e. not yet visible for reads.
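For example (a sketch; column names assume a recent QuestDB version and “trades” is a placeholder table name):
-- Transactions committed to the WAL but not yet applied to the table.
SELECT name, suspended, writerTxn, sequencerTxn, sequencerTxn - writerTxn AS pending_txns
FROM wal_tables()
WHERE name = 'trades';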
Check your server logs for ingestion info and look for the number of rows per transaction. If you are using ILP over HTTP (the default for the Go client), this should be relatively large, i.e. 50-100k rows per transaction.
Very small transactions will slow down progress - there is a fixed per-transaction overhead which, for small or single-row transactions, starts to dominate.
How out-of-order is the data? Inserts behind ‘current time’ will initiate a copy-on-write process to split and merge partitions. In this case, you will be rewriting the partition from 3 days ago multiple times!
This should be avoided as much as possible, or minimised to only lightly out-of-order data. If you need to backfill, you either need much more IOPS/CPU headroom to keep up with the live feed, or you should backfill into a separate table and JOIN the two in your queries.
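A rough sketch of the second option (table and column names are placeholders; whether a JOIN or a UNION ALL fits best depends on the query shape):
-- Separate table for replayed data, same schema as the live table.
CREATE TABLE trades_backfill (timestamp TIMESTAMP, tick INT, symbol SYMBOL, volume INT) TIMESTAMP(timestamp) PARTITION BY DAY WAL;
-- At query time, read both tables together.
SELECT * FROM trades UNION ALL SELECT * FROM trades_backfill;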
Hi Chris, sometimes the shape of the data causes excessive write amplification, which is likely what is causing the delay here. The best course of action would be to share database logs, perhaps as a DM on Slack, and we can work out what’s happening.
Additionally, what asset class is this bench for and how many distinct instrumentID values are there?
This is a relatively small stream of real-time stock prices - only ~ 1200 distinct instrumentID values. We could persist and share the logs (currently only stdout). The commit delay seems to be related to rather old data being backfilled until we reach the “real-time” data.
To explain that in more detail - our incoming data is stored in RabbitMQ streams (Streams and Super Streams (Partitioned Streams) | RabbitMQ). From there on, we have consumers that ingest the data into QuestDB. The RabbitMQ setup is clustered and fault tolerant. The consumers and QuestDB instances could crash and are not fault tolerant. In such crash situations we need to backfill / replay data from RabbitMQ to QuestDB. This data can be several hours old when the consumers and/or QuestDB come up again. Exactly in such an edge case, we noticed the heavy commit delay.
Thanks for your quick response. We checked the delta between writerTxn and sequencerTxn, and it was huge when we started backfilling, e.g. writerTxn = 131267 and sequencerTxn = 2617857.
The data itself is not out of order. Use case is - we need to backfill/insert historic data if our ingesting nodes crash. Data is inserted with ascending timestamps, although this can be several hours old. The thing is, we would like to backfill as fast as possible after a possible downtime.
Are there any other ways? Let’s imagine we have the data on another QuestDB instance - can we somehow copy the “files” or send the table via remote calls?
Hi Chris, thanks for your response. Logs from around the slowdown would help us learn the exact events that cause it.
How much data do you keep in QuestDB (in hours or days) relative to your backfill size? Also, do you have dedup enabled, and if so, what column types are involved?
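For context, dedup is enabled per table on the designated timestamp plus the upsert key columns, roughly like this (a sketch; “trades” and the key columns are placeholders matching the schema above):
-- Deduplicate on the designated timestamp and the symbol column.
ALTER TABLE trades DEDUP ENABLE UPSERT KEYS(timestamp, symbol);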
It is hard for me to reconcile the transaction numbers you provided. The way I read them is that you had 130k transactions and the backfill introduced another 2.6m. If we assume that the whole table is republished, the batch sizes for the normal and backfill flows are dramatically different. This is something that would be obvious from the logs.
In any case, logs will help us give an accurate response to the problem. There is also a release coming out in 4 weeks that deals with small transactions, assuming that is the cause.
Thanks - I guess the slowdown was related to an error on our end. We didn’t properly “truncate” the table when starting the benchmark, so lots of data had to be rewritten.
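For anyone following along, resetting the benchmark table between runs is a one-liner (assuming “trades” as the table name):
TRUNCATE TABLE trades;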
However, we are still evaluating the characteristics of QuestDB. Here you can find the simple benchmark setup we made for this case: GitHub - godmodelabs/questdb-benchmark.
We are ingesting 100k updates per second spread across 2.5 million unique symbols (that’s one of the feeds we have to process). When you run the benchmark via docker-compose, you’ll notice the writerTxn/sequencerTxn delta increasing every second. This also means that data queried from the benchmark table is no longer real-time (we report this delta in the output as well).
Maybe you can suggest something for this scenario? The QuestDB logs should be easily consumable when running the docker setup. Can we tune anything for high symbol counts?
That did the trick in this case - perfect! We’re now able to stream 400k ticks/second spread across 2.5 million symbols with minimal delay. Thanks @quest db team for your help!
Our roadmap for the next few months covers Materialised Views, Arrays, and Parquet. We do try to bring usability and integration improvements alongside this, but RabbitMQ is not a priority at the moment. We also integrate with Telegraf, so many users choose that as a workaround for sinking data into QuestDB. In the meantime, however, you could open a GitHub issue for RabbitMQ support.
We do accept open-source contributions, so if you or your team benefit from it (or were otherwise sufficiently motivated), we could figure something out!