How to speed up WAL Processing

Problem:

I wrote a Rust program that spins up a number of parallel connections to my QuestDB instance and inserts time-series data into it. I want to push a lot of data into QuestDB continuously, but I’ve found that the TableWriter struggles to keep up with the load. On a machine with 64 GB of RAM and 20 cores given to the DB, it processes around 190 transactions per hour. The problem is that after moving a day’s worth of data I’m 30,000+ transactions behind. I’m guessing I can’t just let the sequencerTxn keep growing past the writerTxn forever…

My method:

I’m moving a large amount of data from a different time-series database into QuestDB. There are thousands of tags, all of which have periodically logged values going back decades.

This is my table:

CREATE TABLE 'historian' (
  t_stamp TIMESTAMP,
  tag_name SYMBOL capacity 256 CACHE,
  float_val DOUBLE,
  qual INT
) timestamp (t_stamp) PARTITION BY DAY WAL;

My Rust program spins up async connections to the other DB and queries data over a specified time range for a specified batch of tags.

  • For example, if moving a month of data: a worker takes 16 tags, queries 2 days of data, throws it into the QuestDB client’s buffer, queries another 2 days, and so on until it has moved the entire month. It then grabs the next 16 tags, and so on until all tags have been queried. The buffer is flushed whenever its size exceeds a certain threshold.

This process happens asynchronously across a number of workers. Because of the async nature of the querying and processing, I’m guessing there are a lot of out-of-order writes.
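
For reference, each row goes into the client buffer roughly like this (a simplified sketch using the official questdb crate, v4.x API; the connection string and values here are illustrative, and the real program accumulates many rows per flush):

use questdb::ingress::{Buffer, Sender, TimestampNanos};

fn main() -> questdb::Result<()> {
    let mut sender = Sender::from_conf("http::addr=localhost:9000;")?;
    let mut buffer = Buffer::new();

    // One row per sample pulled from the source historian.
    buffer
        .table("historian")?
        .symbol("tag_name", "line1/pump_speed")?
        .column_f64("float_val", 1480.5)?
        .column_i64("qual", 192)?
        .at(TimestampNanos::new(1_700_000_000_000_000_000))?;

    // In the real program, rows accumulate and the buffer is flushed
    // once buffer.len() (bytes) exceeds a threshold.
    sender.flush(&mut buffer)?;
    Ok(())
}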

Question:

  • Why is the TableWriter this slow? Is there any way I can configure QuestDB to help it keep up?
    (I don’t need to see the data anytime soon. All I care about is moving it as quickly as possible.)
  • How could I improve my Rust program to effectively minimize out-of-order writes?

I’ll look into keeping the data sorted in a heap structure before sending it to the QuestDB buffer and flushing, but I’d appreciate any advice or insight.

How many distinct values do you have for tag_name?

If you have fewer than 256, all good. But if you have more than that, especially if the number is significantly bigger, it is a good idea to create the table with the expected symbol capacity, as an undersized capacity has been known to cause slowdowns on ingestion.


Also, when writing data it is a good idea to flush to the server in batches, so ideally send several thousand records in a single go. (If you are using the official QuestDB clients with auto-flush, it defaults to 75K rows and you don’t need to do periodic flushing.)
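
For example, the flush behaviour is set through the client configuration string (parameter names from the shared client configuration spec; which parameters a given client version supports is worth checking in its docs):

use questdb::ingress::Sender;

// auto_flush_rows defaults to 75000 where auto-flush is supported.
fn connect() -> questdb::Result<Sender> {
    Sender::from_conf("http::addr=localhost:9000;auto_flush=on;auto_flush_rows=75000;")
}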


We have almost 20,000 different values for tag_name. I had understood the capacity as the number of characters… This is good to know; I’ll update it.

Is there a way to change the capacity of the symbol column without recreating the table?

I’ve been flushing the buffer whenever it reaches 104,857,000 rows. How do I enable autoflush for the Rust client? (I don’t see a function for it in the docs)

Edit: never mind, I found it. It’s set in the configuration string.

Would making sure that the data is sorted by time before it is inserted into the buffer help at all? (If so, does it have to be sorted in ascending order?)

Hi @djs,

Yes, ascending-order chunks will be much, much faster. A single server can achieve 4M+ rows/second with that ingestion pattern.

Sending data out of order (O3) requires splitting and re-writing partitions. Put simply, you re-write larger blocks of data much more frequently.

It’s much more important for the data to be globally ordered (each chunk entirely after the prior chunk) than for the chunk itself to be sorted. That way we always append to the table.
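
For your backfill, that could look roughly like the sketch below (illustrative only: query_source stands in for your reads from the old historian, and the questdb crate v4.x API is assumed):

use questdb::ingress::{Buffer, Sender, TimestampNanos};

struct Row {
    t_stamp_nanos: i64, // designated timestamp, epoch nanoseconds
    tag_name: String,
    float_val: f64,
    qual: i64,
}

// Stand-in for reading a batch of tags over [start, end) from the old DB.
fn query_source(_tags: &[String], _start: i64, _end: i64) -> Vec<Row> {
    unimplemented!()
}

fn backfill(windows: &[(i64, i64)], tag_batches: &[Vec<String>]) -> questdb::Result<()> {
    let mut sender = Sender::from_conf("http::addr=localhost:9000;")?;
    let mut buffer = Buffer::new();
    // Outer loop over ascending, non-overlapping time windows; inner loop
    // over tag batches. Each flushed chunk then lies entirely after the
    // previous one, so the table writer can append instead of re-writing.
    for &(start, end) in windows {
        let mut rows: Vec<Row> = Vec::new();
        for tags in tag_batches {
            rows.extend(query_source(tags, start, end));
        }
        rows.sort_by_key(|r| r.t_stamp_nanos); // ascending within the window
        for r in &rows {
            buffer
                .table("historian")?
                .symbol("tag_name", &r.tag_name)?
                .column_f64("float_val", r.float_val)?
                .column_i64("qual", r.qual)?
                .at(TimestampNanos::new(r.t_stamp_nanos))?;
        }
        sender.flush(&mut buffer)?; // one append-only chunk per window
    }
    Ok(())
}

The sort inside each window is cheap insurance; the property that matters is that the windows themselves only move forward in time.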


You could create a new table with the right definition, INSERT INTO it by selecting everything from the old table, then drop the old one and rename the new one to the old name.

You could do something similar by creating a new column within the existing table and running an UPDATE to populate it, then dropping the old column and renaming the new one. This would be slower, as UPDATEs are not very efficient, but on the plus side it needs less storage, since you are not duplicating the whole table.


Thank you all so much for the timely responses.

I modified my program to send sorted data to QuestDB. It’ll need a lot more stress testing, but it already seems to work a lot better.


Great news! Let us know if you need anything else 🙂