How to speed up WAL Processing

Problem:

I wrote a Rust program that spins up a number of parallel connections to my QuestDB instance and inserts time-series data into it. I want to push a lot of data into QuestDB continuously, but I’ve found that the TableWriter struggles to keep up with the load. On a machine with 64 GB of RAM and 20 cores given to the DB, it processes around 190 transactions per hour. The problem is that after moving a day’s worth of data I’m 30,000+ transactions behind. I’m guessing I can’t just let the sequencerTxn keep growing past the writerTxn forever…

My method:

I’m moving a large amount of data from a different time-series database into QuestDB. There are thousands of tags, all of which have periodically logged values going back decades.

This is my table:

CREATE TABLE 'historian' (
  t_stamp TIMESTAMP,
  tag_name SYMBOL capacity 256 CACHE,
  float_val DOUBLE,
  qual INT
) timestamp (t_stamp) PARTITION BY DAY WAL;

My Rust program spins up async connections to the other DB and queries data over a specified time range for a specified batch of tags.

  • For example, if moving a month of data: a worker takes 16 tags, queries 2 days of data, throws it into the QuestDB client’s buffer, queries another 2 days, and so on until it has moved the entire month. It then grabs the next 16 tags, and so on until all tags have been queried. The buffer is flushed whenever its size exceeds a certain threshold.

This process happens asynchronously across a number of workers. Because of the async nature of the querying and processing, I’m guessing there are a lot of out-of-order writes.
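
For reference, each row goes into the client buffer roughly like this (a simplified sketch using the official questdb crate, v4.x API; the connection string and values here are illustrative, and the real program accumulates many rows per flush):

use questdb::ingress::{Buffer, Sender, TimestampNanos};

fn main() -> questdb::Result<()> {
    let mut sender = Sender::from_conf("http::addr=localhost:9000;")?;
    let mut buffer = Buffer::new();

    // One row per sample pulled from the source historian.
    buffer
        .table("historian")?
        .symbol("tag_name", "line1/pump_speed")?
        .column_f64("float_val", 1480.5)?
        .column_i64("qual", 192)?
        .at(TimestampNanos::new(1_700_000_000_000_000_000))?;

    // In the real program, rows accumulate and the buffer is flushed
    // once buffer.len() (bytes) exceeds a threshold.
    sender.flush(&mut buffer)?;
    Ok(())
}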

Question:

  • Why is the TableWriter this slow? Is there any way I can configure QuestDB to help it keep up?
    (I don’t need to see the data anytime soon. All I care about is moving it as quickly as possible.)
  • How could I improve my Rust program to effectively minimize out-of-order writes?

I’ll look into keeping the data sorted in a heap structure before sending it to the QuestDB buffer and flushing, but I’d appreciate any advice or insight.

How many distinct values do you have for tag_name?

If you have fewer than 256, all good. But if you have more than that, especially if the number is significantly bigger, it is a good idea to create the table with the expected symbol capacity, as an undersized capacity has been known to cause slowdowns on ingestion.


Also, when writing data it is a good idea to flush to the server in batches, so ideally send several thousand records in a single go. (If you are using the official QuestDB clients with auto-flush, it defaults to 75K rows and you don’t need to do periodic flushing.)
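
For example, the flush behaviour is set through the client configuration string (parameter names from the shared client configuration spec; which parameters a given client version supports is worth checking in its docs):

use questdb::ingress::Sender;

// auto_flush_rows defaults to 75000 where auto-flush is supported.
fn connect() -> questdb::Result<Sender> {
    Sender::from_conf("http::addr=localhost:9000;auto_flush=on;auto_flush_rows=75000;")
}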


We have almost 20,000 different values for tag_name. I had understood the capacity as the number of characters… This is good to know; I’ll update it.

Is there a way to change the capacity of the symbol column without recreating the table?

I’ve been flushing the buffer whenever it reaches 104,857,000 rows. How do I enable autoflush for the Rust client? (I don’t see a function for it in the docs)

Edit: never mind, I found it. It’s set in the configuration string.

Would making sure that the data is sorted by time before it is inserted into the buffer help at all? (If so, does it have to be sorted in ascending order?)

Hi @djs,

Yes, ascending-order chunks will be much, much faster. A single server can achieve 4M+ rows/second with that ingestion pattern.

Sending data out of order (O3) requires splitting and re-writing partitions. Put simply, you re-write larger blocks of data much more frequently.

It’s much more important for the data to be globally ordered (each chunk entirely after the prior chunk) than for the chunk itself to be sorted. That way we always append to the table.
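
For your backfill, that could look roughly like the sketch below (illustrative only: query_source stands in for your reads from the old historian, and the questdb crate v4.x API is assumed):

use questdb::ingress::{Buffer, Sender, TimestampNanos};

struct Row {
    t_stamp_nanos: i64, // designated timestamp, epoch nanoseconds
    tag_name: String,
    float_val: f64,
    qual: i64,
}

// Stand-in for reading a batch of tags over [start, end) from the old DB.
fn query_source(_tags: &[String], _start: i64, _end: i64) -> Vec<Row> {
    unimplemented!()
}

fn backfill(windows: &[(i64, i64)], tag_batches: &[Vec<String>]) -> questdb::Result<()> {
    let mut sender = Sender::from_conf("http::addr=localhost:9000;")?;
    let mut buffer = Buffer::new();
    // Outer loop over ascending, non-overlapping time windows; inner loop
    // over tag batches. Each flushed chunk then lies entirely after the
    // previous one, so the table writer can append instead of re-writing.
    for &(start, end) in windows {
        let mut rows: Vec<Row> = Vec::new();
        for tags in tag_batches {
            rows.extend(query_source(tags, start, end));
        }
        rows.sort_by_key(|r| r.t_stamp_nanos); // ascending within the window
        for r in &rows {
            buffer
                .table("historian")?
                .symbol("tag_name", &r.tag_name)?
                .column_f64("float_val", r.float_val)?
                .column_i64("qual", r.qual)?
                .at(TimestampNanos::new(r.t_stamp_nanos))?;
        }
        sender.flush(&mut buffer)?; // one append-only chunk per window
    }
    Ok(())
}

The sort inside each window is cheap insurance; the property that matters is that the windows themselves only move forward in time.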


You could create a new table with the right definition, INSERT INTO it by selecting everything from the old table, then drop the old one and rename the new one to the old name.

You could do something similar by creating a new column within the existing table and running an UPDATE to populate it, then dropping the old column and renaming the new one. This would be slower, as UPDATEs are not very efficient, but on the plus side it needs less storage, since you are not duplicating the whole table.


Thank you all so much for the timely responses.

I modified my program to send sorted data to QuestDB. It’ll need a lot more stress testing, but it already seems to work a lot better.


Great news! Let us know if you need anything else 🙂