Hi,
I’m loading Binance daily klines archives (https://data.binance.vision/data/futures/um/daily/klines/${symbol}/1m/${file}) at the 1-minute interval. Each file contains 1,440 candles, which I insert into the database with a single multi-row INSERT statement.
I need to load one year of data for 350 symbols (~525,600 rows per symbol, roughly 184 million rows in total). However, as the table grows, insert performance progressively degrades.
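Each daily file becomes one statement of roughly this shape (a sketch with placeholder values; the column list matches the table definition under Database Setup below):

INSERT INTO klines (timestamp, symbol, open, high, low, close, volume,
                    quote_volume, taker_buy_volume, taker_buy_quote_volume, meta_id)
VALUES
  ('2024-01-01T00:00:00.000000Z', 'BTCUSDT', 42283.5, 42290.0, 42270.1, 42288.3, 12.3, 521987.6, 6.7, 287001.2, 'placeholder'),
  ('2024-01-01T00:01:00.000000Z', 'BTCUSDT', 42288.3, 42301.2, 42280.0, 42295.7, 10.1, 427554.9, 5.0, 211873.4, 'placeholder'),
  -- ... 1,438 more rows from the same daily file
  ('2024-01-01T23:59:00.000000Z', 'BTCUSDT', 42310.0, 42315.5, 42301.0, 42312.8, 8.4, 355402.1, 4.2, 177689.3, 'placeholder');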
Issue
Initially, inserting 1,440 rows took ~5 ms; now that the table has reached ~26,975,520 rows, the same insert takes ~6.5 seconds.
Database Setup
I’m using QuestDB over the PostgreSQL wire protocol for queries, and my table is structured as follows:
CREATE TABLE IF NOT EXISTS klines (
    timestamp TIMESTAMP,
    symbol SYMBOL CAPACITY 4096,
    open DOUBLE,
    high DOUBLE,
    low DOUBLE,
    close DOUBLE,
    volume DOUBLE,
    quote_volume DOUBLE,
    taker_buy_volume DOUBLE,
    taker_buy_quote_volume DOUBLE,
    meta_id STRING
) TIMESTAMP (timestamp)
PARTITION BY DAY
BYPASS WAL;
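With PARTITION BY DAY, one year of data ends up in roughly 365 daily partitions. For reference, the row and partition counts quoted above can be checked like this (a sketch, assuming QuestDB's count() and table_partitions() metadata functions are available in the server version in use):

SELECT count() FROM klines;                      -- total rows (~26.9M at the moment)
SELECT count() FROM table_partitions('klines');  -- number of daily partitions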
Aggregation Query Example
To retrieve aggregated data, I use the following query:
SELECT
    floor(timestamp / (1 * 60 * 1000)) * (1 * 60 * 1000) AS timestamp,
    first(open) AS open,
    max(high) AS high,
    min(low) AS low,
    last(close) AS close,
    sum(volume) AS volume,
    sum(quote_volume) AS quote_volume,
    sum(taker_buy_volume) AS taker_buy_volume,
    sum(taker_buy_quote_volume) AS taker_buy_quote_volume
FROM klines
WHERE symbol = 'BTCUSDT'
GROUP BY timestamp
ORDER BY timestamp;
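For reference, the same 1-minute aggregation expressed with QuestDB's native SAMPLE BY over the designated timestamp would look roughly like this (a sketch):

SELECT
    timestamp,
    first(open) AS open,
    max(high) AS high,
    min(low) AS low,
    last(close) AS close,
    sum(volume) AS volume,
    sum(quote_volume) AS quote_volume,
    sum(taker_buy_volume) AS taker_buy_volume,
    sum(taker_buy_quote_volume) AS taker_buy_quote_volume
FROM klines
WHERE symbol = 'BTCUSDT'
SAMPLE BY 1m ALIGN TO CALENDAR;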
Questions
- Why is insert performance degrading as the table size grows?
- How can I optimize the insert process to maintain high performance as data volume increases?
- Are there better table structures or indexing strategies that could help?
Any insights or recommendations would be greatly appreciated!