Insert Performance Degradation

Hi,

I’m loading daily Klines archives from Binance (https://data.binance.vision/data/futures/um/daily/klines/${symbol}/1m/${file}) at a 1-minute interval. Each file contains 1,440 candles, which I insert into the database with a single INSERT statement.
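For reference, each downloaded file becomes one multi-row INSERT over the pg client, roughly like this (simplified sketch: only a few columns and two of the 1,440 rows shown, placeholder values, and the connection settings are just QuestDB's PostgreSQL wire defaults):

import { Client } from "pg";

// QuestDB's PostgreSQL wire endpoint defaults: port 8812, admin/quest, database "qdb".
const client = new Client({ host: "localhost", port: 8812, user: "admin", password: "quest", database: "qdb" });
await client.connect();

// One multi-row INSERT per daily file; the candle values here are placeholders.
await client.query(`
    INSERT INTO klines (timestamp, symbol, open, high, low, close, volume)
    VALUES
        ('2024-01-01T00:00:00.000000Z', 'BTCUSDT', 42283.5, 42295.0, 42270.1, 42290.2, 12.34),
        ('2024-01-01T00:01:00.000000Z', 'BTCUSDT', 42290.2, 42301.7, 42285.0, 42299.9, 10.01)
`);

await client.end();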

I need to load one year of data (~525,600 rows per symbol) for 350 symbols. However, as the table grows, insert performance progressively degrades.

Issue

Initially, inserting 1,440 rows took ~5 ms, but by the time the table reached ~26,975,520 rows, the same insert was taking 6.5 seconds.

Database Setup

I’m using the PostgreSQL wire protocol for inserts and queries, and my table is structured as follows:

CREATE TABLE IF NOT EXISTS klines (
    timestamp TIMESTAMP,
    symbol SYMBOL CAPACITY 4096,
    open DOUBLE,
    high DOUBLE,
    low DOUBLE,
    close DOUBLE,
    volume DOUBLE,
    quote_volume DOUBLE,
    taker_buy_volume DOUBLE,
    taker_buy_quote_volume DOUBLE,
    meta_id STRING
) TIMESTAMP (timestamp)
PARTITION BY DAY
BYPASS WAL;

Aggregation Query Example

To retrieve aggregated data, I use the following query:

SELECT 
    floor(timestamp / (1 * 60 * 1000)) * (1 * 60 * 1000) AS timestamp,
    first(open) AS open,
    max(high) AS high,
    min(low) AS low,
    last(close) AS close,
    sum(volume) AS volume,
    sum(quote_volume) AS quote_volume,
    sum(taker_buy_volume) AS taker_buy_volume,
    sum(taker_buy_quote_volume) AS taker_buy_quote_volume
FROM klines
WHERE symbol = 'BTCUSDT'
GROUP BY timestamp
ORDER BY timestamp;

Questions

  1. Why is insert performance degrading as the table size grows?
  2. How can I optimize the insert process to maintain high performance as data volume increases?
  3. Are there better table structures or indexing strategies that could help?

Any insights or recommendations would be greatly appreciated!

Firstly, this isn’t TimescaleDB!

Common causes of slowdown:

  • Inappropriate partition sizing (but in this case DAYs should be fine)
  • Out-of-order (O3) inserts - probably your issue.
  • Small batch sizes
  • Not enough disk IOPS/throughput

You should probably enable WAL unless you are sure you know what you are doing.
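For example, the WAL version of your table is just the same DDL without BYPASS WAL (WAL spelled out here for clarity):

CREATE TABLE IF NOT EXISTS klines (
    timestamp TIMESTAMP,
    symbol SYMBOL CAPACITY 4096,
    open DOUBLE,
    high DOUBLE,
    low DOUBLE,
    close DOUBLE,
    volume DOUBLE,
    quote_volume DOUBLE,
    taker_buy_volume DOUBLE,
    taker_buy_quote_volume DOUBLE,
    meta_id STRING
) TIMESTAMP (timestamp)
PARTITION BY DAY
WAL;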

You should make sure your data is sorted by timestamp first, not by symbol. This will ensure that your data is ingested in an append-only pattern.
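As a rough sketch (fetchDailyFile and insertRows are hypothetical stand-ins for your download and insert code, and openTime is the candle's open timestamp): loop over days in the outer loop, gather every symbol's candles for that day, sort them by timestamp, then send the day in large batches:

for (const day of tradingDays) {
    // Collect all 350 symbols' candles for this one day...
    const rows = [];
    for (const symbol of symbols) {
        rows.push(...(await fetchDailyFile(symbol, day))); // 1,440 candles per symbol
    }
    // ...then sort by timestamp so every batch only appends newer rows
    // (no out-of-order merges into already-written partitions).
    rows.sort((a, b) => a.openTime - b.openTime);
    await insertRows(rows); // send in large chunks, e.g. 50k+ rows at a time
}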

You should try ingesting via ILP instead of Postgres Wire; this is usually faster.

You should increase your batch sizes if able, i.e. send 50k+ rows at a time instead of 1,440.

QuestDB can sustain millions of rows per second in the chronologically ordered, append-only case. Your schema and use case should run much faster than what you are currently seeing, even on small devices (e.g. a Raspberry Pi 5 can do 250-300k+ rows/second).

Thank you for your response!

You’re right, mentioning TimescaleDB was a typo. I’m actually using QuestDB, and for inserts I’m using the PostgreSQL wire protocol via import { Client } from 'pg' (the Node.js PostgreSQL client).

Try giving the Node ILP client a go (see the Node.js Client Documentation on the QuestDB website).

It will batch in larger chunks by default. The API is per-row, so you can just keep reading the data and adding rows to the buffer, which will be flushed automatically every 75k rows or every second.

When you finish adding rows, perform a final flush to ensure no data hangs around.
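A rough sketch with the @questdb/nodejs-client Sender (table and column names match your schema; candles and openTime are assumed to be whatever your CSV parsing produces; some columns omitted for brevity):

import { Sender } from "@questdb/nodejs-client";

// ILP over HTTP on port 9000; auto-flush roughly every 75k rows or 1 second.
const sender = Sender.fromConfig(
    "http::addr=localhost:9000;auto_flush_rows=75000;auto_flush_interval=1000;"
);

for (const candle of candles) {
    await sender
        .table("klines")
        .symbol("symbol", candle.symbol)
        .floatColumn("open", candle.open)
        .floatColumn("high", candle.high)
        .floatColumn("low", candle.low)
        .floatColumn("close", candle.close)
        .floatColumn("volume", candle.volume)
        .at(candle.openTime, "ms"); // Binance open time is epoch milliseconds
}

await sender.flush(); // final flush so nothing is left in the buffer
await sender.close();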

It works that fast; even a bare TCP socket writing raw ILP text to port 9009 is enough:

import * as net from "node:net";

const payload = "...ILP..."; // newline-terminated ILP lines

// Open a raw TCP connection to the ILP port (9009), write the payload,
// then close the socket once the write callback fires.
await new Promise((resolve, reject) => {
    const client = net.createConnection(9009, "localhost", () => {
        client.write(payload, () => {
            client.end();
            resolve();
        });
    });
    client.on("error", reject);
});

Is it better to store timestamps in seconds or milliseconds in the database?

Store the timestamp in a TIMESTAMP column, which has microsecond resolution.

Whilst we do have a DATE type (milliseconds), it should not be used over TIMESTAMP.
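If you ever convert Binance's epoch-millisecond open times in SQL, multiply by 1,000 first so the value is interpreted as microseconds (the literal below is just an example; with ILP the client's at(value, 'ms') does the conversion for you):

-- 1704067200000 ms (2024-01-01T00:00:00Z) times 1,000 gives epoch microseconds
SELECT cast(1704067200000 * 1000 AS timestamp) AS ts;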

Also, try using ILP/HTTP, which works over port 9000. This will give you better error feedback if a write fails.
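With the HTTP transport, the Sender sketch above barely changes, but a failed write shows up as a rejected promise you can catch and handle (assuming the same sender as before):

try {
    await sender.flush(); // over HTTP, a rejected batch surfaces here as an error
} catch (err) {
    console.error("QuestDB rejected the write:", err);
    // decide whether to retry, skip, or log the offending rows
}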