Hi,
I’m loading Binance daily klines archives (https://data.binance.vision/data/futures/um/daily/klines/${symbol}/1m/${file}) at the 1-minute interval. Each file contains 1,440 candles, which I insert into the database with a single multi-row INSERT statement.
I need to load one year of data for 350 symbols (~525,600 rows per symbol, roughly 184 million rows in total). However, as the table grows, insert performance progressively degrades.
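Each daily file becomes one statement of roughly this shape (a sketch with placeholder values; the column list matches the table definition under Database Setup below):

INSERT INTO klines (timestamp, symbol, open, high, low, close, volume,
                    quote_volume, taker_buy_volume, taker_buy_quote_volume, meta_id)
VALUES
  ('2024-01-01T00:00:00.000000Z', 'BTCUSDT', 42283.5, 42290.0, 42270.1, 42288.3, 12.3, 521987.6, 6.7, 287001.2, 'placeholder'),
  ('2024-01-01T00:01:00.000000Z', 'BTCUSDT', 42288.3, 42301.2, 42280.0, 42295.7, 10.1, 427554.9, 5.0, 211873.4, 'placeholder'),
  -- ... 1,438 more rows from the same daily file
  ('2024-01-01T23:59:00.000000Z', 'BTCUSDT', 42310.0, 42315.5, 42301.0, 42312.8, 8.4, 355402.1, 4.2, 177689.3, 'placeholder');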
Issue
Initially, inserting 1,440 rows took ~5 ms; now that the table has reached ~26,975,520 rows, the same insert takes ~6.5 seconds.
Database Setup
I’m using QuestDB over the PostgreSQL wire protocol for queries, and my table is structured as follows:
CREATE TABLE IF NOT EXISTS klines (
    timestamp TIMESTAMP,
    symbol SYMBOL CAPACITY 4096,
    open DOUBLE,
    high DOUBLE,
    low DOUBLE,
    close DOUBLE,
    volume DOUBLE,
    quote_volume DOUBLE,
    taker_buy_volume DOUBLE,
    taker_buy_quote_volume DOUBLE,
    meta_id STRING
) TIMESTAMP (timestamp)
PARTITION BY DAY
BYPASS WAL;
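With PARTITION BY DAY, one year of data ends up in roughly 365 daily partitions. For reference, the row and partition counts quoted above can be checked like this (a sketch, assuming QuestDB's count() and table_partitions() metadata functions are available in the server version in use):

SELECT count() FROM klines;                      -- total rows (~26.9M at the moment)
SELECT count() FROM table_partitions('klines');  -- number of daily partitions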
Aggregation Query Example
To retrieve aggregated data, I use the following query:
SELECT
    floor(timestamp / (1 * 60 * 1000)) * (1 * 60 * 1000) AS timestamp,
    first(open) AS open,
    max(high) AS high,
    min(low) AS low,
    last(close) AS close,
    sum(volume) AS volume,
    sum(quote_volume) AS quote_volume,
    sum(taker_buy_volume) AS taker_buy_volume,
    sum(taker_buy_quote_volume) AS taker_buy_quote_volume
FROM klines
WHERE symbol = 'BTCUSDT'
GROUP BY timestamp
ORDER BY timestamp;
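For reference, the same 1-minute aggregation expressed with QuestDB's native SAMPLE BY over the designated timestamp would look roughly like this (a sketch):

SELECT
    timestamp,
    first(open) AS open,
    max(high) AS high,
    min(low) AS low,
    last(close) AS close,
    sum(volume) AS volume,
    sum(quote_volume) AS quote_volume,
    sum(taker_buy_volume) AS taker_buy_volume,
    sum(taker_buy_quote_volume) AS taker_buy_quote_volume
FROM klines
WHERE symbol = 'BTCUSDT'
SAMPLE BY 1m ALIGN TO CALENDAR;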
Questions
- Why is insert performance degrading as the table size grows?
- How can I optimize the insert process to maintain high performance as data volume increases?
- Are there better table structures or indexing strategies that could help?
Any insights or recommendations would be greatly appreciated!