I’m a bit confused with the performance of QuestDB. I’m running a QuestDB 8.3.3 into a podman container on my MacBookPro M4. The database is store into my laptop hardrive.
I’ve created a table to record transactions like this :
CREATE TABLE 'TRANSACTION_V2’ (
date TIMESTAMP,
dateUtc TIMESTAMP,
openDateUtc TIMESTAMP,
instrumentName VARCHAR,
period VARCHAR,
profitAndLoss VARCHAR,
transactionType VARCHAR,
reference VARCHAR,
openLevel FLOAT,
closeLevel FLOAT,
size FLOAT,
currency VARCHAR,
cashTransaction BOOLEAN)
timestamp (dateUtc) PARTITION BY DAY WAL;
The ingestion works like a charm.
But since I’ve loaded the 10 years transaction, when I perfomr a “Select * FROM TRANSACTION_V2; The request is very slow. It’s the same slowness from the webUI and my python program (using SQLachemy or pscyopg2).
There is 12866 row (which seems to be ‘nothing’ for QuestDB).
The first time I perform my SQL query, QuestDB took somehow 1min/2min.
The next request the SQL query perfrom in subsecond.
Why could explain this slowness ? I’m stuck with this problem I’ve found nothing similar across the web.
Hi, thank you for the question! The behavior you’re experiencing is due to the difference between “cold” and “hot” data access in QuestDB.
When you run a query for the first time after loading or restarting, QuestDB needs to read the relevant data from disk into memory. This initial read can be slow, especially if your data spans many partitions. The slowness is not specific to the client (Web UI, SQLAlchemy, psycopg2), but rather to the state of the data in memory.
Once the data is in memory (i.e., “hot”), subsequent queries are much faster because QuestDB can serve them directly from memory.
To address this, you can use the touch() function to pre-fetch (or “warm up”) the data into memory without actually running a full query that returns all the data. Here is more information about touch(), example: SELECT touch(SELECT * FROM x WHERE k IN ‘1970-01-22’);
Hope this helps, let us know if you have more questions!
It really depends. No partition is not recommended, as non partitioned tables will not have many of the benefits of partitioned ones.
Then for partitioning, it depends on query granularity and on how data is ingested. If you typically query data for a small time range, like a few days or a few weeks, then partitioning by YEAR would be a bad idea, as QuestDB typically reads whole partitions when data rows are needed. If you read a whole year to present just a few days, it is not ideal.
On the other hand, if your queries are typically for a whole year, and you partition per hour, then questdb needs to open a ridiculously high number of column files just to retrieve data for one query. Again, not ideal.
When it comes to ingestion, the partitioning granularity also affects to things like data coming out of order and so on. Ideally you should go for the partitioning strategy that matches the time range you use in your most frequent queries.