Dear QuestDB community,
We are currently evaluating QuestDB for our company, and I need some help determining the most suitable table structure for our use case.
Our application consists of 30 sensors in total, each collecting data at a sampling rate of approximately 4 ms over periods of several weeks. Each sensor generates a high-precision value (HP) as well as 5 peaks, with each peak consisting of a distance and an amplitude value.
The sensors are distinguished by their side (A or B), as well as by Position_B and Position_L. The measurement type is either “Measurement” (M) or “Reference” (R).
Currently, I’ve set up two separate tables — one for the HP data and one for the peak data — as this seemed to be the most reasonable structure for filtering.
For our data processing, it is important that we can efficiently retrieve data for specific sensors in Python (e.g., HP values from site A, Position_B 1 to 3, for values between 1 and 3). Efficient, targeted retrieval is our first major requirement.
From the processed data, we identify time intervals ranging between 5 and 10 seconds (e.g., 2025-05-06T00:01:18.395000Z
to 2025-05-06T00:01:22.519000Z
). There can be as many as 1,000 such intervals per day.
Our next requirement is to be able to query these intervals as efficiently as possible from the database.
I’ve experimented with different approaches, such as nested queries that batch the intervals using OR conditions. This method has shown the most promise so far, but with several trillion rows, it’s still too slow.
SELECT timestamp, value
FROM HP_Values_Table
WHERE (timestamp >= '2025-05-06T00:00:00Z' AND timestamp <= '2025-05-06T23:59:59Z') AND
((timestamp BETWEEN '2025-05-06 18:55:17.456000+00:00' AND '2025-05-06 18:55:21.976000+00:00') OR
(timestamp BETWEEN '2025-05-06 18:55:30.744000+00:00' AND '2025-05-06 18:55:35.304000+00:00') OR
(timestamp BETWEEN '2025-05-06 18:55:38.384000+00:00' AND '2025-05-06 18:55:42.480000+00:00') OR
(timestamp BETWEEN '2025-05-06 18:55:47.712000+00:00' AND '2025-05-06 18:55:52.312000+00:00') OR
(timestamp BETWEEN '2025-05-06 18:55:57.160000+00:00' AND '2025-05-06 18:55:58.784000+00:00') OR
(timestamp BETWEEN '2025-05-06 18:56:18.344000+00:00' AND '2025-05-06 18:56:22.732000+00:00') OR
(timestamp BETWEEN '2025-05-06 19:20:42.044000+00:00' AND '2025-05-06 19:20:53.756000+00:00') OR
(timestamp BETWEEN '2025-05-06 19:21:03.316000+00:00' AND '2025-05-06 19:21:17.676000+00:00') OR
(timestamp BETWEEN '2025-05-06 19:21:44.484000+00:00' AND '2025-05-06 19:21:45.332000+00:00')) AND
value >= 1.8 AND
value <= 2.3 AND
Position_B = 1 AND
site = 'A';
Another idea was to add a new boolean flag column to mark the relevant time intervals, but updating the table turned out to be very inefficient.
I also considered storing the relevant data in a new table — but I would prefer to avoid that redundancy.
Ideally, the flexibility to define intervals using different algorithms should be preserved, which is why I would prefer to query the intervals directly via SQL.
Apologies if my explanation is a bit confusing — English is not my first language. I would be grateful for any input from you!
Sam