High-Dimensional Tables

Hello, we are currently evaluating QuestDB to adopt as a time-series database for our company. We are interested in testing it out to store IoT sensor data for up to 5000 sensors, typically recorded at 10 Hz. In our use case, it's expected that there will be multiple parallel streams of 10 Hz data for the same sensor, since we may have multiple pieces of equipment generating this data. Out-of-order writes are also common and expected.

We prefer to avoid normalizing the data, such as a (SensorID, Value) schema, since it would generate an extremely large number of rows at our data volume.

We experimented with having each sensor as a column in one table, such as sensor1, sensor2, sensor3, …etc., but given our use case this table could have 5000+ columns, and we found that ingestion performance tends to degrade over time, especially with out-of-order writes. (Applying the WAL to the table seems to take a long time.)

One table would be strongly preferred, but if that's not an option we are leaning toward a small number of tables. We don't want to keep track of 5000+ tables, so we are not considering the one-table-per-sensor approach.

Given this, is there a way to make one table with 5000 columns work efficiently? Are there any configuration settings we can tweak that would help improve performance for a table like this?

We are also open to recommendations for any schema design that would be more efficient for QuestDB given our use case.

Thanks,
AG

5000 columns is supported, but it is a lot. Furthermore, if you are not ingesting values for all sensors in lockstep, you are likely to end up with many null entries, which still take up storage space.

Generally, I would suggest splitting sensors into tables by type, but it sounds like you have identical sensors across the board.

In that case, splitting the sensors across a few identically-shaped tables is probably the better way to go, even if the grouping is somewhat arbitrary (for example, by sensor number, as in the sketch below).
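Just as a sketch (assuming a local instance reachable over the PostgreSQL wire protocol on port 8812 with the default credentials; table and column names are illustrative), you could generate a handful of wide tables programmatically, say 10 tables of 500 sensors each:

```python
# Sketch: shard 5000 identical sensors across a handful of wide tables
# (here 10 tables x 500 columns) instead of one 5000-column table.
# Assumes a local QuestDB with PGWire on port 8812 and default credentials;
# table/column names are illustrative.
import psycopg2

NUM_SENSORS = 5000
NUM_TABLES = 10
SENSORS_PER_TABLE = NUM_SENSORS // NUM_TABLES  # 500 columns per table

conn = psycopg2.connect(
    host="localhost", port=8812, user="admin", password="quest", dbname="qdb"
)
conn.autocommit = True

with conn.cursor() as cur:
    for shard in range(NUM_TABLES):
        first = shard * SENSORS_PER_TABLE
        cols = ", ".join(
            f"sensor{first + i} DOUBLE" for i in range(SENSORS_PER_TABLE)
        )
        # One WAL table per shard, partitioned by day on the designated timestamp.
        cur.execute(
            f"CREATE TABLE IF NOT EXISTS sensors_{shard:02d} "
            f"(ts TIMESTAMP, {cols}) TIMESTAMP(ts) PARTITION BY DAY WAL;"
        )
conn.close()
```

Each table then stays at a few hundred columns instead of 5000.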

Another path is to go with a row-modelled schema (I know that is not your preference!). QuestDB can handle extremely high volumes of data: 5000 sensors at 10 Hz is only 50,000 rows/second.

A small computer like a Raspberry Pi 5 can do upwards of 250k rows/second when ingesting in chronological order.
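To make that concrete, here is a minimal sketch of the row-modelled route using the official questdb Python client over ILP/HTTP (the names readings, sensor_id, equipment and the values are illustrative):

```python
# Sketch of a row-modelled ("narrow") schema: one row per sensor reading.
# Assumes the official `questdb` Python client (pip install questdb) and a
# local instance accepting ILP over HTTP on port 9000; table/column names
# are illustrative. The `readings` table is auto-created on first write,
# with a designated timestamp column named `timestamp`.
from questdb.ingress import Sender, TimestampNanos

conf = "http::addr=localhost:9000;"

with Sender.from_conf(conf) as sender:
    # One row per reading; `sensor_id` and `equipment` become SYMBOL columns,
    # which keeps parallel streams for the same sensor distinguishable and
    # cheap to filter on.
    sender.row(
        "readings",
        symbols={"sensor_id": "sensor42", "equipment": "rig_A"},
        columns={"value": 23.7},
        at=TimestampNanos.now(),
    )
    sender.flush()
```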

The row-modelled table could be a starting point, and you can then use materialized views and INSERT INTO SELECT to reshape the data into more convenient tables post-ingestion.
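For example (a rough sketch, assuming the row-modelled readings table from above, a QuestDB version with materialized view support, and illustrative view/table names and sample interval; ILP auto-creates the designated timestamp column as timestamp):

```python
# Sketch: reshape the row-modelled `readings` table after ingestion.
# Assumes the narrow table from the previous sketch and a local QuestDB
# reachable over PGWire; names and the 1-second interval are illustrative.
import psycopg2

conn = psycopg2.connect(
    host="localhost", port=8812, user="admin", password="quest", dbname="qdb"
)
conn.autocommit = True

with conn.cursor() as cur:
    # Incrementally maintained 1-second averages per sensor.
    cur.execute(
        "CREATE MATERIALIZED VIEW IF NOT EXISTS readings_1s AS ("
        " SELECT timestamp, sensor_id, avg(value) AS value"
        " FROM readings SAMPLE BY 1s"
        ") PARTITION BY DAY;"
    )
    # One-off reshape: copy a single sensor's stream into its own table.
    cur.execute(
        "CREATE TABLE IF NOT EXISTS sensor42_readings"
        " (ts TIMESTAMP, value DOUBLE) TIMESTAMP(ts) PARTITION BY DAY WAL;"
    )
    cur.execute(
        "INSERT INTO sensor42_readings"
        " SELECT timestamp, value FROM readings WHERE sensor_id = 'sensor42';"
    )
conn.close()
```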

I appreciate that you expect out-of-order ingestion. Fundamentally, the closer you can get your ingestion to globally chronological order for that table, the better. So if 99% of your sensor data can be written in order, with only a few delayed rows, that is preferable.
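If your sources report with bounded lateness, one option on the application side (not a QuestDB feature) is a small reorder buffer that holds rows briefly and forwards them in timestamp order, so only genuinely late data reaches the table out of order. A sketch, with an illustrative 2-second hold-back window:

```python
# Sketch: application-side reorder buffer. Readings are held back briefly
# and emitted in timestamp order; the 2-second window is illustrative.
import heapq
import time
from typing import Iterator, Tuple

Reading = Tuple[int, str, float]  # (epoch_nanos, sensor_id, value)

class ReorderBuffer:
    def __init__(self, hold_back_ns: int = 2_000_000_000):
        self._heap: list[Reading] = []  # min-heap ordered by timestamp
        self._hold_back_ns = hold_back_ns

    def add(self, reading: Reading) -> None:
        heapq.heappush(self._heap, reading)

    def drain_ready(self) -> Iterator[Reading]:
        """Yield readings older than the hold-back window, in timestamp order."""
        cutoff = time.time_ns() - self._hold_back_ns
        while self._heap and self._heap[0][0] <= cutoff:
            yield heapq.heappop(self._heap)
```

Anything older than the window goes out in order; truly late rows still arrive out of order, which QuestDB handles, just less efficiently.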

Hey @A_G, any update?