bitemporal data and questdb performance

If I want to support bitemporal data with rare corrections, is the right thing to do to have a table designed like this:

CREATE TABLE events ( event_id SYMBOL, event_time TIMESTAMP, -- designated timestamp record_time TIMESTAMP, -- when we learned about it -- other data columns value DOUBLE, -- ... ) TIMESTAMP(event_time) PARTITION BY DAY;

and then query it like this:

-- Get the most recent version of each event as of now SELECT * FROM events LATEST ON event_time PARTITION BY event_id;

Is this the right approach? Is it going to use indexes / avoid scans or be slow? What if I then put a materialized view over further aggregations of the query that involves this latest clause - will it still incrementally update the view efficiently?

Hi @nick ,

It depends. It’s best to avoid UPDATE where possible, as this has impact on any materialized views that depend on the table. But, if you aren’t relying on materialized views, and updates are rare, then it may be best to just use that.

LATEST ON is a good solution if the dataset is not significantly skewed. However, with a skewed dataset, the query can become slower over time as it has to scan more and more historical data to find the infrequently updated symbol.

There is also the option of a caching daily last values using a materialized view: Materialized views | QuestDB

This is not a good fit if you have very high cardinality event_id, and where most of them only have a handful of rows. It is more for the case where you have an event_type and you want the latest row for each type, where cardinality is bounded.

In summary, I think LATEST ON is the way to go, so long as you bound you the query to ensure it does not scan too much historical data. Adding an index can speed this up, but will consume additional storage space, and should not be used for infinite cardinality fields like an auto-generated id.

We are adding some optimisations for this kind of thing in the near future, which will specifically help for cases where the query is not time-bounded and the filtering key is high cardinality.