Advice on Relational + Time Series?

Just getting started with QuestDB and I don’t see any appealing ways to blend relational with time series. For this application I’m working on, the most interesting data is time series data (tracking product sales daily, maybe more frequently in future, and then trying to spot trends and predictive analytics, etc). However, the basic product details (product name, description, image url) don’t change often. (star schema)

I don’t really want to put this in a separate database, and I don’t really want to store it in a flat table because most of it would be repetitive. I can put it in a “legacy” table without a timestamp, but then I can’t enable WAL. I am also thinking that some of the product names might change over time, so maybe it’s okay to store it with a timestamp and just select or join by LATEST whenever I need the product details.

The other problem is a classic one, dealing with INSERT or UPDATE of this basic product detail information. When data is collected, I will have the entire record (product id, name, price, number sold) but most of it (the product details) will not change frequently. The goal would be to efficiently UPSERT the basic product details (insert or update only if something changed).

I was contemplating if maybe this just isn’t the right DB for this project, but the thing is, over time, when I scale it up, the sales data will add up to quite a bit of time series data (something like 100k-200k products sampled several times a day continually). When I start doing trend analysis, etc, over a year’s worth of data like that, I’ll need a DB that can handle that task efficiently.

p.s., I have done a bunch with relational and non-relational in the past, but it’s been a while, and I haven’t done much with time series. So any advice or thoughts are greatly appreciated

Hi @cardilloscreations , trying to parse this so let me know if I’ve misunderstood anything!

For your ‘static’ data, you could hack your way round this by adding a dummy 0 timestamp to your table, and partitioning by this timestamp. Then just insert every record timestamp=0, and voila, it is now a WAL table, so you can use features like DEDUP inserts, and insert into it using ILP/HTTP.

You can also take an event-sourced or event-logged approach, e.g.:

  • event-logged: on each update, you insert a new record with the updated state, then query for the latest.
  • event-sourced: you insert ‘update events’ to that table and aggregate all of the rows into a single ‘final state’

We also support UPDATE so you could update rows individually (WAL or non-WAL) if you needed to.

We will be improving our DEDUP feature soon to add DEDUP SKIP and DEDUP MERGE functionality, which would also help you to UPSERT more flexibility.

QuestDB doesn’t seem like a bad fit based on what you’ve described. Maybe you can share some example schemas to help me get a better picture, and we can talk more?

EDIT: Adding a P.S - we have users who had similar star-schema (or I suppose snowflake-schema) tables in Snowflake and other data warehouses, who successfully shifted them to QuestDB. Maybe that is a confidence vote!

The tip for the dummy timestamp is definitely useful. Thanks for that! I was contemplating this but haven’t got around to trying it yet.

At least with that approach, if I don’t care about time or history on those detail records, I can just use something like DEDUP UPSERT KEYS(ts,product_id) and then INSERT will handle the INSERT/UPDATE (by way of deduplication) for me.

The event-logged approach sounds like what I was contemplating if I do want to track the changes over time, but the more I think about it, maybe that’s best reserved for change data capture in the future. IDK.

The thing is, I really don’t want to keep writing the same data with every update (storage space and processing power will be a factor in the near future). I only want to write a new entry if something changed. I suppose I can do that with a few “chatty” statements or a transaction, but that isn’t very efficient, so I was hoping to figure out a clever way to handle it better.

Are there any plans to support something like INSERT … ON CONFLICT … DO UPDATE in the future? Something like that would be perfect for this because then I could only do another insert only if something changed. Would the new DEDUP features be able to help with something like this?

BTW, the sample schema isn’t much at the moment. Just a product table with the details and a time series table with the timestamp, product ID, and units sold.

We have an issue for DEDUP MERGE i.e INSERT and if there is an existing record, merge the new record values into it.

1 Like