Using a materialized view as another materialized view's base: timeout / OOM

Hey guys! Been loving what you guys are doing with QuestDB <3

The current issue I'm having is with building rollup MVs for a price system. I have a base table with ~700M rows (1-10M rows per partition, partitioned by DAY here):

CREATE TABLE 'accounting_delta' (
  timestamp TIMESTAMP,
  user VARCHAR,
  asset VARCHAR,
  is_deleted BOOLEAN,
  amount_approx DOUBLE,
  amount_part1 LONG,
  amount_part2 LONG,
  amount_part3 LONG,
  amount_part4 LONG,
  amount_part5 LONG,
  amount_part6 LONG,
  amount_part7 LONG,
  amount_part8 LONG,
  is_inbound BOOLEAN,
  tx_id VARCHAR
) TIMESTAMP(timestamp) PARTITION BY DAY WAL
WITH maxUncommittedRows=500000, o3MaxLag=600000000us
DEDUP UPSERT KEYS(timestamp, user, asset, tx_id);

My first MV works great:

CREATE MATERIALIZED VIEW 'holdings_hourly' WITH BASE 'accounting_delta' REFRESH INCREMENTAL AS (
  SELECT
    timestamp,
    asset,
    user,
    sum(amount_approx) AS amount_approx,
    sum(CASE WHEN is_inbound THEN amount_part1 ELSE -amount_part1 END) AS amount_part1,
    sum(CASE WHEN is_inbound THEN amount_part2 ELSE -amount_part2 END) AS amount_part2,
    sum(CASE WHEN is_inbound THEN amount_part3 ELSE -amount_part3 END) AS amount_part3,
    sum(CASE WHEN is_inbound THEN amount_part4 ELSE -amount_part4 END) AS amount_part4,
    sum(CASE WHEN is_inbound THEN amount_part5 ELSE -amount_part5 END) AS amount_part5,
    sum(CASE WHEN is_inbound THEN amount_part6 ELSE -amount_part6 END) AS amount_part6,
    sum(CASE WHEN is_inbound THEN amount_part7 ELSE -amount_part7 END) AS amount_part7,
    sum(CASE WHEN is_inbound THEN amount_part8 ELSE -amount_part8 END) AS amount_part8
  FROM accounting_delta
  SAMPLE BY 1h
) PARTITION BY DAY;

For holdings_hourly, I have around 1-5M rows per partition (DAY).

What I want is to create this MV, which uses the hourly MV's result set as its base, narrowing down the aggregations and staying closer to real time:

CREATE MATERIALIZED VIEW 'holdings_daily' WITH BASE 'holdings_hourly' REFRESH INCREMENTAL AS (
  SELECT
    timestamp,
    asset,
    user,
    sum(amount_approx) AS amount_approx,
    sum(amount_part1) AS amount_part1,
    sum(amount_part2) AS amount_part2,
    sum(amount_part3) AS amount_part3,
    sum(amount_part4) AS amount_part4,
    sum(amount_part5) AS amount_part5,
    sum(amount_part6) AS amount_part6,
    sum(amount_part7) AS amount_part7,
    sum(amount_part8) AS amount_part8
  FROM holdings_hourly
  SAMPLE BY 1d
) PARTITION BY DAY;

This would result in about 1-3M rows per partition (DAY as well), but every time I run it, I either get an RSS OOM error (I have ~40 GB RAM, 15 cores) or a timeout (Materialized View is invalid [-1] timeout, query aborted).

Any recommendations on how to do cascading MVs, or on achieving the sort of continuous aggregation I'm going for?

Note: the MVs work great when I create them while the base table is still empty and then backfill the whole thing; maybe this is related to the initial full MV refresh?

Thanks guys, appreciate it

Hi @JForsaken,

What version of QuestDB are you using?

Materialized Views became GA in 8.3.1. If you are using an older version, then OOMs are likely during full refreshes.
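If you're not sure which version you're on, you can check it from SQL with the built-in build() function:

SELECT build();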

If it isn’t this, then we can dive deeper :slight_smile:
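Also, for later reference: you can force a full rebuild by hand, which is handy when debugging refresh issues. A sketch, using your daily view's name and assuming your version has the REFRESH MATERIALIZED VIEW statement:

REFRESH MATERIALIZED VIEW holdings_daily FULL;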

Oh, sorry for not mentioning the version, my bad. I was seeing this on 8.3.1, updated to 8.3.2 this morning, and I'm still having the issue!

Could you share some logs, please, covering the rebuild time and its failures?

The rebuild is incremental, so for this to happen, I imagine the query itself would fail with OOM?

SELECT
  timestamp,
  asset,
  user,
  sum(amount_approx) AS amount_approx,
  sum(amount_part1) AS amount_part1,
  sum(amount_part2) AS amount_part2,
  sum(amount_part3) AS amount_part3,
  sum(amount_part4) AS amount_part4,
  sum(amount_part5) AS amount_part5,
  sum(amount_part6) AS amount_part6,
  sum(amount_part7) AS amount_part7,
  sum(amount_part8) AS amount_part8
FROM holdings_hourly
SAMPLE BY 1d;

Does it fail on its own?

Yes, this does fail with:

global RSS memory limit exceeded [usage=40563170337, RSS_MEM_LIMIT=40587440940, size=67108864, memoryTag=29]

I wasn't expecting this; I was expecting the query to run the aggregation iteratively, one time window at a time. It feels like it's loading all of it into memory, I'm assuming?

If this helps, here are the sizes and partitions for the hourlies:

{
    "query": "select * from table_storage() where tableName ilike '%holdings_hourly%'",
    "columns": [
        {
            "name": "tableName",
            "type": "STRING"
        },
        {
            "name": "walEnabled",
            "type": "BOOLEAN"
        },
        {
            "name": "partitionBy",
            "type": "STRING"
        },
        {
            "name": "partitionCount",
            "type": "LONG"
        },
        {
            "name": "rowCount",
            "type": "LONG"
        },
        {
            "name": "diskSize",
            "type": "LONG"
        }
    ],
    "dataset": [
        [
            "holdings_hourly",
            true,
            "DAY",
            "292",
            "227751696",
            "32975724015"
        ]
    ]
}

and the partitions (a screenshot, omitted here, showed a small subset; assume they are all balanced like that).

That failure (memoryTag=29) is most likely a failure to expand the OrderedMap. At this point, it is only trying to expand to 67 MB, which isn’t huge.

Have you otherwise got very high memory pressure? Are you running many other queries concurrently?

Does it fail if you limit it to just one day (with a WHERE)? If not, then there must be an issue with the incremental build, which we can debug further.

Currently, there are almost no concurrent queries; I'm trying to solve this before rolling it out to production and taking reads, so as of right now, the database is pretty much idle.

It receives new delta batches every 5m, all ordered to avoid O3 too; the DB is pretty much chilling lol

Ran:


SELECT
  timestamp,
  asset,
  user,
  sum(amount_approx) AS amount_approx,
  sum(amount_part1) AS amount_part1,
  sum(amount_part2) AS amount_part2,
  sum(amount_part3) AS amount_part3,
  sum(amount_part4) AS amount_part4,
  sum(amount_part5) AS amount_part5,
  sum(amount_part6) AS amount_part6,
  sum(amount_part7) AS amount_part7,
  sum(amount_part8) AS amount_part8
FROM holdings_hourly
WHERE timestamp >= '2025-05-06' AND timestamp < '2025-05-09'
SAMPLE BY 1h;

And it works:

3,756,736 rows in 1.62s
Execute: 1.48s | Network: 133.27ms | Total: 1.62s
Count: 29.66μs | Authentication: 345ns | Compile: 2.92ms

Wouldn't the full MV refresh run this in chunks based on the time window to make the necessary inserts? I don't understand why it would be any different to run this over the full range compared to a subset like the query above.

Also ran the daily version like you suggested, successfully:

3,072,485 rows in 903ms
Execute: 788.67ms | Network: 114.33ms | Total: 903ms
SELECT
  timestamp,
  asset,
  user,
  sum(amount_approx) AS amount_approx,
  sum(amount_part1) AS amount_part1,
  sum(amount_part2) AS amount_part2,
  sum(amount_part3) AS amount_part3,
  sum(amount_part4) AS amount_part4,
  sum(amount_part5) AS amount_part5,
  sum(amount_part6) AS amount_part6,
  sum(amount_part7) AS amount_part7,
  sum(amount_part8) AS amount_part8
FROM holdings_hourly
WHERE timestamp >= '2025-05-06' AND timestamp < '2025-05-09'
SAMPLE BY 1d;

I don’t think it runs quite like you describe. There is an estimator which can alter how large each batch is.

Bear with, let me look into it a bit!


Additionally, the partitions are quite small (100-150 MB). Unless your plan is to drop them one day at a time, increasing their size could be sensible. The views can have different-sized partitions from the parent table(s).

Another source of failure during queries can be too many open files. More partitions, more columns, more files!
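If it helps to check, per-partition row counts and sizes can be pulled with the table_partitions() meta function, e.g.:

SELECT name, numRows, diskSize FROM table_partitions('holdings_hourly');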

I see! I went with daily partitions to try to limit the impact of re-summing the deltas into holdings for the different users, but I can try partitioning the hourly holdings by month to see the impact. Trying now.

Tried to repartition the hourlies by MONTH:

CREATE MATERIALIZED VIEW 'holdings_hourly_2' WITH BASE 'accounting_delta' REFRESH INCREMENTAL AS ( 
SELECT 
    timestamp,
    asset,
    user,
    sum(amount_approx) as amount_approx,
    sum(CASE WHEN is_inbound THEN amount_part1 ELSE -amount_part1 END) AS amount_part1,
    sum(CASE WHEN is_inbound THEN amount_part2 ELSE -amount_part2 END) AS amount_part2,
    sum(CASE WHEN is_inbound THEN amount_part3 ELSE -amount_part3 END) AS amount_part3,
    sum(CASE WHEN is_inbound THEN amount_part4 ELSE -amount_part4 END) AS amount_part4,
    sum(CASE WHEN is_inbound THEN amount_part5 ELSE -amount_part5 END) AS amount_part5,
    sum(CASE WHEN is_inbound THEN amount_part6 ELSE -amount_part6 END) AS amount_part6,
    sum(CASE WHEN is_inbound THEN amount_part7 ELSE -amount_part7 END) AS amount_part7,
    sum(CASE WHEN is_inbound THEN amount_part8 ELSE -amount_part8 END) AS amount_part8
  FROM accounting_delta
  SAMPLE BY 1h
) PARTITION BY MONTH;

Experienced the same issue with both the MV and this query:

SELECT
  timestamp,
  asset,
  user,
  sum(amount_approx) AS amount_approx,
  sum(amount_part1) AS amount_part1,
  sum(amount_part2) AS amount_part2,
  sum(amount_part3) AS amount_part3,
  sum(amount_part4) AS amount_part4,
  sum(amount_part5) AS amount_part5,
  sum(amount_part6) AS amount_part6,
  sum(amount_part7) AS amount_part7,
  sum(amount_part8) AS amount_part8
FROM holdings_hourly_2
SAMPLE BY 1d;

Hello,

Could you check the refreshMinTs/refreshMaxTs values in the server logs when the materialized view is refreshed and share them here?

2025-05-27T06:34:35.744641Z I i.q.g.e.QueryProgress exe [id=1410, sql=`select sym, last(price) as price, ts from base_price sample by 1h`, principal=admin, cache=false, refreshMinTs=2024-09-10T12:00:00.000000Z, refreshMaxTs=2024-09-10T12:59:59.999999Z, jit=false]

These timestamps stand for the time boundaries: when refreshing a materialized view, we split the full time interval into smaller intervals to avoid OOMs. This is based on an estimate controlled by the cairo.mat.view.rows.per.query.estimate config property (10M by default). In your case, the asset and user grouping keys are likely to have high cardinality, so the estimation doesn't do its job. You should try setting cairo.mat.view.rows.per.query.estimate to a smaller value, e.g. 1M or even 500K, and see if it helps.
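For reference, a sketch of how to set it (property name as above; the environment variable form assumes QuestDB's usual QDB_ prefix and dot-to-underscore mapping):

# in server.conf
cairo.mat.view.rows.per.query.estimate=1000000

# or as an environment variable
QDB_CAIRO_MAT_VIEW_ROWS_PER_QUERY_ESTIMATE=1000000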

One more thing: you could also try speeding up the first materialized view's query by rewriting the CASE expression

sum(CASE WHEN is_inbound THEN amount_part1 ELSE -amount_part1 END) AS amount_part1

into something like

sum((2 * is_inbound::long - 1) * amount_part1) AS amount_part1

Here, 2 * is_inbound::long - 1 produces a long 1 or -1 for true and false is_inbound values, respectively.
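Applied to your first view, the query would look roughly like this (a sketch; the remaining amount_part columns follow the same pattern):

CREATE MATERIALIZED VIEW 'holdings_hourly' WITH BASE 'accounting_delta' REFRESH INCREMENTAL AS (
  SELECT
    timestamp,
    asset,
    user,
    sum(amount_approx) AS amount_approx,
    -- branch-free sign flip: is_inbound::long is 1 or 0, so the factor is 1 or -1
    sum((2 * is_inbound::long - 1) * amount_part1) AS amount_part1,
    sum((2 * is_inbound::long - 1) * amount_part2) AS amount_part2,
    -- ... same for amount_part3 through amount_part7 ...
    sum((2 * is_inbound::long - 1) * amount_part8) AS amount_part8
  FROM accounting_delta
  SAMPLE BY 1h
) PARTITION BY DAY;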


Cool, will do! Thanks for the recommendation :flexed_biceps:

Just ran it again to capture the logs you're referencing, and I'm seeing this:

2025-05-27T14:36:07.574301Z E i.q.g.e.g.GroupByMergeShardJob merge shard failed [error=
io.questdb.cairo.CairoException: [-1] global RSS memory limit exceeded [usage=43569652761, RSS_MEM_LIMIT=40587440940, size=1073741824, memoryTag=29]
    at io.questdb.cairo.CairoException.instance(CairoException.java:370)
    at io.questdb.cairo.CairoException.nonCritical(CairoException.java:128)
    at io.questdb.std.Unsafe.checkAllocLimit(Unsafe.java:352)
    at io.questdb.std.Unsafe.realloc(Unsafe.java:280)
    at io.questdb.cairo.map.OrderedMap.restoreInitialCapacity(OrderedMap.java:347)
    at io.questdb.cairo.map.OrderedMap.reopen(OrderedMap.java:332)
    at io.questdb.griffin.engine.table.AsyncGroupByAtom.reopenDestShard(AsyncGroupByAtom.java:535)
    at io.questdb.griffin.engine.table.AsyncGroupByAtom.mergeShard(AsyncGroupByAtom.java:436)
    at io.questdb.griffin.engine.groupby.GroupByMergeShardJob.run(GroupByMergeShardJob.java:72)
    at io.questdb.griffin.engine.groupby.GroupByMergeShardJob.doRun(GroupByMergeShardJob.java:87)
    at io.questdb.mp.AbstractQueueConsumerJob.run(AbstractQueueConsumerJob.java:50)
    at io.questdb.mp.Worker.run(Worker.java:152)
]

and the timestamps:

principal=admin, cache=false, refreshMinTs=2025-01-28T00:00:00.000000Z, refreshMaxTs=2025-05-27T23:59:59.999999Z, jit=false]

For this range, this is the total row count of the base table (the hourlies):

132768162

Based on the refreshMinTs/refreshMaxTs I've posted, I don't understand why the range is so big, given that the base table (holdings_hourly) is partitioned by DAY and the requested aggregation is SAMPLE BY 1d; are we overestimating the refresh batch size? The row count in that range is over 100M. Shouldn't the refreshMinTs/refreshMaxTs range be chosen so that cairo.mat.view.rows.per.query.estimate (10M) acts as an upper bound?

Thanks again!

The estimate is approximate and based on the following formula:

Math.max(1, (rows * bucketMicros) / (partitionMicros * partitionCount))

Notice that it doesn't consider per-partition min/max timestamps and assumes that the timestamps are spread evenly across each partition. Because of this, the actual number of aggregated rows may be larger or smaller than the estimate.
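For illustration, plugging your table_storage() numbers for holdings_hourly into this formula (1d bucket over DAY partitions; my arithmetic, assuming an even spread):

rows            = 227751696
bucketMicros    = 86400000000   (1 day)
partitionMicros = 86400000000   (DAY partitions)
partitionCount  = 292

estimate = max(1, (227751696 * 86400000000) / (86400000000 * 292))
         = max(1, 227751696 / 292)
         ≈ 780K rows per 1d bucket

So, at the default 10M cairo.mat.view.rows.per.query.estimate, each refresh query would be sized at roughly 10M / 780K ≈ 12 one-day buckets, again assuming evenly spread timestamps.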

Have you already tried changing the cairo.mat.view.rows.per.query.estimate config property?