Using a materialized view as another materialized view as base timeout / OOM

Just tried dropping cairo.mat.view.rows.per.query.estimate to 500k like you’ve suggested,
confimed it is applied through (SHOW PARAMETERS) WHERE reloadable = true;, having 500k as a value.

Was able to create the holdings_hourly view using the base table (accounting_delta) :+1:
Was able to create the daily view using the holdings_hourly as base :+1:
But with this same setting, getting an OOM doing holdings_monthly using the holdings_daily as my base :cross_mark:

Here is the MV creation DDL and the logs:

2025-05-28 20:36:52.573 EDT
2025-05-29T00:36:52.572163Z E i.q.c.s.a.PageFrameReduceJob reduce error [error=
2025-05-28 20:36:52.573 EDT
io.questdb.cairo.CairoException: [-1] global RSS memory limit exceeded [usage=40557095895, RSS_MEM_LIMIT=40587440940, size=67108864, memoryTag=29] at io.questdb.cairo.CairoException.instance(CairoException.java:370) at io.questdb.cairo.CairoException.nonCritical(CairoException.java:128) at io.questdb.std.Unsafe.checkAllocLimit(Unsafe.java:352) at io.questdb.std.Unsafe.realloc(Unsafe.java:280) at io.questdb.cairo.map.OrderedMap.resize(OrderedMap.java:591) at io.questdb.cairo.map.OrderedMap$Key.checkCapacity(OrderedMap.java:868) at io.questdb.cairo.map.OrderedMap$FixedSizeKey.init(OrderedMap.java:649) at io.questdb.cairo.map.OrderedMap$FixedSizeKey.init(OrderedMap.java:621) at io.questdb.cairo.map.OrderedMap.withKey(OrderedMap.java:388) at io.questdb.griffin.engine.table.AsyncGroupByRecordCursorFactory.aggregateSharded(AsyncGroupByRecordCursorFactory.java:359) at io.questdb.griffin.engine.table.AsyncGroupByRecordCursorFactory.aggregate(AsyncGroupByRecordCursorFactory.java:242) at io.questdb.cairo.sql.async.PageFrameReduceJob.reduce(PageFrameReduceJob.java:241) at io.questdb.cairo.sql.async.PageFrameReduceJob.consumeQueue(PageFrameReduceJob.java:189) at io.questdb.cairo.sql.async.PageFrameReduceJob.run(PageFrameReduceJob.java:150) at io.questdb.mp.Worker.run(Worker.java:152)
2025-05-28 20:36:52.573 EDT
, id=2, taskType=1, frameIndex=2094, frameCount=2159]
2025-05-28 20:36:52.622 EDT
2025-05-29T00:36:52.620794Z E i.q.c.s.a.PageFrameReduceJob reduce error [error=
2025-05-28 20:36:52.622 EDT
io.questdb.cairo.CairoException: [-1] global RSS memory limit exceeded [usage=40557095511, RSS_MEM_LIMIT=40587440940, size=67108864, memoryTag=29] at io.questdb.cairo.CairoException.instance(CairoException.java:370) at io.questdb.cairo.CairoException.nonCritical(CairoException.java:128) at io.questdb.std.Unsafe.checkAllocLimit(Unsafe.java:352) at io.questdb.std.Unsafe.realloc(Unsafe.java:280) at io.questdb.cairo.map.OrderedMap.resize(OrderedMap.java:591) at io.questdb.cairo.map.OrderedMap$Key.checkCapacity(OrderedMap.java:868) at io.questdb.cairo.map.OrderedMap$FixedSizeKey.init(OrderedMap.java:649) at io.questdb.cairo.map.OrderedMap$FixedSizeKey.init(OrderedMap.java:621) at io.questdb.cairo.map.OrderedMap.withKey(OrderedMap.java:388) at io.questdb.griffin.engine.table.AsyncGroupByRecordCursorFactory.aggregateSharded(AsyncGroupByRecordCursorFactory.java:359) at io.questdb.griffin.engine.table.AsyncGroupByRecordCursorFactory.aggregate(AsyncGroupByRecordCursorFactory.java:242) at io.questdb.cairo.sql.async.PageFrameReduceJob.reduce(PageFrameReduceJob.java:241) at io.questdb.cairo.sql.async.PageFrameReduceJob.consumeQueue(PageFrameReduceJob.java:189) at io.questdb.cairo.sql.async.PageFrameReduceJob.run(PageFrameReduceJob.java:150) at io.questdb.mp.Worker.run(Worker.java:152)
2025-05-28 20:36:52.622 EDT
, id=2, taskType=1, frameIndex=2096, frameCount=2159]
2025-05-28 20:36:54.782 EDT
2025-05-29T00:36:54.781619Z E i.q.g.e.QueryProgress err [id=8, sql=`
2025-05-28 20:36:54.782 EDT
SELECT
2025-05-28 20:36:54.782 EDT
timestamp,
2025-05-28 20:36:54.782 EDT
asset,
2025-05-28 20:36:54.782 EDT
user,
2025-05-28 20:36:54.782 EDT
sum(amount_approx) as amount_approx,
2025-05-28 20:36:54.782 EDT
sum(amount_part1) AS amount_part1,
2025-05-28 20:36:54.782 EDT
sum(amount_part2) AS amount_part2,
2025-05-28 20:36:54.782 EDT
sum(amount_part3) AS amount_part3,
2025-05-28 20:36:54.782 EDT
sum(amount_part4) AS amount_part4,
2025-05-28 20:36:54.782 EDT
sum(amount_part5) AS amount_part5,
2025-05-28 20:36:54.782 EDT
sum(amount_part6) AS amount_part6,
2025-05-28 20:36:54.782 EDT
sum(amount_part7) AS amount_part7,
2025-05-28 20:36:54.782 EDT
sum(amount_part8) AS amount_part8
2025-05-28 20:36:54.782 EDT
FROM holdings_daily
2025-05-28 20:36:54.782 EDT
SAMPLE BY 1M
2025-05-28 20:36:54.782 EDT
`, principal=admin, cache=false, refreshMinTs=2024-07-01T00:00:00.000000Z, refreshMaxTs=2025-05-31T23:59:59.999999Z, jit=true, time=31096398203, msg=global RSS memory limit exceeded [usage=40557095895, RSS_MEM_LIMIT=40587440940, size=67108864, memoryTag=29], errno=-1, pos=0]
2025-05-28 20:36:54.782 EDT
2025-05-29T00:36:54.782159Z E i.q.c.m.MatViewRefreshJob could not refresh materialized view [view=holdings_monthly~507, ex=
2025-05-28 20:36:54.782 EDT
io.questdb.cairo.CairoException: [-1] global RSS memory limit exceeded [usage=40557095895, RSS_MEM_LIMIT=40587440940, size=67108864, memoryTag=29]
	at io.questdb.cairo.CairoException.instance(CairoException.java:370)
	at io.questdb.cairo.CairoException.nonCritical(CairoException.java:128)
	at io.questdb.griffin.engine.table.AsyncGroupByRecordCursor.buildMap(AsyncGroupByRecordCursor.java:198)
	at io.questdb.griffin.engine.table.AsyncGroupByRecordCursor.buildMapConditionally(AsyncGroupByRecordCursor.java:246)
	at io.questdb.griffin.engine.table.AsyncGroupByRecordCursor.hasNext(AsyncGroupByRecordCursor.java:138)
	at io.questdb.griffin.engine.orderby.LongSortedLightRecordCursor.sortValues(LongSortedLightRecordCursor.java:179)
	at io.questdb.griffin.engine.orderby.LongSortedLightRecordCursor.hasNext(LongSortedLightRecordCursor.java:96)
	at io.questdb.griffin.engine.QueryProgress$RegisteredRecordCursor.hasNext(QueryProgress.java:419)
	at io.questdb.cairo.mv.MatViewRefreshJob.insertAsSelect(MatViewRefreshJob.java:472)
	at io.questdb.cairo.mv.MatViewRefreshJob.refreshIncremental0(MatViewRefreshJob.java:921)
	at io.questdb.cairo.mv.MatViewRefreshJob.refreshIncremental(MatViewRefreshJob.java:862)
	at io.questdb.cairo.mv.MatViewRefreshJob.incrementalRefresh(MatViewRefreshJob.java:403)
	at io.questdb.cairo.mv.MatViewRefreshJob.processNotifications(MatViewRefreshJob.java:645)
	at io.questdb.cairo.mv.MatViewRefreshJob.run(MatViewRefreshJob.java:120)
	at io.questdb.mp.Worker.run(Worker.java:152)

Any ideas to make the initial refresh job use more appropriate chunks? If not, is it possible for me to create the materialize view in INCREMENTAL mode, without it doing the FULL refresh upon creation, so that i can use RANGE to manually refresh it month by month initially?

Thanks again!

If this helps, this is the output for table_partitions() of holdings_daily , the base table for the holdings_monthly from the above – I’m just giving you context since since these are trades, the first days / months are pretty empty, but after ~100 days its really balanced at > 1M rows of aggregated 1d user/asset pairs a day, to see if the starting empty-ish partitions contribute to an eager estimation, even when I have cairo.mat.view.rows.per.query.estimate set to 500000:









refreshMinTs=2024-07-01T00:00:00.000000Z, refreshMaxTs=2025-05-31T23:59:59.999999Z

The queried range is still very large. Could you try setting cairo.mat.view.rows.per.query.estimate to a smaller value like 10k and then check if these queried intervals get smaller? Since the “density” of the partitions varies a lot, the estimation ends up being too optimistic.

Tried just now at 10k, same RSS OOM issue!

The window didn’t change when changing to 10k (made sure the settings have been reloaded and applied)

2025-05-29 10:27:04.411 EDT
2025-05-29T14:27:04.410298Z E i.q.c.s.a.PageFrameReduceJob reduce error [error=
2025-05-29 10:27:04.411 EDT
io.questdb.cairo.CairoException: [-1] global RSS memory limit exceeded [usage=40548379804, RSS_MEM_LIMIT=40587440940, size=67108864, memoryTag=29] at io.questdb.cairo.CairoException.instance(CairoException.java:370) at io.questdb.cairo.CairoException.nonCritical(CairoException.java:128) at io.questdb.std.Unsafe.checkAllocLimit(Unsafe.java:352) at io.questdb.std.Unsafe.realloc(Unsafe.java:280) at io.questdb.cairo.map.OrderedMap.resize(OrderedMap.java:591) at io.questdb.cairo.map.OrderedMap$Key.checkCapacity(OrderedMap.java:868) at io.questdb.cairo.map.OrderedMap$FixedSizeKey.init(OrderedMap.java:649) at io.questdb.cairo.map.OrderedMap$FixedSizeKey.init(OrderedMap.java:621) at io.questdb.cairo.map.OrderedMap.withKey(OrderedMap.java:388) at io.questdb.griffin.engine.table.AsyncGroupByRecordCursorFactory.aggregateSharded(AsyncGroupByRecordCursorFactory.java:359) at io.questdb.griffin.engine.table.AsyncGroupByRecordCursorFactory.aggregate(AsyncGroupByRecordCursorFactory.java:242) at io.questdb.cairo.sql.async.PageFrameReduceJob.reduce(PageFrameReduceJob.java:241) at io.questdb.cairo.sql.async.PageFrameReduceJob.consumeQueue(PageFrameReduceJob.java:189) at io.questdb.cairo.sql.async.PageFrameReduceJob.consumeQueue(PageFrameReduceJob.java:106) at io.questdb.cairo.sql.async.PageFrameSequence.stealWork(PageFrameSequence.java:574) at io.questdb.cairo.sql.async.PageFrameSequence.dispatch(PageFrameSequence.java:526) at io.questdb.cairo.sql.async.PageFrameSequence.next(PageFrameSequence.java:345) at io.questdb.griffin.engine.table.AsyncGroupByRecordCursor.buildMap(AsyncGroupByRecordCursor.java:187) at io.questdb.griffin.engine.table.AsyncGroupByRecordCursor.buildMapConditionally(AsyncGroupByRecordCursor.java:246) at io.questdb.griffin.engine.table.AsyncGroupByRecordCursor.hasNext(AsyncGroupByRecordCursor.java:138) at io.questdb.griffin.engine.orderby.LongSortedLightRecordCursor.sortValues(LongSortedLightRecordCursor.java:179) at io.questdb.griffin.engine.orderby.LongSortedLightRecordCursor.hasNext(LongSortedLightRecordCursor.java:96) at io.questdb.griffin.engine.QueryProgress$RegisteredRecordCursor.hasNext(QueryProgress.java:419) at io.questdb.cairo.mv.MatViewRefreshJob.insertAsSelect(MatViewRefreshJob.java:472) at io.questdb.cairo.mv.MatViewRefreshJob.refreshIncremental0(MatViewRefreshJob.java:921) at io.questdb.cairo.mv.MatViewRefreshJob.refreshIncremental(MatViewRefreshJob.java:862) at io.questdb.cairo.mv.MatViewRefreshJob.incrementalRefresh(MatViewRefreshJob.java:403) at io.questdb.cairo.mv.MatViewRefreshJob.processNotifications(MatViewRefreshJob.java:645) at io.questdb.cairo.mv.MatViewRefreshJob.run(MatViewRefreshJob.java:120) at io.questdb.mp.Worker.run(Worker.java:152)
2025-05-29 10:27:04.411 EDT
, id=181, taskType=1, frameIndex=2092, frameCount=2166]
2025-05-29 10:27:04.413 EDT
2025-05-29T14:27:04.413491Z E i.q.c.s.a.PageFrameReduceJob reduce error [error=
2025-05-29 10:27:04.413 EDT
io.questdb.cairo.CairoException: [-1] global RSS memory limit exceeded [usage=40548381724, RSS_MEM_LIMIT=40587440940, size=67108864, memoryTag=29] at io.questdb.cairo.CairoException.instance(CairoException.java:370) at io.questdb.cairo.CairoException.nonCritical(CairoException.java:128) at io.questdb.std.Unsafe.checkAllocLimit(Unsafe.java:352) at io.questdb.std.Unsafe.realloc(Unsafe.java:280) at io.questdb.cairo.map.OrderedMap.resize(OrderedMap.java:591) at io.questdb.cairo.map.OrderedMap$Key.checkCapacity(OrderedMap.java:868) at io.questdb.cairo.map.OrderedMap$FixedSizeKey.init(OrderedMap.java:649) at io.questdb.cairo.map.OrderedMap$FixedSizeKey.init(OrderedMap.java:621) at io.questdb.cairo.map.OrderedMap.withKey(OrderedMap.java:388) at io.questdb.griffin.engine.table.AsyncGroupByRecordCursorFactory.aggregateSharded(AsyncGroupByRecordCursorFactory.java:359) at io.questdb.griffin.engine.table.AsyncGroupByRecordCursorFactory.aggregate(AsyncGroupByRecordCursorFactory.java:242) at io.questdb.cairo.sql.async.PageFrameReduceJob.reduce(PageFrameReduceJob.java:241) at io.questdb.cairo.sql.async.PageFrameReduceJob.consumeQueue(PageFrameReduceJob.java:189) at io.questdb.cairo.sql.async.PageFrameReduceJob.run(PageFrameReduceJob.java:150) at io.questdb.mp.Worker.run(Worker.java:152)
2025-05-29 10:27:04.413 EDT
, id=181, taskType=1, frameIndex=2091, frameCount=2166]
2025-05-29 10:27:06.529 EDT
2025-05-29T14:27:06.529508Z E i.q.g.e.QueryProgress err [id=5702, sql=`
2025-05-29 10:27:06.529 EDT
SELECT
2025-05-29 10:27:06.529 EDT
timestamp,
2025-05-29 10:27:06.529 EDT
asset,
2025-05-29 10:27:06.529 EDT
user,
2025-05-29 10:27:06.529 EDT
sum(amount_approx) as amount_approx,
2025-05-29 10:27:06.529 EDT
sum(amount_part1) AS amount_part1,
2025-05-29 10:27:06.529 EDT
sum(amount_part2) AS amount_part2,
2025-05-29 10:27:06.529 EDT
sum(amount_part3) AS amount_part3,
2025-05-29 10:27:06.529 EDT
sum(amount_part4) AS amount_part4,
2025-05-29 10:27:06.529 EDT
sum(amount_part5) AS amount_part5,
2025-05-29 10:27:06.529 EDT
sum(amount_part6) AS amount_part6,
2025-05-29 10:27:06.529 EDT
sum(amount_part7) AS amount_part7,
2025-05-29 10:27:06.529 EDT
sum(amount_part8) AS amount_part8
2025-05-29 10:27:06.529 EDT
FROM holdings_daily
2025-05-29 10:27:06.529 EDT
SAMPLE BY 1M
2025-05-29 10:27:06.529 EDT
`, principal=admin, cache=false, refreshMinTs=2024-07-01T00:00:00.000000Z, refreshMaxTs=2025-05-31T23:59:59.999999Z, jit=true, time=28941259832, msg=global RSS memory limit exceeded [usage=40548381724, RSS_MEM_LIMIT=40587440940, size=67108864, memoryTag=29], errno=-1, pos=0]
2025-05-29 10:27:06.529 EDT
2025-05-29T14:27:06.529765Z E i.q.c.m.MatViewRefreshJob could not refresh materialized view [view=holdings_monthly~509, ex=
2025-05-29 10:27:06.529 EDT
io.questdb.cairo.CairoException: [-1] global RSS memory limit exceeded [usage=40548381724, RSS_MEM_LIMIT=40587440940, size=67108864, memoryTag=29] at io.questdb.cairo.CairoException.instance(CairoException.java:370) at io.questdb.cairo.CairoException.nonCritical(CairoException.java:128) at io.questdb.griffin.engine.table.AsyncGroupByRecordCursor.buildMap(AsyncGroupByRecordCursor.java:198) at io.questdb.griffin.engine.table.AsyncGroupByRecordCursor.buildMapConditionally(AsyncGroupByRecordCursor.java:246) at io.questdb.griffin.engine.table.AsyncGroupByRecordCursor.hasNext(AsyncGroupByRecordCursor.java:138) at io.questdb.griffin.engine.orderby.LongSortedLightRecordCursor.sortValues(LongSortedLightRecordCursor.java:179) at io.questdb.griffin.engine.orderby.LongSortedLightRecordCursor.hasNext(LongSortedLightRecordCursor.java:96) at io.questdb.griffin.engine.QueryProgress$RegisteredRecordCursor.hasNext(QueryProgress.java:419) at io.questdb.cairo.mv.MatViewRefreshJob.insertAsSelect(MatViewRefreshJob.java:472) at io.questdb.cairo.mv.MatViewRefreshJob.refreshIncremental0(MatViewRefreshJob.java:921) at io.questdb.cairo.mv.MatViewRefreshJob.refreshIncremental(MatViewRefreshJob.java:862) at io.questdb.cairo.mv.MatViewRefreshJob.incrementalRefresh(MatViewRefreshJob.java:403) at io.questdb.cairo.mv.MatViewRefreshJob.processNotifications(MatViewRefreshJob.java:645) at io.questdb.cairo.mv.MatViewRefreshJob.run(MatViewRefreshJob.java:120) at io.questdb.mp.Worker.run(Worker.java:152)
2025-05-29 10:27:06.530 EDT
]

The window didn’t change when changing to 10k (made sure the settings have been reloaded and applied)

This is very puzzling. I’ll try reproducing your partition layout locally to see what’s wrong.

In the meantime, could you try one more thing? Namely, could you disable parallel SQL execution in materialized view refresh? cairo.mat.view.parallel.sql.enabled=false - with this setting the query will run on a single thread, so that we properly handle OOM errors if they occur and retry with a smaller interval. Note: technically this should also work by default, in case of parallel SQL, but there is a bug around propagating OOM error between worker and main threads - we’ll fix it in the upcoming release.

1 Like

This worked! I was able to successfully run the MV sequence of deltas => hourly => daily => monthly successfully, having 500k as the estimate and cairo.mat.view.parallel.sql.enabled=false if this helps debug! Will monitor this thread to see when we can retry with parallelism, thanks again!

select * from table_storage() where tableName ilike '%mv%';

tableName walEnabled partitionBy partitionCount rowCount diskSize
mv_month true MONTH 12 71837449 10512077049
mv_day true DAY 297 203203070 29512835061
mv_year true YEAR 2 33158206 7132666053
mv_hourly true DAY 298 237809285 34481490219
1 Like

Glad it is working!

The fix for the parallel OOM execution was merged: fix(sql): unreliable out-of-memory error propagation in parallel SQL by puzpuzpuz · Pull Request #5711 · questdb/questdb · GitHub

When the next version is released, you could revert that config and give it another go, just to confirm that the problem is fixed for next time.

nice thats was fast lol :mechanical_arm: Will be looking forward to bump to the next version when its out to try it out with parallelism!

1 Like

It was, thanks to @puzpuzpuz :folded_hands:!