Using a materialized view as another materialized view as base timeout / OOM

JForsaken · May 29, 2025, 7:01am

Just tried dropping cairo.mat.view.rows.per.query.estimate to 500k like you’ve suggested,
confimed it is applied through (SHOW PARAMETERS) WHERE reloadable = true;, having 500k as a value.

Was able to create the holdings_hourly view using the base table (accounting_delta)
Was able to create the daily view using the holdings_hourly as base
But with this same setting, getting an OOM doing holdings_monthly using the holdings_daily as my base

Here is the MV creation DDL and the logs:

2025-05-28 20:36:52.573 EDT
2025-05-29T00:36:52.572163Z E i.q.c.s.a.PageFrameReduceJob reduce error [error=
2025-05-28 20:36:52.573 EDT
io.questdb.cairo.CairoException: [-1] global RSS memory limit exceeded [usage=40557095895, RSS_MEM_LIMIT=40587440940, size=67108864, memoryTag=29] at io.questdb.cairo.CairoException.instance(CairoException.java:370) at io.questdb.cairo.CairoException.nonCritical(CairoException.java:128) at io.questdb.std.Unsafe.checkAllocLimit(Unsafe.java:352) at io.questdb.std.Unsafe.realloc(Unsafe.java:280) at io.questdb.cairo.map.OrderedMap.resize(OrderedMap.java:591) at io.questdb.cairo.map.OrderedMap$Key.checkCapacity(OrderedMap.java:868) at io.questdb.cairo.map.OrderedMap$FixedSizeKey.init(OrderedMap.java:649) at io.questdb.cairo.map.OrderedMap$FixedSizeKey.init(OrderedMap.java:621) at io.questdb.cairo.map.OrderedMap.withKey(OrderedMap.java:388) at io.questdb.griffin.engine.table.AsyncGroupByRecordCursorFactory.aggregateSharded(AsyncGroupByRecordCursorFactory.java:359) at io.questdb.griffin.engine.table.AsyncGroupByRecordCursorFactory.aggregate(AsyncGroupByRecordCursorFactory.java:242) at io.questdb.cairo.sql.async.PageFrameReduceJob.reduce(PageFrameReduceJob.java:241) at io.questdb.cairo.sql.async.PageFrameReduceJob.consumeQueue(PageFrameReduceJob.java:189) at io.questdb.cairo.sql.async.PageFrameReduceJob.run(PageFrameReduceJob.java:150) at io.questdb.mp.Worker.run(Worker.java:152)
2025-05-28 20:36:52.573 EDT
, id=2, taskType=1, frameIndex=2094, frameCount=2159]
2025-05-28 20:36:52.622 EDT
2025-05-29T00:36:52.620794Z E i.q.c.s.a.PageFrameReduceJob reduce error [error=
2025-05-28 20:36:52.622 EDT
io.questdb.cairo.CairoException: [-1] global RSS memory limit exceeded [usage=40557095511, RSS_MEM_LIMIT=40587440940, size=67108864, memoryTag=29] at io.questdb.cairo.CairoException.instance(CairoException.java:370) at io.questdb.cairo.CairoException.nonCritical(CairoException.java:128) at io.questdb.std.Unsafe.checkAllocLimit(Unsafe.java:352) at io.questdb.std.Unsafe.realloc(Unsafe.java:280) at io.questdb.cairo.map.OrderedMap.resize(OrderedMap.java:591) at io.questdb.cairo.map.OrderedMap$Key.checkCapacity(OrderedMap.java:868) at io.questdb.cairo.map.OrderedMap$FixedSizeKey.init(OrderedMap.java:649) at io.questdb.cairo.map.OrderedMap$FixedSizeKey.init(OrderedMap.java:621) at io.questdb.cairo.map.OrderedMap.withKey(OrderedMap.java:388) at io.questdb.griffin.engine.table.AsyncGroupByRecordCursorFactory.aggregateSharded(AsyncGroupByRecordCursorFactory.java:359) at io.questdb.griffin.engine.table.AsyncGroupByRecordCursorFactory.aggregate(AsyncGroupByRecordCursorFactory.java:242) at io.questdb.cairo.sql.async.PageFrameReduceJob.reduce(PageFrameReduceJob.java:241) at io.questdb.cairo.sql.async.PageFrameReduceJob.consumeQueue(PageFrameReduceJob.java:189) at io.questdb.cairo.sql.async.PageFrameReduceJob.run(PageFrameReduceJob.java:150) at io.questdb.mp.Worker.run(Worker.java:152)
2025-05-28 20:36:52.622 EDT
, id=2, taskType=1, frameIndex=2096, frameCount=2159]
2025-05-28 20:36:54.782 EDT
2025-05-29T00:36:54.781619Z E i.q.g.e.QueryProgress err [id=8, sql=`
2025-05-28 20:36:54.782 EDT
SELECT
2025-05-28 20:36:54.782 EDT
timestamp,
2025-05-28 20:36:54.782 EDT
asset,
2025-05-28 20:36:54.782 EDT
user,
2025-05-28 20:36:54.782 EDT
sum(amount_approx) as amount_approx,
2025-05-28 20:36:54.782 EDT
sum(amount_part1) AS amount_part1,
2025-05-28 20:36:54.782 EDT
sum(amount_part2) AS amount_part2,
2025-05-28 20:36:54.782 EDT
sum(amount_part3) AS amount_part3,
2025-05-28 20:36:54.782 EDT
sum(amount_part4) AS amount_part4,
2025-05-28 20:36:54.782 EDT
sum(amount_part5) AS amount_part5,
2025-05-28 20:36:54.782 EDT
sum(amount_part6) AS amount_part6,
2025-05-28 20:36:54.782 EDT
sum(amount_part7) AS amount_part7,
2025-05-28 20:36:54.782 EDT
sum(amount_part8) AS amount_part8
2025-05-28 20:36:54.782 EDT
FROM holdings_daily
2025-05-28 20:36:54.782 EDT
SAMPLE BY 1M
2025-05-28 20:36:54.782 EDT
`, principal=admin, cache=false, refreshMinTs=2024-07-01T00:00:00.000000Z, refreshMaxTs=2025-05-31T23:59:59.999999Z, jit=true, time=31096398203, msg=global RSS memory limit exceeded [usage=40557095895, RSS_MEM_LIMIT=40587440940, size=67108864, memoryTag=29], errno=-1, pos=0]
2025-05-28 20:36:54.782 EDT
2025-05-29T00:36:54.782159Z E i.q.c.m.MatViewRefreshJob could not refresh materialized view [view=holdings_monthly~507, ex=
2025-05-28 20:36:54.782 EDT
io.questdb.cairo.CairoException: [-1] global RSS memory limit exceeded [usage=40557095895, RSS_MEM_LIMIT=40587440940, size=67108864, memoryTag=29]
	at io.questdb.cairo.CairoException.instance(CairoException.java:370)
	at io.questdb.cairo.CairoException.nonCritical(CairoException.java:128)
	at io.questdb.griffin.engine.table.AsyncGroupByRecordCursor.buildMap(AsyncGroupByRecordCursor.java:198)
	at io.questdb.griffin.engine.table.AsyncGroupByRecordCursor.buildMapConditionally(AsyncGroupByRecordCursor.java:246)
	at io.questdb.griffin.engine.table.AsyncGroupByRecordCursor.hasNext(AsyncGroupByRecordCursor.java:138)
	at io.questdb.griffin.engine.orderby.LongSortedLightRecordCursor.sortValues(LongSortedLightRecordCursor.java:179)
	at io.questdb.griffin.engine.orderby.LongSortedLightRecordCursor.hasNext(LongSortedLightRecordCursor.java:96)
	at io.questdb.griffin.engine.QueryProgress$RegisteredRecordCursor.hasNext(QueryProgress.java:419)
	at io.questdb.cairo.mv.MatViewRefreshJob.insertAsSelect(MatViewRefreshJob.java:472)
	at io.questdb.cairo.mv.MatViewRefreshJob.refreshIncremental0(MatViewRefreshJob.java:921)
	at io.questdb.cairo.mv.MatViewRefreshJob.refreshIncremental(MatViewRefreshJob.java:862)
	at io.questdb.cairo.mv.MatViewRefreshJob.incrementalRefresh(MatViewRefreshJob.java:403)
	at io.questdb.cairo.mv.MatViewRefreshJob.processNotifications(MatViewRefreshJob.java:645)
	at io.questdb.cairo.mv.MatViewRefreshJob.run(MatViewRefreshJob.java:120)
	at io.questdb.mp.Worker.run(Worker.java:152)

Any ideas to make the initial refresh job use more appropriate chunks? If not, is it possible for me to create the materialize view in INCREMENTAL mode, without it doing the FULL refresh upon creation, so that i can use RANGE to manually refresh it month by month initially?

Thanks again!

JForsaken · May 29, 2025, 7:01am

If this helps, this is the output for table_partitions() of holdings_daily , the base table for the holdings_monthly from the above – I’m just giving you context since since these are trades, the first days / months are pretty empty, but after ~100 days its really balanced at > 1M rows of aggregated 1d user/asset pairs a day, to see if the starting empty-ish partitions contribute to an eager estimation, even when I have cairo.mat.view.rows.per.query.estimate set to 500000:

puzpuzpuz · May 29, 2025, 7:25am

refreshMinTs=2024-07-01T00:00:00.000000Z, refreshMaxTs=2025-05-31T23:59:59.999999Z

The queried range is still very large. Could you try setting cairo.mat.view.rows.per.query.estimate to a smaller value like 10k and then check if these queried intervals get smaller? Since the “density” of the partitions varies a lot, the estimation ends up being too optimistic.

JForsaken · May 29, 2025, 9:50pm

Tried just now at 10k, same RSS OOM issue!

JForsaken · May 29, 2025, 9:50pm

The window didn’t change when changing to 10k (made sure the settings have been reloaded and applied)

2025-05-29 10:27:04.411 EDT
2025-05-29T14:27:04.410298Z E i.q.c.s.a.PageFrameReduceJob reduce error [error=
2025-05-29 10:27:04.411 EDT
io.questdb.cairo.CairoException: [-1] global RSS memory limit exceeded [usage=40548379804, RSS_MEM_LIMIT=40587440940, size=67108864, memoryTag=29] at io.questdb.cairo.CairoException.instance(CairoException.java:370) at io.questdb.cairo.CairoException.nonCritical(CairoException.java:128) at io.questdb.std.Unsafe.checkAllocLimit(Unsafe.java:352) at io.questdb.std.Unsafe.realloc(Unsafe.java:280) at io.questdb.cairo.map.OrderedMap.resize(OrderedMap.java:591) at io.questdb.cairo.map.OrderedMap$Key.checkCapacity(OrderedMap.java:868) at io.questdb.cairo.map.OrderedMap$FixedSizeKey.init(OrderedMap.java:649) at io.questdb.cairo.map.OrderedMap$FixedSizeKey.init(OrderedMap.java:621) at io.questdb.cairo.map.OrderedMap.withKey(OrderedMap.java:388) at io.questdb.griffin.engine.table.AsyncGroupByRecordCursorFactory.aggregateSharded(AsyncGroupByRecordCursorFactory.java:359) at io.questdb.griffin.engine.table.AsyncGroupByRecordCursorFactory.aggregate(AsyncGroupByRecordCursorFactory.java:242) at io.questdb.cairo.sql.async.PageFrameReduceJob.reduce(PageFrameReduceJob.java:241) at io.questdb.cairo.sql.async.PageFrameReduceJob.consumeQueue(PageFrameReduceJob.java:189) at io.questdb.cairo.sql.async.PageFrameReduceJob.consumeQueue(PageFrameReduceJob.java:106) at io.questdb.cairo.sql.async.PageFrameSequence.stealWork(PageFrameSequence.java:574) at io.questdb.cairo.sql.async.PageFrameSequence.dispatch(PageFrameSequence.java:526) at io.questdb.cairo.sql.async.PageFrameSequence.next(PageFrameSequence.java:345) at io.questdb.griffin.engine.table.AsyncGroupByRecordCursor.buildMap(AsyncGroupByRecordCursor.java:187) at io.questdb.griffin.engine.table.AsyncGroupByRecordCursor.buildMapConditionally(AsyncGroupByRecordCursor.java:246) at io.questdb.griffin.engine.table.AsyncGroupByRecordCursor.hasNext(AsyncGroupByRecordCursor.java:138) at io.questdb.griffin.engine.orderby.LongSortedLightRecordCursor.sortValues(LongSortedLightRecordCursor.java:179) at io.questdb.griffin.engine.orderby.LongSortedLightRecordCursor.hasNext(LongSortedLightRecordCursor.java:96) at io.questdb.griffin.engine.QueryProgress$RegisteredRecordCursor.hasNext(QueryProgress.java:419) at io.questdb.cairo.mv.MatViewRefreshJob.insertAsSelect(MatViewRefreshJob.java:472) at io.questdb.cairo.mv.MatViewRefreshJob.refreshIncremental0(MatViewRefreshJob.java:921) at io.questdb.cairo.mv.MatViewRefreshJob.refreshIncremental(MatViewRefreshJob.java:862) at io.questdb.cairo.mv.MatViewRefreshJob.incrementalRefresh(MatViewRefreshJob.java:403) at io.questdb.cairo.mv.MatViewRefreshJob.processNotifications(MatViewRefreshJob.java:645) at io.questdb.cairo.mv.MatViewRefreshJob.run(MatViewRefreshJob.java:120) at io.questdb.mp.Worker.run(Worker.java:152)
2025-05-29 10:27:04.411 EDT
, id=181, taskType=1, frameIndex=2092, frameCount=2166]
2025-05-29 10:27:04.413 EDT
2025-05-29T14:27:04.413491Z E i.q.c.s.a.PageFrameReduceJob reduce error [error=
2025-05-29 10:27:04.413 EDT
io.questdb.cairo.CairoException: [-1] global RSS memory limit exceeded [usage=40548381724, RSS_MEM_LIMIT=40587440940, size=67108864, memoryTag=29] at io.questdb.cairo.CairoException.instance(CairoException.java:370) at io.questdb.cairo.CairoException.nonCritical(CairoException.java:128) at io.questdb.std.Unsafe.checkAllocLimit(Unsafe.java:352) at io.questdb.std.Unsafe.realloc(Unsafe.java:280) at io.questdb.cairo.map.OrderedMap.resize(OrderedMap.java:591) at io.questdb.cairo.map.OrderedMap$Key.checkCapacity(OrderedMap.java:868) at io.questdb.cairo.map.OrderedMap$FixedSizeKey.init(OrderedMap.java:649) at io.questdb.cairo.map.OrderedMap$FixedSizeKey.init(OrderedMap.java:621) at io.questdb.cairo.map.OrderedMap.withKey(OrderedMap.java:388) at io.questdb.griffin.engine.table.AsyncGroupByRecordCursorFactory.aggregateSharded(AsyncGroupByRecordCursorFactory.java:359) at io.questdb.griffin.engine.table.AsyncGroupByRecordCursorFactory.aggregate(AsyncGroupByRecordCursorFactory.java:242) at io.questdb.cairo.sql.async.PageFrameReduceJob.reduce(PageFrameReduceJob.java:241) at io.questdb.cairo.sql.async.PageFrameReduceJob.consumeQueue(PageFrameReduceJob.java:189) at io.questdb.cairo.sql.async.PageFrameReduceJob.run(PageFrameReduceJob.java:150) at io.questdb.mp.Worker.run(Worker.java:152)
2025-05-29 10:27:04.413 EDT
, id=181, taskType=1, frameIndex=2091, frameCount=2166]
2025-05-29 10:27:06.529 EDT
2025-05-29T14:27:06.529508Z E i.q.g.e.QueryProgress err [id=5702, sql=`
2025-05-29 10:27:06.529 EDT
SELECT
2025-05-29 10:27:06.529 EDT
timestamp,
2025-05-29 10:27:06.529 EDT
asset,
2025-05-29 10:27:06.529 EDT
user,
2025-05-29 10:27:06.529 EDT
sum(amount_approx) as amount_approx,
2025-05-29 10:27:06.529 EDT
sum(amount_part1) AS amount_part1,
2025-05-29 10:27:06.529 EDT
sum(amount_part2) AS amount_part2,
2025-05-29 10:27:06.529 EDT
sum(amount_part3) AS amount_part3,
2025-05-29 10:27:06.529 EDT
sum(amount_part4) AS amount_part4,
2025-05-29 10:27:06.529 EDT
sum(amount_part5) AS amount_part5,
2025-05-29 10:27:06.529 EDT
sum(amount_part6) AS amount_part6,
2025-05-29 10:27:06.529 EDT
sum(amount_part7) AS amount_part7,
2025-05-29 10:27:06.529 EDT
sum(amount_part8) AS amount_part8
2025-05-29 10:27:06.529 EDT
FROM holdings_daily
2025-05-29 10:27:06.529 EDT
SAMPLE BY 1M
2025-05-29 10:27:06.529 EDT
`, principal=admin, cache=false, refreshMinTs=2024-07-01T00:00:00.000000Z, refreshMaxTs=2025-05-31T23:59:59.999999Z, jit=true, time=28941259832, msg=global RSS memory limit exceeded [usage=40548381724, RSS_MEM_LIMIT=40587440940, size=67108864, memoryTag=29], errno=-1, pos=0]
2025-05-29 10:27:06.529 EDT
2025-05-29T14:27:06.529765Z E i.q.c.m.MatViewRefreshJob could not refresh materialized view [view=holdings_monthly~509, ex=
2025-05-29 10:27:06.529 EDT
io.questdb.cairo.CairoException: [-1] global RSS memory limit exceeded [usage=40548381724, RSS_MEM_LIMIT=40587440940, size=67108864, memoryTag=29] at io.questdb.cairo.CairoException.instance(CairoException.java:370) at io.questdb.cairo.CairoException.nonCritical(CairoException.java:128) at io.questdb.griffin.engine.table.AsyncGroupByRecordCursor.buildMap(AsyncGroupByRecordCursor.java:198) at io.questdb.griffin.engine.table.AsyncGroupByRecordCursor.buildMapConditionally(AsyncGroupByRecordCursor.java:246) at io.questdb.griffin.engine.table.AsyncGroupByRecordCursor.hasNext(AsyncGroupByRecordCursor.java:138) at io.questdb.griffin.engine.orderby.LongSortedLightRecordCursor.sortValues(LongSortedLightRecordCursor.java:179) at io.questdb.griffin.engine.orderby.LongSortedLightRecordCursor.hasNext(LongSortedLightRecordCursor.java:96) at io.questdb.griffin.engine.QueryProgress$RegisteredRecordCursor.hasNext(QueryProgress.java:419) at io.questdb.cairo.mv.MatViewRefreshJob.insertAsSelect(MatViewRefreshJob.java:472) at io.questdb.cairo.mv.MatViewRefreshJob.refreshIncremental0(MatViewRefreshJob.java:921) at io.questdb.cairo.mv.MatViewRefreshJob.refreshIncremental(MatViewRefreshJob.java:862) at io.questdb.cairo.mv.MatViewRefreshJob.incrementalRefresh(MatViewRefreshJob.java:403) at io.questdb.cairo.mv.MatViewRefreshJob.processNotifications(MatViewRefreshJob.java:645) at io.questdb.cairo.mv.MatViewRefreshJob.run(MatViewRefreshJob.java:120) at io.questdb.mp.Worker.run(Worker.java:152)
2025-05-29 10:27:06.530 EDT
]

puzpuzpuz · May 30, 2025, 6:09am

The window didn’t change when changing to 10k (made sure the settings have been reloaded and applied)

This is very puzzling. I’ll try reproducing your partition layout locally to see what’s wrong.

In the meantime, could you try one more thing? Namely, could you disable parallel SQL execution in materialized view refresh? cairo.mat.view.parallel.sql.enabled=false - with this setting the query will run on a single thread, so that we properly handle OOM errors if they occur and retry with a smaller interval. Note: technically this should also work by default, in case of parallel SQL, but there is a bug around propagating OOM error between worker and main threads - we’ll fix it in the upcoming release.

JForsaken · June 2, 2025, 7:38pm

This worked! I was able to successfully run the MV sequence of deltas => hourly => daily => monthly successfully, having 500k as the estimate and cairo.mat.view.parallel.sql.enabled=false if this helps debug! Will monitor this thread to see when we can retry with parallelism, thanks again!

select * from table_storage() where tableName ilike '%mv%';

tableName	walEnabled	partitionBy	partitionCount	rowCount	diskSize
mv_month	true	MONTH	12	71837449	10512077049
mv_day	true	DAY	297	203203070	29512835061
mv_year	true	YEAR	2	33158206	7132666053
mv_hourly	true	DAY	298	237809285	34481490219

nwoolmer · June 2, 2025, 7:41pm

Glad it is working!

The fix for the parallel OOM execution was merged: fix(sql): unreliable out-of-memory error propagation in parallel SQL by puzpuzpuz · Pull Request #5711 · questdb/questdb · GitHub

When the next version is released, you could revert that config and give it another go, just to confirm that the problem is fixed for next time.

JForsaken · June 3, 2025, 6:42am

nice thats was fast lol Will be looking forward to bump to the next version when its out to try it out with parallelism!

nwoolmer · June 3, 2025, 6:42am

It was, thanks to @puzpuzpuz !

Topic		Replies	Views
Material View & Disk access Community	1	89	March 31, 2025
bitemporal data and questdb performance Community	1	24	February 17, 2026
Avoid query timeouts after ILP ingestion Community question	6	202	November 20, 2024
SELECT from multiple timeseries tables with the same query Community sql , question	17	578	December 31, 2024
Enabling jit on sql queries Community sql	21	237	March 4, 2025

Using a materialized view as another materialized view as base timeout / OOM

Related topics