Material View & Disk access

Hello, I have created a materialized view and the query performances is great. We were waiting for it and it will help us a lot.

On the other hand, it is writing a lot on the disk and I would like to see if there are any plans or config to improve it.

The request is as follows:

CREATE MATERIALIZED VIEW IF NOT EXISTS 'aggregated_15s' WITH BASE 'object_stat' AS (
  SELECT 
    timestamp, symbol, 
    count(*) as number, 
    sum(weight) as weight, 
    sum(surface) as surface
  FROM (
    SELECT s.timestamp,s.symbol, s.surface, w.weight, w.eId from 'object_stat' as s 
    ASOF JOIN 'weight' as w on symbol where eId in ('default', 'specifiId')
  )
  SAMPLE BY 15s
) PARTITION BY MONTH;

Data is injected in every 5 seconds in the object_stat table… therefore, the last sample of the materialized view is recomputed 2 times, and then a new one is added. This provoke the copy/write of the partition every 5 seconds which is affecting the disk usage.

Could the last sample be written on disk only when it is completed ? and kept in the memory before ? Or any suggestion the reduce the disk usage ?

Thank you

Hey Gabriel,

Thanks for the feedback. We are currently making some changes to improve efficiency of the update. It is already incremental, but still undergoes the dedupe copy-on-write process at the moment.

We have plans to introduce finer control over refreshes i.e. only on demand, on a timer etc.

Your suggestion of only updating when data completes a full bucket is interesting. It also makes sense if you are already unioning your view with the latest raw data, since that bucket would be overwritten.

Some use cases benefit from the last bucket (i.e. views for the latest values), but not all, and it would aid write performance.

I will ask the team if they have any other suggestions in the meantime.