INSERT or UPDATE with DEDUP UPSERT

Hi to everyone, I have a problem to solve.

Is there a way to ignore omitted columns in case of update during an insert with dedup upsert?

For example:
CREATE TABLE ‘MyTab’ (
index_time TIMESTAMP,
ColA DOUBLE,
ColB DOUBLE,
ColC DOUBLE,
ColD DOUBLE
) timestamp(index_time) PARTITION BY DAY WAL
DEDUP UPSERT KEYS(index_time);

INSERT INTO MyTab (index_time, ColA, ColB, ColC, ColD)
VALUES
(‘2025-04-09 15:00:00.000’, 100, 101, 102, 103),
(‘2025-04-09 15:00:01.000’, 110, 111, 112, 113),
(‘2025-04-09 15:00:02.000’, 120, 121, 122, 123),

I have this result

2025-04-09T15:00:00.000000Z 100 101 102 103
2025-04-09T15:00:01.000000Z 110 111 112 113
2025-04-09T15:00:02.000000Z 120 121 122 123

I run this query
INSERT INTO MyTab (index_time, ColA, ColB, ColC)
VALUES (‘2025-04-09 15:00:00.000’, 200, 201, 202);

2025-04-09T15:00:00.000000Z 200 201 202 null
2025-04-09T15:00:01.000000Z 110 111 112 113
2025-04-09T15:00:02.000000Z 120 121 122 123

I would like to achieve this result
2025-04-09T15:00:00.000000Z 200 201 202 103
2025-04-09T15:00:01.000000Z 110 111 112 113
2025-04-09T15:00:02.000000Z 120 121 122 123

Thanks.

Currently, DEDUP only supports REPLACE (last-writer-wins). We plan to add SKIP (first-writer-wins) and MERGE-REPLACE (new columns override old ones) and MERGE-SKIP (new columns override only if null).

In the meantime, you would need to use another method to achieve this, like an UPDATE, a RMW, or an event logging table.

1 Like

Thanks, I thought so, but I tried anyway :slightly_smiling_face:

Always worth trying!

In my usage, I split tables into two sets:

a) tables where I want one row constantly kept up to date
b) tables where I want a log of recent states

For a), I used a dummy timestamp which was always 0. Then I would keep the contents of the table cached. I would then bulk upsert to it every 5 mins or so by modifying my cached version.

For b), I just dumped new states into the table and queried for the latest one. This slows down queries in the case where the target row hasn’t had a new state in a while. This can be worked around now by using materialized views, which can be used to automatically maintain a ‘latest’ per day snapshot of the data.

For now I solve it this way:

instead of the normal INSERT
INSERT INTO MyTab (index_time, ColA, ColB, ColC)
VALUES (‘2025-04-09 15:00:00.000’, 200, 201, 202);

I use a sub-query
WITH MyTmp AS (
SELECT
‘2025-04-09 15:00:00.000’ as index_time,
200 as ColA,
201 as ColB,
202 as ColC,
ColD as ColD
FROM MyTab WHERE index_time = ‘2025-04-09 15:00:00.000’
UNION
SELECT
‘2025-04-09 15:00:00.000’ as index_time,
200 as ColA,
201 as ColB,
202 as ColC,
null as ColD
)
INSERT INTO MyTab (index_time, ColA, ColB, ColC, ColD)
select index_time, ColA, ColB, ColC, ColD from MyTmp limit 1

1 Like

This is better, because the ‘null’ value in INSERT is converted to a meaningless number

UPDATE MyTab
SET “ColA”=200, “ColB”=201,“ColC”=202 WHERE “index_time” = ‘2025-04-09 15:00:00.000’;
INSERT INTO MyTab (index_time, “ColA”,“ColB”, “ColC”)
SELECT * from (select ‘2025-04-09 15:00:00.000’, 100, 201, 202)
WHERE NOT ‘2025-04-09 15:00:00.000’ = (SELECT “index_time” FROM MyTab WHERE “index_time” = ‘2025-04-09 15:00:00.000’);