INSERT or UPDATE with DEDUP UPSERT

BepTheWolf · April 9, 2025, 10:43pm

Hi to everyone, I have a problem to solve.

Is there a way to ignore omitted columns in case of update during an insert with dedup upsert?

For example:
CREATE TABLE ‘MyTab’ (
index_time TIMESTAMP,
ColA DOUBLE,
ColB DOUBLE,
ColC DOUBLE,
ColD DOUBLE
) timestamp(index_time) PARTITION BY DAY WAL
DEDUP UPSERT KEYS(index_time);

INSERT INTO MyTab (index_time, ColA, ColB, ColC, ColD)
VALUES
(‘2025-04-09 15:00:00.000’, 100, 101, 102, 103),
(‘2025-04-09 15:00:01.000’, 110, 111, 112, 113),
(‘2025-04-09 15:00:02.000’, 120, 121, 122, 123),

I have this result

2025-04-09T15:00:00.000000Z 100 101 102 103
2025-04-09T15:00:01.000000Z 110 111 112 113
2025-04-09T15:00:02.000000Z 120 121 122 123

I run this query
INSERT INTO MyTab (index_time, ColA, ColB, ColC)
VALUES (‘2025-04-09 15:00:00.000’, 200, 201, 202);

2025-04-09T15:00:00.000000Z 200 201 202 null
2025-04-09T15:00:01.000000Z 110 111 112 113
2025-04-09T15:00:02.000000Z 120 121 122 123

I would like to achieve this result
2025-04-09T15:00:00.000000Z 200 201 202 103
2025-04-09T15:00:01.000000Z 110 111 112 113
2025-04-09T15:00:02.000000Z 120 121 122 123

Thanks.

nwoolmer · April 9, 2025, 10:47pm

Currently, DEDUP only supports REPLACE (last-writer-wins). We plan to add SKIP (first-writer-wins) and MERGE-REPLACE (new columns override old ones) and MERGE-SKIP (new columns override only if null).

In the meantime, you would need to use another method to achieve this, like an UPDATE, a RMW, or an event logging table.

BepTheWolf · April 10, 2025, 9:05am

Thanks, I thought so, but I tried anyway

nwoolmer · April 10, 2025, 9:07am

Always worth trying!

In my usage, I split tables into two sets:

a) tables where I want one row constantly kept up to date
b) tables where I want a log of recent states

For a), I used a dummy timestamp which was always 0. Then I would keep the contents of the table cached. I would then bulk upsert to it every 5 mins or so by modifying my cached version.

For b), I just dumped new states into the table and queried for the latest one. This slows down queries in the case where the target row hasn’t had a new state in a while. This can be worked around now by using materialized views, which can be used to automatically maintain a ‘latest’ per day snapshot of the data.

BepTheWolf · April 10, 2025, 1:22pm

For now I solve it this way:

instead of the normal INSERT
INSERT INTO MyTab (index_time, ColA, ColB, ColC)
VALUES (‘2025-04-09 15:00:00.000’, 200, 201, 202);

I use a sub-query
WITH MyTmp AS (
SELECT
‘2025-04-09 15:00:00.000’ as index_time,
200 as ColA,
201 as ColB,
202 as ColC,
ColD as ColD
FROM MyTab WHERE index_time = ‘2025-04-09 15:00:00.000’
UNION
SELECT
‘2025-04-09 15:00:00.000’ as index_time,
200 as ColA,
201 as ColB,
202 as ColC,
null as ColD
)
INSERT INTO MyTab (index_time, ColA, ColB, ColC, ColD)
select index_time, ColA, ColB, ColC, ColD from MyTmp limit 1

BepTheWolf · April 11, 2025, 4:00pm

This is better, because the ‘null’ value in INSERT is converted to a meaningless number

UPDATE MyTab
SET “ColA”=200, “ColB”=201,“ColC”=202 WHERE “index_time” = ‘2025-04-09 15:00:00.000’;
INSERT INTO MyTab (index_time, “ColA”,“ColB”, “ColC”)
SELECT * from (select ‘2025-04-09 15:00:00.000’, 100, 201, 202)
WHERE NOT ‘2025-04-09 15:00:00.000’ = (SELECT “index_time” FROM MyTab WHERE “index_time” = ‘2025-04-09 15:00:00.000’);

Topic		Replies	Views
Null, -2147483648 or what am I doing wrong? Community	5	49	April 14, 2025
Using DATEADD() in SELECT column list causes later columns to be ignored Community	5	27	November 25, 2024
Can QuestDB automatically track/timestamp when an existing row is modified Community question	2	70	September 13, 2024
Programatically getting information about table DEDUP Community question	1	25	August 2, 2024
New rows not ingested into existing table Community	13	373	July 12, 2024

INSERT or UPDATE with DEDUP UPSERT

Related topics