Is there a way to LAG() with VARCHAR data?

:waving_hand: hi there,

I’m trying out QuestDB for the first time, so this testing is in dev (i.e. not sure if the enterprise edition has differences).

One of the things I’m trying to do is find rows in a dataset whose value (in a column) has changed in the last two records.

There’s a sample query in the demo app that looks like this:

WITH trade_and_previous AS (
SELECT timestamp, symbol, side, price,
       LAG(price) OVER(PARTITION BY symbol, side ORDER BY timestamp) as prev_price
FROM trades
WHERE timestamp IN today()
)
select * from trade_and_previous where price <> prev_price;

I basically want to do something very similar, except that instead of price (which I assume is a double), I want to use a VARCHAR column (that has some JSON data).

So, the query would look something like this:

WITH current_and_previous AS (
SELECT timestamp, id, data,
       lag(data) OVER (partition by id order by timestamp) as prev_data
FROM events
WHERE timestamp IN today()
)
select * from current_and_previous where data <> prev_data;

or even better, I’d really like to compare the hash of those two data fields:

WITH current_and_previous AS (
SELECT timestamp, id, data, md5(data) as hashed_data,
       lag(md5(data)) OVER (partition by id order by timestamp) as prev_data
FROM events
WHERE timestamp IN today()
)
select * from current_and_previous where hashed_data <> prev_data;

But anytime I do either of the above, I get an error:

// or the hashed version
inconvertible value: `{'city': 'new york'}` [VARCHAR -> DOUBLE]

// or the hashed version
inconvertible value: `be6cebb64adff1a48a89ac0f8b6c81ed` [VARCHAR -> DOUBLE]

Is LAG() only meant to work with DOUBLEs? Or am I doing something wrong?

Hi,

You are doing nothing wrong here. At the moment the LAG and LEAD functions support only a subset of the datatypes, and varchar is not one of them.

For this particular query, you could fall back to an LT JOIN. It would be something like this

WITH today_events AS (
SELECT timestamp, id, data, md5(data) as hashed_data
FROM events  
WHERE timestamp IN today()
)
SELECT e.timestamp, e.id, e.data, e.hashed_data, prev.data, prev.hashed_data
FROM today_events e LT JOIN today_events prev 
ON (id)
WHERE e.hashed_data <> prev.hashed_data
1 Like

Thanks so much Javier!

I didn’t realize LAG only supported a subset of datatypes (though I guessed from the error). But part of me thought I must’ve missed something. So, I really appreciate you clarifying that.

And I think I was so focused on LAG that I didn’t discover LT JOIN. Pretty cool!

And I see now looking at the docs there’s a note on this:

As seen on this example, LT join is often useful to join a table to itself in order to get preceding values for every row.

Thanks again! That’ll work well enough for me.

1 Like