Can anyone share some window function magic on uniquely identifying groups of sequential values, where I might have some repeated values later in the set?
For instance, given the events below with grouping values X
and Y
with a natural order order
, how do I group them such that every time X
or Y
changes, naturally ordered, I get a new group identifier?
Example input:
X | Y | data | order
--------------------
A | B | 1 | 1
A | B | 2 | 2
A | C | 1 | 3
B | C | 3 | 4
A | B | 2 | 5
A | C | 1 | 6
B | C | 3 | 7
Desired output:
X | Y | data | order | group
-----------------------------
A | B | 1 | 1 | 1
A | B | 2 | 2 | 1
A | C | 1 | 3 | 2
B | C | 3 | 4 | 3
A | B | 2 | 2 | 4
A | C | 1 | 6 | 5
B | C | 3 | 7 | 6
I can tell when X
or Y
changes with lag
, and I can create group identifiers without repeats with rank() over (order by X, Y)
, but I can’t quite figure out how to do both at once.
Thanks!