## How to infer which sequences of events are more likely to result in an event of interest?

Background

I have a sequence/series of events. Some sequences will end up with an event of interest $$y$$, while others won't.

$$S_1 = [a, b, c, d, ..., y]$$

$$S_2 = [a, b, b, e, ..., a]$$

$$...$$

$$S_n = [a, a, f, m, ..., y]$$

Sequences can be of varying lengths; all sequences are independent of one another; events are evenly spaced within a sequence. Within a sequence, the full history matters (i.e., it's not just the immediately preceding event that is important), and the order of events is important, too.

My use-case is e-commerce (i.e., website navigation/browsing, where my event of interest is a transaction being made at the end of the customer journey, $$y$$). I guess this could generalise to many fields: words in a sentence, political events, component failure, personal development, etc.

I think I'm looking for an Association Rules Mining-type approach, but where the order in which items are added to the basket is important. If that makes sense.

Solution 1

Is there some method whereby I can find which sequences of consecutive events are more likely to end up with $$y$$ somewhere down the line? The event of interest, $$y$$, doesn't have to come immediately after the chain of events. For example, if $$[a, b]$$ is important, then I don't mind if the sequence is $$[..., a, b, y]$$ or $$[a, b, ..., y]$$, etc.
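To make the question concrete, here is a minimal sketch of the naive version of what I mean: for every run of `n` consecutive events, estimate how often $$y$$ shows up later in the same sequence. The function name `ngram_conversion_rates`, the toy sequences, and the choice of `n = 2` are my own illustrative assumptions, not an established method.

```python
from collections import Counter

def ngram_conversion_rates(sequences, target="y", n=2):
    """Estimate P(target occurs later | contiguous n-gram was seen), per n-gram."""
    seen = Counter()       # number of sequences containing the n-gram
    converted = Counter()  # ... of which the target occurs after the n-gram
    for seq in sequences:
        # Record where the first occurrence of each n-gram ends; if the target
        # follows any occurrence, it also follows the first one.
        first_end = {}
        for i in range(len(seq) - n + 1):
            gram = tuple(seq[i:i + n])
            if target in gram:
                continue  # skip n-grams that already contain the target
            first_end.setdefault(gram, i + n)
        for gram, end in first_end.items():
            seen[gram] += 1
            if target in seq[end:]:
                converted[gram] += 1
    return {gram: converted[gram] / seen[gram] for gram in seen}

# Toy data (hypothetical), with "y" as the event of interest.
sequences = [
    ["a", "b", "c", "y"],
    ["a", "b", "b", "e", "a"],
    ["a", "a", "f", "y"],
    ["c", "a", "b", "d", "y"],
]
rates = ngram_conversion_rates(sequences, target="y", n=2)
for gram, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
    print(gram, round(rate, 2))
```

This only counts raw frequencies, so rare n-grams get noisy rates; I assume a proper approach would also account for support (how often the n-gram occurs at all).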

Solution 2

To make this even better, is there a method where the events in the important sequence don't even have to be 'touching' each other? For example, the $$[a, b, y]$$ sequence might be equivalent to $$[a, ..., b, y]$$ or $$[a, ..., b, ..., y]$$, etc.
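Again as a rough sketch of the idea rather than a real solution: the check below treats a pattern as matched if its events appear in order, with any number of other events in between, and then asks whether $$y$$ occurs afterwards. The name `gapped_conversion_rate` and the toy data are hypothetical, and I am assuming the candidate pattern is already given rather than discovered.

```python
def gapped_conversion_rate(sequences, pattern, target="y"):
    """Share of sequences matching `pattern` in order (gaps allowed)
    in which `target` occurs after the earliest complete match."""
    seen = 0
    converted = 0
    for seq in sequences:
        it = iter(seq)
        # `event in it` consumes the iterator up to and including the first
        # match, so successive checks enforce the order of the pattern.
        if all(event in it for event in pattern):
            seen += 1
            # Whatever is left in `it` comes after the matched pattern.
            if target in it:
                converted += 1
    return converted / seen if seen else None

# Toy data (hypothetical), with "y" as the event of interest.
sequences = [
    ["a", "b", "c", "y"],
    ["a", "b", "b", "e", "a"],
    ["a", "a", "f", "y"],
    ["c", "a", "b", "d", "y"],
]
rate_ab = gapped_conversion_rate(sequences, ("a", "b"))
rate_af = gapped_conversion_rate(sequences, ("a", "f"))
rate_ae = gapped_conversion_rate(sequences, ("a", "e"))
rate_none = gapped_conversion_rate(sequences, ("f", "a"))
print(rate_ab, rate_af, rate_ae, rate_none)
```

Here $$[a, f]$$ matches $$S = [a, a, f, y]$$ even though another event sits between `a` and `f`. What I don't know is how to search over all such patterns efficiently instead of testing them one at a time.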