I have a collection of event sequences. Some sequences will end up with an event of interest $y$, while others won't.
$$S_1 = [a, b, c, d, ..., y]$$
$$S_2 = [a, b, b, e, ..., a]$$
$$S_n = [a, a, f, m, ..., y]$$
Sequences vary in length and are independent of one another, and events are evenly spaced within a sequence. Within a sequence, the history of previous events matters, not just the immediately preceding event (i.e., a first-order Markov assumption is not enough), and the order of events matters too.
My use case is e-commerce: the events are website navigation/browsing steps, and the event of interest $y$ is a transaction made at the end of the customer journey. I guess this could generalise to many fields: words in a sentence, political events, component failure, personal development, etc.
I think I'm looking for an association-rule-mining type of approach, but one where the order in which items are added to the basket matters, if that makes sense.
Is there a method for finding which runs of consecutive events make a sequence more likely to end up with $y$ somewhere down the line? The event of interest, $y$, doesn't have to come immediately after the run. For example, if $[a, b]$ is important, then I don't mind whether the sequence is $[..., a, b, y]$ or $[a, b, ..., y]$, etc.
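To make this concrete, here is a naive brute-force sketch of the quantity I have in mind, not a proposed solution. For every run of `n` consecutive events it computes the share of sequences containing that run in which $y$ occurs at any later position. The function name `ngram_conversion_rates` and the toy data are made up for illustration.

```python
from collections import Counter

def ngram_conversion_rates(sequences, n=2, target="y"):
    """Map each contiguous n-gram to the fraction of sequences containing it
    in which `target` appears somewhere after the n-gram."""
    seen = Counter()       # number of sequences containing the n-gram
    converted = Counter()  # ... of which `target` occurs later on
    for seq in sequences:
        # Record where the first occurrence of each n-gram ends. If any
        # occurrence is followed by `target`, the first one is as well.
        first_end = {}
        for i in range(len(seq) - n + 1):
            gram = tuple(seq[i:i + n])
            if target in gram:
                continue  # ignore n-grams that contain the target itself
            first_end.setdefault(gram, i + n)
        for gram, end in first_end.items():
            seen[gram] += 1
            if target in seq[end:]:
                converted[gram] += 1
    return {gram: converted[gram] / seen[gram] for gram in seen}

# Toy data (hypothetical customer journeys)
sequences = [
    ["a", "b", "c", "y"],
    ["a", "b", "b", "e", "a"],
    ["a", "a", "f", "y"],
    ["c", "a", "b", "y"],
]
rates = ngram_conversion_rates(sequences, n=2)
for gram, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
    print(gram, round(rate, 2))
```

Each rate would still need to be compared against the overall share of sequences that contain $y$, and this approach clearly won't scale to long runs or large event vocabularies, which is why I'm asking for a proper method.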
Even better, is there a method where the events in the important sequence don't have to be adjacent to each other? For example, the $[a, b, y]$ sequence might be treated as equivalent to $[a, ..., b, y]$ or $[a, ..., b, ..., y]$, etc.
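As an illustration of the gapped case, the sketch below checks whether a pattern occurs in order with arbitrary gaps, and then computes how often $y$ follows it. Again, this is only a minimal example of what I mean, with hypothetical names (`contains_in_order`, `gapped_conversion_rate`) and toy data, and it assumes I already know which pattern to test.

```python
def contains_in_order(seq, pattern):
    """True if the events of `pattern` occur in `seq` in the same order,
    with any number of other events in between."""
    it = iter(seq)
    # `event in it` advances the iterator until it finds `event`,
    # so each pattern element must appear after the previous one.
    return all(event in it for event in pattern)

def gapped_conversion_rate(sequences, pattern, target="y"):
    """Among sequences matching `pattern` (gaps allowed), the fraction in
    which `target` occurs somewhere after the pattern."""
    matched = [s for s in sequences if contains_in_order(s, pattern)]
    if not matched:
        return None
    with_target = tuple(pattern) + (target,)
    converted = sum(contains_in_order(s, with_target) for s in matched)
    return converted / len(matched)

# Toy data (hypothetical customer journeys)
journeys = [
    ["a", "x", "b", "y"],            # a ... b, then y
    ["a", "b", "x", "x"],            # a, b, but no y
    ["b", "a", "y"],                 # wrong order, so not a match
    ["a", "c", "c", "b", "d", "y"],  # a ... b ... y
]
rate = gapped_conversion_rate(journeys, ["a", "b"])
print(rate)
```

What I can't do with this is discover the interesting patterns in the first place without enumerating every candidate.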