Choosing a model for input: categorised, weighted sequence, output: binary variable

1

1

What would be an appropriate model for predicting a binary target variable, given a weighted sequence?

Sequences will be reasonably short, typically between 1 and 5 elements.

Illustrated example

Say I have the following categories: A, B, C, D.

Each category can have a weight between zero and one.

Example sequences:

  • A (0.33), B (0.71), C (0.0), D (0.95) => 1
  • C (0.21), A (0.67) => 0
  • B (0), D (1) => 1

Andy

Posted 2020-12-12T13:39:30.037

Reputation: 23

Answers

1

One option is a model comparison approach: start with simple models, then progressively try more complex ones, checking along the way whether the additional complexity actually improves predictive ability.

  1. If you take a simplifying representation with a bag-of-words model, you can fit a Naive Bayes classifier.

  2. If you make the Markov assumption, you only need to look at the last element in the sequence. The problem can then be modeled as a simple probabilistic graphical model (PGM).

  3. Then start relaxing the Markov assumption by looking back at progressively more time steps.

  4. A recurrent neural network (RNN) might work if you have thousands of labeled data points and the relationship is very complex.

  5. A long short-term memory (LSTM) network does not make much sense because the sequences are so short. LSTMs work better than plain RNNs when the relevant information could possibly be many time steps back.

My hypothesis is that a Naive Bayes classifier will be a good-enough model, because there are only 4 features and the sequences are so short that their order can probably be ignored.
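As a rough sketch of that first option, the snippet below drops the ordering, turns each sequence into one weight per category, and fits a small multinomial Naive Bayes by hand. Treating the weights as fractional "counts" is my assumption, the function names are hypothetical, and the three training rows are just the examples from the question, so this only shows the mechanics. Note that this encoding cannot tell a category with weight 0 apart from an absent one.

```python
import math

# Illustrative sketch: names are hypothetical; weights are treated as fractional counts.
CATEGORIES = ["A", "B", "C", "D"]

def bag_of_categories(sequence):
    """Order-free encoding: one weight slot per category (0.0 if absent)."""
    weights = dict(sequence)
    return [weights.get(c, 0.0) for c in CATEGORIES]

def fit_multinomial_nb(X, y, alpha=1.0):
    """Return {label: (log_prior, per-feature log-probabilities)} with Laplace smoothing."""
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        totals = [sum(col) for col in zip(*rows)]
        denom = sum(totals) + alpha * len(totals)
        model[label] = (
            math.log(len(rows) / len(y)),
            [math.log((t + alpha) / denom) for t in totals],
        )
    return model

def predict(model, x):
    """Pick the label with the highest log-posterior score."""
    def score(label):
        log_prior, log_theta = model[label]
        return log_prior + sum(xi * lt for xi, lt in zip(x, log_theta))
    return max(model, key=score)

# The three example sequences from the question.
train = [
    ([("A", 0.33), ("B", 0.71), ("C", 0.0), ("D", 0.95)], 1),
    ([("C", 0.21), ("A", 0.67)], 0),
    ([("B", 0.0), ("D", 1.0)], 1),
]
X = [bag_of_categories(seq) for seq, _ in train]
y = [label for _, label in train]
model = fit_multinomial_nb(X, y)
print(predict(model, bag_of_categories([("D", 1.0)])))
```

With real data you would hold out a validation set and compare this baseline against the more complex models above.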

Brian Spiering

Posted 2020-12-12T13:39:30.037

Reputation: 10 864

0

I think there are plenty of valid approaches.

I would first test a simple dense neural network architecture.
You could have 2 * number_of_categories input neurons, one pair per category: one neuron for the weight and another for whether the category is present in the sequence (0: not present, 1: present).
The number and size of the hidden layers depend on the complexity of the relationship.
The output is a single neuron with a sigmoid activation, giving a value between 0 and 1.
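
A minimal sketch of that input layout, assuming the four categories from the question and one (weight, presence) pair per category. The function names, the hidden layer size of 8, and the ReLU activation are my own assumptions, and the network weights are random and untrained, so the snippet only illustrates the shapes and the forward pass.

```python
import math
import random

# Illustrative sketch: names, layer sizes and activations are assumptions.
CATEGORIES = ["A", "B", "C", "D"]

def encode(sequence):
    """Return [weight_A, present_A, weight_B, present_B, ...]."""
    weights = dict(sequence)
    features = []
    for category in CATEGORIES:
        present = category in weights
        features += [weights.get(category, 0.0), 1.0 if present else 0.0]
    return features

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, hidden_layer, output_layer):
    """One ReLU hidden layer followed by a single sigmoid output unit."""
    hidden = [
        max(0.0, sum(w * xi for w, xi in zip(ws, x)) + b)
        for ws, b in hidden_layer
    ]
    out_ws, out_b = output_layer
    return sigmoid(sum(w * h for w, h in zip(out_ws, hidden)) + out_b)

rng = random.Random(0)
n_in, n_hidden = 2 * len(CATEGORIES), 8
hidden_layer = [([rng.uniform(-1, 1) for _ in range(n_in)], 0.0) for _ in range(n_hidden)]
output_layer = ([rng.uniform(-1, 1) for _ in range(n_hidden)], 0.0)

x = encode([("B", 0.0), ("D", 1.0)])
print(x)
print(forward(x, hidden_layer, output_layer))  # untrained, so the value is arbitrary
```

The presence flag is what lets the network distinguish "B is in the sequence with weight 0" from "B is not in the sequence at all". Like the bag-of-words representation, this encoding discards the order of the elements.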

etiennedm

Posted 2020-12-12T13:39:30.037

Reputation: 1 005