I'm trying to train a seq2seq model that, for every timestep in a given time-series sample, outputs one of 6 possible labels.
Furthermore, the training data is constructed so that:
- Each sample contains at most 2 types of labels.
- Label 1 can co-occur with any other label; no other labels can co-occur.
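
To make the two rules concrete, here is a minimal sketch of a check for whether a sequence of per-timestep labels satisfies them. It assumes the labels are encoded as the integers 1 to 6 and that label 1 is the one allowed to co-occur; the function name `satisfies_label_rules` is only illustrative.

```python
from typing import Sequence

# Assumption: labels are integers 1..6, and label 1 is the only one
# that may appear alongside another label within a sample.
COMMON_LABEL = 1


def satisfies_label_rules(labels: Sequence[int]) -> bool:
    """Return True if a per-timestep label sequence obeys both rules."""
    distinct = set(labels)
    # Rule 1: at most 2 distinct label types per sample.
    if len(distinct) > 2:
        return False
    # Rule 2: besides label 1, at most one other label type may appear.
    others = distinct - {COMMON_LABEL}
    return len(others) <= 1


print(satisfies_label_rules([1, 1, 3, 3, 1]))  # label 1 plus one other label
print(satisfies_label_rules([2, 2, 3, 3]))     # two non-1 labels together
```

A check like this could be run over the model's predictions to see how often the rules are actually violated.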
Is there any way to explicitly guide training with rules like these, or is my best bet to hope that the model picks up on them? In roughly 85-90% of cases the model behaves as hoped, but in a few noisy edge cases it oscillates wildly between different labels throughout the sample.
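
One rough way to put a number on the oscillation is to count how often the predicted label changes between consecutive timesteps. This is only a sketch, assuming the predictions for one sample are available as a plain list of integer labels; `count_label_switches` is a hypothetical helper name.

```python
from typing import Sequence


def count_label_switches(labels: Sequence[int]) -> int:
    """Count positions where the label differs from the previous timestep."""
    return sum(1 for prev, cur in zip(labels, labels[1:]) if prev != cur)


# A stable prediction switches rarely; an oscillating one switches often.
stable = [1, 1, 1, 3, 3, 3, 1, 1]
noisy = [1, 3, 1, 4, 1, 3, 2, 1]
print(count_label_switches(stable), count_label_switches(noisy))
```

Samples with a high switch count would correspond to the noisy edge cases described above.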