During prediction (not training), is it normal to get a different loss for different batch sizes? The worst case happens when I use batch_size=1 for the test dataset: the prediction performance gets pretty bad. Performance improves as I increase the batch size. During training, both the training and validation batch sizes were 32.
The model consists of a Convolutional Neural Network, and the input is word embeddings.
My thought: In my opinion, the only thing that changes with the batch size is the padding. With batch_size=1 there is no padding at all, since each sequence is processed at its own length. Because I pad with zeros, the weights (w) of the model have no effect on the padded positions (anything multiplied by zero is zero), but the bias (b) is still added there. Do you think this could change the predictions, and therefore the computed loss, to this extent?
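To check whether this hypothesis is even plausible, here is a minimal numpy sketch (not your actual model) of a 1D convolution over word embeddings followed by global max-pooling, a common text-CNN setup. The values are contrived (all-negative embeddings, all-ones filters) purely to make the effect deterministic: windows that fall on the zero padding produce an output of just the bias, and if that beats every real window, the pooled feature changes.

```python
import numpy as np

seq_len, pad_len, emb_dim, n_filters, k = 5, 3, 4, 2, 3

# Contrived inputs so the padding effect is guaranteed to show up:
x = -np.ones((seq_len, emb_dim))          # "embeddings", all negative
w = np.ones((n_filters, k, emb_dim))      # conv filters, all positive
b = np.zeros(n_filters)                   # conv bias

x_padded = np.vstack([x, np.zeros((pad_len, emb_dim))])  # zero padding

def conv1d(seq, w, b):
    """Valid 1D convolution over the time axis."""
    L = seq.shape[0] - k + 1
    out = np.empty((L, w.shape[0]))
    for i in range(L):
        window = seq[i:i + k]             # (k, emb_dim)
        out[i] = np.tensordot(w, window, axes=([1, 2], [0, 1])) + b
    return out

# Global max-pool over time, as in typical text CNNs
pooled = conv1d(x, w, b).max(axis=0)             # real windows only
pooled_padded = conv1d(x_padded, w, b).max(axis=0)  # includes padded windows

print(pooled)         # every real window sums to -k*emb_dim = -12
print(pooled_padded)  # an all-zero window yields just the bias, 0
```

Here every real window gives -12 per filter, but the window lying entirely on the padding gives 0 (the bias), so the max-pooled features differ: [-12, -12] vs [0, 0]. With trained weights the effect will usually be smaller, but the mechanism is real whenever the pooling can select a padded position.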