I reduced the original ImageNet dataset to 1024 images from a single category, but kept the networks configured to classify 1000 categories.
I then trained the CNNs, varying the processing unit (CPU/GPU) and the batch sizes, and I observed that the loss converges to near zero very quickly (in most cases before a single epoch completes), as in this graph (AlexNet on TensorFlow):
In Portuguese, 'Épocas' means epochs and 'Perda' means loss.
The weight decay and initial learning rate are the same as in the models I downloaded; I only changed the dataset and the batch sizes.
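To make the setup concrete, here is a minimal sketch of what I described (not my real training script: the model below is a small placeholder for AlexNet, the learning rate is just an example value, and random tensors stand in for the 1024 real images):

```python
# Minimal sketch of the setup above (assumptions: a tiny placeholder model
# instead of AlexNet, random tensors instead of the 1024 real JPEGs, and
# 64x64 inputs to keep the example light).
import tensorflow as tf

NUM_IMAGES = 1024    # dataset reduced to 1024 images...
NUM_CLASSES = 1000   # ...but the network still classifies 1000 categories
BATCH_SIZE = 128     # one of the batch sizes I varied

# Every image comes from the same category, so all labels are identical.
images = tf.random.uniform((NUM_IMAGES, 64, 64, 3))
labels = tf.zeros((NUM_IMAGES,), dtype=tf.int32)

dataset = (tf.data.Dataset.from_tensor_slices((images, labels))
           .shuffle(NUM_IMAGES)
           .batch(BATCH_SIZE))

# Small stand-in for AlexNet with the same 1000-way softmax head.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(64, 64, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),  # example LR, not the real one
    loss="sparse_categorical_crossentropy",
)

# The training loss drops toward zero within the first epoch, as in my graph.
model.fit(dataset, epochs=2)
```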
Why are my networks converging this way, and not like this?