I want to build a CNN from scratch using only NumPy in Python, and I am going to test it on CIFAR-10. I have implemented it in plain NumPy with the im2col trick, but the computation is slow: about 15 seconds per minibatch (64 images each), or roughly 4-5 hours per epoch (1 epoch = 784 minibatches). I also tried plain TensorFlow operations (without the native conv2d), but I couldn't write the im2col and col2im tricks there. So now I want to use FFT.

To speed things up, I decided to use FFT convolution between a 4D image batch with shape [batch_size, width, height, channels] and 4D filters with shape [filter_width, filter_height, in_channel, out_channel], and then add a 1D bias with shape [out_channel, ]. As I understand it, before calling FFT_Conv2D on the 4D data I should pad the image and work out the output shape. Then I need to compute FFT_Conv2D(image, filter, pad, stride) => IFFT(FFT(image)*FFT(filter)). But this formula feels incomplete to me.
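The formula above can be sketched in NumPy roughly as follows. This is only a minimal sketch under stated assumptions, not a definitive implementation: it assumes the CNN "convolution" is really a cross-correlation (so the filter is flipped before the FFT), that the two middle axes of the image are the spatial ones, that input channels are summed in the frequency domain, and that stride is handled by simply subsampling the stride-1 result. The names `fft_conv2d` and `direct_conv2d` are hypothetical helpers introduced here for illustration.

```python
import numpy as np

def fft_conv2d(image, filters, bias, pad=0, stride=1):
    """Sketch: image [N, H, W, C_in], filters [FH, FW, C_in, C_out], bias [C_out]."""
    fh, fw, _, _ = filters.shape
    # Zero-pad only the two spatial axes.
    x = np.pad(image, ((0, 0), (pad, pad), (pad, pad), (0, 0)), mode="constant")
    hp, wp = x.shape[1], x.shape[2]
    # FFT multiplication gives a true convolution; flipping the filter
    # spatially turns it into the cross-correlation used in CNN layers.
    k = filters[::-1, ::-1, :, :]
    # Transform size of (input + filter - 1) avoids circular wrap-around.
    sh, sw = hp + fh - 1, wp + fw - 1
    fx = np.fft.rfft2(x, s=(sh, sw), axes=(1, 2))  # [N, sh, sw//2+1, C_in]
    fk = np.fft.rfft2(k, s=(sh, sw), axes=(0, 1))  # [sh, sw//2+1, C_in, C_out]
    # Pointwise product, summed over input channels (assumption: the sum
    # over C_in can be done before the inverse transform, by linearity).
    fy = np.einsum("nhwc,hwco->nhwo", fx, fk)
    full = np.fft.irfft2(fy, s=(sh, sw), axes=(1, 2))  # [N, sh, sw, C_out]
    # Keep the part where the filter fully overlaps the padded input.
    valid = full[:, fh - 1:hp, fw - 1:wp, :]
    # Assumption: stride is applied by subsampling the stride-1 output.
    return valid[:, ::stride, ::stride, :] + bias

def direct_conv2d(image, filters, bias, pad=0, stride=1):
    """Slow sliding-window reference, used only to check the FFT version."""
    x = np.pad(image, ((0, 0), (pad, pad), (pad, pad), (0, 0)), mode="constant")
    n, hp, wp, _ = x.shape
    fh, fw, _, c_out = filters.shape
    oh, ow = (hp - fh) // stride + 1, (wp - fw) // stride + 1
    out = np.empty((n, oh, ow, c_out))
    for i in range(oh):
        for j in range(ow):
            patch = x[:, i * stride:i * stride + fh, j * stride:j * stride + fw, :]
            # Contract filter height, filter width, and input channels.
            out[:, i, j, :] = np.tensordot(patch, filters, axes=([1, 2, 3], [0, 1, 2]))
    return out + bias

# Usage: compare both versions on random data.
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8, 8, 3))
w = rng.standard_normal((3, 3, 3, 5))
b = rng.standard_normal(5)
assert np.allclose(fft_conv2d(x, w, b, pad=1, stride=2),
                   direct_conv2d(x, w, b, pad=1, stride=2))
```

Subsampling for the stride is the simplest choice but it computes every stride-1 output and then discards most of them, so with a large stride the FFT route may not be faster than im2col.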
This leaves me with a few questions about FFT:
- Is there a complete formula for FFT convolution on 4D data?
- How do I apply a convolution stride (also called pixel stride) with FFT?
- How do I back-propagate through all of this?