How do I handle such large image sizes without downsampling?
I assume that by downsampling you mean scaling down the input before passing it into the CNN. A convolutional layer can downsample the image within the network itself by using a large stride, which saves resources for the subsequent layers. In fact, it has to, otherwise your model won't fit in GPU memory.
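For instance, here is a minimal sketch of a first layer that downsamples aggressively through its stride (PyTorch is my assumption here, the question doesn't name a framework):

```python
import torch
import torch.nn as nn

# A 7x7 convolution with stride 4 shrinks a 2400x2400 input to 600x600
# right away, so every later layer works on ~16x fewer activations.
downsample = nn.Conv2d(in_channels=3, out_channels=32,
                       kernel_size=7, stride=4, padding=3)

x = torch.randn(1, 3, 2400, 2400)  # one RGB image of the size in question
print(downsample(x).shape)         # torch.Size([1, 32, 600, 600])
```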
- Are there any techniques to handle such large images which are to be trained?
Commonly, researchers scale the images down to a reasonable size. If that's not an option for you, you'll need to restrict your CNN. In addition to downsampling in the early layers, I would recommend getting rid of the FC layer (which normally holds most of the parameters) in favor of a convolutional layer, as sketched below. You will also have to stream your data from disk in each epoch, because it won't all fit in GPU memory.
Note that none of this will prevent heavy computational load in the early layers, exactly because the input is so large: convolution is an expensive operation and the first layers will perform a lot of them in each forward and backward pass. In short, training will be slow.
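Here is a rough sketch of the fully convolutional idea (again PyTorch, with made-up layer sizes): global average pooling replaces the usual flatten-plus-FC head, so the parameter count stays small regardless of input resolution.

```python
import torch
import torch.nn as nn

# Hypothetical backbone: aggressive early striding keeps activation maps small,
# and global average pooling replaces the flatten + fully connected head.
model = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=7, stride=4, padding=3), nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(128, 10, kernel_size=1),  # 1x1 conv gives one map per class
    nn.AdaptiveAvgPool2d(1),            # global average pooling instead of FC
    nn.Flatten(),                       # -> (batch, 10) logits
)

x = torch.randn(2, 3, 2400, 2400)
print(model(x).shape)  # torch.Size([2, 10])
```

For the data side, a regular `Dataset`/`DataLoader` pair that reads images from disk keeps only the current batch in host memory during each epoch.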
- What batch size is reasonable to use?
Here's another problem. A single image takes 2400 x 2400 x 3 x 4 bytes (3 channels, 4 bytes per pixel), which is about 70 MB, so you can hardly afford even a batch size of 10; a batch size of 5 is more realistic. Note that a large share of the memory will also be taken by the CNN parameters. In this case it makes sense to reduce the footprint by using 16-bit values rather than 32-bit, which would let you double the batch size.
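If you want to try 16-bit values, one possibility (a sketch assuming PyTorch; the tiny model and random data are just stand-ins) is automatic mixed precision:

```python
import torch
import torch.nn as nn

# Toy setup so the snippet runs on its own; substitute your real model and data.
model = nn.Conv2d(3, 10, kernel_size=7, stride=4).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler()

images = torch.randn(2, 3, 2400, 2400).cuda()
targets = torch.randn(2, 10, 599, 599).cuda()

optimizer.zero_grad()
with torch.cuda.amp.autocast():    # activations computed in float16 where safe
    loss = nn.functional.mse_loss(model(images), targets)
scaler.scale(loss).backward()      # scale the loss to avoid float16 underflow
scaler.step(optimizer)
scaler.update()
```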
- Are there any precautions to take, or any increase and decrease in hardware resources that I can do?
Your bottleneck is GPU memory. If you can afford another GPU, get it and split the network across both cards. Everything else is insignificant compared to GPU memory.
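A minimal sketch of splitting a model across two GPUs in PyTorch (the architecture and device names are my assumptions, not anything from the question):

```python
import torch
import torch.nn as nn

class SplitNet(nn.Module):
    """Toy model-parallel network: early layers on cuda:0, the rest on cuda:1."""
    def __init__(self):
        super().__init__()
        self.part1 = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=7, stride=4, padding=3), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        ).to("cuda:0")
        self.part2 = nn.Sequential(
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, 10),
        ).to("cuda:1")

    def forward(self, x):
        x = self.part1(x.to("cuda:0"))
        return self.part2(x.to("cuda:1"))  # move activations between cards

net = SplitNet()
print(net(torch.randn(2, 3, 2400, 2400)).shape)  # torch.Size([2, 10])
```

Each card then only has to hold its own half of the parameters and activations.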