The point is to work at a higher resolution than the typical final output so that less rounding error accumulates. 96 kHz is chosen because it is exactly twice 48 kHz, the standard audio sampling rate for video. With an integer ratio, converting from 96 kHz to 48 kHz means low-pass filtering and then keeping every other sample; no interpolation between samples is needed, which avoids the artifacts of a fractional-ratio conversion. Similarly, 24-bit audio gives 8 bits more precision than 16-bit. It's the smallest increase that makes sense, since you want each sample to occupy a whole number of bytes.
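As a rough sketch of why the 2:1 ratio is convenient, the snippet below compares the conversion ratios and halves a sample count by keeping every other sample. The function names are made up for illustration, and it assumes anything above 24 kHz has already been low-pass filtered out (not shown), which a real converter would have to do first.

```python
from fractions import Fraction


def resample_ratio(src_hz, dst_hz):
    """Return the exact source-to-destination sample rate ratio."""
    return Fraction(src_hz, dst_hz)


def halve_sample_rate(samples):
    """Keep every other sample (2:1 decimation).

    Assumption: the input was already low-pass filtered below half the
    new sample rate; without that step, higher frequencies would alias.
    """
    return samples[::2]


# One millisecond of placeholder audio at 96 kHz is 96 samples.
one_ms_at_96k = list(range(96))
one_ms_at_48k = halve_sample_rate(one_ms_at_96k)

print(len(one_ms_at_48k))               # 48
print(resample_ratio(96_000, 48_000))   # 2
print(resample_ratio(96_000, 44_100))   # 320/147
```

Going to 44.1 kHz gives the ratio 320/147, so output samples fall between input samples and have to be interpolated. Going to 48 kHz never requires that.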
This gives 256 times the amplitude resolution and twice the temporal resolution while editing, and it limits rounding error so that the final output is higher quality when it is mixed down to 48 kHz or 44.1 kHz at 16 bits.
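To put a number on the "256 times" figure, here is a minimal sketch that rounds a sample to a 16-bit and a 24-bit grid and then runs it through a made-up chain of gain changes, rounding after each step. The helper names, the gain values, and the treatment of samples as floats in [-1.0, 1.0) are all assumptions for illustration, not how any particular editor works.

```python
def step(bits):
    """Size of one quantization step for signed samples scaled to [-1.0, 1.0)."""
    return 1 / 2 ** (bits - 1)


def quantize(x, bits):
    """Round x to the nearest level representable with `bits` bits."""
    scale = 2 ** (bits - 1)
    return round(x * scale) / scale


def process(x, bits, gains):
    """Apply each gain in turn, rounding to `bits` after every edit."""
    for gain in gains:
        x = quantize(x * gain, bits)
    return x


# Hypothetical edit chain whose gains multiply out to roughly 1.0.
gains = [0.3, 1.7, 0.9, 1 / (0.3 * 1.7 * 0.9)]
x = 0.123456789

print(step(16) / step(24))             # 256.0
print(abs(process(x, 16, gains) - x))  # error accumulated at 16 bits
print(abs(process(x, 24, gains) - x))  # error accumulated at 24 bits
```

Each rounding can be off by at most half a step, so the 24-bit grid starts with a worst-case error 256 times smaller than the 16-bit one, and the error that builds up over repeated edits shrinks accordingly.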
If the sound card is only ever going to play 16-bit, 48 kHz sources, then supporting the extra bits and samples gains nothing: that information is already absent from the source, and the device on the other end could get the same result by simply sampling the audio at the desired rate and bit depth. It makes a lot of sense, though, when using higher-quality inputs.