In state-of-the-art papers, researchers have achieved 90-100% accuracy on gender classification using face images alone, so both methods may work just fine, with the first method perhaps giving minor improvements. However, using the image of the entire body increases the risk of overfitting, since there are more input features. For a very large dataset it might be better to use the first method, as the chance of overfitting is low, while for a smaller one you should probably go for the second method, or even just the face, as that won't make a huge difference.
A better way to feed the data to the network is through body landmarks.
Using body landmarks as input reduces the features to a minimum and keeps only the relevant ones. The main difference in body shape between women and men is perhaps the torso and shoulder width, which you can get from body landmark detection. You can feed both the face image and the body landmarks to the network, which would probably increase accuracy. However, this solution is not a one-stage method, meaning it requires an extra model to predict the landmarks.
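To make the landmark idea concrete, here is a minimal sketch of turning detected keypoints into a few scale-invariant width features. It assumes the landmark model returns a dict of (x, y) pixel coordinates with COCO-style names (`left_shoulder`, `right_hip`, and so on); those names, the `body_shape_features` helper, and the use of hip width as a stand-in for torso width are all my own assumptions, so adapt them to whatever detector you use.

```python
import math

# Hypothetical landmark format: keypoint name -> (x, y) in pixels.
# The names follow the COCO keypoint convention; a real detector's
# output may need to be remapped to this layout.


def _dist(a, b):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def _midpoint(a, b):
    """Midpoint of two (x, y) points."""
    return ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0)


def body_shape_features(landmarks):
    """Return three ratios built from four torso keypoints.

    Dividing by torso length makes the features independent of how
    large the person appears in the image.
    """
    ls, rs = landmarks["left_shoulder"], landmarks["right_shoulder"]
    lh, rh = landmarks["left_hip"], landmarks["right_hip"]

    shoulder_width = _dist(ls, rs)
    hip_width = _dist(lh, rh)  # assumed proxy for torso width
    torso_length = _dist(_midpoint(ls, rs), _midpoint(lh, rh))

    if torso_length == 0 or hip_width == 0:
        raise ValueError("degenerate landmarks: zero torso length or hip width")

    return [
        shoulder_width / torso_length,
        hip_width / torso_length,
        shoulder_width / hip_width,
    ]


# Made-up coordinates, purely for illustration.
example = {
    "left_shoulder": (140.0, 100.0),
    "right_shoulder": (60.0, 100.0),
    "left_hip": (125.0, 200.0),
    "right_hip": (75.0, 200.0),
}
features = body_shape_features(example)
```

A short vector like this can be concatenated with the face embedding before the final classification layer, which keeps the input far smaller than a full-body image.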
Hope this helps.