Game developers use a software framework to manage dynamically rendered sound effects ("dynamically" meaning that the sound designer cannot know in advance where a sound will come from or at what intensity, since that depends on how the game is played). This framework is often called the audio engine.
This software determines which sound effect should be played and what its characteristics are in relation to the player's position and orientation.
An NPC walking will trigger footsteps coming from where the NPC is relative to the player, with an intensity and effects (reverberation, attenuation, filtering, ...) that depend on the distance to the player and on the environment.
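As a rough illustration of the positional part of that computation, here is a minimal C++ sketch (not taken from any particular engine, all names are hypothetical) that derives the direction from the listener to an emitter and a simple inverse-distance attenuation gain:

```cpp
#include <cmath>
#include <algorithm>

struct Vec3 { float x, y, z; };

// Vector from the listener to the emitter, and its length (the distance).
static Vec3 sub(const Vec3& a, const Vec3& b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static float length(const Vec3& v) { return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z); }

// Inverse-distance attenuation, clamped so sources closer than refDistance
// are not boosted and sources beyond maxDistance stop getting quieter.
// Values for refDistance/maxDistance are illustrative assumptions.
float distanceGain(float distance, float refDistance = 1.0f, float maxDistance = 50.0f)
{
    float d = std::clamp(distance, refDistance, maxDistance);
    return refDistance / d;   // roughly 6 dB drop per doubling of distance
}
```

A real engine would combine this gain with environment-dependent processing (reverberation sends, occlusion filtering, and so on), but the distance and direction terms are the starting point.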
Once the software knows from where, and with which effects, a given localized sound effect must be rendered, the mix engine must compute how to render it for a given listening setup (stereo loudspeakers, 5.1 loudspeakers, headphones).
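For the stereo loudspeaker case, one common approach is constant-power panning driven by the source azimuth. The following sketch assumes a conventional ±45° speaker layout and is only one possible way to do this:

```cpp
#include <cmath>

struct StereoGains { float left, right; };

// Constant-power pan of a mono source to two loudspeakers.
// Convention assumed here: negative azimuth = source to the listener's left.
StereoGains panConstantPower(float azimuthRadians)
{
    const float halfSpan = 0.25f * 3.14159265f;                 // 45 degrees
    float a = std::fmax(-halfSpan, std::fmin(halfSpan, azimuthRadians));
    float angle = a + halfSpan;                                  // map [-45°, +45°] to [0°, 90°]
    return { std::cos(angle), std::sin(angle) };                 // left^2 + right^2 == 1
}
```

A 5.1 renderer generalizes the same idea across more speaker pairs; headphones call for a different technique, described next.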
In the case of headphones, it can use binaural rendering to position the effect in 3D space for the headphone listener. This relies on a model called the Head-Related Transfer Function (HRTF), which specifies the filtering to apply to simulate a given 3D position over headphones. This is usually an approximation, as each human head has its own HRTF depending on head size, ear shape and other parameters, but an average HRTF can give results that satisfy most listeners.
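In practice this filtering is often applied by convolving the mono source with a pair of head-related impulse responses (HRIRs, the time-domain counterpart of the HRTF) measured for the desired direction. The sketch below assumes such a pair is already available (for example from a generic "average head" data set); real engines would use FFT-based partitioned convolution rather than the direct form shown here:

```cpp
#include <vector>
#include <cstddef>

// Plain time-domain convolution of a signal with an impulse response.
std::vector<float> convolve(const std::vector<float>& signal,
                            const std::vector<float>& ir)
{
    std::vector<float> out(signal.size() + ir.size() - 1, 0.0f);
    for (std::size_t n = 0; n < signal.size(); ++n)
        for (std::size_t k = 0; k < ir.size(); ++k)
            out[n + k] += signal[n] * ir[k];
    return out;
}

struct BinauralOutput { std::vector<float> left, right; };

// One HRIR per ear for the chosen source direction gives the headphone feed.
BinauralOutput renderBinaural(const std::vector<float>& monoSource,
                              const std::vector<float>& hrirLeft,
                              const std::vector<float>& hrirRight)
{
    return { convolve(monoSource, hrirLeft), convolve(monoSource, hrirRight) };
}
```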