From experience using NNs on tabular data, having too many variables doesn't seem to hurt statistical performance much directly. However, it has a large impact on memory usage, computation time, and explainability of the model. Reducing memory usage and computation time lets you calibrate more models (more random initialisations) and build better ensembles. In turn, that allows slightly better performance and, more importantly, models that are more stable (i.e. performance doesn't depend on the random initialisation). Depending on the application and who is going to use the model (the data scientist or someone operational), explainability might be the main driver for feature selection. (Model stability often implies explainability stability too.)
Outside of careful Exploratory Data Analysis / a priori expert-based selection, the most practical approach to variable selection in NNs is to add regularisation to your training process. Namely, the $L1$ penalty, by tending to shrink weights to 0, acts as feature selection. It may require some hyper-parameter tuning (calibrate multiple NNs and see which penalty value works best). Using other regularisation techniques in parallel, like drop-out, generally helps the weight regularisation along and yields sturdier models.
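To make the mechanism concrete, here is a minimal numpy sketch (synthetic data, a single linear layer, plain proximal gradient descent — all illustrative choices, not a full NN training loop): the $L1$ penalty, applied via soft-thresholding, drives the weight of an uninformative feature to 0.

```python
import numpy as np

# Toy data: y depends only on feature 0; feature 1 is pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=500)

w = np.zeros(2)
lr, lam = 0.1, 0.05  # learning rate and L1 strength (hand-picked for the toy)
for _ in range(200):
    grad = X.T @ (X @ w - y) / len(y)                       # squared-error gradient
    w = w - lr * grad                                       # gradient step
    w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)  # L1 proximal (soft-threshold) step

print(w)  # the weight on the noise feature collapses to (essentially) 0
```

The same soft-thresholding intuition carries over to the input layer of a deeper network, though there the penalty only *tends* to zero weights out rather than guaranteeing it.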
There is also ongoing work on pruning (removing connections / neurons) that works in a similar spirit and achieves good results. Intuitively it should work better, since it adapts the NN architecture itself. PyTorch does ship pruning utilities (`torch.nn.utils.prune`), though these mostly implement magnitude pruning of weights rather than full architecture adaptation.
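The core idea of magnitude pruning is easy to show without any framework. A numpy sketch (the weight matrix below is made up to stand in for a trained first layer — rows are input features, columns hidden units): zero out the smallest-magnitude weights, and any feature whose entire row is pruned is effectively deselected.

```python
import numpy as np

# Hypothetical trained first-layer weights (6 features x 4 hidden units);
# training left the last 3 features with only tiny weights.
W = np.array([
    [ 0.9,  -1.2,   0.4,   0.7 ],
    [-0.8,   0.5,  -1.1,   0.6 ],
    [ 1.3,  -0.4,   0.8,  -0.9 ],
    [ 0.01, -0.02,  0.01,  0.005],
    [-0.015, 0.008, -0.01,  0.02 ],
    [ 0.005, 0.01, -0.005,  0.015],
])

# Global magnitude pruning: zero the 50% smallest-magnitude weights.
threshold = np.quantile(np.abs(W), 0.5)
mask = np.abs(W) >= threshold
W_pruned = W * mask

# Features whose whole row was pruned no longer feed the network at all.
dropped = np.where(~mask.any(axis=1))[0]
print(dropped)  # -> [3 4 5]
```

`torch.nn.utils.prune.l1_unstructured` does essentially this masking for a PyTorch module, but it won't shrink the architecture for you either — you still have to read the mask to decide which inputs to drop.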
Another approach is to work a posteriori. Using some measure of feature importance, you can remove the variables that weren't useful overall. You might even do that iteratively... but this requires a lot of time and work.
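One standard a posteriori tool is permutation importance, which is model-agnostic. A sketch with scikit-learn (synthetic data, and `MLPRegressor` as a stand-in for whatever NN you actually use): features whose mean importance sits near zero are candidates for removal.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.neural_network import MLPRegressor

# Toy data: feature 2 carries no signal at all.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
y = 2.0 * X[:, 0] + X[:, 1]

model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000,
                     random_state=0).fit(X, y)

# Shuffle each column in turn and measure the score drop.
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
print(result.importances_mean)  # near-zero entry -> candidate to drop
```

Ideally you would compute this on a held-out set (here it runs on the training data purely to keep the sketch short), drop the weakest variables, refit, and repeat — which is exactly the "a lot of time and work" part.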
To be honest, these approaches seem to work for removing some weights / locally non-informative variables, but I am not sure there is any guarantee that they would cleanly remove a duplicate of a meaningful feature, the way a tree-based technique would by selecting only one of the copies. Regarding duplicated meaningful features, I tried some work on a posteriori importance to check whether I could find them by looking at correlated importances, but got nothing really practical / generalisable to linear dependence between more than 2 variables. So the real answer to your question might be a thorough multivariate EDA to remove variables that are too correlated...
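For that multivariate EDA, pairwise correlations are not enough precisely because of the "more than 2 variables" problem; variance inflation factors (VIF) do catch it. A numpy-only sketch (synthetic data; the near-exact linear dependence is constructed on purpose): one feature is the sum of two others, its pairwise correlations stay moderate (~0.7), but its VIF explodes.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(1000, 3))
# Feature 3 is (almost exactly) feature 0 + feature 1.
X = np.column_stack([A, A[:, 0] + A[:, 1] + 0.01 * rng.normal(size=1000)])

def vif(X, j):
    """Regress column j on the other columns; VIF = 1 / (1 - R^2)."""
    others = np.delete(X, j, axis=1)
    beta, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
    resid = X[:, j] - others @ beta
    r2 = 1.0 - resid.var() / X[:, j].var()
    return 1.0 / (1.0 - r2)

vifs = [vif(X, j) for j in range(X.shape[1])]
print(vifs)  # columns involved in the linear dependence blow up
```

A common rule of thumb is to flag VIFs above 5 or 10 and drop (or combine) features until the remaining ones are all reasonable; `statsmodels` has a ready-made `variance_inflation_factor` if you prefer not to roll your own.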
For a more general solution, there is ongoing work on adding variable-selection gates in front of the main model (see here for example: Feature Selection using Stochastic Gates), but I haven't had the occasion to test something like this yet.
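To give a flavour of the idea, here is a heavily simplified PyTorch sketch of such gates (loosely following the stochastic-gates construction; the tiny linear model, all hyper-parameters, and variable names are mine, not the paper's): each feature gets a noisy gate $z = \mathrm{clamp}(\mu + \sigma\varepsilon, 0, 1)$, and a penalty on the probability that the gate is open pushes useless gates shut during training.

```python
import torch

torch.manual_seed(0)
X = torch.randn(512, 3)
y = 2.0 * X[:, 0] - X[:, 1]  # feature 2 is pure noise

mu = torch.full((3,), 0.5, requires_grad=True)  # one gate parameter per feature
w = torch.zeros(3, requires_grad=True)          # toy linear "main model"
sigma, lam = 0.5, 0.1
opt = torch.optim.Adam([mu, w], lr=0.05)

for _ in range(500):
    eps = sigma * torch.randn(3)
    z = torch.clamp(mu + eps, 0.0, 1.0)        # stochastic gate per feature
    pred = (X * z) @ w                          # gated inputs feed the model
    # Gaussian CDF of mu/sigma ~ probability the gate is open: the regulariser
    open_prob = torch.distributions.Normal(0.0, 1.0).cdf(mu / sigma).sum()
    loss = ((pred - y) ** 2).mean() + lam * open_prob
    opt.zero_grad(); loss.backward(); opt.step()

print(mu.detach())  # gates for informative features stay open; the noise gate closes
```

After training, features whose $\mu$ ended up well below 0 are effectively selected out; in a real network the same gate layer just sits in front of the first hidden layer.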