If you can keep adding new data derived from a single underlying concept (such as area, i.e. the ZIP code) **and** the performance of your model improves, then it is of course allowed... assuming you only care about the final result.

There are metrics that try to guide you here, such as the Akaike Information Criterion (AIC) or the comparable Bayesian Information Criterion (BIC). These help you pick a model by rewarding its fit while penalizing every additional parameter that must be estimated. The AIC looks like this:

$${\displaystyle \mathrm {AIC} =2k-2\ln({\hat {L}})}$$

where $k$ is the number of parameters to be estimated: in a logistic regression, one coefficient per feature you apply, plus the intercept. $\hat{L}$ is the maximum value of the likelihood function, i.e. the likelihood evaluated at the fitted (optimal) coefficients. BIC uses $k$ slightly differently to punish models, scaling the penalty with the number of observations $n$:

$${\displaystyle \mathrm {BIC} =k\ln(n)-2\ln({\hat {L}})}$$

These criteria can tell you when to stop: fit models with more and more parameters, and simply keep the model with the best (lowest) AIC or BIC value.
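As a concrete sketch of this procedure (with made-up data; the feature setup is purely illustrative), the snippet below fits logistic regressions on a growing feature set and computes AIC and BIC from the fitted log-likelihood, using the formulas above:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 4))              # four candidate features
logits = 1.5 * X[:, 0] - 1.0 * X[:, 1]   # only the first two actually matter
y = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)

def aic_bic(X_sub, y):
    # A very large C means (almost) no regularization, so the
    # log-likelihood of the fit is meaningful for AIC/BIC.
    model = LogisticRegression(C=1e6, max_iter=1000).fit(X_sub, y)
    p = np.clip(model.predict_proba(X_sub)[:, 1], 1e-12, 1 - 1e-12)
    log_lik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    k = X_sub.shape[1] + 1               # one coefficient per feature + intercept
    aic = 2 * k - 2 * log_lik
    bic = k * np.log(len(y)) - 2 * log_lik
    return aic, bic

for m in range(1, 5):                    # add features one at a time
    aic, bic = aic_bic(X[:, :m], y)
    print(f"features 1..{m}: AIC={aic:.1f}  BIC={bic:.1f}")
```

Because only the first two features carry signal, the criteria improve sharply when feature 2 is added and stop improving (or get worse) as the noise features come in; that turning point is where you stop.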

If you still have other features in the model which are not related to the ZIP code, they could potentially become overwhelmed; whether that happens depends on the model you use. However, they may also explain aspects of the dataset which simply cannot be contained in the ZIP information, such as a house's floor area (assuming floor area is relatively independent of ZIP code).

In this case you might compare this to something like Principal Component Analysis, where one collection of features explains one dimension of the variance in the data set, while other features explain another dimension. So no matter how many ZIP-related features you add, you may never explain the importance of floor area.
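A small illustration of that point, on made-up data: five noisy copies of the same underlying "ZIP" signal plus one independent "floor area" feature. PCA collapses the five correlated ZIP features onto essentially one component, while floor area keeps its own direction of variance that no amount of ZIP features can absorb:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n = 1000
zip_signal = rng.normal(size=n)
# five noisy copies of the same underlying ZIP information
zip_feats = np.column_stack(
    [zip_signal + 0.1 * rng.normal(size=n) for _ in range(5)]
)
floor_area = rng.normal(size=n)          # independent of the ZIP signal
X = StandardScaler().fit_transform(np.column_stack([zip_feats, floor_area]))

pca = PCA().fit(X)
# First component: the shared ZIP variance; second: floor area on its own.
print(pca.explained_variance_ratio_.round(3))
```

Roughly five sixths of the variance lands on the first (ZIP) component and about one sixth on the floor-area component, regardless of how many redundant ZIP copies you stack in.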

Is @user3768495 evaluating the performance of the model out-of-sample using e.g. cross-validation? If so, multi-collinearity should not be a problem and he should not worry about overfitting as he will get an indication of overfitting through the validation performance decreasing. – rinspy – 2018-06-14T09:47:43.230

@rinspy Overfitting has many faces. Involving a validation set can help avoid overfitting but cannot solve the problem entirely, for example when the distribution of the training data (which is split into a training set and a validation set) is inconsistent with the real population. Even if the model performs well on the training data, it may not generalize to the real-world situation. The reference in my answer also discusses overfitting. – Fansly – 2018-06-14T15:36:55.287

True, but avoiding multicollinearity will not help with 'overfitting' arising from covariate shifts. I am just saying that multicollinearity is likely not an issue if he is interested in building a predictive (and not a descriptive) model. – rinspy – 2018-06-15T08:26:24.983

My concept of overfitting is about when a model fails to generalize to a new dataset, not within the training data. Please see this – Fansly – 2018-06-15T15:10:28.537