

I want to estimate the average income for a location. My data are nested as follows: a block is inside a neighborhood, which is inside a zipcode, which is inside a district, which is inside a region, which is inside a state.

I want to estimate the average income at the block level, but I have very little data at that level. I have much more data at the state level, but the state average is a poor approximation of any particular block's income.

How would you approach this problem? Is there a way to account for the uncertainty that comes from having few data points at the block level? Are there Bayesian frameworks that let us use data from all levels? Could mixed models do this?

If you explain a method, it would be great if you could also point to a Python package that implements it!

Thanks!
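One standard way to frame this question is a hierarchical (multilevel) Normal model, in which each unit's mean is drawn around its parent unit's mean. The sketch below assumes incomes are roughly Normal (in practice this often means working on a log scale). The symbols are illustrative and not taken from the question: pa(·) is the parent unit, σ² is the household-level variance, n_b and ȳ_b are the number of households and their mean income in block b, and τ²_ℓ is the spread of level-ℓ means around their parents.

```latex
\begin{aligned}
y_i &\sim \mathcal{N}\!\left(\mu^{\text{block}}_{b[i]},\ \sigma^2\right) \\
\mu^{\text{block}}_b &\sim \mathcal{N}\!\left(\mu^{\text{nbhd}}_{\mathrm{pa}(b)},\ \tau^2_{\text{block}}\right) \\
\mu^{\text{nbhd}}_h &\sim \mathcal{N}\!\left(\mu^{\text{zip}}_{\mathrm{pa}(h)},\ \tau^2_{\text{nbhd}}\right) \\
&\ \ \vdots \\
\mu^{\text{state}}_s &\sim \mathcal{N}\!\left(\mu_0,\ \tau^2_{\text{state}}\right) \\[6pt]
\mathbb{E}\!\left[\mu^{\text{block}}_b \mid y,\ \mu^{\text{nbhd}}_{\mathrm{pa}(b)}\right]
&= \frac{n_b \bar{y}_b / \sigma^2 + \mu^{\text{nbhd}}_{\mathrm{pa}(b)} / \tau^2_{\text{block}}}
        {n_b / \sigma^2 + 1 / \tau^2_{\text{block}}}
\end{aligned}
```

The last line shows the shrinkage. When n_b is small, the block estimate stays close to its neighborhood mean; as n_b grows, it approaches the raw block mean ȳ_b. In the full model the parent means are themselves estimated from all the data below them, which is how every level contributes. This nested random-intercept structure is what a mixed model with nested random effects estimates. In Python, statsmodels' `MixedLM` (nested components via `vc_formula`) and PyMC can both express it.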

What have you tried so far? What comes to my mind is a dummy-variable fixed-effects model: in a linear regression, include dummies for some spatial level (e.g. region) for which you have "okay" data, plus a dummy for each "block". You could then test whether the block level is statistically different from the higher spatial level. – Peter – 2020-05-19T11:04:36.807
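One way to run the test suggested above is sketched here with NumPy/SciPy rather than a regression library. The function names are hypothetical. Blocks are assumed to be nested in regions, with labels unique across regions (e.g. "R1-B1"). Under that nesting, a model with one dummy per block already spans the region dummies. A nested-model F-test of "region dummies only" against "block dummies" therefore asks whether blocks differ from their region means.

```python
import numpy as np
from scipy import stats


def group_ssr(income, labels):
    """Residual sum of squares and group count for OLS on one dummy per group.

    With a full set of group dummies, each OLS fitted value is its group's mean.
    """
    _, inv = np.unique(labels, return_inverse=True)
    inv = inv.ravel()
    counts = np.bincount(inv)
    means = np.bincount(inv, weights=income) / counts
    resid = income - means[inv]
    return float(resid @ resid), len(counts)


def block_vs_region_f_test(income, region, block):
    """Nested-model F-test: do block means differ from their region's mean?

    Restricted model: income ~ region dummies.
    Full model:       income ~ block dummies (equivalent to region + block
                      dummies, because blocks are nested in regions).
    """
    income = np.asarray(income, dtype=float)
    ssr_r, k_r = group_ssr(income, np.asarray(region))
    ssr_f, k_f = group_ssr(income, np.asarray(block))
    df_num, df_den = k_f - k_r, len(income) - k_f
    f_stat = ((ssr_r - ssr_f) / df_num) / (ssr_f / df_den)
    p_value = stats.f.sf(f_stat, df_num, df_den)
    return f_stat, p_value


# Tiny made-up example: one region "A" with two blocks.
f_stat, p_value = block_vs_region_f_test(
    [1, 3, 5, 7], ["A"] * 4, ["A-1", "A-1", "A-2", "A-2"]
)
print(f_stat, p_value)  # F = 8.0 on (1, 2) degrees of freedom
```

The same comparison can be made in statsmodels by fitting both OLS models and calling `compare_f_test` on the full model's results. Note that this fixed-effects test only flags whether blocks differ. It does not pool information across levels the way a random-effects model does.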


So far I'm just trying a damped mean from city to block, a Bayesian-style approach where the prior is the city mean and the block mean is estimated via the likelihood and the posterior update rule (as in https://www.cs.ubc.ca/~murphyk/Papers/bayesGauss.pdf). The issue is that I don't know how to account for all the other levels.

– David Masip – 2020-05-19T13:06:12.723
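One way to extend the damped mean to every level is to apply the same Normal-Normal update (the known-variance case in the linked note) top-down, so that each unit's prior mean is its parent's damped estimate. This is a rough approximation, not the full hierarchical posterior. The within-unit variance `sigma2` and the per-level variances `tau2` are assumed known and hand-picked. In addition, the same households feed both a parent's estimate and its children's updates, so uncertainty is understated. All function names, parameters, and data below are hypothetical.

```python
# Sketch: the "damped mean" applied down a chain of nested levels.
# Assumptions (hypothetical, not from the question): Normal incomes, a known
# within-unit variance sigma2, known per-level variances tau2, and a prior
# mean top_prior for the level above the top one.


def damped_mean(values, prior_mean, sigma2, tau2):
    """Normal-Normal posterior mean with known variances.

    prior:      mu ~ N(prior_mean, tau2)
    likelihood: each value ~ N(mu, sigma2)
    """
    n = len(values)
    if n == 0:
        return prior_mean  # no data for this unit: fall back to the parent
    precision = 1.0 / tau2 + n / sigma2
    return (prior_mean / tau2 + sum(values) / sigma2) / precision


def hierarchical_damped_means(records, levels, sigma2, tau2, top_prior):
    """Top-down damped means for every unit at every level.

    records: list of dicts with one key per level plus "income".
    levels:  level names ordered from the top (e.g. "state") to the bottom.
    tau2:    dict mapping each level to the variance of its means around
             their parent's mean.
    Returns a dict keyed by the tuple of labels from the top level down.
    """
    estimates = {(): top_prior}  # the empty path holds the top-level prior
    for depth, level in enumerate(levels):
        # Pool every income that falls under each unit at this level.
        groups = {}
        for r in records:
            path = tuple(r[name] for name in levels[: depth + 1])
            groups.setdefault(path, []).append(r["income"])
        # Shrink each unit's data toward its parent's (already computed) estimate.
        for path, values in groups.items():
            estimates[path] = damped_mean(
                values, estimates[path[:-1]], sigma2, tau2[level]
            )
    return estimates


# Toy example with three of the six levels (incomes in thousands; made-up data).
levels = ["state", "region", "block"]
rows = [("S1", "R1", "B1", x) for x in (40, 42, 44, 46, 48)]
rows += [("S1", "R1", "B2", 80)]
rows += [("S1", "R2", "B3", x) for x in (60, 62)]
records = [dict(state=s, region=reg, block=b, income=x) for s, reg, b, x in rows]

estimates = hierarchical_damped_means(
    records,
    levels,
    sigma2=100.0,
    tau2={"state": 400.0, "region": 100.0, "block": 100.0},
    top_prior=50.0,
)
for path in [("S1", "R1"), ("S1", "R1", "B1"), ("S1", "R1", "B2")]:
    print(path, round(estimates[path], 2))
```

In this toy data, block B2 has a single household at 80, so its estimate is pulled about halfway toward its region's estimate (about 50.38), ending at about 65.19. Block B1, with five households averaging 44, moves only to about 45.06. Adding the neighborhood, zipcode, and district levels only requires extending `levels` and `tau2`.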

This blog post may be related: https://simongrund1.github.io/posts/multiple-imputation-for-three-level-and-cross-classified-data/

– oW_ – 2020-05-22T00:02:05.733