Weighting for spatial raster as Training Data

I have a spatial raster that I am using as input for a random forest regression model. My goal is to predict, for each cell, the number of occurrences of a certain property among the individuals in that cell, based on the cell's properties. The number of individuals per cell varies between 1 and 2000. Because of this, the data looks like this (cell_id is not used in training):

(cell_id) |   cell_data    | property occurrences
    1     |  0.1 0.9 0.1   |         1
    2     |  0.1 0.8 0.1   |        670
    3     |  0.7 0.1 2.4   |         9

To work around the very different number of individuals per cell, I predicted the "property occurrences per individual" for each cell, from which I can calculate the actual number of occurrences afterwards.

(cell_id) |   cell_data    | property occurrences per individual
    1     |  0.1 0.9 0.1   |              0.1
    2     |  0.1 0.8 0.1   |              0.1
    3     |  0.7 0.1 2.4   |              0.8
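
To make the conversion concrete, here is a minimal sketch of going from counts to a per-individual rate and back. The numbers of individuals and the counts below are made up for illustration (my real table is different), and the names `n_individuals`, `to_rate` and `to_count` are just placeholders.

```python
# Illustrative values only: the number of individuals per cell is hypothetical.
cells = [
    {"cell_id": 1, "n_individuals": 10, "occurrences": 1},
    {"cell_id": 2, "n_individuals": 2000, "occurrences": 200},
    {"cell_id": 3, "n_individuals": 12, "occurrences": 9},
]


def to_rate(occurrences, n_individuals):
    """Training target: property occurrences per individual."""
    return occurrences / n_individuals


def to_count(rate, n_individuals):
    """Convert a (predicted) rate back into an absolute number of occurrences."""
    return rate * n_individuals


rates = [to_rate(c["occurrences"], c["n_individuals"]) for c in cells]
recovered = [to_count(r, c["n_individuals"]) for r, c in zip(rates, cells)]

for c, r, k in zip(cells, rates, recovered):
    print(c["cell_id"], round(r, 3), round(k, 1))
```

Cells 1 and 2 end up with the same target rate here even though one is based on 10 individuals and the other on 2000, which is what leads to the problems below.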

However, there are still two problems:
- A row based on only a few individuals has the same weight as a row based on 1000 individuals.
- If the number of individuals gets too small, the property occurrence is almost always 0, regardless of the cell data.
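
The second point can be illustrated with a rough calculation. This is only a sketch under a simplifying assumption that is not part of my data: each individual shows the property independently, at most once, with the same probability `p`. Under that assumption the chance of observing zero occurrences in a cell with `n` individuals is `(1 - p) ** n`. The value `p = 0.1` and the function name are hypothetical.

```python
def prob_zero_occurrences(p, n):
    """Probability that none of n independent individuals shows the property,
    assuming each one does so with probability p (simplified binomial model)."""
    return (1.0 - p) ** n


p = 0.1  # hypothetical per-individual probability
zero_probs = {n: prob_zero_occurrences(p, n) for n in (1, 5, 50, 1000)}

for n, prob in zero_probs.items():
    print(f"n = {n:4d}: P(0 occurrences) = {prob:.3g}")
```

With very few individuals a zero is the most likely outcome even when the underlying rate is not zero, so small cells give very noisy rate targets.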

I am thinking about adding a sample weight based on the number of individuals in each cell. Alternatively, I could add the number of individuals as an extra column and go back to predicting the actual occurrences directly. Are these reasonable approaches, or are they considered bad ideas? Which one is likely the better choice?
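
For reference, this is roughly what I mean by the two options, sketched with scikit-learn's `RandomForestRegressor`, whose `fit` method accepts a `sample_weight` argument. The data is synthetic stand-in data (not my raster), and the link between the features and the rate is invented purely so the example runs.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in data (assumption, not the real raster): 200 cells, 3 features.
n_cells = 200
X = rng.random((n_cells, 3))
n_individuals = rng.integers(1, 2001, size=n_cells)  # between 1 and 2000
true_rate = 0.05 + 0.5 * X[:, 0]                     # hypothetical relationship
occurrences = rng.binomial(n_individuals, true_rate)
rate = occurrences / n_individuals

# Option 1: predict the per-individual rate and weight each cell
# by the number of individuals it is based on.
rf_weighted = RandomForestRegressor(n_estimators=50, random_state=0)
rf_weighted.fit(X, rate, sample_weight=n_individuals)
pred_rate = rf_weighted.predict(X)
pred_counts_weighted = pred_rate * n_individuals

# Option 2: add the number of individuals as a feature
# and predict the absolute occurrences directly.
X_with_n = np.column_stack([X, n_individuals])
rf_counts = RandomForestRegressor(n_estimators=50, random_state=0)
rf_counts.fit(X_with_n, occurrences)
pred_counts_direct = rf_counts.predict(X_with_n)
```

The predictions above are made on the training data only to keep the sketch short; a fair comparison of the two options would need held-out cells or cross-validation.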

slim

Posted 2020-05-07T12:17:48.227

Reputation: 1

No answers