## How to normalize the data when mapping crime reports?


Most maps using count of reports of crime end up being maps of the population density of cities. An attractive response would be to turn these counts into a rate, but a rate per what?

Most commonly, people seem to divide the counts by the population number from the census.

However, that's more or less a count of where people sleep. In most American cities relatively few people live in the Central Business District, so it doesn't seem to make much sense to divide by that number for that area. Most homicides in my city are of minority male youth between the ages of 15 and 30. Should I divide by the total population, or try to identify the population that's at risk?

Maybe there are two questions: 1. if you had a magic wand, how would you normalize crime reports, and 2. how in practice have people solved this problem?
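To make the denominator question concrete, here's a toy sketch (all numbers invented) of how much the choice of denominator changes the resulting rate:

```python
# Sketch: the same homicide count yields very different rates depending on
# the denominator chosen. All numbers below are invented for illustration.
homicides = 24                 # hypothetical yearly count for one district
total_population = 60_000      # residential (census) population
at_risk_population = 4_500     # e.g., males aged 15-30 in the district

rate_per_100k_total = homicides / total_population * 100_000
rate_per_100k_at_risk = homicides / at_risk_population * 100_000

print(f"per 100k residents: {rate_per_100k_total:.1f}")    # 40.0
print(f"per 100k at-risk:   {rate_per_100k_at_risk:.1f}")  # 533.3
```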

I just had a random thought -- why only normalize to make one map? If you normalize on three different things, and assign them each to red, green and blue, you could then overlay them. – Joe – 2014-01-25T00:11:30.060

This problem is the bane of my existence for the exact reasons you describe. One thing that I do is per capita crime rates, and then per capita rates by subpopulations; my multivariate regression analysis attempts (alas) don't fit very well into government reports. – batpigandme – 2013-05-16T14:10:35.653

Beware that crime scales in a non-trivial way with the population density; see Growth, innovation, scaling, and the pace of life in cities. – Piotr Migdal – 2013-05-16T18:39:08.997


I asked a data analyst at the Bureau of Justice Statistics who provided this answer:

"I would say that the answer really depends on what information they are trying to show. There are many different ways to normalize crime data, and even multiple ways of doing population-based rates.

For example, I've even seen some people playing around with creating rates using the "flow" of people through the area, where the denominator for the rate calculation is the number of people who pass through a given area during the day--for example at an airport which has no population per se, but does have counts of the number of people who go through the airport during a certain period of time.

You can also create rates for specific crime types as a proportion of all crime in a given area, which helps to identify areas where particular crimes are more likely to occur than other types of crime. This is often done in hot-spot mapping, where the interest is in identifying, across a given area, where burglary (for example) is more common than other types of crime and how that differs by city block."
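The crime-type-share idea in that quote is essentially a location quotient: an area's share of burglary relative to the citywide share. A minimal sketch, with all counts invented:

```python
# Location quotient sketch: how over-represented is burglary in each area
# relative to the citywide mix of crime? All counts are invented.
areas = {
    "block_A": {"burglary": 12, "all_crime": 40},
    "block_B": {"burglary": 3,  "all_crime": 60},
}

city_burglary = sum(a["burglary"] for a in areas.values())   # 15
city_all = sum(a["all_crime"] for a in areas.values())       # 100
city_share = city_burglary / city_all                        # 0.15

# LQ > 1 means burglary is a bigger share of local crime than citywide.
lqs = {name: (a["burglary"] / a["all_crime"]) / city_share
       for name, a in areas.items()}

for name, lq in lqs.items():
    print(name, round(lq, 2))   # block_A 2.0, block_B 0.33
```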

Direct contact information and help is at AskBJS@usdoj.gov and they are happy to work directly with folks in this community on such issues. In the future, as this private beta goes public, I'll invite such experts to answer directly in this forum.

(Disclaimer: I'm the Evangelist with Data.gov)


The FBI collects common Uniform Crime Reporting (UCR) data from all municipalities. These include things like murder, rape, assaults, property crimes, vehicle theft etc. Their primary site is here: http://www.fbi.gov/about-us/cjis/ucr/ucr and they have common stats going back decades.

Typically, municipalities use a per-1,000-population rate, which can also be broken down by city, zip, census tract, police beat, or similar geographic boundaries. As discussed, these numbers can be skewed by a variety of demographic and environmental factors.
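That per-1,000 rate is a one-liner; a sketch with hypothetical counts:

```python
# Crimes per 1,000 residents -- the normalization most municipalities use,
# applied to any geography (tract, beat, zip). Inputs here are invented.
def rate_per_1000(crime_count, population):
    """Return crimes per 1,000 residents of an area."""
    return crime_count / population * 1000

print(rate_per_1000(85, 17_000))  # 5.0
```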

When looking at the municipal level there are many other issues, such as the codes under which law enforcement cites the offender. In California, for instance, one might be cited under the state Business & Professions Code, Penal Code, or Vehicle Code, while others are cited under local ordinance (e.g., minor in possession of alcohol). This example also illustrates that the offender might be cited under local or state laws at the discretion of the officer.

Another major factor is police staffing and service levels. This can vary widely but is identifiable. The police aren't too keen on revealing these numbers, largely, I believe, out of institutional pride. However, if an area has a higher level of crime, it might be partly attributable to low staffing levels. Or a department may have high staffing but low in-service (available for dispatch to crime) time.

Finally, there has to be a way to regionalize this analysis, because in my city (San Diego), for instance, although we have a large population we have abnormally low crime. We also have critically understaffed police because of chronic budget cuts (about 12% understaffed, or over 250 officers). One might ask: if we have such low crime, why would we need more officers? Another might suggest that fewer crimes are reported because people's tolerance is up (understanding that if they call the police, an officer likely won't show up). Other factors include departmental resources, e.g., a low number of Vice officers compared to staffing standards.

What would be ideal is for municipalities to have a tool to find municipalities with similar citizen demographics and police resourcing that have lower crime rates for certain offenses (e.g., the minor-in-possession example) and compare their municipal codes. Community leaders might find alternative ordinances that help them reduce those crimes in their areas.

There are other factors relating to business types and densities. Areas with many shopping malls or tourist zones will have higher auto prowls (break-ins) and auto theft. Areas highly concentrated with bars likely have a higher incidence of DUI but not high DUI-offense rates, because of low DUI patrols, etc.

This is definitely a complicated and challenging topic, but one that we must work to improve: not only does it affect our ability to pursue happiness in safe communities, people's well-being and lives are on the line.


I can throw in 2 cents on this subject. First, I would not use population of a point-of-interest as the sole normalizer in a crime analysis. Crime is more tied to the combination of population, economic activity and social factors at both the POI and surrounding area. Below are examples of some of the factors in developing an algorithm:

1. The population of the surrounding area that supports the economic activity. This is referred to in Census terms as the 'Urban agglomeration'. Factors to consider:

A. Population Density
B. Available Housing Density
C. Occupied Housing Unit Density
D. Unemployment rate

2. The overall economic activity at the POI. This may be a combination of:

A. Total size of employed workforce
B. Total wages of employed workforce
C. Total value of goods produced
D. Total value of all economic activity (goods produced and sold).
E. Types of the major economic activity (e.g., SIC or NAICS codes) and employment diversification.

• Some of these data are obtainable from counties; otherwise I look for data from the BLS.
3. Social Factors

A. Density of medical facilities
B. Density of non-food alcohol establishments
C. Ratio of Grocery Stores/Full-Service Restaurants to Limited-Service Restaurants (sometimes referred to as a food-desert measure)
D. Ratio of students in public schools to private/charter

I have some experience aggregating datasets for analyzing crime factors based on UCR codes.

Recently, I've been looking into how to use daily weather conditions (obtained from NOAA) as a factor to predict variance.
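A hedged sketch of how some of the factors above might be combined into a single composite score: z-score each factor across tracts, then average. Factor names and values are invented; a real analysis would weight and validate the factors against UCR-coded outcomes rather than averaging equally.

```python
# Combine several normalizing factors into one composite score per tract
# by z-scoring each factor and averaging. All values are invented.
import statistics

tracts = {
    "tract_1": {"pop_density": 9000, "unemployment": 0.11, "vacancy": 0.12},
    "tract_2": {"pop_density": 3000, "unemployment": 0.05, "vacancy": 0.04},
    "tract_3": {"pop_density": 6000, "unemployment": 0.08, "vacancy": 0.07},
}
factors = ["pop_density", "unemployment", "vacancy"]

# Per-factor mean and standard deviation across all tracts.
stats = {f: (statistics.mean(t[f] for t in tracts.values()),
             statistics.stdev(t[f] for t in tracts.values()))
         for f in factors}

# Composite score = mean of the tract's z-scores across factors.
composite = {
    name: statistics.mean((t[f] - stats[f][0]) / stats[f][1] for f in factors)
    for name, t in tracts.items()
}

for name, score in composite.items():
    print(name, round(score, 2))
```

Equal weighting is only a starting point; a regression against observed crime counts would give data-driven weights.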


I'm a bit confused about the problem you're ultimately trying to solve, because you mention maps but then indicate a desire to convert "counts" into rates. In any case, your primary question is

How would you normalize crime reports?

to which I would answer DON'T (whether you mean normalizing the source data before generating reports or maps, or whether you mean how reports and maps should organize the data for effective presentation).

My own preference is to leave the "crime reports" as a detailed-as-possible database, then let the reporting (or mapping) tools optionally summarize or group or map as desired by the user.

For example, my local newspaper essentially refuses to report crimes in our city (apparently to protect the local tourist industry), even though the police post all crime reports on their own web site. Poring through those reports to find what happened in my neighborhood is a pain. So, I use Google Fusion Tables to import the police reports and map the crimes to make it easy to spot anything happening near me.

A crime is essentially a triplet: WHAT happened, WHERE it happened, WHEN it happened. One user might want all crimes in all locations for all time. Another might want only felonies in a given region since Tuesday. Another might want last month's map alongside this month's map.

So, I think the best approach is not to try to predict how to summarize the data, but to instead provide flexible tools to map (or normalize or do whatever to) the data in as many different useful ways as possible.
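As a sketch of that flexible-tools approach, assuming the raw reports are kept as simple WHAT/WHERE/WHEN records (data invented), filtering is all the "tool" needs to expose:

```python
# Keep raw crime reports as detailed records; let the user filter instead of
# pre-summarizing. All records below are invented for illustration.
from datetime import date

reports = [
    {"what": "burglary", "where": "downtown", "when": date(2013, 5, 1)},
    {"what": "assault",  "where": "harbor",   "when": date(2013, 5, 3)},
    {"what": "burglary", "where": "harbor",   "when": date(2013, 4, 20)},
]

def query(reports, what=None, where=None, since=None):
    """Return reports matching any combination of type, area, and start date."""
    return [r for r in reports
            if (what is None or r["what"] == what)
            and (where is None or r["where"] == where)
            and (since is None or r["when"] >= since)]

print(len(query(reports, what="burglary")))                         # 2
print(len(query(reports, where="harbor", since=date(2013, 5, 1))))  # 1
```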


As suggested, it really depends on what you're trying to do with these data. There is strength in normalizing by population, or by transient populations (which introduces more assumptions), and those approaches meet certain needs. Providing bare counts is helpful, but y'all seem to do that in Chicago already.

For a lot of our violence prevention work and for targeting police efforts in Oakland we do normalize by at-risk populations as you hinted. The crime rate is misleadingly low as an absolute, as white guys in the hills are not homicide victims here, well, almost never. So controlling for population, when you look at the rate of homicides amongst young men of color, then you see some crazy, real data. That allows for powerful context- just like saying a white person from the wealthy hills will live on average 14 years longer than a black person born in the poor flatlands. That's some hard to avoid context. Public health folks do this kind of age adjusted rate calculation all the time.