Day/Time Crime Heatmaps

Posted on Dec 21, 2013 in Analysis

Here is an interactive data application that explores how crime incidents vary by day of week and time of day. Using the checkboxes below, select one or more crime types, and the heatmap will show the relative intensity of those crime types over day of week and time of day.

There are many interesting patterns here, some you would expect, some you might not:

Things you might expect:

  • DUI and drug violations are primarily committed in the evening and early mornings on weekends.
  • Assaults are most frequent in the late evenings and early mornings on weekends.
  • Vehicle theft and break-ins are committed in the dark.
  • Burglary is committed during the day, while people are at work.

There are also a few interesting surprises:

  • Sex crimes, which are mostly prostitution, peak on Thursday evenings.
  • Homicides are spread throughout the week, rather than being tied to nightlife.
  • Fraud occurs almost entirely at noon or midnight. This is almost certainly a data-collection issue, not the actual time the crimes occurred.
  • Weapons violations are committed primarily in the middle of the week.

This sort of heatmap is a really powerful way to visualize complex relationships quickly, although it also hides a lot of other interesting features. For instance, crime varies considerably by location, so a valuable extension of this analysis would be to include checkboxes for selecting neighborhoods.
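The aggregation behind such a heatmap is simple to sketch. The application itself is built in R and Shiny, but here is a minimal Python illustration of the same idea; the incident records and crime-type labels below are hypothetical:

```python
from collections import Counter
from datetime import datetime

# Hypothetical incident records: (crime_type, timestamp) pairs.
incidents = [
    ("DUI", datetime(2013, 6, 1, 23, 15)),      # Saturday night
    ("DUI", datetime(2013, 6, 2, 1, 40)),       # early Sunday morning
    ("BURGLARY", datetime(2013, 6, 3, 14, 5)),  # Monday afternoon
]

def day_time_matrix(incidents, crime_types):
    """Count the selected crime types into a 7 (day) x 24 (hour) grid,
    then scale by the peak cell so the map shows relative intensity."""
    counts = Counter(
        (ts.weekday(), ts.hour)
        for crime, ts in incidents
        if crime in crime_types
    )
    peak = max(counts.values(), default=1)
    return [[counts[(d, h)] / peak for h in range(24)] for d in range(7)]

# Selecting only "DUI" leaves the burglary cell empty.
matrix = day_time_matrix(incidents, {"DUI"})
```

Selecting a different set of crime types just changes the filter; the day-by-hour grid and the scaling stay the same.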

This application was built using R and Shiny. If you’d like to  learn to develop this sort of application for your own site, the Library is considering running a training class. If you are interested, let us know.


Visualizing Traffic Accidents

Posted on Aug 12, 2013 in Analysis, New Data

We recently converted the SWITRS database of traffic collisions in California, extracting the records for San Diego County and creating a basic visualization in Tableau Public. Tableau Public is a fantastic data analysis tool, although it takes a bit of training to do complex things.

Below is a simple visualization of the number of people killed and injured in San Diego County traffic collisions by day of week and hour of day, for the years 2002 to 2012, inclusive. The deaths line (orange) shows a familiar “bathtub” shape, with higher values on weekends and at night. The injured line is quite different. There are several possible explanations, including statistical effects due to the 100x difference between the number of injuries and deaths, or qualitative differences between pedestrians at night and during the day. For instance, it may be that injuries during the day are driven by travel patterns and reduced attentiveness in the afternoon, while fatalities are driven by alcohol consumption.

Fortunately, there is enough information in the SWITRS dataset to test some of these hypotheses, but that will be the subject of another post.


Crime at Street Segments

Posted on May 3, 2013 in Analysis, New Data

A few weeks ago, we received crime incident data from SANDAG for the years from 2007 to March of 2013. The data has the location of each crime listed as a “hundred block,” so a crime at 1435 Main Street would be listed at 1400 Main. Since we don’t know where the crime occurred on the block – we don’t know the exact address – it makes sense to attribute the crime to the whole block. I just finished an initial pass at geocoding crimes to street segments, using the All_Roads file from SANGIS and a lot of custom Python code. Here are the results:
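The core of the geocoding can be sketched like this. The address format, street-name normalization, and segment fields below are hypothetical simplifications, not the actual All_Roads schema:

```python
import re

def hundred_block(address):
    """Reduce an address to its hundred block:
    '1435 Main St' -> (1400, 'MAIN ST')."""
    m = re.match(r"(\d+)\s+(.+)", address.strip())
    if not m:
        return None
    number = int(m.group(1))
    return (number // 100) * 100, m.group(2).upper()

# Each street segment carries a street name and the address range
# for its block (the field names here are invented).
segments = [
    {"street": "MAIN ST", "from": 1400, "to": 1499, "id": 42},
]

def segment_for(address, segments):
    """Find the street segment whose address range covers the
    incident's hundred block, if any."""
    parsed = hundred_block(address)
    if parsed is None:
        return None
    block, street = parsed
    for seg in segments:
        if seg["street"] == street and seg["from"] <= block <= seg["to"]:
            return seg["id"]
    return None
```

The real pipeline has to cope with messy street-name spellings and overlapping ranges, which is where most of the custom code goes.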

All crime incidents for 2007 to 2013, colored according to linear density on the street segment.


In this map, all of the crime incidents for the six-year period are counted, grouped by the block where the crime occurred. Then, the counts are divided by the length of the street segment and the time period, so the final value is crimes per year per kilometer. Then, the range of values is broken into five groups, using Jenks natural breaks. (Using head/tail breaks would require more programming.) I tweaked the breaks manually to put more segments into the top category.
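The density calculation and the hand-tweaked classification might look like this in Python; the break values below are made up for illustration, not the ones used on the map:

```python
from bisect import bisect_right

YEARS = 6.0  # the six-year period covered by the map

def crimes_per_year_per_km(count, segment_length_m):
    """Divide a segment's incident count by its length (in km) and
    the time period, giving crimes per year per kilometer."""
    return count / (segment_length_m / 1000.0) / YEARS

# Hypothetical upper-bound breaks for the five classes, e.g. from
# Jenks natural breaks and then tweaked by hand; densities above the
# last break land in the top (fifth) category.
breaks = [5.0, 15.0, 40.0, 100.0]

def category(density):
    """Map a density to a class index, 0 (lowest) through 4 (highest)."""
    return bisect_right(breaks, density)
```

Tweaking the map's top category then amounts to lowering the last break until enough segments fall above it.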

This presentation has an advantage over the previous maps we’d created using Kernel Density Estimation to produce heatmaps: it is easier to see where high-crime blocks are. For instance, let’s look at the red block east of downtown, due south of Balboa Park. A crack house, perhaps?

Police Station PM

Nope, it is the police station. It is very common for crimes to have their address set to the police station when it isn’t clear what an appropriate address would be, such as for a car chase or someone caught in a canyon, like in this hotspot in Mission Gorge:

Mission Gorge


It isn’t likely that this particular segment of road gets a lot of crime, and the criminals probably don’t live in the mobile home park. That segment of road probably contains the address number used for crimes committed in Mission Gorge, or maybe it is where the cops like to park to patrol the canyon.

Here is a real hotspot, an empty lot that seems to be popular for drug use:

Downtown Drug Lot

Compare this to the heatmap view, zoomed out to show the contour of the hotspot:

Drug Area Heatmap

The heatmap works better at small scales, because you can quickly assess an entire region. At large scales, the street segment view can help isolate the likely cause of a problem, which can be obscured by the large regions of the heatmap hotspots. The heatmap also encourages a subtle psychological bias: larger areas seem more significant, because our brains place a lot of emphasis on size. In the segment map, though, segments are viewed discretely, as qualities rather than extents. So it is easier to pay attention to unusually high-intensity blocks, which pop out because they are bright red, but which in the heatmap get ignored because they are small.

We’re not quite ready to publish the processed crime data, but we should have it released, along with both kinds of maps, in the next week.


SANDAG Transportation Resources

Posted on Apr 29, 2013 in Analysis, News

SANDAG Senior Research Analyst Mike Calandra sent us some traffic data updates for our Data as a Public Good report. I’m always thrilled when I get to communicate directly with agency analysts, and having them write to me is like an early Christmas.

In our Data as a Public Good report, several interviewees reported that they had difficulty getting traffic data from SANDAG, and the SANDAG website only publishes traffic count data as a PDF or an interactive map. However, SANDAG was very responsive to my Public Records Act request for the traffic counts in a structured format, and we’ve got the response from that request in our repository. We’ve updated the report to reflect this more recent experience, and here are some other resources that Mike mentioned.

SANDAG is currently in the process of updating their traffic count reports and replacing them with a Regional Count Database. The application will be an interactive map with a focus on arterial streets. Caltrans, through their PeMS application, reports counts for freeways. PeMS also has about 10 years of data for many other traffic metrics, such as collisions.

Every 10 years, SANDAG does a survey of county residents about travel. The last one was conducted in 2006. I haven’t read the report in enough detail to assess the quality of the statistics, but I’m already really impressed with the design and methodology – see the “Cognitive Interview” appendix on page 132 for a lesson on how to do a survey right. If the survey company, NUSTATS, put as much work into the statistics as they did into designing the survey, this report has the best data it would be possible to get.

Mike sent a few other links that I knew existed but hadn’t seen, such as links to forecasts.

One of the most important datasets that SANDAG produces is their traffic model and the forecasts that the model produces. Every Metropolitan Planning Organization produces traffic forecasts and analysis, since that is the job they were created to do. If you’ve followed recent local news, the use and publication of this model has been contentious, and two of our interviewees for the Data as a Public Good report said that SANDAG would not release model inputs and outputs because they considered the model to be proprietary software, an assertion confirmed by letters from lawyers we’ve seen in SANDAG PRA requests.

However, it looks like this may be changing in response to SB 375 – see ‘contentious’, above – which requires MPOs to update their model capabilities, a project that SANDAG embarked on in 2009. These improvements, along with SANDAG’s long-term involvement in the CalPECAS peer group and its transition from the 4 Step Model to the Activity Based Model, should mean that the resulting models are more open and ultimately more useful to analysts outside of SANDAG.

Traffic models can be spectacularly complicated, and it is really difficult for average data users to interpret the models, verify that they make sense, and use the results. Having them be closed to public inspection means they are more likely to have errors, so making models more open and available will mean that more of those inevitable errors will be found and corrected. I hope the trends toward openness continue, because I’m very interested in having the San Diego Regional Data Library contribute to model improvements that will benefit the whole region.




Population Density Maps

Posted on Apr 22, 2013 in Analysis, New Data, Projects

Here is some eye candy: a population density map of Pacific Beach and the surrounding neighborhoods.


This map was created with a lot of Python code, using the 2010 census shapefiles for census blocks, setting a value for each block as the population of the block divided by the area of the block, and rasterizing all of the blocks to an image. Red indicates areas of higher population density. You can clearly pick out the areas in Pacific Beach that are zoned for apartments vs. single-family homes, the UTC high-rise apartment area, and many other variations in land use.
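A minimal sketch of the rasterization step: real census blocks are polygons from the shapefiles, but axis-aligned rectangles (with invented coordinates and populations) show the same pipeline of computing density as population over area and painting it onto a pixel grid:

```python
def rasterize(blocks, width, height):
    """Paint per-block population density onto a pixel grid.
    blocks: (x0, y0, x1, y1, population) in pixel coordinates."""
    grid = [[0.0] * width for _ in range(height)]
    for x0, y0, x1, y1, pop in blocks:
        # Density is the block's population divided by its area.
        density = pop / float((x1 - x0) * (y1 - y0))
        for y in range(y0, y1):
            for x in range(x0, x1):
                grid[y][x] = density
    return grid

# A dense block (left) next to a sparser one (right).
grid = rasterize([(0, 0, 2, 2, 400), (2, 0, 4, 2, 100)], 4, 2)
```

In the real code the polygon rasterization and coordinate projection do most of the work; the density arithmetic is the same.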

This map is a test of code I’m creating to allow any census variable to be mapped, but I’m not really happy with the result. The problem is that human brains like to see smooth variations in density, and the jarring discontinuities in this map are confusing. Some of the time, the abrupt changes in density are connected to changes in land use, since census boundaries tend to follow streets. But most of the time, map users are really interested in how people respond to density, and in those cases, human movements and behaviors don’t follow sharp boundaries.

To address this issue, I will be converting these maps into the same grid structure that we use for crime maps and smoothing across the grid cells to remove the discontinuities. These modified maps won’t show the population density with the same accuracy, but they will be easier for people to interpret in ways that are relevant to their real interests in population density.
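The smoothing step could be as simple as a repeated 3x3 box blur over the grid cells; a sketch, assuming a plain list-of-lists grid like the one the rasterizer produces:

```python
def smooth(grid, passes=1):
    """Replace each cell with the mean of its 3x3 neighborhood,
    one simple way to blur away the discontinuities at
    census-block boundaries."""
    h, w = len(grid), len(grid[0])
    for _ in range(passes):
        out = [[0.0] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                # Neighborhood is clipped at the grid edges.
                nbrs = [
                    grid[yy][xx]
                    for yy in range(max(0, y - 1), min(h, y + 2))
                    for xx in range(max(0, x - 1), min(w, x + 2))
                ]
                out[y][x] = sum(nbrs) / len(nbrs)
        grid = out
    return grid
```

Each pass spreads a block's density into its neighbors, trading locational accuracy for the smooth gradients people find easier to read.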
