For the last 5 months, SANDAG has been publishing their crime incident data to the web. The file they publish only stores the last 180 days, and it is a bit hard to find, so we’re archiving the files to our data repository.
Here is an interactive data application that explores how crime incidents vary over the day of week and time of day. In the checkbox below, select one or more crime types, and the heatmap will show the relative intensity of those crime types over day of week and time of day.
There are many interesting patterns here; some you would expect, some you might not:
Things you might expect:
- DUI and Drugs violations are primarily committed in the evening and early mornings on weekends.
- Assaults are most frequent in the late evenings and early mornings on weekends.
- Vehicle theft and break-ins are committed in the dark.
- Burglary is committed during the day, while people are at work.
There are also a few interesting surprises:
- Sex crimes, which are mostly prostitution, peak on Thursday evenings.
- Homicides are spread throughout the week, rather than being tied to nightlife.
- Fraud occurs almost entirely at noon or midnight. This is almost certainly a data-collection issue, not the actual time the crimes occurred.
- Weapons violations, however, are primarily in the middle of the week.
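The fraud pattern in the list above can be checked directly: if a large share of timestamps fall exactly on 00:00 or 12:00, the recorded time is probably a default value rather than an observation. Here is a minimal Python sketch of that check. The timestamp format and the sample values are assumptions made for illustration, not the layout of the SANDAG file.

```python
from datetime import datetime

# Hypothetical timestamp format; the real file may use a different one.
TIME_FORMAT = "%Y-%m-%d %H:%M"


def placeholder_time_share(timestamps):
    """Return the fraction of timestamps that fall exactly on 00:00 or 12:00."""
    parsed = [datetime.strptime(ts, TIME_FORMAT) for ts in timestamps]
    if not parsed:
        return 0.0
    flagged = sum(1 for dt in parsed if dt.minute == 0 and dt.hour in (0, 12))
    return flagged / len(parsed)


# Made-up sample values, for illustration only.
fraud_times = [
    "2013-06-03 12:00",
    "2013-06-04 00:00",
    "2013-06-04 12:00",
    "2013-06-06 09:47",
]
share = placeholder_time_share(fraud_times)
```

Comparing this share across crime types would show which categories are dominated by default times and should probably be left out of a time-of-day analysis.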
This sort of heatmap is a really powerful way to visualize complex relationships quickly, although it also hides a lot of other interesting features. For instance, crime varies considerably by location, so a valuable extension of this analysis would be to include checkboxes for selecting neighborhoods.
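The aggregation behind a heatmap like this is simple to sketch. The Python below is not the application's code; it is a rough illustration that assumes each incident is a (timestamp, crime type) pair with a hypothetical timestamp format and made-up sample rows. It counts incidents per day of week and hour for the selected crime types, then scales the counts so the busiest cell is 1.0.

```python
from collections import Counter
from datetime import datetime

# Hypothetical timestamp format; the real file may use a different one.
TIME_FORMAT = "%Y-%m-%d %H:%M"


def day_hour_counts(incidents, crime_types):
    """Count incidents of the selected types per (weekday, hour) cell.

    weekday follows datetime.weekday(): Monday is 0, Sunday is 6.
    """
    counts = Counter()
    for timestamp, crime_type in incidents:
        if crime_type in crime_types:
            dt = datetime.strptime(timestamp, TIME_FORMAT)
            counts[(dt.weekday(), dt.hour)] += 1
    return counts


def relative_intensity(counts):
    """Scale counts into a 7 x 24 grid where the busiest cell is 1.0."""
    grid = [[0.0] * 24 for _ in range(7)]
    peak = max(counts.values(), default=0)
    for (day, hour), count in counts.items():
        grid[day][hour] = count / peak
    return grid


# Made-up sample rows, for illustration only.
sample = [
    ("2013-06-01 23:15", "DUI"),       # Saturday night
    ("2013-06-01 23:40", "DUI"),       # Saturday night
    ("2013-06-02 01:05", "DUI"),       # early Sunday morning
    ("2013-06-05 14:30", "BURGLARY"),  # Wednesday afternoon
]
counts = day_hour_counts(sample, {"DUI"})
grid = relative_intensity(counts)
```

Adding a neighborhood filter would only require one more field per incident and one more condition in the counting loop.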
This application was built using R and Shiny. If you’d like to learn to develop this sort of application for your own site, the Library is considering running a training class. If you are interested, let us know.
For the last few months, a team of geography students at SDSU have been working with the crime data provided by the Library, producing analyses and visualizations of the data.
Elias Issa has been looking at Drugs and Alcohol violations in Downtown San Diego and East Village. He writes:
The Hot Spot tool calculates the Gi* statistic for each feature in a dataset. The resultant z-scores and p-values tell you where features with either high or low values cluster spatially. To be a statistically significant hot spot, a feature must have a high value and be surrounded by other features with high values as well. My animated maps illustrate two hot spots from 2007 to 2012. The most significant hot spot is located close to the St. Vincent de Paul homeless shelter (Imperial Ave) in the southeastern part of East Village. The second hot spot is located north of the Gaslamp Quarter, between Broadway and Market, where most of the famous and popular bars are found. In addition to those maps, and based on those hot spots, I did some statistical analysis to show the monthly (above/below average) and yearly averages of Drugs/Alcohol violations. My study reveals that there is a slight increase in the average of Drugs/Alcohol violations in 2011 and 2012.
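To make the Gi* statistic a little more concrete, here is a minimal Python sketch of the standard Getis-Ord Gi* z-score. It is not the Hot Spot tool's code and does not reproduce Elias's settings. It assumes binary weights (every feature within a fixed radius, including the feature itself, counts equally), and the grid and values are invented for illustration.

```python
import math


def gi_star(points, values, radius):
    """Return a Getis-Ord Gi* z-score for each feature.

    Assumes binary fixed-distance weights: w_ij = 1 when feature j lies
    within `radius` of feature i (feature i itself included), else 0.
    """
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean * mean)
    scores = []
    for xi, yi in points:
        w = [1.0 if math.hypot(xi - xj, yi - yj) <= radius else 0.0
             for xj, yj in points]
        w_sum = sum(w)
        w_sq_sum = sum(wj * wj for wj in w)
        weighted = sum(wj * vj for wj, vj in zip(w, values))
        numerator = weighted - mean * w_sum
        denominator = s * math.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
        # The denominator is zero when all values are equal or when the
        # neighborhood covers every feature; the score is undefined then.
        scores.append(numerator / denominator if denominator > 0 else float("nan"))
    return scores


# Invented example: a 6 x 6 grid of cells with a cluster of high counts
# in one corner and low counts everywhere else.
points = [(float(r), float(c)) for r in range(6) for c in range(6)]
values = [10.0 if r < 2 and c < 2 else 1.0 for r in range(6) for c in range(6)]
z = gi_star(points, values, radius=1.0)
```

A cell with a high count surrounded by other high counts gets a large positive z-score, while a cell in the uniformly low area gets a score near or below zero, which matches the description of a hot spot in the quote.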
Over the course of the project, he has been experimenting with various ways to visualize the time component of geographic data. This is quite difficult, since you can’t easily scan the time dimension like you can in space. Visual processing is tuned for noticing changes and differences — like a deer that won’t notice you if you don’t move — so Elias’ visualization is best for quickly identifying areas that deserve more analysis, rather than showing the quantitative differences.
Due to the difficulty of creating animations like this in ArcMap, the video has only one frame per year, but that is enough to illustrate how the changes from frame to frame draw your eye to problem areas. Without a visualization like this, it is easy to miss some of the most important features of an issue, including short-term spikes and long-term trends.
After identifying an area to focus on through the visualization, Elias’ underlying statistical method serves to quantify the differences between times or locations, so this project is a great example of a way to use animation to partition a large problem space into components that can be analyzed in detail.
Last year, the U.S. Department of Housing and Urban Development commissioned a report to study the feasibility of creating a nationwide database of parcel information. This is a difficult task because the parcels are usually maintained by the counties, and the US has about 3200 counties. The resulting report is remarkably thorough, including a description of the data collection process and the effort required to get the data, and the information contained in each county’s dataset. The effort required varied greatly, with 13% of the...
The Voice of San Diego is running a Q&A regarding Open Data, which just happens to involve an interview with the Director of the Library.
SANDAG, through its public safety division ARJIS, is now publishing crime data to the web. This is a major advance in accessibility, since previously crime incidents were only available through a Public Records Act request, and usually involved a fee. The download file includes crime incidents for the last 180 days. The Library will be archiving these files occasionally, and adding them to our crime incident datasets. The files are updated weekly. Unlike the data we acquired previously, this release does not include the ‘legend’...
We recently converted the SWITRS database of traffic collisions in California, extracting the records for San Diego County and creating a basic visualization in Tableau Public. Tableau Public is a fantastic data analysis tool, although it takes a bit of training to do complex things. Below is a simple visualization of the number of people killed and injured in San Diego County traffic collisions by day of week and hour of day, for the years 2002 to 2012, inclusive. The deaths line (orange) shows a familiar “bathtub” shape, with...
For the Dig Into Data meetup this Wednesday, we’ll be talking about tools to use for analyzing data. If you’d like to follow along in the meeting, you can install these tools before you arrive. The two applications are Tableau Public, for analyzing tabular data (it is a great tool for basic data mining), and QGIS, for geographic data. Tableau Public is the free, limited version of Tableau’s professional data mining tools. It is really easy to install; just visit their download page to get started. Tableau Public runs...
This Wednesday evening we will be meeting for a hands-on introduction to datasets related to city infrastructure, crime, land use, business permits, and other factors. We will talk about how to use the datasets and how to analyze them using Excel, GIS tools and statistical programs. The contents will start mildly technical, with the most technical aspects towards the end, so if your analysis skills are basic statistics and Excel, you’ll benefit from the first half of the meeting, and if you are a geek or quant, stay for the whole...
We’ve released an update to the crime dataset today. This update is based on reprocessed data from SANDAG and includes some new fields. San Diego Crime Incidents, Revision 3 This update is based on a new extract of the data we got recently from SANDAG. They didn’t tell me what had changed, just that they found some problems and re-ran the extract. If you want to compare the inputs, we have both the first revision and second revision of the SANDAG input data in the data repository. This release includes several new fields in the...