Exploring San Diego Crime Data using Python – Workshop

Posted by on Nov 7, 2017 in Meeting, New Data | 0 comments

Tonight at Downtown Works, SCALE San Diego and Open San Diego are hosting a workshop on analyzing crime data with Python, Pandas and Matplotlib. Unlike the past analyses we have done at the Library on San Diego crime, this analysis uses data from the San Diego Police rather than the whole-county data from ARJIS, so it is more focused and more detailed. If you’d like to go, visit the signup page to register.

It is an interesting dataset because, while the location addresses are often missing, and it doesn’t have UCR crime codes, it does have detailed call types and is linked to SDP beats. The 2.5 years of data that are published have more than 1.1M records. The San Diego Open Data Portal also has shapefiles for the beats, districts and neighborhoods, so we can make maps.

For instance, here is a choropleth that is colored according to the counts of calls for “LOUD PARTY” by beat. It should surprise no one where the hotspot is:

The dataset also has very nicely formatted date/times, so Pandas has an easy time extracting time parts, allowing us to build Rhythm Maps. This heat map displays the count of LOUD PARTY incidents, over the whole dataset, organized by hour and month:

You can clearly see two significant patterns: Loud Party calls are, as expected, primarily made in the late night and early morning, and the calls are more frequent in the summer than in the winter.
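For anyone who wants to try this at home, here is a minimal sketch of how an hour-by-month table like the one behind the heat map can be built with Pandas. It is not the notebook's actual code: the sample rows are made up, and the column names `date_time` and `call_type` and the helper `rhythm_table` are assumptions for illustration, so adjust them to match the real file.

```python
import pandas as pd

# Made-up sample rows. The column names `date_time` and `call_type` are
# assumptions, not necessarily the field names in the published dataset.
calls = pd.DataFrame({
    "date_time": [
        "2017-07-01 23:15:00",
        "2017-07-02 00:40:00",
        "2017-07-15 23:50:00",
        "2017-01-07 23:05:00",
        "2017-01-08 14:30:00",
    ],
    "call_type": [
        "LOUD PARTY", "LOUD PARTY", "LOUD PARTY", "LOUD PARTY", "BURGLARY",
    ],
})


def rhythm_table(df, call_type):
    """Count calls of one type in an hour-of-day by month grid."""
    subset = df[df["call_type"] == call_type].copy()
    # Parse the timestamps, then pull out the time parts we want to group on.
    ts = pd.to_datetime(subset["date_time"])
    subset["hour"] = ts.dt.hour
    subset["month"] = ts.dt.month
    # Rows are hours, columns are months; each cell is a count of calls.
    table = pd.crosstab(subset["hour"], subset["month"])
    # Fill in hours and months with no calls so the grid is always 24 x 12.
    return table.reindex(index=range(24), columns=range(1, 13), fill_value=0)


loud_party = rhythm_table(calls, "LOUD PARTY")
```

The resulting `loud_party` table can be handed to a plotting call such as Matplotlib's `imshow` to draw the heat map. Reindexing to a full 24 x 12 grid keeps quiet hours and months visible as zeros instead of dropping them from the chart.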

There are certainly a lot of other interesting patterns to find in this dataset, and if you are interested in finding them, I hope to see you at tonight’s meeting.

BTW, if you’d like to see how these charts were generated, the Jupyter Notebook is on GitHub.



Crime and Community Data Challenge

Posted by on Apr 3, 2016 in New Data | 0 comments

To announce the arrival of a new set of crime data, our next meetup will be a mini data contest, with a $100 prize for the best student analysis. In this meeting, we will present the new Crime Incident dataset and talk about how to link it to other social datasets. After the presentation, we’ll challenge you to do your own analysis, with a $100 prize for the best analysis from a student, undergraduate or lower.

Then, for the next meeting, we’ll invite the best analysts to present their findings and techniques.


New Crime Data

Posted by on Mar 23, 2016 in New Data | 0 comments

When we last requested crime data from SANDAG, 3 years ago, it took four months of negotiation to get them to admit they could produce it, and two more months to get the price down to a reasonable amount.

Last week when I requested an update, I got one clarification email, then a phone call, and the files were in my inbox a few minutes later. Thanks SANDAG! As a bonus, the data is now geocoded to census block and tract, making the files more immediately analyzable.

You can find both the old data file and the new one in an Ambry package on our new data repository.


Drugs and Alcohol in Downtown and East Village

Posted by on Nov 12, 2013 in New Data, Projects | 0 comments

For the last few months, a team of geography students at SDSU have been working with the crime data provided by the Library, producing analyses and visualizations of the data.

Elias Issa has been looking at Drugs and Alcohol violations in Downtown San Diego and East Village. He writes:

The Hot Spot tool calculates the Gi* statistic for each feature in a dataset. The resultant z-scores and p-values tell you where features with either high or low values cluster spatially. To be a statistically significant hot spot, a feature must have a high value and be surrounded by other features with high values as well. My animated maps illustrate 2 hot spots from 2007 to 2012. The most significant hot spot is located close to the St. Vincent de Paul homeless shelter (Imperial Ave) in the southeastern part of East Village. The second hot spot is located north of the Gaslamp Quarter, between Broadway and Market, where most of the famous and popular bars are found. In addition to those maps, and based on those hot spots, I did some statistical analysis to show the monthly (above/below average) and yearly averages of Drugs/Alcohol violations. My study reveals that there is a slight increase in the average of Drugs/Alcohol violations in 2011 and 2012.
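To make the Gi* statistic a little more concrete, here is a small sketch of the standard Getis-Ord Gi* z-score computed with NumPy. It is not Elias' workflow or the ArcMap tool itself: the function name `getis_ord_gi_star`, the five made-up violation counts, and the simple "each block neighbors the blocks next to it, plus itself" weights matrix are all assumptions for illustration.

```python
import numpy as np


def getis_ord_gi_star(values, weights):
    """Return the Getis-Ord Gi* z-score for each feature.

    values  : 1-D array of per-feature values (e.g. violation counts).
    weights : n x n spatial weights matrix; row i holds the weights of
              every feature j relative to feature i, including i itself.
    """
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    n = x.size

    mean = x.mean()
    # Population standard deviation of the values.
    s = np.sqrt((x ** 2).mean() - mean ** 2)

    w_sum = w.sum(axis=1)            # sum of weights for each feature
    w_sq_sum = (w ** 2).sum(axis=1)  # sum of squared weights

    # Weighted local sum minus what we would expect under no clustering.
    numerator = w @ x - mean * w_sum
    denominator = s * np.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
    return numerator / denominator


# Hypothetical example: five blocks in a row, with high counts at one end.
counts = [10, 9, 1, 1, 1]
# Binary weights: each block is a neighbor of itself and adjacent blocks.
neighbors = np.array([
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
])
z_scores = getis_ord_gi_star(counts, neighbors)
```

Large positive z-scores mark features that sit among other high values (hot spots), and large negative z-scores mark cold spots. In this toy example the first block gets the highest score because both it and its only neighbor have high counts.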

Over the course of the project, he has been experimenting with various ways to visualize the time component of geographic data. This is quite difficult, since you can’t easily scan the time dimension like you can in space. Visual processing is tuned for noticing changes and differences — like a deer that won’t notice you if you don’t move — so Elias’ visualization is best for quickly identifying areas that deserve more analysis, rather than showing the quantitative differences.

Due to the difficulty of creating animations like this in ArcMap, the video has only one frame per year, but that is enough to illustrate how the changes from frame to frame draw your eye to problem areas. Without a visualization like this, it is easy to miss some of the most important features of an issue, including short-term spikes and long-term trends.

After identifying an area to focus on through the visualization, Elias’ underlying statistical method serves to quantify the differences between times or locations, so this project is a great example of a way to use animation to partition a large problem space into components that can be analyzed in detail.

