Pro Bono Data Analysis This Summer

Posted on Apr 4, 2017 in Analysis, News, Projects

Disadvantaged Student Performance

For the summer of 2017, the San Diego Regional Data Library will be running a data internship program, working on pro bono data analysis projects for nonprofits, governments, and journalists in San Diego County. If your organization has questions that can be answered with data, you can have an undergraduate data analyst work on your project, with professional guidance, for up to 12 weeks.

Project sponsors must be able to describe their needs as a set of questions that can be answered with available data, and must be able to meet with the interns at their site for 1 hour per week. Projects must have a social goal or benefit.

We’re currently running two projects, and expect to be running three or four more this summer.

If your organization needs data analysis this summer, please contact the Data Library Director, Eric Busboom at (619) 363-2607 or


2015 Data Contest Winners

Posted on Mar 12, 2015 in Analysis, News

We completed our 2015 Data Contest with final presentations and winners at the awards ceremony on Tuesday. Here are the winners and their presentations:

  1. UCSD MAS Data Science, Time and Space Analysis of Food Distribution
  2. irHacker, California Suspensions
  3. Flash and Shadow, A Visual Geographical Study on Location, Availability, Public Transportations and Crime Exposure
  4. A Mathematical Modeling Team, Are Some Teachers Just “Meaner” than Others?

We also have two Honorable Mentions:

Thank you all for participating! The submissions were very valuable for the nonprofits that were involved, and we’re looking forward to the contest next year. Until then, if you’d like to get involved in other nonprofit data analysis projects, join the Practical Data Program for announcements about upcoming projects.


Data Contest Submissions

Posted on Mar 10, 2015 in Analysis

We completed the 2015 SDSU Data Contest on Saturday, with a fantastic collection of submissions. The judges are reviewing them now, and until the winners are announced at the Awards Ceremony on Tuesday, you can see all of the submissions here.

Everyone is welcome at the Awards Ceremony, so follow the link to register.

UCSD MAS Data Science


Flash and Shadow




UCSD / SDSU Alliance

Team IQ

Kearny Komets

A Mathematical Modeling Team


When Average Fails: Bimodal Distributions

Posted on Jan 29, 2015 in Analysis

Probably the most common statistic that people deal with is the average, which can often be a good approximation of the typical or general case. However, there are many cases where the average fails, and the most extreme example I’ve seen in recent data is lawyers’ salary distributions.

The NALP has been publishing salary distributions for a few years, and the blog Social Evolution Forum provides a good overview of the distribution. Since 2000 or so, the distribution has been extraordinarily bimodal, making the average, as well as the median, a poor statistic for representing the typical case.

Lawyers' salary distribution, 2013 graduates.

In these situations, the average says little about the typical case because few people actually earn a salary near it, and it is better to report the two (or three) modes.
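To make the point concrete, here is a minimal sketch using synthetic, made-up salaries (not the NALP data). The cluster centers, cluster sizes, the $10,000 bin width, and the `find_modes` helper are all assumptions chosen for illustration. It shows that the mean lands in a range where almost nobody's salary falls, while the histogram peaks recover the two clusters.

```python
import random
import statistics
from collections import Counter

random.seed(0)  # fixed seed so the sketch is repeatable

# Synthetic, made-up salaries (not NALP data): two clusters of graduates.
low_cluster = [random.gauss(55_000, 7_000) for _ in range(600)]
high_cluster = [random.gauss(165_000, 5_000) for _ in range(400)]
salaries = low_cluster + high_cluster

mean = statistics.mean(salaries)
median = statistics.median(salaries)

# Bin salaries into $10,000-wide buckets, keyed by the bucket's lower edge.
BIN = 10_000
hist = Counter(int(s // BIN) * BIN for s in salaries)

def find_modes(hist, bin_width, min_share=0.05):
    """Hypothetical helper: bins that beat both neighbours and hold
    at least min_share of all observations."""
    total = sum(hist.values())
    modes = []
    for edge, count in sorted(hist.items()):
        left = hist.get(edge - bin_width, 0)
        right = hist.get(edge + bin_width, 0)
        if count > left and count > right and count >= min_share * total:
            modes.append(edge)
    return modes

modes = find_modes(hist, BIN)

# How many salaries actually fall within one bin width of the mean?
near_mean = sum(1 for s in salaries if abs(s - mean) < BIN)

print(f"mean:   {mean:,.0f}")
print(f"median: {median:,.0f}")
print(f"modal bins (lower edges): {modes}")
print(f"salaries within {BIN:,} of the mean: {near_mean} of {len(salaries)}")
```

The median does no better here: with 60% of the sample in the lower cluster, it sits in that cluster's upper tail rather than at either peak.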

The blog Empirical Legal Studies has more of the charts and a discussion about the system that created them.


Burglary Rhythm Maps

Posted on Apr 18, 2014 in Analysis

A Rhythm Map is a heat map that displays time in both the X and Y dimensions. Rhythm maps are an excellent way to visualize repeating patterns in time, such as how crimes occur by hour of day and day of week. Here we look at some interesting patterns in burglaries in the City of San Diego.

First, here is the map for a range of crime types in San Diego, compiled from the type, time and date of about 400K crime incidents in the City of San Diego from 2006 to 2012.

Rhythm Map, All crimes in San Diego


Each square is a crime type. The vertical axis is the hour of the day, and the horizontal axis is the day of the week, with Sunday being the cell between 0 and 1. Darker red means there are more crimes than lighter red and yellow. The colors are not comparable across squares, only within a single square. So, the dark red cell at 5:00PM on Friday in the Burglary square may represent a very different number of crime incidents than the dark red cell at 12:00AM on Thursday in Sex Crimes. Also note that these views combine citations, arrests and reported crimes, and there may be different patterns when the maps are broken out on that factor.
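As a rough sketch of how one square of such a map could be built, the code below counts incidents into a 24 x 7 grid of hour by day of week, with Sunday as column 0, and then scales each grid by its own maximum. The per-square scaling is an assumption about how the colors were assigned, and the function names and incident data are made up for illustration.

```python
from collections import Counter
from datetime import datetime

def rhythm_grid(timestamps):
    """Count incidents per (hour, day) cell; returns 24 rows x 7 columns."""
    counts = Counter()
    for ts in timestamps:
        # datetime.weekday() is Monday=0 .. Sunday=6; shift so Sunday=0.
        day = (ts.weekday() + 1) % 7
        counts[(ts.hour, day)] += 1
    return [[counts[(h, d)] for d in range(7)] for h in range(24)]

def scale_within_square(grid):
    """Scale a grid by its own maximum so its busiest cell is 1.0."""
    peak = max(max(row) for row in grid)
    if peak == 0:
        return [[0.0] * len(row) for row in grid]
    return [[cell / peak for cell in row] for row in grid]

# Made-up incidents for two crime types (not the San Diego data).
# 2012-06-01 is a Friday, 06-04 a Monday, 06-07 a Thursday, 06-03 a Sunday.
burglary = [datetime(2012, 6, 1, 17)] * 40 + [datetime(2012, 6, 4, 10)] * 20
other_type = [datetime(2012, 6, 7, 0)] * 4 + [datetime(2012, 6, 3, 2)] * 1

burglary_counts = rhythm_grid(burglary)
other_counts = rhythm_grid(other_type)
burglary_colors = scale_within_square(burglary_counts)
other_colors = scale_within_square(other_counts)

# Both cells get the darkest color, but the underlying counts differ tenfold.
print(burglary_colors[17][5], burglary_counts[17][5])  # Friday 5:00PM
print(other_colors[0][4], other_counts[0][4])          # Thursday 12:00AM
```

Under this scaling, the darkest cell in each square always maps to 1.0, which is why the same color can stand for very different incident counts in different squares.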

There are a lot of interesting patterns here, but we’ll focus on Burglary. The first thing to notice is that there are two time ranges, visible as groups of darker red cells, when burglaries occur: during work hours on weekdays and on Friday evenings. (The strong line at noon is most likely an artifact of crimes for which the time is not known being given that value arbitrarily.)

What accounts for the two separate time ranges? First, let’s break it out by community. This chart uses Clarinova Place Codes for the community names.


Here we see that some communities exhibit one pattern or the other, and sometimes both. Downtown (SanDOW), La Jolla (SanLAJ) and Mira Mesa (SanMIR) show the Friday pattern, while Southeastern (SanSOT), Greater North Park (SanGRE) and Midtown (SanMID) show the weekday pattern.

Community distinctions may explain some of the differences in the patterns, but there is a factor that is probably more important: residential vs commercial crime. So, let’s split out the maps on that factor.

Here is where the distinctions become the strongest. In Otay Mesa (SanOAT), Mira Mesa (SanMIR), University (SanUNV) and others, the Friday evening pattern completely splits from the weekday pattern. However, we also see a new weekday pattern in the commercial burglaries in Clairemont (SanCLA), Uptown, and Midtown, with commercial burglaries occurring across the weekday evenings.

Those features are consistent with what you’d expect from burglary: the burglaries occur when businesses and homes are unoccupied. But that doesn’t explain why, in many communities, the commercial burglaries occur more frequently on Friday evenings. Another unusual pattern is that in Pacific Beach (SanPCF) there is a residential burglary cluster on Friday and Saturday evenings, with a similar but weaker pattern occurring in Uptown and College.

Rhythm maps are a powerful way to look for patterns in time-structured data, because they take advantage of the ways that human brains most quickly process visual information. However, they aren’t a complete solution; they are just a start. Before making any recommendations based on the data, we’d want to do a few statistical tests and, at the least, look at the absolute number of incidents per cell in the areas exhibiting patterns.
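One possible follow-up check, sketched here with made-up counts, is a chi-square goodness-of-fit statistic that asks whether incidents are spread evenly across the seven days of the week. This is only one candidate test and not necessarily the one we would settle on; the function name and the counts are assumptions for illustration. The critical value 12.592 is the standard chi-square threshold for 6 degrees of freedom at the 0.05 level.

```python
def chi_square_uniform(counts):
    """Chi-square statistic against the hypothesis that every
    category is equally likely."""
    total = sum(counts)
    expected = total / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)

# Chi-square critical value for 6 degrees of freedom (7 days - 1), alpha = 0.05.
CRITICAL_DF6_P05 = 12.592

# Made-up evening burglary counts, Sunday through Saturday (not real data).
large_area = [30, 28, 31, 29, 30, 62, 35]
# The same Friday-heavy shape, but with very few incidents in total.
small_area = [3, 3, 3, 3, 3, 6, 4]

for name, counts in [("large_area", large_area), ("small_area", small_area)]:
    stat = chi_square_uniform(counts)
    verdict = "exceeds" if stat > CRITICAL_DF6_P05 else "does not exceed"
    print(f"{name}: total={sum(counts)}, chi-square={stat:.2f}, "
          f"{verdict} the 0.05 critical value")
```

Both areas would show a dark Friday cell on a rhythm map, but only the one with more incidents gives a statistic above the threshold. With expected counts as low as those in `small_area`, the chi-square approximation is itself unreliable, which is another reason to look at absolute counts first.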

