Healthy Food Access Data Library Project

Posted on May 15, 2016 in News, Old Projects

Project Management Site

Collect and analyze data about the food system in San Diego county.

The San Diego Food System Alliance’s Healthy Food Access Working Group is developing an indicator library to analyze food access issues, and we need your help to locate datasets, wrangle them into usable shape, and create visualizations.

The work builds on the topics of our March 2015 Data Contest, with the added task of building a reusable data library to support further analysis.

This project needs volunteers with a range of skills, including:

  • Administration and logistics: Call potential data providers, locate datasets, and arrange meetings and events.
  • Data wranglers: People skilled with either Excel or Python to manipulate datasets.
  • Data analysts: People who know R or Python/Pandas.

We will be starting with a list of potential datasets, from which we will construct Ambry Data Bundles. We can load the bundles into a data library. Then we can do visualizations and analysis, such as this map from a project at Palomar College.

How To Participate

Register for an account at our Redmine Project Site, and I’ll add you to the Healthy Food Access Data Library project.


Team meetings will be posted to our site, our Google Calendar, and our mailing lists. We’ll have our first meetings to get started in mid-December.



SDSU Datathon

Posted on Jan 15, 2015 in News, Old Projects

Solve Social Problems for $2,700 in Prizes

The San Diego Regional Data Library, the SDSU Society for Statisticians and Actuaries and Teradata are organizing a data analysis contest to aid nonprofits, journalists and government agencies in making better use of data, develop a broader regional capacity for data analysis, and introduce students interested in data analysis to future employers.

In this contest, student teams will work on one of three projects, addressing education, food insecurity or community development. After the February 28th kickoff, teams will have one week to analyze the data and visualize the analysis, before presenting the results to the public at the March 10 awards ceremony.

Data Science FunConference. The Data Contest is running together with a Data Science Conference run by the Python Meetup group, 5 other meetup groups, companies and universities. The conference will feature training sessions, data science training, and software demos. Visit the conference registration page for a full schedule.

No Experience Necessary. This event will be held in conjunction with the San Diego Python User Group and San Diego Data Science, who will be offering a data science training class the morning before the contest kickoff. Come early to learn data analysis techniques in your favorite language, then apply them in the contest later in the afternoon. Additionally, Teradata employees will serve as mentors to teams, to offer guidance or help if you get stuck.

No Team Required. Singleton analysts are welcome! You can come early for the training session to meet other single analysts, or join a team at the event.

College, High-school or Pro. The contest is open to high-school students and college students. Professionals and post-students are welcome to participate too, but only students are eligible for prizes.


Data Contest, 28 Feb, 1pm

Register for Contest

Data Science Conference, 28 Feb, 8:30am

Register For Conference

Awards Ceremony, 10 March, 6:30pm


Not ready to register? Join the mailing list for announcements and updates


Non-analysts Needed. The data projects need people with many different skills, including programmers, visual designers, technical writers and presenters. Your skills will be valuable even if you aren’t a statistician.


Here is the complete schedule:


Both the Data Conference and the Data Contest will be held at SDSU, in Peterson Gym, room 153.

San Diego State University – Peterson Gym 153

More Information

If you are interested in being involved in the contest, as a project sponsor, mentor or contestant, contact Eric Busboom by email or by phone at (619) 363-2607. Or, subscribe to our email list for future announcements.


Viz Crime with Python and Javascript, Go To Cool Parties With Free Food

Posted on Jan 5, 2015 in Projects

We’re looking for some programmers to visualize crime data and present it at our booth at the San Diego Magazine Big Ideas Party on Jan 21. The Data Library was one of the 25 Big Ideas covered in their January issue, so they’d like us to have a presentation at the party.

I’d like to have an interactive display, probably using D3, that shows a crime hot spot map for the region, as well as a collection of time-based Rhythm maps for selected areas. A visitor to the booth could select a neighborhood or city, see the hot spots in that area, and see how the crime incidents change in that area over time.

So, we’ll need a Python programmer for the server side (Pandas for analysis, Flask or similar for the server) and a JavaScript person for the front end. Someone with solid visual design skills would be a plus.

You’ll get a ticket to the party on the 21st, to share in the glory, get free food, and do some high-quality hobnobbing.

If you are interested, send me an email with a link to your Github/Bitbucket/etc account or portfolio. We can use any number of volunteers, but I only have three free tickets.


Student Data Analysis Contest

Posted on Aug 21, 2014 in Old Projects

Give your nonprofit, agency or news organization valuable data-driven insights by sponsoring a project at our student data analysis contest. 

The San Diego Regional Data Library, the SDSU Society for Statisticians and Actuaries and Teradata are organizing a data analysis contest to aid nonprofits, journalists and government agencies in making better use of data, develop a broader regional capacity for data analysis, and introduce students interested in data analysis to future employers.

Contestant teams will have one week in early March 2015 to answer a set of data-driven questions and visualize the results for one of four projects. Each project will be provided by a nonprofit, government agency or news organization.

The contest will be announced in early January, primarily to college students. For a month before the contest begins, the San Diego Regional Data Library will run a special session of its Practical Data Program to train contestants to analyze data with Python, R, and IPython.

The contest will be judged based on both the work submitted by the teams after the end of the week and the teams’ presentations at an evening event the following week.

We are recruiting sponsoring organizations to participate in the contest as:

  • Project Sponsors
  • Prize Sponsors
  • Judges
  • Team Mentors

Project Sponsors. Project sponsors are nonprofits, news organizations or government agencies that will describe a compelling problem that the Library staff can turn into a suitable contest. Projects should address an important social issue such as:

  • Hunger
  • Homelessness
  • Health
  • Transportation
  • Education

Project sponsors will provide a staff member for the kickoff event, to present the project to contestants and answer questions the contestants may have. Project sponsors don’t need to provide financial support, although they may sponsor additional prize money for their project to encourage more contestants to select that project.

Project sponsors will get:

  • Valuable data-driven insights to core operational issues.
  • Introductions to skilled analysts who can volunteer for additional data work.
  • Promotion in contest marketing materials.

Prize Sponsors. While the main prizes are covered by our sponsor, Teradata, other organizations can sponsor a side prize of their choosing. The side prizes can be awarded for anything the sponsor wants, such as:

  • Best visualization
  • Most compelling explanation of a social issue
  • Best statistical work

Project sponsors may offer a side prize for their project to encourage more contestants to choose it, or encourage a particular focus on the project.

Mentors. Because the contestants will be students, it will be very valuable to have data analysis professionals work with the teams. Mentors can either work with the teams over the course of the week of the contest, a commitment of up to about 10 hours, or be available for consulting at one or both of the weekend events, a commitment of about 4 hours.

Judges. Prize Sponsors can provide a judge for the main prizes or for their side prize. We are also recruiting judges from the project sponsors and journalists.

Participating in the Contest

If you are interested in being involved in the contest, as a project sponsor, mentor or contestant, contact Eric Busboom by email or by phone at (619) 363-2607. Or, subscribe to our email list for future announcements.


Drugs and Alcohol in Downtown and East Village

Posted on Nov 12, 2013 in New Data, Projects

For the last few months, a team of geography students at SDSU has been working with the crime data provided by the Library, producing analyses and visualizations of the data.

Elias Issa has been looking at Drugs and Alcohol violations in Downtown San Diego and East Village. He writes:

The Hot Spot tool calculates the Gi* statistic for each feature in a dataset. The resultant z-scores and p-values tell you where features with either high or low values cluster spatially. To be a statistically significant hot spot, a feature must have a high value and be surrounded by other features with high values as well. My animated maps illustrate 2 hot spots from 2007 to 2012. The most significant hot spot is located close to the St. Vincent de Paul homeless shelter (Imperial Ave) in the southeastern part of East Village. The second hot spot is located north of the Gaslamp Quarter between Broadway and Market, where most of the famous and popular bars are found. In addition to those maps, and based on those hot spots, I did some statistical analysis to show the monthly (above/below average) and yearly averages of drug/alcohol violations. My study reveals that there is a slight increase in the average number of drug/alcohol violations in 2011 and 2012.
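For readers who want to see what a Gi* z-score looks like outside of ArcMap, here is a minimal pure-Python sketch of the Getis-Ord Gi* formula. It is not the ArcMap Hot Spot tool and not Elias' workflow: it assumes simple binary weights (a feature's neighborhood is itself plus whatever features you list for it), and the helper names gi_star and grid_neighbors, the 5x5 grid, and the counts are all made up for illustration.

```python
import math


def gi_star(values, neighbors):
    """Return the Getis-Ord Gi* z-score for each feature.

    values: one numeric count per feature (for example, incidents per cell).
    neighbors: neighbors[i] lists the indices in feature i's neighborhood,
    and must include i itself. Weights are binary (assumption): 1 for a
    listed feature, 0 for everything else.
    """
    n = len(values)
    mean = sum(values) / n
    # Population standard deviation of all values; clamp tiny negative
    # rounding errors to zero before taking the square root.
    s = math.sqrt(max(0.0, sum(v * v for v in values) / n - mean * mean))
    scores = []
    for i in range(n):
        w_sum = len(neighbors[i])   # sum of the binary weights
        w_sq_sum = w_sum            # 1 squared is 1, so the sums match
        local_sum = sum(values[j] for j in neighbors[i])
        numerator = local_sum - mean * w_sum
        denominator = s * math.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
        scores.append(numerator / denominator if denominator else 0.0)
    return scores


def grid_neighbors(rows, cols):
    """Neighborhoods for a row-major grid: each cell plus the cells touching it."""
    result = []
    for r in range(rows):
        for c in range(cols):
            result.append([
                rr * cols + cc
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ])
    return result


# Made-up example: a 5x5 grid with a cluster of high counts in one corner.
counts = [10 if r < 3 and c < 3 else 1 for r in range(5) for c in range(5)]
z_scores = gi_star(counts, grid_neighbors(5, 5))
# Cells with z above roughly 1.96 are candidate hot spots at the 95% level.
hot_cells = [i for i, z in enumerate(z_scores) if z > 1.96]
```

The cell in the middle of the high-count cluster gets the largest z-score because it has a high value and is surrounded by high values, which is the behavior described above. Real crime data would need weights based on actual distances between features rather than grid adjacency.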

Over the course of the project, he has been experimenting with various ways to visualize the time component of geographic data. This is quite difficult, since you can’t easily scan the time dimension like you can in space. Visual processing is tuned for noticing changes and differences — like a deer that won’t notice you if you don’t move — so Elias’ visualization is best for quickly identifying areas that deserve more analysis, rather than showing the quantitative differences.

Due to the difficulty of creating animations like this in ArcMap, the video has only one frame per year, but that is enough to illustrate how the changes from frame to frame draw your eye to problem areas. Without a visualization like this, it is easy to miss some of the most important features of an issue, including short-term spikes and long-term trends.

After identifying an area to focus on through the visualization, Elias’ underlying statistical method serves to quantify the differences between times or locations, so this project is a great example of a way to use animation to partition a large problem space into components that can be analyzed in detail.



Population Density Maps

Posted on Apr 22, 2013 in Analysis, New Data, Projects

Here is some eye candy, a population density map of Pacific Beach and surrounding neighborhoods.


This map was created with a lot of Python code, using the 2010 census shapefiles for census blocks, setting a value for each block as the population of the block divided by the area of the block, and rasterizing all of the blocks to an image. Red indicates areas of higher population density. You can clearly pick out the areas in Pacific Beach that are zoned for apartments vs single family homes, the UTC high-rise apartment area, and many other variations in land use.
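The basic idea of dividing each block's population by its area and rasterizing the result can be sketched in a few lines. This is only an illustration, not the code that produced the map: it assumes blocks are simple polygons in planar coordinates rather than real census shapefiles, and the function names, the block dictionaries, and the numbers are all hypothetical.

```python
def polygon_area(points):
    """Area of a simple polygon via the shoelace formula.

    points: list of (x, y) vertices in order around the polygon.
    """
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def contains(points, x, y):
    """Ray-casting test: is the point (x, y) inside the polygon?"""
    inside = False
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge crosses the horizontal line at y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize_density(blocks, width, height, cell_size=1.0):
    """Return a height x width grid of population density values.

    Each cell takes the density of the block containing its center point,
    or 0.0 if no block covers it.
    """
    grid = [[0.0] * width for _ in range(height)]
    for block in blocks:
        density = block["population"] / polygon_area(block["polygon"])
        for row in range(height):
            for col in range(width):
                cx = (col + 0.5) * cell_size
                cy = (row + 0.5) * cell_size
                if contains(block["polygon"], cx, cy):
                    grid[row][col] = density
    return grid


# Two made-up square blocks, side by side, each with an area of 4 units.
blocks = [
    {"population": 400, "polygon": [(0, 0), (2, 0), (2, 2), (0, 2)]},
    {"population": 40, "polygon": [(2, 0), (4, 0), (4, 2), (2, 2)]},
]
density_grid = rasterize_density(blocks, width=4, height=2)
```

Testing every cell against every block is far too slow for a whole county, so a real pipeline would hand the rasterizing to a GIS library and work in a projected coordinate system so that areas are meaningful.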

This map is a test of code I’m creating to allow any census variable to be mapped, but I’m not really happy with the result. The problem is that human brains like to see smooth variations in density, and the jarring discontinuities in this map are confusing. Some of the time, the abrupt changes in density are connected to changes in land use, since census boundaries tend to follow streets, but most of the time what map users are really more interested in is how people respond to density, and in those cases, human movements and behaviors don’t follow sharp boundaries.

To address this issue, I will be converting these maps into the same grid structure that we use for crime maps and smoothing across the grid cells to remove the discontinuities. These modified maps won’t show the population density with the same accuracy, but they will be easier for people to interpret in ways that are relevant to their real interests in population density.
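As a rough illustration of what smoothing across grid cells does, here is a minimal sketch that replaces each cell with the average of itself and its neighbors. The post does not say which smoothing method will be used, so the box (moving average) filter here is an assumption, and the function name, radius parameter, and sample grid are made up.

```python
def smooth(grid, radius=1):
    """Box-filter a 2D grid: each cell becomes the mean of its window.

    The window covers all cells within `radius` rows and columns. At the
    edges the window is simply smaller, so edge cells average over fewer
    neighbors.
    """
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                grid[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            out[r][c] = sum(window) / len(window)
    return out


# A made-up grid with a sharp jump in density between two blocks.
sharp = [
    [100.0, 100.0, 10.0, 10.0],
    [100.0, 100.0, 10.0, 10.0],
]
smoothed = smooth(sharp)
```

In this example the jump from 100 to 10 becomes a gradual step down across the row. That shows the trade-off described above: the smoothed values near the boundary no longer match either block's true density, but the map reads as a continuous surface.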
