Pro Bono Data Analysis This Summer

Posted on Apr 4, 2017 in Analysis, News, Projects

Disadvantaged Student Performance

For the summer of 2017, the San Diego Regional Data Library will be running a data internship program, working on pro bono data analysis projects for nonprofits, governments, and journalists in San Diego County. If your organization has questions that can be answered with data, you can have an undergraduate data analyst work on your project, with professional guidance, for up to 12 weeks.

Project sponsors must be able to describe their needs as a set of questions that can be answered with available data, and must be able to meet with the interns at their site for 1 hour per week. Projects must have a social goal or benefit.

We’re currently running two projects, and expect to be running three or four more this summer.

If your organization needs data analysis this summer, please contact the Data Library Director, Eric Busboom at (619) 363-2607 or eric@sandiegodata.org.


Age Friendly Communities Project

Posted on Dec 12, 2016 in Projects

At tomorrow night’s meeting, we’ll be kicking off two new data projects. The first is the Healthy Food Access project, previously announced, and the second is the Age Friendly Communities project, for which we’ve just posted the project page. In this project we will be collecting data to analyze the capacity and affordability of San Diego’s assisted living industry, considering the anticipated need for these services over the next 30 years. Hope you can join us.


Wrangling Data For Social Projects

Posted on Dec 6, 2016 in Projects

Next week we’ll be kicking off two new data projects, and a big part of these projects will be finding data, documenting it, and preparing it in a consistent way for analysis, a process known as data wrangling. I’ve been developing software for wrangling social data for a few years, and have collected many of the best ideas into a new metadata system called Metatab. Metatab is a system for storing structured metadata in a CSV file, often alongside data, making it easier to create and publish metadata.
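To show the flavor of metadata-as-CSV, here’s an illustrative sketch, not the actual Metatab package or its exact term vocabulary: each row of the file is a term followed by its value, and the whole document can be read with nothing but the standard library. The terms, dataset names, and URL below are hypothetical.

```python
import csv
import io

# A minimal, hypothetical Metatab-style document: each row is a term
# followed by its value, all stored in an ordinary CSV file.
METATAB_CSV = """\
Title,San Diego Food Access Indicators
Name,sandiegodata.org-food_access
Description,Datasets for the Healthy Food Access project
Datafile,http://example.com/food_access.csv
"""

def parse_terms(text):
    """Read term/value rows from a Metatab-style CSV into a dict."""
    terms = {}
    for row in csv.reader(io.StringIO(text)):
        if row:  # skip blank lines
            term, *values = row
            terms[term] = values[0] if len(values) == 1 else values
    return terms

metadata = parse_terms(METATAB_CSV)
print(metadata["Title"])  # San Diego Food Access Indicators
```

Because the document is just a CSV, it can live next to the data it describes and be edited in any spreadsheet program, which is the key design idea.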

In the next two data projects, we will be using the Metatab Google Spreadsheet Add-On to document the data we locate. Once a Metatab specification is created for a dataset, it can be uploaded directly from the Google spreadsheet to CKAN, our data repository software. And I’m currently working on other tools for finding and manipulating data.

When we are done with the main data wrangling, there will be collections of datasets in our main data repository related to food access and assisted living, and then we can start on data analysis, most likely using Pandas and Tableau, though we may also try a few AWS tools like AWS Athena and AWS QuickSight.
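As a taste of what the Pandas side of that analysis might look like, here is a small sketch. The dataset, column names, and numbers are all made up for illustration; the real indicators will come out of the wrangling work above.

```python
import pandas as pd

# Hypothetical mini-dataset standing in for the wrangled food-access
# data: grocery stores and population per census tract.
tracts = pd.DataFrame({
    "tract": ["001", "002", "003"],
    "population": [4200, 3100, 5600],
    "grocery_stores": [3, 0, 2],
})

# A simple access indicator: grocery stores per 1,000 residents.
tracts["stores_per_1k"] = 1000 * tracts["grocery_stores"] / tracts["population"]

# Flag tracts with no stores at all as candidates for closer study.
tracts["no_access"] = tracts["grocery_stores"] == 0
print(tracts)
```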

Register for the Meeting

 


Healthy Food Access Data Library Project

Posted on Aug 16, 2016 in Projects

Collect and analyze data about the food system in San Diego county.

The San Diego Food System Alliance’s Healthy Food Access Working Group is developing an indicator library to analyze food access issues, and we need your help to locate datasets, wrangle them into usable shape, and create visualizations.

The work is similar to the topics of our March 2015 Data Contest, with the additional goal of building a reusable data library to support further analysis.

This project needs volunteers with a range of skills, including:

  • Administration and logistics: Call potential data providers, locate datasets, and arrange meetings and events.
  • Data wranglers: People skilled with either Excel or Python to manipulate datasets.
  • Data analysts: Analysts who know R or Python/Pandas.

We will be starting with a list of potential datasets, from which we will construct Ambry Data Bundles. We can load the bundles into a data library. Then we can do visualizations and analysis, such as this map from a project at Palomar College.

How To Participate

To participate in this project, join the practical data program, then join the project mailing list by selecting the “Food Access” list under the “List Memberships” section of your profile page.

Team meetings will be posted to our Meetup.com site, the Practical Data Program site, and our Practical Data Program mailing lists. We’ll have our first meetings to get started in late August.


Viz Crime with Python and Javascript, Go To Cool Parties With Free Food

Posted on Jan 5, 2015 in Projects

We’re looking for some programmers to visualize crime data and present it at our booth at the San Diego Magazine Big Ideas Party on Jan 21. The Data Library was one of the 25 Big Ideas covered in their January issue, so they’d like us to have a presentation at the party.

I’d like to have an interactive display, probably using D3, that shows a crime hot spot map for the region, as well as a collection of time-based Rhythm maps for selected areas. A visitor to the booth could select a neighborhood or city, see the hot spots in that area, and see how the crime incidents change in that area over time.

So, we’ll need a Python programmer for the server side (Pandas for analysis, Flask or similar for the server) and a Javascript person for the front end. Someone with solid visual design skills would be a plus.
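To give a rough idea of the server side, here is a minimal sketch assuming Flask, with hypothetical, hard-coded counts standing in for the real crime data; the endpoint path and neighborhood names are invented, and a real version would compute the counts with Pandas.

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Hypothetical stand-in for the real crime dataset:
# monthly incident counts keyed by neighborhood.
INCIDENTS = {
    "downtown": {"2013-01": 42, "2013-02": 38},
    "north-park": {"2013-01": 17, "2013-02": 21},
}

@app.route("/api/incidents/<neighborhood>")
def incidents(neighborhood):
    """Return monthly incident counts for one neighborhood as JSON,
    ready for a D3 front end to render as a rhythm map."""
    counts = INCIDENTS.get(neighborhood)
    if counts is None:
        return jsonify(error="unknown neighborhood"), 404
    return jsonify(neighborhood=neighborhood, counts=counts)

# To serve locally: app.run(debug=True)
```

The D3 front end would fetch this JSON when a visitor selects a neighborhood and redraw the hot spot and rhythm maps from it.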

You’ll get a ticket to the party on the 21st, to share in the glory, get free food, and do some high-quality hobnobbing.

If you are interested, send me an email, with a link to your Github/Bitbucket/etc account or portfolio, to eric@sandiegodata.org. We can use any number of volunteers, but I only have three free tickets.


Drugs and Alcohol in Downtown and East Village

Posted on Nov 12, 2013 in New Data, Projects

For the last few months, a team of geography students at SDSU has been working with the crime data provided by the Library, producing analyses and visualizations of the data.

Elias Issa has been looking at Drugs and Alcohol violations in Downtown San Diego and East Village. He writes:

The Hot Spot tool calculates the Gi* statistic for each feature in a dataset. The resultant z-scores and p-values tell you where features with either high or low values cluster spatially. To be a statistically significant hot spot, a feature must have a high value and be surrounded by other features with high values as well. My animated maps illustrate two hot spots from 2007 to 2012. The most significant hot spot is located close to the St. Vincent de Paul homeless shelter (Imperial Ave) in the southeastern part of East Village. The second hot spot is located north of the Gaslamp Quarter between Broadway and Market, where most of the famous and popular bars are found. In addition to those maps, and based on those hot spots, I did some statistical analysis to show the monthly (above/below average) and yearly averages of drugs/alcohol violations. My study reveals a slight increase between 2011 and 2012 in the average number of drugs/alcohol violations.
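The Getis-Ord Gi* statistic Elias describes can be sketched in a few lines of NumPy. This is an illustrative implementation on a toy one-dimensional dataset, not his ArcMap workflow; the function name and the example values and weights are ours.

```python
import numpy as np

def gi_star(x, w):
    """Getis-Ord Gi* z-scores for values x under a spatial weights matrix w.

    x: (n,) array of observed values (e.g. crime counts per grid cell).
    w: (n, n) array of spatial weights; for Gi* each feature counts as
       its own neighbor, so the diagonal should be 1.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    xbar = x.mean()
    s = np.sqrt((x ** 2).mean() - xbar ** 2)
    wx = w @ x                    # weighted sum of values around each feature
    wsum = w.sum(axis=1)          # sum of weights per feature
    wsq = (w ** 2).sum(axis=1)    # sum of squared weights per feature
    denom = s * np.sqrt((n * wsq - wsum ** 2) / (n - 1))
    return (wx - xbar * wsum) / denom

# Toy example: 5 locations on a line, each a neighbor of itself and its
# immediate neighbors, with a high-value cluster on the right.
x = np.array([1, 1, 2, 8, 9])
w = np.eye(5)
for i in range(4):
    w[i, i + 1] = w[i + 1, i] = 1
z = gi_star(x, w)
print(np.round(z, 2))  # positive z-scores mark the hot spot on the right
```

A feature is a statistically significant hot spot when its z-score is large and positive (a cold spot when large and negative), which is exactly what the animated maps color by year.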

Over the course of the project, he has been experimenting with various ways to visualize the time component of geographic data. This is quite difficult, since you can’t easily scan the time dimension like you can in space. Visual processing is tuned for noticing changes and differences — like a deer that won’t notice you if you don’t move — so Elias’ visualization is best for quickly identifying areas that deserve more analysis, rather than showing the quantitative differences.

Due to the difficulty of creating animations like this in ArcMap, the video has only one frame per year, but that is enough to illustrate how the changes from frame to frame draw your eye to problem areas. Without a visualization like this, it is easy to miss some of the most important features of an issue, including short-term spikes and long-term trends.

After identifying an area to focus on through the visualization, Elias’ underlying statistical method serves to quantify the differences between times or locations, so this project is a great example of using animation to partition a large problem space into components that can be analyzed in detail.

 
