San Diego Data Knowledge Base

Posted on Nov 27, 2017 in News

We’re building a new knowledge base, a collection of example Jupyter notebooks related to projects the Library has been working on. The notebooks include detailed analyses using Jupyter, Python, Pandas and other tools, and most of them cover crime and demographics. For instance, the Demographics and Business Loans notebook shows how to fetch American Community Survey data and create a radius around the highest minority area in San Diego, displaying it on a Leaflet map:

Or, from the San Diego Police Calls For Service, a rhythm map of reports of loud parties, by neighborhood.


Loud Party Rhythm Map

Each of the notebooks includes a link to our example notebooks GitHub repository, so you can also get the full source and run them yourself.
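As a rough illustration of the radius step mentioned above for the Demographics and Business Loans notebook, here is a minimal sketch that selects points within a given distance of a center. It is not the notebook's code: the function names, the coordinates and the tract labels are all hypothetical, and it uses the standard haversine formula with only the Python standard library.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(center, points, radius_km):
    """Return the names of points no farther than radius_km from center."""
    lat0, lon0 = center
    return [name for name, lat, lon in points
            if haversine_km(lat0, lon0, lat, lon) <= radius_km]


# Hypothetical center and tract centroids, for illustration only.
center = (32.70, -117.10)
tracts = [
    ("tract A", 32.71, -117.11),  # roughly 1.5 km away
    ("tract B", 33.10, -117.10),  # roughly 44 km away
]
nearby = within_radius(center, tracts, radius_km=5)
print(nearby)
```

The selected points could then be drawn on a Leaflet map with whatever mapping library the notebook uses.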



Exploring San Diego Crime Data using Python – Workshop

Posted on Nov 7, 2017 in Meeting, New Data

Tonight at Downtown Works, SCALE San Diego and Open San Diego are hosting a workshop on analyzing crime data with Python, Pandas and Matplotlib. Unlike past analyses we have done at the Library on San Diego crime, this analysis uses data from the San Diego Police rather than the countywide data from ARJIS, so it is more focused and more detailed. If you’d like to go, visit the signup page to register.

It is an interesting dataset because, while the location addresses are often missing, and it doesn’t have UCR crime codes, it does have detailed call types and is linked to SDP beats. The 2.5 years of data that are published have more than 1.1M records. The San Diego Open Data Portal also has shapefiles for the beats, districts and neighborhoods, so we can make maps.

For instance, here is a choropleth that is colored according to the counts of calls for “LOUD PARTY” by beat. It should surprise no one where the hotspot is:

The dataset also has very nicely formatted date/times, so Pandas has an easy time extracting time parts, allowing us to build Rhythm Maps. This heat map displays the count of LOUD PARTY incidents, over the whole dataset, organized by hour and month:

You can clearly see two significant patterns: Loud Party calls are, as expected, primarily made in the late night and early morning, and the calls are more frequent in the summer than in the winter.
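The counting behind a rhythm map like this can be sketched in a few lines. This is a standard-library illustration rather than the notebook's Pandas code, and the sample rows, timestamp format and function name are assumptions made up for the example.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample rows: (timestamp string, call type). The real
# calls-for-service data has its own column names and over 1.1M records.
SAMPLE_CALLS = [
    ("2016-07-02 23:15:00", "LOUD PARTY"),
    ("2016-07-03 00:40:00", "LOUD PARTY"),
    ("2016-07-09 23:50:00", "LOUD PARTY"),
    ("2016-01-16 23:05:00", "LOUD PARTY"),
    ("2016-07-04 14:00:00", "TRAFFIC STOP"),
]


def rhythm_grid(calls, call_type):
    """Count calls of one type in a 24 x 12 grid (rows: hour 0-23, columns: month 1-12)."""
    counts = Counter()
    for stamp, kind in calls:
        if kind != call_type:
            continue
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        counts[(when.hour, when.month)] += 1
    # Missing (hour, month) pairs count as zero in a Counter.
    return [[counts[(hour, month)] for month in range(1, 13)]
            for hour in range(24)]


grid = rhythm_grid(SAMPLE_CALLS, "LOUD PARTY")
print(grid[23][6])  # calls in the 11 PM hour during July (column index 6)
```

Passing a grid like this to a heat map plotting function gives the hour-by-month picture described above.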

There are certainly a lot of other interesting patterns to find in this dataset, and if you are interested in finding them, I hope to see you at tonight’s meeting.

BTW, if you’d like to see how these charts were generated, the Jupyter Notebook is on GitHub.



Explore Society with Social Data

Posted on Oct 5, 2017 in News

Tonight we’ll be meeting to talk about two important collections of survey data: the General Social Survey (GSS) and IPUMS, a broad collection of integrated survey datasets. Additionally, we’ll be meeting with David Lynn, founder of Mission Driven Finance, to talk about his volunteer data project to analyze business lending to minorities in San Diego County.

In the main meeting, I’ll show how to use the web-analysis tools for the GSS and IPUMS. You can visit those sites now to explore a bit, and it would be worthwhile for you to create an account before the meeting. Here are the important URLs:

We’ll also talk about some of the details of how surveys are constructed and how to use them. When we’re finished, you’ll be ready to explore important social questions, such as how people’s optimism for the future varies according to how many children they have, or whether cat owners are smarter than dog owners. (Seriously, that’s in the GSS!)

Hope to see you tonight.


Explore Social Issues with Microdata

Posted on Sep 15, 2017 in Meeting

In this hands-on presentation, we’ll be exploring social issues using two sources of microdata: the General Social Survey, a 40-year survey program that asks a representative sample of Americans questions on a huge number of social issues, and IPUMS, a curated, processed collection of international survey and census data. Using these two sources, you can study a wide range of social questions.

For instance, here is an analysis of the relationship between Occupational Prestige and Personal Income, using the IPUMS online analysis tool, with data from the ACS 2010-2015:

Red means that the combination of that prestige group and that income group occurs more often than would be predicted by chance. The red diagonal shows a correlation between prestige and income, with higher prestige being associated with higher income. Since income is a component of job prestige, this is exactly what you would expect. We’ll be demonstrating how to construct analyses like this, as well as analyses using Jupyter notebooks. The meeting will consist of both a presentation and a demonstration. Bring a laptop, and you can explore some of the data during the meeting.
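To make "more frequent than would be predicted by chance" concrete, here is a minimal sketch that compares observed cell counts in a cross-tabulation with the counts expected if the two variables were independent. The counts are invented for illustration and are not IPUMS results, and the online tool's exact coloring statistic is not described here, so treat this as the general idea only.

```python
def expected_counts(observed):
    """Expected cell counts if the row and column variables were independent."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    # Under independence, each cell is (row total * column total) / grand total.
    return [[r * c / total for c in col_totals] for r in row_totals]


# Hypothetical counts: rows are prestige groups (low, high),
# columns are income groups (low, high).
observed = [
    [40, 10],
    [10, 40],
]
expected = expected_counts(observed)

# Cells where the observed count exceeds the expected count are the
# ones a chart like this would shade toward red.
over_represented = [
    [obs > exp for obs, exp in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]
print(expected)
print(over_represented)
```

In this made-up table every expected count is 25, so the two diagonal cells (low/low and high/high) are over-represented, which is the same diagonal pattern described above.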

Also, we will spend about 30 minutes on updates and discussion about the ongoing Mission Driven Credit Analysis project, including an overview of the FFIEC data sources for studying lending patterns.

Please visit our page for the meeting to RSVP. 


Mission Driven Credit Analysis

Posted on Aug 20, 2017 in Current Projects

We’re excited to announce the first of our next round of social data projects.

Mission Driven Finance is a San Diego impact investment fund that provides loans to nonprofits, social enterprises, and businesses that benefit the San Diego community. MDF wants to help businesses understand the regional lending landscape and to advocate for improvements that make it easier for businesses that struggle to get financing to obtain the money they need to grow.

This project will explore:

  • Characterizing the state of small and medium business (SMB) lending in SD County.
  • Visualizing regions of the county by Credit Score, similar to this map.
  • Demographic analysis of SMB lending.

We will be collecting data related to businesses, demographics and lending and creating some datasets and visualizations.

We will probably start project meetings in October, but may get started with wrangling data before then. More details about the project are in the project wiki and the Initial Questions document, and you can register for the kickoff meeting.

Here is a map that is the inspiration for the project; we’d like to produce something similar for San Diego:




Pro Bono Data Analysis This Summer

Posted on Apr 4, 2017 in Analysis, News, Projects

Disadvantaged Student Performance

For summer 2017, the San Diego Regional Data Library will be running a data internship program, working on pro bono data analysis projects for nonprofits, governments and journalists in San Diego County. If your organization has questions that can be answered with data, you can have an undergraduate data analyst work on your project, with professional guidance, for up to 12 weeks.

Project sponsors must be able to describe their needs as a set of questions that can be answered with available data, and must be able to meet with the interns at their site for 1 hour per week. Projects must have a social goal or benefit.

We’re currently running two projects, and expect to be running three or four more this summer.

If your organization needs data analysis this summer, please contact the Data Library Director, Eric Busboom at (619) 363-2607 or
