San Diego Data Knowledge Base

Posted on Nov 27, 2017 in News

We’re building a new knowledge base, a collection of example Jupyter notebooks related to projects the Library has been working on. The notebooks include detailed analyses using Jupyter, Python, Pandas and other tools, and most of them cover crime and demographics. For instance, the Demographics and Business Loans notebook shows how to fetch American Community Survey data and create a radius around the highest minority area in San Diego, displaying it on a Leaflet map:

Or, from the San Diego Police Calls For Service, a rhythm map of reports of loud parties, by neighborhood.


Loud Party Rhythm Map

Each of the notebooks includes a link to our example notebooks GitHub repository, so you can also get the full source and run them yourself.
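To give a feel for the radius step mentioned above, here is a minimal sketch, not the code from the notebook. It assumes each census tract is reduced to a centroid and uses the haversine formula to pick the tracts within a given distance of a center point. The tract names, coordinates and the `tracts_within` helper are all hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def tracts_within(center, tracts, radius_km):
    """Return the names of tracts whose centroid lies inside the radius."""
    clat, clon = center
    return [
        t["name"]
        for t in tracts
        if haversine_km(clat, clon, t["lat"], t["lon"]) <= radius_km
    ]


# Hypothetical tract centroids, for illustration only.
tracts = [
    {"name": "tract_a", "lat": 32.70, "lon": -117.10},
    {"name": "tract_b", "lat": 32.72, "lon": -117.11},
    {"name": "tract_c", "lat": 33.10, "lon": -117.30},
]

center = (32.70, -117.10)  # assumed center of the area of interest
nearby = tracts_within(center, tracts, radius_km=5.0)
print(nearby)
```

In a real analysis the centroids would come from the tract geometries, and the selected tracts would then be drawn on the map.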



Exploring San Diego Crime Data using Python – Workshop

Posted on Nov 7, 2017 in Meeting, New Data

Tonight at Downtown Works, SCALE San Diego and Open San Diego are hosting a workshop on analyzing crime data with Python, Pandas and Matplotlib. Unlike past analyses we have done at the Library on San Diego crime, this one uses data from the San Diego Police rather than the countywide data from ARJIS, so it is more focused and more detailed. If you’d like to go, visit the signup page to register.

It is an interesting dataset because, while the location addresses are often missing, and it doesn’t have UCR crime codes, it does have detailed call types and is linked to SDP beats. The 2.5 years of data that are published have more than 1.1M records. The San Diego Open Data Portal also has shapefiles for the beats, districts and neighborhoods, so we can make maps.
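Before anything can be mapped, the calls have to be counted per beat. Here is a minimal sketch of that step with Pandas, using a tiny made-up sample. The column names `call_type` and `beat` and the beat numbers are assumptions for illustration, not necessarily what the published files use.

```python
import pandas as pd

# Made-up sample rows; the real dataset has more than 1.1M records.
calls = pd.DataFrame({
    "call_type": ["LOUD PARTY", "LOUD PARTY", "BURGLARY",
                  "LOUD PARTY", "TRAFFIC STOP"],
    "beat": [122, 122, 511, 614, 122],
})

# Keep only one call type, then count the rows in each beat.
loud = calls[calls["call_type"] == "LOUD PARTY"]
beat_counts = (
    loud.groupby("beat")
        .size()
        .rename("loud_party_calls")
        .reset_index()
        .sort_values("loud_party_calls", ascending=False)
        .reset_index(drop=True)
)
print(beat_counts)
```

A table like `beat_counts` could then be joined to the beat shapes on the beat number, so each polygon can be colored by its count.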

For instance, here is a choropleth that is colored according to the counts of calls for “LOUD PARTY” by beat. It should surprise no one where the hotspot is:

The dataset also has very nicely formatted date/times, so Pandas has an easy time extracting time parts, allowing us to build Rhythm Maps. This heat map displays the count of LOUD PARTY incidents, over the whole dataset, organized by hour and month:

You can clearly see two significant patterns: Loud Party calls are, as expected, primarily made in the late night and early morning, and the calls are more frequent in the summer than in the winter.
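The table behind a rhythm map like this one can be sketched in a few lines of Pandas. This is an illustration with made-up timestamps, not the code from the notebook, and the `date_time` column name is an assumption.

```python
import pandas as pd

# Made-up call timestamps, for illustration only.
calls = pd.DataFrame({
    "date_time": pd.to_datetime([
        "2016-07-02 23:15:00",
        "2016-07-03 00:40:00",
        "2016-07-09 23:50:00",
        "2016-12-17 22:05:00",
    ]),
})

# Extract the time parts from the datetime column.
calls["hour"] = calls["date_time"].dt.hour
calls["month"] = calls["date_time"].dt.month

# Count calls for each (hour, month) pair: rows are hours, columns are months.
rhythm = pd.crosstab(calls["hour"], calls["month"])
print(rhythm)
```

Each cell of `rhythm` is the number of calls in that hour and month, which is the grid a heat map plots.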

There are certainly a lot of other interesting patterns to find in this dataset, and if you are interested in finding them, I hope to see you at tonight’s meeting.

BTW, if you’d like to see how these charts were generated, the Jupyter Notebook is on GitHub.



Explore Society with Social Data

Posted on Oct 5, 2017 in News

Tonight we’ll be meeting to talk about two important collections of survey data: the General Social Survey (GSS) and IPUMS, a broad collection of integrated survey datasets. Additionally, we’ll be meeting with David Lynn, founder of Mission Driven Finance, to talk about his volunteer data project to analyze business lending to minorities in San Diego County.

In the main meeting, I’ll show how to use the web-analysis tools for the GSS and IPUMS. You can visit those sites now to explore a bit, and it would be worthwhile for you to create an account before the meeting. Here are the important URLs:

We’ll also talk about some of the details of how surveys are constructed and how to use them. When we’re finished, you’ll be ready to explore important social questions, such as how people’s optimism for the future varies according to how many children they have, or whether cat owners are smarter than dog owners. (Seriously, that’s in the GSS!)

Hope to see you tonight.
