Wrangling Data For Social Projects – San Diego Regional Data Library

Next week we’ll be kicking off two new data projects. A big part of both will be finding data, documenting it, and preparing it in a consistent way for analysis, a process known as data wrangling. I’ve been developing software for wrangling social data for a few years, and I have collected many of the best ideas into a new metadata system called Metatab. Metatab stores structured metadata in a CSV file, often alongside the data itself, which makes metadata easier to create and publish.
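To illustrate the general idea of keeping metadata as rows in a plain CSV file, here is a minimal sketch that reads term/value rows into a Python dictionary. The row layout, the term names (`Title`, `Description`, `Datafile`), the example.com URLs and the `parse_term_rows` helper are all hypothetical and simplified for illustration; this is not the actual Metatab specification or its tooling.

```python
import csv
import io

# Hypothetical, simplified metadata sheet: each row is a term name followed by
# its value. This is NOT the full Metatab specification, only an illustration
# of storing structured metadata as plain CSV rows.
SAMPLE = """Title,Food access locations
Description,Grocery stores and food pantries in San Diego County
Datafile,http://example.com/food_access.csv
Datafile,http://example.com/pantries.csv
"""


def parse_term_rows(text):
    """Collect term/value rows into a dict mapping each term to a list of values."""
    terms = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 2 or not row[0].strip():
            continue  # skip blank or malformed rows
        term, value = row[0].strip(), row[1].strip()
        # A term may repeat (e.g. several data files), so keep every value.
        terms.setdefault(term, []).append(value)
    return terms


meta = parse_term_rows(SAMPLE)
print(meta["Title"][0])
print(len(meta["Datafile"]))
```

Because the metadata is just CSV, it can be edited in any spreadsheet and stored next to the data files it describes.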

In the next two data projects, we will be using the Metatab Google Spreadsheet Add-On to document the data we locate. Once a Metatab specification is created for a dataset, it can be uploaded directly from the Google spreadsheet to CKAN, our data repository software. I’m also working on other tools for finding and manipulating data.

When we are done with the main data wrangling, our data repository will hold collections of datasets related to food access and assisted living, and then we can start on data analysis. We will most likely use Pandas and Tableau, but we may also try a few AWS tools, such as AWS Athena and AWS QuickSight.

Register for the meeting: https://www.meetup.com/San-Diego-Regional-Data-Library/events/235951114/