The Data Library works primarily with journalists and nonprofits, but until recently, I hadn’t fully realized how different the processes are in these two environments. We’d been following two different processes without having names for them. Naming the two contexts makes it easier to be sure we are working in a process that is comfortable for our clients and partners.
In the Journalist / Exploratory context, the journalist is looking for a story in data, or wants to use data to support a story idea. In either case, the dataset is novel: there is no pre-existing format to follow, and most of the time, neither the journalist nor the analyst has worked with the data before.
In this context, we follow a light, fast process that produces a lot of graphs and tables to look at the data from different angles. The outputs are very rough: plots aren’t properly labeled, and table columns can be cryptic. The goal is to sift through the data quickly, find a few gems, and polish them later. We don’t want to spend a lot of time making plots look good if they will never be used.
In this context, we work directly in IPython and plotly, document the process in Google Docs, and share the IPython Notebooks with the journalist. Like this one, those notebooks are complicated, ugly, and not at all suitable for publication, but they are very useful for exploring ideas quickly.
The other context is the Nonprofit / Reporting context, where an organization has a fairly specific goal, most often a well-defined report. In this context, projects run most smoothly when we start with a copy of a previous report, discuss changes to it, and use it as a template. If there isn’t a previous report, we’ll mock one up in Excel first.
In the reporting context, the client doesn’t want to see the rough work, and IPython notebooks are confusing and distracting, so we usually share the data as Excel files and load it into Tableau for presentation and discussion. Tableau workbooks are much easier to understand, and a whole lot more attractive. Here is an example of a Tableau workbook organized for reporting.
There are other contexts as well, one for programming projects and another for producing datasets for later analysis, but these two are the ones where clearly communicating which context a project will use has the most impact on its success.