Prepare Your Data Contest Toolkit

Posted by on Feb 20, 2015 in News | 0 comments

To get ready for the Data Contest, you’ll want to make sure your laptop already has all the tools you’ll need installed. We use a core set of tools in most of our programs, and it will serve as a good base for your contest toolkit. These tools are:

Additionally, we frequently use SQLite files for data storage and to sort and search through data using SQL. SQLite is already installed almost everywhere, but you may want to get a SQLite GUI so that working with your data feels more like working with a spreadsheet.
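If you haven’t used SQLite from code before, here is a minimal sketch using Python’s built-in sqlite3 module. The table, its columns and its rows are hypothetical, made up for illustration; the point is the two operations mentioned above: searching with WHERE and sorting with ORDER BY.

```python
import sqlite3

# Open an in-memory database; pass a filename such as "contest.db" to keep it on disk.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table and rows, for illustration only.
cur.execute("CREATE TABLE schools (name TEXT, city TEXT, enrollment INTEGER)")
cur.executemany(
    "INSERT INTO schools VALUES (?, ?, ?)",
    [
        ("School A", "San Diego", 1200),
        ("School B", "Chula Vista", 800),
        ("School C", "San Diego", 450),
    ],
)
conn.commit()

# Search (WHERE) and sort (ORDER BY): the same things you'd do with a spreadsheet filter.
rows = cur.execute(
    "SELECT name, enrollment FROM schools WHERE city = ? ORDER BY enrollment DESC",
    ("San Diego",),
).fetchall()
print(rows)  # [('School A', 1200), ('School C', 450)]
conn.close()
```

A SQLite GUI gives you the same queries with a point-and-click view of the results.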

If you’d like a visual introduction to these tools, we’ll be running Google Hangouts to demo them. Sign up and get contact information for these sessions at our Meetup Page.

This same set of tools will serve you well if you come to the full-day data conference that the Python Meetup group is hosting, starting at 8:30 on the same day as the Contest. Come early, learn some useful skills, and put them to the test in the afternoon. Visit the Data Science FunConference registration page for details and to get a free ticket.


Data Contest in 10 Days!

Posted by on Feb 18, 2015 in News | 0 comments

The SDSU / Data Library Data Contest has teamed up with UCSD, the Python User Group and several Data Science user groups to offer a full-day event: two morning tutorials (R and Python), a mid-day exhibition of Data Science projects and software demos, an afternoon Machine Learning challenge, and the kickoff of the SDSU / Data Library Data Contest. Visit the signup page to join the contest, learn more about data science, and have a chance to win part of the $2,100 in prizes. Visit the conference Eventbrite page to register for the conference.




Tools for the Data Contest

Posted by on Feb 16, 2015 in Misc | 0 comments

The Student Data Contest is in less than two weeks, so it’s time to get your tools ready. If you are a student and want a shot at $2,100 in prizes, sign up for the contest.

One of the best tools available for quickly visualizing data is Tableau, and best of all, if you don’t need to connect to a database, Tableau Public is free. Tableau lets you quickly produce beautiful charts and tables, and makes it easy to embed those visualizations on the web. Tableau runs on both Mac and Windows. Its user interface is very well designed, but it has its biases, so you’ll want to spend some time before the contest learning how it expects you to build visualizations.

So, download Tableau Public and spend a bit of time learning how to produce basic visualizations. It will really pay off during the contest, and you’ll have a valuable addition to your toolbox for future use.


When Average Fails: Bimodal Distributions

Posted by on Jan 29, 2015 in Analysis | 0 comments

Probably the most common statistic that people deal with is the average, which can often be a good approximation of the typical or general case. However, there are many cases where the average fails, and the most extreme example I’ve seen in recent data is lawyers’ salary distributions.

The NALP has been publishing salary distributions for a few years, and the blog Social Evolution Forum provides a good overview of them. Since about 2000, the distribution has been extraordinarily bimodal, which makes the average, as well as the median, a poor statistic for representing the typical case.

Figure: Lawyers’ salary distribution, 2013 graduates.

In these situations, the average is meaningless, and it is better to report the two (or three) modes.
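To see why, here is a small sketch in Python using made-up salaries (hypothetical numbers, not the NALP data) drawn from two clusters. Both the mean and the median land in the empty gap between the clusters, while a simple histogram peak search recovers the two modes. The helper histogram_peaks and the bin width of 20 (in $1,000s) are our own illustrative choices; with real data, the bin width is a judgment call that can merge or split peaks.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical starting salaries in $1,000s (NOT the NALP figures): one cluster
# around 50 and one around 160, with nobody in between.
salaries = [45, 48, 50, 50, 52, 55, 155, 160, 160, 160, 165, 170]


def histogram_peaks(values, width):
    """Bin values into fixed-width bins and return the (low, high) bounds of every
    bin whose count is at least as large as both neighboring bins (a local maximum)."""
    counts = Counter((v // width) * width for v in values)
    peaks = []
    for edge in sorted(counts):
        c = counts[edge]
        if c >= counts.get(edge - width, 0) and c >= counts.get(edge + width, 0):
            peaks.append((edge, edge + width))
    return peaks


print(round(mean(salaries), 1))       # 105.8: a salary no one in the sample earns
print(median(salaries))               # 105.0: also falls in the empty gap
print(histogram_peaks(salaries, 20))  # [(40, 60), (160, 180)]: the two modes
```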

The blog Empirical Legal Studies has more of the charts and a discussion of the system that created them.


SDSU Datathon

Posted by on Jan 15, 2015 in News, Old Projects | 0 comments

Solve Social Problems for $2,700 in Prizes

The San Diego Regional Data Library, the SDSU Society for Statisticians and Actuaries and Teradata are organizing a data analysis contest to help nonprofits, journalists and government agencies make better use of data, to develop a broader regional capacity for data analysis, and to introduce students interested in data analysis to future employers.

In this contest, student teams will work on one of three projects, addressing education, food insecurity or community development. After the February 28 kickoff, teams will have one week to analyze the data and visualize their analysis, before presenting the results to the public at the March 10 awards ceremony.

Data Science FunConference. The Data Contest is running together with a Data Science Conference organized by the Python Meetup group, 5 other meetup groups, companies and universities. The conference will feature data science training sessions and software demos. Visit the conference registration page for a full schedule.

No Experience Necessary. This event will be held in conjunction with the San Diego Python User Group and San Diego Data Science, who will be offering a data science training class in the morning, before the contest kickoff. Come early to learn data analysis techniques in your favorite language, then apply them in the contest later in the afternoon. Additionally, Teradata employees will serve as mentors, offering guidance or help to teams that get stuck.

No Team Required. Solo analysts are welcome! You can come early for the training session to meet other solo analysts, or join a team at the event.

College, High School or Pro. The contest is open to high-school and college students. Professionals and non-students are welcome to participate too, but only students are eligible for prizes.


Data Contest: 28 Feb, 1pm (Register for Contest)
Data Science Conference: 28 Feb, 8:30am (Register for Conference)
Awards Ceremony: 10 March, 6:30pm (SDSU Data Contest Awards Ceremony, on Eventbrite)

Not ready to register? Join the mailing list for announcements and updates.

Non-analysts Needed. The data projects can use many different skills, including those of programmers, visual designers, technical writers and presenters. Your skills will be valuable even if you aren’t a statistician.


Here is the complete schedule:


Both the Data Conference and the Data Contest will be held at SDSU, in Peterson Gym, room 153.


More Information

If you are interested in being involved in the contest, as a project sponsor, mentor or contestant, contact Eric Busboom at, or (619) 363-2607. Or, subscribe to our email list for future announcements.


Viz Crime with Python and JavaScript, Go To Cool Parties With Free Food

Posted by on Jan 5, 2015 in Projects | 0 comments

We’re looking for some programmers to visualize crime data and present it at our booth at the San Diego Magazine Big Ideas Party on Jan 21. The Data Library was one of the 25 Big Ideas covered in the magazine’s January issue, so they’d like us to give a presentation at the party.

I’d like to have an interactive display, probably using D3, that shows a crime hot spot map for the region, as well as a collection of time-based Rhythm maps for selected areas. A visitor to the booth could select a neighborhood or city, see the hot spots in that area, and see how the crime incidents change in that area over time.

So, we’ll need a Python programmer for the server side (Pandas for analysis, Flask or similar for the server) and a JavaScript person for the front end. Someone with solid visual design skills would be a plus.
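As a rough sketch of the server-side analysis, here are two hypothetical helper functions: one counts incidents per grid cell to find the hot spots in a neighborhood, and one builds a weekday-by-hour count grid, which is one common way to lay out a crime rhythm map. The record layout, cell size and sample incidents are assumptions for illustration, not the Data Library’s data. The sketch uses only the standard library; in the project itself, a Pandas groupby would do the same counting.

```python
import math
from collections import Counter
from datetime import datetime

# Hypothetical incident records: (neighborhood, latitude, longitude, ISO timestamp).
# This layout is an assumption for illustration, not the Data Library's schema.
incidents = [
    ("North Park", 32.7481, -117.1312, "2014-11-03T22:15:00"),
    ("North Park", 32.7489, -117.1318, "2014-11-04T23:40:00"),
    ("North Park", 32.7402, -117.1291, "2014-11-08T14:05:00"),
    ("Hillcrest", 32.7480, -117.1610, "2014-11-05T01:30:00"),
]

CELL = 0.005  # grid cell size in degrees (roughly half a kilometer); a tunable assumption


def hot_spots(records, neighborhood, top=3):
    """Count incidents per grid cell in one neighborhood and return the busiest cells.

    Cells are identified by integer (row, column) indices on a lat/lon grid.
    """
    cells = Counter(
        (math.floor(lat / CELL), math.floor(lon / CELL))
        for hood, lat, lon, _ in records
        if hood == neighborhood
    )
    return cells.most_common(top)


def rhythm_grid(records, neighborhood):
    """Build a 7 x 24 grid of incident counts for one neighborhood:
    rows are weekdays (Monday = 0), columns are hours of the day."""
    grid = [[0] * 24 for _ in range(7)]
    for hood, _, _, ts in records:
        if hood == neighborhood:
            t = datetime.fromisoformat(ts)
            grid[t.weekday()][t.hour] += 1
    return grid


print(hot_spots(incidents, "North Park"))
print(rhythm_grid(incidents, "North Park")[0][22])  # 1: the Monday 10pm incident
```

A Flask (or similar) route could call functions like these for the neighborhood a visitor selects and return the counts as JSON for the D3 front end to draw.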

You’ll get a ticket to the party on the 21st, to share in the glory, get free food, and do some high-quality hobnobbing.

If you are interested, send me an email with a link to your GitHub/Bitbucket/etc. account or portfolio, to We can use any number of volunteers, but I only have three free tickets.
