Data Contest Links

Posted on Feb 27, 2015 in News

The SDSU Data Contest Kickoff is tomorrow! Better register if you haven’t already. Here are all of the last-minute details.

Time and Location:



  • Laptop

Staff Contacts:

Also, our Twitter hashtag is #sddc15.


Prepare Your Data Contest Toolkit

Posted on Feb 20, 2015 in News

To get ready for the Data Contest, make sure your laptop already has all of the tools you’ll need installed. There is a set of tools that we use in most of our programs, and it will serve as a good base for your contest toolkit. These tools are:

Additionally, we frequently use SQLite files for data storage and to sort and search through data using SQL. SQLite is already installed almost everywhere, but you may want to get a SQLite GUI to make it more like working with a spreadsheet.
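If you haven't used SQLite from code before, here is a minimal sketch of the sort-and-search idea using Python's built-in sqlite3 module. The table name, column names, sample rows, and the top_rows helper are all made up for illustration; they are not contest data or part of any contest toolkit.

```python
import sqlite3


def top_rows(rows, min_value, limit=3):
    """Load (name, value) rows into an in-memory SQLite table,
    then filter and sort them with a SQL query."""
    conn = sqlite3.connect(":memory:")  # nothing is written to disk
    try:
        # Hypothetical table and columns, chosen only for this example.
        conn.execute("CREATE TABLE measurements (name TEXT, value REAL)")
        conn.executemany("INSERT INTO measurements VALUES (?, ?)", rows)
        cur = conn.execute(
            "SELECT name, value FROM measurements "
            "WHERE value >= ? ORDER BY value DESC LIMIT ?",
            (min_value, limit),
        )
        return cur.fetchall()
    finally:
        conn.close()


# Invented sample rows, not real data.
SAMPLE_ROWS = [("a", 3.0), ("b", 10.5), ("c", 7.25), ("d", 1.0)]

if __name__ == "__main__":
    for name, value in top_rows(SAMPLE_ROWS, min_value=2.0):
        print(name, value)
```

To work with a file instead of memory, pass a file path to sqlite3.connect in place of ":memory:".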

If you’d like to get a visual introduction to these tools, we’ll be running Google Hangouts to demo the tools. Sign up and get contact information for these sessions at our Meetup Page.

This same set of tools will serve you well if you come to the full-day data conference that the Python Meetup group is hosting, starting at 8:30 on the same day as the Contest. Come early, learn some useful skills, and put them to the test in the afternoon. Visit the Data Science FunConference registration page for details and to get a free ticket.


Data Contest in 10 Days!

Posted on Feb 18, 2015 in News

The SDSU / Data Library Data Contest has teamed up with UCSD, the Python User Group, and several Data Science user groups to offer a full-day event with two morning tutorials (R and Python), a mid-day exhibition with many Data Science projects and software demos, an afternoon Machine Learning challenge, and the kickoff to the SDSU / Data Library Data Contest. Visit the signup page to join the contest, learn more about data science, and have a chance to win part of the $2,100 in prizes. Visit the conference Eventbrite page to register for the conference.




Tools for the Data Contest

Posted on Feb 16, 2015 in Misc

The Student Data Contest is in less than two weeks, so it’s time to get your tools ready. If you are a student and want a shot at $2,100 in prizes, sign up for the contest.

One of the best tools available for quickly visualizing data is Tableau, and best of all, if you don’t need to connect to a database, Tableau Public is free.  Tableau allows you to quickly produce beautiful charts and tables, and makes it easy to embed those visualizations on the web.  Tableau runs on both Mac and Windows, but while it has a very well designed user interface, it does have its biases; you’ll want to spend some time learning how it expects you to build visualizations before the contest.

So, download Tableau Public and spend a bit of time learning how to produce basic visualizations. It will really pay off during the contest, and you’ll have a valuable addition to your toolbox for future use.


When Average Fails: Bimodal Distributions

Posted on Jan 29, 2015 in Analysis

Probably the most common statistic that people deal with is the average, which can often be a good approximation of the typical or general case. However, there are many cases where the average fails, and the most extreme example I’ve seen in recent data is lawyers’ salary distributions.

The NALP has been publishing salary distributions for a few years, and the blog Social Evolution Forum provides a good overview of the distribution. Since 2000 or so, the distribution has been extraordinarily bimodal, making the average, as well as the median, a poor statistic to represent the typical case.

Lawyers' salary distribution, 2013 graduates.

In these situations, the average is meaningless, and it is better to report the two (or three) modes.
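To see why, here is a small sketch in Python. The salary figures are synthetic, invented only to have two clusters; they are not the NALP numbers. The histogram and find_modes helpers and the 10-unit bin width are also assumptions made for this illustration, and the peak test is deliberately simple (a bin counts as a mode if it is taller than both neighboring bins).

```python
from collections import Counter
from statistics import mean, median

# Synthetic salaries in thousands of dollars, invented for illustration.
# One cluster sits near 50 and a second, smaller one near 160.
SALARIES = (
    [45] * 30 + [50] * 40 + [55] * 30
    + [155] * 10 + [160] * 30 + [165] * 10
)


def histogram(values, bin_width):
    """Count values per bin, keyed by each bin's lower edge."""
    return Counter((v // bin_width) * bin_width for v in values)


def find_modes(hist, bin_width):
    """Return the lower edges of bins that are taller than both neighbors."""
    peaks = []
    for edge, count in sorted(hist.items()):
        left = hist.get(edge - bin_width, 0)
        right = hist.get(edge + bin_width, 0)
        if count > left and count > right:
            peaks.append(edge)
    return peaks


if __name__ == "__main__":
    hist = histogram(SALARIES, bin_width=10)
    print("mean:  ", round(mean(SALARIES), 1))
    print("median:", median(SALARIES))
    print("modes: ", find_modes(hist, bin_width=10))
```

With this made-up data, the mean comes out between the two clusters, at a value that nobody in the sample actually earns, while the two peak bins describe both groups directly.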

The blog Empirical Legal Studies has more of the charts and a discussion about the system that created them.
