More Working with MySQL   Leave a comment

In my previous post, I talked about setting up the SQL database based on Dennis Erny’s NFL play-by-play dataset. I spent some time generating a script that did the appropriate ALTER TABLE statements to ensure that the character encodings and collations are utf-8, not latin1_swedish_ci as in the last post. I started doing it manually through phpMyAdmin, but decided after five minutes that just writing a script was going to be faster, especially across 30 tables.

So, I now have a remote database that I can reach. That’s quite exciting. (Again, later I can dump everything into SQLite to just have everything as a single file. But for now, I feel like playing with remote databases.)

I thought about using R, Python, and Stata for my analytics work, because each one has different strengths, and I feel as though I’m better served knowing all three, and being able to use them for statistics and visualization. Python and R have low-headache interfaces to MySQL — Python with the MySQLdb module, and R with the RMySQL library. They were both rather easy to set up and configure.

But…I also have Stata, and I like it. Except to get Stata to work with databases, I need to use ODBC. I used to have to deal with this in previous jobs, and I was hoping I would never have to deal with this ever again. MySQL offers an ODBC connector for various platforms, including Mac OS X. Except the documentation is inconsistent between the online documentation, the README file, and the actual library itself. It was a bear to install, and I couldn’t get it to work.

For now, at least, I’m going to use R and Python because those interfaces to MySQL work.

Posted April 8, 2012 by techstep in sports

Tagged with , ,

Leave a Reply