Friday, November 13, 2015

Adding Interest Via Data

One of the lessons that stuck with me from some workshops I took some years ago was that the more interesting projects involve a lot of data. Doing something once is often easier by hand than by writing code. Doing that same thing 10,000 times, on the other hand, justifies writing code. For that reason I like to assign projects with lots of data.

A couple of examples.

  • To learn about string handling I sometimes assign a project to calculate license numbers from names and birthdays. I prepared a data file of 20,000 names and birthdays for that one.
  • Today I asked students to generate such a data file using a set of files I obtained from the US Census Bureau. It turns out that files listing popular first names (by year no less) and popular last names are available there. Students will randomly generate name combinations and birthdays.
  • The third choice I gave students today was to count the occurrences of individual letters in a document. Project Gutenberg has some great data for that sort of project. Mark Twain’s “A Connecticut Yankee in King Arthur's Court” has over a half a million words. Plenty of data.
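
For readers who want to try the last two projects, here is a minimal sketch in Python. It is not the assignment solution, just one way the pieces could fit together. The function names, the birthday range of 1950 through 2000, and the tiny stand-in name lists are all assumptions made for illustration; a real run would read the Census name files and a Project Gutenberg text from disk.

```python
import random
import string
from collections import Counter
from datetime import date, timedelta


def generate_records(first_names, last_names, count, seed=None):
    """Return `count` (first, last, birthday) tuples chosen at random."""
    rng = random.Random(seed)  # a seed makes the output repeatable
    start = date(1950, 1, 1)  # assumed earliest birthday
    span_days = (date(2000, 12, 31) - start).days  # assumed latest birthday
    records = []
    for _ in range(count):
        birthday = start + timedelta(days=rng.randint(0, span_days))
        records.append((rng.choice(first_names), rng.choice(last_names), birthday))
    return records


def count_letters(text):
    """Count occurrences of each letter a-z, ignoring case and non-letters."""
    return Counter(ch for ch in text.lower() if ch in string.ascii_lowercase)


if __name__ == "__main__":
    # Tiny stand-in lists; the real project would load these from files.
    firsts = ["Mary", "James", "Linda"]
    lasts = ["Smith", "Johnson", "Williams"]
    for first, last, birthday in generate_records(firsts, lasts, 5, seed=1):
        print(f"{last},{first},{birthday.isoformat()}")

    sample = "A Connecticut Yankee in King Arthur's Court"
    for letter, n in count_letters(sample).most_common(3):
        print(letter, n)
```

Passing a seed is a choice made here so that every student can regenerate the same 20,000-line file. Leaving it out gives a different file on each run.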

There is no shortage of large data sets on the Internet these days. (Take a look at “Big data sets available for free” sometime.) Computers are great at doing interesting things with data. A little interaction with students can help to determine the sorts of information they are interested in knowing about. That helps provide motivation for them as they work on projects.


Garth said...

A data set I like to have the kids play with is Shakespeare's sonnets. It is my attempt at being cross-curricular. Just to piss them off I have them memorize a sonnet and say it in class. Cackle. Assignments are something like: print the Xth sonnet, find the Xth occurrence of a word, or count how many times a particular word is used and in which sonnets. The sonnets are numbered. I also have a text file of all the words in the English language. All sorts of fun there.

Mike Zamansky said...


Real data and Real data part 2, though the links in the post are probably dead.