Friday, October 16, 2015

Autograders–for good or for evil?

Over 40 years ago I first graded programming projects. As an undergraduate lab assistant I reviewed student projects based on metrics from the professor. The professor assigned the grade in large part based on my review of how well the project met the assignment goals. Grading coding projects was a pretty labor-intensive process. For many of us it still is today.

There are programs instructors can use to evaluate student programs. The CS 50 MOOC and in-person course out of Harvard uses one such autograder. Using it got me thinking about autograders recently. After all, automating labor-intensive work is something computer science people think about all the time.

Recently I played around with using CodeHunt as a sort of autograder. I had my students copy code they had written previously into a CodeHunt puzzle. (The puzzle I used is embedded in an Office Mix at Palindrome Tester if you want to try it.)

Having students use it emphasized a minor frustration I had with the CS 50 grader – it is very picky. If the grader is looking for a return of “palindrome” and the student uses a capital “P”, the grader judges it incorrect. A binary right or wrong lacks the granularity that one would like to incorporate into grading. At least CodeHunt will give some weighting based on how a student solution compares to the canonical solution the puzzle writer used.
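
To make the pickiness concrete, here is a minimal Python sketch of the difference between an all-or-nothing check and one that gives partial credit. It is only an illustration: the function names and the 0.9 weighting are made up for this example, and it is not how the CS 50 grader or CodeHunt actually score anything.

```python
def strict_grade(expected: str, actual: str) -> float:
    """All-or-nothing: any difference, even capitalization, scores zero."""
    return 1.0 if actual == expected else 0.0


def lenient_grade(expected: str, actual: str) -> float:
    """Hypothetical rubric: full credit for an exact match, most of the
    credit when only case or surrounding whitespace differs, else zero."""
    if actual == expected:
        return 1.0
    if actual.strip().lower() == expected.strip().lower():
        return 0.9  # arbitrary penalty, chosen only for illustration
    return 0.0


# The student returned "Palindrome" where the grader expected "palindrome".
print(strict_grade("palindrome", "Palindrome"))         # 0.0
print(lenient_grade("palindrome", "Palindrome"))        # 0.9
print(lenient_grade("palindrome", "not a palindrome"))  # 0.0
```

Even the lenient version only softens the spelling problem. It still has no idea whether the student's approach was any good.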

Recently CodeHS announced an autograder and made it available. (These Are The Autograders You’ve Been Looking For) They posted a link on a Facebook group for computer science educators. One of the replies there was from Mark Guzdial, who really knows his stuff about what works in computer science education.

I'm deeply worried about use of autograders. Lecia Barker's research says that THE most critical factor in retaining students in CS (especially women and URM students) is using creative, relevant, meaningful assignments. Autograders steer assignments towards easily graded, cookie-cutter coding projects where creative solutions are a minus -- they're harder to autograde.

That comment really resonated with me. As I have been thinking about autograders, one of the issues for me has been how to write an assignment that works with an autograder. I’d already thought about how picky they are and had been thinking about other limitations. Those limitations were mostly technical, and I hadn’t really thought about the creativity aspect. I hope I would have gotten there eventually, but Mark’s comment brought it home immediately.

I like creativity in projects, and I don’t mean just creative ways to get a narrowly defined answer. I alluded to this in a recent post (Just Don’t Turn Them Off - more so in the comments) but it is a common thread in my ideas about teaching. An autograder that is hyper picky about spelling and punctuation is a little frustrating, but an autograder that leaves no room for “creative, relevant, meaningful assignments” is likely to be deadly boring.

That doesn’t mean there is no role for autograders. I do like using CodeHunt for small exercises. And my students like it for the game aspects of it as well. But for major projects I think I really want interesting projects of the sort that do not lend themselves to autograding. I also want to offer a lot of options for individual projects, and that is likely to be more work to set up in an autograder than to grade manually.

So do autograders have a place? I think so. You can’t do a MOOC without them for example – though I am not so sure a MOOC is the way I want to learn or teach. They can be useful for the simple exercises that one wants students to do before incorporating a concept into a larger project. But they are not the be-all and end-all answer to grading really interesting and motivating projects.


Mike Zamansky said...

Right off, let me say that I HATE GRADING!!!!!!

I like autograders for small, simple homework-type assignments and use CodingBat early on in a number of classes (and our own custom brew Schemingbat as well).

Forget about the limiting of assignments; you can learn a lot about a student by looking at their project code and development history (GitHub, logs, and graphs are your friend, at least for classes where they're advanced enough to use git). Even if an autograder can put a summative number on a project, it won't give me the feedback I need to drive instruction.

Mike Zamansky said...

My thoughts on the subject:

Michael Ball said...

Mark is certainly right that you can't measure creativity with an autograder, and I think we need to be aware of what the limitations are. But, at the same time, it's possible to use this technology for much less ambitious goals, or to encourage student performance. Students working on creative exercises could still use guidance about what they're doing, so autograding tools could provide them info about other parts of their code.

However, in my experience it's entirely possible to build autograders that are robust to different implementations and that allow significant flexibility. We're working on one for Snap! and while it's very early on still, we can do things like unit-type tests and checking whether students follow broad concepts, like even just stringing 3 blocks together in a script.

But, ugh, I said auto_grader_ a bunch of times now! Mike is quite right! Grading sucks! Our current use of autograders in the MOOC we're doing is purely auto_feedback_ and all work is participation based. At Berkeley for BJC / CS10, we want to use these tools again as _feedback_ so that students will be able to work more independently if they desire, but they won't become a primary means of grading any time soon.