Tuesday, April 15, 2014

Programming Languages Are Horrible

A bunch of students in my honors programming class made the same error yesterday. Remarkably, today Bertrand Meyer had a post about the very same error on the CACM Blog. (Go read it, it’s great: Those Who Say Code Does Not Matter.) The short version of my students’ problem revolves around how C-style languages handle if statements with multi-statement actions. Take for example the following code:
   1:              if (booleanExpression)
   2:                  DoSomething();
   3:                  DoSomethingElse();
   4:              if (!booleanExpression)
   5:                  DoAction();

Students tend to assume that if the expression in line 1 is true, both of the statements in lines 2 and 3 will be executed. Not so. Regardless of indentation, the compiler treats only the statement in line 2 as the body of the if. The statement in line 3 will always be executed. This caused my students no end of trouble. The right way to do this is to enclose the two statements inside curly braces.

    if (booleanExpression)
    {
        DoSomething();
        DoSomethingElse();
    }
    if (!booleanExpression)
        DoAction();

This removes the ambiguity. I try to get students to use curly braces even for single-line code blocks, but it is not an easy sell.
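One way to see what the compiler does with the two versions is to rewrite them in a language where indentation really is the block structure. The following is only a sketch in Python: the two function names and the `calls` list are hypothetical stand-ins that record which of the three calls ran, not anything from my students' code.

```python
def run_unbraced(boolean_expression):
    """Mimic how a C-style compiler reads the version without braces."""
    calls = []  # hypothetical log: record names instead of calling real functions
    if boolean_expression:
        calls.append("DoSomething")      # the only statement the if controls
    calls.append("DoSomethingElse")      # always runs, whatever the indentation suggested
    if not boolean_expression:
        calls.append("DoAction")
    return calls


def run_braced(boolean_expression):
    """Mimic the version with curly braces around both statements."""
    calls = []
    if boolean_expression:
        calls.append("DoSomething")
        calls.append("DoSomethingElse")  # now inside the block
    if not boolean_expression:
        calls.append("DoAction")
    return calls


print(run_unbraced(False))  # ['DoSomethingElse', 'DoAction']
print(run_braced(False))    # ['DoAction']
```

With a false expression the unbraced version still records DoSomethingElse, which is exactly the surprise my students ran into.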

This problem doesn’t occur in all languages, of course. Visual Basic, derived from that most ancient of languages, BASIC, doesn’t allow this to happen as easily. Trying a line-for-line conversion into Visual Basic gives me errors.

    If (booleanExpression) Then
        DoSomething()
        DoSomethingElse()
    If (Not booleanExpression) Then
        DoAction()

The compiler refuses to deal with this code until End If statements are added to make things clear.

    If (booleanExpression) Then
        DoSomething()
        DoSomethingElse()
    End If

    If (Not booleanExpression) Then
        DoAction()
    End If

Now I am not saying that Visual Basic is without flaws. All programming languages have flaws. But we do have to be aware of these flaws. The flaw that Meyer wrote about was the same basic error my students made, but it was made by professional developers in a product that impacted millions of people. It is so easy to make “rookie mistakes” in many languages.

So does this impact the tools we teach with? Honestly, not really. The APCS exam is based on Java, which has all the same problems as other curly brace and semicolon languages. [Let’s be honest: are those curly braces and semicolons there for the programmers or the compiler writers?] Especially in high schools, where we are largely at the whim of things outside our control (the APCS exam and pressure from parents and students to teach industry languages), we wind up using Java, C++, and maybe C# for many courses.

Oh sure, a lot of us get by with various versions of BASIC (and take flak for it from “experts”) but there is always the pressure to “move on.” Most of us at the high school CS level have barely heard of Eiffel (invented and promoted by Meyer) or other languages that have been invented in academic institutions. These languages sometimes do influence the development of other programming languages but seldom seem to migrate into industry intact.

What does that mean for us as educators? It means we wind up teaching students how to solve bugs that the language would do better not to permit in the first place. This problem has, I believe, contributed to the development, popularity and use of drag and drop block programming languages for beginners. But eventually we all push our students into learning crummy languages.

I don’t see an easy answer. It will probably have to be the universities who solve this first. High schools are allowed to follow trends in higher education, and I have seen a lot of professional development organizations influenced by the language choices of recent graduates. That almost always requires a common choice by many top universities, though, which seems to be less common all the time.

A lot of professional developers, especially those who are self-taught, seem to view doing things the hard way as a point of pride. Looking for tools (or languages) that make creating bugs harder is seen as a crutch by the “brogrammer” crowd.

Ah, well, maybe when the current generation gets to be my age and loses the desire to spend time tracking down easy-to-prevent bugs, things will change.

7 comments:

Don Bruey said...

Languages like Python use indentation for blocks instead of brackets or "end if", so that's an example of a language that follows "do what I mean" when writing indented code.

Some editors like Visual Studio have "format my code" features that will indent everything properly so it's easier to catch things like your first example. In that case, Visual Studio would remove the indent from the second line, and it would be (maybe) apparent to the programmer that the result won't be what they might have thought. It may even do that formatting and auto-indenting as you type.

You're right, this is a crummy thing to have to learn, and we often learn it the hard way. In this case, maybe a different set of tools will help?

Boris said...

This is a valid concern, but you are looking at the problem in isolation. You must know what I'm going to point out, but for posterity:

There are good reasons for C's lax syntax. It needs to be easy to generate by other programs, and it tries to be concise (to the point of obfuscation). The solution is obvious: use the appropriate tools. Even the most basic static code analysis tool would catch a mistake like this.

As an educator, why not give your students an exercise: writing a static code analysis tool that detects possible errors?
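As a starting point for such an exercise, here is a rough sketch in Python of a checker for this one mistake. It is a line-based heuristic, not a real parser, and it assumes the whole if condition sits on one line. The names `find_misleading_indentation`, `indent_of`, and `SAMPLE` are made up for illustration.

```python
import re

# Matches a line that only opens an if statement, e.g. "if (x)".
# Assumption: the condition fits on one line and nothing follows it.
IF_LINE = re.compile(r"^\s*if\s*\(.*\)\s*$")


def indent_of(line):
    """Count leading whitespace columns (tabs expanded to 4 spaces)."""
    expanded = line.expandtabs(4)
    return len(expanded) - len(expanded.lstrip())


def find_misleading_indentation(source):
    """Return 1-based line numbers that look like part of a brace-less if body but are not."""
    # Keep (line number, text) pairs for non-blank lines only.
    lines = [(n, t) for n, t in enumerate(source.splitlines(), start=1) if t.strip()]
    warnings = []
    for i, (_, text) in enumerate(lines):
        if not IF_LINE.match(text) or i + 2 >= len(lines):
            continue
        _, body = lines[i + 1]
        next_no, following = lines[i + 2]
        if body.strip().startswith("{"):
            continue  # braces present, nothing to flag
        # The line after the one-statement body is indented like the body
        # and deeper than the if itself: it only looks controlled by the if.
        if indent_of(following) == indent_of(body) > indent_of(text):
            warnings.append(next_no)
    return warnings


SAMPLE = """\
if (booleanExpression)
    DoSomething();
    DoSomethingElse();
if (!booleanExpression)
    DoAction();
"""

print(find_misleading_indentation(SAMPLE))  # [3]
```

Students could extend it to handle else, loops, and conditions that span several lines, which is where a real parser starts to pay off.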

Anonymous said...

The Go programming language with its strict and uniform syntax prohibits such errors. There is no if without curly braces.

Michael S. Kirkpatrick said...

I wish I had a reference to point to (can't remember where I saw it and don't feel like looking it up), but this problem also plagues indentation-based languages like Python, except in reverse. Specifically, if you do something like:

if False:

    DoSomething



    DoSomethingElse


In the study I am thinking of, a very large number (something like 65%) of Python programmers said that DoSomethingElse would be executed. Even for the most experienced demographic, a majority still got it wrong.
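For anyone who wants to check the answer, a runnable version of that snippet might look like the following. This is a sketch: the `executed` list and the two function bodies are hypothetical stand-ins so the result can be observed.

```python
executed = []  # hypothetical log standing in for real side effects


def DoSomething():
    executed.append("DoSomething")


def DoSomethingElse():
    executed.append("DoSomethingElse")


if False:

    DoSomething()



    DoSomethingElse()  # still indented, so still inside the if body

print(executed)  # []
```

Blank lines do not end a block in a Python source file. Only the change in indentation does, so neither function is called here.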

I think the important takeaway is that there is no perfect language. You should use the language that is most appropriate for the concepts you want to cover. For me (I teach systems courses such as OS), that language is C and only C. It's not that I perceive other languages as crutches. It's that other languages get in the way. If I'm trying to talk about page tables or scheduling queue implementations, higher level languages block me from having direct access to the machine-level features that I need.

For high school and CS1, I think languages like Python, Ruby, etc., are better, because they allow you to talk about computational thinking concepts instead of dwelling on syntax.

Garth said...

I like EndIf. It is just so simple. I really do not like the indent method of Python. The end of a block simply does not jump out at you like the good ol' EndIf. I would be interested in knowing why languages define code blocks using different approaches. Is there something better about the indent as opposed to EndIf as opposed to curly brackets?

Michael S. Kirkpatrick said...

"I would be interested in knowing why languages define code blocks using different approaches. Is there something better about the indent as opposed to EndIf as opposed to curly brackets?"

Based on conversations I've had with Python advocates, they despise syntax. They feel that things like curly braces are annoyances that get in the way. They just don't like them. Perhaps there are other reasons, but I don't know what they are.

As a curly-brace-advocate, though, I will say that this is grounded in compiler design theory and context-free grammars. One fundamental principle from that area is that white-space should never be significant. Parsing should be done based on an input stream of symbols with an unambiguous grammar. For instance, the infamous if structure can be defined as:

IF : if EXPR BLOCK

That is, an IF consists of the keyword "if" followed by an expression EXPR and a block BLOCK. These definitions could be expanded (note that | is for listing multiple options):

EXPR : EXPR == EXPR
| IDENT = EXPR
| ( EXPR )
| ...

BLOCK : IDENT = EXPR;
| STATEMENT
| { BLOCK }
| ...

From a parse-tree structure, this ends up being easy to automate. However...it makes it hard to program. In general, programming language designers have focused on how to make it easy to build a compiler while ignoring the cognitive difficulties of how to write code. Whether or not that is the right choice is a matter of debate.
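To make the "stream of symbols" point concrete, here is a small sketch of a recursive-descent parser in Python. It assumes a much smaller, hypothetical grammar than the one above: an if takes a parenthesized condition with no nested parentheses, and a statement is either a call ending in a semicolon or a braced block. All names in it are my own.

```python
import re


def tokenize(source):
    """Split source into symbols; all whitespace, including indentation, is discarded."""
    return re.findall(r"[A-Za-z_]\w*|[(){};!]", source)


def parse_statement(tokens, pos):
    """Parse one statement starting at tokens[pos]; return (tree, next position)."""
    tok = tokens[pos]
    if tok == "if":
        # IF : if ( EXPR ) STATEMENT, so exactly one statement follows the condition
        close = tokens.index(")", pos)
        condition = "".join(tokens[pos + 2:close])
        body, pos = parse_statement(tokens, close + 1)
        return ("if", condition, body), pos
    if tok == "{":
        # BLOCK : { STATEMENT ... }
        pos += 1
        children = []
        while tokens[pos] != "}":
            child, pos = parse_statement(tokens, pos)
            children.append(child)
        return ("block", children), pos + 1
    # Simple statement: a call such as DoSomething(); ends at the next semicolon.
    end = tokens.index(";", pos)
    return ("call", tok), end + 1


def parse_program(source):
    """Parse a sequence of statements into a list of trees."""
    tokens = tokenize(source)
    pos, tree = 0, []
    while pos < len(tokens):
        statement, pos = parse_statement(tokens, pos)
        tree.append(statement)
    return tree


UNBRACED = """
if (booleanExpression)
    DoSomething();
    DoSomethingElse();
"""

print(parse_program(UNBRACED))
# [('if', 'booleanExpression', ('call', 'DoSomething')), ('call', 'DoSomethingElse')]
```

Because the tokenizer throws the indentation away before the parser ever runs, the second call ends up as a sibling of the if rather than a child of it, no matter how the source was laid out.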

Alfred C Thompson II said...

I think there is always a sort of trade-off between what is easy to parse and what is natural for people. Terminators like curly braces and semicolons make parsing easier. White space is more natural for humans but opens up ambiguity pretty easily, and ambiguity is seldom a good thing in programming.