As the old computer geek joke goes. the three hardest things in programming are naming things and off by one errors. Lately I have been thinking about off by one errors and wondering if having array and similar indexes start from zero rather that one is contributing the off by one errors by novice programmers.
I understand why many programming languages start indexes at zero. It’s a natural outgrowth of pointer arithmetic. After all if the pointer is to the beginning of an array or other data structure the first element is an offset of zero. The language could hide this though. In fact, FORTRAN and several other languages do start at 1. I used to use a version of BASIC (Basic-Plus) where not only was the first element an index of 1 but a programmer could specify the low and high index of an array at what ever numbers they wanted. Individual characters in a string could be accessed as if they were in an array as well. The first letter was at index of 1. An index of zero held the length of the string.
So we don’t have to start at zero. Does it matter? I don’t have any studies or data (someone please research this for their PhD) but I suspect that starting from zero promotes off by one errors especially for novices.
Getting beginners to grok that the first element in an array is zero is a struggle. It seems like a great many of novice errors come from assuming that the first item is a list is item number 1. I see the same issue with loops. Going from zero to less that some value does not come as easy to beginners as from 1 to some value.
So there is my theory, starting from one is better than starting from zero and could reduce off by one errors. As the meme goes, prove me wrong.
6 comments:
It all depends on what you think the index means. It either means "the number of the thing in the sequence of things you re looking at" (which makes sensor for house numbers) or "the distance you have to travel once you get to the one at the start" (which makes sense for pointers).
I take great care to avoid saying things like "the first element has an index of zero" because "first" means one, and I've just contradicted myself. "The element at the start has an index of zero" is a nicer way to put it.
If you make the point that streets don't have a house number zero, but computer languages sometimes do you can kind of get away with it. Lots of things in life are the way they are for reasons that make little sense now, and I think you have to just sell array indexing as one of those things.
Zero is a tricky threshold concept in general. That's why schoolkids spend so much time using number lines to get to grips with addition and subtraction involving negative numbers. Zero indexing just seems to be of the same ilk, and is certainly an important topic to teach well so that students have a solid awareness. But treating it as something harmful seems overly zealous, like trying to redefine pi to a convenient value.
PS: Kudos for using "grok" in a sentence.
It is not only the pointer arithmetic, but also the bounds of integers the make zero a more natural choice.
Take 16 bit addressing, and a maximum array size of 65,536. With zero-indexing all ranges can be stored neatly in a 16-bit index. With 1-indexing, you need to bump up to 32-bit indices, which is wasteful for 99.847% of the values.
Interesting take. Over the years, I have taken both techniques - probably because my first programming language was Fortran. I suspect that the approach taken is largely based on the approach taken by the teacher. There is good discussion above.
But let me throw another concept at you. I'm thinking of CS in Grade 10-12 so that's my frame of reference. How many times does a student figure the program is good because it compiles and successfully executes? The "off by one" error is a teaching opportunity to actually check the output to ensure that the program is indeed correct. I see no problem with that. By verifying that it is correct, I feel like the student is all the better for the experience.
I have to smile at the concept; I've had programming teams at contests and written many problems for solution in contests and inevitably there is a problem or two designed to make it easy to be off by one!
Zero is great for mod math, which is a handy technique. It is useful as a "falsey" value. All of these useful tricks are just a bad substitute for usable human-friendly APIs and abstractions.
Still, even very experienced programmers spend many hours on off-by-one errors! I am in favor of 1-indexing. With enough practice you learn how to deal with 0 and how to bounce back and forth, but I wish more languages used 1 as a starting point.
I was shocked a few years ago to learn that most BASIC variants (Visual Basic included) actually support either 0 or 1 based arrays transparently. When you say "Dim X[10]", you actually get 11 elements, 0 through 10!
Post a Comment