Monday, May 08, 2017

Pseudo Random Numbers Are Confusing

Random numbers are really useful in computer science. We use them a lot for simulations (or games – pretty much the same thing) and to make programs more interesting. But they are not as simple as might appear. First off they are not really random and secondly there are often timing issues. And then there is the whole testing situation. randomnumber_wordle

Let’s start with the first of these. Those of us who understand random numbers in CS understand that it is impossible for computers to calculate truly random numbers. Cryptography which really relies on true randomness uses all sorts of tricks and tools to generate groups of random numbers. No serious cryptography person would use a standard library for use in cryptography. But for most things we do  in high school computer science built in library random number tools are just fine.

Where things have gotten complicated for my students lately is with timing issues. The libraries we are using use time from the computer’s clock as a seed value for the generator. If you instantiate two different random number objects at the same time, or rather within a single clock interval, you get two lists with the same values. Whoops! This is pretty much the opposite of what students really want. They’re generally better with one variable than with two but that is not obvious to many beginners. After all if one is good two is better right? And even if you have one variable if you instantiate it over and over again in a tight loop you can also get a lot of duplicates. Various languages have various ways around this (static variables for example) but hardly obvious to a student who just heard (or read) about generating random numbers.

And then there is testing. Oh my. Seldom a student strong suit. After all if it compiles isn’t it right? The special problem with random numbers is how does one test if the data is different all the time? Visual Basic has a Randomize statement that makes things “more” random. If one doesn’t use it than the program will use the same random numbers every time. In other languages (libraries really) one can use a fixed seed value to get the same set of random values for each run. How ever one does it, it is useful to have a known set of data for testing purposes. Talking about this is an important part of the larger issues of understanding random numbers AND of testing program.

One of my tasks for this summer is going to be trying to document this better for students. I realize that I need to cover this in more depth if students are going to really be able to use random numbers effectively. I can only talk so much in a single semester course. Perhaps if I provide supplementary reading for homework it will help. It’s a theory.

Maybe I’ll bring random.org into the mix. Looks like some interesting tools there.

No comments: