Sunday, October 9, 2011

No Monkey Business: Typing Shakespeare

By Carl Bialik (Wall Street Journal)

My print column examined a computer programmer’s effort to simulate the thought experiment of putting many monkeys over many years to the task of bashing keyboards, with the hope their random output would eventually include the works of Shakespeare. This effort was successful mainly because it changed the rules, requiring only that monkey-like software match nine-letter strings from Shakespeare’s works, rather than whole scenes, let alone acts or plays.

That made it the target for some skepticism from academic researchers. “This bit of silliness deserves no attention from anyone,” said Jeffrey Shallit, a computer scientist at the University of Waterloo. “It has nothing to do with evolution and it is of absolutely no interest mathematically.”

“It is of no scientific interest,” said Richard Stanley, an applied mathematician at the Massachusetts Institute of Technology. He said that if nine-letter strings are good enough, why not two letters? “If you waited until the monkey typed just ‘to,’ then waited until he/she typed just ‘be,’ etc., of course this would take much less time, but I would not consider this typing the sentence ‘To be or not to be, that is the question,’ ” Stanley wrote in an email. “You might as well do just one letter at a time, and it would be even faster but would accomplish nothing.”

“The ability of the random generator to do this is highly dependent on the size of the blocks” it counts as successful matches, added Robert Simon, a mathematician at the London School of Economics.
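To make the dependence on block size concrete, here is a minimal sketch in Python. It assumes a 26-letter alphabet in which every key is equally likely and independent of the others, which is an idealization and not a description of Anderson's actual software; the function name `expected_attempts` is hypothetical. Under those assumptions a random block of k letters matches one specific k-letter target with probability 1/26^k, so the average number of blocks needed to hit that target is 26^k.

```python
ALPHABET_SIZE = 26  # assumption: 26 letters, each equally likely and independent


def expected_attempts(block_length: int) -> int:
    """Average number of random blocks needed to hit one specific target block.

    Each attempt succeeds with probability 1 / ALPHABET_SIZE**block_length,
    so the mean of the resulting geometric distribution is its reciprocal.
    """
    return ALPHABET_SIZE ** block_length


if __name__ == "__main__":
    for k in (1, 2, 9):
        print(f"{k}-letter block: about {expected_attempts(k):,} attempts on average")
```

Each extra letter in the block multiplies the expected effort by 26, which is why matching two letters at a time is trivial while matching whole sentences is out of reach.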

“I think this is about the easiest task for monkeys that can be imagined, and saying that they have typed the text is a stretch of the imagination,” said Brendan McKay, a computer scientist at Australian National University in Canberra.

The project’s creator, Reno, Nev., software engineer Jesse Anderson, defended it for its ability to shed light on the scope of infinity. “It helps you wrap your mind about what would happen if you had infinite resources, and an infinite amount of time,” Anderson said. “We have a hard time thinking in terms of numbers that big.”

“He’s showing the power of raw computation,” said Dave Thomas, president of New Mexicans for Science and Reason. “It illustrates that monkeys randomly typing stuff eventually produce real words.”

Greg McColm, a mathematical logician at the University of South Florida, said the real point of the thought experiment is that the monkeys are far more likely to produce some coherent piece of writing than a specific one chosen before they are put to work. “In real life, the point is that a large number of independent events (like throwing darts at a dartboard) will almost always generate a pattern different from any previously selected pattern,” McColm wrote in an email. “Of course, afterwards, you can say that the darts form a horse and a pirate, but you are not going to start by saying you are going to form a horse and a pirate and then throw the darts (at random) and get a horse and a pirate.”

Anderson’s experiment did illustrate one intriguing mathematical concept, the coupon collector’s problem.

This problem involves determining how many random numbers one would have to generate to have a reasonable expectation of getting every number in a set. The answer is it would take far more than the total number, because there would be many duplicates on the way to completeness. There are about 5.4 trillion possible nine-letter strings using the 26-letter English alphabet, but it would take generating about 162 trillion, on average, to get all of them, and 184 trillion to have a 99% chance of getting all of them, according to Tobias Friedrich, a computer scientist at the Max Planck Institute for Informatics in Germany. Since Anderson didn’t need all of them, only those in Shakespeare, he needed just 7.4 trillion random strings to succeed at his self-assigned task.
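The figures above can be checked with a short calculation. The sketch below, in Python, uses two standard approximations for the coupon collector's problem with n equally likely items: the expected number of draws is about n(ln n + γ), where γ is the Euler-Mascheroni constant, and the number of draws needed to have probability p of a complete set is about n(ln n − ln(−ln p)). These are large-n approximations; the column does not say how Friedrich derived his numbers, and the function names here are hypothetical.

```python
import math

N = 26 ** 9  # distinct nine-letter strings over a 26-letter alphabet (~5.4 trillion)
EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def expected_draws(n: int) -> float:
    """Approximate expected draws to see all n items: n * (ln n + gamma)."""
    return n * (math.log(n) + EULER_GAMMA)


def draws_for_probability(n: int, p: float) -> float:
    """Approximate draws needed for probability p of having seen all n items.

    Uses the limiting (Gumbel) form: P(done by n*ln(n) + c*n) ~ exp(-exp(-c)).
    """
    return n * (math.log(n) - math.log(-math.log(p)))


if __name__ == "__main__":
    print(f"possible strings:     {N / 1e12:.1f} trillion")
    print(f"expected draws:       {expected_draws(N) / 1e12:.0f} trillion")
    print(f"draws for 99% chance: {draws_for_probability(N, 0.99) / 1e12:.0f} trillion")
```

Under these assumptions the results come out at roughly 162 trillion and 184 trillion, in line with the numbers quoted above. The gap between 5.4 trillion and 162 trillion reflects the long wait for the last few unseen strings.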

A previous effort to simulate Shakespeare online was actually a simulation of a simulation, said its creator, Nick Hoggard, of Lund, Sweden. “It ran on individual PCs that didn’t really have much power and I wanted those who had a slow PC to have an equal chance,” Hoggard said. “So my webpage didn’t actually do all the calculations but started off from a base where it assumed a number of calculations had already been done. This base progressed every day to reflect increasing numbers of monkeys.” The goal, he said, was “to try and attract visitors to a website and from there try to earn money through advertising.” It didn’t attract enough visitors, and the site is no longer online, though it did help inform Anderson’s effort.

A real-life experiment testing the monkey hypothesis that preceded both of these simulated efforts pointed out a flaw in them, though: Plymouth University researchers in 2002 found that six zoo monkeys didn’t hit random keys on an iMac, but instead pounded away on certain favorites, particularly “S.”

While the monkeys liked some letters more than others, they preferred other parts of the enclosure to the computer. “They became much more interested in each others’ bits and pieces than in the computer,” said Mike Phillips, professor of interdisciplinary arts at Plymouth, who led the research. The iMac also was a target for monkey excretion as much as monkey typing.

Anderson said he is aware of the experiment and plans to incorporate the results into a future simulation. His virtual monkeys, which he increasingly thinks of as real ones, would somehow be incentivized to hit keys randomly. “I think you would have to train the monkeys or give them some kind of reward for doing something,” Anderson said. “Just putting [a computer] in a room with them, they’re not going to be that interested in it.”
