Turing Tests: Humans or Intelligent Machines?
March 26, 2010
In 1950, Alan Turing’s article “Computing Machinery and Intelligence” discussed the conditions under which a machine can be considered intelligent. On today’s web, a variant of Turing’s Test is used to make sure that a request sent to a web server was generated by a human and not by a machine. This is done with a CAPTCHA test or challenge. CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart.
A CAPTCHA simply poses a challenge that most humans can solve but computers cannot, unless very intelligent fuzzy logic or heavy artificial intelligence is deployed. The trend on websites is to implement the CAPTCHA as a picture that is difficult for OCR software to read but easy for humans to identify.
But several websites, especially “should-be-highly-available” ones such as online trading and online banking websites, are implementing poor CAPTCHAs.
Examples of CAPTCHAs:
The above CAPTCHAs can be considered poor because they can be easily recognized by OCR. The first example shows random numbers with the same color, the same alignment and the same style. The second example is even more obvious, with the letters highlighted very clearly.
Now, we notice that the CAPTCHA that Google uses is the fuzziest.
But the drawback of Google’s CAPTCHA is that humans often cannot distinguish the letters either. For example, in the above CAPTCHA we do not know whether the word is bidasters or biclasters. Users often have to refresh the page and try other CAPTCHAs before they can send their web request.
I propose a more intelligent Turing Test. A test with no random characters or numbers to recognize. A test that cannot be recognized by OCRs and is not fuzzy for humans.
The test will be based on picture questions. For example, we can ask the user “how many balls do you see in this picture?” or, to be more specific, ask them to count the red balls in the picture. We can have a database of a few hundred questions, each rendered on the web page under the same file name, so that a machine cannot learn the answers by relating them to file names. Moreover, we can do some image processing and update some images randomly, for example by adding some balls to the “Count the balls in the picture” question, so that the answers are more random.
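To make the “count the red balls” idea concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the function name `make_ball_challenge`, the color list, the size limits, and the use of SVG markup as the picture. SVG is used only so the example runs with the standard library. A real deployment would have to rasterize the picture into a bitmap (and serve it under a constant file name, as described above), because SVG markup can be read directly by a machine. The sketch also does not prevent balls from overlapping, which a real generator should handle.

```python
import random

# Hypothetical palette; only "red" counts toward the answer.
COLORS = ["red", "green", "blue", "yellow"]


def make_ball_challenge(rng, min_balls=5, max_balls=12, size=300):
    """Build one randomized 'count the red balls' challenge.

    Returns a tuple (question, svg_markup, expected_answer).
    """
    n = rng.randint(min_balls, max_balls)
    balls = [
        {
            "color": rng.choice(COLORS),
            "x": rng.randint(20, size - 20),
            "y": rng.randint(20, size - 20),
            "r": rng.randint(8, 18),
        }
        for _ in range(n)
    ]
    # The expected answer is derived from the same data used to draw the scene.
    answer = sum(1 for b in balls if b["color"] == "red")
    circles = "".join(
        '<circle cx="{x}" cy="{y}" r="{r}" fill="{color}"/>'.format(**b)
        for b in balls
    )
    svg = (
        '<svg xmlns="http://www.w3.org/2000/svg" '
        'width="{0}" height="{0}">{1}</svg>'.format(size, circles)
    )
    return "How many red balls do you see in this picture?", svg, answer


if __name__ == "__main__":
    question, picture, expected = make_ball_challenge(random.Random())
    print(question, "(expected answer: {})".format(expected))
```

Because the number and color of the balls are drawn at random on every call, the same question can have a different answer each time it is served.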
The test may contain the following types of questions:
- Different counting techniques (count the balls, count the rods, count the fingers, ...)
- Specify the color (for example, show a picture with colors and ask “what is the color of the boy’s pants in the picture?”)
- Specify the shape (for example, show a picture full of shapes and ask “what is the shape of the boy’s face in the picture?”)
- String manipulation (for example “please write this string in reverse order” or “write this string again skipping the vowels”…)
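The string manipulation questions in the list above are the easiest to sketch in code. The following Python example is only an illustration: the names `reverse_string`, `skip_vowels`, `TASKS` and `make_string_challenge` are hypothetical, and the sketch assumes lowercase ASCII strings. How the string is shown to the user is left open here. If it were sent as plain text, a machine could solve it trivially, so it would presumably need to be rendered inside a picture like the other questions.

```python
import random
import string

VOWELS = set("aeiouAEIOU")


def reverse_string(text):
    """Expected answer for 'write this string in reverse order'."""
    return text[::-1]


def skip_vowels(text):
    """Expected answer for 'write this string again skipping the vowels'."""
    return "".join(ch for ch in text if ch not in VOWELS)


# Each task pairs an instruction template with the function that solves it.
TASKS = [
    ("Please write this string in reverse order: {}", reverse_string),
    ("Please write this string again, skipping the vowels: {}", skip_vowels),
]


def make_string_challenge(rng, length=6):
    """Pick a random string and a random task; return (question, answer)."""
    text = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
    instruction, solver = rng.choice(TASKS)
    return instruction.format(text), solver(text)


if __name__ == "__main__":
    question, expected = make_string_challenge(random.Random())
    print(question)
    print("expected answer:", expected)
```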
As noted, the answer range for each question is not as wide as that of a random string or a random number, so you may think that this technique is weak against brute forcing. But in fact, brute forcing needs to send a request for every trial, and the CAPTCHA should change to a random question on every request. Nevertheless, our answer space should be big enough to lower the probability of guessing, and here we are talking about thousands of answers; otherwise this method will be rendered useless.
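The brute-force argument can be sketched in Python under two assumptions that are mine, not part of the proposal: each challenge is identified by a random token, and a challenge is discarded after a single attempt, right or wrong, so every trial needs a new request and gets a new question. The class `SingleUseChallenges` and the function `blind_guess_success` are hypothetical names. The probability function assumes a bot that guesses blindly and uniformly over the answer space.

```python
import secrets


class SingleUseChallenges:
    """In-memory store where every challenge can be attempted only once."""

    def __init__(self):
        self._pending = {}

    def issue(self, expected_answer):
        """Register a new challenge and return its random token."""
        token = secrets.token_urlsafe(16)
        self._pending[token] = str(expected_answer)
        return token

    def verify(self, token, submitted):
        """Check an answer; the challenge is consumed whether right or wrong."""
        expected = self._pending.pop(token, None)
        if expected is None:
            return False  # unknown token or challenge already used
        return secrets.compare_digest(
            expected.encode(), str(submitted).strip().encode()
        )


def blind_guess_success(answer_space, attempts):
    """Chance that at least one of `attempts` blind guesses is correct,
    when every attempt faces a fresh question with `answer_space`
    equally likely answers."""
    return 1 - (1 - 1 / answer_space) ** attempts


if __name__ == "__main__":
    for space in (10, 1000):
        print(space, blind_guess_success(space, attempts=100))
```

With only 10 possible answers, a blind guess succeeds on one request in ten, which is why the answer space needs to be in the thousands for the single-attempt rule to help.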
The only drawback of this method is accessibility, since it depends on vision (seeing pictures to get the answer). People with visual disabilities will not be able to use this Turing Test. In this case, we will refer them to the Audio Turing Test, where characters are pronounced and the user has to write them.
The cost of this method should not be high. Next time I will try to compile a database of answers for you with the needed randomness, and a demo site showing how much easier and more secure this method can be.