Archive for February, 2010

Breaking Weak CAPTCHA in 26 Lines of Code

February 23rd, 2010

During one of our latest engagements we found a weak CAPTCHA implementation being used in the target Web application. The assessment was being performed on-site, and after identifying this vulnerability we started to talk with the CSO about how easy it would be to break it.


The general consensus of course was “very easy”. The problem was that we were unable to find any good CAPTCHA breaking software that average joe could download and run on his computer; so I spent some minutes creating a simple Python script that returns the CAPTCHA solution for this particular implementation.

Before we dig into the script, lets analyze why this CAPTCHA is weak (might not be obvious for some readers):

  1. The letters are not rotated
  2. All letters have the same height
  3. All letters have the exact same color
  4. The letters are not deformed in any way
  5. The background noise color is the same for the whole image

Now, lets see the code that breaks this CAPTCHA:

from PIL import Image

img ='input.gif')
img = img.convert("RGBA")

pixdata = img.load()

# Clean the background noise, if color != black, then set to white.
for y in xrange(img.size[1]):
    for x in xrange(img.size[0]):
        if pixdata[x, y] != (0, 0, 0, 255):
            pixdata[x, y] = (255, 255, 255, 255)"input-black.gif", "GIF")

#   Make the image bigger (needed for OCR)
im_orig ='input-black.gif')
big = im_orig.resize((116, 56), Image.NEAREST)

ext = ".tif""input-NEAREST" + ext)

#   Perform OCR using pytesser library
from pytesser import *
image ='input-NEAREST.tif')
print image_to_string(image)

This simple script works with ~ 90% of the CAPTCHA images created using this specific implementation. Enjoy!

andres.riancho bonsai, security , , , ,