CAPTCHA (an acronym for “Completely Automated Public Turing test to tell Computers and Humans Apart”) is a type of challenge-response test used in computing to determine whether or not the user is human. CAPTCHAs are mostly used on registration and login pages. So our topic is “Bypass Online CAPTCHA Using Python”.
Cracking a CAPTCHA requires Optical Character Recognition (OCR). So let me first tell you what OCR is: it is a technology that converts scanned images of text into plain text. This lets your code read the text and submit it in a login form, just as a human would. Under Linux, “Tesseract” is one of the most accurate open-source OCR engines.
To install Tesseract, run:
sudo apt-get install tesseract-ocr
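Before going further, it can help to confirm from Python that the program is actually reachable. This is a minimal sketch, assuming the package installs an executable named tesseract somewhere on your PATH and that you run Python 3.3 or newer (for shutil.which). The helper name find_ocr_binary is made up for this post.

```python
import shutil


def find_ocr_binary(name="tesseract"):
    """Return the full path of the OCR executable, or None if it is not on PATH."""
    return shutil.which(name)


path = find_ocr_binary()
if path is None:
    print("[-] tesseract not found, install it first")
else:
    print("[+] tesseract found at " + path)
```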
How to prepare images for Tesseract:
Tesseract is fussy about the format of its input images. It accepts TIFF images, but compressed TIFF images are troublesome, and the same goes for grayscale and color images. So you’re better off with single-bit uncompressed TIFF images.
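To make “single-bit uncompressed TIFF” concrete, here is a small check written with Pillow (the PIL package). It is only a sketch: the helper name is_single_bit_uncompressed_tiff is made up for illustration, and it assumes Pillow reports uncompressed TIFF data as "raw" in the image's info dictionary.

```python
from PIL import Image


def is_single_bit_uncompressed_tiff(path_or_file):
    """True when the image is a TIFF in 1-bit mode stored without compression."""
    with Image.open(path_or_file) as img:
        is_tiff = img.format == "TIFF"
        is_one_bit = img.mode == "1"  # "1" is Pillow's name for 1-bit pixels
        # Assumption: Pillow labels uncompressed TIFF data as "raw".
        is_raw = img.info.get("compression") == "raw"
    return is_tiff and is_one_bit and is_raw
```

A grayscale or color TIFF makes the function return False, which tells you the image still needs the preparation steps described in this post.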
Parsing CAPTCHA image
The following are the samples we are going to use, along with their specifications. Copy these images and store them in a folder named “capat” (you can use any folder you want, but then change the folder name in the program from “capat” to your folder name).
- All images contain only 4 numbers [Written in English]
- There are no alphabet letters
- Number color is black
- There is no rotation for the numbers
- All the numbers are in a single line
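Since every sample holds exactly four digits, the specification above also gives us a cheap sanity check for whatever the OCR step returns. A minimal sketch, with made-up helper names; dropping all whitespace before checking is my own assumption about how stray spaces should be handled.

```python
import re

# Exactly four ASCII digits and nothing else, per the sample specification.
FOUR_DIGITS = re.compile(r"[0-9]{4}")


def clean_ocr_text(raw):
    """Drop any whitespace the OCR output may carry around or between digits."""
    return "".join(raw.split())


def is_valid_captcha(raw):
    """True when the cleaned OCR text is exactly four digits."""
    return FOUR_DIGITS.fullmatch(clean_ocr_text(raw)) is not None
```

With this, is_valid_captcha("1234\n") is accepted, while a result such as "12a4" or "123" is rejected and can be retried with a different threshold.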
The process to prepare the images with GIMP is very simple:
- Go to the Image→Mode menu and make sure the image is in RGB or Grayscale mode.
- Select from the menu Tools→Color Tools→Threshold and choose an adequate threshold value.
- Select from the menu Image→Mode→Indexed and from the options choose 1-bit and no dithering.
- Save the image in TIFF format with a .tif extension
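If you prefer to stay in Python, the GIMP steps above can be approximated with Pillow. This is a rough sketch, not the exact GIMP result: the function names and the default threshold of 128 are assumptions, so pick a threshold that suits your images.

```python
from PIL import Image


def to_single_bit(img, threshold=128):
    """Threshold an image and return it in 1-bit mode (no dithering involved)."""
    gray = img.convert("L")  # grayscale, one value per pixel
    # Pixels below the threshold become black, all others white; the "1"
    # argument makes point() return a 1-bit image directly.
    return gray.point(lambda p: 255 if p >= threshold else 0, "1")


def save_as_tif(src_path, dst_path, threshold=128):
    """Open src_path, threshold it and save the result as a TIFF file."""
    with Image.open(src_path) as img:
        to_single_bit(img, threshold).save(dst_path, format="TIFF")
```

For example, save_as_tif("1.jpg", "1.tif", threshold=136) would write a black-and-white 1.tif next to the original image (both file names are placeholders).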
For more information on GIMP, how to install it and how to work with it, go to https://help.ubuntu.com/community/TheGIMP
GIMP can clean the noise from these images and make the numbers stand out so that they are ready for OCR. The next step is to apply threshold tuning in GIMP to concentrate the colors, as given in step 2, which gives us the following cleaned image.
Now the output is ready to be used in OCR to print out the numbers. But doing these steps by hand for every image is tedious, so we need a script that converts all the images into .tif format automatically. The following script, cap.py, does that.
(“capat” is a folder name and “check” is also a folder name, where all the .tif images are generated)
from PIL import Image
import os
import time

CAPTCHA_DIR = "/home/mk/Desktop/capat/"       # folder with the source .jpg images
OUTPUT_DIR = "/home/mk/Desktop/check/"        # folder for the generated .tif images
TEXT_BASE = "/home/mk/Desktop/text_captcha"   # tesseract appends .txt to this name

def captcha():
    getlist = os.listdir(CAPTCHA_DIR)
    print(getlist)
    # The images are expected to be named 1.jpg, 2.jpg, ... up to the file count.
    for cap in range(1, len(getlist) + 1):
        convert(str(cap))

def convert(cap_name):
    img = Image.open(CAPTCHA_DIR + cap_name + ".jpg")
    img = img.convert("RGB")
    pixdata = img.load()
    width, height = img.size
    # Pass 1: pixels with a low red value become black.
    for y in range(height):
        for x in range(width):
            if pixdata[x, y][0] < 90:
                pixdata[x, y] = (0, 0, 0)
    # Pass 2: pixels with a low green value become black.
    for y in range(height):
        for x in range(width):
            if pixdata[x, y][1] < 136:
                pixdata[x, y] = (0, 0, 0)
    # Pass 3: every pixel that still has some blue becomes white background.
    for y in range(height):
        for x in range(width):
            if pixdata[x, y][2] > 0:
                pixdata[x, y] = (255, 255, 255)
    img.save(OUTPUT_DIR + cap_name + ".tif")
    # -psm 7 treats the image as a single text line
    # (Tesseract 4 and later spell this option --psm).
    command = "tesseract -psm 7 " + OUTPUT_DIR + cap_name + ".tif " + TEXT_BASE
    os.system(command)
    time.sleep(1)
    with open(TEXT_BASE + ".txt", "r") as text_file:
        decoded = text_file.readline().strip()
    if decoded.isdigit():
        print("[+] CAPTCHA number is " + decoded)
    else:
        print("[-] Error: not able to decode")

captcha()
In the above code, the function “captcha” lists the images in the folder and passes each one to the function “convert”, which loads the image into the img object and then converts it into RGB mode. The three loops then process the image: they make the numbers bolder, clean the background to white, and the result is saved in “.tif” format.
Now run cap.py in the terminal:
mk@mk-System-Product-Name:~/Desktop$ python cap.py
You will find the cleaned images in the folder “check”, and the numbers that appear in the images are printed in the terminal.
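The script shells out with os.system and reads the result back from a text file. As a variation, here is a sketch that uses subprocess and asks Tesseract to print to stdout instead. The function names are made up for this example. It assumes Python 3.5 or newer and a Tesseract build that accepts stdout as the output name; on Tesseract 4 and later, pass psm_flag="--psm".

```python
import subprocess


def build_tesseract_command(image_path, psm=7, psm_flag="-psm"):
    """Argument list for one OCR run that prints the text to stdout."""
    # "stdout" as the output name asks tesseract to print the text
    # instead of writing a .txt file.
    return ["tesseract", image_path, "stdout", psm_flag, str(psm)]


def read_captcha(image_path, psm_flag="-psm"):
    """Run tesseract and return the decoded digits, or None if decoding failed."""
    result = subprocess.run(
        build_tesseract_command(image_path, psm_flag=psm_flag),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode the output as text
    )
    text = result.stdout.strip()
    if result.returncode == 0 and text.isdigit():
        return text
    return None
```

Passing the arguments as a list avoids building a shell command from string pieces, so file names with spaces do not break the call.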
I hope you now clearly understand this small concept of hacking with Python. Try to implement it with more patterns in the code, and if you find any difficulty, feel free to drop a comment on this post.