Bypass Online CAPTCHA Using Python

CAPTCHA (an acronym for “Completely Automated Public Turing test to tell Computers and Humans Apart”) is a type of challenge-response test used in computing to determine whether or not the user is human. CAPTCHAs are mostly used on registration and login pages. In this post we look at how to bypass a simple online CAPTCHA using Python.

Breaking a CAPTCHA requires Optical Character Recognition (OCR). OCR is a technology that converts scanned images of text into plain text, which lets your code read the CAPTCHA text and submit it into a login form just as a human would. Under Linux, Tesseract is one of the most accurate OCR engines.
To install Tesseract: sudo apt-get install tesseract-ocr
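Once Tesseract is installed, your Python code can call it as an external program. The sketch below is a minimal example under a few assumptions: recent Tesseract releases accept the output base stdout (print the text instead of writing a .txt file), and Tesseract 4 and later spell the page segmentation option --psm. The helper names tesseract_command and run_tesseract are hypothetical.

```python
import shutil
import subprocess


def tesseract_command(image_path, psm=7):
    """Build a Tesseract command line that prints the recognised text to stdout.

    Assumes a recent Tesseract: "stdout" as the output base prints the result,
    and --psm 7 treats the image as a single line of text.
    """
    return ["tesseract", image_path, "stdout", "--psm", str(psm)]


def run_tesseract(image_path, psm=7):
    """Run Tesseract on one image and return the first line of recognised text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not installed or not on PATH")
    result = subprocess.run(
        tesseract_command(image_path, psm),
        capture_output=True, text=True, check=True,
    )
    # Progress messages go to stderr, so stdout only holds the recognised text
    lines = result.stdout.strip().splitlines()
    return lines[0].strip() if lines else ""
```

Keeping the command builder separate from the subprocess call makes it easy to inspect or log the exact command before running it.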
Preparing Images for Tesseract
Tesseract is picky about the format of its input images. Older releases accepted only TIFF images, and even then compressed TIFFs, grayscale images and color images often caused problems (current releases read most common formats through the Leptonica library). So you’re better off with single-bit, uncompressed TIFF images.
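To check whether a file already matches this recommendation, Pillow can report its format, mode and TIFF compression. This is a minimal sketch assuming Pillow is installed; the helper name is_tesseract_friendly is hypothetical, and it relies on Pillow reporting uncompressed TIFFs as "raw" in img.info["compression"].

```python
from PIL import Image


def is_tesseract_friendly(path):
    """Return True if the file is a single-bit (mode "1"), uncompressed TIFF.

    path may be a filename or an open binary file object.
    """
    with Image.open(path) as img:
        return (
            img.format == "TIFF"
            and img.mode == "1"                        # single-bit pixels
            and img.info.get("compression") == "raw"  # no compression
        )
```

A False result means the image still needs the preparation steps described below.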

Parsing CAPTCHA images
Below are the sample images we are going to use, along with their characteristics. Copy these images into a folder named “capat” (you can use any folder you want, but then change the folder name in the program from “capat” to yours). The script below expects the files to be named 1.jpg, 2.jpg, and so on.

[Sample CAPTCHA images]

  • All images contain exactly 4 digits (the digits 0 to 9)
  • There are no letters
  • The digits are black
  • The digits are not rotated
  • All the digits are on a single line
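Because the samples are this constrained, the OCR output can be validated strictly. A minimal sketch with a hypothetical helper parse_captcha_text, assuming the only acceptable result is exactly four ASCII digits on one line:

```python
import re

# The sample CAPTCHAs always hold exactly four digits 0-9 on a single line
FOUR_DIGITS = re.compile(r"[0-9]{4}")


def parse_captcha_text(ocr_output):
    """Return the 4-digit code from raw OCR output, or None if it does not fit."""
    # OCR output may contain stray spaces between characters or a trailing newline
    candidate = "".join(ocr_output.split())
    return candidate if FOUR_DIGITS.fullmatch(candidate) else None
```

This is stricter than a bare isdigit() check, which accepts a digit string of any length.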

The process to prepare them with GIMP is very simple:

  1. Go to the Image→Mode menu and make sure the image is in RGB or Grayscale mode.
  2. Select from the menu Tools→Color Tools→Threshold and choose an adequate threshold value.
  3. Select from the menu Image→Mode→Indexed and from the options choose 1-bit and no dithering.
  4. Save the image in TIFF format with a .tif extension.
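The same four steps can be approximated in Python with Pillow. This is a sketch, not the author's script: the function name gimp_style_prepare and the default threshold of 128 are assumptions, and Pillow's grayscale conversion is not guaranteed to match GIMP's output pixel for pixel.

```python
from PIL import Image


def gimp_style_prepare(src_path, dst_path, threshold=128):
    """Mimic the GIMP steps: grayscale, threshold, 1-bit without dithering, TIFF.

    src_path and dst_path may be filenames or binary file objects.
    Returns the resulting 1-bit image.
    """
    with Image.open(src_path) as img:
        gray = img.convert("L")  # step 1: grayscale mode
    # Steps 2 and 3: brighter than threshold -> white, otherwise black,
    # written straight into a 1-bit image so no dithering is applied
    bw = gray.point(lambda p: 255 if p > threshold else 0, mode="1")
    bw.save(dst_path, format="TIFF")  # step 4: Pillow writes uncompressed TIFF by default
    return bw
```

The threshold plays the role of the value chosen in GIMP's Threshold dialog and usually needs tuning for each CAPTCHA style.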

For more information on GIMP, how to install it and how to work with it, go to https://help.ubuntu.com/community/TheGIMP
GIMP can clean the noise out of these images and make the digits stand out, ready for OCR. After applying the threshold in step 2, we get the following cleaned image:
[Cleaned CAPTCHA image]
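Picking the threshold in step 2 is trial and error. As an alternative that the original post does not use, Otsu's method chooses a global threshold automatically from the grey-level histogram. A minimal sketch, assuming a grayscale ("L") Pillow image; the name otsu_threshold is hypothetical.

```python
from PIL import Image


def otsu_threshold(gray):
    """Pick a global threshold t for an "L" image with Otsu's method.

    Pixels <= t form one class and pixels > t the other; t maximises the
    between-class variance of the grey-level histogram.
    """
    hist = gray.histogram()[:256]
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    weight_bg = 0  # number of pixels <= t
    sum_bg = 0     # sum of grey levels of those pixels
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_bg += hist[t]
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t
```

Pixels at or below the returned value would become black and those above it white. Noisy CAPTCHAs with uneven backgrounds may still need manual tuning.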

The output is now ready for OCR to read the numbers. However, repeating these steps by hand for every image is tedious, so we need a script that converts all the images to .tif format automatically. The script cap.py below does this (“capat” is the folder holding the CAPTCHA images and “check” is the folder where the generated .tif images are saved).

 

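from PIL import Image
import os
import subprocess

# Folders used in this post; change them to match your own setup
CAPTCHA_DIR = "/home/mk/Desktop/capat/"      # input images 1.jpg, 2.jpg, ...
CHECK_DIR = "/home/mk/Desktop/check/"        # cleaned .tif images go here
TEXT_BASE = "/home/mk/Desktop/text_captcha"  # Tesseract writes TEXT_BASE + ".txt"


def captcha():
    # Count only the .jpg files; they must be named 1.jpg ... N.jpg
    getlist = sorted(f for f in os.listdir(CAPTCHA_DIR) if f.endswith(".jpg"))
    print(getlist)
    for cap in range(1, len(getlist) + 1):
        convert(str(cap))


def convert(cap_name):
    img = Image.open(CAPTCHA_DIR + cap_name + ".jpg")
    img = img.convert("RGB")
    pixdata = img.load()
    width, height = img.size

    # Loop 1: pixels with a low red value become black
    for y in range(height):
        for x in range(width):
            if pixdata[x, y][0] < 90:
                pixdata[x, y] = (0, 0, 0)
    # Loop 2: pixels with a low green value become black
    for y in range(height):
        for x in range(width):
            if pixdata[x, y][1] < 136:
                pixdata[x, y] = (0, 0, 0)
    # Loop 3: every pixel that still contains blue becomes white
    for y in range(height):
        for x in range(width):
            if pixdata[x, y][2] > 0:
                pixdata[x, y] = (255, 255, 255)
    img.save(CHECK_DIR + cap_name + ".tif")

    # Page segmentation mode 7 treats the image as a single line of text
    # (Tesseract 4 and later spell it --psm; 3.x releases used -psm).
    # subprocess.run waits for Tesseract to finish, so no sleep is needed.
    subprocess.run(["tesseract", CHECK_DIR + cap_name + ".tif", TEXT_BASE,
                    "--psm", "7"], check=True)

    with open(TEXT_BASE + ".txt") as text:
        decoded = text.readline().strip()
    if decoded.isdigit():
        print("[+] CAPTCHA numbers are " + decoded)
    else:
        print("[-] Error: not able to decode")


if __name__ == "__main__":
    captcha()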
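The captcha() function relies on the images being named exactly 1.jpg to N.jpg with no gaps in the numbering. A hedged alternative, using a hypothetical captcha_names helper built on pathlib, collects whatever .jpg files exist and orders purely numeric names numerically:

```python
from pathlib import Path


def captcha_names(folder):
    """Return the base names of all .jpg files in folder.

    Numeric names come first in numeric order (1, 2, 10), followed by any
    other names in alphabetical order.
    """
    stems = [p.stem for p in Path(folder).glob("*.jpg")]
    return sorted(
        stems,
        key=lambda s: (not s.isdigit(), int(s) if s.isdigit() else 0, s),
    )
```

Usage with the script's convert function: for name in captcha_names("/home/mk/Desktop/capat/"): convert(name)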

 

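In the code above, the captcha function lists the images in the capat folder and calls convert for each one. The convert function loads the image into the img object and converts it to RGB mode. The three loops then process the image: pixels with a red value below 90 or a green value below 136 are turned black, which makes the digits bolder, and every remaining pixel that contains any blue is turned white, which cleans the background. The cleaned image is saved in .tif format, Tesseract reads it in page segmentation mode 7 (a single line of text), and the script prints the decoded number if it consists only of digits.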
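Because each loop looks at one pixel at a time, the three passes collapse into a single rule per pixel. The sketch below (hypothetical function names, standard library only) states both versions for one (r, g, b) tuple so they can be compared. Note that a pixel with no blue that survives the first two tests is left unchanged rather than whitened.

```python
def three_pass_pixel(rgb):
    """Apply the three loops from cap.py, in order, to one (r, g, b) pixel."""
    r, g, b = rgb
    if r < 90:          # loop 1: low red -> black
        r, g, b = 0, 0, 0
    if g < 136:         # loop 2: low green -> black
        r, g, b = 0, 0, 0
    if b > 0:           # loop 3: any blue left -> white
        r, g, b = 255, 255, 255
    return (r, g, b)


def one_pass_pixel(rgb):
    """Equivalent single rule for the same pixel."""
    r, g, b = rgb
    if r < 90 or g < 136:
        return (0, 0, 0)        # dark pixels (the digits) become black
    if b > 0:
        return (255, 255, 255)  # light pixels with some blue become white
    return (r, g, b)            # light pixels with zero blue are kept
```

A single-rule version like this would also let the three image loops be merged into one, cutting the work per image to a third.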
Now run the code in a terminal:
mk@mk-System-Product-Name:~/Desktop$ python cap.py
You will find the cleaned images in the check folder, and the numbers read from each image are printed in the terminal:
[Screenshot: cleaned images and decoded numbers in the terminal]

I hope this makes the basic concept of breaking CAPTCHAs with Python clear. Try extending the code to handle more CAPTCHA patterns, and if you run into any difficulty, feel free to drop a comment on this post.
