With the goal of creating a tool that can help security professionals and developers to test their CAPTCHA schemes, I conducted a research on over 200 high traffic websites and several CAPTCHA service providers listed on Quantcast’s Top 1 Million Ranking Websites.
During the same time frame, students at the Stanford University also conducted a similar research (PDF). Both research works concluded the obvious:
An alarming number of CAPTCHAs schemes are vulnerable to automated attacks.
I looked around, tested and zeroed in on Tesseract-OCR as my OCR engine. To remove color complexities, spatial irregularities, and other types of random noise from CAPTCHAs, I decided to write my own image preprocessing engine. After a few months of research, coding and testing in my spare time, TesserCap was born and is ready for release now.
TesserCap is a GUI based, point and shoot CAPTCHA analysis tool with the following features:
*This accuracy maybe further increased by training the Tesseract-OCR engine for the CAPTCHAs under test.
Reddit
During the same time frame, students at the Stanford University also conducted a similar research (PDF). Both research works concluded the obvious:
An alarming number of CAPTCHAs schemes are vulnerable to automated attacks.
I looked around, tested and zeroed in on Tesseract-OCR as my OCR engine. To remove color complexities, spatial irregularities, and other types of random noise from CAPTCHAs, I decided to write my own image preprocessing engine. After a few months of research, coding and testing in my spare time, TesserCap was born and is ready for release now.
TesserCap is a GUI based, point and shoot CAPTCHA analysis tool with the following features:
- A generic image preprocessing engine that can be configured as per the CAPTCHA type being analyzed.
- Tesseract-OCR as its OCR engine to retrieve text from preprocessed CAPTCHAs.
- Web proxy support
- Support for custom HTTP headers to retrieve CAPTCHAs from websites that require cookies or special HTTP headers in requests
- CAPTCHA statistical analysis support
- Character set selection for the OCR Engine
Downloads
TesserCap and it's user manual can be downloaded from one of the following locations:- http://www.opensecurityresearch.com/files/tessercap.zip -- No password protection on this zip file
- http://www.mcafee.com/us/downloads/free-tools/tessercap.aspx -- Use password as "foundstone" without quotes to extract this zip file.
Results
The two tables below summarize the CAPTCHA analysis performed using TesserCap for few popular websites and some CAPTCHA service providers. All these tests were performed using TesserCap’s image preprocessing module and Tesseract-OCR’s default training data.| Website | Accuracy* | Quantcast Rank |
| wikipedia | 20-30% | 7 |
| ebay | 20-30% | 11 |
| reddit.com | 20-30% | 68 |
| CNBC | 50+% | 121 |
| foodnetwork.com | 80-90% | 160 |
| dailymail.co.uk | 30+% | 245 |
| megaupload.com | 80+% | 1000 |
| pastebin.com | 70-80% | 32,534 |
| cavenue.com | 80+% | 149,645 |
| CAPTCHA Provider | Accuracy* |
| captchas.net | 40-50% |
| opencaptcha.com | 20-30% |
| snaphost.com | 60+% |
| captchacreator.com | 10-20% |
| www.phpcaptcha.org | 10-20% |
| webspamprotect.com | 40+% |
| ReCaptcha | 0% |
*This accuracy maybe further increased by training the Tesseract-OCR engine for the CAPTCHAs under test.






2 comments:
Hi,
Very interesting article!!! we are faced with a similar situation.. Could you help advising; what do you think are best Captcha solutions available out there? Thanks!
Regards,
Vamsi
vamsic@ivycomptech.com
Hi Vamsi,
I suggest you guys look at deploying google's reCAPTCHA or microsoft's asirra on to your website. reCAPTCHA has been tested quite comprehensively and google often updates the CAPTCHA generation algorithms. ASIRRA (http://research.microsoft.com/en-us/um/redmond/projects/asirra/) is a new initiative by microsoft that basically uses animal (cats and dogs) images. Theses two are free and there are several paid CAPTCHA providers out there which you can use in you application.
Post a Comment