This blog post was voted 8th best in the Top 10 Web Hacking Techniques of 2011 poll.
With the goal of creating a tool that helps security professionals and developers test their CAPTCHA schemes, I conducted research on over 200 high-traffic websites and several CAPTCHA service providers listed on Quantcast’s Top 1 Million Ranking Websites.
During the same time frame, students at Stanford University conducted similar research (PDF). Both efforts reached the same, unsurprising conclusion:
An alarming number of CAPTCHA schemes are vulnerable to automated attacks.
I looked around, tested several options, and settled on Tesseract-OCR as my OCR engine. To remove color complexities, spatial irregularities, and other types of random noise from CAPTCHAs, I decided to write my own image preprocessing engine. After a few months of research, coding, and testing in my spare time, TesserCap was born and is now ready for release.
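To give a feel for what "removing color complexities and random noise" can mean in practice, here is a minimal sketch in Python. It is not TesserCap's actual preprocessing engine (the tool's internals are not described in this post); the function names, the threshold of 128, and the neighbor-count rule are all assumptions chosen for illustration. The image is represented as plain rows of (r, g, b) tuples so that the example needs no imaging library.

```python
def to_grayscale(pixels):
    """Convert rows of (r, g, b) tuples to rows of 0-255 luminance values."""
    return [[(299 * r + 587 * g + 114 * b) // 1000 for (r, g, b) in row]
            for row in pixels]


def binarize(gray, threshold=128):
    """Map each pixel to 1 (ink) if darker than the threshold, else 0 (background)."""
    return [[1 if value < threshold else 0 for value in row] for row in gray]


def remove_isolated_pixels(binary, min_neighbors=1):
    """Clear ink pixels with fewer than min_neighbors ink pixels among their 8 neighbors."""
    height = len(binary)
    width = len(binary[0]) if height else 0
    cleaned = [row[:] for row in binary]
    for y in range(height):
        for x in range(width):
            if not binary[y][x]:
                continue
            neighbors = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if dy == 0 and dx == 0:
                        continue
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width:
                        neighbors += binary[ny][nx]
            if neighbors < min_neighbors:
                cleaned[y][x] = 0  # a lone speck is treated as noise
    return cleaned


def preprocess(pixels, threshold=128):
    """Hypothetical pipeline: grayscale, then threshold, then speck removal."""
    return remove_isolated_pixels(binarize(to_grayscale(pixels), threshold))
```

Light background colors fall above the threshold and disappear, while single dark specks are dropped because no other ink pixel touches them. Real CAPTCHAs usually need filters tuned per scheme, which is why a configurable engine is useful.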
TesserCap is a GUI-based, point-and-shoot CAPTCHA analysis tool with the following features:
- A generic image preprocessing engine that can be configured as per the CAPTCHA type being analyzed.
- Tesseract-OCR as its OCR engine to retrieve text from preprocessed CAPTCHAs.
- Web proxy support
- Support for custom HTTP headers to retrieve CAPTCHAs from websites that require cookies or special HTTP headers in requests
- CAPTCHA statistical analysis support
- Character set selection for the OCR Engine
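The proxy and custom-header features can be pictured with a short Python sketch. This is only an illustration built on the standard library's urllib, not TesserCap's own code; the "Name: value" one-header-per-line input format, the function names, and the example URLs are assumptions.

```python
import urllib.request


def parse_headers(raw):
    """Parse 'Name: value' lines into a dict, skipping blank lines."""
    headers = {}
    for line in raw.splitlines():
        if not line.strip():
            continue
        # Split on the first colon only, so values such as URLs stay intact.
        name, sep, value = line.partition(":")
        if not sep or not name.strip():
            raise ValueError(f"malformed header line: {line!r}")
        headers[name.strip()] = value.strip()
    return headers


def build_captcha_request(url, raw_headers="", proxy=None):
    """Return (opener, request) prepared to fetch a CAPTCHA image.

    Nothing is sent here; call opener.open(request) to perform the fetch.
    """
    request = urllib.request.Request(url, headers=parse_headers(raw_headers))
    handlers = []
    if proxy:
        # Route both plain and TLS traffic through the same (assumed) proxy.
        handlers.append(urllib.request.ProxyHandler({"http": proxy, "https": proxy}))
    opener = urllib.request.build_opener(*handlers)
    return opener, request
```

A caller would typically pass a session cookie copied from the browser, for example `build_captcha_request("http://example.com/captcha.png", "Cookie: session=abc123", proxy="http://127.0.0.1:8080")`, and then read the image bytes from `opener.open(request)`. Rejecting malformed header lines up front gives a clearer error than letting the HTTP library fail later.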
Downloads
TesserCap and its user manual can be downloaded from one of the following locations:
- http://www.opensecurityresearch.com/files/tessercap.zip -- No password protection on this zip file.
- http://www.mcafee.com/us/downloads/free-tools/tessercap.aspx -- Use the password "foundstone" (without quotes) to extract this zip file.
Results
The two tables below summarize the CAPTCHA analysis performed using TesserCap on a few popular websites and some CAPTCHA service providers. All tests used TesserCap’s image preprocessing module and Tesseract-OCR’s default training data.
Website | Accuracy* | Quantcast Rank |
wikipedia | 20-30% | 7 |
ebay | 20-30% | 11 |
reddit.com | 20-30% | 68 |
CNBC | 50+% | 121 |
foodnetwork.com | 80-90% | 160 |
dailymail.co.uk | 30+% | 245 |
megaupload.com | 80+% | 1000 |
pastebin.com | 70-80% | 32,534 |
cavenue.com | 80+% | 149,645 |
CAPTCHA Provider | Accuracy* |
captchas.net | 40-50% |
opencaptcha.com | 20-30% |
snaphost.com | 60+% |
captchacreator.com | 10-20% |
www.phpcaptcha.org | 10-20% |
webspamprotect.com | 40+% |
ReCaptcha | 0% |
*This accuracy may be further increased by training the Tesseract-OCR engine on the CAPTCHAs under test.
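For readers who want to reproduce this kind of figure on their own CAPTCHAs, the sketch below shows one plausible way to score OCR output. How TesserCap computes its statistics is not spelled out in this post, so the definition used here (the share of CAPTCHAs whose full text is recognized exactly) is an assumption, as are the function names. The character filter is a simple post-processing stand-in for restricting the OCR character set; Tesseract itself offers a `tessedit_char_whitelist` setting for that purpose.

```python
import string


def restrict_to_charset(text, allowed):
    """Drop every character of an OCR guess that is not in the allowed set."""
    return "".join(ch for ch in text if ch in allowed)


def accuracy_percent(results, allowed=None):
    """Percentage of (guess, expected) pairs whose guess matches exactly.

    If `allowed` is given, each guess is filtered to that character set first.
    """
    if not results:
        return 0.0
    correct = 0
    for guess, expected in results:
        if allowed is not None:
            guess = restrict_to_charset(guess, allowed)
        if guess == expected:
            correct += 1
    return 100.0 * correct / len(results)


# Made-up sample data for illustration only: (OCR guess, true CAPTCHA text).
sample = [("4821", "4821"), ("48 2l", "4821"), ("7-7.03", "7703"), ("1234", "1235")]
print(accuracy_percent(sample))                  # raw guesses
print(accuracy_percent(sample, string.digits))   # guesses filtered to digits
```

On a digits-only CAPTCHA, filtering stray punctuation can rescue some guesses, but it cannot fix a character that was misread as another valid character.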
13 comments:
Hi,
Very interesting article! We are facing a similar situation. Could you advise: what do you think are the best CAPTCHA solutions available out there? Thanks!
Regards,
Vamsi
vamsic@ivycomptech.com
Hi Vamsi,
I suggest you look at deploying Google's reCAPTCHA or Microsoft's Asirra on your website. reCAPTCHA has been tested quite comprehensively, and Google often updates its CAPTCHA generation algorithms. Asirra (http://research.microsoft.com/en-us/um/redmond/projects/asirra/) is a new initiative by Microsoft that uses images of animals (cats and dogs). These two are free, and there are also several paid CAPTCHA providers that you can use in your application.
I am trying to use it, but when I enter a URL on the main tab, after a few seconds it just says "test completed" and nothing changes or shows. Did I do something wrong?
thanks
@Anonymouse, please check the logs. They may have some additional information if there is an error condition.
6/7/2012 11:42:12 PM
System.UriFormatException: Invalid URI: The URI is empty.
at System.Uri.CreateThis(String uri, Boolean dontEscape, UriKind uriKind)
at TesserCap.Misc.ValidUrlFormat(String url)
6/7/2012 11:44:11 PM
System.Net.WebException: The remote server returned an error: (404) Not Found.
at System.Net.HttpWebRequest.GetResponse()
at TesserCap.Misc.Retrieve200OKContent(String url, String proxyAddress, String proxyPort, Boolean followRedirect, String headers)
6/7/2012 11:44:16 PM
System.Net.WebException: The remote server returned an error: (404) Not Found.
at System.Net.HttpWebRequest.GetResponse()
at TesserCap.Misc.Retrieve200OKContent(String url, String proxyAddress, String proxyPort, Boolean followRedirect, String headers)
6/7/2012 11:44:36 PM
System.Net.WebException: The remote server returned an error: (404) Not Found.
at System.Net.HttpWebRequest.GetResponse()
at TesserCap.Misc.Retrieve200OKContent(String url, String proxyAddress, String proxyPort, Boolean followRedirect, String headers)
6/7/2012 11:44:48 PM
System.Net.WebException: The remote server returned an error: (404) Not Found.
at System.Net.HttpWebRequest.GetResponse()
at TesserCap.Misc.Retrieve200OKContent(String url, String proxyAddress, String proxyPort, Boolean followRedirect, String headers)
6/7/2012 11:52:13 PM
System.UriFormatException: Invalid URI: The format of the URI could not be determined.
at System.Uri.CreateThis(String uri, Boolean dontEscape, UriKind uriKind)
at TesserCap.Misc.ValidUrlFormat(String url)
6/7/2012 11:53:14 PM
System.ArgumentException: Parameter is not valid.
at System.Drawing.Image.FromStream(Stream stream, Boolean useEmbeddedColorManagement, Boolean validateImageData)
at TesserCap.Misc.IsImage(Byte[] img)
6/7/2012 11:54:10 PM
System.ArgumentException: Parameter is not valid.
at System.Drawing.Image.FromStream(Stream stream, Boolean useEmbeddedColorManagement, Boolean validateImageData)
at TesserCap.Misc.IsImage(Byte[] img)
6/7/2012 11:59:56 PM
System.ArgumentException: Specified value has invalid HTTP Header characters.
Parameter name: name
at System.Net.WebHeaderCollection.CheckBadChars(String name, Boolean isHeaderValue)
at System.Net.WebHeaderCollection.Add(String name, String value)
at TesserCap.Misc.AddHeaders(HttpWebRequest q, String headers)
6/7/2012 11:59:56 PM
System.ArgumentException: Specified value has invalid HTTP Header characters.
Parameter name: name
at System.Net.WebHeaderCollection.CheckBadChars(String name, Boolean isHeaderValue)
at System.Net.WebHeaderCollection.Add(String name, String value)
at TesserCap.Misc.AddHeaders(HttpWebRequest q, String headers)
6/7/2012 11:59:56 PM
System.ArgumentException: Specified value has invalid HTTP Header characters.
Parameter name: name
at System.Net.WebHeaderCollection.CheckBadChars(String name, Boolean isHeaderValue)
at System.Net.WebHeaderCollection.Add(String name, String value)
at TesserCap.Misc.AddHeaders(HttpWebRequest q, String headers)
6/7/2012 11:59:56 PM
System.ArgumentException: Specified value has invalid HTTP Header characters.
Parameter name: name
at System.Net.WebHeaderCollection.CheckBadChars(String name, Boolean isHeaderValue)
at System.Net.WebHeaderCollection.Add(String name, String value)
at TesserCap.Misc.AddHeaders(HttpWebRequest q, String headers)
hmm?
It appears that the .NET HTTP library considers the headers you are supplying to TesserCap invalid. Does your application use custom HTTP headers?
i am confused.. is it possible if i can talk to you on windows live messenger or something for a few minutes for some help?
Can you host the CAPTCHAs somewhere and share the URL?
http://freepicupload.com/images/735workimage_tn.jpg they are just like that?
can i just explain what i have been trying to do.
i am trying to make a bot/script for a game with a every 15 minute a 4 number captcha
So check these sample settings. The results aren't accurate for the sample you sent, but you will get the idea of how to remove the noise.
http://freepicupload.com/images/387sample_settings.png
Very nice, how did you get that picture working?
I am also not sure about everything else. I have been reading your blog and other people's, and I am just more confused about how to do this.
I get errors when I try to launch it. All of them relate to connecting to the server that hosts the CAPTCHA URL: SSL renegotiation and unexpected errors on transmission. When I set a proxy, I see no traffic on the proxy, but I still get errors. What should I do?