Showing posts with label CAPTCHA. Show all posts

Tuesday, October 2, 2012

Bypassing CAPTCHAs by Impersonating CAPTCHA Providers

CAPTCHA service providers validate millions of CAPTCHAs each day and protect thousands of websites against the bots. A secure CAPTCHA generation and validation ecosystem forms the basis of the mutual trust model between the CAPTCHA provider and the consumer. A variety of damage can occur if any component of this ecosystem is compromised.

Analysis of the CAPTCHA integration libraries provided by several CAPTCHA providers (including reCAPTCHA) revealed that almost all of the CAPTCHA verification APIs relied on the plain-text HTTP protocol to perform CAPTCHA validation. Because of this, the CAPTCHA provider’s identity was not validated, message authentication checks were not performed, and the entire CAPTCHA validation was performed over an unencrypted channel. This vulnerability was also reported to the reCAPTCHA team several months ago.

If you decompile the .NET plugin, you'll be able to pull out reCAPTCHA's verification URL, which demonstrates the absence of HTTPS:

In the current scenario, two types of attacks can be launched against vulnerable CAPTCHA implementations. These attacks are based on the assumption that an attacker is able to intercept the CAPTCHA validation traffic between target website and the CAPTCHA provider.

Private Key Compromise
Most CAPTCHA providers issue private and public keys to identify a particular consumer and to enforce an upper limit on the number of CAPTCHAs that consumer can use. Private keys are often sent to the CAPTCHA provider during the CAPTCHA validation process. If the public and private keys are sent using plain-text HTTP, an attacker could sniff the private keys and:

Use the CAPTCHA service without registering for it by using the captured keys.
Exhaust the target website’s CAPTCHA quota for the service, which, depending on the CAPTCHA provider, may cause a wide variety of unexpected issues.
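
To make the exposure concrete, here is a minimal sketch of what an eavesdropper does with a captured validation request. The parameter names (privatekey, remoteip, challenge, response) are modeled on reCAPTCHA's legacy verification API and should be treated as assumptions; the captured body and every value in it are invented for illustration.

```ruby
require 'uri'

# Hypothetical form-encoded body of a CAPTCHA validation POST, as an
# eavesdropper would see it on an unencrypted HTTP connection.
# All values are invented for illustration.
CAPTURED_BODY = 'privatekey=6Lc_EXAMPLE_PRIVATE_KEY&remoteip=203.0.113.7' \
                '&challenge=03AHJ_example&response=two+words'

# Parse the body and return the fields an attacker cares about.
def extract_keys(body)
  fields = URI.decode_www_form(body).to_h
  { private_key: fields['privatekey'], response: fields['response'] }
end

puts extract_keys(CAPTURED_BODY).inspect
```

Nothing here requires breaking any cryptography: on a plain-text channel the private key is just another form field.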

The CAPTCHA Clipping Attack
The following image describes what I call the "CAPTCHA Clipping Attack". Notice that steps 5 and 6 in blue represent the normal sequence of events. We'll go into the attack in a little more detail below.

The website’s application server acts as a client to the CAPTCHA provider during steps 5 and 6 (in blue), and it often neglects to validate the CAPTCHA provider’s identity or to perform session integrity checks. As a result, an attacker may be able to impersonate the CAPTCHA provider and undermine the anti-automation protection (steps 5 and 6 in red). CAPTCHA validation responses are mostly Boolean (true or false, success or failure, pass or fail, 0 or 1). The response format and its contents are also publicly available as part of the CAPTCHA provider’s API documentation. This allows an attacker to easily construct the finite set of possible responses, impersonate the CAPTCHA provider, and perform malicious CAPTCHA validation for the application servers.
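
The identity check that is missing in steps 5 and 6 can be sketched as follows. This is only an illustration of an application server acting as a TLS client that authenticates its peer, written with Ruby's standard Net::HTTP; the host name is a placeholder and the helper name is hypothetical.

```ruby
require 'net/http'
require 'openssl'

# Build an HTTP client that refuses to talk to a host whose TLS
# certificate does not validate. No connection is opened here.
def build_verification_client(host)
  http = Net::HTTP.new(host, 443)
  http.use_ssl = true                           # encrypt the channel
  http.verify_mode = OpenSSL::SSL::VERIFY_PEER  # authenticate the provider
  http
end

# 'captcha-provider.example' is a placeholder host, not a real provider.
client = build_verification_client('captcha-provider.example')
puts "TLS enabled: #{client.use_ssl?}"
```

With peer verification on, a request to an impersonator fails during the handshake instead of returning a forged Boolean reply.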

To exploit this vulnerability an attacker performs the following:

  1. The attacker acts as a legitimate application user and submits a large number of requests to the web application.
  2. At the same time, he/she intercepts CAPTCHA validation requests, masquerades as the CAPTCHA provider and approves all submitted requests.

Masquerading as the CAPTCHA provider and not forwarding the CAPTCHA validation requests to the actual CAPTCHA provider is the CAPTCHA Clipping Attack.
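
The effect of clipping on the application server can be modeled in a few lines. In this sketch the "provider" is just a callable that returns a textual Boolean reply; the reply format and the sample solution are assumptions made for illustration, not any specific provider's protocol.

```ruby
# The application server's check: send the user's answer to "the provider"
# and trust whatever Boolean-style reply comes back.
def captcha_passed?(answer, provider)
  provider.call(answer).start_with?('true')
end

# Genuine provider: approves only the correct answer ('x7k2p' is made up).
GENUINE_PROVIDER = ->(answer) { answer == 'x7k2p' ? "true\n" : "false\nincorrect" }

# Attacker on the network path: never forwards the request, always approves.
CLIPPING_PROVIDER = ->(_answer) { "true\n" }

puts captcha_passed?('garbage', GENUINE_PROVIDER)   # false
puts captcha_passed?('garbage', CLIPPING_PROVIDER)  # true
```

Because the server cannot tell the two callables apart, any answer passes once the attacker sits in the path.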

clipcaptcha is a proof of concept exploitation tool that specifically targets the vulnerabilities discussed above and allows complete bypass of CAPTCHA provider protection. clipcaptcha is built on the sslstrip codebase and has the following features:

  1. Performs signature based CAPTCHA provider detection and clipping.
  2. Can be easily extended to masquerade as any CAPTCHA provider by adding corresponding signatures to the configuration XML file.
  3. Has built in signatures of several CAPTCHA providers including reCAPTCHA, OpenCAPTCHA, Captchator etc…
  4. Logs POST requests that match any supported CAPTCHA provider to capture private and public keys. Unmatched requests are forwarded as is.
  5. clipcaptcha supports five operational modes. These are “monitor”, “stealth”, “avalanche”, “denial of service” and “random”.
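
Signature-based detection (features 1, 2 and 4 above) can be pictured with the sketch below. It is not clipcaptcha's code or its XML configuration format; the signature fields, provider names, hosts, paths and canned replies are all hypothetical.

```ruby
# Hypothetical signatures: which host/path identifies a provider's validation
# request and which canned reply counts as "success" for that provider.
SIGNATURES = [
  { name: 'provider-a', host: 'verify.provider-a.example', path: %r{\A/api/verify}, success: "true\n" },
  { name: 'provider-b', host: 'api.provider-b.example',    path: %r{\A/check},      success: '1' }
].freeze

# Return the canned success reply for a matching POST, or nil when the
# request matches no signature and should be forwarded unchanged.
def clip(method, host, path)
  return nil unless method == 'POST'
  sig = SIGNATURES.find { |s| s[:host] == host && s[:path].match?(path) }
  sig && sig[:success]
end

puts clip('POST', 'api.provider-b.example', '/check').inspect
puts clip('POST', 'www.unrelated.example', '/login').inspect
```

Supporting another provider in this model means adding one more entry to the table, which mirrors the extensibility described in feature 2.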

clipcaptcha can be downloaded here 

This blog post is a copy of my original post here

Oct 7, 2012 Update: 
The complete whitepaper is available for download from here.

Friday, March 2, 2012

CAPTCHA Re-Riding Attack

This attack was voted at #8 in Top Ten Web Hacking Techniques of 2012

CAPTCHA Re-Riding Attack bypasses the CAPTCHA protection built into the web applications. The attack exploits the fact that the code that verifies CAPTCHA solutions sent by the user during form submissions does not clear the CAPTCHA solution from the HTTP Session. 

Impact: A large number of successful submissions on CAPTCHA protected pages by riding on a single CAPTCHA solution. 

A typical scenario to demonstrate the vulnerability is explained below. 
1. A user visits the register page of the website.
2. The website creates an HTTP session, assigns it a SESSIONID and returns the register page to the user along with the SESSIONID cookie. The register page also contains an image tag which directs the browser to retrieve a CAPTCHA and display it on screen.
3. Upon parsing the image tag, the browser sends out a request for the CAPTCHA.
4. The server-side code creates a new CAPTCHA with random text and stores the CAPTCHA solution in the HTTP session.
5. The CAPTCHA image is then sent to the client and displayed by the browser.
6. The browser sends the CAPTCHA solution along with the form fields for verification.
7. The server-side code retrieves the CAPTCHA solution from the HTTP session and verifies it against the solution provided by the client.
8. If verification is successful, the client is sent to the next logical step in the registration process. If not, the client is redirected to the register page (step 1 above).
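
The flow above can be modeled with an in-memory session store. This is a simplified sketch, not the code of any real application: the method names are hypothetical, and the session is a plain hash keyed by SESSIONID.

```ruby
require 'securerandom'

SESSIONS = {}  # server-side session store: SESSIONID => session data

# Steps 1-2: create a session and hand back its SESSIONID.
def new_session
  id = SecureRandom.hex(8)
  SESSIONS[id] = {}
  id
end

# Steps 3-5: generate a CAPTCHA and remember its solution in the session.
def issue_captcha(session_id)
  SESSIONS[session_id][:captcha] = SecureRandom.hex(3)
end

# Steps 6-8: compare the submitted solution with the stored one.
# Note that the stored solution is left in the session afterwards.
def verify_captcha(session_id, submitted)
  stored = SESSIONS.fetch(session_id, {})[:captcha]
  !stored.nil? && stored == submitted
end

sid = new_session
solution = issue_captcha(sid)        # stands in for a human reading the image
puts verify_captcha(sid, solution)   # true
```

The verification method reads the stored solution but never removes it, which is the behavior the rest of this post builds on.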

Figure 1: Image shows an example Register page that supports CAPTCHA

Analysis of the CAPTCHA generation and verification process reveals the following:
  1. /captcha.php is the only page responsible for updating the HTTP session with the correct CAPTCHA solution. This is the first ingredient.
  2. The CAPTCHA solution inside the HTTP session is not explicitly cleared during the verification process. Yes, you guessed it right. This is the second and most important ingredient for CAPTCHA Re-Riding Attacks.
  3. When registration fails (for any reason), the web application continues to use the same HTTP session and SESSIONID. We will not look into this further.
  4. When registration succeeds, the user is redirected to the next step and the CAPTCHA generation page (/captcha.php) is not likely to be called for the current session again. This allows the CAPTCHA solution to stay in the HTTP session for as long as the session is valid. The following scenarios are likely when CAPTCHA verification is successful:
    a. The web application generates a new SESSIONID for the same HTTP session for known security reasons. This implementation is the one most likely to be seen. Combine this behavior with the first and second ingredients above and you have a successful CAPTCHA Re-Riding Attack.
    b. The web application continues to use the same SESSIONID for the same HTTP session. Here we have more things to worry about than just the CAPTCHA. For now, combine this behavior with the first and second ingredients above and you again have a successful CAPTCHA Re-Riding Attack.
    c. The web application generates a completely new HTTP session with a new or the same SESSIONID. For the CAPTCHA Re-Riding Attack, this scenario is not exploitable.

For scenarios 4.a and 4.b, the HTTP Session continues to hold the CAPTCHA solution as it is not explicitly cleared by the CAPTCHA verification code. Since /captcha.php is not going to be called again (and we will not let the call happen anyway), the same CAPTCHA solution continues to exist in HTTP session. Let us now see how 4.a & 4.b scenarios above can be exploited to make multiple successful submissions using a CAPTCHA solution.

Exploiting Scenario 4.b:
1. Load the register page of the target website in a web browser.
2. Solve the CAPTCHA manually, and submit the form.
3. Record this form submission using a web proxy. This request contains a valid SESSIONID, valid form fields and a valid CAPTCHA solution.
4. Create a custom script or use a tool like Burp Intruder that can repeatedly send this request to the server. With each request, change the unique values (like User ID) to create multiple new accounts with a single CAPTCHA solution.
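
The replay in step 4 can be simulated offline. In this sketch the vulnerable application is reduced to one hypothetical register method and a session that already holds a solved CAPTCHA; the SESSIONID, solution and user IDs are made up.

```ruby
# Minimal stand-in for the vulnerable application: one session that already
# holds a CAPTCHA solution, and a register handler that never clears it.
SESSION_STORE = { 'abc123' => { captcha: 'w4tr9' } }  # made-up values
ACCOUNTS = []

def register(session_id, user_id, captcha)
  session = SESSION_STORE[session_id]
  return false unless session && session[:captcha] == captcha
  ACCOUNTS << user_id   # the solution is NOT removed from the session
  true
end

# Recorded request (step 3), replayed with a different user ID each time (step 4).
recorded = { session_id: 'abc123', captcha: 'w4tr9' }
results = (1..5).map { |i| register(recorded[:session_id], "user#{i}", recorded[:captcha]) }
puts "#{results.count(true)} accounts created with one CAPTCHA solution"
```

Against a real target the loop body would be an HTTP POST carrying the recorded cookie and form fields, but the logic is the same.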

Exploiting Scenario 4.a:
1. Load the register page of the target website in a web browser.
2. Solve the CAPTCHA manually, and submit the form.
3. To make things easy, trap this request in a web proxy and do not allow it to reach the web server. This request contains a valid SESSIONID, valid form fields and a valid CAPTCHA solution.
4. Create a custom script or use a tool like Burp Intruder that can repeatedly send this request to the server.
5. Submit one request.
6. Upon successful submission, the web application will reset the current SESSIONID and send a new SESSIONID back in the response headers.
7. Change the value of SESSIONID in the recorded request (step 3) to the value copied from the response in step 6 above.
8. Go to step 5.
9. We will be able to make multiple successful submissions with a single CAPTCHA solution.
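
The loop in steps 5 to 8 can be sketched the same way. Here the hypothetical application issues a fresh SESSIONID after every successful registration but keeps the session data, including the CAPTCHA solution, attached to it; all identifiers and values are invented.

```ruby
require 'securerandom'

# Stand-in for an application that issues a new SESSIONID after each
# successful registration but keeps the session data (including the
# CAPTCHA solution) attached to it. All values are made up.
ROTATING_SESSIONS = { 'sid-0' => { captcha: 'k8m2q' } }

# Returns the new SESSIONID on success, nil on failure.
def register_and_rotate(session_id, captcha)
  session = ROTATING_SESSIONS[session_id]
  return nil unless session && session[:captcha] == captcha
  new_id = SecureRandom.hex(8)
  ROTATING_SESSIONS[new_id] = ROTATING_SESSIONS.delete(session_id)  # same data, new ID
  new_id
end

# Steps 5-8: submit, copy the new SESSIONID from the response, repeat.
def re_ride(start_id, captcha, attempts)
  sid = start_id
  successes = 0
  attempts.times do
    sid = register_and_rotate(sid, captcha)
    break if sid.nil?
    successes += 1
  end
  successes
end

puts re_ride('sid-0', 'k8m2q', 5)   # 5
```

Rotating the SESSIONID invalidates the old cookie, but it does nothing about the solution that travels along with the session data.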

Register pages that use one-time tokens along with CAPTCHAs may still be exploitable with a few additional lines of attack code. The best defense is to reset the CAPTCHA solution inside the HTTP session during the CAPTCHA verification stage. It is also important to note that when a website relies on a third-party CAPTCHA provider, it does not maintain any CAPTCHA session information at its end, because verification is performed by the CAPTCHA provider. Such websites are not vulnerable to the CAPTCHA Re-Riding Attack.
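
A minimal sketch of that defense, assuming the session behaves like a Ruby hash and the solution is stored under a hypothetical :captcha key:

```ruby
# Verification that consumes the stored solution: Hash#delete returns the
# value and removes it, so each solution can be checked at most once,
# whether the check succeeds or fails.
def verify_and_clear(session, submitted)
  stored = session.delete(:captcha)
  !stored.nil? && stored == submitted
end

session = { captcha: 'p3x7n' }             # made-up solution
puts verify_and_clear(session, 'p3x7n')    # true
puts verify_and_clear(session, 'p3x7n')    # false: nothing left to ride on
```

Clearing the value on failure as well as on success also stops an attacker from guessing repeatedly against the same stored solution.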

Thursday, November 17, 2011

CAPTCHA Hax With TesserCap

This blog post was voted as 8th best in Top 10 Web Hacking Techniques of 2011 poll.

With the goal of creating a tool that can help security professionals and developers test their CAPTCHA schemes, I conducted research on over 200 high-traffic websites and several CAPTCHA service providers listed on Quantcast’s Top 1 Million Ranking Websites.

During the same time frame, students at Stanford University conducted similar research (PDF). Both research efforts concluded the obvious:

An alarming number of CAPTCHA schemes are vulnerable to automated attacks.

I looked around, tested and zeroed in on Tesseract-OCR as my OCR engine. To remove color complexities, spatial irregularities, and other types of random noise from CAPTCHAs, I decided to write my own image preprocessing engine. After a few months of research, coding and testing in my spare time, TesserCap was born and is ready for release now.
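
To give a flavor of what image preprocessing means here, the sketch below shows one common building block: converting colored pixels to black and white with a luminance threshold so that the OCR engine sees clean text. It is an illustration only and not TesserCap's actual pipeline; the threshold, the luminance weights and the sample pixels are assumptions.

```ruby
# Convert an RGB pixel grid to black (0) / white (1) by luminance threshold.
# pixels: array of rows, each row an array of [r, g, b] with values 0-255.
def binarize(pixels, threshold = 128)
  pixels.map do |row|
    row.map do |r, g, b|
      luminance = 0.299 * r + 0.587 * g + 0.114 * b
      luminance < threshold ? 0 : 1
    end
  end
end

# Tiny made-up "image": dark text pixels on a light, tinted background.
image = [
  [[250, 240, 200], [20, 30, 40],    [245, 250, 210]],
  [[30, 20, 10],    [240, 230, 220], [25, 25, 25]]
]
binarize(image).each { |row| puts row.join }
```

Real CAPTCHAs usually need more than one such step (noise removal, line removal, resizing), which is why a configurable engine is useful.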

TesserCap is a GUI based, point and shoot CAPTCHA analysis tool with the following features:
  1. A generic image preprocessing engine that can be configured as per the CAPTCHA type being analyzed.
  2. Tesseract-OCR as its OCR engine to retrieve text from preprocessed CAPTCHAs.
  3. Web proxy support
  4. Support for custom HTTP headers to retrieve CAPTCHAs from websites that require cookies or special HTTP headers in requests
  5. CAPTCHA statistical analysis support
  6. Character set selection for the OCR Engine
An example TesserCap image preprocessing and run on Wikipedia (Wikimedia’s Fancy CAPTCHA) is shown below:


TesserCap and its user manual can be downloaded from one of the following locations:


The two tables below summarize the CAPTCHA analysis performed using TesserCap for a few popular websites and some CAPTCHA service providers. All these tests were performed using TesserCap’s image preprocessing module and Tesseract-OCR’s default training data.

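Website          Accuracy*   Quantcast Rank
wikipedia        20-30%      7
ebay             20-30%      11
[name missing]   20-30%      68
CNBC             50+%        121
[name missing]   80-90%      160
[name missing]   30+%        245
[name missing]   80+%        1000
[name missing]   70-80%      32,534
[name missing]   80+%        149,645

CAPTCHA Provider   Accuracy*
[name missing]     40-50%
[name missing]     20-30%
[name missing]     60+%
[name missing]     10-20%
[name missing]     10-20%
[name missing]     40+%
ReCaptcha          0%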

*This accuracy may be further increased by training the Tesseract-OCR engine for the CAPTCHAs under test.


OpenCaptcha Preprocessing

OpenCaptcha Sample Run



Saturday, March 19, 2011

Breaking A Weak CAPTCHA implementation

A while back I came across a web application that implemented a captcha to prevent automated form entries. The captcha was weak and could be easily solved. Below I summarize the steps followed and provide the sample Ruby scripts that were used to perform automated form submissions. The page names, form fields etc. are fictitious and do not reflect the exact application data/behavior.

So let's get started. Here is one sample captcha obtained from the website.

My first thought was to try the free "OCR to text" conversion service provided by the guys at Free-Ocr. I uploaded a few captchas to the website and it successfully solved almost all of them. One solved captcha is shown below.

Now I knew that the captcha could be solved, and I needed a way to automate the process of solving it. I turned to Tesseract to do that for me. Tesseract enjoys the reputation of being one of the most accurate open source OCR engines available.

Tesseract was downloaded and installed on a Windows box. The page requiring captcha input was sourcing captchas from a PHP script on the web server. Let's say its path is /get_captcha.php. The following script downloaded a sample captcha, stored it on the local file system and then solved it.

require 'net/http'
tesseract = 'C:\Tesseract-OCR\tesseract.exe'
q = Net::HTTP.new('', 80) # target host name goes here
# Download new captcha
r = q.get("/get_captcha.php")
File.open("captcha.bmp", 'wb') do |f|
  f.write r.body # write, not puts, so no newline is appended to the image data
end
# Solve the CAPTCHA
system("#{tesseract} captcha.bmp captcha") # Output gets stored in captcha.txt

Most of the sourced captchas could be successfully solved using the script above. Good! 

The next obvious step was to automate the entire process of form submissions. The application used PHPSESSIONID to associate captchas with sessions. The /home.php page was issuing the PHPSESSIONID, and the same session value was being sent to /get_captcha.php to retrieve a captcha. To automate the process, the following was required:
  1. GET /home.php page and capture the value of PHPSESSIONID.
  2. Retrieve a captcha by accessing /get_captcha.php while using the captured PHPSESSIONID.
  3. Solve the captcha locally
  4. POST the form fields along with PHPSESSIONID and the captcha value
Adding a few more lines to the script above served our purpose. The final script looked like this:

require 'net/http'
tesseract = 'C:\Tesseract-OCR\tesseract.exe'
q = Net::HTTP.new('', 80) # target host name goes here
# Get /home.php and capture the PHPSESSIONID it issues
r = q.get("/home.php")
r['set-cookie'] =~ /PHPSESSIONID=(.*?);/
hdr = {'Cookie' => "PHPSESSIONID=#{$1}"}
# Get a captcha associated with a valid PHPSESSIONID and solve it
r = q.get("/get_captcha.php", hdr)
File.open("captcha.bmp", 'wb') do |f|
  f.write r.body
end
system("#{tesseract} captcha.bmp captcha")
# Retrieve the captcha value and POST the form details along with the valid PHPSESSIONID
captcha = File.read("captcha.txt").strip
q.post('/save_details.php', "fname=gursev&lname=kalra&captcha=#{captcha}", hdr)

Further Analysis:
The captcha implementation appeared to have more issues. During the analysis, around 100 captchas were solved and their values analyzed. Here are the observations:
  1. Captchas contained only numerals and hence had fewer possible combinations.
  2. Out of 100 captchas, around 4 duplicates were identified. That's around 4% of the total captchas issued.
  3. Captchas had an uneven character distribution, with 4's and 5's getting the maximum share of captcha characters. The distribution formed a bell curve with a peak at 4 and 5.
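
Observations 2 and 3 come down to simple counting over the solved values. The sketch below shows one way to do that; the sample list is made up for illustration and is not the data collected during the analysis.

```ruby
# Number of solved values that repeat an earlier value.
def duplicate_count(values)
  values.size - values.uniq.size
end

# How often each character appears across all solved values.
def digit_histogram(values)
  values.join.each_char.each_with_object(Hash.new(0)) { |ch, h| h[ch] += 1 }
end

# Made-up sample, not the data from the analysis above.
SAMPLE = %w[4415 5346 4415 2457 5544].freeze

puts "duplicates: #{duplicate_count(SAMPLE)}"
digit_histogram(SAMPLE).sort.each { |digit, n| puts "#{digit}: #{'#' * n}" }
```

A skewed histogram like this tells an attacker which guesses to try first when the OCR output is ambiguous.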