
Solving CAPTCHA with Web automation

Published Nov 18, 2019

CAPTCHA is no longer an alien term to users. It is an acronym for Completely Automated Public Turing test to tell Computers and Humans Apart. A CAPTCHA is a challenge-response test built to distinguish humans from machines in order to prevent spam and automated data extraction from websites. The entire concept rests on the assumption that only a human can pass the test, while bots and automated scripts fail.

The need for bypassing CAPTCHA
Automated CAPTCHA solving can be used for illegitimate or legitimate purposes. Spammers use it to harvest users' email addresses so they can send as much spam as possible. A legitimate example is when a new client or business partner needs access to your Application Programming Interface (API), but the API is not ready or cannot be shared because of security concerns or the abuse it might invite. In that case, the only remaining option is to bypass the CAPTCHA with automated scripts. A common approach is automated CAPTCHA solving with Python, Java, or C++ programs when developers need access to the system's services.

There are different types of CAPTCHA: text-based CAPTCHA, image-based CAPTCHA, reCAPTCHA, and mathematical CAPTCHA. Solving one can be really challenging, as the technology behind CAPTCHA and reCAPTCHA keeps getting smarter.

Automated CAPTCHA solving techniques
Several techniques are available for solving CAPTCHA and reCAPTCHA challenges automatically. The two most common strategies are:

1. OCR (Optical Character Recognition) enabled bots - In this approach, a bot reads the characters in the CAPTCHA image using OCR and submits the recognized text automatically.

2. Human-based CAPTCHA solving services - The service has human workers who are constantly available online to solve CAPTCHAs. When you send your CAPTCHA, the company forwards it to the workers, who solve it and send back the solution.

OCR-based software pays off when you need to solve a large number of trivial CAPTCHAs, where it is a cost-effective solution. That is rarely the case since Google released reCAPTCHA v3. OCR bots are not built to beat the CAPTCHAs used by large platforms such as Google, Facebook, or Twitter; those require far more advanced solving methods. The more practical choice is therefore the second technique, which has higher accuracy and can handle complex CAPTCHAs as well.

Pros of the online anti-captcha services over OCR:

  • Higher percentage of correct solutions. OCR gives a very high rate of incorrect answers on complicated CAPTCHAs, and some kinds of CAPTCHA can't be solved with OCR at all, at least for now.
  • Continuous operation without interruptions, with quick adaptation to newly added complexities.
  • Cost-effective, with few resource constraints and low maintenance cost, since there is no software or hardware to maintain; all you need is an internet connection to send simple requests to the anti-captcha service's API.

Big players in the online solving services
Now that we know which technique works better, let's look at the services that provide precise solutions, API support, and quick responses to our requests. These include 2captcha, Deathbycaptcha, Anticaptcha, and others.

2CAPTCHA is one of the best platforms I have used. It has a quick response time and pretty good accuracy, with a team of human workers available online to solve the CAPTCHAs. It provides solutions for all major kinds of CAPTCHAs at reasonable rates, and it is the service we are going to use to bypass CAPTCHA. Here is why 2captcha has the upper hand over its competitors:

  • High speed of solution (17 seconds for normal (graphic and text) captchas and 23 seconds for reCAPTCHA)
  • Support for almost all popular programming languages, with comprehensive documentation of its ready-made libraries
  • Fixed price rates (which don't change as server load increases)
  • High accuracy (up to 99%, depending on captcha type)
  • Money-back guarantee for incorrect answers
  • Ability to solve a vast volume of captchas (more than 10,000 every minute)

Instructions to integrate 2captcha API
Now comes the best part: here we will learn web automation using the 2CAPTCHA service.

The 2CAPTCHA service requires us to provide a few parameters:

  1. service key
  2. google key
  3. pageurl
  4. method

Register on 2CAPTCHA; you will be given an API service key that lets you automate and integrate your software with the 2CAPTCHA service.
Now go to the target page and get the value of the data-sitekey attribute using the developer tools. Then make a GET or POST request to the 2CAPTCHA service with the above-mentioned parameters using a Python (or any other language) script.
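As a minimal sketch of that request, the four parameters can be assembled into a GET URL with the standard library. The in.php submission endpoint, the parameter names key, googlekey, and pageurl, and the method value userrecaptcha are assumptions based on the commonly documented 2CAPTCHA API and are not given in this article; SERVICE_KEY, SITE_KEY, and the example.com URL are placeholders.

```python
from urllib.parse import urlencode

# Assumed submission endpoint; confirm against the current 2CAPTCHA docs.
SUBMIT_URL = "http://2captcha.com/in.php"


def build_submit_url(service_key: str, site_key: str, page_url: str) -> str:
    """Return the GET URL that submits a reCAPTCHA task to the solver."""
    params = {
        "key": service_key,         # API service key from your account
        "method": "userrecaptcha",  # assumed method value for reCAPTCHA
        "googlekey": site_key,      # data-sitekey value from the target page
        "pageurl": page_url,        # full URL of the page showing the CAPTCHA
    }
    # urlencode percent-escapes the page URL so it survives as a query value.
    return SUBMIT_URL + "?" + urlencode(params)


# Placeholder values; nothing is sent over the network here.
url = build_submit_url("SERVICE_KEY", "SITE_KEY", "https://example.com/login")
print(url)
```

Building the URL separately from sending it keeps the parameters easy to inspect before any paid request is made.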
The 2CAPTCHA service returns a response in the form OK|CAPTCHA_ID, where CAPTCHA_ID is the ID of the reCAPTCHA task in the system.
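A small helper can split that response and fail loudly on anything that is not in the OK|CAPTCHA_ID form. This is only a sketch: the function name parse_submit_response is hypothetical, and the error string in the usage comment is made up for illustration.

```python
def parse_submit_response(body: str) -> str:
    """Extract CAPTCHA_ID from a response of the form 'OK|CAPTCHA_ID'."""
    status, _, captcha_id = body.strip().partition("|")
    if status != "OK" or not captcha_id:
        # Anything else (for example a bare error code) is treated as failure.
        raise ValueError(f"unexpected response: {body!r}")
    return captcha_id


captcha_id = parse_submit_response("OK|123456789")
print(captcha_id)
# parse_submit_response("ERROR_EXAMPLE") would raise ValueError.
```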
Now we need to wait until a worker solves the reCAPTCHA and Google returns a valid token to the service. To do this, we can send a request to the 2CAPTCHA service every 5 seconds until we get a valid token. The request goes to the res.php endpoint with these parameters:

http://2CAPTCHA.com/res.php?key=SERVICE_KEY&action=get&id=CAPTCHA_ID
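The polling step could be sketched as follows. The 5-second interval and the res.php URL come from the text above; the "not ready" marker CAPCHA_NOT_READY, the OK|TOKEN reply format, and the cap of 24 attempts are assumptions, and the helper names http_get and wait_for_token are hypothetical. The fetch and sleep functions are injectable so the loop can be exercised without network access.

```python
import time
import urllib.request

RESULT_URL = "http://2captcha.com/res.php"  # endpoint named in the article
NOT_READY = "CAPCHA_NOT_READY"  # assumed marker for "still being solved"


def http_get(url: str) -> str:
    """Fetch a URL and return the response body as text."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8")


def wait_for_token(service_key, captcha_id, fetch=http_get,
                   interval=5.0, max_attempts=24, sleep=time.sleep):
    """Poll res.php until a token arrives, an error occurs, or we give up."""
    url = f"{RESULT_URL}?key={service_key}&action=get&id={captcha_id}"
    for _ in range(max_attempts):
        body = fetch(url).strip()
        if body == NOT_READY:
            sleep(interval)  # worker has not finished yet; try again later
            continue
        status, _, token = body.partition("|")
        if status == "OK" and token:
            return token
        raise RuntimeError(f"solver returned an error: {body!r}")
    raise TimeoutError("no token received within the allotted attempts")


# Usage (performs real requests): token = wait_for_token("SERVICE_KEY", "CAPTCHA_ID")
```

Capping the number of attempts avoids polling forever if a task is never solved.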

Now we submit the form with the g-recaptcha-response token.
At the target site (server-side), this token is checked: the site's script sends a request to Google to verify the token's validity. On the 2CAPTCHA testing ground, the token is checked before the form is submitted. This is done by passing the token through an AJAX (XHR) request to proxy.php, which in turn asks Google to verify it and returns Google's response.
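For illustration, the server-side check described above might look like the sketch below. It assumes Google's siteverify endpoint, which takes the site's secret key and the submitted token as form fields and answers with JSON containing a success flag; the helper names post_form and token_is_valid are hypothetical, and this is not the code of proxy.php or of any particular site.

```python
import json
import urllib.parse
import urllib.request

# Google's reCAPTCHA verification endpoint (assumed; not named in the article).
VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def post_form(url: str, fields: dict) -> str:
    """POST form-encoded fields and return the response body as text."""
    data = urllib.parse.urlencode(fields).encode("ascii")
    with urllib.request.urlopen(url, data=data, timeout=30) as resp:
        return resp.read().decode("utf-8")


def token_is_valid(secret: str, token: str, post=post_form) -> bool:
    """Ask Google whether a g-recaptcha-response token is valid."""
    body = post(VERIFY_URL, {"secret": secret, "response": token})
    return bool(json.loads(body).get("success"))


# Usage (performs a real request): token_is_valid("SITE_SECRET", token)
```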

Conclusion
Yes, we can automate CAPTCHA bypassing. Of the two commonly discussed approaches, online anti-captcha services turn out to be the more successful. Several providers offer this kind of CAPTCHA solving, but 2CAPTCHA is the one I'd recommend as of now. The scripts can be written in C#, JavaScript, Java, or Python. In my experience, the service is fast and its solutions are accurate.

Discover and read more posts from Harshit Tyagi
2 Replies
Papapz
a year ago

Hello, would it be possible to get your e-mail so we can talk about a paid blog post or article that you could write for our service? Let me know.

Abhi
5 years ago

Hi bro, I work on an ssmms website that mainly needs a CAPTCHA, but while we are working with it, the CAPTCHA is not visible. Do you have any solution for this?