Capture the Captcha – And the winner is…
Bonsai is very proud to announce the winner of our Capture the Captcha game: Julio Vidal, aka @madeye12 !
He wrote some lines for our Blog regarding his experience during the whole game:
Hi, my name is Julio Vidal aka madeye12. I’ve been following the Security Scene for a few years now, learning as much as I can, and I hope to keep playing this kind of games to learn more and to make some friends while I’m at it. Well, this post isn’t about me, it’s about the way I broke the captchas.
I noticed the Capture The Captcha 2 days after it started, on 17/05/2011, so the first thing I did was check the Team Scores page to see what was going on. As a gift, it showed which captchas had been broken by which teams, which gave me an idea of which ones were the easiest, so I decided to go for #5.
Captcha # 5
It wasn’t hard to solve this captcha, unless u have no idea how to make an HTTP request. The captcha was the addition of 2 numbers, and the operation was written as is in the source code of the web page, so not very hard to solve. We just need a few steps to solve the captcha:
- Do a GET request to /five/ resource.
- Use a regexp, /\d+ \+ \d+/, to find the operation string in the source code.
- Split the string found with the regexp so we can get the numbers.
- Make a POST with the result of the addition.
Finally we just need to have a beer while the captchas get broken till we reach 5000.
Easy… this kind of captcha should be punished with jail, but hell… if u still see Windows 3.11 in schools u can expect anything.
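The regexp and split steps for #5 can be sketched in Python. This is only a minimal sketch: the sample markup is made up, the function name is hypothetical, and the GET and POST are left out because the form fields are not shown in this post.

```python
import re

# Same pattern as in the steps above, with groups added to capture both numbers.
OPERATION = re.compile(r"(\d+) \+ (\d+)")


def solve_addition(page_source):
    """Return the sum of the first 'a + b' operation in the page, or None."""
    match = OPERATION.search(page_source)
    if match is None:
        return None
    left, right = match.groups()
    return int(left) + int(right)


# Hypothetical page fragment; the real markup of /five/ is an assumption.
sample = "<p>How much is 12 + 30 ?</p>"
print(solve_addition(sample))
```

Capturing the two numbers with groups replaces the separate split step, so one regexp does both jobs.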
Captcha # 7
This was a good one, but it was like a crypto challenge with a simple substitution cipher in it. In the beginning you may think it’s a hard captcha, but after you refresh it a few times you can see, in the params value of the flash object, some numbers like 31,32,8,12,21, and that these numbers are always the same for the same letter. So we can deduce that the numbers represent letters and we just need to get the right cipher, so let’s work ;)
Once you have the cipher, everything comes down to the following:
- Do a GET request to /seven/ resource.
- Use a regexp to find the params value in the source code.
- Get the correct character for each number.
- Make a POST with the right captcha value.
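A minimal Python sketch of the substitution step, assuming the params value is a comma-separated list of numbers as in the example above. The number/letter pairs below are invented for illustration (the real table is not listed in this post), and the function names are hypothetical.

```python
def learn_table(samples):
    """Build a number -> letter table from (numbers, solved_text) pairs."""
    table = {}
    for numbers, text in samples:
        for number, letter in zip(numbers, text):
            table[number] = letter
    return table


def decode(params, table):
    """Translate a comma-separated params string; unknown numbers become '?'."""
    return "".join(table.get(int(n), "?") for n in params.split(","))


# Hypothetical captchas solved by hand, used only to fill the table.
samples = [([31, 32, 8], "abc"), ([12, 21, 31], "dea")]
table = learn_table(samples)
print(decode("8,31,12", table))
```

Marking unknown numbers with '?' makes it easy to see which letters still need to be solved by hand before the table is complete.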
Captcha # 3
Damn… things start to get interesting. Finally we don’t have the captcha value in the source code: now it’s in an image, like most captchas, with some noise on it. Not a lot, but some is better than nothing like #2. So… how to solve this captcha??? Before this challenge I had played once with image captchas and found that ImageMagick is one of the best tools to work with images, but I didn’t know how to use it well. So the first thing I did was convert the png file to bmp and try to remove all the non-black pixels, but I always ended up with a weird image, dunno why. Maybe I was too tired that day to keep typing scripts, so I decided to RTFM of ImageMagick and found a great command to get rid of all the noise in the image, but I couldn’t get it to work on my Linux box, so this one I had to solve on Windows… why?? Don’t ask me, I was too tired and bored to figure out why :P. So, to the point, how do we solve the captchas?
- With this kind of captcha the first thing to do is find the image URL, so we can request the image directly, save it on our box and play with it.
- Once it’s on our box we call our magical command `convert captcha.png -fill white +opaque "#0099cc" captcha.pnm`. How does this command work? Easy… from the man page:
convert - convert between image formats as well as resize an image, blur, crop, despeckle, dither, draw on, flip, join, re-sample, and much more.
-fill color    color to use when filling a graphic primitive
-opaque color    change this color to the fill color
So… wait!! We use +opaque instead of -opaque, what does the ‘+’ do???? The same, but inverted: instead of replacing the color you specify, it replaces every color in the image except that one, and in this case the captcha value is drawn in the color #0099cc.
Ok, now we have an image with only the captcha value, and luckily it has no distortion and no letters at strange angles, so we use an OCR tool to read the value of the captcha: `gocr.exe captcha.pnm`. And the captcha is broken, we just keep sending POSTs till we break the 5000.
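To make the effect of `-fill white +opaque "#0099cc"` concrete, here is a small pure-Python sketch of the same idea on a list of RGB pixels. It is not how ImageMagick is implemented; the tiny image and the function name are made up, and only exact colour matches are kept.

```python
WHITE = (255, 255, 255)
CAPTCHA_BLUE = (0x00, 0x99, 0xCC)  # the #0099cc colour of the captcha text


def keep_only(pixels, keep, fill=WHITE):
    """Replace every pixel that is not exactly `keep` with `fill`."""
    return [[p if p == keep else fill for p in row] for row in pixels]


# Hypothetical 2x3 image: captcha text in blue, noise in other colours.
image = [
    [CAPTCHA_BLUE, (10, 10, 10), CAPTCHA_BLUE],
    [(200, 0, 0), CAPTCHA_BLUE, (0, 0, 0)],
]
cleaned = keep_only(image, CAPTCHA_BLUE)
for row in cleaned:
    print(row)
```

After this step only the text colour and white remain, which is the kind of clean input an OCR tool like gocr handles best.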
Captcha # 2
This captcha was easier than #3 cuz here we don’t have noise on the image and it’s about solving an easy math operation, so again…
- Find the image URL and get the captcha image.
- Use gocr to get the content of the image.
- Here we have four possible operators, so we just need to figure out which one it is (+, -, * or /) to split the string from the captcha, get the right answer and finally send it back to the server.
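The last step can be sketched in Python as below. It assumes the OCR output looks like "12 * 3" and that division results are whole numbers; both are assumptions, since the post does not show real captcha strings. The function name is hypothetical.

```python
import operator
import re

# Map each operator symbol to the function that applies it.
# Integer division is an assumption about how the captcha treats '/'.
OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.floordiv,
}

EXPRESSION = re.compile(r"(\d+)\s*([-+*/])\s*(\d+)")


def solve_expression(ocr_text):
    """Return the result of the first 'a <op> b' found in the OCR text, or None."""
    match = EXPRESSION.search(ocr_text)
    if match is None:
        return None
    left, symbol, right = match.groups()
    return OPERATORS[symbol](int(left), int(right))


print(solve_expression("12 * 3"))
```

Returning None when nothing matches lets the caller simply request a new image whenever the OCR output is garbage.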
Captcha # 8
This one looks hard to break for a newbie, but it’s pretty easy if u always have a proxy on your connection or Firebug ON in your Firefox (go get it if u don’t have it) to see what’s going on every time you visit a web page. So with that in mind I visited the /eight/ captcha and looked at Firebug to see what was going on on the net… and… tada!!!! I found this url: ‘/eight/captcha/captcha.php’… let’s see what it has… mmm… just a digit… really, is this captcha that easy??… let’s solve one by hand and see what it sends to the server… the same number!!! So we’re done with this captcha: we just need to keep requesting /eight/captcha/captcha.php and send the content to the server with our team name and password to solve the captcha.
Captcha # 4
Looks like things start to get harder from here. With this captcha I really suffered, and just cuz I thought that it was simple, like a combination of #2 with #3: a subtraction with some noise on it. I really tried to solve it that way, removing the noise and trying to read the numbers, but the numbers were too small, so I decided to resize the image, but I didn’t have luck :S The captcha looked to me like a hard captcha. Sometimes when we’re hacking stuff we go down the wrong path, thinking that the problem looks like one from the past and trying to solve it the same way instead of looking at other variables and another way to hack it. After some time I gave up and started this captcha again from 0… and tada!!! I just realized that the captcha url sends a cookie with every request that you make to it. mathhashcode is the cookie name, and it’s obvious that it has something to do with the result of the operation. From here you can go 2 ways; of course one is easier than the other, but u can try both.
1.- One rule of a good captcha is not to accept the same captcha info again after u submit it once. If the server accepts it without changing the captcha value every time the user sends it, then the captcha could be bruteforced till the computer finds a right value. But we don’t need brute force in this case: we just need to solve it once and keep sending the same info till the server stops accepting the answer as a Human answer, and then we repeat: solve another one and send the same info as many times as the server accepts it, till u break the captcha.
2.- The first time I saw the mathhashcode it looked to me like an md5, so I went to an online hash crack service and got the string 72237… mmm, the right answer was 7… got another mathhashcode and got 112237… and the answer was 11… so it looked like it was easy to crack the hash: go from 0 to 20 (since the biggest subtraction that I saw was 16), append 2237 and compare the hash value. After some time I realized that the second digit of 2237 had changed to 1, but I decided to go with option 1 since it was easy, and I think it’s always better to go for the easiest solution.
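Option 2 can be sketched with Python's hashlib. The sketch assumes the cookie is the hex md5 of the answer followed by a fixed suffix, as the two cracked values suggest. The 2137 suffix is derived from the remark about the second digit changing and is an assumption, as is the function name.

```python
import hashlib


def crack_math_hash(cookie_value, suffixes=("2237", "2137"), max_answer=20):
    """Try answers 0..max_answer with each suffix; return the answer that matches."""
    wanted = cookie_value.lower()
    for suffix in suffixes:
        for answer in range(max_answer + 1):
            candidate = f"{answer}{suffix}".encode()
            if hashlib.md5(candidate).hexdigest() == wanted:
                return answer
    return None


# Build a cookie value the same way for demonstration: md5("72237").
example_cookie = hashlib.md5(b"72237").hexdigest()
print(crack_math_hash(example_cookie))
```

With at most 21 answers per suffix, the whole search is a few dozen md5 calls, so no online cracking service is needed once the pattern is known.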
So here we are, we have successfully broken 6 captchas, but the 4 captchas left look like the hardest, at least for me… I found the source code of all of them and started to look at them trying to find my way… but they all looked hard to me.
#1 looked easy since it only showed 10 words, but I couldn’t think of a way to solve it.
#6 damn, I could think of just one thing to describe this captcha: hard hard hard hard hard.
#9 looked impossible to me. I tried playing with the variables but nothing, even though it could be broken pretty easily (thanks go to sinfocol for telling me how to break it after the CtC ended).
#10 I almost cried, lol, it looked pretty hard, but since #10 and #6 were unbroken and #6 was in flash I chose #10.
Captcha # 10
In the source code I found this comment:
#-- makes string of random letters (for embedding into image)
mmm, and I found the solution variable.
So I thought that solution could have what I need, so I mounted the captcha on my local web server to see if I was right by doing an echo of $this->solution; and yes, it has our solution… damn, if only I could find my way to the solution… I also found that the captcha info was stored in a tmp file and that the name of the file was the __ec_i value. The __ec_i is "ec." . time() . "." . md5( $_SERVER["SERVER_NAME"] . CAPTCHA_SALT . rand(0,1<<30) ); keep that in mind.
I felt lucky and started to think that maybe there was a /ten/tmp/ folder where the files could be read and I would be done… well, no luck for me… maybe a directory traversal attack so we can reach /tmp to find the files!!!!… no luck again… looks like the only way is to figure out the value of the solution variable… how can I do this if that value is random, can we predict rand values??? Maybe a php bug or a bad srand ;) and I found
srand(microtime() + time()/2 - 21017);
Looked like it wasn’t my day… wait, didn’t __ec_i have time() in its name?!?!?! We’re back in the game again. Ok, so most of the time we could recreate the value of the seed: we have time() in the name, and since the __ec_i value is generated right after srand it must be the same value, if only we could find the exact value of microtime(). At this point I started to read a lot of documents about how to find that value, but nothing helped me. I decided to RTFM of microtime() and srand(): srand accepts int values and microtime starts with a floating value!!! Yes!! (int)(microtime() + time()/2 - 21017) = (time()/2 - 21017), and as a gift I found that u can submit a captcha value twice. So now we just modify our local captcha script to accept a parameter that substitutes the time() value, so we can generate the same captcha as the CTC server.
srand($_GET['seed']/2 - 21017);
and echo $this->solution;
and we’re done with this captcha. We just need to make a request to the /ten/ captcha, read the __ec_i value from the source code (regexp /ec\.[0-9]+\.[0-9a-z]+/), get the time value by splitting the __ec_i value and send it to our own server to get the answer. If the CTC server answers us with HUMAN! we send it again, so we can solve this captcha faster.
I must say that from here on I cheated to get myself to first place: I broke the 3 captchas left with the same technique, just finding the right values for each one.
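The extraction of the time value for #10 can be sketched in Python like this. The HTML fragment and function names are hypothetical, and the seed formula simply follows the equality stated above, assuming the fractional microtime() part is dropped by the int conversion.

```python
import re

# Same regexp as above, with a group added around the time() part.
EC_ID = re.compile(r"ec\.([0-9]+)\.[0-9a-z]+")


def extract_timestamp(page_source):
    """Return the time() value embedded in the __ec_i id, or None."""
    match = EC_ID.search(page_source)
    return int(match.group(1)) if match else None


def effective_seed(timestamp):
    """Integer seed per the post: (int)(time()/2 - 21017)."""
    return timestamp // 2 - 21017


# Hypothetical form field; only the ec.<time>.<md5> shape comes from the post.
page = '<input name="__ec_i" value="ec.1305650000.0123456789abcdef0123456789abcdef">'
stamp = extract_timestamp(page)
print(stamp, effective_seed(stamp))
```

The timestamp is what gets sent as the seed parameter to the local script, which then applies the same /2 - 21017 arithmetic itself.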
Captcha # 1
From the 3 captchas left this one looked the easiest since it has a small dictionary. This captcha reminds me of the gmail captcha… and I think I read once that the google captcha was broken, so I started to look for solutions but didn’t find anything. I looked for the source code and found cool-captcha. The first thing I did was to look for an srand… no luck this time. How does it generate the words??? It generates a rand number between 1 and the length of the dictionary file. It comes with 2 files, es.php and en.php. These files have the words shown in the image; in this case it was using a small portion of es.php, the first 10 words, but how to find the rand value?!?!??!!?
While reading some articles from Stefan Esser, Russian forums and the raz0r.name blog I found an interesting article about how to predict php rand values, but that wasn’t the only thing I found. The most important thing was
“In fact, errors in the implementation is not as important as the fact that HTTP connections established as Keep-alive, are serviced by the same process on a remote Web server. This means that the position of the random number generator will be the same … However, this is true only for those Web servers where PHP is used as a module of Apache (mod_php) – in the case of CGI or fastcgi random number generator will always be restarted.”
Why is this so important????… Remember #10, where we know the seed, and we know the source code of both (#10 and #1), so we know how many rands we have to do to get the right word. So we mount the 2 captchas on our own server and echo the word of #1 instead of writing it to the image, and we’re done.
- Now we just need to GET the /ten/ captcha with the keep-alive header, read the __ec_i value and GET /one/captcha.php.
- We do the same against our own server, sending the seed value to our script with the keep-alive header too, and get the captcha value.
- Most of the time it will give you the right answer, so now you just need to relax and watch it get broken.
Note: Since this captcha has a small dictionary it can be broken by a probabilistic method, according to the people I’ve been talking to.
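The keep-alive trick can be illustrated with a small Python simulation. It uses Python's random module as a stand-in for PHP's rand(), so it only shows the idea (same seed plus the same number of draws gives the same word), not the real values. The word list, the number of letters drawn and all the names are hypothetical.

```python
import random

# Hypothetical 10-word dictionary standing in for the first words of es.php.
WORDS = ["uno", "dos", "tres", "cuatro", "cinco",
         "seis", "siete", "ocho", "nueve", "diez"]


class WorkerProcess:
    """One server process whose random generator survives between requests."""

    def __init__(self):
        self.rng = random.Random()

    def captcha_ten(self, seed, letters=5):
        # Captcha 10 seeds the generator, then draws some random letters.
        self.rng.seed(seed)
        return "".join(chr(ord("a") + self.rng.randrange(26)) for _ in range(letters))

    def captcha_one(self):
        # Captcha 1 never reseeds, so it continues where captcha 10 stopped.
        return WORDS[self.rng.randrange(len(WORDS))]


def predict_word(seed):
    """Replay the same two requests on a local worker and return its word."""
    local = WorkerProcess()
    local.captcha_ten(seed)
    return local.captcha_one()


remote = WorkerProcess()        # plays the role of the CTC server process
remote.captcha_ten(652803983)   # example seed only
print(remote.captcha_one() == predict_word(652803983))
```

If another visitor's request lands on the same process in between, the generator position shifts and the prediction misses, which fits the "most of the time" above.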
Captcha # 9
This captcha is the same as #1 but with the whole es.php dictionary, and it looks like some words aren’t in the same position as in the default file, or the cheat wasn’t working well, but basically I did the same as for #1. This captcha can be broken in another way, thanks sinfocol for letting me know how. I tried playing with the POST variables to fool the captcha but nothing worked. I tried changing the power param to off but it didn’t work, tried to reuse a captcha value but nothing… and why???? Well, if u submit power=off and an empty captcha it counts as solved, so this was an easy one, don’t you think? Note to myself: next time try to send empty values in the parameters, lol.
Captcha # 6
Finally we’re at the hardest captcha, at least for me. It wasn’t hard to find the source code for this captcha since u can click the captcha and it’ll take u to the home page of the project. With this one I tried different things. The first thing I tried was to look at the string generated by six/captcha.php: I was looking for something in the response that gives me the answer, but nothing. After looking at the source code I saw that it wasn’t possible, and after reversing the icaptcha.swf file I realized that it draws a lot of lines to draw the numbers. Wow… I played with the param values of the swf file but nothing good came of it. I don’t know a way to convert swf files to image files so we could split the numbers and try to read them with OCR software :S. Once I had reversed the swf file I noticed the post to validate.php and went to my local copy to see what that file does, but nothing good came of it either. After a few days, and with a lot of work to do, I decided to cheat again, after all hacking is about taking advantage of everything to do what you want. So #6’s fault was not calling srand.
This script sometimes failed, dunno why, so the only thing I did to get better results was to compare both results, the one from the CTC server and the one from the local copy. I think it failed and gave me different v values cuz of the Doping value: if u look at the source code you can see that, depending on it, u can have more or fewer calls to rand(). So the summary is:
- Modify icaptcha.php to print ".$code" after the "v=12313123…" value.
- Call /ten/ and /six/icaptcha.php with the keep-alive header and read the __ec_i value.
- Send the seed value to our local /ten/ captcha and call /icaptcha.php.
- Split the answer from the local script, compare its length with the server one and, if they have the same length, submit the $code generated by our local script.
I hope to keep playing this kind of games and make more friends to have interesting talks with and, of course, drink some tequilas and eat some tacos. If you broke the captchas in different ways I would like to read about them; you can always spam me at @madeye12.
Keep hacking and thanks to @bonsai_sec for setting the CTC.