[Written on September 16th 2002 - Updated October 1st 2002 - Updated May 6 2003]
Steganography strength (is it easy to see there is hidden data?): Low
Cryptography strength (is it easy to recover the hidden data?): Low
1. Background

Steganography is the technique of hiding data inside other data: for example, hiding a secret message inside a picture, or a secret picture inside a music file. There are several techniques for doing this, and several programs available. Some use complex algorithms and are pretty good at their job (it is difficult to establish that hidden data is present at all, and even more difficult to retrieve it); others use very simple algorithms and are easy to detect and break. You can find reliable, scientific information about steganography, digital watermarking (which is basically the same thing) and how to detect them on several web pages, such as the Neil Johnson site, the Fabien Petitcolas site, the Outguess page (where you can find a tool to detect steganography in images), and several others.

A few days ago, on September 11th 2002, the first anniversary of the attack in the United States, a short report about the use of steganography by terrorists was aired on the French private TV network "Canal Plus", on the show "Le Journal des Bonnes Nouvelles". Not only did the tabloid-like report itself push my bullshit detector into the red (it's an old rumour, never proven, but the journalists turned this rumour into fact: they said several times that terrorists actually used steganography), but the commentary was also full of technical errors. Sloppy and cheap journalism at its best, using the latest hype or rumours to scare the audience. They did a "demonstration" of a "famous" and "unbreakable, even by the NSA" steganography software, which hides data in a "totally undetectable way", and is "illegal". Here are some screenshots of the show:
[Images © Canal Plus]

When I commented in the French cryptography newsgroup that steganography is often detectable, the "computer specialist" interviewed in the show went totally mad (his insults in public and private are not worth translating), and challenged me. He set up a website with two JPG photos and challenged people to find which one contained a hidden Word document, and what text that document contained. It took me a few minutes with a hexadecimal editor to detect which image had been modified, and to post a first message. Less than one hour later, I posted a second message to show that I had recovered the hidden data easily. Let's see how I did it.
2. Presence of hidden data is evident

I first saw that one of the pictures had data added at the end. Because almost all file formats have a fixed structure, and JPG is no exception, you can very easily see where the actual image ends and where the "hidden" data starts. So much for the "undetectable" steganography software. The amount of data was consistent with a short Word file, so I guessed I was on the right track. Very few steganography programs hide data at the end of files, because it's an extremely weak and detectable scheme. I found out that the software they used was "Camouflage" (the homepage of this software seems to be no longer available). Compare its interface below with the show screenshots above.
Because the data at the end of the file, although easy to detect, seemed to be encrypted or scrambled in some way, I downloaded the software to do a few tests, and I was ready to reverse engineer it to trace its routines. I found out that the data is so weakly encrypted that I didn't even need to. A few tests with chosen passwords were enough to break the software.
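The detection step described in this section can be sketched in a few lines of Python. This is only an illustration, not the method I used at the time (a hexadecimal editor was enough): the function names `jpeg_end_offset` and `trailing_data` are made up for the example, and the code assumes a well-formed JPEG. It walks the marker segments rather than searching blindly for FF D9, so that an embedded thumbnail with its own FF D9 inside a metadata segment does not fool it.

```python
import struct


def jpeg_end_offset(data: bytes) -> int:
    """Return the offset just past the EOI marker (FF D9) of a JPEG stream."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker (FF D8)")
    pos = 2
    while pos + 1 < len(data):
        if data[pos] != 0xFF:
            # We are inside entropy-coded scan data: jump to the next FF byte.
            nxt = data.find(b"\xff", pos)
            if nxt == -1:
                break
            pos = nxt
            continue
        marker = data[pos + 1]
        if marker == 0xFF:
            pos += 1                      # fill byte before a marker
        elif marker == 0x00 or marker == 0x01 or 0xD0 <= marker <= 0xD7:
            pos += 2                      # stuffed FF, TEM or RSTn: no length field
        elif marker == 0xD9:
            return pos + 2                # EOI: the image ends here
        else:
            if pos + 4 > len(data):
                break
            # Any other marker is followed by a 2-byte big-endian length
            # that counts itself but not the marker.
            (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
            pos += 2 + length
    raise ValueError("no EOI marker (FF D9) found")


def trailing_data(data: bytes) -> bytes:
    """Return whatever follows the end of the JPEG image (empty if nothing)."""
    return data[jpeg_end_offset(data):]
```

Calling `trailing_data(open(path, "rb").read())` on an untouched JPEG should normally return an empty byte string; a few kilobytes of extra data there is a strong hint that something was appended.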
3. Breaking Camouflage

Let me put a few test images here, so you can follow the procedure on your own computer. All you need is a hexadecimal editor, and of course the "Camouflage" software if you want to do your own tests. For the curious, the photo shows a lovely piece from my art hologram collection, "Lucy in a tin hat" by English artist Patrick Boyd. The hidden message is a simple ASCII text file called "secret_message.txt", containing the text "This is the secret message.". You can get it here.
Please note that the following hexadecimal dump was not made from exactly the same images as above, but the structure of the data is exactly the same. The first thing you notice when comparing the original image with any of the other ones is the big block of very recognizable data at the end of the file, just after the FF D9 "end of JPG file" signature. It starts with "20 00", followed by a variable header (probably data such as the sizes of the hidden and original files), then the encrypted hidden file data, then a long run of "20" bytes (32 in decimal, the ASCII code for space: these buffers are probably for storing ASCII strings) with two small islands of encrypted data, and finally a fixed signature. If you don't want to go through it yourself, here is for example the data added at the end of the image with the hidden text file but no password:
It would be easy to work out exactly what all of these fields mean (for example, the size of the hidden text, which is 27 in decimal or 1B in hexadecimal, appears twice; I underlined it above), but it's not needed. Let's get the only interesting one: the password.

The second thing, and a really surprising one, is that when the password changes, the first block of data, containing the "encrypted" and "hidden" message, does not change! Really weird: the encryption does not depend on the password! Only a few bytes are modified in the last big "20" island. Let's now compare what's in this last big "20" island (starting at offset 1400h) in files encrypted with different passwords.

First, the image with the text file hidden without a password (same as above). It's all empty:
1400 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20

Second, the image with the text file hidden with the password "aaaa". I highlighted the modified bytes in yellow. You can see that a 4-byte password results in a 4-byte modification. A clear sign that the encryption is weak:
1400 63 F4 1B 43 20 20 20 20 20 20 20 20 20 20 20 20

Finally, the image with the text file hidden with a password consisting of the character "a" repeated 255 times. I highlighted the modified bytes in yellow. All the bytes in the buffer are now modified. Notice that the first 4 are the same as above, another clear sign of weakness:
1400 63 F4 1B 43 6D C7 75 80 80 AE DE 04 41 0E FF D2

The conclusion is that the password is stored at this position, probably masked by XORing it with a key made of a fixed string of bytes. This string is now easy to obtain. Because XOR is reversible, you just have to XOR the above data with the password, which is "aaaa...", so in hexadecimal "61616161...":
63F41B436DC7758080AEDE04410EFFD2 F8042B329A971435CC42AC1FFD48869D
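As a minimal sketch of this known-plaintext step, the Python below XORs the first 16 bytes of the masked buffer (the row at offset 1400h shown above) with the known password bytes. The helper name `xor_bytes` is just for illustration; the only inputs are the dump bytes and the fact that the password was all "a" characters (61 in hexadecimal).

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings position by position (stops at the shorter one)."""
    return bytes(x ^ y for x, y in zip(a, b))


# First 16 bytes of the password buffer when the password is "aaaa...".
masked = bytes.fromhex("63F41B436DC7758080AEDE04410EFFD2")

# The plaintext is known: we chose the password ourselves.
known_password = b"a" * len(masked)

# masked = password XOR key, therefore key = masked XOR password.
key = xor_bytes(masked, known_password)
print(key.hex(" ").upper())
# 02 95 7A 22 0C A6 14 E1 E1 CF BF 65 20 6F 9E B3
```

The same call applied to the "aaaa" buffer (63 F4 1B 43) gives the first four bytes of this key, which is what you would expect if the key is a fixed string that does not depend on the password.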
4. Back to the challenge

So now we know where the password is (at a fixed location relative to the end of the file, offset -275 in decimal), and how to decipher it. Let's go back to the challenge (the page no longer exists, but that is not really important: you can apply this analysis to any Camouflaged file). In the second image, we find that the password buffer contains:
71 FA 0F 51 61 C3 66 85 84
We just need to XOR it with the 9 first bytes of the key, which are:
02 95 7A 22 0C A6 14 E1 E1
The result is:
73 6F 75 73 6D 65 72 64 65
Which, translated back from ASCII, is: "sousmerde" (an insult). We can try it with Camouflage, and it works nicely: we can now extract the "hidden" Word file (which contains other insults). You can test it by downloading the images from the challenge page (once again, the page no longer exists, but that is not really important, as you can apply this analysis to any Camouflaged file).

[As a funny side note: a few hours after I posted my results on the cryptography newsgroup (first and second message), the images suddenly changed, and the "computer specialist", looking ridiculous because his "unbreakable" challenge had been so easily broken, claimed that the original images never existed and that I had invented it all. Fortunately, several people had downloaded the files and independently verified my results before he changed them.]

Well, my hobby being reverse engineering and not psychiatry, I let the dog bark, and I thought this small analysis could nevertheless be interesting for some people, so here it is. I've quickly programmed a small utility to automate the recovery of Camouflage passwords [Now version 0.2]. Source included, as always. It works with Camouflage 1.1.1 and 1.2.1.
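The decoding described in this section can be sketched in Python as follows. This is not the utility mentioned above, only an illustration. The names `password_buffer` and `unmask_password` are made up for the example, the key is limited to the 9 bytes quoted in this section (so longer passwords are rejected rather than guessed), and the code assumes that unused bytes of the buffer are left as 20 (space), as seen in the dumps. That last assumption means a password whose final masked byte happens to be 20 would be cut short.

```python
# The 9 key bytes quoted in this section.
KEY_PREFIX = bytes.fromhex("02957A220CA614E1E1")

# Position of the password buffer, counted from the end of the file.
PASSWORD_OFFSET = -275


def password_buffer(data: bytes, length: int) -> bytes:
    """Return `length` bytes of the password buffer of a Camouflaged file."""
    start = len(data) + PASSWORD_OFFSET
    if start < 0:
        raise ValueError("file too short to contain a password buffer")
    return data[start:start + length]


def unmask_password(buffer: bytes, key: bytes = KEY_PREFIX) -> str:
    """XOR the masked password with the key, ignoring trailing 20 padding."""
    masked = buffer.rstrip(b"\x20")
    if len(masked) > len(key):
        raise ValueError("password is longer than the known key bytes")
    return bytes(c ^ k for c, k in zip(masked, key)).decode("latin-1")


# The password buffer found in the second challenge image.
challenge = bytes.fromhex("71FA0F5161C3668584")
print(unmask_password(challenge))  # sousmerde
```

On a real file, `unmask_password(password_buffer(open(path, "rb").read(), 9))` would do the whole job for passwords of up to 9 characters; extending `KEY_PREFIX` with more recovered key bytes lifts that limit.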
5. Conclusions

Don't trust what is said on TV. Journalists don't know what they are talking about, and instead of doing a little bit of research by asking competent people (there are plenty in academia and in the corporate world), they fall for the hype and listen to people who are incompetent or just want to have their faces on a TV screen.

Most of the steganography software around is easy to detect and to break. If the algorithm used in some encryption or steganography software is not documented precisely, its strength is probably very weak. Never use such software for serious security purposes.

Don't trust what you see on the internet, and that includes this page. Be especially wary of people with a big mouth who use big words ("unbreakable", "undetectable", etc.). Test everything yourself, or ask different people who may know more. There are plenty of forums on Usenet with specialists on almost any subject you can imagine.

[Note written much later: I've since discovered some other tools to unprotect Camouflage files.]

Have a nice day!