So what is so special about the following two images?
Yes, sure, both show things that I love: cats and souvlaki (the Greek national food).
But that is not what makes these two images special. It is the MD5 hash!
Both images have exactly the same one!
You can verify it yourself. Download the two images listed under "Images after collision attack" below and run md5sum on them to confirm that their hashes are the same.
Original images:
https://emaragkos.gr/md5collision/gyra.jpg
https://emaragkos.gr/md5collision/souvlaki.jpg
Images after collision attack:
https://emaragkos.gr/md5collision/gyra_coll.jpg
https://emaragkos.gr/md5collision/souvlaki_coll.jpg
Log files
Short version: https://emaragkos.gr/md5collision/demo.output
Long version: https://emaragkos.gr/md5collision/demo-short.output
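If you do not have md5sum at hand, the same check can be sketched in a few lines of Python. This is only a minimal sketch: the function names `file_digest` and `compare_files` are my own, and it assumes you have already downloaded the two files into the current directory. It also compares SHA-256, which should differ for any two files that are not byte-for-byte identical.

```python
import hashlib


def file_digest(path, algorithm="md5", chunk_size=65536):
    """Return the hex digest of a file, reading it in chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_files(path_a, path_b):
    """Report whether two files match byte for byte, by MD5, and by SHA-256."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        identical = fa.read() == fb.read()
    return {
        "identical_bytes": identical,
        "same_md5": file_digest(path_a, "md5") == file_digest(path_b, "md5"),
        "same_sha256": file_digest(path_a, "sha256") == file_digest(path_b, "sha256"),
    }
```

Calling `compare_files("gyra_coll.jpg", "souvlaki_coll.jpg")` on the two collision images should report different bytes and the same MD5, which is exactly what makes them interesting.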
So what is a Hash Collision Attack?
A Hash Collision Attack is an attempt to find two different inputs to a hash function that produce the same hash result. Because a hash function accepts inputs of essentially arbitrary length but produces an output of fixed length, there are far more possible inputs than possible outputs, so some different inputs must inevitably share the same output hash. When two separate inputs produce the same hash output, it is called a collision. A collision can then be exploited against any application that compares two hashes, such as password hashes, file integrity checks, etc.
The odds of an accidental collision are of course very low, especially for functions with very large output sizes. However, as available computational power increases, brute-forcing hash collisions becomes more and more feasible.
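To put rough numbers on "very low", here is a small sketch using the standard birthday approximation. It assumes the hash behaves like a random function, and the name `collision_probability` is mine. It only models pure brute force; attacks on a broken function like MD5 exploit weaknesses in its design and need far less work than this estimate suggests.

```python
import math


def collision_probability(num_inputs, output_bits):
    """Approximate chance that at least two of num_inputs random inputs collide.

    Uses the birthday approximation 1 - exp(-k(k-1) / (2N)),
    where N = 2**output_bits is the number of possible hash values.
    """
    space = 2 ** output_bits
    exponent = num_inputs * (num_inputs - 1) / (2 * space)
    # expm1 keeps precision when the exponent is tiny.
    return -math.expm1(-exponent)


for bits in (32, 128, 256):
    tries = 2 ** (bits // 2)
    p = collision_probability(tries, bits)
    print(f"{bits}-bit hash, 2^{bits // 2} inputs: p = {p:.3f}")
```

In each case, hashing about 2^(n/2) inputs for an n-bit output gives a collision chance of roughly 39%. That is why a 128-bit hash like MD5 offers at best about 64 bits of brute-force collision resistance, while a 256-bit hash offers about 128.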
For example, let’s say we have a hypothetical hash function. A collision attack would start by picking an input value and hashing it.
Now the attacker needs to find a collision: a different input that generates the same hash as the previous input. This would generally be done by brute force (trying candidate inputs one after another) until one is found. Let’s say we found a collision for this input in our hypothetical hash function.
The attacker now knows two inputs with the same resulting hash.
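To make this concrete, here is a toy sketch of a brute-force search. The "hypothetical hash function" is one I made up for illustration: just the first 24 bits of MD5, small enough to collide in a fraction of a second. Note one difference from the description above: instead of fixing the first input, the sketch remembers every hash it has seen and stops as soon as any two inputs match (a birthday-style search), which needs far fewer tries.

```python
import hashlib


def toy_hash(data, bits=24):
    """Hypothetical weak hash: the first `bits` bits of MD5 (illustration only)."""
    digest = hashlib.md5(data).digest()
    return int.from_bytes(digest, "big") >> (128 - bits)


def find_collision(bits=24):
    """Hash counter-based inputs until two different inputs share a toy hash."""
    seen = {}  # toy hash value -> the input that produced it
    counter = 0
    while True:
        candidate = f"input-{counter}".encode()
        value = toy_hash(candidate, bits)
        if value in seen:
            # Two distinct inputs, one hash value: a collision.
            return seen[value], candidate
        seen[value] = candidate
        counter += 1


first, second = find_collision()
print(first, second, hex(toy_hash(first)))
```

The same loop against the full 128-bit MD5 output would never finish in practice, which is why real MD5 collisions rely on dedicated tools like HashClash rather than plain brute force.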
Practically speaking, there are several ways a hash collision could be exploited. For example, if an attacker were offering a file download and showed the hash to prove the file’s integrity, they could swap the download for a different file that has the same hash, and the person downloading it would be unable to tell the difference. The swapped file would appear valid because it has the same hash as the supposedly real file.
So, are hash collisions something to worry about? It depends on the hash function. MD5 and even SHA-1 have been shown not to be collision resistant; however, stronger functions such as SHA-256 seem to be safe for the foreseeable future.
Q: Is this something new?
A: No. MD5 is known to be insecure. The first collision ever found was in 2004.
Q: Can I do it by myself?
A: Yes. Follow the tutorial.
Q: Should I do it?
A: Eh, yes, I guess? If you have free time.
In my case, considering I own an old, crappy Dell laptop, running this locally would be impossible. So I searched for a cloud solution and found an awesome tutorial by Nat McHugh that uses an Amazon AWS instance preconfigured to run HashClash.
The calculations took almost 18 hours on an EC2 g2.2xlarge. So by running this collision I made Mr. Bezos $11.70 richer!
But now I am the proud owner of an image of a cat that has the same MD5 hash as an image of souvlakia!