5.9 MD5 RETURNS!

EXERCISE 5.9 MD5 RETURNS!

In Chapter 2, we discussed some of the ways that MD5 is broken. In particular, we emphasized that MD5 is still not broken (in practice) for finding the preimage (i.e., working backward). But it is broken in terms of finding collisions. This is very important where signatures are concerned because signatures are typically computed over the hash of data and not the data itself.

For this exercise, modify your signature program to use MD5 instead of SHA-256. Find two pieces of data with the same MD5 sum. You can find some examples at or with a quick search on the Internet. Once you have the data, verify that the hashes are the same for the two files. Now, create a signature for both files and verify that they are the same.


Two different pieces of data with the same MD5 sum…

data1 = bytes.fromhex("""
d131dd02c5e6eec4693d9a0698aff95c2fcab58712467eab4004583eb8fb7f89 
55ad340609f4b30283e488832571415a085125e8f7cdc99fd91dbdf280373c5b 
d8823e3156348f5bae6dacd436c919c6dd53e2b487da03fd02396306d248cda0 
e99f33420f577ee8ce54b67080a80d1ec69821bcb6a8839396f9652b6ff72a70 
""")


data2 = bytes.fromhex("""
d131dd02c5e6eec4693d9a0698aff95c2fcab50712467eab4004583eb8fb7f89 
55ad340609f4b30283e4888325f1415a085125e8f7cdc99fd91dbd7280373c5b 
d8823e3156348f5bae6dacd436c919c6dd53e23487da03fd02396306d248cda0 
e99f33420f577ee8ce54b67080280d1ec69821bcb6a8839396f965ab6ff72a70 
""")
data1 == data2
False
from hashlib import md5

d1 = md5()
d1.update(data1)
digest_of_data1 = d1.hexdigest()
digest_of_data1
'79054025255fb1a26e4bc422aef54eb4'
d2 = md5()
d2.update(data2)
digest_of_data2 = d2.hexdigest()
digest_of_data2
'79054025255fb1a26e4bc422aef54eb4'
digest_of_data1 == digest_of_data2
True

Computing their RSA signature…

from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives import hashes


def create_rsa_signature(m: bytes, private_key: rsa.RSAPrivateKey) -> bytes:
    return private_key.sign(
        data=m,
        padding=padding.PSS(
            mgf=padding.MGF1(hashes.MD5()),
            salt_length=padding.PSS.MAX_LENGTH,
        ),
        algorithm=hashes.MD5(),
    )
private_key = rsa.generate_private_key(
    public_exponent=65537,
    key_size=2048,
    backend=default_backend(),
)
signature_of_data1 = create_rsa_signature(m=data1, private_key=private_key)
signature_of_data2 = create_rsa_signature(m=data2, private_key=private_key)
signature_of_data1.hex(' ')
'c9 66 1b 3c 6d e1 11 3c 0f b8 0c fc 27 d4 8d e6 88 c8 99 42 dd 76 26 0b 9a 4e 38 ae c2 e7 5e cb af 25 bb c8 c6 50 c3 8e 27 3c 03 3e 04 df eb 8c 77 81 0a fb 31 78 60 9d 9d a6 10 70 18 b9 40 9f b3 d6 34 f9 69 46 20 04 c0 9e 05 dc 71 49 04 99 3a 39 b8 d1 ed 38 de 03 25 ea 7b 83 70 72 72 0c 70 a7 f1 68 77 85 8d da ac 57 47 c4 3e 5d 9c a3 1d 5c b7 c4 47 b5 49 d1 44 54 aa 44 ec b7 fe 76 e0 0a 00 6b 25 ad 91 66 80 d1 17 21 c4 d4 eb b5 38 af 3e 7e 7b f1 e0 5d 75 8e 39 b8 89 f0 83 10 5e 2c 85 07 66 99 4d 94 9f a3 75 39 37 1f 69 7a 26 12 6f 84 b3 e5 ac 8d 65 b0 bf d7 e7 a9 02 54 ba 02 12 fc 4d 14 88 a4 df bf ee ec 7c d5 a5 65 2e bd f7 a1 f6 94 38 ed 05 09 46 01 27 d4 4c 0c 4a 77 4f 60 e8 8e b9 e8 d5 01 5c 0c fb 6c ff 59 06 f1 07 49 de 16 9d 19 73 9b 0b bf 51 25 82 cd'
signature_of_data2.hex(' ')
'57 e3 04 5c 05 01 8d b1 51 97 76 ac b1 fc 7b ec 48 30 b9 6f 99 55 a5 82 2b a1 93 1d 9c d9 df cd c4 f8 53 30 44 b5 0e d3 65 ab 8c fa 38 d3 d9 9d 87 81 33 cc 96 dc 56 5b 57 97 7d 9b 96 5e 96 bd e1 6a 8b 15 0b 93 23 d6 ca d3 46 da c7 1d d4 26 20 d7 5c fb df 4a 66 90 33 63 1b cf 60 c6 ea ba 85 12 6b 83 f6 6f 40 31 2c fa 0b de c0 44 bb bc a8 a3 3d b5 ea c0 72 fa 23 2a 2e 90 21 76 6e c3 16 5d 92 9c d1 8d 19 e4 48 d3 aa 75 89 e7 0e cf 77 f7 90 bb 4e 07 24 fe 7a 1e 4e a9 ee 78 44 20 79 ed 3e 7b 27 f2 d3 de 6a 25 f9 25 0b 82 f7 39 2a 08 16 ba 12 91 35 fa 09 43 bd b0 7c 83 60 1c 44 03 d4 e6 06 8d ce 34 16 4b 67 0e de 2b 4e c2 a7 38 e3 03 0b bb 44 50 1e 7a 6d ef 1a 87 7c d7 e3 12 cd 14 b8 5e 39 33 78 98 33 8e 17 4f 8e 7f 03 78 cb fa 01 e0 30 df 1d ff f2 93 53 91 e0 3a'

TODO: Why signature_of_data1 different from signature_of_data2?

The answer is probably related to the MGF, but make it more precise.