MD5 que te pasa

Hola chicos,

here I come with the first post with actual content! Today I would like to tell you about something I came across at work (I am a database dev). It is used quite commonly, so I decided it would be valuable to share.

Sooo... MD5, does anyone know what that is? As the wiki says, MD5 is a cryptographic hash function created in 1991 by Ronald Rivest (for lovers of cryptography: this is the same guy who took part in creating the RSA algorithm!). When it was invented it was considered secure, but later research and cryptanalysis uncovered serious vulnerabilities in it (most notably practical collision attacks), so it should no longer be used for high-security applications.

Brief overview: a hash function is a function that maps data of arbitrary size onto data of a fixed size. The value of that function is called a hash value, hash code, digest or simply a hash. A cryptographic hash function is meant to be one-way: there is no key and nothing to "decrypt", so the only general way to recover an input from its hash should be brute-force search (this means enumerating all the possibilities and checking whether they produce that value or not #timeconsumingdance). As MD5 is a hash function, from a string of arbitrary size we receive a 128-bit hash value, usually written as 32 hexadecimal characters. I was considering going into specifics here, but the algorithm itself is rather complicated and many articles describe it far better than I could :D.
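To make the "arbitrary size in, fixed size out" part concrete, here is a minimal sketch using Python's standard hashlib module. The helper name md5_hex is just something I made up for illustration, it is not part of any ETL tool.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the MD5 digest of `text` as a hexadecimal string.

    Hypothetical helper for illustration; it encodes the text as UTF-8
    before hashing, which is an assumption about the input data.
    """
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# One character versus a million characters of input.
short_digest = md5_hex("a")
long_digest = md5_hex("a" * 1_000_000)

# Both digests are 128 bits, i.e. 32 hex characters, regardless of input size.
print(short_digest, len(short_digest))
print(long_digest, len(long_digest))
```

No matter how long the input is, the digest always has the same length, and there is no function in the library that goes back from the digest to the text.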

Ideally a hash function would be injective, meaning that two different inputs never map to the same value. Strictly speaking that is impossible when inputs of arbitrary size are squeezed into 128 bits, so collisions do exist, but for ordinary (non-malicious) data an accidental collision is extremely unlikely. So as long as we do not need a super-secure code for particular data, just something that tells one value from another reliably enough, the algorithm will work just fine. And here we come to one of its present-day applications. MD5 is a function provided in many ETL tools, such as Informatica, and developers use it to check whether something has changed within a row. For example, we can concatenate all the values in one row (ideally with a separator between columns, so that "ab" + "c" and "a" + "bc" do not end up as the same string), pass that concatenation to MD5, and get a value that is, for practical purposes, unique for that row content. Then, when comparing an already existing row with a row that came from the source (after matching them by joining the tables on the key), we can compare the MD5 values of both rows and use that to decide whether to update the selected row or not. Cool, right? :)
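The row comparison described above could be sketched roughly like this in Python. Everything here is my own assumption for illustration: the names row_md5 and plan_changes, the separator character, and the idea of keeping rows in dictionaries keyed by the primary key. A real ETL tool such as Informatica does this with its own expressions and lookups, not with this code.

```python
import hashlib

# Assumed separator (ASCII "unit separator") that should not appear in the data.
SEPARATOR = "\x1f"


def row_md5(values) -> str:
    """Hash all column values of one row into a single MD5 hex string.

    Simplification: None is treated as an empty string, so NULL and ''
    are considered equal here.
    """
    joined = SEPARATOR.join("" if v is None else str(v) for v in values)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def plan_changes(target_rows: dict, source_rows: dict) -> dict:
    """Decide per key whether to insert, update or skip a source row.

    Both arguments map a key to a tuple of the non-key column values.
    """
    actions = {}
    for key, source_values in source_rows.items():
        if key not in target_rows:
            actions[key] = "insert"  # key not present in the target yet
        elif row_md5(source_values) != row_md5(target_rows[key]):
            actions[key] = "update"  # same key, different content
        else:
            actions[key] = "skip"    # same key, same content
    return actions


# Made-up sample data: rows already in the target and rows from the source.
target = {1: ("Ann", "Madrid"), 2: ("Bob", "Lisbon")}
source = {1: ("Ann", "Madrid"), 2: ("Bob", "Porto"), 3: ("Cy", "Rome")}

actions = plan_changes(target, source)
print(actions)
```

In practice the hash of the existing row is usually stored in an extra column of the target table, so only the incoming row has to be hashed on each load and the comparison is a single string check instead of a column-by-column one.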


xoxo,

szarki9