User:Jorge Stolfi based on Image:Hash_function.svg by Helix84

What on a hash?!

I got this question from a friend (you know who you are) who has mined crypto goodies and just came to think about what those hashes are all about. So.. what is the deal with hashes? I am going to assume whoever is reading this has no idea what hashes are, which gives me freedom to explain it my way.

 A bit on Hash

Wikipedia (the source of all knowledge according to some, and the source of all ignorance according to others) has a very nice entry on what I should properly call Cryptographic Hashing Function (hashing algos in short). The wiki entry explains it all very nicely but it does not explain it without using strange words.

It lists five properties for nice hashing algo, which I will attempt to explain without using strange words (numbering is mine, for convenience):

1) It is deterministic so the same message always results in the same hash

2) It is quick to compute the hash value for any given message

3) It is infeasible to generate a message from its hash value except by trying all possible messages

4) A small change to a message should change the hash value so extensively that the new hash value appears uncorrelated with the old hash value

5) It is infeasible to find two different messages with the same hash value

So what are those all about?

 1) It is deterministic, so the same message always results in the same hash

Very important otherwise the function would be unreliable. For the same input you get the same hash output. Not similar, or close, but the same. No way of tricking the algo into producing something else, and if it does produce something else, put the algo in the garbage.

2)  It is quick to compute the hash value for any given message

Very important as well. Given how handy these are for bitcoin mining and so on, it will be great not to have to wait for it. This is unlike other encryption functions which can take forever to do their thing. A handy hash function (algo) is fast. It shakes and jumbles the input until it gives up that output which is usually a convenient fixed length.

3)  It is infeasible to generate a message from its hash value except by trying all possible messages

This is a fancy way to say it is “irreversible”. For math functions, an input gives an output. For a lot of them there is also an “inverse” function,where you put that output and it gives you input you started with. Not the hash function! That one has no “reverse”, as it is a “trapdoor function“, which is just like a real life trap door: once you get thru it there is no way back.

Let’s say you wanted to find out what was used to generate a given hash value. Well.. the only way to find out is to hash everything in sight until you find something that works (the input). For most hash functions the input has no limit on length so good luck on trying that. A caveat: if you know the input is, lets say 4 characters in length, then the guessing will not take long. This is not the hash function problem, but moron’s who decided to use 4 characters instead of something longer.

Clever people have been using long lists of input and output pairs, with some algorithmic assistance,  to help with the guessing. This is a problem if the input is a secret such as password. For most mining concerns this guessing is not an issue, as the input is written for everybody to see in the ledger.

4)  A small change to a message should change the hash value so extensively that the new hash value appears uncorrelated with the old hash value

This says the hash function has chaos like behavior. Like the weather where hypothetically a tiny change over here ends up with a hurricane over there (yes.. Jurassic Park reference).

 It means that if you hash number “2” then you can’t expect the output hash to be “close” to the hash of “1”.  It could be “close”, or it could be a million miles away in terms of “proximity” (that’s another discussion).

Practical hash outputs involved in mining are 256 bits in length. This means there are 2 to the power of 256 possible values. The actual number has 78 decimal places:

115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129, 639,936

 A way to understand how big this is:  imagine all the grains of sand on Earth. Then imagine each one of those grains being yet another planet with more sand. Count the total grains of sand. Here is a cool explanation just how large that number is: How secure is 256 bit security?

Why does this matters?

a) The hash function output is pretty much all over the place, and it is almost impossible to predict where it will land within all those fine grains of sand.

b) Any teeny-weeny change to the input will send the output into an unpredictable place (one of those sand grains on a planet on a grain on a planet). Even Doctor Who won’t be able to predict where that went.

c) If you use as an input something which keeps on changing (combo of mouse, system clock, IP packet rates.. anything…) and use it to generate a hash, it is very unlikely you will end up with the same result as somebody else doing the same thing.

So what you ask? Well…  when you create a “crypto wallet”, that’s what the app will do (more or less). It waits for random stuff to happen, and then, based on that randomness, it spits out a hash. It did not check if anybody else already has that value (a disaster for you if someone else does!). The app assumes nobody else has it, as it is very, very unlikely (Doctor Who unlikely).

5) Last but not least:  it is unfeasible to find two different messages with the same hash value

This is called “collision resistance”, and it is very important for mining and crypto currencies, and making money out of those.  It means, it is very unlikely anybody could find two inputs which will end up with the same output hash. Given the inputs can be of pretty much any length (such as bitcoin block sizes), it is a guaranteed, there is only one input for a given output.

It matters, as the hashes of things, such as bitcoin blocks, are used to identify the block. This is how hashes are used to validate something has not changed: compare the original hash with the hash of the thing being suspected. If it is not the same someone will be in trouble. The hash value is called a “digest” and it is like a fingerprint. For practical purposes, “same hash, same thing”.

This is used in bitcoin (and the likes) to create the blockchain. Let’s say, there is a blockchain with three blocks: “A”, “B”,  and “C”, where “A” is the first one in this chain. Block “C” includes a digest (hash) of block “B”, which includes a digest of block “A”. Given this property, there is no way to make changes to block “A” without impacting  block “B” and subsequently “C”, and all blocks that come after “C”. This will break the chain, and trying to alter the chain will get you booted from the network pretty quickly… unless you have found a secret way to create collisions, by which point you will be either hunted down or become a billionaire.

Bringing it Home

There is a long list of dead hash functions which failed to uphold some, or all, of the attributes listed above. Creating a hash function, which qualifies for practical use, is not a matter of cleverness but of science. Any proposals for “inventing” a better mousetrap (hash function) should be treated with suspicion and subjected to examination by people trained on the matter.

Not doing that diligence is a rookie mistake, as it happened to IOTA blockchain, when the team behind it decided to create a better mousetrap. Any claims of improvements to well documented hash functions are an extraordinary claim which needs extraordinary evidence.

Another point of wisdom comes from the design principles listed by IEEE Center for Secure Design, including  the following under the heading “USE CRYPTOGRAPHY CORRECTLY“:

Rolling your own cryptographic algorithms or implementations. Designing a cryptographic algorithm (including protocols and modes) requires  significant and rare mathematical skills and training, and even trained mathematicians sometimes produce algorithms that have subtle  problems. There are also numerous subtleties with implementing cryptographic algorithms. For example, the order of operations involved when exponentiating a number—something common in cryptographic operations—can leak secret information to attackers. Standard algorithms and libraries are preferable.

As noted by the experts, “hashing functions are the workhorses of modern cryptography” and this applies to blockchain, bitcoin, and every other crypto-money-thingy out there.

Here is video with a nice explanation on how hashes are used with bitcoin. It is definitely a better explanation than the one I just provided.

And if you have time for one more, please consider this one from Three Blue One Brown. Enjoy!

About the Author

Leave a Reply

Your email address will not be published. Required fields are marked *

Enjoy this blog? If so, spread the word!

%d bloggers like this: