Understanding Cryptographic Algorithms used by Cryptocurrencies
If you like to understand how stuff works, then this article is for you, in this post, I will be explaining the different algorithms used in the crypto space, of course not all, but I will be able to cover the basic ones, enough to make you understand why some things are the way they are, and also helps your problem solving ability.
Disclaimer: I am not a cryptologist neither am I an expert in this field, so I might use some words loosely, don’t be offended, and also this is not an exhaustive list, I might miss some but I will surely update this lists continuously.
Cryptographic Hash Functions
The first algorithm we will be looking at is the hash function, as this is the basis of the uniqueness of bitcoin addresses created.
A hash function is a mathematical function which takes an input (message) and returns a fixed sized alphanumeric string.
a hash function has the following characteristics:
It is easy to compute a hash for any given input string
It is extremely difficult to compute the exact string that has given the hashed output: This means that it should be impossible to reverse-engineer the hashed output to get the input string even if the algorithm that is used in developing the hash is known, for example:
if I have an input 2 + 2 +5 = 9, now given the output 9, for me to tell what numbers were added to arrive at 9, I will have to iterate over 9C1, which is still solvable considering the simplicity of the output, however, what happens when I have an output like 92830384059555686468684884784, its going to be extremely difficult to compute, and it is known that some of the best hash functions will take millions of years to decrypt even by a quantum computer.
Image Source
A hash function should be able to take an input string of any length and turn them into an output of fixed length: meaning that if I have an input of “Hello World” or “Hello this great looking awesome world” the output should have a fixed length for the two strings even though they are two different strings with different lengths
Bitcoin makes use of the SHA256 which stands for Secure Hash Algorithm 256bits, although this algorithm was originally developed by the NSA, it has been publicly vetted and verified.
Merkle Tree
Now, having understood what hash function is, you will need to apply that knowledge with Merkle Tree. The Merkle Tree is being used in cryptocurrencies to securely verify and validate transactions, with this cryptocurrency use it to ensure consistency and data integrity. because once any change is being made to one node in the tree, it affects all the nodes as the values changes.
How Merkle trees work
Merkle trees are created by repeatedly hashing pair of nodes until there is only one hash left, the one left is called the Merkle root, a node could contain any type of data, but here, let us take a node to contain transactions, the first step is to use our hash function (SHA 256) to hash each transaction, for example: if the raw transaction data looks like this:
0200000000010111b6e0460bb810b05744f8d38262f95fbab02b168b070598a6f31fad438fced4000000001716001427c106013c0042da165c082b3870c31fb3ab4683feffffff0200ca9a3b0000000017a914d8b6fcc85a383261df05423ddf068a8987bf0287873067a3fa0100000017a914d5df0b9ca6c0e1ba60a9ff29359d2600d9c6659d870247304402203b85cb05b43cc68df72e2e54c6cb508aa324a5de0c53f1bbfe997cbd7509774d022041e1b1823bdaddcd6581d7cde6e6a4c4dbef483e42e59e04dbacbaf537c3e3e8012103fbbdb3b3fc3abbbd983b20a557445fb041d6f21cc5977d2121971cb1ce5298978c000000
the hashed functioned will look like this:
B579585CA762CB2F17BDEC8C6698C0E8A3BC3D981397CB5DE09C1710F9C7DE51
These hashes are then organized into what is now called the Merkle tree, so for each transaction done, the hashes are paired, concatenated and then hashed together, this is done for each output of our transactions until something like a tree below is formed
Using Merkle tree has significantly reduced the amount of data a trusted authority has to maintain to verify the authenticity of that data, so they can actually validate that a data is valid without actually having that data, don’t mistake this with the Zero Knowledge Proof tho.
Scrypt PBKDF
Image Source
This is a password-based key derivation function that is used cryptographically from preventing passwords from being hacked with either brute force or rainbow table attacks
Most of the time, when a user creates a password, different hashing functions (SHA-1, SHA-256, MD5 etc) are used to hash these passwords, and a hashed output is generated, which is then stored in the database, hashed passwords are still susceptible to some attacks like the rainbow table attack where the attacker reference a table of pre-hashed passwords and try them against the database.
Now in other the circumvent this, the hashing algorithm was optimized by concatenating a salt + a password and hashing them to generate an output key, however this output key can still be compromised with the rainbow table attack, so the algorithm was optimized even further by taking the password + a salt value + a number of specified iteration values and use these three parameters to generate a stronger password key.
By now, you would probably think the password is very strong and it will be almost impossible for an attacker to decrypt the password, Yes, you are right for the most part, except that this can still be attacked by a large number of parallel operations performed against the key, so we needed a new way to combat parallel attacks and this is where the Script PKBDF came about
How it works
Scrypt PBKDF was created by Dr. Collin Percival, it is a memory intensive algorithm that makes it very expensive to perform large scale attacks, it is currently used by popular cryptocurrencies like Litecoin and Dogecoin.
SPBKDF works by taking a passphrase + a salt value + a CPU cost parameter + a memory cost parameter + a parallelization parameter + an output length that is specified by the user known as the dkLen and use all of these parameters to generate an output key.
The parallelization factors help to fine tune the CPU costs, the dkLen gives the user the ability to determine the output length, the N & R parameter allows the user to set the CPU and memory cost. read this if you would like to know more about the Scrypt PBKDF
Elliptic Curve Digital Signature Algorithm (ECDSA)
Personally, this is one of my favorites of all cryptographic algorithms used in blockchain and cryptocurrency because I think its one of the most powerful algorithms out there, ECDSA is the algorithm behind the private and public keys being used in Bitcoin.
How it works
As you already know, I like talking about the problem that made the creators arrive at creating these algorithms first before I go on to explain how the algorithms works
RSA Algorithm
Prior to the creation of the Elliptical Curve, the RSA was the most computationally impossible algorithm to decrypt at the time, its security lies on the fact that factoring is slow and multiplication is fast.
To put RSA in context: Everything we do on the internet from purchasing online, to accessing websites securely is protected using the RSA algorithm and the way RSA works is by taking two prime numbers and multiplying them together to get something called a semiprime number, what is really interesting about semiprime number is that it is really difficult to calculate the numbers that the numbers that could have been multiplied together to form them to get back to the original values. for example, if we multiply two prime numbers say 13 and 17, we get 221, but if we are given a number 221 and we are asked to determine what two numbers we multiply together to arrive at 221, now you might think for this number, its quite easy, but in a situation where we have a two prime number like 7208983630373 and 16390902739373 multiplied together, it will take a huge computation power and ages for us to determine the two numbers that were multiplied together to arrive at the calculated number, its more like trying to unfry an egg, it is impossible, or so we thought until algorithms like Quadratic Sieve and General Number Field Sieve were created to tackle the problem of prime factorization, these algorithms are faster and less computation intensive, and what this means is that RSA might not scale us long enough especially in a future where quantum computing is going to be readily available.
So researchers explored other ways to implement algorithms that could serve as a trapdoor function beyond factoring and that is how the elliptical curve algorithm came about
An elliptical curve is a set of points that satisfy a specific mathematical equation:
y2 = x3 + ax + b
representing the above equation in a graph leads to something like this:
Image Source
there are several properties that makes the curve above a good fit for cryptography, and one of the property is its horizontal symmetry which implies that any point on the curve an be reflected over the x axis and it will remain the same as the original curve, another interesting property is that any other non-vertical line will intersect the curve in at most three places, like this
Image Source
Any two points on the curve can be dotted together to get a new point just like the animation above represents, what this means is that, if you have two points, an initial point “dotted” with itself n times to arrive at a final point, finding out n when you don’t know the final point is hard, this is the basis with which elliptical curve was generated, to fully understand the elliptical curve, I will suggest you go through this article
Bitcoin uses the ECDSA for its digital signature
Zero Knowledge Proof (ZKP)
In this age of data privacy concern, we have been two options when it comes to our privacy concerns, either we give all these companies unfettered access to our data in exchange to use their platform, or we decide not to use their platform, but its not that simple cos we don’t always have a choice, take, for example, if you want to seek for a loan from any financial institution, you just have to reveal all your financial details whether you like it or not. but what if you can actually prove to these financial institutions that you are able to return their money without them actually sniffing into your entire bank details? and not just financial institutions, what if you can provide anyone with the data they need to confirm anything about you without you actually giving them access to your data?
A zero-knowledge protocol is a method by which one party (the prover) can prove to another party (the verifier) that something is true, without revealing any information apart from the fact that this specific statement is true.
How does it work?
To explain this, I will simplify the illustration provided on Wikipedia,
Let us assume we have a blind person and two balls, one black and one white, and you would like to prove to the blind person that these balls are of different colors without revealing the individual colors of each ball.
Image Source
For this, you asked the blind person to hide both balls under the table and bring one back for you to see, after that you instructed him to hide the balls under the table and then either show you the same ball or bring the other one. As a result, you can prove to the blind person that the colors are different by saying whether he changed the ball under the table or not.
In this case you can argue that its just a lucky guess and the blind person is not completely certain that the balls have different colors, this doubt is solved by repeating this process n-times, even at that it is known that the blind person cannot fully ascertain that you are lying to him, so it is agreed that you can get a probability that the person is lying to you which will be very small because of the number of iterations that have been performed, but then the probability can never be zero
Zero-Knowledge Proof is used to enable transfer of assets across a distributed, peer-to-peer blockchain network with complete privacy, ZCash is one of the popular cryptocurrency that uses this protocol.
Oh Well! I will stop here for now, as I said earlier, this list is not exhaustive and its subject to modification, if you also know any other algorithms that you would like me to explain here, please drop it in the comment section.