In December 2017, something strange happened on the Ethereum blockchain. If you were trying to perform transactions on the network, you would have noticed that they took longer than usual to get confirmed. That was because of a game called CryptoKitties. The game was so popular that it made up 20% of the Ethereum network traffic.
The main concern here isn’t that a silly game could become so popular, but that a single application was able to congest the entire network, which put a lot of pressure on Ethereum to address its scalability issue.
Currently, a blockchain cannot process more transactions than a single node can, because every node in the network has to process all transactions and store the entire state. This provides security but limits scalability. Ethereum processes around half a million transactions per day (7-15 transactions per second) and Bitcoin only processes around 50 thousand per day (3-7 transactions per second). These might sound like good numbers, but when we compare them against Visa (4000 transactions per second), they fall short.
The question to ask here is this: Is there a way to solve Ethereum’s scalability issue?
Some Basic Vocabulary
Before we dive into sharding, let’s take a look at some important vocabulary (feel free to skip to the next section if you are already familiar with these terms):
- State: a set of information that represents the current state of a system. In Ethereum, this is the current account set containing current balances, smart contract code, and nonces at some point in time.
- History: an ordered list of all transactions that have taken place since genesis.
- Transaction: represents an operation that some user wants to make, and is cryptographically signed. It changes the state of a system.
- State transition function: a function that takes a state, applies a transaction, and outputs a new state.
- Merkle tree: a cryptographic hash tree structure that can store a very large amount of data, where authenticating each individual piece of data only takes O(log(n)) space and time. More about Merkle trees here.
- Receipt: an object that represents an effect of a transaction that is not directly stored in the state, but is still stored in a Merkle tree (e.g. logs in Ethereum are receipts).
- State root: the root hash of the Merkle tree representing the state.
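To make the Merkle tree and root definitions above more concrete, here is a minimal sketch in Python. It is only an illustration: it assumes a plain binary tree over SHA-256 where the last node is duplicated on odd levels, whereas Ethereum itself uses a Merkle Patricia trie with a different hash function. The names `merkle_root`, `merkle_proof`, and `verify` are made up for this example.

```python
import hashlib


def h(data: bytes) -> bytes:
    """Hash helper (SHA-256 is an assumption made for this sketch)."""
    return hashlib.sha256(data).digest()


def build_levels(leaves: list[bytes]) -> list[list[bytes]]:
    """Return every level of the tree, from the hashed leaves up to the root.

    Assumes at least one leaf.
    """
    level = [h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Duplicate the last node so every node has a sibling.
            level = level + [level[-1]]
            levels[-1] = level
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def merkle_root(leaves: list[bytes]) -> bytes:
    return build_levels(leaves)[-1][0]


def merkle_proof(leaves: list[bytes], index: int) -> list[tuple[bytes, bool]]:
    """Collect one sibling hash per level: O(log n) items in total."""
    proof = []
    for level in build_levels(leaves)[:-1]:
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))  # (hash, sibling_is_left)
        index //= 2
    return proof


def verify(leaf: bytes, proof: list[tuple[bytes, bool]], root: bytes) -> bool:
    """Recompute the path from the leaf to the root using only the proof."""
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root
```

With five leaves, for instance, a proof contains three hashes, which is where the O(log(n)) cost comes from: the verifier never needs the other leaves, only one sibling hash per level.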
What is Sharding?
In the world of blockchain, there seems to be a trilemma that claims that blockchain systems can only, at most, have two of the following three properties:
- Decentralization
- Scalability
- Security
Currently, Ethereum is decentralized and secure but not scalable.
There have been many ideas proposed to solve the problem, but there is one in particular that is now a part of the Ethereum 2.0 roadmap: Sharding.
Vitalik Buterin (Co-founder of Ethereum) describes this concept with the following example:
Imagine that Ethereum has been split into thousands of islands. Each island can do its own thing. Each of the islands has its own unique features and everyone belonging on that island i.e., the accounts, can interact with each other AND they can freely indulge in all its features. If they want to contact other islands, they will have to use some sort of protocol.
The idea behind sharding is to split the state and history of the network into multiple partitions or shards. For example, a sharding scheme on Ethereum might put all addresses starting with 0x00 into shard 1, all addresses starting with 0x01 into shard 2, and so on. Each new transaction on each individual shard would change the state of that shard only.
With a protocol such as this, every shard processes its own portion of the state of the network, which allows the system to process many transactions in parallel, thus significantly increasing throughput. Each one of the shards (likely to be 1024 in phase 1) will have as high a capacity as the current Ethereum chain.
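The address-prefix example can be sketched in a few lines of Python. This is a toy model, not the actual Ethereum 2.0 design: it assumes one shard per possible first address byte, a state that is just a mapping from address to balance, and hypothetical names (`shard_of`, `Transaction`, `apply_transaction`). It also doubles as an example of a state transition function from the vocabulary section.

```python
from dataclasses import dataclass


def shard_of(address: str) -> int:
    """Map an address to a shard by its first byte: 0x00... -> 1, 0x01... -> 2."""
    first_byte = int(address[2:4], 16)
    return first_byte + 1


@dataclass(frozen=True)
class Transaction:
    sender: str
    recipient: str
    amount: int


def apply_transaction(shard_state: dict[str, int], tx: Transaction) -> dict[str, int]:
    """State transition function for one shard: (state, tx) -> new state."""
    if shard_of(tx.sender) != shard_of(tx.recipient):
        raise ValueError("cross-shard transfers need a separate protocol")
    if tx.amount < 0 or shard_state.get(tx.sender, 0) < tx.amount:
        raise ValueError("invalid amount or insufficient balance")
    new_state = dict(shard_state)  # leave the old state untouched
    new_state[tx.sender] = new_state.get(tx.sender, 0) - tx.amount
    new_state[tx.recipient] = new_state.get(tx.recipient, 0) + tx.amount
    return new_state
```

Because `apply_transaction` only ever touches the state of one shard, transactions that land on different shards have no data in common and could be processed at the same time, which is the source of the throughput gain.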
What would a sharded Ethereum blockchain look like?
Below you can see an image of what Ethereum 2.0 will look like. We’ll still have the current Proof of Work (PoW) main chain, but now there will be a sidechain called the beacon chain that stores hashes of main chain blocks in its own blocks.
Anatomy of Ethereum 2.0. Diagram by Hsiao-Wei Wang
In this setup, there are still going to be smart contracts that live on the main chain. As part of the main chain state, there will be a registration contract that users will use to make a fixed-size, one-way deposit of 32 ETH to become validators, which is the name given to participants in the Casper/sharding consensus system.
It might seem a little complicated to have two chains in parallel but bear in mind that now the beacon chain will be the center of Ethereum 2.0 since it will store and manage the set of active validators as well as being the base of the sharding system.
We can see the sharding solution in two layers. Let’s start from the top with the beacon chain:
Source: Hackernoon
In the diagram above, we can see the beacon chain. Notice that in each block, we store two roots: one that describes the state of the network, which is divided into shards, and one that contains the information about all the verified collation headers.
Collations are nothing more than summary descriptions of the state of a specific shard, and they are created by proposers. A proposer is a validator that creates collations. This is an example of what a collation looks like:
Source: Hackernoon
Collations are basically groups of transactions that belong to one single shard. Each collation, as seen in the image above, has a collation header which contains information about which shard it belongs to, the state of the shard before and after being processed, and the receipt root after all transactions are verified. Also, on the right side of the header we can see the information about the attesters (previously known as notaries), the nodes that verify the collation.
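As a rough mental model, the header fields described above could be written down as a small data structure. The field names below are invented for illustration and do not come from any Ethereum specification, and the `extends` check (a collation must start from the state its parent ended in) is an assumption about how consecutive headers relate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CollationHeader:
    shard_id: int                # which shard the collation belongs to
    pre_state_root: bytes        # state root before the transactions are applied
    post_state_root: bytes       # state root after the transactions are applied
    receipt_root: bytes          # root of the receipts produced by the collation
    attester_signatures: tuple[bytes, ...] = ()  # sign-offs from attesters


def extends(parent: CollationHeader, child: CollationHeader) -> bool:
    """Check that `child` continues from the state where `parent` ended."""
    return (
        parent.shard_id == child.shard_id
        and parent.post_state_root == child.pre_state_root
    )
```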
Attesters are also validators. They are part of a committee that needs to sign off on a beacon chain block while simultaneously creating a link (cross-link) to a recent shard block on a particular shard chain.
We have seen what a sharded Ethereum will look like, and also how the new terminology can make it daunting to understand how everything works, so let’s take a look at a broader description of the structure.
Ethereum 2.0 Structure
The beacon chain is a central PoS (Proof of Stake) chain which stores and manages the current set of active validators. Validators can act as proposers or attesters and are grouped into committees; they first join the active validator set and are assigned these roles later. Again, the way they become validators initially is by sending a transaction on the existing PoW main chain that burns 32 ETH as a one-way deposit.
Once the transaction is processed in the PoS chain (beacon chain), the validator is queued and eventually inducted as an active validator until it either voluntarily logs out or is forced to log out as a penalty for misbehavior.
The primary source of load on the beacon chain is attestations. Attestations simultaneously attest to a shard block and a corresponding beacon chain block. A sufficient number of attestations for the same shard block creates a crosslink, confirming the shard segment up to that shard block into the beacon chain.
Simply put, a cross-link is a special type of transaction that says: “Here is the hash of some recent block on shard X. Here are the signatures from at least 2/3 of a randomly selected sample of M validators (e.g. M = 1024) that attest to the validity of the cross-link”.
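The 2/3 rule in that description boils down to a simple counting check. The sketch below assumes signatures have already been verified and are represented only by validator names; `crosslink_is_valid` is a hypothetical helper, not part of any client.

```python
def crosslink_is_valid(committee: set[str], signers: set[str]) -> bool:
    """Accept a cross-link if at least 2/3 of the sampled committee signed it."""
    if not committee:
        return False
    # Signatures from validators outside the sampled committee do not count.
    valid_signers = signers & committee
    # Integer arithmetic avoids rounding issues around the 2/3 boundary.
    return 3 * len(valid_signers) >= 2 * len(committee)
```

For M = 1024, this means 683 signatures are enough, while 682 are not.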
Every shard is itself a PoS chain, and the shard chains are where the transactions and accounts will be stored.
These crosslinks serve as infrastructure for asynchronous cross-shard communication and to confirm segments of shard chains into the main chain.
Single-Shard Takeover Attack
As you might be aware, in Proof-of-Work it’s said that if someone controls 51% or more of the hashing power in the network, this malicious miner would be able to force fraudulent transactions. In the case of Ethereum, attacking the blockchain would cost around $70 million, so even though it is possible, it would require a lot of money and resources.
But when it comes to working with shards, it becomes much easier to attack the blockchain. The reason is simple: this time, the attacker just needs to take over the majority of collators in a single shard to create a malicious shard that can submit invalid collations. This is called a single-shard takeover attack.
Think of it this way: if there are 100 shards, the attacker can focus on one particular shard and would only need about 1% of the hash rate of the network.
In order to solve this, it is proposed to perform a random sampling of validators on each shard. The way it works is that each shard is assigned a certain number of attesters (e.g. 150), and the attesters that approve collations on each shard are taken from the sample for that shard. Then, these samples can be reshuffled either semi-frequently (e.g. once every 12 hours) or maximally frequently (i.e. there is no real independent sampling process, attesters are randomly selected for each shard from a global pool every block).
The goal is that these validators will not know which shard they will get in advance. Also, the source of randomness needs to be common to ensure that this sampling is entirely compulsory and can’t be gamed by the validators in question.
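To get a feel for why random sampling helps, we can estimate the chance that an attacker ends up with a majority of one randomly drawn sample. The sketch below is a back-of-the-envelope model, not a security analysis: it assumes each attester in the sample is drawn independently from a very large pool (a binomial model), and `takeover_probability` is a name made up for this example.

```python
from math import comb
from typing import Optional


def takeover_probability(
    sample_size: int, attacker_fraction: float, needed: Optional[int] = None
) -> float:
    """Probability that at least `needed` sampled attesters are malicious.

    `attacker_fraction` is the attacker's share of the global validator pool.
    By default `needed` is a strict majority of the sample.
    """
    if needed is None:
        needed = sample_size // 2 + 1
    honest_fraction = 1.0 - attacker_fraction
    return sum(
        comb(sample_size, k)
        * attacker_fraction**k
        * honest_fraction ** (sample_size - k)
        for k in range(needed, sample_size + 1)
    )
```

Calling `takeover_probability(150, 1 / 3)` shows that, under these assumptions, an attacker holding a third of all validators has only a tiny chance of controlling most of a 150-attester sample. Without sampling, that same attacker could simply choose to concentrate on one shard.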
Proof-of-Stake
Most blockchains run on Proof of Work (PoW), which means that miners solve cryptographic puzzles in order to validate transactions. Current versions of Ethereum and Bitcoin use PoW algorithms that are not very efficient and end up using massive amounts of electricity. Moreover, PoW encourages the use of mining pools, which make the blockchain more centralized.
Proof of Stake (PoS) is a very efficient consensus algorithm with significant advantages that include security, reduced risk of centralization, and energy efficiency. In PoS, the consensus nodes are known as validators, whereas in PoW, they are known as miners. The way it works is that a set of validators take turns proposing and voting on the next block, and the weight of each validator’s vote depends on the size of its deposit (stake).
To become a validator, the node has to deposit a certain amount into the network as stake. The size of the stake determines the chances of the validator to be chosen to forge the next block. The way we can trust validators in the network is that they will lose their stake if they approve fraudulent transactions.
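Here is a minimal sketch of stake-weighted selection, assuming the network has already agreed on a shared random seed (how that seed is produced is a hard problem in its own right and is not shown). The function name `choose_proposer` and the validator names are made up for illustration.

```python
import random


def choose_proposer(stakes: dict[str, int], seed: int) -> str:
    """Pick the next block proposer with probability proportional to stake."""
    rng = random.Random(seed)       # every node derives the same choice from the seed
    validators = sorted(stakes)     # fixed order so all nodes agree on the result
    weights = [stakes[v] for v in validators]
    return rng.choices(validators, weights=weights, k=1)[0]
```

With stakes of `{"alice": 32, "bob": 96}`, bob would be expected to be chosen roughly three times as often as alice over many rounds, and a validator with no stake would never be chosen.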
The development team at Ethereum has chosen to use PoS, and although it is promising by itself, there are still some issues that need to be solved before using it, like the nothing at stake problem.
The nothing at stake problem is the assumption that every validator will build on every fork when a fork takes place. This happens because nothing prevents them from collecting the fees from building on both (or more) chains.
Enter Casper FFG
Casper is a PoS consensus protocol being developed by the team at Ethereum, and just as we would expect from a PoS protocol, it reduces the cost of consensus, that is, the amount of energy spent compared with PoW. However, it does a lot more than that.
Firstly, Casper is designed to work in a trustless system and to be more Byzantine fault tolerant. This means that anyone who acts in a malicious/Byzantine manner will get immediately punished by having their stake slashed. Casper solves the nothing at stake problem because a validator that acts maliciously, for example by voting on conflicting forks, will immediately lose its stake. It is designed to encourage and guarantee network security, including by punishing validators who go offline, intentionally or not.
Secondly, Casper FFG is a hybrid PoW/PoS consensus mechanism. It implements a PoS mechanism as an overlay on top of a PoW chain. This is designed to ease the transition into PoS. The way it will work is that the main blockchain will still be mined via PoW, and at regular block intervals there will be a PoS checkpoint where finality is assessed by a network of validators.
Finality is the guarantee that past transactions can never change. This concept is very important because most blockchain systems only offer probabilistic transaction finality; that is, transactions are not immediately final, but become so eventually.
In the case of Casper FFG, it will achieve finality by introducing the notion of validators. The validators are responsible for confirming the blockchain at key checkpoints. At these checkpoints, once 2/3 of the validators confirm a particular block, it becomes finalized. Once finalized, it will no longer be possible to change any of the blocks before the checkpoint.
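The two ideas in this section, slashing validators that vote on conflicting forks and finalizing a checkpoint once 2/3 agree, can be sketched together. This is a heavily simplified model and not the real Casper FFG rules: it only detects two different votes at the same checkpoint height, it assumes the 2/3 threshold is measured by stake rather than by head count, and all names (`Vote`, `find_double_voters`, `slash`, `is_finalized`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    validator: str
    checkpoint_height: int
    block_hash: str


def find_double_voters(votes: list[Vote]) -> set[str]:
    """Validators that voted for two different blocks at the same height."""
    first_vote: dict[tuple[str, int], str] = {}
    offenders: set[str] = set()
    for vote in votes:
        key = (vote.validator, vote.checkpoint_height)
        if first_vote.setdefault(key, vote.block_hash) != vote.block_hash:
            offenders.add(vote.validator)
    return offenders


def slash(stakes: dict[str, int], offenders: set[str]) -> dict[str, int]:
    """Return new stakes with every offender's deposit removed."""
    return {v: (0 if v in offenders else stake) for v, stake in stakes.items()}


def is_finalized(
    stakes: dict[str, int], votes: list[Vote], block_hash: str, height: int
) -> bool:
    """A checkpoint is final once at least 2/3 of the total stake voted for it."""
    supporters = {
        v.validator
        for v in votes
        if v.block_hash == block_hash and v.checkpoint_height == height
    }
    total = sum(stakes.values())
    backing = sum(stake for v, stake in stakes.items() if v in supporters)
    return total > 0 and 3 * backing >= 2 * total
```

The point of the slashing step is that building on every fork is no longer free: a validator that signs two conflicting checkpoints leaves behind evidence that costs it its deposit.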
More about Byzantine Fault Tolerance here.
Main Changes in the Sharding Roadmap
Ethereum has an active area of research and development for Ethereum 2.0, which is the transition to sharding and Casper FFG implementation.
The implementation of Casper FFG in a PoS beacon chain is the first priority for the Ethereum team, and even though there isn’t a particular release date, it seems to be slated for 2019.
On the other hand, sharding was broken down into two big phases.
The first phase won't have any execution or EVM, so it won't integrate with the main net. This phase will focus on establishing the basic structure of sharding, which is the data layer, coming to consensus as to what data is in the shards.
Phase two is all about the state, giving meaning to the data and the notion of transaction. Here is where it will have an EVM, and will introduce backwards incompatible changes at a smart contract level, like storage rent.
Just like in the Casper implementation, there is not an official release date but it is expected that we will see these changes in 2020 for the first phase and 2021 for the second.
Cross-Shard Communication
One of the most important aspects of sharding would be to implement some method of cross-shard communication. What if you wanted to send a transaction from address X in shard 1 to address Y in shard 3?
This cross-shard communication will be achieved by applying the concept of transaction receipts. The receipt for a transaction will be stored in a transaction group Merkle root in the main chain block. The shard receiving a transaction from another shard will check the Merkle root to ensure that the receipt exists and has not been spent. Essentially, the receipts are stored in a shared memory that can be verified by other shards, but not altered. Therefore, shards will be able to communicate with each other.
Example of the concept of receipts for the cross-sharding system (Diagram by Sharding FAQ)
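The receipt flow can be sketched as a two-step transfer: the sending shard debits the sender and publishes a receipt, and the receiving shard later claims it exactly once. In this toy model a shared `ReceiptLog` stands in for the receipts committed under a Merkle root, so a simple lookup replaces the Merkle proof check. All class and function names are invented for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Receipt:
    receipt_id: int
    to_shard: int
    recipient: str
    amount: int


@dataclass
class Shard:
    shard_id: int
    balances: dict[str, int] = field(default_factory=dict)
    spent_receipts: set[int] = field(default_factory=set)


class ReceiptLog:
    """Append-only store every shard can read (stand-in for a Merkle root)."""

    def __init__(self) -> None:
        self._receipts: dict[int, Receipt] = {}

    def publish(self, to_shard: int, recipient: str, amount: int) -> Receipt:
        receipt = Receipt(len(self._receipts), to_shard, recipient, amount)
        self._receipts[receipt.receipt_id] = receipt
        return receipt

    def contains(self, receipt: Receipt) -> bool:
        return self._receipts.get(receipt.receipt_id) == receipt


def send(log: ReceiptLog, source: Shard, sender: str,
         to_shard: int, recipient: str, amount: int) -> Receipt:
    """Step 1, on the sending shard: debit the sender and emit a receipt."""
    if amount <= 0 or source.balances.get(sender, 0) < amount:
        raise ValueError("invalid amount or insufficient balance")
    source.balances[sender] -= amount
    return log.publish(to_shard, recipient, amount)


def claim(log: ReceiptLog, target: Shard, receipt: Receipt) -> None:
    """Step 2, on the receiving shard: verify the receipt and credit it once."""
    if not log.contains(receipt):
        raise ValueError("unknown or altered receipt")
    if receipt.to_shard != target.shard_id:
        raise ValueError("receipt is addressed to another shard")
    if receipt.receipt_id in target.spent_receipts:
        raise ValueError("receipt already spent")
    target.spent_receipts.add(receipt.receipt_id)
    target.balances[receipt.recipient] = (
        target.balances.get(receipt.recipient, 0) + receipt.amount
    )
```

Note that the two steps are asynchronous: the funds leave shard 1 as soon as the receipt is published, and only show up on shard 3 once the receipt is claimed there.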
Future Implementations in Sharding
Going beyond the initial sharding development, it is possible that Ethereum will adopt a super-quadratic sharding scheme. This means that the system will have shards within shards.
The potential for scalability would be massive: a super-quadratically sharded blockchain could potentially reach hundreds of thousands of transactions per second (perhaps even more). This would offer tremendous benefits to users, decreasing transaction fees and serving as a more general-purpose infrastructure for new applications. For now, this is way down the road in the development roadmap (Phase 6), but it is certainly worth mentioning.
Conclusion
Sharding may look like a very comprehensive solution for scalability, and in many ways it is, but there is still a lot of work to be done.
It is important to highlight that sharding will exist exclusively at the protocol layer and will not be exposed to developers. The Ethereum state system will continue to look as it currently does, but the protocol will have a built-in system to manage the shards so everything will be in the background for developers.
Finally, one thing is sure: once sharding and Casper are fully merged into the blockchain, even if another game or app becomes as popular as CryptoKitties, it will be very difficult for the network to get congested again.