Fork Intro
What Is a Fork?
Anyone working with blockchains will often hear about forks, sometimes soft forks and sometimes hard forks. So what do these concepts mean, and how do forks arise?
A blockchain system, whether it is Bitcoin, Ethereum, or Ripple, is a piece of software. In the beginning, all nodes run the same version of the software, follow the same consensus protocol, and maintain the same blockchain. Then one day a new version of the software is released, and some nodes choose to upgrade. Can the blocks generated by the new software be recognized by the old software? There are two possible outcomes: either the old software recognizes the new blocks or it does not. Both situations have occurred in history. Let's look at two examples.
For the first example, let's look at the following figure:
This is the digital format for recording a transaction in Bitcoin, which is called the transaction data format. That is to say, we can use such a format to clearly represent a transaction.
Let's take a look at the field highlighted in red and bold above. Originally, this field was not explicitly defined: Satoshi Nakamoto, the creator of the Bitcoin system, reserved it but did not use it. Later, in 2016, the community wanted to use the four bytes of space in this field to support smart contract functionality for payments on the Bitcoin blockchain. This is in effect an upgrade to the Bitcoin software.
Because old-version nodes do not carefully verify such undefined fields, nodes that upgrade their software can produce new blocks according to the newly defined rules, and older nodes will still accept those blocks as valid.
However, this is obviously not a long-term solution: once a reserved field has been defined it cannot be reused, so each such field supports only one upgrade.
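The reserved-field upgrade described above can be sketched as follows. This is only an illustration, not Bitcoin's actual validation code: the `Tx` type, the `reserved` field name, and the `MAX_RESERVED` limit are all hypothetical. The point is that the new rules keep every old check and only add a constraint on a field the old rules ignore, so anything valid under the new rules is also valid under the old ones.

```python
from dataclasses import dataclass


@dataclass
class Tx:
    amount: int
    reserved: int  # hypothetical 4-byte field left undefined by the original rules


MAX_RESERVED = 0x0000FFFF  # hypothetical limit introduced by the upgrade


def old_rules_valid(tx: Tx) -> bool:
    # Old nodes check only the fields they understand; `reserved` is ignored.
    return tx.amount > 0


def new_rules_valid(tx: Tx) -> bool:
    # New nodes keep every old check and add a constraint on `reserved`.
    return old_rules_valid(tx) and 0 <= tx.reserved <= MAX_RESERVED


upgraded_tx = Tx(amount=5, reserved=0x10)        # produced by an upgraded node
legacy_tx = Tx(amount=5, reserved=0xFFFFFFFF)    # violates only the new rule

print(old_rules_valid(upgraded_tx), new_rules_valid(upgraded_tx))
print(old_rules_valid(legacy_tx), new_rules_valid(legacy_tx))
```

Old nodes accept both transactions, while new nodes reject the second one. The chain stays unified as long as the blocks that end up in it follow the stricter new rules.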
Let's look at the second example. Also in 2016, Ethereum underwent a relatively large change. Because the change was so large, the blocks generated by the new software could not pass verification by the old software. Ideally, everyone would upgrade to the new software. In reality, the change was not approved by everyone, and some people chose not to upgrade. What was the final result? The new version and the old version went their own ways and generated their own blockchains. A blockchain that was originally one was split into two: ETH and ETC.
Soft Fork & Hard Fork
Soft fork: when a new version of the software (or protocol) appears in the system and the old software accepts the blocks produced by the new software, both versions keep working on the same chain from beginning to end. This is called a soft fork.
Hard fork: when a new version of the software (or protocol) appears in the system and is incompatible with the previous version, old nodes cannot accept all or part of the blocks mined by new nodes (they consider them illegal), resulting in two separate chains. Even if the new nodes have far more hash power, for example 99% of the computing power belongs to new nodes, the 1% of old nodes will still maintain a separate chain, because the blocks generated by the new nodes are unacceptable to the old nodes (although they know that 99% of the nodes on the network accept them). This is called a hard fork.
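The 99% versus 1% situation can be sketched in a few lines. This is a simplified model under stated assumptions: the `Block` type, the size limits, and the `best_chain` helper are hypothetical, and real clients track much more state. It shows that a node first filters out chains it considers invalid and only then compares accumulated work, so extra hash power cannot make an old node follow a chain it rejects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    height: int
    size: int  # hypothetical block size in bytes
    work: int  # proof-of-work contributed by this block


OLD_MAX_SIZE = 1_000_000  # limit enforced by old nodes (assumed for the example)
NEW_MAX_SIZE = 2_000_000  # looser limit enforced by new nodes (assumed)


def valid_for_old(block: Block) -> bool:
    return block.size <= OLD_MAX_SIZE


def valid_for_new(block: Block) -> bool:
    return block.size <= NEW_MAX_SIZE


def best_chain(chains, is_valid):
    """Return the chain with the most total work among fully valid chains."""
    candidates = [c for c in chains if all(is_valid(b) for b in c)]
    return max(candidates, key=lambda c: sum(b.work for b in c))


# New nodes hold 99% of the work but produce blocks that old nodes reject.
new_chain = [Block(1, 1_500_000, 99), Block(2, 1_500_000, 99)]
old_chain = [Block(1, 900_000, 1), Block(2, 900_000, 1)]

old_node_view = best_chain([new_chain, old_chain], valid_for_old)
new_node_view = best_chain([new_chain, old_chain], valid_for_new)
```

Here `old_node_view` is `old_chain` and `new_node_view` is `new_chain`: the two groups of nodes settle on different chains, which is the split described above.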
As you can see from the definitions above, only a hard fork leads to the blockchain being split into two chains. Therefore, at first the term fork referred only to a hard fork, and the concept of a soft fork was introduced later to distinguish between the two.
Soft forks do not require all nodes to upgrade at the same time, allowing for gradual upgrades, and they do not affect the stability and effectiveness of the system during the upgrade. People who do not want to upgrade do not have to, which is a common requirement in real life.
The upgrade space for soft forks is limited because the current Bitcoin transaction data structure and block data structure have all fields well defined. If you want to ensure forward compatibility, it is impossible to add new fields, or the old nodes will reject them. So the upgrade space for soft forks is restricted to the redefinition of existing fields. In particular, because the "block size" field in the block data structure cannot be redefined, a soft fork can never break through the 1 MB block limit. Moreover, this compatibility is extremely complex, and even a slight error may make new and old nodes incompatible, leading to a hard fork. This has already happened once before.
The upgrade space for hard forks is much larger, because hard forks only need to consider whether new nodes can accept the transactions and blocks produced by previous versions, and do not need to consider whether old nodes can accept the transactions and blocks produced by new nodes. Therefore, hard forks allow for more aggressive modifications to transaction and block data structures.
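One way to summarize the compatibility difference between the two kinds of upgrade: a soft fork only tightens the rules, so everything the new rules accept is also accepted by the old rules, while a hard fork accepts something the old rules reject. The sketch below checks that condition over a handful of sample block sizes. The `classify_upgrade` helper and the limits are hypothetical, and testing a finite sample is only an approximation of comparing the full rule sets.

```python
def classify_upgrade(old_valid, new_valid, samples):
    """Label a rule change by checking whether old nodes accept what new nodes accept."""
    accepted_by_new = [s for s in samples if new_valid(s)]
    if all(old_valid(s) for s in accepted_by_new):
        return "soft fork"
    return "hard fork"


# Sample block sizes in bytes (illustrative values only).
sample_sizes = [400_000, 800_000, 1_500_000]


def old_rule(size):
    return size <= 1_000_000   # assumed original limit


def tighter_rule(size):
    return size <= 500_000     # stricter than the old rule


def looser_rule(size):
    return size <= 2_000_000   # accepts blocks the old rule rejects


tightening = classify_upgrade(old_rule, tighter_rule, sample_sizes)
loosening = classify_upgrade(old_rule, looser_rule, sample_sizes)
```

With these samples, `tightening` evaluates to `"soft fork"` and `loosening` to `"hard fork"`.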
Finally, both the transaction data structure and the block data structure contain a field called "version number", which indicates the rules that the transaction or block follows. This suggests that rule changes were originally intended to be made through hard forks: to modify the rules, we redefine the version number. A soft fork, however, modifies the rules without modifying the "version number". Soft forks and hard forks touch on how decentralized node software, protocols, and versions are upgraded, which is very important and worth discussing. Therefore, when developing and maintaining a blockchain, forking is an important issue that needs to be considered in advance.
Why Do Forks Occur?
Forks occur mainly when a transaction is initiated again before the network nodes have finished processing the first one, resulting in a UTXO being spent more than once. This creates different blocks in the network. Because each block depends on the previous block, the subsequent blocks also gradually diverge. This is the most common cause of forks.
Generally, in this situation, there are two possibilities:
- Due to network congestion, a newly created transaction from an account may fail to be broadcast to the entire network. Using the same UTXO to initiate another transaction and build a block can then cause a fork.
- If an account initiates transactions simultaneously on multiple nodes and broadcasts them through different paths for block building, it can cause forks.
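The divergence described above can be illustrated with a toy hash chain. This is a minimal sketch, not the block format of any real system: `block_hash`, the transaction dictionaries, and the all-zero genesis hash are assumptions made for the example. Two nodes each include a different transaction spending the same UTXO, so their blocks differ, and every later block built on top inherits the difference through the parent hash.

```python
import hashlib
import json


def block_hash(parent_hash: str, transactions: list) -> str:
    # Hash the parent reference together with the block's transactions.
    payload = json.dumps({"parent": parent_hash, "txs": transactions}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


genesis = "00" * 32  # placeholder parent hash shared by both nodes

# Two conflicting transactions that spend the same (hypothetical) UTXO.
tx_a = {"spends": "utxo-1", "to": "alice"}
tx_b = {"spends": "utxo-1", "to": "bob"}

# Each node builds a block from the transaction it happened to receive.
block_on_node_1 = block_hash(genesis, [tx_a])
block_on_node_2 = block_hash(genesis, [tx_b])

# Even identical (empty) follow-up blocks diverge, because the parents differ.
child_on_node_1 = block_hash(block_on_node_1, [])
child_on_node_2 = block_hash(block_on_node_2, [])

print(block_on_node_1 != block_on_node_2, child_on_node_1 != child_on_node_2)
```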
Double-spending
Double-spending [^5]:https://en.wikipedia.org/wiki/Double-spending
Management of Forks
When a transaction is created and passes through the consensus layer, it is verified on a randomly selected node that has the best data state. In the first scenario above, a UTXO that has already been used is spent a second time. The selected node with the best data state identifies this and abandons the transaction to prevent double spending. In the second scenario, the transaction is discarded after being checked during the pre-growth phase in the block pool, so it cannot complete consensus and double spending is avoided.
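The double-spend check described above amounts to refusing any transaction whose UTXO is unknown or already claimed. The sketch below shows that idea only; the `TxPool` class and its method names are hypothetical and are not the actual Transformers implementation.

```python
class TxPool:
    """Toy transaction pool that refuses to accept a UTXO twice."""

    def __init__(self, unspent):
        self.unspent = set(unspent)  # UTXOs known to be spendable
        self.pending = set()         # UTXOs already claimed by a pending transaction

    def submit(self, utxo_id: str) -> bool:
        # Reject unknown UTXOs and UTXOs that another transaction already uses.
        if utxo_id not in self.unspent or utxo_id in self.pending:
            return False
        self.pending.add(utxo_id)
        return True


pool = TxPool({"utxo-1", "utxo-2"})
first_attempt = pool.submit("utxo-1")    # accepted: the UTXO is unspent
second_attempt = pool.submit("utxo-1")   # rejected: the same UTXO is reused
unknown_attempt = pool.submit("utxo-3")  # rejected: the UTXO is not known
```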
If for some reason a fork still occurs, it must be resolved as soon as possible. Transformers has established a cyclic check mechanism in the synchronization algorithm, which uses the Practical Byzantine Fault Tolerance algorithm to check node data for missing and incorrect blocks. If a node detects a missing or incorrect block, it uses this mechanism to add or remove the block.
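As a rough illustration of such a check, the sketch below compares a node's local chain with the block hashes reported by its peers and repairs any height where a Byzantine quorum of peers agrees on a different hash. It is an assumption-heavy simplification: it borrows only the quorum threshold from Practical Byzantine Fault Tolerance (2f + 1 votes when there are 3f + 1 nodes), not the full multi-phase protocol, and the function names and data layout are hypothetical rather than taken from Transformers.

```python
from collections import Counter


def quorum_size(n_nodes: int) -> int:
    # Tolerated faulty nodes f = (n - 1) // 3; the quorum is (n + f) // 2 + 1,
    # which equals 2f + 1 when n = 3f + 1.
    f = (n_nodes - 1) // 3
    return (n_nodes + f) // 2 + 1


def agreed_hash(reports):
    """Return the hash backed by a quorum of peer reports, or None."""
    if not reports:
        return None
    value, count = Counter(reports).most_common(1)[0]
    return value if count >= quorum_size(len(reports)) else None


def repair(local_chain: dict, peer_reports: dict) -> dict:
    """Fill in missing blocks and replace incorrect ones, height by height."""
    repaired = dict(local_chain)
    for height, reports in peer_reports.items():
        agreed = agreed_hash(reports)
        if agreed is None:
            continue  # no quorum at this height: leave local data untouched
        if repaired.get(height) != agreed:
            repaired[height] = agreed  # add a missing or swap out an incorrect block
    return repaired


local = {1: "a", 2: "x"}  # height 2 is incorrect, height 3 is missing
peers = {1: ["a", "a", "a", "a"], 2: ["b", "b", "b", "x"], 3: ["c", "c", "c", "c"]}
fixed = repair(local, peers)
```

With four peers the quorum is three, so `fixed` ends up as `{1: "a", 2: "b", 3: "c"}`: the incorrect block at height 2 is replaced and the missing block at height 3 is added.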