In this article, we take a deeper look into Ethereum’s technology. The specific topics covered can also be accessed from the links below:
Managing the ‘state’ or status of things in Ethereum. The first step to understanding how data is structured in the blockchain.
The state or status of things in the Ethereum network continually changes as new users join the network, transactions are executed, contracts are executed, and so forth. In contrast to Ethereum, Bitcoin does not maintain user account balances. With bitcoin, a user simply holds the private keys to one or more unspent transaction outputs (UTXOs) at any given point in time. Digital wallets visually make it appear as though the Bitcoin blockchain automatically stores and organizes user account balances, even though this is not what happens in the background.
A user account balance in Bitcoin is an abstract notion. In reality, a user’s account balance is the sum of all the unspent transaction outputs (UTXOs) for which that user holds the corresponding private key. The key(s) that a user holds can be used to individually sign and spend each of those UTXOs. In contrast, Ethereum is able to manage account balances, execute smart contracts, and more. The ‘state’ of Ethereum is not abstract, changes continually, and is significantly more complex than that of Bitcoin.
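The contrast between the two balance models can be sketched in a few lines. This is a toy illustration only: the `UTXO` class, the owner labels, and the dictionary-based `state` are hypothetical stand-ins, not the data structures either network actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UTXO:
    owner: str   # hypothetical label for whoever holds the spending key
    amount: int  # value in satoshi


def bitcoin_balance(utxo_set, owner):
    """Bitcoin-style: the balance is derived by summing spendable outputs."""
    return sum(u.amount for u in utxo_set if u.owner == owner)


def ethereum_balance(state, address):
    """Ethereum-style: the balance is a field stored in the account's state."""
    return state[address]["balance"]


utxo_set = [UTXO("alice", 3_000), UTXO("bob", 700), UTXO("alice", 2_000)]
state = {"0xalice": {"balance": 5_000}, "0xbob": {"balance": 700}}

print(bitcoin_balance(utxo_set, "alice"))
print(ethereum_balance(state, "0xalice"))
```

In the first model the balance exists only as the result of a calculation over outputs; in the second it is stored and updated directly.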
Recall the simple Bitcoin block header structure?
Each Bitcoin block has an 80-byte header that contains a version number together with:
- A hash of the previous header;
- A timestamp;
- A mining difficulty/target value;
- A proof of work nonce; and
- A root hash for the Merkle tree containing the transactions for that block.
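To see where the 80 bytes come from, the header fields can be packed with Python’s `struct` module. This is a minimal sketch: the function names are illustrative, the two hashes are zero-filled placeholders, and the field order shown (a 4-byte version field, then previous hash, Merkle root, timestamp, target and nonce) follows Bitcoin’s serialization rather than the order of the list above.

```python
import hashlib
import struct


def pack_header(version, prev_hash, merkle_root, timestamp, bits, nonce):
    """Serialize a Bitcoin-style header: four little-endian 32-bit integers
    plus two 32-byte hashes, with no padding between fields."""
    if len(prev_hash) != 32 or len(merkle_root) != 32:
        raise ValueError("hashes must be exactly 32 bytes")
    return struct.pack("<I32s32sIII", version, prev_hash, merkle_root,
                       timestamp, bits, nonce)


def header_hash(header):
    """Bitcoin hashes the serialized header twice with SHA-256."""
    return hashlib.sha256(hashlib.sha256(header).digest()).digest()


# Placeholder values for illustration only.
header = pack_header(1, bytes(32), bytes(32), 1231006505, 0x1D00FFFF, 0)
print(len(header))
```

The sizes add up as 4 + 32 + 32 + 4 + 4 + 4 = 80 bytes.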
With Ethereum, the data structure becomes more complex.
Because Ethereum is designed to be a general-purpose, blockchain-based smart contract platform on which other blockchain applications can be built, it now becomes significantly more complex than Bitcoin. Ethereum nodes now have to store contract information, account information, state information, logs and other data that we’ll get to. Executing transactions, such as sending Ether to someone, or executing the code in a smart contract will continually change the state of the Ethereum network. For example, an account balance would change every time a transaction in relation to that account takes place. Therefore, each block in the Ethereum blockchain represents a snapshot in time.
Interestingly, state data such as account balances is not stored directly in the blocks of the Ethereum blockchain, but more on this later. For scalability reasons, the block header is kept small by storing only root hashes, which are compact cryptographic commitments to the underlying data.
Ethereum block headers contain additional root hashes
Unlike Bitcoin, Ethereum has the property that every block contains an additional root hash, something called the “state root”. This is the root hash of a specialized type of Merkle tree which links to the entire state of the system: all account balances, contract storage, contract code and account nonces. In fact, in Ethereum, the root hashes of 3 groups of data are stored in the block header:
- Transactions – Transaction request information such as gas price, gas limit, recipient, transfer value, transaction signature values, etc.
- Receipts – Transaction outcome information that shows the effect of each transaction, such as the cumulative gas used and the set of logs created through the execution of the transaction.
- State – Current state information such as accounts, account balances and contract data.
There is another important root hash, storageRoot, the root hash of the storage trie (more on this below), which is where contract data resides. The storage trie root hash is contained in the state trie. Only the root node hashes of the transaction trie, state trie and receipts trie, along with other header information which we’ll get to, are stored directly on the blockchain. The concept of ‘tries’ will be clarified shortly. In the meantime, the storage of the root hashes in the blockchain header is illustrated in the diagram below:
The important takeaway for now is that state data, which can become very large over time, is not stored on the blockchain itself. In the diagram, the root node hash of the storage trie (where smart contract data is kept) actually points to the state trie, which in turn points to the blockchain.
There are two types of data in Ethereum: permanent data and temporary data. An example of permanent data would be a transaction. Once a transaction has been fully confirmed, it is recorded in the transaction trie. It is never altered. An example of temporary data would be the balance of a particular Ethereum account. The balance of an account is stored in the state trie and is altered whenever transactions against that particular account occur. It therefore makes sense that permanent data such as mined transactions, and temporary data such as account balances, are stored separately. Furthermore, Ethereum uses trie data structures (as referred to above), to efficiently manage data on the blockchain.
Merkle Trees
A quick digression to Merkle Trees (also referred to as Merkle tries) for efficient blockchain data storage
For scalability reasons, instead of storing large amounts of data, such as state data, on the blockchain itself, hashes that are cryptographically linked to the underlying data are stored in the blocks instead. The blocks therefore contain transaction data and the root hash of a data structure called the Merkle tree. A Merkle tree is a binary tree composed of a set of nodes (not to be confused with the Ethereum nodes on the network) with:
- a large number of leaf nodes at the bottom of the tree containing the underlying data,
- a set of intermediate nodes where each node is the hash of its two children, and
- a single root node, also formed from the hash of its two children, representing the “top” of the tree.
The overall data is split into chunks (represented by the leaf nodes at the bottom). Groups of two chunks are combined and a hash of this group is calculated. This process is repeated up the tree until the total number of hashes remaining becomes only one, the root hash as demonstrated below:
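The pairwise hashing described above can be sketched as follows. This is a simplified illustration, not Ethereum’s trie: it uses a single SHA-256 pass as the hash function, and it duplicates the last hash when a level has an odd number of entries, a convention borrowed from Bitcoin and assumed here for convenience.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(chunks):
    """Hash each chunk, then repeatedly hash adjacent pairs until one hash is left."""
    if not chunks:
        raise ValueError("at least one chunk is required")
    level = [h(chunk) for chunk in chunks]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumed rule for odd-sized levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


chunks = [b"tx-a", b"tx-b", b"tx-c", b"tx-d"]
root = merkle_root(chunks)
print(root.hex())
```

Changing any single chunk produces a different root, which is what lets the root stand in for the whole data set.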
Ethereum exclusively uses what is known as the “Practical algorithm to retrieve information coded in alphanumeric” (Patricia) tree or trie. The main advantage of the Patricia trie is its compact storage. So we now have a means of drastically reducing the amount of data in the blockchain. Furthermore, the root hash also serves as a unique identifier for the underlying data. The underlying data cannot be changed without changing the root hash. In addition, it also makes provision for something called a ‘light node’, but more on this later!
Another very important benefit of structuring data according to a Merkle tree is the ability to verify the underlying information using the root hash. Using the Merkle tree data storage method, a node does not need to hold all of the underlying data itself, as any piece of data supplied by another node can be checked against the root hash. Consequently, nodes/clients on the Ethereum network are able to validate chunks of transactions that have been mined by other mining nodes without having to store all of the transaction data themselves.
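Verification against a root hash can be sketched with a Merkle proof: the verifier needs only the chunk in question, the sibling hashes along the path to the root, and the root itself. The sketch below makes the same simplifying assumptions as a plain binary Merkle tree (SHA-256, last hash duplicated on odd-sized levels); the function names are illustrative and this is not Ethereum’s actual proof format.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(chunks):
    """Return every level of the tree, leaves first, root last."""
    levels = [[h(c) for c in chunks]]
    while len(levels[-1]) > 1:
        current = list(levels[-1])
        if len(current) % 2 == 1:
            current.append(current[-1])  # assumed rule for odd-sized levels
        levels[-1] = current
        levels.append([h(current[i] + current[i + 1])
                       for i in range(0, len(current), 2)])
    return levels


def make_proof(levels, index):
    """Collect (sibling_hash, sibling_is_left) pairs from leaf to root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify(chunk, proof, root):
    """Recompute the path to the root using only the chunk and its proof."""
    node = h(chunk)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


chunks = [b"tx-0", b"tx-1", b"tx-2", b"tx-3", b"tx-4"]
levels = build_levels(chunks)
root = levels[-1][0]
proof = make_proof(levels, 2)
print(verify(b"tx-2", proof, root))
```

The proof grows with the height of the tree rather than with the number of chunks, which is why a node holding only the root can check data supplied by others.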
Coming back to Ethereum, the state trie’s root node is a hash of the entire state trie at a given point in time, and is used as a secure and unique identifier for the state trie. The state trie’s root node is cryptographically dependent on all internal state trie data. A trie is required to have a key for every value stored inside it. Beginning from the root node of the trie, the key should tell you which child node to follow to get to the corresponding value, which is stored in the leaf nodes. In the case of Ethereum, the state trie contains a key/value pair for every single Ethereum account that has had a transaction executed against it. The key is a single 160-bit identifier (the address of an Ethereum account). The key/value mapping for the state trie is between the account addresses and their associated account values (which include things like the balance, nonce, codeHash, and storageRoot for each account). An Ethereum client/node is able to look up account addresses using the stateRoot value from a block in the blockchain. As a more concrete example, it is able to retrieve and verify an account’s correct balance using only the stateRoot value and the account address.
More on the account state…
An account state consists of four components, which are present regardless of the type of account (either a user account or a contract account, but more on this later):
- nonce: If the account is an externally owned, user account, this number represents the number of transactions sent from the account’s address. If the account is a contract account, the nonce is the number of contracts created by the account. This is not to be confused with the nonce used in Proof of Work.
- balance: The number of Wei owned by this address. There are 10^18 Wei per Ether.
- storageRoot: A hash of the root node of a Merkle Patricia tree, which encodes the hash of the storage contents of this account.
- codeHash: The hash of the EVM (Ethereum Virtual Machine) code of this account. For contract accounts, this is the code that gets hashed and stored as the codeHash. For externally owned accounts, the codeHash field is the hash of the empty string.
The following diagram from the Ethereum Whitepaper illustrates this in the context of a trie.
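The four components can be modelled as a small record. This is a sketch under stated assumptions: Ethereum hashes with Keccak-256, which is not in Python’s standard library, so `hashlib.sha3_256` is used purely as a stand-in and produces different digests; the empty-storage placeholder, the addresses, and the dictionary used for the state are likewise hypothetical.

```python
import hashlib
from dataclasses import dataclass

WEI_PER_ETHER = 10**18


def stand_in_hash(data: bytes) -> bytes:
    """Stand-in for Keccak-256 (assumption: NOT the digest Ethereum produces)."""
    return hashlib.sha3_256(data).digest()


EMPTY_CODE_HASH = stand_in_hash(b"")     # hash of the empty string
EMPTY_STORAGE_ROOT = stand_in_hash(b"")  # placeholder for an empty storage trie


@dataclass
class AccountState:
    nonce: int = 0                            # transactions sent / contracts created
    balance: int = 0                          # in Wei
    storage_root: bytes = EMPTY_STORAGE_ROOT  # root hash of the storage trie
    code_hash: bytes = EMPTY_CODE_HASH        # hash of the account's EVM code

    def is_contract(self) -> bool:
        return self.code_hash != EMPTY_CODE_HASH


# A toy "state": a mapping from account address to account state.
state = {
    "0xuser": AccountState(nonce=3, balance=2 * WEI_PER_ETHER),
    "0xcontract": AccountState(code_hash=stand_in_hash(b"\x60\x00")),
}

print(state["0xuser"].is_contract(), state["0xcontract"].is_contract())
```

Both account types carry the same four fields; only the codeHash distinguishes a contract account from an externally owned one.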
How data is managed in the blockchain
The storage trie is where the contract data lives. Each Ethereum account has its own storage trie. A 256-bit hash of the storage trie’s root node is stored as the storageRoot value in the global state trie.
Each Ethereum block also has its own separate transaction trie. A block contains many transactions, the order of which is determined by the miner that assembled the block. The path to a specific transaction in the transaction trie is via the index of where the transaction sits in the block. Mined blocks are never updated. The position of the transaction in a block is never changed. This means that once you locate a transaction in a block’s transaction trie, you can return to the same path over and over to retrieve the same result.
The structure of the Ethereum block
In Ethereum, a block consists of:
- the block header
- information about the set of transactions included in that block
- the headers of the current block’s ommers.
An ommer is a block whose parent is equal to the current block’s parent’s parent. Because Ethereum block generation times are much lower (~15 seconds) than those of other blockchains like Bitcoin (~10 minutes), more competing block solutions are found by miners. These competing blocks that do not end up as part of the main chain are also referred to as “orphaned blocks” (i.e. mined blocks that do not make it into the main chain) or uncles (because they are close enough to being a parent block, but not the actual parent block).
The reason for catering for ommers is to help reward miners for mining these orphaned blocks. Ommer blocks receive a smaller reward than a full block. Nevertheless, there’s still some incentive for miners to include these orphaned blocks and reap a reward. The creation of multiple competing blocks creates multiple paths, and a “fork” occurs. To resolve forks and determine the most valid chain or path, Ethereum uses a mechanism called the “GHOST protocol.” “GHOST” = “Greedy Heaviest Observed Subtree”. The GHOST protocol says we must pick the path that has had the most computation done upon it. One way to determine that path is to use the block number of the most recent block, which represents the total number of blocks in the current path. The higher the block number, the longer the path and the greater the mining effort that must have gone into arriving at the block.
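The block-number heuristic described above can be sketched with a toy block store. Everything here is hypothetical (the short string “hashes”, the dictionary layout, the function names), and the sketch only models “highest block number wins”; it does not implement the full GHOST rule.

```python
# Toy block store: hash -> parent hash and block number (all values made up).
blocks = {
    "g":  {"parent": None, "number": 0},
    "a1": {"parent": "g",  "number": 1},
    "a2": {"parent": "a1", "number": 2},
    "b2": {"parent": "a1", "number": 2},  # competing block at the same height
    "a3": {"parent": "a2", "number": 3},
}


def chain_tips(blocks):
    """A tip is any block that no other block names as its parent."""
    parents = {b["parent"] for b in blocks.values()}
    return [block_hash for block_hash in blocks if block_hash not in parents]


def canonical_path(blocks):
    """Pick the tip with the highest block number and walk back to genesis."""
    tip = max(chain_tips(blocks), key=lambda block_hash: blocks[block_hash]["number"])
    path = []
    while tip is not None:
        path.append(tip)
        tip = blocks[tip]["parent"]
    return list(reversed(path))


print(canonical_path(blocks))
```

Here `b2` loses to the longer path ending in `a3`. Ties between tips of equal height are not handled in any principled way in this sketch.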
The block header
A block header is a portion of the block consisting of:
- parentHash: a hash of the parent block’s header (this is what makes the block set a “chain”)
- ommersHash (also referred to as uncleHash): a hash of the current block’s list of ommers
- beneficiary: the account address that receives the fees for mining this block
- stateRoot: the hash of the root node of the state trie (recall that storing this root hash in the header makes it easy for light clients to verify anything about the state)
- transactionsRoot: the hash of the root node of the trie that contains all transactions listed in this block
- receiptsRoot: the hash of the root node of the trie that contains the receipts of all transactions listed in this block
- logsBloom: a Bloom filter (data structure) that consists of log information
- difficulty: the difficulty level of this block
- number: the count of the current block (the genesis block has a block number of zero; the block number increases by 1 for each subsequent block)
- gasLimit: the current gas limit per block
- gasUsed: the sum of the total gas used by transactions in this block
- timestamp: the Unix timestamp of this block’s inception
- extraData: extra data related to this block
- mixHash: a hash that, when combined with the nonce, proves that this block has carried out enough computation
- nonce: a number that, when combined with the mixHash, proves that this block has carried out enough computation
Before delving further into the other bits of data contained in the blocks, we’ll quickly digress to the concept of a light node, as it relates directly to the previous discussion about the block header.
Different types of nodes
The ability to store information efficiently in Merkle tries is useful in Ethereum for what we call “light clients” or “light nodes.” A node is a device/programme that communicates with the Ethereum network. Nodes are also referred to as clients. Software programmes that can act as an Ethereum node include Parity and Go-ethereum (Geth).
In general, there are two types of nodes: full nodes and light(weight) nodes. Full nodes verify/validate the mined blocks that are broadcast onto the network. That is, they ensure that the transactions contained in the blocks (and the blocks themselves) are valid and follow the rules defined in the Ethereum specifications. For example, the total gas limit of transactions included in a block cannot exceed the block gas limit (more on this later). Another example is when A attempts to send 100 Ether to B, but A has 0 Ether. If a block includes this transaction, the full nodes will detect this and reject that block as invalid. The execution of smart contracts is also an example of a transaction. Whenever a smart contract is used in a transaction (e.g., sending ERC-20 tokens), all full nodes will have to run all the instructions to ensure that they arrive at the correct, agreed-upon next state of the blockchain.
A full node synchronizes the blockchain by downloading the full chain, from the genesis block to the current head block, and executing all of the transactions contained within. Typically, miners run full nodes, because the mining process requires them to validate transactions against the current state. Full nodes that preserve the entire history of state are known as full archive nodes. Other full nodes may opt to discard old state data, because only the current state is needed to validate new transactions: if B wants to send 100 Ether to C, it doesn’t matter how the Ether was obtained, only that B’s account contains 100 Ether.
Light nodes, however, do not verify every block or transaction and may not have a copy of the current blockchain state. Unless a node needs to execute every transaction or easily query historical data, there is no need to store the entire chain and historical data. Light nodes rely on full nodes to provide them with missing data if it is required. Light nodes can get up and running much more quickly, can run on lower-spec machines, and don’t require nearly as much storage as a full node. Instead of downloading and storing the full chain and executing all of the transactions, light nodes only download the chain of headers, from the genesis block to the current block, without executing any transactions or retrieving any associated state data. Because light nodes have access to block headers, which contain hashes of three tries, they can still easily generate and receive verifiable answers about transactions, events, balances, etc.
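The header-only approach can be sketched as a check that each downloaded header links to its parent. This is a heavily simplified illustration: the `Header` class keeps only five of the header fields, SHA-256 over concatenated fields stands in for Ethereum’s actual header hashing (Keccak-256 over the RLP-encoded header), and proof-of-work validation is omitted entirely.

```python
import hashlib
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Header:
    parent_hash: bytes
    number: int
    state_root: bytes
    transactions_root: bytes
    receipts_root: bytes


def header_hash(header: Header) -> bytes:
    """Stand-in hash (assumption): not Ethereum's real header encoding."""
    payload = (header.parent_hash + header.number.to_bytes(8, "big")
               + header.state_root + header.transactions_root
               + header.receipts_root)
    return hashlib.sha256(payload).digest()


def verify_header_chain(headers) -> bool:
    """Check that every header names its predecessor's hash and number + 1."""
    for parent, child in zip(headers, headers[1:]):
        if child.parent_hash != header_hash(parent):
            return False
        if child.number != parent.number + 1:
            return False
    return True


def build_chain(length):
    """Build a toy chain with made-up root hashes."""
    headers, parent_hash = [], bytes(32)
    for number in range(length):
        header = Header(parent_hash, number, bytes([number]) * 32,
                        bytes(32), bytes(32))
        headers.append(header)
        parent_hash = header_hash(header)
    return headers


chain = build_chain(4)
tampered = list(chain)
tampered[1] = replace(chain[1], state_root=b"\xff" * 32)

print(verify_header_chain(chain), verify_header_chain(tampered))
```

Altering the state root in one header changes that header’s hash, so the next header’s parentHash no longer matches and the chain fails verification.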
Accounts
Generally, when we think of accounts, we think of an account opened by some user to participate and transact on the network. In Ethereum, accounts are more general than that. There are two types of accounts:
- Externally owned accounts – These accounts have an account balance, can send/receive Ether, are controlled by private keys and have no code associated with them.
- Contract accounts – These accounts have an account balance, contain code, and can execute that code when triggered by transactions or messages (calls).
All actions in Ethereum are set in motion by transactions fired from accounts. For example, every time a contract account receives a transaction, its code is executed as instructed by the input parameters sent as part of the transaction. In Ethereum, a transaction refers to a digitally signed data package that stores messages to be sent to an account.
An externally owned account can send messages to other externally owned accounts as well as to other contract accounts by creating and signing a transaction using its private key. A message between two externally owned accounts is simply a value transfer. But a message from an externally owned account to a contract account activates the contract account’s code, allowing it to perform various actions (e.g. transfer tokens, write to internal storage, mint new tokens, perform some calculation, create new contracts, etc.). Unlike externally owned accounts, contract accounts can’t initiate new transactions on their own. Instead, contract accounts can only fire transactions in response to other transactions they have received (from an externally owned account or from another contract account).
Transactions and Messages
Transactions and messages are terms that are frequently used interchangeably. However, messages, unlike transactions, are produced by a contract and not an external actor. A message is produced when a contract currently executing code executes the CALL or DELEGATECALL operation code (opcode). Messages are also sometimes called “internal transactions”. Like a transaction, a message results in the recipient account running its code. Contracts can therefore have relationships with other contracts in exactly the same way that external actors can.
All transactions contain the following components, regardless of their type:
- nonce: a count of the number of transactions sent by the sender.
- gasPrice: the number of Wei that the sender is willing to pay per unit of gas (analogous to the fuel that powers vehicles) required to execute the transaction.
- gasLimit: the maximum amount of gas that the sender is willing to pay for executing this transaction. This amount is set and paid upfront, before any computation is done.
- to: the address of the recipient. In a contract-creating transaction, the contract account address does not yet exist, and so an empty value is used.
- value: the amount of Wei to be transferred from the sender to the recipient. In a contract-creating transaction, this value serves as the starting balance within the newly created contract account.
- v, r, s: used to generate the signature that identifies the sender of the transaction.
- init (only exists for contract-creating transactions): An EVM code fragment that is used to initialize the new contract account. init is run only once, and then is discarded. When init is first run, it returns the body of the account code, which is the piece of code that is permanently associated with the contract account.
- data (optional field that only exists for message calls): the input data (i.e. parameters) of the message call. For example, if a smart contract serves as a domain registration service, a call to that contract might expect input fields such as the domain and IP address.
One important thing to note is that internal transactions or messages do not contain a gas limit. This is because the gas limit is determined by the external creator of the original transaction (i.e. some externally owned account). The gas limit that the externally owned account sets must be high enough to carry out the transaction, including any sub-executions that occur as a result of that transaction, such as contract-to-contract messages. If, in the chain of transactions and messages, a particular message execution runs out of gas, then that message’s execution will revert, along with any subsequent messages triggered by the execution. However, the parent execution does not need to revert.
Transaction verification and execution
Before a transaction can be executed, it must meet an initial set of requirements. These include:
- The transaction must be a properly formatted RLP. “RLP” stands for “Recursive Length Prefix” and is a data format used to encode nested arrays of binary data. RLP is the format Ethereum uses to serialize objects.
- Valid transaction signature.
- Valid transaction nonce. Recall that the nonce of an account is the count of transactions sent from that account. To be valid, a transaction nonce must be equal to the sender account’s nonce.
- The transaction’s gas limit must be equal to or greater than the intrinsic gas used by the transaction. The intrinsic gas includes:
  - a predefined cost of 21,000 gas for executing the transaction
  - a gas fee for data sent with the transaction (4 gas for every byte of data or code that equals zero, and 68 gas for every non-zero byte of data or code)
  - if the transaction is a contract-creating transaction, an additional 32,000 gas
- The sender’s account balance must have enough Ether to cover the “upfront” gas costs that the sender must pay. The calculation for the upfront gas cost is simple: First, the transaction’s gas limit is multiplied by the transaction’s gas price to determine the maximum gas cost. Then, this maximum cost is added to the total value being transferred from the sender to the recipient.
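The nonce, intrinsic gas and upfront cost checks above reduce to simple arithmetic, sketched below. The gas figures are the ones quoted in the list; the constant names, function names and the dictionary layout of `tx` are illustrative assumptions, and the RLP and signature checks are not modelled.

```python
GAS_BASE = 21_000            # predefined cost of executing any transaction
GAS_PER_ZERO_BYTE = 4        # per zero byte of data or code
GAS_PER_NONZERO_BYTE = 68    # per non-zero byte of data or code
GAS_CONTRACT_CREATION = 32_000


def intrinsic_gas(data: bytes, is_contract_creation: bool) -> int:
    zero_bytes = data.count(0)
    nonzero_bytes = len(data) - zero_bytes
    gas = (GAS_BASE + zero_bytes * GAS_PER_ZERO_BYTE
           + nonzero_bytes * GAS_PER_NONZERO_BYTE)
    if is_contract_creation:
        gas += GAS_CONTRACT_CREATION
    return gas


def upfront_cost(gas_limit: int, gas_price: int, value: int) -> int:
    """Maximum gas cost plus the value transferred, all in Wei."""
    return gas_limit * gas_price + value


def passes_basic_checks(tx: dict, sender_nonce: int, sender_balance: int) -> bool:
    """Nonce, intrinsic gas and balance checks only."""
    creation = tx["to"] is None
    return (tx["nonce"] == sender_nonce
            and tx["gas_limit"] >= intrinsic_gas(tx["data"], creation)
            and sender_balance >= upfront_cost(tx["gas_limit"], tx["gas_price"],
                                               tx["value"]))


tx = {"nonce": 7, "to": "0xrecipient", "data": b"\x00\x01",
      "gas_limit": 30_000, "gas_price": 20 * 10**9, "value": 10**18}

print(intrinsic_gas(tx["data"], False))
print(passes_basic_checks(tx, sender_nonce=7, sender_balance=2 * 10**18))
```

A transaction carrying one zero byte and one non-zero byte therefore needs at least 21,000 + 4 + 68 = 21,072 gas before any execution takes place.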
If the transaction meets all of the above requirements for validity, it can then be processed. Once all the steps required by the transaction have been processed, and assuming there is no invalid state, the state is finalized by determining the amount of unused gas to be refunded to the sender. Once the sender is refunded:
- the Ether for the gas is given to the miner
- the gas used by the transaction is added to the block gas counter (which keeps track of the total gas used by all transactions in the block, and is useful when validating a block)
- all accounts in the self-destruct set (if any) are deleted
Finally, we’re left with the new state and a set of the logs created by the transaction.
One very important concept in Ethereum is the concept of fees. Every computation that occurs as a result of a transaction on the Ethereum network incurs a cost. Miners are expending energy to run computations and validate transactions, and therefore require a fee for their effort. They go through the transactions listed in the block they are verifying and run the contract code as triggered by those transactions. Each and every full node in the network does the same calculations and stores the same values. The fact that contract executions are redundantly replicated across full nodes naturally makes them expensive, hence the need for fees. This fee is paid in a denomination called “gas.” Gas is the unit used to measure the fees paid for a particular computation. Gas price is the amount of Ether you are willing to spend on every unit of gas, and is measured in “gwei”. “Wei” is the smallest unit of Ether, where 10^18 Wei represents 1 Ether. One gwei is 1,000,000,000 Wei.
With every transaction, a sender sets a gas limit (the total amount of gas the sender is willing to pay for its transaction to be executed and validated) and gas price (the amount of Ether or Wei per unit of gas). The product of gas price and gas limit represents the maximum amount of Wei that the sender is willing to pay for executing a transaction, whether this is to transfer Ether to another user, or to have a particular piece of functionality in a smart contract executed. For example, let’s say the sender sets the gas limit to 50,000 and a gas price to 20 gwei per unit of gas. This implies that the sender is willing to spend at most 50,000 × 20 gwei = 1,000,000 gwei = 1,000,000,000,000,000 Wei = 0.001 Ether to execute that transaction.
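The worked example can be checked with integer arithmetic. The helper name below is illustrative; the unit constants follow the definitions given earlier (1 gwei = 10^9 Wei, 1 Ether = 10^18 Wei).

```python
from decimal import Decimal

WEI_PER_GWEI = 10**9
WEI_PER_ETHER = 10**18


def max_fee_wei(gas_limit: int, gas_price_gwei: int) -> int:
    """Upper bound on the fee: gas limit multiplied by gas price, in Wei."""
    return gas_limit * gas_price_gwei * WEI_PER_GWEI


fee_wei = max_fee_wei(50_000, 20)
fee_ether = Decimal(fee_wei) / Decimal(WEI_PER_ETHER)

print(fee_wei, fee_ether)
```

Keeping amounts as integer Wei and converting to Ether only for display avoids floating-point rounding.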
Unfortunately, it is often not easy, and in general even impossible, to know in advance how much gas a transaction will eventually need. Therefore, transactions have a gas limit field to specify the maximum amount of gas the sender is willing to buy. If the gas used exceeds this limit during execution, processing is stopped. The sender still has to pay for the performed computation, but they are protected from running completely out of funds. The transaction gas limit also protects full nodes from attackers, who could, without a gas limit, make them execute effectively infinite loops. If such a transaction would take longer than one block to process, it could never be included in a block, and, thus, the attacker wouldn’t need to pay for it.
Each operation in the EVM is assigned a number of gas units that it consumes. The gasUsed value is the sum of all the gas amounts for all the operations executed. For estimating gasUsed, there is an ‘estimateGas API’ that can be used, but it comes with some caveats.
In the case that the sender does not specify a sufficient amount of gas (gas limit, sometimes called startGas) to execute the transaction, the transaction runs “out of gas” and is considered invalid. In this case, the transaction processing aborts and any state changes that occurred are reversed, such that we end up back at the state of Ethereum prior to the transaction. In addition, a record of the failure is kept, showing what transaction was attempted and where it failed. And since the machine(s) already expended effort to run the calculations before running out of gas, logically, none of the gas is refunded to the sender.
Miners have the choice of whether or not to include a transaction in a block and collect its fee. In reality though, all transactions are picked up by miners eventually, but the amount of transaction fees that a user chooses to send affects how long it will take until the transaction is mined. If a user was using the Mist application to execute transactions, the following screenshot would be an example of how to set the fee:
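The out-of-gas behaviour can be sketched with a toy executor. This is an assumption-laden model, not the EVM: operations are represented as hypothetical `(gas_cost, key, value)` storage writes with made-up costs, and the function simply returns the resulting storage, the fee charged, and a success flag.

```python
def execute(storage, writes, gas_limit, gas_price):
    """Apply writes to a copy of storage, metering gas as we go.

    Returns (storage_after, fee_charged, succeeded).
    """
    working = dict(storage)  # changes are made on a copy
    gas_used = 0
    for cost, key, value in writes:
        if gas_used + cost > gas_limit:
            # Out of gas: discard every change, charge for the whole gas limit.
            return storage, gas_limit * gas_price, False
        gas_used += cost
        working[key] = value
    # Success: only the gas actually used is charged, the rest is refunded.
    return working, gas_used * gas_price, True


storage = {"x": 1}
writes = [(100, "x", 2), (100, "y", 3)]  # hypothetical costs

print(execute(storage, writes, gas_limit=250, gas_price=2))
print(execute(storage, writes, gas_limit=150, gas_price=2))
```

With a limit of 250 both writes succeed and the unused gas is not charged; with a limit of 150 the second write cannot be paid for, the storage is left as it was, and the full limit is still charged.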
All the money spent on gas by the sender is sent to the “beneficiary” address, which is typically the miner’s address. Typically, the higher the gas price the sender is willing to pay, the greater the value the miner derives from the transaction and the more likely miners will be to select it. In this way miners are free to choose which transactions they want to validate or ignore. In order to guide senders on what gas price to set, miners have the option of advertising the minimum gas price for which they will execute transactions.
Not only is gas used to pay for computation steps, it is also used to pay for storage usage. The total fee for storage is proportional to the smallest multiple of 32 bytes used.
Fees for storage have some nuanced aspects. For example, since increased storage increases the size of the Ethereum state database on all nodes, there’s an incentive to keep the amount of data stored small. For this reason, if a transaction has a step that clears an entry in the storage, the fee for executing that operation is waived, AND a refund is given for freeing up storage space.
The block gas limit is the maximum amount of gas allowed in a block and will determine how many transactions can fit into a block. For example, let’s say we have 5 transactions with gas limits of 10, 20, 30, 40, and 50 respectively. If the block gas limit is 100, then the first four transactions can fit in the block. Miners decide which transactions to include in a block. A different miner could instead include the last 2 transactions in the block (50+40), and would then only have space left for the first transaction (10). If you try to include a transaction that uses more gas than the current block gas limit, it will be rejected by the network and your Ethereum client will give you the message “Transaction exceeds block gas limit”. The protocol allows the miner of a block to adjust the block gas limit by a factor of 1/1024 (about 0.0977%) in either direction.
Miners on the network decide what the block gas limit is through a voting process. Miners on Ethereum can use a mining program such as ethminer, which connects to a geth or Parity client/node. geth and Parity have command line options, e.g. ‘targetgaslimit value’, that miners are able to change. While it’s advised to not alter a chain’s gas limit once created, it may become necessary to fiddle with the gas limit of an existing private blockchain, especially during development.
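The packing example and the 1/1024 adjustment can be sketched as follows. The function names are illustrative, the selection is a simple first-fit pass in the order given, and the bounds calculation is an approximation of the rule described above that ignores the protocol’s exact boundary conditions.

```python
def select_transactions(tx_gas_limits, block_gas_limit):
    """First-fit in the given order: include a transaction if it still fits."""
    included, used = [], 0
    for gas in tx_gas_limits:
        if used + gas <= block_gas_limit:
            included.append(gas)
            used += gas
    return included


def gas_limit_bounds(parent_gas_limit):
    """Approximate range a miner may choose for the next block's gas limit."""
    delta = parent_gas_limit // 1024
    return parent_gas_limit - delta, parent_gas_limit + delta


print(select_transactions([10, 20, 30, 40, 50], 100))
print(select_transactions([50, 40, 30, 20, 10], 100))
print(gas_limit_bounds(8_000_000))
```

The two orderings reproduce the two miners in the example: one block holds the first four transactions, the other holds the two largest plus the smallest.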
Mining in Ethereum is somewhat complex, and will not be covered here. Ethereum’s proof-of-work algorithm is called “Ethash”. The principle is the same as that of Bitcoin’s proof of work. However, Ethereum chose to make its PoW algorithm (Ethash) sequentially memory-hard. This means that the algorithm is engineered so that calculating the nonce requires a lot of memory AND bandwidth. The large memory requirements make it hard for a computer to use its memory in parallel to discover multiple nonces simultaneously, and the high bandwidth requirements make it difficult for even a super-fast computer to discover multiple nonces simultaneously. This reduces the risk of centralization and creates a more level playing field for the nodes that are doing the verification.