Ethereum Yellow Paper Mathematics Deciphered | Part 3: Block

October 8, 2022

This part of the series dives into section 4.3 of the Yellow Paper, that elaborates the structure of blocks.

If you haven't yet, go through Part 0 of this series to get conventions right.

Block

A block, in Ethereum comprises three pieces of relevant information:

  • Block Header,
  • List of transactions, included in the block
  • A deprecated property, which prior to Paris fork, represented headers of blocks whose parents were present block's parent's parent (a.k.a ommers)
  • A collection of validator withdrawals pushed by consensus layer

Hence, a block can be represented as:

The header contains several pieces of information:

  • (): Hash of parent block's header.
  • (): Hash that is now set to constant, since deprecation.
  • (): Beneficiary address priority fee from all transactions in this block.
  • (): Hash of root of state trie after all transaction in block are finalized.
  • (): Hash of root of trie containing all transactions in this block.
  • (): Hash of root of trie containing all receipts corresponding to each transaction in this block.
  • (): Bloom filter created from indexable information in logs of transaction receipts of transactions in this block. (See bloom filter section).
  • (): A scalar value representing difficulty of this block.
  • (): Number of all ancestor blocks of this block.
  • (): Current limit of gas expenditure per block.
  • (): The total gas used by all transactions in this block.
  • (): Unix time when this block was created.
  • (): An arbitrary byte array containing data relevant to this block. This must be 32 bytes or fewer.
  • (): The latest RANDAO mix (a pseudorandom value) after finalization of previous block. RANDAO is a mechanism for generating randomness for the network. See here for detail.
  • (): A 64-bit value which now deprecated and set to constant 0 due to replacement of proof-of-work to proof-of-stake.
  • (): Amount of wei burned per unit of gas consumed in this block.
  • (): Hash of root of trie populated with withdrawals by validators.

Transaction Receipts

The transaction receipts correspond to each transaction in a block and contains information related to execution of the transaction. Given a block having a list of receipts , the th receipt is indexed/denoted by . A single receipt is tuple of five items:

  • - Type of transaction.
  • - Status code of transaction.
  • - Cumulative gas used in the block immediately after this transaction happened.
  • - Set of logs created (aka events emitted) through execution of this transaction.
  • - The bloom-filter composed of the logs.

Hence it can be denoted as:

And, like other entities, corresponding collapse function for a receipt, is defined as:

The values are restricted/belong to their respective sets:

Notice that is list/series of log entries - where each log entry is tuple comprising:

  • - Logger's address (contract that emitted event log)
  • - Series of log 32-byte topics, . You might know these as indexed parameters in an event. This could possibly be empty.
  • - Some number of bytes of data. The un-indexed parameters of an event go into these.

This is saying same as:

where is 20-bytes address. Each entry in list is 32-byte indexed event item. And is arbitrary length byte data representing non-indexed events' data. In mathematical words it is same as conveying:

Bloom Filter

The bloom filter is derived from a Bloom filter function, . A bloom filter is a probabilistic data structure that allows to ascertain whether an element in a set efficiently & rapidly.

I highly recommend you to read up more about Bloom filters here before proceeding this section.

operates on logger's address and all each of the indexed log entries, and produces a bloom filter which is 256 byte (2048 bits) hash.

Note: Only event logs marked "indexed" (in Solidity) are included in bloom filter. This is why you can only filter events data by these "indexed" event parameters.

The paper formally defines the Bloom filter function, as,

Let's decode it.

is a specialized bloom filter that takes an arbitrary byte sequence and outputs a 2048 bits (256-bytes) filter. This has 3 bits set out of 2048 bits.

The symbol above represents the bitwise operation over a series of byte sequences i.e. multiple 2048-bits sequences. In this context, the equivalence can be written as:

where is the bitwise operation.

Now onto the working of function.

takes an arbitrary length byte sequence, , and outputs a 2048-bits or 256-bytes length sequence , i.e.

This 2048 bits length output has all bits set to 0, except 3 bits.

Which of the 3 bits to set is defined by function's algorithm:

  • First it keccak-256 hashes the byte sequence i.e. .
  • Then takes the first three pairs of bytes (byte pairs at indices ) from the hash, .
  • And then from each pair of bytes at indices (), it takes the low-order 11 bits out of 16 bits (each byte pair is 16 bits).
  • Each of these 11-bit value (corresponding to each of 3 byte pairs) is subtracted from 2047 (highest 11-bit value in decimal) to output a set of 3 indices ranging from 0 to 2047.
  • These are the obtained indices at which bit is set to 1 in byte sequence from .

Mathematically, the paper defines a helper function which extracts the required 11-bits:

As described earlier, it extracts 11 bits from two bytes ( & ) of hash of . The sequence formed by bytes at and are 16-bits and by taking its with 2048 reduces it to 11 bits. This is because 2048 is base for 11 bit i.e. (11 bits). Hence applying on any value yields output between 0 and 2048 i.e. max 11 bits.

Paper also defined a rather simple , bit-reference function which simply references bit at index in a given byte sequence.

is utilized to set bits at calculated indices to 1 in as:

And this is how bloom-filters are calculated for logs.

Block Validity

A block's validity is satisfied if and only if certain conditions are met:

  • Block's ommers field must be empty (post Paris fork).
  • The new root of state trie, must be equal to root of the trie derived from new state after transactions of this block, are finalized at state, .

where is state collapse function.

Recall from that if is current world state then defined the next state.

  • The must be the root of the trie which is derived from pairwise RLP-encoding withdrawals:

where

  • The block header , must be equal to root of trie which is derived from RLP-encoding of all transactions, in the block. The transaction, at index , (= ) is RLP-encoded based on transaction its type, .

where function RLP-encodes a transaction, depending on type:

Note that index ( in above) of transaction in block is also encoded.

  • Likewise the , is derived from RLP-encoding of receipts which depends of receipt transaction type, :

where function RLP-encodes a transaction, depending on type:

  • must bitwise of all bloom filters, derived from each receipt.

Combining all of the above, a block's validity is constrained by:

Futhermore, paper defines the equation,

If is current block then its parent block is . Then, this parent's header's state trie root component is denoted as . The equation above says that is equal to the state trie root at world state which is when the block, was finalized.

Serialization

The paper lays out functions and for serialization of block header and block respectively:

where takes a special care of EIP-2718 transactions:

with , and are corresponding element-wise serialization functions for transactions, ommers and withdrawals in the block respectively.

The header component values belong to respective sets:

where, as already defined in convention, is a bytes sequence and exactly equal to bytes in size. Saying same as:

Block Header Validity

The parent block of block, is denoted as () which is formally defined as:

i.e. the parent of is such that , of of 's header is equal to keccak hash of (RLP encoded) header of . That is basically definition of holding true.

The block number, is simply parent's block number, incremented by 1:

The base fee per gas field of block header was introduced in London release. is the amount of wei burned per unit of gas consumed in the block.

The expected value of base fee per gas is defined by function :

Remember is the block number at London fork.

is gas target, a value obtained by dividing gas limit by a global constant a.k.a the elastic multiplier. is set to 2.

is the value by which the fee is adjusted.

You may know that gas fees of the Ethereum network adjusts in response to the conjestion of transactions - more transactions being broadcasted means more fees. More transactions also mean more gas being consumed in a block i.e. goes up. Therefore, increase in leads to increase in fee . Similarly, decrease in makes also go down. The equivalence above defines exactly this mechanism. How? Let's see.

is defined as:

This basically conveys that the adjustments to the fee should occur (increase/decrease) depending on whether gas consumed in previous (parent) block hit above or below the half (=2) of its gas limit. Needless to say, this "half of previous block's gas limit" is just target .

The first case, of simply resets the base fee per gas to at the block of London fork. So that future adjustments vary around this value.

The second case, of sets current block's base gas fee same as that of parent block, since target hit perfectly.

The next two cases ( and ) of defines the most cases where adjustment mechanism kicks in. For clarity, we only focus on the case where target was not even reached i.e when for now.

When gas consumed in parent block was less than the target , the fee must go down by some magnitude to incentivize the market to perform more transactions. The magnitude of the decrease by which fee must decrease from parent block's fee is denoted by . This can also be seen in the third case of where is subtracted from previous block's fee.

is calculated based of the difference in the gas target and gas consumed in parent block i.e. .

Precisely according to paper is proportional to a value as:

Since Ethereum is not a dead chain, in almost all cases will be greater than zero as long as at least one transaction is processed in the block. In that case, it implies that . It also implies that the in above equation. That means will be less than gas fee of parent block since multiplied a value less than 1 will only decrease it.

But technically, can assume 0 value if there were no transactions in parent block (hence no gas consumed). In that case, will simply be equal to .

In set notation, above is same as saying:

Let's assume instead of being proportional, was set equal to . In that case, as approached very close to , will become approximately equal to . If that was the case the base gas fee as defined in will become 0. But that would be devastating to the network!

To avoid this situation the effect of gas consumed on mechanism is dampened. This is done by introducing a global constant a.k.a base fee max change denominator. Currently is set to 8:

Instead of setting equal to , we divide the by :

Now, even if becomes equal to , will be 1/8th of at max. Thus, the current block's fee can only be changed 1/8th or 12.5% of parent block's fee.

That explains the base fee adjustment mechanism. The logic for the last case can be followed the same as the third one to reach similar conclusion.

and is laid out in paper as:

A minor difference is for case for value of as in that case it may assume value 0 if we inspect for that case. For that case, we simply assert using that minimum value of can be 1.

The canonical gas limit of block is constrained to be within certain tight bound:

Notice that is not exactly parent block gas limit but instead a slightly adjusted value of actual parent block gas limit . This value is adjusted from the London fork i.e. at block and by - the elastic multiplier. Precisely:

is multiplied at London fork because that is also where base fee per gas was redefined with above. Notice that at that point the target for gas consumption in block starts to be calculated as parent block gas limit divided by same (see ). From this information we can infer that is updated by multiplying by in to avoid discontinuity in gas usage graph.

The block timestamp is constrained to be strictly greater than parent block timestamp:

From Paris fork (change in consensus mechanism from proof of work to proof of stake) deprecated multiple block header properties:

  • , has been replanced by ,
  • is set to a constant:
  • is set to a constant:
  • is set to a constant:

Finally, we can define validity of a block header, :

  • Gas used must be less than or equal to the gas limit:
  • must be constrained by
  • must be constrained by
  • Block number must be equal to parent block number incremented by 1:
  • The field must be 32 bytes at most in size:
  • The block's base gas fee must be determined as defined in :
  • Deprecate fields as defined in , and
  • must be determined Beacon Chain (consensus):

Combining all constraints defines formally in paper as:

And that wraps up the blocks! Up next we see Gas and Payment.