Date

May 6, 2022

Author

Joanne Fuller

Formal Verification of Ethereum 2.0 Part 1: Fixing the Array-Out-of-Bound Runtime Error

Formal Verification of Ethereum 2.0 Part 1 explores how to locate, understand, and fix a subtle array-out-of-bound runtime error within the state transition component.

Reading time

9 min. read

Image: Formal Verification of Ethereum 2.0 Part 1: Fixing the Array-Out-of-Bound Runtime Error

In part one of this series dedicated to Formal Verification of Ethereum 2.0, Joanne Fuller from the Consensys R&D illustrates how to locate, understand, and fix a subtle array-out-of-bound runtime error within the state transition component.

The Phase 0 Beacon Chain reference implementation describes the state machine that every Beacon node has to implement in the decentralised Ethereum 2.0 ecosystem. The reference implementation comprises three main sections:

1. State Transition,
2. Simple Serialize (SSZ) library, and
3. The Merkleise library.

The state transition is the most critical component and describes how the state is to be updated upon processing of a new block.

Below is research that supports how to locate, understand, and fix a subtle array-out-of-bound runtime error within the state transition component. This error is dependent on the number of active validators as summarised below and can therefore be fixed by imposing an upper bound on the number of active validators.

Number of active validators (V)	No array-out-of-bound error?

Table 1: Summary of results.

Detecting this error would be difficult using any other strategy, e.g. code peer review or testing. In contrast, using formal verification, we can detect and fix the error, we are able to mathematically and mechanically prove the correctness of the fix which goes beyond what testing techniques can do. It is a technical blog (written in a tutorial style) with the aim to allow readers, even those not completely familiar with formal verification, to be able to understand how these techniques are useful in the blockchain context.

The Project

The project to formally verify Phase 0 of the Ethereum 2.0 Beacon Chain reference implementation using the verification-friendly language Dafny was complete earlier this year.

The primary goal was to formally verify the absence of runtime errors (under/overflows, array-out-of-bounds, division by zero) and our secondary goal (not covered in this blog) to prove functional correctness.

For a more detailed description of the whole project, you can refer to our paper Formal Verification of the Ethereum 2.0 Beacon Chain. To see the full project code, please refer to our GitHub repository: https://github.com/Consensys/eth2.0-dafny.

An Overview of the State Transition

Let’s begin with a simplified view of the Beacon Chain system (Figure 1). Time is logically divided into epochs and each epoch is then divided into a fixed number of slots. Blocks are associated with a particular slot; however a slot may be empty and therefore no block will be processed. As shown in Figure 1, the arrival of a block causes the state to change, as does the passing of an epoch boundary (even if no further blocks have arrived). This view of the system is a simplified one; please note that the BeaconState is actually updated at every slot; but in the case of empty slots that aren’t at an epoch boundary, this update is simply an increment to the slot field.

Now let’s consider the state_transition; it governs how the BeaconState is updated. The state_transition is a complex procedure but at the highest level it comprises two distinct steps; process_slots and then process_block(Figure 2).

The state_transition procedure is triggered when a new block arrives for a given slot. The first step (process_slots) progresses the BeaconState through any empty slots such that the state can ‘catch up’ from the last recorded block/slot to the current one. The BeaconState is updated as each empty slot is processed (i.e. the slot field is incremented) and if the empty slot is an epoch boundary, additional processing occurs to implement, for example, justification and finalisation, the rewards and penalties, slashing, etc.

Once the state slot is equal to the slot of the block that has just arrived, the second step (process_block) can begin. Within the block processing section further complexity arises. In particular, the process_operations section contains many components such as processing the proposer and attester slashings, attestations, any new deposits and any voluntary exits; each component affecting its own subset of the BeaconState fields.

Using Formal Verification to find an Array-out-of-bound Runtime Error

We are now ready to look at the details of an array-out-of-bound runtime error discovered during our project. This error serves as an important example of the benefits of formal verification and in particular, demonstrates its power when dealing with complex calculations that span multiple levels of function calls and include less familiar operations such as floor division.

Where does the error occur?

The function get_attesting_indices is called within the process_rewards_and_penalties, process_justification_and_finalisation, and process_attestion components of the state_transition (Figure 3). In the final line of get_attesting_indices (Figure 4) the set of validator indices to be returned is determined by enumerating over committee such that committee[i] is included if bits[i] is set to 1.

The bits parameter is a Bitlist with a maximum length of MAX_VALIDATORS_PER_COMMITTEE (i.e. 2048). Hence, if

|committee| > MAX_VALIDATORS_PER_COMMITTEE

then an array-out-of-bound runtime error will occur.

Can this occur?

This is where we see the power of using a tool like Dafny. When this function was coded in Dafny the verifier automatically checked for runtime errors and in this case the verifier identified a potential array-out-of-bound error. (Note, I use the word 'potential' as there are instances where the verifier may not have enough information to determine correctness, or may 'time out' while trying to verify.) So we had to trace the problem and further understand the calculation of committee. The aim was to determine whether additional information (such as pre and post conditions) would facilitate verification, or whether a bug existed and if so, specifically under what circumstances would it be triggered.

Tracing the calculation

The committee variable is set by calling get_beacon_committee, which in turn calls compute_committee as shown in the Python code below (Figures 5 and 6).

Starting with the return statement in compute_committee we see a sequence with length end-start is returned, hence:

|compute_committee(indices, seed, index, count)|
= end-start
= (len(indices) * (index+1)) // count - (len(indices) * index) // count

When compute_committee is called from get_beacon_committee we see that:

indices = get_active_validator_indices
seed = get_seed
index = (slot % SLOTS_PER_EPOCH) * committees_per_slot + index
- Note that the index parameter in get_beacon_committee is different from the index parameter in compute_committee.
count = committees_per_slot * SLOTS_PER_EPOCH

Making these substitutions:

end = (len(get_active_validator_indices) * (((slot % SLOTS_PER_EPOCH) * committees_per_slot + index)+1)) // (committees_per_slot * SLOTS_PER_EPOCH)
start = (len(get_active_validator_indices) * ((slot % SLOTS_PER_EPOCH) * committees_per_slot + index)) // (committees_per_slot * SLOTS_PER_EPOCH)

As committees_per_slot is determined by a call to get_committees_per_slot and is a function of the number of active validators:

committees_per_slot = max(1, min(MAX_COMMITTEES_PER_SLOT, len(get_active_validator_indices) // SLOTS_PER_EPOCH // TARGET_COMMITTEE_SIZE))
- It will be a value between 1 and 64 inclusive.

Substituting this formula for committees_per_slot into end and start we conclude that end-start is a function of:

len(get_active_validator_indices),
slot % SLOTS_PER_EPOCH where 0 <= slot % SLOTS_PER_EPOCH < 32, and
index where 0 <= index < committees_per_slot (i.e. the committee index parameter of get_beacon_committee).

Putting all of this together, when get_beacon_committee(state, data.slot, data.index) is called from get_attesting_indices for a state of a particular number of active validators, it can be done so for multiple (data.slot,data.index) combinations. Also:

as committees_per_slot depends on the number of active validators, the range of index also depends on the number of active validators; and
once the number of active validators is sufficiently large, there will be a maximum of 32 × 64 = 2048 of these (slot % SLOTS_PER_EPOCH,index) combinations to consider for a state with a particular number of active validators.

To determine which combinations of (len(get_active_validator_indices), slot % SLOTS_PER_EPOCH, index) cause the committee length to exceed MAX_VALIDATORS_PER_COMMITTEE we need to analyse the above formulas. The use of floor division makes this analysis more than simple algebra but we can use visualisations to help understand what is happening.

The relationship between number of active validators and committees_per_slot is shown in Figure 7. Notice that once there are 2¹⁸ = 262,144 or more active validators, committees_per_slot will reach its upper bound of 64.

The value of committees_per_slot then determines the number of (slot % SLOTS_PER_EPOCH,index) combinations to consider for a state with a particular number of active validators.

In Figure 8 we plot the number of (slot % SLOTS_PER_EPOCH,index) combinations, as well as how many of these input combinations will produce |committee| = end - start > MAX_VALIDATORS_PER_COMMITTEE. We observe that the number of (slot % SLOTS_PER_EPOCH,index) combinations reaches its maximum of 32 × 64 = 2048 at 2¹⁸=262,144 active validators.

We can also observe that there are zero (slot % SLOTS_PER_EPOCH,index) combinations where |committee| > MAX_VALIDATORS_PER_COMMITTEE until the number of active validators exceeds 2²² = 4,194,304; as shown in the section of graph reproduced in Figure 9.

Hence, once the number of active validators exceeds 2²² = 4,194,304 there are (slot % SLOTS_PER_EPOCH,index) combinations that cause |committee| > MAX_VALIDATORS_PER_COMMITTEE. Further, once the number of active validators reaches 2²² + 2¹¹ = 4,196,352 we can observe that all (slot % SLOTS_PER_EPOCH,index) combinations will result in the error.

Example: 4,194,355 active validators

When the number of active validators is 4,194,355, multiple (slot % SLOTS_PER_EPOCH,index) combinations result in the error. Figure 10 shows the particular combinations that result in the error (i.e. the red blocks).

Conclusion

This previously undocumented bug was difficult to detect. It required many hours of effort to model the dynamics of the problem; the analysis was quite complex due to the multiple interrelated parameter calculations, and the use of floored integer division. The ability of Dafny to initially detect the problem is significant and provides clear justification for the use of formal verification techniques in the blockchain context. The full analysis of this bug has been reported as an issue to the reference implementation github repository. The issue was confirmed by the reference implementation writers.