On 24.05.2021, Polkadot nodes failed to build block 5202216 because of an out-of-memory (OOM) error. It was the runtime (i.e. the blockchain’s state transition function) that crashed, not the nodes themselves. Polkadot’s runtime is compiled to WebAssembly and can be executed by a Wasm interpreter or compiler. Either way, the execution environment always provides the runtime with a fixed amount of memory (64MB at the time), and that was not enough for this block.
This block was the final block of the era’s penultimate session. Building it required the election of a new validator set for the era that would begin after the next session. The validator set can be elected off-chain or on-chain; off-chain is preferred because the election procedure is computationally intensive. However, no validator submitted a solution during this session, probably because they hit the same OOM while computing the election off-chain. The election therefore had to be done on-chain, which produced the OOM that all validators encountered while attempting to author this block. The fix for the OOM was simple: raise the Wasm runtime’s default memory size to 128MB.
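To make the effect of a fixed memory budget concrete, here is a minimal Rust sketch. It is not Polkadot’s allocator: the `BoundedHeap` type, the `OutOfMemory` error and the 100 MiB working set are hypothetical, and the article’s "64MB" and "128MB" are assumed to mean MiB. The only outside fact used is that Wasm linear memory is measured in 64 KiB pages.

```rust
// Wasm linear memory grows in pages of 64 KiB.
const WASM_PAGE_SIZE: usize = 64 * 1024;
const MIB: usize = 1024 * 1024;

/// Hypothetical error returned when a request exceeds the remaining budget.
#[derive(Debug, PartialEq)]
struct OutOfMemory {
    requested: usize,
    available: usize,
}

/// Hypothetical heap with a hard upper bound, standing in for the
/// fixed amount of memory handed to the runtime.
struct BoundedHeap {
    limit: usize,
    used: usize,
}

impl BoundedHeap {
    fn with_limit_mib(mib: usize) -> Self {
        BoundedHeap { limit: mib * MIB, used: 0 }
    }

    /// Number of Wasm pages the limit corresponds to.
    fn pages(&self) -> usize {
        self.limit / WASM_PAGE_SIZE
    }

    /// Account for an allocation, failing once the budget is exhausted.
    fn alloc(&mut self, bytes: usize) -> Result<(), OutOfMemory> {
        let available = self.limit - self.used;
        if bytes > available {
            return Err(OutOfMemory { requested: bytes, available });
        }
        self.used += bytes;
        Ok(())
    }
}

fn main() {
    // Hypothetical working set of an on-chain election (not a measured value).
    let election_working_set = 100 * MIB;

    let mut old_limit = BoundedHeap::with_limit_mib(64);
    let mut new_limit = BoundedHeap::with_limit_mib(128);

    println!("64 MiB = {} pages, 128 MiB = {} pages", old_limit.pages(), new_limit.pages());
    println!("old limit: {:?}", old_limit.alloc(election_working_set));
    println!("new limit: {:?}", new_limit.alloc(election_working_set));
}
```

Under these assumptions the same request fails against the smaller budget and succeeds against the larger one, which is the whole effect of raising the default.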
Releasing a New Update
Applying this change to all validators would require a new release and the upgrade of a large number of validators. In the short term, however, there was a far easier and, most importantly, faster answer. Polkadot’s runtime is compiled not only to Wasm but also to native code for better performance and, crucially, the native runtime does not impose any memory limit during execution. However, the native runtime is only used when the executing node is from the same version as the on-chain runtime. At that moment, the on-chain runtime matched the v0.8.30 release, which had been released on April 8, 2021, and three newer releases had been published since then.
To get around the problem as quickly as possible, all validator operators were instructed to downgrade their validators to v0.8.30 and run them with the ‘--execution native’ parameter to force the use of the native runtime. Overall, it took around 1 hour and 10 minutes to identify the problem, devise a short-term solution, notify validators, and have new blocks produced, allowing the network to fully recover.
The Possibility That the Compiler Generated Defective Code
A storage root mismatch means that importing a block does not lead to the same storage root that the block author stated. In principle, the same input should always produce the same output in a blockchain. However, because all validators had been told to run with the native runtime, the network was still operating and producing blocks, which could only mean that there was non-determinism between the native and Wasm runtimes.
This led to the hypothesis that the Rust compiler may have generated defective code, resulting in the observed mismatch. By chance, someone at Parity still had a binary of this release lying around that was not the same as the one attached to the GitHub release. This binary successfully synchronized the chain with the native runtime. The only difference between it and the released binary was the version of the Rust compiler used to build it, so something had probably changed between the most recent compiler version and the one used to build the node back then. Indeed, after downgrading the Rust compiler and rebuilding the release branch, the node was able to sync successfully.
Synchronizing with Wasm
Building the Rust compiler without the suspect patch and then compiling the node with that self-built compiler confirmed the hypothesis: the native runtime produced the right data and the team was able to sync the chain. The commit in question had altered the binary_search_by function so that, when there are several matches, it may return a different index. Using this function in the runtime can therefore lead to a slight reordering of the data saved in the state, which in turn results in a different storage root.
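The mechanism can be illustrated with a small, self-contained Rust sketch. It deliberately does not call the standard library’s binary search, whose choice among equal elements is unspecified. Instead it defines two hypothetical search variants that are both valid binary searches but pick different matches, and a toy hash (`toy_root`) standing in for the storage root. All names and data are made up for illustration.

```rust
use std::cmp::Ordering;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

type Item = (u32, char); // (sort key, payload)

/// Variant A: return as soon as any element with an equal key is hit.
fn search_stop_early(items: &[Item], target: u32) -> Result<usize, usize> {
    let (mut lo, mut hi) = (0usize, items.len());
    while lo < hi {
        let mid = lo + (hi - lo) / 2;
        match items[mid].0.cmp(&target) {
            Ordering::Less => lo = mid + 1,
            Ordering::Greater => hi = mid,
            Ordering::Equal => return Ok(mid),
        }
    }
    Err(lo)
}

/// Variant B: keep narrowing the range and end on the last equal key.
fn search_keep_narrowing(items: &[Item], target: u32) -> Result<usize, usize> {
    let (mut lo, mut hi) = (0usize, items.len());
    while lo < hi {
        let mid = lo + (hi - lo) / 2;
        if items[mid].0 <= target {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    // `lo` is now the first index whose key is greater than `target`.
    if lo > 0 && items[lo - 1].0 == target {
        Ok(lo - 1)
    } else {
        Err(lo)
    }
}

/// Insert `new` at whatever index the search reported (match or insertion point).
fn insert_at(mut items: Vec<Item>, new: Item, found: Result<usize, usize>) -> Vec<Item> {
    let index = match found {
        Ok(i) | Err(i) => i,
    };
    items.insert(index, new);
    items
}

/// Toy stand-in for a storage root: a hash over the ordered contents.
fn toy_root(items: &[Item]) -> u64 {
    let mut hasher = DefaultHasher::new();
    items.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    let state = vec![(1, 'a'), (2, 'b'), (2, 'c'), (2, 'd'), (3, 'e')];

    let a = insert_at(state.clone(), (2, 'x'), search_stop_early(&state, 2));
    let b = insert_at(state.clone(), (2, 'x'), search_keep_narrowing(&state, 2));

    println!("variant A: {:?} -> root {:x}", a, toy_root(&a));
    println!("variant B: {:?} -> root {:x}", b, toy_root(&b));
}
```

Both results are still sorted by key, so neither variant is wrong on its own terms. The payloads end up in a different order, though, and any hash computed over the ordered data changes with them.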
As a result, blocks built with the native runtime could not be synchronized with the Wasm runtime. Simply updating the on-chain Wasm runtime does not fix this, because changing the history of a blockchain without forking is not possible. The answer was a pull request that adds ‘code substitute’ to the chain specification, which mostly holds the chain’s genesis and other data. The new field ‘code substitute’ is a map that uses a block hash as the key and maps to a Wasm runtime code blob. It tells the node to replace the on-chain Wasm runtime with the given one for every block following the one specified in the chain specification, until the spec version of the runtime no longer matches.
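The selection rule described above can be sketched in Rust as follows. This is an illustration under assumptions, not the node’s actual implementation: the type names, the `resolve` and `runtime_code_for` functions, the `lookup` closure standing in for a database query, and all hashes, block numbers and spec versions are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical chain-spec fragment: block hash (as hex text) -> Wasm code blob.
type CodeSubstitutes = HashMap<String, Vec<u8>>;

/// A substitute whose block hash has been resolved against the local chain.
struct ResolvedSubstitute {
    start_number: u64, // number of the block named in the chain spec
    spec_version: u32, // spec version of the on-chain runtime at that block
    code: Vec<u8>,     // replacement Wasm blob
}

/// Resolve every configured hash to (block number, spec version).
/// `lookup` is a hypothetical stand-in for querying the node's database.
fn resolve(
    spec: &CodeSubstitutes,
    lookup: impl Fn(&str) -> Option<(u64, u32)>,
) -> Vec<ResolvedSubstitute> {
    spec.iter()
        .filter_map(|(hash, code)| {
            lookup(hash.as_str()).map(|(start_number, spec_version)| ResolvedSubstitute {
                start_number,
                spec_version,
                code: code.clone(),
            })
        })
        .collect()
}

/// Pick the code to execute a block with: the substitute applies to every
/// block after `start_number` while the on-chain spec version still matches.
fn runtime_code_for<'a>(
    block_number: u64,
    on_chain_spec_version: u32,
    on_chain_code: &'a [u8],
    substitutes: &'a [ResolvedSubstitute],
) -> &'a [u8] {
    substitutes
        .iter()
        .find(|s| block_number > s.start_number && on_chain_spec_version == s.spec_version)
        .map(|s| s.code.as_slice())
        .unwrap_or(on_chain_code)
}

fn main() {
    let mut spec = CodeSubstitutes::new();
    spec.insert("0xabc".to_string(), b"fixed-wasm".to_vec());

    // Pretend block 0xabc is number 100 and ran spec version 30.
    let subs = resolve(&spec, |hash| if hash == "0xabc" { Some((100, 30)) } else { None });

    let on_chain: &[u8] = b"on-chain-wasm";
    for (number, spec_version) in [(100u64, 30u32), (101, 30), (500, 31)] {
        let code = runtime_code_for(number, spec_version, on_chain, &subs);
        println!("block {} (spec {}): {}", number, spec_version, String::from_utf8_lossy(code));
    }
}
```

In this sketch the block named in the specification itself still runs the on-chain code, the following blocks run the substitute, and a runtime upgrade that bumps the spec version ends the substitution without further configuration.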
Future Improvements to the Current Situation:
- Deprecating the native runtime will now be pursued much more aggressively. With the Wasm compiler Wasmtime, roughly the same performance as the native runtime is achievable, so the native optimization is no longer needed, especially considering all of its potential drawbacks.
- The allocator will be modified to allow considerably more flexible memory allocation. It will no longer cap the maximum allocation at 128MB and will most likely support Wasm’s maximum (4GB).
- On-chain elections will no longer be possible; elections must be computed off-chain and their solutions submitted to the runtime.
- Until the allocator is upgraded, the off-chain worker will be given a higher memory limit than the on-chain Wasm runtime execution. This should ensure that off-chain elections do not run out of memory and can be submitted properly.
- For as long as there is both a native and a Wasm runtime, the build will ensure that the native and Wasm builds use the same compiler version. This should prevent issues arising from the use of different toolchain versions.