Nethermind’s full pruning is here — Cutting the Gordian Knot

By Ismael Darwish
Published in Nethermind.eth · Mar 11, 2022

Special thanks to Łukasz Rozmej for leading the project and providing feedback on this article.

Table of contents:

  • Introduction
  • Pruning — Why was it so difficult in the first place?
    - Key-value database
    - Modified Merkle-Patricia trie
    - State change
  • Proposed solutions
    - Reference counting
    - In-memory pruning
    - Full pruning
    - Other & experimental solutions
  • Nethermind’s full pruning
  • How to enable it
  • What’s next?

Introduction

Ethereum’s state is growing rapidly. More than 10 TB of storage is needed today to run a full archive node, one that stores all the state since genesis. Users can choose to run a pruned node instead, which keeps only the state of the past 64 to 128 blocks and takes up about 100 GB today. This greatly lowers the barrier to entry for anyone wanting to run a node and contribute to the chain, but it poses a big challenge for node developers.

In order to keep only the state of the last 128 blocks, the older state needs to be removed in a process we call pruning, and given the peculiarities of how data is stored in Ethereum, this is not an easy task. Over the years, many solutions have been proposed and implemented, but each came with drawbacks that made it impossible to prune the database automatically while still running the node.

Our core team at Nethermind has managed to cut the Gordian knot: our client can now perform full pruning, keeping the database no larger than strictly necessary, all while the node keeps running at full capacity.

But before jumping to the solution, let’s clarify the challenge.

Pruning — Why was it so difficult in the first place?

Let’s first recap how nodes store data.

Key-value database

The majority of Ethereum clients use either LevelDB or RocksDB to store data. These are highly optimized embedded databases that store key-value pairs of arbitrary data very efficiently. The drawback is that developers can’t perform advanced queries on the database; the only operations available are put(key, value), get(key), and delete(key).
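
To make that interface concrete, here is a minimal sketch in Python, using an in-memory dict to stand in for LevelDB/RocksDB. The real databases persist to disk and are far more sophisticated, but they expose essentially this same narrow surface:

    class KeyValueStore:
        """In-memory stand-in for a LevelDB/RocksDB-style store."""

        def __init__(self):
            self._table = {}

        def put(self, key: bytes, value: bytes) -> None:
            self._table[key] = value

        def get(self, key: bytes):
            return self._table.get(key)    # None if the key is absent

        def delete(self, key: bytes) -> None:
            self._table.pop(key, None)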

Modified Merkle-Patricia trie

A Merkle-Patricia trie is used so that the inclusion of any piece of data can be verified in O(log n) time. A hash of the state of each account in Ethereum is included in the leaf nodes, and hashes of lower-level nodes are bundled into higher-level nodes until the root of the trie is reached.

Each node’s hash is stored in the key-value database as the key, with the full content of the node as the value.

To find the state of an account in Ethereum, a client has to traverse the trie from the top. Let’s take account “96” as an example.

  • First, we retrieve the content of the root node, knowing its hash 0xa44ef2… and looking it up in the key-value table.
  • In the root node’s content, we can see that accounts starting with 9 are included in a node with hash 0xc71b3….
  • We retrieve that node, looking it up in the key-value table.
  • We see that the account continuing with a 6 has a hash of 0x3a1b8….
  • We retrieve that node from the key-value table and arrive at the content of the account.
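
A toy version of this lookup in Python, assuming each node simply maps the next digit of the account key to a child hash (real trie nodes are RLP-encoded and come in branch, extension, and leaf flavors):

    # Toy state trie stored in a key-value table: every node sits under its
    # hash, and inner nodes map the next digit of the account key to a
    # child node's hash.
    kv = {
        "0xa44ef2": {"9": "0xc71b3"},   # root node
        "0xc71b3": {"6": "0x3a1b8"},    # node covering the prefix "9"
        "0x3a1b8": {"balance": 42},     # leaf holding account 96's state
    }

    def lookup(root_hash: str, account_key: str) -> dict:
        node = kv[root_hash]            # fetch the root by its hash
        for digit in account_key:       # walk the key one digit at a time
            node = kv[node[digit]]      # each step is another kv lookup
        return node                     # the leaf: the account's state

    print(lookup("0xa44ef2", "96"))     # {'balance': 42}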

This is, of course, a simplified view of how the state is retrieved. But it’s just a snapshot in time. Let’s see how the database changes when a new transaction comes in.

State change

Assume, given the previous state, that account 96 receives some ETH. This changes the balance of that account and, in turn, its state. You can see in the picture that the hash of the node for account 96 changes, which makes every node at higher levels change as well. The key-value table grows as these new hashes and nodes are added as new entries. Note that the previous versions of those nodes are still kept in the database, under their old hashes.
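
Continuing the toy example, here is a sketch of what such an update does to the key-value table: a new leaf is written, along with new copies of every node on the path up to the root, while the old entries remain untouched:

    import hashlib, json

    kv = {}  # the key-value table

    def store(node: dict) -> str:
        """Hash a node's content and insert it into the table under that hash."""
        data = json.dumps(node, sort_keys=True)
        node_hash = "0x" + hashlib.sha256(data.encode()).hexdigest()[:8]
        kv[node_hash] = data
        return node_hash

    # Block N: account 96 holds a balance of 42.
    leaf = store({"balance": 42})
    node9 = store({"6": leaf})
    root_old = store({"9": node9})

    # Block N+1: account 96 receives some ETH. The leaf changes, so every
    # node above it changes too, and three brand-new entries are written...
    leaf = store({"balance": 50})
    node9 = store({"6": leaf})
    root_new = store({"9": node9})

    # ...while nothing in the table marks the old entries as stale.
    print(len(kv))           # 6: both versions of all three nodes coexist
    print(root_old in kv)    # True: the previous state lingers on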

But why not delete them and keep the database small? Ethereum nodes are required to keep the state of the last 64 to 128 blocks, so they can’t be deleted right away.

And what about after those blocks? It turns out this is where the issue lies: looking only at the key-value table, it’s very difficult to know which entries are still referenced by recent blocks and which aren’t.

Proposed solutions

Some solutions have been proposed over the years to keep the database slim.

Reference counting

As early as 2015, Vitalik proposed a method for database pruning based on reference counting. It worked by tracking, in a separate “death row” database, the nodes that had become stale and the block at which they were last used. Pruning then consisted of retrieving from death row the nodes older than a certain block and deleting them both from death row and from the LevelDB table.
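
A sketch of that bookkeeping in Python, assuming a plain mapping from block number to the node hashes that became stale at that block (the actual proposal also tracked reference counts for nodes shared between tries, which this sketch omits):

    from collections import defaultdict

    KEEP_BLOCKS = 128              # retention window required of full nodes

    kv = {}                        # the main key-value table of trie nodes
    death_row = defaultdict(list)  # block number -> hashes staled at that block

    def mark_stale(node_hash: str, block: int) -> None:
        """Record that a node was superseded at the given block."""
        death_row[block].append(node_hash)

    def prune(current_block: int) -> None:
        """Delete every node that has fallen out of the retention window."""
        cutoff = current_block - KEEP_BLOCKS
        for block in [b for b in death_row if b < cutoff]:
            for node_hash in death_row[block]:
                kv.pop(node_hash, None)   # remove from the main table...
            del death_row[block]          # ...and from death row itself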

This technique was perfectly valid back when a full node required 1.3 GB of storage, but with today’s need for more than 100 GB it’s not feasible: the bookkeeping can’t be done in memory, and reading and writing it on disk would be too slow and too large to manage efficiently.

In-memory pruning

This is a method used by major Ethereum clients today. Instead of writing every new trie node into the database, nodes from recent blocks are kept in memory. While they are there, they are cheap to reference count and garbage collect. Clients keep these nodes in memory for as long as possible, in the hope that a future block will make them obsolete and save a database write.
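
Here is a loose sketch of the idea in Python (not any client’s actual implementation): cached nodes carry a simple reference count, and a node whose count drops to zero before it is ever persisted never costs a disk write:

    class InMemoryPruner:
        """Keep recent trie nodes in memory; persist only the survivors."""

        def __init__(self, db: dict):
            self.db = db       # the on-disk key-value store
            self.cache = {}    # node hash -> [content, reference count]

        def add(self, node_hash: str, content) -> None:
            entry = self.cache.setdefault(node_hash, [content, 0])
            entry[1] += 1      # another block references this node

        def dereference(self, node_hash: str) -> None:
            entry = self.cache[node_hash]
            entry[1] -= 1
            if entry[1] == 0:              # obsolete before being written,
                del self.cache[node_hash]  # so it never touches the disk

        def persist(self, node_hash: str) -> None:
            """Flush a node that has survived long enough in memory."""
            content, _ = self.cache.pop(node_hash)
            self.db[node_hash] = content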

This solution doesn’t solve the issue of unnecessary data being stored, but greatly reduces the rate at which the database grows.

Full pruning

Today, clients like Geth offer offline pruning. By stopping the client, node operators can run a task that trims the database down to the strictly necessary size. This has been the only viable solution until now, and it requires stopping the node for at least a few hours.

Current methods are not good enough for node operators; there should be a way to trim the database without stopping the node. Some complex solutions have been theorized, but until recently no implementation had been successful.

Other & experimental solutions

Besu has been developing a new storage model named “Bonsai Tries”, in which nodes are stored by their location in the trie instead of by their hash. It’s still one of Besu’s experimental features, but it’s a very promising solution. Erigon uses a similar approach, handling intermediate hashes and state hashes separately.

Nethermind’s full pruning

Thanks to the unique database architecture of the Nethermind client, our core developers have come up with a solution.

Let’s start by clarifying the main difference between the Nethermind client’s database and others’. Earlier we talked about key-value databases, the solution that most Ethereum clients use. While other clients keep all their data in a single database, Nethermind keeps separate databases for different parts of the blockchain like state, headers, transactions, and receipts.

Taking advantage of this structure, our pruning mechanism doesn’t require an Ethereum node to be stopped. Here’s how it works:

  1. When the Nethermind client starts the pruning task, it takes the current block’s state root and starts building a brand-new state trie database file that contains only the nodes reachable from that specific block. This is a parallel task that runs alongside the node’s normal operation.
  2. While this is happening, new blocks keep being included. Both the old and the new state trie databases are updated with these new blocks, even when in-memory pruning is switched on.
  3. When the new state trie database has finished copying all the data, the old state trie database file is deleted and the client switches to using the new database by default.

The nodes we want to get rid of belong to the state trie. In other clients, tracking these nodes in the database and deleting them one by one is a costly operation. Nethermind’s solution sidesteps this challenge by deleting the whole database at once, which requires far fewer resources. This is possible because all the state nodes are kept in a dedicated database, as opposed to sharing one database with transactions and receipts.
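
Here is a simplified, runnable sketch of that copy-and-swap idea over in-memory dict “databases”. The names are illustrative rather than Nethermind internals, and the mirroring of new writes into both databases while the copy runs is omitted:

    def iterate_trie(db: dict, root_hash: str):
        """Yield every node reachable from the given state root."""
        stack = [root_hash]
        while stack:
            node_hash = stack.pop()
            content = db[node_hash]
            yield node_hash, content
            stack.extend(content.get("children", []))

    def full_prune(state_db: dict, state_root: str) -> dict:
        new_db = {}                          # 1. a fresh, empty state database
        for node_hash, content in iterate_trie(state_db, state_root):
            new_db[node_hash] = content      # 2. copy only reachable nodes; stale
                                             #    ones are simply never copied
        return new_db                        # 3. the caller swaps to new_db and
                                             #    deletes the old file wholesale

    # Two historical roots share a leaf; pruning to "root2" drops "root1".
    db = {
        "root1": {"children": ["leafA"]},
        "root2": {"children": ["leafA", "leafB"]},
        "leafA": {"balance": 42, "children": []},
        "leafB": {"balance": 7, "children": []},
    }
    print(sorted(full_prune(db, "root2")))   # ['leafA', 'leafB', 'root2']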

Although this is a great solution, there are some things to keep in mind:

  • It creates a lot of disk writes: building a new database and copying all the nodes can be taxing for an SSD if it’s performed often. SSDs have a limited write capacity; on modern 1 TB drives, this limit is around 600 TBW (terabytes written). Our pruning mechanism writes at least 100 GB per run, so it’s definitely something to be aware of. The recommended frequency for running the pruning task is every few weeks.
  • It’s a long operation: running the pruning task requires a lot of I/O and CPU resources. The time to run it can range from 6 hours on powerful machines to more than 24 hours on machines with slow I/O and CPU.

How to enable it

For node operators, the new full pruning mechanism can be enabled from Nethermind version 1.12.5 by adding the following to the configuration file:

"Pruning": {"Mode":"Full",},

By default, it will use as many threads as your machine has. This can be overridden manually with the FullPruningMaxDegreeOfParallelism parameter.
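
Put together, a configuration that enables full pruning and caps the task at, say, four threads would look like this (the right value is machine-dependent):

    "Pruning": {
      "Mode": "Full",
      "FullPruningMaxDegreeOfParallelism": 4
    }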

Full pruning can be invoked via the new JSON-RPC method admin_prune. Please note that this method, like others in the admin module, shouldn’t be publicly exposed.
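
For example, assuming the node’s JSON-RPC server is listening on the default local endpoint and the admin module is enabled on it, the call can be made like this:

    import requests

    # admin_prune takes no parameters; the endpoint below is the usual local
    # default and may differ on your setup.
    response = requests.post(
        "http://localhost:8545",
        json={"jsonrpc": "2.0", "method": "admin_prune", "params": [], "id": 1},
    )
    print(response.json())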

There are also other, experimental triggers, configured by adding FullPruningTrigger and FullPruningThresholdMb values to the configuration file. With these, full pruning can run automatically when the state database size exceeds the given threshold or when the volume’s free space gets too low.
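
A sketch of such a configuration is below. Note that the trigger value name shown here is an assumption for illustration; check the Nethermind documentation for your version for the exact accepted values:

    "Pruning": {
      "Mode": "Full",
      "FullPruningTrigger": "StateDbSize",
      "FullPruningThresholdMb": 250000
    }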

In addition to this, there is the more experimental Hybrid mode, which combines full pruning with in-memory pruning. On critical nodes, we encourage you to run in ‘Memory’ pruning mode, switch to ‘Full’ when you want to prune, invoke pruning, and switch back to ‘Memory’ mode after it finishes.

What’s next?

This new pruning mechanism greatly improves the experience of running a node, solving an issue that has affected node operators since the beginning of Ethereum, at the cost of extra resources on the machine running it.

Nethermind’s core developers know there is an even better solution, and there’s ongoing research around the topic. We are experimenting with different storage models that, in a few months, will make trie pruning as it exists today completely obsolete and will keep storage requirements flat.

If you’re interested in solving Ethereum’s hard challenges like this, we’re always looking for passionate people to join us at Nethermind. Drop us an email with your CV at talent@nethermind.io
