Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EIP-2124: Fork identifier for chain compatibility checks #2125

Closed
Tracked by #5111
karalabe opened this issue Jun 19, 2019 · 15 comments
Closed
Tracked by #5111

EIP-2124: Fork identifier for chain compatibility checks #2125

karalabe opened this issue Jun 19, 2019 · 15 comments

Comments

@karalabe
Copy link
Member

karalabe commented Jun 19, 2019

Discussion thread for #2124.

Abstract

There are many public and private Ethereum networks, but the discovery protocol doesn't differentiate between them. The only way to check if a peer is good or bad (same chain or not), is to establish a TCP/IP connection, wrap it with RLPx cryptography, then execute an eth handshake. This is an extreme cost to bear if it turns out that the remote peer is on a different network and it's not even precise enough to differentiate Ethereum and Ethereum Classic. This cost is magnified for small networks, where a lot more trial and errors are needed to find good nodes.

Even if the peer is on the same chain, during non-controversial consensus upgrades, not everybody updates their nodes in time (developer nodes, leftovers, etc). These stale nodes put a meaningless burden on the peer-to-peer network, since they just latch on to good nodes, but don't accept upgraded blocks. This causes valuable peer slots and bandwidth to be lost until the stale nodes finally update. This is a serious issue for test networks, where leftovers can linger for months.

This EIP proposes a new identity scheme to both precisely and concisely summarize the chain's current status (genesis and all applied forks). The conciseness is particularly important to make the identity useful across datagram protocols too. The EIP solves a number of issues:

  • If two nodes are on different networks, they should never even consider connecting.
  • If a hard fork passes, upgraded nodes should reject non-upgraded ones, but NOT before.
  • If two chains share the same genesis, but not forks (ETH / ETC), they should reject each other.

This EIP does not attempt to solve the clean separation of 3-way-forks! If at the same future block number, the network splits into three (non-fork, fork-A and fork-B), separating the forkers from each another will need case-by-case special handling. Not handling this keeps the proposal pragmatic, simple and also avoids making it too easy to fork off mainnet.

To keep the scope limited, this EIP only defines the identity scheme and validation rules. The same scheme and algorithm can be embedded into various networking protocols, allowing both the eth/6x handshake to be more precise (Ethereum vs. Ethereum Classic); as well as the discovery to be more useful (reject surely peers without ever connecting).

@karalabe karalabe changed the title EIP-2124: Zero RTT netsplit on chain mismatch EIP-2124: Zero RTT chain compatibility check Jun 20, 2019
@karalabe karalabe changed the title EIP-2124: Zero RTT chain compatibility check EIP-2124: Fork identifier for chain compatibility checks Jul 3, 2019
@ritzdorf
Copy link

ritzdorf commented Oct 6, 2019

Hi @karalabe ,

very interesting proposal. I had a look after it was mentioned in the allcoredevs call.

I have a question:
Assuming that a fork is happening at block 100, when is this fork considered for the FORK_HASH?

  • After the local block number is >= 100
    or
  • After the local block number is >= 99?

I assume that you intended the first case. Then, it seems to me that the edge case of Istanbul-Ropsten case is not properly covered. At block B-1 the situation would be:

Updated nodes would have:
Past Forks: A
Future Forks: B

Stale Nodes would have:
FORK_HASH: A
FORK_NEXT:

  • The mining power of the updated nodes is very weak, hence they stall at B-1.
  • The stale nodes proceed and hence reach B, B+1, B+2...
  • All the updated nodes will still connect to the stale nodes.

I see two possible solutions X and Y:

X: Considering the fork for the FORK_HASH on the block before.
In the Istanbul-Ropsten case this would mean at block B-1:
Updated nodes would have:
Past Forks: A,B
Future Forks:
and hence would no longer connect to stale nodes.

Y: Instead of sending FORK_NEXT, the current block number (BLOCK_NUM) is sent. As FORK_NEXT is used to tell "whether a remote node is out of sync or whether its software is stale.", this seems easier to do with a block number. New decision tree:

FORK_HASH matches local past forks:

  • No local future fork
    • connect
  • Local future fork > remote BLOCK_NUM
    • connect
  • Local future fork <= remote BLOCK_NUM
    • don't connect (this is the improvement over your proposal, it detects remote stale nodes on a fork with mining power)
      FORK_HASH is subset of local post forks:
  • remote BLOCK_NUM >= last local past fork
  • don't connect
  • remote BLOCK_NUM < last local past fork
  • connect (here it potentially doesn't detect stale nodes on a fork with 0 mining power)
    FORK_HASH is a superset of local past forks:
  • (local past forks + subset S of local future forks) matches FORK_HASH:
    • Local future fork not in S > remote BLOCK_NUM:
      • don't connect (Example: a freshly started ropsten node connecting to a stale ropsten node)
    • else
      • connect
  • (local past forks + subset of local future forks) never matches FORK_HASH:
    • don't connect
      Else: (e.g. different genesis)
  • don't connect

I can also transform it into a table if you prefer that. There are a few cases that are different.

See you in Osaka

@ritzdorf
Copy link

Hi @karalabe,

as discussed in Osaka BLOCK_NUM might not be ideal. So disregard my proposal from above.

Quick Question

Hence, I still have the following question:
Assuming that a fork is happening at block 100, when is this fork considered for the FORK_HASH?

  • After the local block number is >= 100
    or
  • After the local block number is >= 99?

I assume that you intended the first case. Then, it seems to me that the edge case of Istanbul-Ropsten case is not properly covered. At block B-1 the situation would be:

Updated nodes would have:
Past Forks: A
Future Forks: B

Stale Nodes would have:
FORK_HASH: A
FORK_NEXT:

  • The mining power of the updated nodes is very weak, hence they stall at B-1.
  • The stale nodes proceed and hence reach B, B+1, B+2...
  • All the updated nodes will still connect to the stale nodes.

Of course, as soon as the updated nodes find a single block (and manage to distribute this block), the issue is resolved. Is this sufficient from your point of view?

Comments on the Table

I found the following entries missing inside the table, even though they are probably clear.

Past forks Future forks Remote FORK_HASH Remote FORK_NEXT Connect Reason
A B,C1 A+B C2 Yes Local out of sync, differing future forks, but those are uncertain.
A B,C A B Yes Possibly out of sync. Possibly stale.
A B A+B C Yes Local out of sync.
A B1 A+B2 No Someone needs software update.
A B No Different genesis blocks

Here are the cases that are not clear to me, they are taken from your table:

Past forks Future forks Remote FORK_HASH Remote FORK_NEXT Connect Reason
A A B Yes? What if B < local block number. Then, we know that one client is stale.
A B1 A B2 Yes? What if B2 < local block number. Then, we know that one client is stale.

@karalabe
Copy link
Member Author

@ritzdorf This is a very nice corner case indeed, thank you!

In your expanded example with Ropsten Istanbul, getting stuck on fork-1 block indeed is nasty and does have the side effect of not detecting the split until the forked chain passes that barrier too. Your solution of marking fork-1 as "already forked" and including it in FORK_HASH would potentially solve it, but I find it a bit odd. It also doesn't address your second 2 valid questions.

Wrt the second two cases you brought up, I think both are valid corner cases. If we would have the first special case of your table handled correctly, that would also solve the Ropsten Istanbul splitting correctly, because non-updaters would have split off from not-yet-forked Istanbul nodes, even though the Istanbul ones would not detect the issue. This case however cannot be solved by the more complex FORK_HASH modification.

Perhaps a better alternative solution would be to extend the validation code a bit. The fork ID fields would remain the same as is now, but the validator would also take into consideration the local chain head. Then:

  • Your extended issue would become equivalent to the first special case from the table. The non-forking nodes would detect that they are past the Istanbul FORK_NEXT, and thus reject Istanbul peers. It's a one way street (i.e. Istanbul peers wouldn't reject non-Istanbul ones), but this special case should exist for only a single-block window, so I think it's not thatbad. At least as long as the non-forking chain also progresses (interesting issue for Clique, ha, need to consider this too).
  • The second special case is also covered by the local head comparison the same way.

Does this make sense to you? (still need to figure out Clique, might need that initial solution too after all)

@ritzdorf
Copy link

Thanks for your feedback @karalabe.

Your solution of marking fork-1 as "already forked" and including it in FORK_HASH would potentially solve it, but I find it a bit odd.

I agree that it is odd. It was more like a brainstormed suggestion.

Does this make sense to you? (still need to figure out Clique, might need that initial solution too after all)

Yes. Thank you. That is roughly what I had in mind. Sorry for not writing it more clearly.

(still need to figure out Clique)

With Clique do you mean a set of non-forking nodes that are isolated from the non-forking miners?

  1. Wouldn't they eventually find the non-forking miners and thereby resolve this?
  2. As soon as the forking nodes pass the fork block they anyway wouldn't connect to the Clique any longer, so it is a case that only exists for a single-block window, right?

@djrtwo
Copy link
Contributor

djrtwo commented Feb 10, 2020

taking the first 4 bytes of a Keccak256 hash (seems odd)

Can you explain a bit more why this "seems odd"?

Taking a slice of bytes from a cryptographic hash function is a safe and reasonable way to get well-distributed bytes on some domain.

Agreed CRC32 is fine for the task, but avoiding additional dependencies is nice.

@karalabe
Copy link
Member Author

This ship sailed long ago :)

That said, I wrote that it seems odd because we're discarding part of the checksum vs. crc32 would keep all the entropy.

@djrtwo
Copy link
Contributor

djrtwo commented Feb 13, 2020

This ship sailed long ago :)

No worries! Just chiming in while going over the specs

@ethereum ethereum deleted a comment from mikyR Feb 7, 2021
@poojaranjan
Copy link
Contributor

Watch an overview of the 'fork identifier', its purpose, and usage for the Ethereum chain in syncing nodes by @fjl
https://www.youtube.com/watch?v=2Yg-MX0ubJQ

@fulldecent
Copy link
Contributor

  • Can this please be implemented in two or more competing implementations before this is published as final status?

@karalabe
Copy link
Member Author

karalabe commented May 14, 2021 via email

@fulldecent
Copy link
Contributor

fulldecent commented May 14, 2021

Thank you. Can specific clients (at least some) please be noted in the EIP text?

@MicahZoltu
Copy link
Contributor

EIPs should not be linking out to client implementations. See the template at https://raw.githubusercontent.com/ethereum/EIPs/master/eip-template.md and note that there is no longer an Implementations section, there is now only a reference implementation section which is intended to contain an inline implementation and it is optional.

@fulldecent
Copy link
Contributor

It is okay that EIP reviews (which are unpaid, I'm assuming) do not accept a responsibility of verifying technical aspects of EIPs.

But if EIPs are to be anything other than a pastebin, then any interop specifications (i.e. all EIPs) should include multiple reference implementation. And they should be verified by the community (e.g. the Last Call process).

@karalabe
Copy link
Member Author

karalabe commented May 16, 2021 via email

@MicahZoltu
Copy link
Contributor

Closing this for housekeeping purposes. Feel free to continue using this issue for discussion about EIP-2124.

@ethereum ethereum locked and limited conversation to collaborators Feb 24, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

10 participants
@axic @karalabe @fulldecent @MicahZoltu @djrtwo @ritzdorf @poojaranjan and others