by Pawel Bylica @chfast, Andrei Maiboroda @gumb0, and Alex Beregszaszi @axic.
This is a short analysis of certain properties of EIP-2315: Simple Subroutines for the EVM (version as of 2020-04-16).
We have focused on both usability from compilers point of view and the effect on EVM implementations.
First we discuss two problems, followed by two relevant remarks, which could benefit from the solutions to the problems.
Fallthrough to next subroutine
Problem
During instruction-by-instruction execution when BEGINSUB
is encountered the execution continues (BEGINSUB
works as no-op). I.e. execution falls through from one subroutine to the next one.
PUSH 0
CALLDATALOAD
PUSH 32
CALLDATALOAD
JUMPSUB $sub1
PUSH 0
SSTORE
BEGINSUB $sub1
ADD
BEGINSUB $sub2
PUSH 1
ADD
RETURNSUB
This feature is not very useful for code generators (Solidity, EVM LLVM) therefore it will not be used. But an EVM implementation has to support it. And this behavior requires additional testing.
This is especially problematic when translating EVM to more structured representation (e.g. LLVM IR). Here we will represent the above example with C-like pseudo-code (as a substitute for LLVM IR functions):
void entry(char* calldata)
{
int depth = 0;
Stack stack;
stack.push(uint256(calldata[0]));
stack.push(uint256(calldata[32]));
// Call the subroutine with bumped depth.
sub1(&stack, depth + 1);
sstore(0, stack.pop());
}
void sub1(Stack* stack, int depth)
{
if (depth == 1023)
abort();
uint256 a = stack.pop();
uint256 b = stack.pop();
uint256 result = a + b;
stack.push(result);
// Because the sub1 does not end with RETURNSUB
// or any other terminating instruction
// the next subroutine must be called at the end
// to emulate subroutine fallthrough behavior.
// Depth is not bumped though because this is not `JUMPSUB`.
// Therefore, we can run out of system call stack space.
sub2(stack, depth);
}
void sub2(Stack* stack, int depth)
{
if (depth == 1023)
abort();
uint256 a = stack.pop();
uint256 result = a + 1;
stack.push(result);
}
Solution
Change the specification in a way that BEGINSUB
can only be reached via JUMPSUB
. Specifically:
-
Execution of
BEGINSUB
causes exception (OOG: all gas consumed) and terminates execution. This wayBEGINSUB
behaves likeINVALID
(aka0xfe
) and it should never be executed in a well-formed EVM program. -
JUMPSUB
sets thepc
tolocation + 1
(As opposed tolocation
in the current spec). -
In the edge case when
BEGINSUB
is the last instruction in code and this subroutine is jumped-to, the implementations should executeSTOP
. This is consistent with the other similar case of returning from a subroutine jumped-to from theJUMPSUB
being the last instruction in code.
JUMPs across subroutines
Problem
The EIP intentionally does not modify the semantics of JUMP
/JUMPI
s. They are still only restricted to targeting JUMPDEST
s. It is allowed to jump from any point in one subroutine to any JUMPDEST
-marked point in another subroutine.
Current EIP properly specifies what would happen in each of these cases, but this feature is not practically useful (code generators are unlikely to use it). And it creates a family of edge cases which are needed to be properly covered with tests.
It seems very impractical to translate such EVM program to C / LLVM IR. The only partial support may be possible by using setjmp
/ longjmp
. Because of the complexity of such translation C examples are not provided in this section.
- Example: JUMP to the middle of subroutine
JUMP $middle
BEGINSUB
DUP1
JUMPDEST $middle
RETURNSUB # Causes exception as return_stack is empty.
- Example: JUMP between subroutines
JUMPSUB $sub1
BEGINSUB $sub1
JUMP $middle
RETURNSUB
BEGINSUB $sub2
NOT
JUMPDEST $middle
RETURNSUB # Returns to the "main" code after JUMPSUB.
Solution
- Use
BEGINSUB
as strict subroutine boundaries.
1.1. There exists a “main” subroutine starting at PC 0 withoutBEGINSUB
. See Main (subroutine) section.
1.2. EveryBEGINSUB
opcode position in the code marks the beginning of a new subroutine and the ending of the “previous” subroutine.
1.3. This implies that there’s no code outside of subroutines (all the code is either in explicit subroutines or in the implicit “main” subroutine) - When collecting valid
JUMPDEST
locations (must be done before execution) assign them to subroutines. Instead of having flat list as before, we get 2-level collection grouped by subroutines. - During execution keep information which subroutine is currently being executed.
- During execution when validating a jump target only consider
JUMPDEST
s from the list of the current subroutine.
Example
00 PUSH 0
02 CALLDATALOAD
03 PUSH 32
05 JUMPDEST
06 CALLDATALOAD
07 JUMPSUB $sub1
08 JUMPDEST
09 PUSH 0
0a SSTORE
0b BEGINSUB $sub1
0c JUMPDEST
0d ADD
0e JUMPDEST
0f RETURNSUB
10 BEGINSUB $sub2
11 PUSH 1
13 ADD
14 JUMPDEST
15 RETURNSUB
-
Before:
JUMPDEST
s:(05, 08, 0c, 0e, 14)
-
After:
-
"main" @ 00
:(05, 08)
-
"sub1" @ 0b
:(0c, 0e)
-
"sub2" @ 0f
:(14)
-
Benefits
-
When checking if a jump destination is valid search is performed on a smaller collection (limited to current subroutine scope only).
-
JUMPDEST
analysis (collecting valid jump destinations) can be performed lazily per subroutine. Optimized EVM implementation may perform advanced code analysis and/or translation beyond collectingJUMPDEST
offsets. -
May help with code merklization.
With properly isolated subroutines it is possible to collect a list of subroutines used in a given transaction and it is easy to include in a witness those only. The other two options currently which exist for merklization: a) partition byJUMPDEST
s; b) chunk code by even-sized parts, which needs offsets in case of splitting through aPUSH
opcode.
With the original EIP, it seems the gains of merklization-by-subroutine is restricted or even diminished by the ability to jump around. -
Having subroutine-level “pure” control flow (no magic jumps between subroutines) helps performing static analysis.
-
Limits complexity of test cases.
In the original EIP, test cases composed of multiple steps are required. Examples of two-step test cases: (jump cross subroutine,RETURNSUB
reached causing exception), (jump cross subroutine,RETURNSUB
returns successfully). Test cases composed of 3+ steps may be also needed.
This is not required in the version with restricted jumps because it is only needed to check if execution terminates with exception on “jump cross subroutine”.
Potential issues caused by the change
-
Jumps across subroutines allow implementing tail subroutine calls.
- This is not very elegant. There are two distinct kinds of jumping targets:
JUMPDEST
andBEGINSUB
. However this encourages to also useJUMP
s to perform a kind of subroutine call. -
Suggestion: If tail subroutine calls are desired, let’s introduce a
TAILJUMPSUB
instruction.
- This is not very elegant. There are two distinct kinds of jumping targets:
-
This restriction can potentially break existing contracts. E.g.
JUMP $skip # Not allowed as $skip JUMPDEST belongs to other subroutine. BEGINSUB # Previously invalid instruction, but never executed. JUMPDEST $skip
-
Suggestion: analyze existing contracts whether this would cause an issue (e.g. search for the
BEGINSUB
opcode used in existing contracts)
-
Suggestion: analyze existing contracts whether this would cause an issue (e.g. search for the
Remark: example implementation of JUMPDEST analysis
Prior to this EIP, evmone stores JUMPDEST
's code offset in an array. They are already ordered so binary search is used to validate a jump destination.
After the change we can still keep JUMPDEST
s as an array. For subroutines we additionally have to collect ranges of JUMPDEST
s - it is enough to keep 2 pointers: to the begin/end JUMPDEST
's entry in the array. To validate a jump destination do binary search of the current subroutine JUMPDEST
's range.
Main (subroutine)
This section introduces the concept of “main” subroutine as a result of the EIP-2315 “Subroutines” with our two proposed changes.
Execution of an EVM bytecode starts at PC=0 (the “first” byte). We propose that code segment starting as PC=0 and ending just before first BEGINSUB
(or end-of-code) is now called the “main” subroutine. The following restrictions apply to the “main” subroutine":
- The
return_stack
remains empty (i.e. is not populated with the “main” subroutine’s caller return address). - Therefore,
RETURNSUB
is invalid inside “main” subroutine. - Since the “main” subroutine does not start with a
BEGINSUB
, it cannot be called from any other subroutine (including “main” subroutine itself). - The “main” subroutine defines the scope for
JUMP
/JUMPDEST
validity - jumps within “main” subroutine are valid as in any other subroutine.
Comparison with POSIX
We can now compare an EVM bytecode to a "POSIX binary":// alias STOP = RETURN(0, 0);
// An example subroutine.
int doSomething() {
if (random() & 1)
return 1; // => RETURNSUB
else
exit(0); // => STOP
}
// The C language main function.
int main() {
if (doSomething())
exit(1) // => RETURN(0, 1)
// This is not be allowed in EVM, as RETURNSUB is invalid in "main" subroutine.
return 0; // => RETURNSUB
}
// The POSIX start fuction - entry point invoked by the operating system.
void start() {
int ret = main();
exit(ret);
}
Since most current EVM programs start with what resembles "main" above (because they have to ability to "exit" by reaching the end), we don't want to consider and introduce "start" into EVM.
Benefits
We believe this rather small semantical distinction of “main” subroutine provides a cohesive and structured design, and will prove beneficial in the future.
One potential use case could be in "account abstraction" (AA), where instead of paying with supplied gas or Ether token, there needs to exist some other mean to pay the "miner". It is wished this would exists via executing code, which pays the miner.Making this possible with main:
- Disallow
JUMP
/JUMPI
in main - The miner has to be paid before the first
JUMPSUB
- Restrict the number of instructions allowed (and terminate if exceeded)
“BEGINDATA”
A lot of EVM programs contain data. The most prevalent example is constructor bytecode containing the runtime bytecode at the very end. Unfortunately without structure, currently the “data” has to be also analyzed for JUMPDEST
s.
Some proposals were made to alleviate this by introducing a BEGINDATA
opcode (see EIP-615 and EIP-2327).
We however think the above changes could also be used to provide an alternative to BEGINDATA
by placing a unreferenced BEGINSUB
prior to the data.
Downside: this looks like an abuse of BEGINSUB
, but so far the authors have not found any problem with it.
# The "main" subroutine: the contract constructor.
COPYCODE($data+1, S)
RETURN(0, S)
# The unreferenced subroutine with data.
BEGINSUB $data
...