evmone speedups #11

zac-williamson · 2019-04-26T02:43:58Z

I've been tinkering around with your excellent evm interpreter, and have made a few tweaks and additions that speed up the benchmarks by 1.25x - 3x. Perhaps they would be of interest? This branch has the following changes/additions:

- I changed the main loop to a direct-threaded model that uses computed gotos, partly to remove function-call overhead and partly to remove some conditional branches (e.g. the 'is this a basic block' check can be removed; more on that below).
- I added a global memory manager that provides pointers to pre-allocated, zeroed-out blocks of memory. The amount of allocated memory is large enough that any out-of-bounds access would trigger an out-of-gas error (the current gas limit caps memory at ~8MB, and when a transaction creates/calls a contract, unused memory is freed to prevent excessive memory consumption). The upside is that all of the overhead of updating memory pages during interpretation is removed.
- I changed the stack to a fixed-size array that is indexed by a pointer. Changes to the stack size move the pointer instead of resizing the array. This gives some marginal speedups from not having to track the size of the stack separately (stack over/underflows are checked against the memory addresses that define the start and end of the stack).
- Jump destinations are found by indexing a sparse array instead of by a search. I did a bit of profiling with valgrind, and this doesn't seem to noticeably increase the number of cache misses.
- The main loop doesn't perform a conditional branch to check whether the instruction is at the start of a basic block. Instead, these instructions are dispatched to a different jump label from normal opcodes.
- For conditional branches that perform error checking (e.g. gas accounting), I've made liberal use of `__builtin_expect` to define the happy path as the branch that does not lead to an error state.
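
To make the first, third and last points concrete, here is a minimal sketch of direct-threaded dispatch over a pointer-indexed stack. It is not the code from this branch: the three-opcode instruction set and the names `Op`, `Status`, `Result` and `run` are hypothetical, and it assumes a GCC/Clang compiler (computed gotos and `__builtin_expect` are compiler extensions) and a program made only of these opcodes that ends with `OP_STOP`.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy instruction set for illustration; these are not EVM opcodes.
enum Op : uint8_t { OP_PUSH1, OP_ADD, OP_STOP, OP_COUNT };

enum class Status { success, stack_overflow, stack_underflow };

struct Result
{
    Status status;
    uint64_t top;  // value on top of the stack at OP_STOP (0 if empty)
};

// Assumes `code` contains only the opcodes above and ends with OP_STOP.
Result run(const std::vector<uint8_t>& code)
{
    // One label address per opcode ("labels as values", a GCC/Clang extension).
    static void* const labels[OP_COUNT] = {&&do_push1, &&do_add, &&do_stop};

    uint64_t stack[1024];                      // fixed size, deliberately not zeroed
    uint64_t* sp = stack;                      // points one past the top element
    uint64_t* const stack_end = stack + 1024;  // overflow bound
    const uint8_t* pc = code.data();

// Each handler jumps straight to the next handler; there is no central loop.
#define DISPATCH() goto *labels[*pc++]
    DISPATCH();

do_push1:
    // The error path is marked unlikely so the happy path falls through.
    if (__builtin_expect(sp == stack_end, 0))
        return {Status::stack_overflow, 0};
    *sp++ = *pc++;  // push the one-byte immediate
    DISPATCH();

do_add:
    if (__builtin_expect(sp - stack < 2, 0))
        return {Status::stack_underflow, 0};
    sp[-2] += sp[-1];
    --sp;  // shrinking the stack only moves the pointer
    DISPATCH();

do_stop:
    return {Status::success, sp == stack ? 0 : sp[-1]};
#undef DISPATCH
}
```

Running the program `PUSH1 2, PUSH1 40, ADD, STOP` through `run` should leave 42 on top of the stack, while a lone `ADD` hits the underflow check. The bounds checks compare against the array's start and end addresses, so no separate size counter is maintained.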

The results have been quite interesting. On my machine (8th gen i7), the sha1_divs benchmark is the weakest performer, with only a ~25% speed increase. The blake2b benchmark, on the other hand, is ~3x faster at the top end, chewing through 3.56 billion gas per second for the blake2b_shifts/65536 benchmark.

If you would like to integrate these changes into evmone, I'd be happy to fix up anything that you think needs attention.

chfast · 2019-04-26T07:28:12Z

Hey, this looks great! I didn't expect this to happen so quickly. I will definitely integrate your changes, but it will probably take me some time.

To clarify one thing: there is some code duplication in the implementation (especially around calls). It was done on purpose, because what I want to focus on now is adding more unit tests to reach full code coverage.

gcolvin · 2019-04-26T16:00:00Z

Nice, @zac-williamson. The first four were on my to-do-if-I-ever-had-time list. I think Geth handles the fourth one with a small (<=4K) bitmap, if I understand you.
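
For comparison with the sparse array, a bitmap lookup for jump destinations could look roughly like the sketch below. This is an illustration of the idea only, not Geth's or evmone's actual code; the function names are hypothetical. It relies on two EVM facts: `JUMPDEST` is opcode `0x5b`, and `PUSH1`..`PUSH32` (`0x60`..`0x7f`) carry 1 to 32 immediate bytes that must not be treated as instructions. With one bit per code byte, a contract at the 24576-byte code size limit (EIP-170) needs 3072 bytes, which fits the "<=4K" figure.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint8_t OP_JUMPDEST = 0x5b;
constexpr uint8_t OP_PUSH1 = 0x60;
constexpr uint8_t OP_PUSH32 = 0x7f;

// Builds a bitmap with one bit per code byte. Bit i is set only if code[i]
// is a JUMPDEST opcode that is not part of PUSH immediate data.
std::vector<uint8_t> build_jumpdest_bitmap(const std::vector<uint8_t>& code)
{
    std::vector<uint8_t> bitmap((code.size() + 7) / 8, 0);
    for (std::size_t i = 0; i < code.size(); ++i)
    {
        const uint8_t op = code[i];
        if (op == OP_JUMPDEST)
            bitmap[i / 8] |= static_cast<uint8_t>(1u << (i % 8));
        else if (op >= OP_PUSH1 && op <= OP_PUSH32)
            i += static_cast<std::size_t>(op - OP_PUSH1) + 1;  // skip immediates
    }
    return bitmap;
}

// Constant-time validity check for a jump target.
bool is_jumpdest(const std::vector<uint8_t>& bitmap, std::size_t pc)
{
    return pc / 8 < bitmap.size() && ((bitmap[pc / 8] >> (pc % 8)) & 1) != 0;
}
```

In the code `60 5b 5b 00` the first `0x5b` is the immediate of `PUSH1`, so only offset 2 is a valid destination.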

gcolvin · 2019-04-26T16:03:15Z

I think the one optimization I still saw to do (and maybe it's in there but I missed it) is to pre-swap the PUSH constants.

zac-williamson · 2019-04-29T14:56:52Z

> I think the one optimization I still saw to do (and maybe it's in there but I missed it) is to pre-swap the PUSH constants.

Heya, thanks for the feedback! I did squeeze that optimization in too; I should have documented it. It's in analysis.cpp, but it's a wee bit of a kludge because I wanted to weasel out of declaring the program stack to be of type uint256, to avoid the default constructor being called on every stack element.
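
As an illustration of what pre-swapping PUSH constants means, the sketch below converts a big-endian PUSH immediate into little-endian 64-bit limbs once, so an interpreter would not have to byte-swap on every execution. It is not the code in analysis.cpp: the `Word` layout (four 64-bit limbs, least significant first) and the name `load_push_data` are assumptions made for the example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 256-bit word: four 64-bit limbs, least significant limb first.
using Word = std::array<uint64_t, 4>;

// Converts the n-byte big-endian immediate of a PUSHn instruction (1 <= n <= 32)
// into limb form. Intended to run once per PUSH during code analysis.
Word load_push_data(const uint8_t* data, std::size_t n)
{
    assert(n >= 1 && n <= 32);
    Word w{};  // unused high bytes stay zero
    for (std::size_t i = 0; i < n; ++i)
    {
        // data[0] is the most significant byte of the value.
        const std::size_t bit = 8 * (n - 1 - i);
        w[bit / 64] |= static_cast<uint64_t>(data[i]) << (bit % 64);
    }
    return w;
}
```

For example, the two immediate bytes `01 02` of a `PUSH2` become the value `0x0102` in the lowest limb, with the other three limbs zero.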

gcolvin

I see a possible DoS vulnerability, and fixing it might be a performance improvement.

gcolvin · 2019-05-01T16:56:24Z

lib/evmone/analysis.hpp

-    ///
-    /// The deque container is used because pointers to its elements are not
-    /// invalidated when the container grows.
-    std::deque<bytes32> args_storage;


How big can this container get? Just the amount of push data in the program? Could a flat array be used to avoid the overhead (and possible DoS vulnerability) of intermittently growing the deque? Or is setting aside that much memory, potentially almost the whole program, not worth it?
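
One way to read the flat-array suggestion is a two-pass analysis: count the PUSH instructions first, reserve the storage once, then fill it. The sketch below is only an illustration under that assumption; `bytes32` here is a stand-in alias rather than evmone's type, and `count_pushes` and `collect_push_args` are hypothetical names. Because the vector never grows after `reserve`, pointers to its elements stay valid, which is the property the deque was chosen for.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

using bytes32 = std::array<uint8_t, 32>;  // stand-in for the type in analysis.hpp

constexpr uint8_t OP_PUSH1 = 0x60;
constexpr uint8_t OP_PUSH32 = 0x7f;

// First pass: number of PUSH instructions, skipping their immediate bytes.
std::size_t count_pushes(const std::vector<uint8_t>& code)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < code.size(); ++i)
    {
        const uint8_t op = code[i];
        if (op >= OP_PUSH1 && op <= OP_PUSH32)
        {
            ++count;
            i += static_cast<std::size_t>(op - OP_PUSH1) + 1;
        }
    }
    return count;
}

// Second pass: store each immediate right-aligned in a 32-byte slot.
std::vector<bytes32> collect_push_args(const std::vector<uint8_t>& code)
{
    std::vector<bytes32> args;
    args.reserve(count_pushes(code));  // single allocation, no regrowth later
    for (std::size_t i = 0; i < code.size(); ++i)
    {
        const uint8_t op = code[i];
        if (op < OP_PUSH1 || op > OP_PUSH32)
            continue;
        const std::size_t n = static_cast<std::size_t>(op - OP_PUSH1) + 1;
        bytes32 value{};
        // Immediate bytes past the end of the code are left as zero.
        for (std::size_t k = 0; k < n && i + 1 + k < code.size(); ++k)
            value[32 - n + k] = code[i + 1 + k];
        args.push_back(value);
        i += n;
    }
    return args;
}
```

The trade-off raised in the question remains: with full 32-byte slots, a program consisting only of `PUSH1` instructions (2 code bytes each) would need storage 16 times the code size, so whether this is worth it depends on how small pushes are handled.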

chfast · 2019-06-23T13:55:41Z

Version 0.1 of evmone has been tagged to serve as the baseline for future optimizations.
I also described the first one in #72.

chfast · 2019-07-24T15:28:16Z

@zac-williamson I'd like to discuss some changes you have here.

chfast · 2022-08-20T07:45:12Z

After all these years, I'm finally getting threaded code support: #495.

zac-williamson added 22 commits April 14, 2019 23:01

added support for computed gotos

1bbf605

removed unused file

39747f4

removed comments, added exception handling to log and call methods

70d2574

made opcode jump tables singletons

a7c3842

added global memory paging, that reserves memory ahead of txn execution

09c2b33

changed analysis objects from std::deque to std::vector

bf44b86

removed branch to check for basic blocks

ab92fc5

CHECK_BLOCK happy path has 1 if statement, from 3

75db85c

removed superfluous files

192d666

comments and formatting

ccd0c77

removed state.max_potential_memory

65110ff

memory::allocate_memory returns uint8_t* instead of std::pair

reactivated analysis tests

c972866

added more memory tests stack over/underflow checked against explicit variables fixed bug where out of bounds memory indices were being written to

more comemnts and formatting

a1caac6

state.stack is no longer initialized by default removed a conditional branch in analyze loop added memory.cpp

typo

9f1b919

removed comments

6177bb1

all opcode methods only have 1 argument

165a93b

reactivated commented out tests

5cc423f

updated .gitignore

8264699

push data converted in analysis stage

synced with upstream

a66374d

evmc enum fix

79a9219

removed std::max from jump and jumpi opcodes

e932da6

removed redundant code

507be9a

added blake2b_huff benchmark

32f6929

gcolvin suggested changes May 3, 2019

updated blake2b_huff contract

ea0ec48

Fix compilation warnings

9689317

chfast mentioned this pull request Aug 13, 2019

article: Efficient gas calculation algorithm for EVM #123

Merged

1 task

gcolvin mentioned this pull request Sep 15, 2019

threading model #174

Closed

chfast closed this Aug 20, 2022
