We’re pleased to announce that WebKit has a full WebAssembly implementation.
ArrayBuffer. All WebAssembly memory operations operate on the instance’s memory. Finally, a table holds handles to WebAssembly functions, allowing indirect calls within an instance to target different functions with the same signature (in C++-speak: virtual functions and function pointers). Interestingly, instances can share the same memory, and tables can call directly across instances, which enables dynamic linking.
To allow sharing of modules between Web Workers and to prepare ourselves for future features like threads, we’ve made our internal representation of WebAssembly code thread-safe. This lets developers postMessage a compiled WebAssembly.Module across workers without requiring re-compilation, copying, or any other redundant work. Our implementation of postMessage for modules is simpler than a riddle: sharing a module between workers involves passing a reference to our internal module representation to the other worker. That worker will run the same machine code as the agent that originally produced the module.
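As a rough illustration of the developer-facing side, the sketch below compiles a module once and hands it to anything with a postMessage method, such as a Worker. The helper names (compileAddModule, shareModule, instantiateFromMessage) and the message shape with a kind field are assumptions made for this example, not part of any API; the bytes are a hand-assembled module that exports a single add function.

```typescript
// Reach WebAssembly through globalThis so the sketch type-checks with or
// without DOM typings. This is a convenience for the example only.
const wasm = (globalThis as any).WebAssembly;

// Anything that can receive a message: a Worker, a MessagePort, or a test double.
interface PostTarget {
  postMessage(message: unknown): void;
}

// Hypothetical message shape used by this sketch.
interface ModuleMessage {
  kind: "wasm-module";
  compiledModule: unknown;
}

// Hand-assembled module: (func (export "add") (param i32 i32) (result i32)).
const ADD_MODULE_BYTES = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic "\0asm", version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00, // function section: one function of type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export "add" = func 0
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b, // body: local.get 0, local.get 1, i32.add
]);

// Compile once, on the sending side.
function compileAddModule(): unknown {
  return new wasm.Module(ADD_MODULE_BYTES);
}

// Sender: pass the compiled module itself, not its bytes.
function shareModule(target: PostTarget, compiledModule: unknown): void {
  const message: ModuleMessage = { kind: "wasm-module", compiledModule };
  target.postMessage(message);
}

// Receiver: instantiate the module that arrived, with no compile step here.
function instantiateFromMessage(data: unknown): any {
  const message = data as Partial<ModuleMessage> | null;
  if (!message || message.kind !== "wasm-module") {
    throw new Error("unexpected message");
  }
  return new wasm.Instance(message.compiledModule, {});
}
```

With a real worker, the sending page would call shareModule(worker, compileAddModule()), and the worker script would call instantiateFromMessage(event.data) from its message handler.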
WebAssembly directly exposes 32- and 64-bit integers as well as 32- and 64-bit floating point numbers. Its instruction set is equally simple:
The instructions are low-level by design, and it is this low-level quality that gives WebAssembly its power. WebAssembly was born as a compilation target, molded by compiler engineers.
WebAssembly modules can easily contain tons of code, some of which executes only once or only rarely. This is why we opted to use two tiers: one that generates decent code quickly, and one that generates optimized code only when the engine thinks the code is hot enough to warrant it. BBQ, the quick tier, compiles code about 4× as fast as OMG, the optimizing tier, but produces code that executes roughly 2× as slow as OMG. We use a background thread when compiling functions with OMG. When OMG compilation finishes, we pause the executing WebAssembly threads and hot-patch the OMG compilation into the module.
In order to produce executable code as soon as possible, the BBQ tier omits many of the optimizations possible in the B3 compiler. Additionally, the BBQ tier uses a linear scan combined register / stack allocation algorithm. This allocates registers about 11× faster than the graph coloring algorithm that B3 usually uses. Avoiding expensive optimization allows WebKit to produce BBQ fast, so that BBQ may be consumed as soon as possible.
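For readers unfamiliar with linear scan, here is a minimal sketch of the textbook algorithm: walk live intervals in order of start point, hand out free registers, and spill to the stack when none are left. It is an illustration only and not WebKit's allocator; the LiveInterval shape, the linearScan and formatLocation names, and the spill heuristic (spill whichever interval ends last) are assumptions of this example.

```typescript
// A value is live from `start` to `end` (inclusive), measured in instruction indices.
interface LiveInterval {
  name: string;
  start: number;
  end: number;
}

type ValueLocation =
  | { kind: "register"; index: number }
  | { kind: "stack"; slot: number };

function linearScan(intervals: LiveInterval[], registerCount: number): Map<string, ValueLocation> {
  const result = new Map<string, ValueLocation>();
  const freeRegisters: number[] = [];
  for (let r = registerCount - 1; r >= 0; r--) freeRegisters.push(r); // pop() yields 0 first
  // Intervals currently holding a register, kept sorted by increasing end point.
  const active: { interval: LiveInterval; register: number }[] = [];
  let nextSlot = 0;

  const sorted = [...intervals].sort((a, b) => a.start - b.start);
  for (const current of sorted) {
    // Expire intervals that ended before this one starts and reclaim their registers.
    for (;;) {
      const oldest = active[0];
      if (oldest === undefined || oldest.interval.end >= current.start) break;
      active.shift();
      freeRegisters.push(oldest.register);
    }

    const register = freeRegisters.pop();
    if (register !== undefined) {
      result.set(current.name, { kind: "register", index: register });
      active.push({ interval: current, register });
    } else {
      // No register free: spill whichever of (current, longest-lived active) ends last.
      const last = active[active.length - 1];
      if (last !== undefined && last.interval.end > current.end) {
        result.set(last.interval.name, { kind: "stack", slot: nextSlot++ });
        result.set(current.name, { kind: "register", index: last.register });
        active[active.length - 1] = { interval: current, register: last.register };
      } else {
        result.set(current.name, { kind: "stack", slot: nextSlot++ });
      }
    }
    active.sort((a, b) => a.interval.end - b.interval.end);
  }
  return result;
}

// Render a location as "r0", "stack1", or "unassigned" for easy inspection.
function formatLocation(location: ValueLocation | undefined): string {
  if (location === undefined) return "unassigned";
  return location.kind === "register" ? `r${location.index}` : `stack${location.slot}`;
}
```

Each interval is visited once and the active list never grows beyond the number of registers, which is why this style of allocator is cheap compared with building and coloring an interference graph.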
When a function has been executed enough times, our WebAssembly runtime decides to optimize that function. Since WebAssembly does not require any type speculations, we only use tiering to conserve compile time. BBQ code contains only the profiling needed to detect when code has executed many times.
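The tier-up flow can be modeled in a few lines. The sketch below is a simplified, single-threaded stand-in: a queue plays the role of the background compilation thread, and swapping a function reference plays the role of hot-patching. The names (TieredFunction, callFunction, finishPendingCompiles) and the threshold value are hypothetical and chosen only for illustration.

```typescript
type Tier = "BBQ" | "OMG";

interface TieredFunction {
  tier: Tier;
  executionCount: number; // the only profiling the BBQ code carries
  omgRequested: boolean;
  entry: (x: number) => number; // code currently installed for this function
  optimized: (x: number) => number; // stands in for what OMG would produce
}

// Assumed value, kept tiny so the behavior is easy to observe.
const TIER_UP_THRESHOLD = 3;

// Stands in for work handed to a background compilation thread.
const pendingOmgCompiles: TieredFunction[] = [];

function makeFunction(
  baseline: (x: number) => number,
  optimized: (x: number) => number,
): TieredFunction {
  return { tier: "BBQ", executionCount: 0, omgRequested: false, entry: baseline, optimized };
}

function callFunction(fn: TieredFunction, arg: number): number {
  if (fn.tier === "BBQ") {
    fn.executionCount += 1;
    if (fn.executionCount >= TIER_UP_THRESHOLD && !fn.omgRequested) {
      fn.omgRequested = true; // request the optimized compile exactly once
      pendingOmgCompiles.push(fn);
    }
  }
  // Keep running whatever code is installed; BBQ code stays usable while OMG compiles.
  return fn.entry(arg);
}

// Models the moment compilation finishes: install the optimized code.
// Returns how many functions were patched.
function finishPendingCompiles(): number {
  let patched = 0;
  for (;;) {
    const fn = pendingOmgCompiles.pop();
    if (fn === undefined) break;
    fn.entry = fn.optimized;
    fn.tier = "OMG";
    patched += 1;
  }
  return patched;
}
```

Because the counter exists only to detect hot code, OMG code in this model carries no profiling at all, and a function that never reaches the threshold never pays for optimization.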
One of the most important optimizations in WebAssembly is reducing the overhead on memory accesses while preserving security guarantees. WebAssembly specifies that all memory accesses are performed on a single 32-bit linear memory. We reduce that overhead by reserving slightly more than 4 GiB of virtual address space. This virtual reservation doesn’t occupy physical memory unless accessed. We use the hardware’s page protection mechanism to mark all but the lower pages as non-readable and non-writable. All loads and stores from this 32-bit address space can be performed normally, without explicit bounds checking instructions. If a load or store would be out of bounds, it will trigger a hardware fault that will ultimately result in a POSIX SIGSEGV or a Mach EXC_BAD_ACCESS exception. Our custom signal / Mach exception handlers then ensure the fault originated from a WebAssembly memory operation. If so, we set the instruction pointer to a special code stub that performs a WebAssembly trap and materializes a