Wasm is not going to save JavaScript
This article is a case study of the performance impact of extending the bnf-parser library so that it can take a given BNF syntax and compile it all the way down to an optimised wasm parser - with the goal of improving parse times for arbitrary syntaxes.
Testing Methodology
For each round of testing we parse two rather complex BNFs (sequalize, lolcode) using three different parsing methods in sequence, measuring the total parse time of each method with performance hooks. The library itself is bootstrapped (it compiles and uses itself), and the first stage of compiling a BNF syntax is parsing it - so this is a valid test case for the kind of parsers the library generates.
All three parse methods are tested in every round so that their executions stay closely interleaved. This mitigates the impact of background processes and V8 optimisations, since those external factors should affect each parser similarly.
The first two parsers are actually the same wasm-compiled parser in two different modes: the first with source mapping enabled, the second without. Source mapping is an optional extra pass which can be applied to the wasm parser's output, and which correctly maps each syntax node of the tree to the column, row, and JavaScript string index (don't get me started on UTF-16) it spans. It is optional in bnf-parser because skipping it lets the parser avoid allocating extra reference objects that aren't needed by applications which never produce syntax error messages.
These compiled parsers have also had Binaryen's default optimisations applied, which should give them an advantage over the JavaScript implementation (assuming V8's optimisations don't have their way).
The third parser is bnf-parser's legacy parser, implemented entirely in JavaScript. It behaves like a graph traversal (similar to a DFA): the graph structure is generated from a BNF, and the syntax tree for a given input is produced by traversing that graph.
We ran the testing round 10,000 consecutive times, gathering results with NodeJS perf_hooks. We then ran the tests a second time with more in-depth hooks placed inside the bnf-parser artifacts to see exactly what's going on under the hood. Those internal measurements were not taken during the tests comparing the different parsers, because the act of measuring would heavily skew the wasm results - they're just that fast.
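For reference, the measurement harness boils down to something like the sketch below. The parser names and call shapes here are placeholders rather than bnf-parser's real entry points:

```ts
import { performance } from "node:perf_hooks";

type Parser = (input: string) => unknown;

// Collects per-round timings for each named parse method.
function benchmark(parsers: Record<string, Parser>, input: string, rounds: number) {
  const samples: Record<string, number[]> = {};
  for (const name of Object.keys(parsers)) samples[name] = [];

  // All three methods run back-to-back within each round, so background
  // noise and JIT warm-up affect them roughly equally.
  for (let round = 0; round < rounds; round++) {
    for (const [name, parse] of Object.entries(parsers)) {
      const start = performance.now();
      parse(input);
      samples[name].push(performance.now() - start);
    }
  }
  return samples;
}
```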
Results
From the results below we can see a few interesting trends. Both the wasm w/ mapping parser and the legacy parser have significantly higher 99th-percentile execution times than their 1st-percentile. This is because both of them are receiving a lot of love from V8's excellent optimiser: the first few runs are slow, but once the JIT realises it's doing the same thing many times, it starts to optimise.
We also ran a variant where, after each testing round, we attempted to parse another random non-BNF syntax, to see if it threw off V8's optimisations by running the graph-traversal functions on a different graph to the one they were optimised for. That had no noticeable effect.
| | Wasm w/ map | Wasm no map | Legacy |
|---|---|---|---|
| Max | 6.1372ms | 2.5623ms | 13.5988ms |
| 99% | 2.3481ms | 0.8897ms | 2.2354ms |
| 50% | 1.5971ms | 0.6350ms | 1.6305ms |
| 1% | 0.6533ms | 0.2202ms | 0.4673ms |
| Min | 0.6437ms | 0.2173ms | 0.4602ms |
| Mean | 1.2774ms | 0.5384ms | 1.2102ms |
Comparing the median legacy times to wasm no map we see an approximate 2.56x speed-up - however, only legacy is generating SyntaxNode references, so it's not a fair comparison of equivalent compute. Comparing the wasm parser with source mapping to legacy, we see only a 1.04x improvement.
And that might get you thinking: wow, JS really isn't that slow. But comparing wasm to raw JS performance isn't fair either, because you're missing a step. You need to move data in and out of the JS world to the wasm instance, and that has a tax.
The transport tax
There are four main stages to using the wasm bnf-parser (a sketch of the round trip follows the list):
- Encoding: taking the input data from JavaScript, writing it into the wasm instance's memory, and telling the instance how long the data you just put in is.
- Parsing: the actual work we want done - iterating over the string and generating the entire syntax tree.
- Decoding: loading the tree back out of the instance so it can be used in JS - bringing it back over to JS land.
- Mapping: generating the source references for a given syntax tree, based on the input data.
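Roughly, the first three stages look like this - assuming an instance that exports its memory, a known input offset, and a parse(byteLength) entry point. bnf-parser's real ABI differs in the details:

```ts
// Hypothetical inputs for the sketch - not part of bnf-parser's real ABI.
declare const wasmBytes: BufferSource; // the compiled parser module
declare const bnfSource: string;       // the BNF text to parse

const { instance } = await WebAssembly.instantiate(wasmBytes);
const memory = instance.exports.memory as WebAssembly.Memory;
const parse = instance.exports.parse as (byteLength: number) => number;

const INPUT_OFFSET = 0x1000; // assumed fixed location for the input text

// Encoding: copy the UTF-8 input into the instance's linear memory;
// the byte length is handed over as the parse argument.
const bytes = new TextEncoder().encode(bnfSource);
new Uint8Array(memory.buffer, INPUT_OFFSET, bytes.length).set(bytes);

// Parsing: the actual work - returns a pointer to the finished tree.
const treePtr = parse(bytes.length);

// Decoding: walk the tree in linear memory and rebuild JS objects from it.
const tree = new DataView(memory.buffer, treePtr);
```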
It's important to note that the mapping stage is done almost entirely in JavaScript rather than in wasm, because the computation is very simple: you iterate forward over the string, counting the index, as you depth-first traverse the syntax tree filling in the references (this is done with explicit stack operations, so it's a single function call, saving the extra tax of recursive calls in JS).
Since the majority of the work is simply allocating new objects to store the reference at each point, there would be no real time saved by doing this in wasm, and any time saved would mostly be lost to the data-transfer tax.
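A minimal sketch of that mapping pass, assuming node shapes merely in the spirit of bnf-parser's types:

```ts
// Hypothetical node shapes for the sketch; bnf-parser's real types differ
// in detail but follow the same SyntaxNode → ReferenceRange → Reference idea.
interface Ref { index: number; row: number; col: number }
interface Node {
  start: number;   // raw string index where the node begins
  end: number;     // raw string index where the node ends
  children: Node[];
  range?: { start: Ref; end: Ref };
}

function applySourceMap(root: Node, input: string) {
  let index = 0, row = 1, col = 1;

  // Advance the cursor forward to a target index, updating row/col,
  // and snapshot the position as a fresh Reference object.
  function advanceTo(target: number): Ref {
    while (index < target) {
      if (input[index] === "\n") { row++; col = 1; } else { col++; }
      index++;
    }
    return { index, row, col };
  }

  // Explicit stack instead of recursion, to skip the cost of JS calls.
  // Enter/exit tasks interleave so the cursor only ever moves forward.
  const stack: { node: Node; exit: boolean }[] = [{ node: root, exit: false }];
  while (stack.length > 0) {
    const { node, exit } = stack.pop()!;
    if (!exit) {
      const start = advanceTo(node.start);
      node.range = { start, end: start }; // end gets patched on exit
      stack.push({ node, exit: true });   // schedule the exit...
      for (let i = node.children.length - 1; i >= 0; i--)
        stack.push({ node: node.children[i], exit: false }); // ...after the children
    } else {
      node.range!.end = advanceTo(node.end);
    }
  }
}
```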
| | Encode | Parse | Decode | Mapping |
|---|---|---|---|---|
| Max | 0.2647ms | 0.2692ms | 2.6253ms | 3.5991ms |
| 99% | 0.0064ms | 0.1443ms | 1.0704ms | 1.1919ms |
| 50% | 0.0026ms | 0.1335ms | 0.6914ms | 0.9063ms |
| 1% | 0.0023ms | 0.0436ms | 0.1738ms | 0.4232ms |
| Min | 0.0020ms | 0.0428ms | 0.1720ms | 0.4160ms |
| Mean | 0.0032ms | 0.0910ms | 0.5242ms | 0.7071ms |
From this data we can see we are spending 40.03% of our time just moving data between JS and wasm land - that's almost half of the entire computation. Another 55.41% is taken up by generating the source references. That leaves only 7.70% of the total time actually spent computing the syntax tree!
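As a rough cross-check, the shares can be re-derived from the mean column of the table (the headline percentages were presumably computed per round, so they land slightly differently):

```ts
const mean = { encode: 0.0032, parse: 0.0910, decode: 0.5242, mapping: 0.7071 };
const total = mean.encode + mean.parse + mean.decode + mean.mapping; // ≈ 1.3255ms

const pct = (x: number) => ((x / total) * 100).toFixed(1) + "%";
console.log(pct(mean.encode + mean.decode)); // ≈ 39.8% - the transport tax
console.log(pct(mean.mapping));              // ≈ 53.3% - source references
console.log(pct(mean.parse));                // ≈ 6.9%  - the actual parse
```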
What's up with JavaScript's GST rates being so high?
The reason this data-transfer tax is so high is painfully illustrated by the difference in compute time between decode and mapping. Mapping is much simpler than traversing a tree generated by a different language, where you need to worry about bit alignment as well as actually decoding the foreign data into something JavaScript can use - and yet mapping takes longer.
The reason is object allocation.
Everything in JavaScript is an object, even the object within an object is an object - and that's objectively a problem.
In any statically typed language, if you allocate a struct which has another struct as its member, you get that child for free. That isn't the case in JavaScript. Every SyntaxNode has a ReferenceRange which contains two Reference objects - so if you want to allocate a SyntaxNode and fill in all of its children, that's actually 4 allocations, not 1.
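To make that concrete, here is a sketch assuming shapes along the lines of bnf-parser's types (the field names are illustrative):

```ts
// Building one fully-populated node costs four separate heap allocations.
class Reference {
  constructor(public index: number, public row: number, public col: number) {}
}
class ReferenceRange {
  constructor(public start: Reference, public end: Reference) {}
}
class SyntaxNode {
  constructor(public type: string, public ref: ReferenceRange) {}
}

const node = new SyntaxNode("expr", new ReferenceRange( // allocations 1 & 2
  new Reference(0, 1, 1),   // allocation 3
  new Reference(10, 1, 11), // allocation 4
));
```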
The reason decoding is able to be as fast as it is, is object reuse. By default, every single SyntaxNode shares the same ReferenceRange instance. That range and its two children only need to be allocated once, yet every SyntaxNode still gets a ReferenceRange - so you don't need null checks everywhere, and there is only one allocation per node.
But when you run the source map over the syntax tree, every single SyntaxNode now requires 3 allocations: a ReferenceRange, a start Reference, and an end Reference.
Part of the reason execution in wasm is so fast is that it performs only one allocation: the tree itself. The entire tree in wasm is flat-packed into linear memory. And since the data is read out after every parse, we don't need the previous tree once a new parse begins - so we can just write over it. In effect we have zero allocations, because we reuse the same single allocation. In other languages like C++ you could allocate a vector a factor or two larger than your estimated tree size, compute your flat tree, then shrink it afterwards. Two allocations.
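To illustrate flat-packing, here is a hypothetical layout and decoder - this is not bnf-parser's actual binary format:

```ts
// Each node is a run of 32-bit ints in linear memory, with its children
// packed inline after it:
//   [ typeId, startIndex, endIndex, childCount, ...children ]
interface FlatNode { typeId: number; start: number; end: number; children: FlatNode[] }

// Reads one node starting at `offset`, returning it plus the offset of
// the next sibling.
function readNode(heap: Int32Array, offset: number): [FlatNode, number] {
  const node: FlatNode = {
    typeId: heap[offset],
    start: heap[offset + 1],
    end: heap[offset + 2],
    children: [],
  };
  let cursor = offset + 4;
  for (let i = 0; i < heap[offset + 3]; i++) {
    const [child, next] = readNode(heap, cursor);
    node.children.push(child);
    cursor = next;
  }
  return [node, cursor];
}
```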
In JavaScript everything is an object; everything must be independently allocated.
Can Wasm still work as an accelerator?
Wasm libraries can still work as accelerators for JavaScript, in almost exactly the way that so much of Python is actually a C library underneath. You could have a library for matrix multiplication where all of your matrices live permanently inside the wasm instance, only coming out once computation is complete, to be printed, sent over the network, or written to a file.
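Sketched with a hypothetical matrix API, the pattern looks like this:

```ts
// Matrices live inside the wasm instance as opaque handles; bytes only
// cross the boundary when a result is actually needed in JS.
declare const mat: {
  create(rows: number, cols: number, data: Float64Array): number; // returns a handle
  multiply(a: number, b: number): number;                         // stays inside wasm
  read(handle: number): Float64Array;                             // the only copy back out
};
declare const bufferA: Float64Array;
declare const bufferB: Float64Array;

const a = mat.create(512, 512, bufferA);
const b = mat.create(512, 512, bufferB);
const c = mat.multiply(a, b); // no transport tax paid here
const result = mat.read(c);   // pay it once, at the very end
```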
So much like the current Python ecosystem, JS could head towards a world where it's a glue language - the problem is that typical JavaScript workloads are 90% glue.
The vast majority of JavaScript execution looks like this:
- Take something from the network
- Perform a small amount of manipulation
- Send it out to the network, or write to the DOM.
JS is primarily a middle-man language. Using it the way Python is used - delegating the middle man's duties to yet another middle man - yields very little performance gain and a whole lot of headache. Try talking to the tech support of any major tech company and you'll see what I mean.
Wasm also has another headache: it's security-focused, meaning every wasm instance has its own independent memory. Two different wasm libraries can't actually share memory unless they are recompiled together, or you pass data between wasm libraries much like you do from JS to wasm.
Plus, the majority of wasm-compiled modules don't actually play nicely together; they're compiled to rule their own sandpit, and no one else can enter. If you tried to bring a C++ library and a Rust library into the same wasm module, whose malloc implementation are we using? There is only one linear memory, and they can't both be operating on the same space.
Whose do we choose? How do we choose? How does each of the children know who was chosen?
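Even between two separate instances, the best you can do is stage a copy through JS - sketched here with hypothetical exports:

```ts
// Two wasm instances each own their own linear memory, so handing data
// from one library to the other (short of recompiling them together)
// means a full byte copy, every time.
declare const instanceA: WebAssembly.Instance;
declare const instanceB: WebAssembly.Instance;

const memA = instanceA.exports.memory as WebAssembly.Memory;
const memB = instanceB.exports.memory as WebAssembly.Memory;

function shuttle(srcPtr: number, dstPtr: number, length: number) {
  const src = new Uint8Array(memA.buffer, srcPtr, length);
  new Uint8Array(memB.buffer, dstPtr, length).set(src);
}
```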
Wasm is what people wanted docker to be
Wasm is a really powerful tool, but I think people are misunderstanding where it's heading and what it will be great for. No, it will not be great for bringing that one Rust library into your TS workflow. It's better to think of it as a super-lightweight, actually portable Docker container that can execute anywhere.
You can bring that container into the browser to act as your front end, or you can have it running as a microservice, or as the entire backend.
What it's not is a way to use language X's libraries in language Y.