Surfin' Safari

Announcing SquirrelFish

Posted by Geoffrey Garen on Monday, June 2nd, 2008 at 5:37 pm

SquirrelFish Mascot

“Hello, Internet!”

WebKit’s core JavaScript engine just got a new interpreter, code-named SquirrelFish.

SquirrelFish is fast—much faster than WebKit’s previous interpreter. Check out the numbers. On the SunSpider JavaScript benchmark, SquirrelFish is 1.6 times faster than WebKit’s previous interpreter.

Figure: bar graph of SunSpider runs per minute. Longer bars are better.

What Is SquirrelFish?

SquirrelFish is a register-based, direct-threaded, high-level bytecode engine, with a sliding register window calling convention. It lazily generates bytecodes from a syntax tree, using a simple one-pass compiler with built-in copy propagation.

SquirrelFish owes a lot of its design to some of the latest research in the field of efficient virtual machines, including research done by Professor M. Anton Ertl et al., Professor David Gregg et al., and the developers of the Lua programming language.

Some great introductory reading on these topics includes:

I’ve also pored over stacks of terrible books and papers on these topics. I’ll spare you those.

Why It’s Fast

Like the interpreters for many scripting languages, WebKit’s previous JavaScript interpreter was a simple syntax tree walker. To execute a program, it would first parse the program into a tree of statements and expressions. For example, the expression “x + y” might parse to

        +
      /   \
     x     y

Having created a syntax tree, the interpreter would recursively visit the nodes in the tree, performing their operations and propagating execution state. This execution model incurred a few types of run-time cost.
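To make the tree-walking model concrete, here is a minimal sketch in Python. It is not WebKit's code: the class names `Node`, `Var`, and `Add` and the `evaluate` method are made up for illustration, and the environment is assumed to be a plain dictionary.

```python
class Node:
    """Base class for syntax tree nodes (hypothetical, for illustration)."""

    def evaluate(self, env):
        raise NotImplementedError


class Var(Node):
    def __init__(self, name):
        self.name = name

    def evaluate(self, env):
        # Look the variable up in the environment passed down the tree.
        return env[self.name]


class Add(Node):
    def __init__(self, left, right):
        self.left = left
        self.right = right

    def evaluate(self, env):
        # Each child visit is a dynamically dispatched method call and return.
        return self.left.evaluate(env) + self.right.evaluate(env)


# The tree for "x + y" from the diagram above.
tree = Add(Var("x"), Var("y"))
result = tree.evaluate({"x": 2, "y": 3})
print(result)  # 5
```

Every operation, however small, costs one dynamically dispatched call per node, which is the overhead the following paragraphs describe.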

First, a syntax tree describes a program’s grammatical structure, not the operations needed to execute it. Therefore, during execution, the interpreter would repeatedly visit nodes that did no useful work. For example, for the block “{ x++; }”, the interpreter would first visit the block node “{…}”, which did nothing, and then visit its first child, the increment node “x++”, which incremented x.

Second, even nodes that did useful work were expensive to visit. Each visit required a virtual function call and return, which meant a couple of indirect memory reads to retrieve the function being called, and two indirect branches—one for the call, and one for the return. On modern hardware, “indirect” is a synonym for “slow”, since indirection tends to defeat caching and branch prediction.

Third, to propagate execution state between nodes, the interpreter had to pass around a bunch of data. For example, when processing a subtree involving a local variable, the interpreter would copy the variable’s value between all the nodes in the subtree. So, starting at the “x” part of the expression “f((x) + 1)”, a variable node “x” would return x to a parentheses node “(x)”, which would return x to a plus node “(x) + 1”. Then, the plus node would return (x) + 1 to an argument list node “((x) + 1)”, which would copy that value into an argument list object, which, in turn, it would pass to the function node for f. Sheesh!
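The cost of walking “f((x) + 1)” can be sketched by counting node visits. This is only an illustration: the tuple encoding of the tree, the node kinds, and the `counter` dictionary are assumptions, not the old interpreter's data structures.

```python
def evaluate(node, env, counter):
    """Recursively evaluate a tuple-encoded syntax tree, counting node visits."""
    counter["visits"] += 1
    kind = node[0]
    if kind == "var":
        return env[node[1]]
    if kind == "num":
        return node[1]
    if kind == "paren":
        # A parentheses node does no work of its own; it just forwards a value.
        return evaluate(node[1], env, counter)
    if kind == "add":
        return evaluate(node[1], env, counter) + evaluate(node[2], env, counter)
    if kind == "args":
        # Argument values are copied into a fresh list object.
        return [evaluate(arg, env, counter) for arg in node[1]]
    if kind == "call":
        return env[node[1]](*evaluate(node[2], env, counter))
    raise ValueError(f"unknown node kind {kind!r}")


# f((x) + 1)
tree = ("call", "f", ("args", [("add", ("paren", ("var", "x")), ("num", 1))]))
counter = {"visits": 0}
result = evaluate(tree, {"x": 41, "f": lambda n: n * 2}, counter)
print(result, counter["visits"])  # 84 6
```

Six visits are needed for one addition and one call, and the value of x is handed upward through several nodes before it is used.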

In our first rounds of optimization, we squeezed out as much performance as we could without changing this underlying architecture. Doing so allowed us to regression test each optimization we wrote. It also set a very high bar for any replacement technology. Finally, having realized the full potential of the syntax tree architecture, we switched to bytecode.

SquirrelFish’s bytecode engine elegantly eliminates almost all of the overhead of a tree-walking interpreter. First, a bytecode stream exactly describes the operations needed to execute a program. Compiling to bytecode implicitly strips away irrelevant grammatical structure. Second, a bytecode dispatch is a single direct memory read, followed by a single indirect branch. Therefore, executing a bytecode instruction is much faster than visiting a syntax tree node. Third, with the syntax tree gone, the interpreter no longer needs to propagate execution state between syntax tree nodes.
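A register-based bytecode loop can be sketched in a few lines of Python. The opcode names and the four-field instruction layout below are hypothetical and are not SquirrelFish's real instruction set; the point is only that the instruction stream lists operations directly, with no nodes left over for blocks or parentheses.

```python
# Hypothetical opcodes for illustration only.
LOAD_CONST, ADD, LESS, JUMP_IF_FALSE, JUMP, RETURN = range(6)


def run(code, registers):
    """Execute a flat list of (opcode, a, b, c) instructions over a register list."""
    pc = 0
    while True:
        op, a, b, c = code[pc]
        pc += 1
        if op == LOAD_CONST:
            registers[a] = b
        elif op == ADD:
            registers[a] = registers[b] + registers[c]
        elif op == LESS:
            registers[a] = registers[b] < registers[c]
        elif op == JUMP_IF_FALSE:
            if not registers[a]:
                pc = b
        elif op == JUMP:
            pc = a
        elif op == RETURN:
            return registers[a]
        else:
            raise ValueError(f"unknown opcode {op}")


# Roughly: i = 0; while (i < 5) { i = i + 1; } return i;
# r0 = i, r1 = limit, r2 = step, r3 = comparison result.
program = [
    (LOAD_CONST, 0, 0, None),     # 0: r0 = 0
    (LOAD_CONST, 1, 5, None),     # 1: r1 = 5
    (LOAD_CONST, 2, 1, None),     # 2: r2 = 1
    (LESS, 3, 0, 1),              # 3: r3 = r0 < r1
    (JUMP_IF_FALSE, 3, 7, None),  # 4: if not r3, go to 7
    (ADD, 0, 0, 2),               # 5: r0 = r0 + r2
    (JUMP, 3, None, None),        # 6: go to 3
    (RETURN, 0, None, None),      # 7: return r0
]
result = run(program, [None] * 4)
print(result)  # 5
```

Note that the braces of the loop body produce no instruction at all, and intermediate values stay in registers instead of being returned from node to node.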

The bytecode’s register representation and calling convention work together to produce other speedups, as well. For example, jumping to the first instruction in a JavaScript function, which used to require two C++ function calls, one of them virtual, now requires just a single bytecode dispatch. At the same time, the bytecode compiler, which knows how to strip away many forms of intermediate copying, can often arrange to pass arguments to a JavaScript function without any copying.
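One way to picture a sliding register window is a single shared register file in which the callee's window starts where the caller placed the outgoing arguments. The sketch below assumes that layout; the function record, the `call` helper, and the offsets are hypothetical and do not describe SquirrelFish's actual frame format.

```python
def call(function, registers, caller_base, arg_offset):
    """Run `function` with its window slid to where the caller put the arguments."""
    callee_base = caller_base + arg_offset  # the window slides; the args stay put
    needed = callee_base + function["num_registers"]
    if len(registers) < needed:
        registers.extend([None] * (needed - len(registers)))
    return function["body"](registers, callee_base)


def add_body(registers, base):
    # The parameters are r0 and r1 of the callee's window; r2 is a temporary.
    registers[base + 2] = registers[base + 0] + registers[base + 1]
    return registers[base + 2]


add_function = {"num_registers": 3, "body": add_body}

# Caller frame: r0 = x, r1 = y, r2 and r3 = outgoing argument slots.
registers = [9, 16, None, None]
caller_base = 0
# The argument expressions are computed straight into the outgoing slots.
registers[caller_base + 2] = registers[caller_base + 0] + 1  # x + 1
registers[caller_base + 3] = registers[caller_base + 1] * 2  # y * 2
result = call(add_function, registers, caller_base, arg_offset=2)
print(result)  # 42
```

Because the caller's argument slots become the callee's first registers, no argument list object is built and no values are copied at the call.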

Just the Beginning

In a typical compiler, conversion to bytecode is just a means to an end, not an end in itself. The purpose of the conversion is to “lower” an abstract tree of grammatical constructs to a concrete vector of execution primitives, the latter form being more amenable to well-known optimization techniques.
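The lowering step can be sketched as a one-pass compiler from a tuple-encoded tree to a list of register instructions. Everything here is an assumption for illustration (the tree encoding, the instruction names, and the register numbering), and the treatment of local variables is only one plausible reading of "built-in copy propagation".

```python
def compile_expr(node, local_registers, code, next_temp):
    """Emit code for `node`; return (register holding the result, next free temp)."""
    kind = node[0]
    if kind == "var":
        # Copy propagation: use the local's own register and emit no move.
        return local_registers[node[1]], next_temp
    if kind == "paren":
        # Purely grammatical structure disappears during lowering.
        return compile_expr(node[1], local_registers, code, next_temp)
    if kind == "num":
        code.append(("load_const", next_temp, node[1]))
        return next_temp, next_temp + 1
    if kind == "add":
        left, next_temp = compile_expr(node[1], local_registers, code, next_temp)
        right, next_temp = compile_expr(node[2], local_registers, code, next_temp)
        code.append(("add", next_temp, left, right))
        return next_temp, next_temp + 1
    raise ValueError(f"unknown node kind {kind!r}")


local_registers = {"x": 0, "y": 1}  # x lives in r0, y in r1; temps start at r2
code = []
tree = ("add", ("paren", ("var", "x")), ("num", 1))  # (x) + 1
result_register, _ = compile_expr(tree, local_registers, code, 2)
print(code)  # [('load_const', 2, 1), ('add', 3, 0, 2)]
```

The parentheses and the variable reference produce no instructions; only the constant load and the addition remain in the output vector.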

Therefore, though we’re very happy with SquirrelFish’s current performance, we also believe that it’s just the beginning. Some of the compile-time optimizations we’re looking at, now that we have a bytecode representation, include:

  • constant folding
  • more aggressive copy propagation
  • type inference—both exact and speculative
  • specialization based on expression context—especially void and boolean context
  • peephole optimization
  • escape analysis

This is an interesting problem space. Since many scripts on the web are executed once and then thrown away, we need to invent versions of these optimizations that are simple and efficient. Moreover, since JavaScript is such a dynamic language, we also need to invent versions of these optimizations that are resilient in the context of an unknown environment.
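As an example of the first item in the list above, constant folding can be sketched as a single recursive pass over a tuple-encoded tree. The encoding and the `fold` function are hypothetical, and the sketch folds only numeric literals; a real JavaScript implementation would also have to respect the language's string and floating-point semantics.

```python
import operator

# Operators this sketch is willing to evaluate at compile time.
FOLDABLE = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold(node):
    """Return a copy of the tree with constant subexpressions precomputed."""
    if node[0] != "binop":
        return node  # ("num", value) or ("var", name): nothing to fold
    _, symbol, left, right = node
    left, right = fold(left), fold(right)
    if left[0] == "num" and right[0] == "num" and symbol in FOLDABLE:
        return ("num", FOLDABLE[symbol](left[1], right[1]))
    return ("binop", symbol, left, right)


# x * (60 * 60) becomes x * 3600
tree = ("binop", "*", ("var", "x"), ("binop", "*", ("num", 60), ("num", 60)))
folded = fold(tree)
print(folded)  # ('binop', '*', ('var', 'x'), ('num', 3600))
```

A pass like this touches each node once, which fits the constraint that optimizations for run-once scripts be simple and cheap.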

We’re also looking at further optimizing the virtual machine, including:

  • constant pool instructions
  • superinstructions
  • instructions with implicit register operands
  • advanced dispatch techniques, like instruction duplication and context threading
  • getting computed goto working on Windows

Performance on Windows has extra room to grow because the interpreter on Windows is not direct-threaded yet. In place of computed goto, it uses a switch statement inside a loop.
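Python has no computed goto, so the difference can only be shown by analogy. In the sketch below, `run_switch` decodes each opcode with a chain of comparisons, like a switch statement inside a loop, while `thread` rewrites the instruction stream once so that each instruction carries a direct reference to its handler. Calling stored handlers from a loop is closer to what the literature calls call threading than to true direct threading, and all names here are hypothetical.

```python
def op_load(regs, a, b):
    regs[a] = b


def op_add(regs, a, b):
    regs[a] = regs[a] + regs[b]


LOAD, ADD = 0, 1
HANDLERS = {LOAD: op_load, ADD: op_add}


def run_switch(code, regs):
    """Decode every opcode with a chain of comparisons on each iteration."""
    for op, a, b in code:
        if op == LOAD:
            op_load(regs, a, b)
        elif op == ADD:
            op_add(regs, a, b)
        else:
            raise ValueError(f"unknown opcode {op}")
    return regs


def thread(code):
    """One-time pass: replace opcode numbers with handler references."""
    return [(HANDLERS[op], a, b) for op, a, b in code]


def run_threaded(threaded_code, regs):
    """Dispatch by calling the stored handler; no opcode decoding remains."""
    for handler, a, b in threaded_code:
        handler(regs, a, b)
    return regs


program = [(LOAD, 0, 40), (LOAD, 1, 2), (ADD, 0, 1)]
switch_result = run_switch(program, [None, None])
threaded_result = run_threaded(thread(program), [None, None])
print(switch_result, threaded_result)  # [42, 2] [42, 2]
```

Both loops compute the same result; the threaded form simply moves the opcode lookup out of the hot loop and into a one-time translation step.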

Getting Involved

If you’re interested in compilers or virtual machines, this is a great project to join. We’re moving quickly, so the best way to come up to speed is to log on to our IRC channel.

As always, testing out nightly builds and reporting bugs is also a great help.

Extra Bonus Updates

We’ve got some extra bonus info: very early draft documentation of the SquirrelFish VM’s opcodes. If you know about VMs, you may find this enlightening; if you don’t, you may find it simpler than you expect.

In addition, we have a detailed comparison of Safari 3.1 vs. SquirrelFish. Looking at the individual tests, it is interesting to see which sped up the most. If you look at this comparison to Safari 3.0, you can see that we’ve sped up 4.34x overall since Safari 3, and have improved some kinds of code by over an order of magnitude.

SquirrelFish around the web: There’s lots of interesting discussion in the reddit article about this post. Key SquirrelFish developer Cameron Zwarich has posted performance data and other info, as has occasional WebKit contributor Charles Ying.

67 Responses to “Announcing SquirrelFish”

  1. cying Says:

    Holy Jebus that’s fast!

  2. mazdak Says:

    This is a topic of great interest to me. I’d appreciate it if you could post the names of some of the “stacks of terrible books and papers” that you pored over :)

  3. ggaren Says:

    @mazdak

    So much has been written on these topics. Without knowing what you’re working on, I’m not sure what to recommend.

    If you liked the articles I posted, here are some good follow-ups:

    * Everything involving Ertl and/or Gregg tends to be well-researched, detailed, and relevant. You can start at Ertl’s ACM bibliography.

    * For more on Lua, you can check out The Evolution of Lua.

    In general, steer toward articles with detailed benchmarks, and you can’t go wrong!

  4. timofonic Says:

    Does this have something in common with LLVM? Also known as the revolution of compilers and such :)

  5. asdasdwr Says:

    I have no idea what this is about, but I love the picture.

  6. webjive Says:

    It may be fast but, on some gallery pages with sliding images, WebKit pegs the ol CPU meter still. That’s a biggie for me and my customers, CPU usage.

  7. Matt Lilek Says:

    @webjive

    If you have a site that consistently uses excessive system resources, please file a bug on bugs.webkit.org

  8. wpbasti Says:

    Have you ever wondered whether it makes sense, or is at least possible, to cache this byte-compiled code instead of caching the original source file? Have you done any experiments in this field? Applications built with huge JavaScript application frameworks (like qooxdoo) may have multiple large JS files of more than 1MB each (un-gzipped). Storing a “byte-compiled” version in the cache may make sense for files of that size.

  9. Oliver Says:

    @wpbasti: An issue with that is that we optimise lookup of global values, which may not be valid if the load order of such files is different. That said it is possible that in the future we may be able to appropriately update such references. Another thing to consider is of course the fact that we don’t actually compile functions until they’re called, and even then the time to compile any given function is typically tiny compared to the time required to execute it.

  10. Cameron Zwarich Says:

    It’s great to finally see a post about SquirrelFish on the WebKit blog. I made a short post to kick off a new blog that I will hopefully use to talk about ongoing JavaScriptCore development. The first post includes some SunSpider numbers for the bleeding edge versions of different browsers, which may be of interest to people reading this post.

  11. Pingback from Peter Van Dijck’s Guide to Ease » Blog Archive:

    [...] The best logo ever (via Simon). [...]

  12. yellowiscool Says:

    It’s excellent !

    Is it possible to use SquirrelFish for embedding, like SpiderMonkey?

  13. Pingback from andrewskinner.name:

    [...] Safari Webkit team for calling their new super fast Javascript engine – SquirrelFish. Seriously – you can’t get more hard core geek than that. Can you? Despite the name the [...]

  14. iFrodo Says:

    Is it normal that the Web Inspector is not available in the latest nightly build?

  15. Maciej Stachowiak Says:

    @yellowiscool

    Yes, it is possible to embed it in your own programs – the JavaScriptCore library has a public API that works cross-platform.

  16. Pingback from inside looking out » SquirrelFish is faster than Tamarin:

    [...] I compared WebKit’s new SquirrelFish bytecode JavaScript interpreter against Tamarin, the JIT JavaScript engine currently in Flash 9 and in development for [...]

  17. Oliver Says:

    @iFrodo: it should be there, have you checked the context menu, or the Develop menu? Or are you referring to Drosera? I ask because Drosera was recently killed off as we have now integrated the debugger with the web inspector.

  18. mazdak Says:

    @ggaren
    Thanks for the recommendations. I am looking for good introductory material on VM design for now.

  19. Pingback from WebKit continues to soar at Kevin’s blog:

    [...] Surfin’ Safari – Blog Archive » Announcing SquirrelFish [...]

  20. Pingback from Red Sweater Blog - Apple’s Script:

    [...] the rest of the Mac nerd world, I saw the announcement of SquirrelFish as very promising and inspiring news. The WebKit team has massively [...]

  21. Pingback from Michael Tsai - Blog - Announcing SquirrelFish:

    [...] new JavaScript engine: SquirrelFish is a register-based, direct-threaded, high-level bytecode engine, with a sliding [...]

  22. jeffr Says:

    What’s really exciting is that this was started and finished in two months. That is really a testament to the quality of the webkit team.

  23. Pingback from SquirrelFish: WebKit auf der Überholspur » MACNOTES.DE:

    [...] lies, today presented WebKit’s new JavaScript engine, called SquirrelFish, on their blog. The new interpreter runs up to 4 times faster than the one in Safari 3, and still 1.6 [...]

  24. coolfactor Says:

    Way to go, WebKit team!

    If I’m not mistaken, wasn’t WebKit’s JavaScript already head-to-head with or faster than competing browsers? If so, this is yet another mark for them to reach for.

    Fantastic!

  25. Pingback from David Mandelin’s blog » Blog Archive » SquirrelFish:

    [...] you’re reading this, chances are that you already know about SquirrelFish, Appl/WebKit’s new Javascript implementation. Early tests show SquirrelFish to be 60% faster [...]

  26. Pingback from Safari’s New JS Interpreter: SquirrelFish | Robert Accettura’s Fun With Wordage:

    [...] an announcement on the Safari blog about SquirrelFish, their new JS interpreter. To sum it up: SquirrelFish is a register-based, direct-threaded, [...]

  27. Ajay Says:

    I’m running the May 31st WebKit build; I don’t know if this includes SquirrelFish.

    Safari is holy cow fast. Damn, I didn’t know browsers of the past were so inefficient.

  28. randallfarmer Says:

    I’m unable to receive a WebKit bugzilla login at GMail (twotwotwo plus webkitbugs at gmail).

    Since I figure an incorrectly-submitted bug report beats none: In Safari 3.1.1 (but not Firefox 3 RC 1), trying to set the body of my GMail vacation message to “I’ll be away” makes it save as “I’ll b away”, reproducibly. If I try to set the away message “‘ABCDEFGHIJ” (note the leading quote), the “A” is dropped. (My vacation subject is “I’m away until June 17” if that’s necessary to repro.)

    Hope this is really a WebKit bug and I’m not being silly. It’s a great product.

  29. Pingback from WebKit’s JS Interpreter is shockingly slow « Handwaving:

    [...] interpreter in Apple’s WebKit, which is the engine for the Safari browser, is very slow. Go here, scroll down to “Why It’s Fast”. They were interpreting the syntax tree rather [...]

  30. Pingback from Johnny Chadda .se : Squirrelfish Javascript engine in Webkit — speeds up Safari:

    [...] SquirrelFish’s bytecode engine elegantly eliminates almost all of the overhead of a tree-walking interpreter. First, a bytecode stream exactly describes the operations needed to execute a program. Compiling to bytecode implicitly strips away irrelevant grammatical structure. Second, a bytecode dispatch is a single direct memory read, followed by a single indirect branch. Therefore, executing a bytecode instruction is much faster than visiting a syntax tree node. Third, with the syntax tree gone, the interpreter no longer needs to propagate execution state between syntax tree nodes. – The webkit blog [...]

  31. Thomas Says:

    I must confess I’m very happy about the commotion WebKit in general and SquirrelFish in particular are causing in the Javascript engine realm. Seemingly, new wine will again not go into old wineskins, and it needed a fairly fresh endeavor to bring Javascript closer to a level it deserves. In contrast, the Mozilla project seems to suffer from a certain stiffness in that regard, despite Tamarin and all that, and the amount of love Firefox’s engine receives leaves me disappointed. Which is even more surprising since all of the browser’s chrome runs on top of it. All experiences from similar runtime environments (e.g. Emacs, Eclipse, basic operating systems,…) seem to be ignored and have to be gathered again. When will they start running multiple interpreter instances (or at least worker threads) in the browser, to isolate chrome and different pages from each other?! Will WebKit do it? – Anyway, way to go, WebKit!

  32. Pingback from the 4th floor » Blog Archive » squirrelfish:

    [...] squirrelfish is a JavaScript engine for browsers that is currently being developed by the WebKit open source project. Safari, among others, uses this engine, whereas Firefox is currently working on tamarin. squirrelfish appears to be far faster than Tamarin, though, and that is also explained on the WebKit homepage, albeit all very technically, and at some point you simply tune out, but it is interesting nonetheless…for computer scientists…some… [...]

  33. gfan Says:

    The interpreter speedup is impressive. Here are my SunSpider benchmark results on Linux/Debian:
    Jun 4 Webkit trunk Total: 5842.2ms +/- 12.5%
    Mar 12 Webkit trunk Total: 7864.6ms +/- 3.1%

    But the SunSpider benchmark only reports the interpretation time. For JavaScript, most scripts are compiled on the fly, so compilation time is also important. It would be more convincing if the benchmark considered both interpretation time and compilation time. That said, given code caching, speeding up the interpreter is more important.

  34. Mark Rowe Says:

    @gfan: The SunSpider execution time *already* includes parsing, compilation, and execution.

  35. Oliver Says:

    @gfan: SunSpider reports the complete time to compile and run the tests, in JSC it is basically impossible not to as we lazily compile functions, i’m not sure why you think it does otherwise.

  36. king7532 Says:

    I’m interested in whether this has anything to do with LLVM. Someone mentioned that already; can anyone confirm or deny? :)

  37. Mark Rowe Says:

    Squirrelfish does not make any use of LLVM.

  38. gfan Says:

    @Mark Rowe and @Oliver: When I traced the SunSpider test cases on SpiderMonkey before, it seemed to me that by the time it executes the first line to record the time, the script is already compiled. I thought this was the same for SquirrelFish. Sorry for my confusion. Thanks.

  39. mcroft Says:

    This is really impressive! I’ve seen some comparisons to Tamarin around the web, and in my own testing with apps it seems both solid and fast.

    I’ve attempted to use squirrelfish with the latest public beta with a number of existing javascript benchmarks and I’m seeing a number of tests with results of 0 which, when I repeat the tests, either stay at zero or go to 16 ms. Is there a lower limit on measurability using the timing functions?

    Are there known functional differences between the prior engine (JavaScriptCore? is that what it was called?) and SquirrelFish? Even things that used to be broken that you fixed.

    Are there areas where we should expect dramatic speed increases that should change how JS developers design code? There are certainly costly choices that we now avoid.

    Oh, and I hope that we’ll see this on the iPhone. That’s a device whose javascript performance could use a speedup.

  40. Pingback from SquirrelFish: Webkit’s new Javascript Interpreter | /dev/amro:

    [...] Webkit’s JS interpreter has just hit the big time — it’s now a full blown vm instead of a syntax-tree-walker like the other slow-pokes.  [...]

  41. Oliver Says:

    @mcroft: based on what you’re seeing i’m guessing that you’re testing on windows, which for some reason seems to be limited to only 16ms accuracy in some circumstances :-/

  42. Pingback from SquirrelFish could make Safari a lot faster | RatZine - Rat stinkin news:

    [...] is about to get a whole lot faster. The Surfin’ Safari weblog has written about SquirrelFish, the code name (and what a code name it is) for the new interpreter for WebKit’s core [...]

  43. Pingback from Re: The Fight » Blog Archive » Where our hero does some hand waving…:

    [...] SquirrelFish – So awesome. Those webkit guys just make my day every frickin time. Too lazy to click the link? SquirrelFish is a new superfast JS vm runtime. Benchmarks show it faster than Tamarin at the moment even.  Not much need for explanation here. The better performance runtimes we get for the open web, the better it can compete against proprietary competition! Ok, I guess that’s enough for now. I really don’t want to turn this into a news aggregation blog, regurgitating things that I think are cool. You can just go to Ajaxian to see where I get MY news from. However, news regurgitation is easy, and I needed to write something. Also, I feel like such a negative nancy sometimes and I thought a positive post would be nice for a change. [...]

  44. mcroft Says:

    @Oliver: thanks! Yes, I’m testing on Win XP, so that explains that.

  45. Pingback from » Nuovo interprete JavaScript per Webkit:

    [...] Announced on the blog of WebKit, the engine of Safari: the introduction of a new JavaScript interpreter code-named SquirrelFish. The new SquirrelFish interpreter is faster (1.6 times faster) than WebKit’s previous JavaScript interpreter, as shown in a graph on the blog. [...]

  46. eflaten Says:

    (off topic)
    Hello, the fast fish sounds great. But I don’t need more speed. I need a Safari that can relax.

    Right now Safari uses 85% CPU and the only thing I am doing with the browser is writing these lines. Is it Flash that hangs from a previous page? Or is it bugs in Safari? There is no Flash as far as I can see. I have done a process sample if anyone is interested.

    I am an editor/writer and don’t know much about programming, but I do know that people don’t like pages that start up fans like an old DC-3. I understand a programmer who wants the application to have as much power as possible, but the overall experience is too hot. I have talked with colleagues, and it is a problem that a page triggers heat and fans. We fear it’s a turn-off among readers. One option we discussed is no Flash on the front page. Right now I am on a PowerBook, but I guess the thermal management is similar on other laptops.

    So if Safari and/or Flash and other apps can do things slower, but cooler, I will support that.

    Lastly, thanks for Safari. A truly great piece of work :)

    – Erland Flaten, Lillehammer. Norway

  47. boxed Says:

    @eflaten: “Higher speed” when it comes to computers means it uses the CPU less. It is the same thing, so yes you _do_ need higher speed :P

    As for flash: if the flash player is doing something in an inefficient way that is unfortunately outside the scope of what the WebCore team does. If you can indeed show that flash is to blame then you probably need to complain to Adobe.

  48. Zach Says:

    Wow! Nice fricken work optimizing Javascript. That’s awesome.

    I’ve got this little canvas demo where you watch circles smash into smaller circles (or the impatient can grab a circle and manually smash it into the other circles). Here’s the default:

    http://tech.no.logi.es/woodshop/momentum6.php

    With the latest Webkit build I can totally crank up the smashing into even smaller bits:

    http://tech.no.logi.es/woodshop/momentum6.php?webkit=1

    Try that link in the new webkit build vs Safari vs Firefox. Pretty apparent difference.

  49. Maciej Stachowiak Says:

    @eflaten

    I would love to see profile data for any page where CPU usage is out of control, especially if it does not appear to be Flash. Even if it is Flash, we can pass the data on to Adobe.

  50. Pingback from Alp Toker » Blog Archive » WebKit Meta: A new standard for in-game web content:

    [...] fastest content rendering around as well as nippy JavaScript execution with the state of the art SquirrelFish VM. The JavaScript SDK is available independently of the web renderer for sandboxed client-side game [...]

  51. Chad von Nau Says:

    There is a bootleg store in China called “Squirrel–shaped Fish”. In China it’s standard procedure for stores to illegally use names and logos from large international brands. With this store, they stole the Lacoste alligator logo, but made their own name. Pretty genius. I don’t know if this was the inspiration for the name squirrel fish, but it should be.

    http://chadvonnau.com/china/4/15.IMG_4485_bootlegs.jpg

  52. Pingback from Coccoa for the Web? « Zayne Humphrey’s Blog.:

    [...] about SproutCore on its official website. Apple also has more details on the new MobileMe, and SquirrelFish details are on the Webkit project site. Possibly related posts: (automatically generated)Adobe [...]

  53. Pingback from Browser War - Part 3: Safari 3.1.1 & Nightlies » Zimbra :: Blog:

    [...] There’s already a developer seed of Safari 4 released. Which includes the SquirrelFish JavaScript interpreter (renamed from GlassFish to avoid confusion with Apple’s other Java stuff). SquirrelFish is a [...]

  54. Pingback from And The Winner of the Browser Wars is…. » Zimbra :: Blog:

    [...] SquirrelFish JavaScript interpreter in Safari 4 is a bytecode engine which eliminates almost all of the overhead of a tree-walking [...]

  55. Pingback from dead fish » Blog Archive » 280slides:

    [...] you’ve seen it is not at all slow, I guess with Firefox 3 and newer versions of Safari / Webkit it should get even faster. The point behind this is that if the foundation stands as-is, its just a [...]

  56. dak Says:

    The numbers for SquirrelFish look pretty impressive. It seems like only yesterday I was reading about its development, but there was no expectation it would be merged into WebKit anytime soon.

    Speaking of other advancements, is there any chance we’ll see a blog post about the new CSS variable support added to build 34666?

  57. some1 Says:

    Hey Apple team, I beg you to develop an updater for Safari for Windows that just updates the changed bits i.e. patches the existing install instead of downloading the whole thing again, uninstalling and reinstalling. As a user, it’s one thing keeping me away from consistently using Safari because when a vulnerability is detected, I can’t continue to use the old version and I am unable to download large files everytime.

  58. Pingback from incompl.com » June Link Dump:

    [...] SquirrelFish. Everyone should look to these folks next time they need inspiration for naming their JavaScript interpreter. And the logo? Spectacular. [...]

  59. Pingback from Macworld | WebApps i Safari 4 eller Fluid:

    [...] Safari 4. Safari 4 may also get the benefit of a brand-new JavaScript interpreter code-named SquirrelFish. When this is implemented depends on development progress, but tests so far show good [...]

  60. Pingback from Ten Big New Features in Mac OS X Snow Leopard — RoughlyDrafted Magazine:

    [...] Leopard Wish List: 2005 How Open will the iPhone Get? Surfin’ Safari » Announcing SquirrelFish Microsoft’s Application Features in Mac OS X, System Wide. Microsoft’s business model [...]

  61. Arley23 Says:

    Is it just me, or has WebKit for Windows not worked since June 09?

  62. brunobl Says:

    It would be great to also work on memory leaks :-)
    http://dotnetperls.com/Content/Browser-Memory.aspx

  63. Pingback from Podcast #11 - stackoverflow:

    [...] excited about the SquirrelFish project, which promises to speed up plain old JavaScript running in the browser dramatically — 1.5 [...]

  64. dicklacara Says:

    Wouldn’t there be a potentially large performance gain by:

    1) pre-compiling the bytecode on the server
    2) serving the bytecode and byte code interpreter, only, to the client
    3) interpreting the bytecode on the client

  65. David Smith Says:

    dicklacara: probably not. The compilation time is pretty minimal, and that would make it so that the bytecode format couldn’t be upgraded in the future, which would make further performance gains harder.

  66. mcroft Says:

    I got SunSpider 0.9 to run on MobileSafari 2.

    Summary and comparison to WebKit post-SquirrelFish and IE7 on my 6 month old Thinkpad…
    Totals:
    iPhone: 148752.0ms +/- 3.9%
    WebKit: 2152.0ms +/- 1.7%
    IE 7 : 35659.8ms +/- 3.4%

    iPhone JavaScript is 4.17 times slower than IE7
    iPhone Javascript is 69.1 times slower than WebKit Nightly
    Internet Explorer 7 is 16.6 times slower than WebKit Nightly

    Detailed Results

  67. Maya Says:

    The secret behind it isn’t LLVM; it’s just Forth. It is the know-how of this programming language, which is well known to some people. If you hear about M. Anton Ertl and David Gregg, and about a very fast direct-threaded interpreter (whether bytecode or not), it is clear: it is Forth know-how. I’m absolutely sure.

    Some Forth systems are the fastest threaded-code, and even direct-threaded-code, interpreters (as virtual stack machines) available. Know-how coming from there speeds up SquirrelFish.