Surfin' Safari

Little overview of WebKit’s CSS JIT Compiler

Posted by Benjamin Poulain on Wednesday, March 26th, 2014 at 8:41 am

When it comes to performance, JavaScript generally gets the headlines. But if you look carefully, web pages are not all JavaScript, and the performance of all the other parts also has a dramatic impact on the user experience. Observing only JavaScript gives a single point of view over the complex mosaic that is web performance.

Making CSS faster and more scalable is an area of research in the WebKit project. The DOM and CSS do not always scale very well and it is sadly still common to see missed frames during complex animations.

A technique we use to speed things up in JavaScript is eliminating slow paths with Just In Time Compilers (JIT). CSS is a great candidate for JIT compilation, and since the end of last year, WebKit compiles certain CSS Selectors on the fly.

This post introduces the CSS Selector JIT, its pros and cons, and how you can help us make CSS better.

How and why compile CSS?

CSS is the language telling the engine how to present the DOM tree on screen. Every web engine uses complex machinery that goes over everything in the DOM and applies the rules defined by CSS.

There is no direct link between the DOM tree and the style rules. For each element, the engine needs to collect a list of rules that apply to it. The way CSS and the DOM tree are connected is through the CSS selectors.

Each selector describes in a compact format what properties are required for an element to get a particular style. The engine has to figure out which elements have the required properties, and this is done by testing the selectors on the elements. As you can imagine, this is a complicated task for anything but the most trivial DOM tree (fortunately WebKit has many optimizations in this area).

So how did we make this faster? A simple solution we picked is to make testing a selector on an element very fast.

The way selector matching used to work is through a software machine, the SelectorChecker instance, taking two inputs: a selector and an input element. Given the inputs, a SelectorChecker goes over each part of the selector, and tries to find the required properties in the tree ending with the input element.

The following illustrates a simplified version of how selector testing used to work:

The problem with SelectorChecker is that it needs to be completely generic. We had a complicated selector interpreter, capable of handling any combination of difficult cases for any selector. Unfortunately, big generic machines are not exactly fast.

When using the CSS JIT, the task of matching a selector is split in two: first compiling, then testing. A JIT compiler takes the selector, does all the complicated computations when compiling, and generates a tiny binary blob corresponding to the input selector: a compiled selector. When it is time to find out whether an element matches the selector, WebKit can just invoke the compiled selector.

The following animation illustrates the same process as above, but using the CSS Selector JIT:

Obviously, all the complexity related to CSS Selectors is still there. The Selector JIT Compiler is a big generic machine, just like SelectorChecker was. What changed is that most of that complexity has been moved to compilation time, and it only happens once. The binary generated at runtime is only as complex as the input selector.
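As a rough illustration of that split, here is a minimal C++ sketch. It is not WebKit’s code: Element, Condition, matchInterpreted and compile are hypothetical names, the selector model only covers tag names, a single class, and the child combinator, and closures stand in for the machine code that the real JIT emits. The interpreter inspects the selector data on every call; the "compiler" inspects it once and returns a matcher that only contains the checks this particular selector needs.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical, heavily simplified element: a tag, one class, a parent link.
struct Element {
    std::string tag;
    std::string className;
    const Element* parent;
};

struct Condition {
    enum Kind { Tag, Class };
    Kind kind;
    std::string value;
};

// A selector is a chain of compounds joined by ">" and stored right to left:
// "ul.menu > li" is { {Tag li}, {Tag ul, Class menu} }.
using Compound = std::vector<Condition>;
using Selector = std::vector<Compound>;

// Interpreter: walks the selector description on every single call.
bool matchInterpreted(const Selector& selector, const Element* element)
{
    for (const Compound& compound : selector) {
        if (!element)
            return false;
        for (const Condition& condition : compound) {
            switch (condition.kind) {
            case Condition::Tag:
                if (element->tag != condition.value)
                    return false;
                break;
            case Condition::Class:
                if (element->className != condition.value)
                    return false;
                break;
            }
        }
        element = element->parent;
    }
    return true;
}

using CompiledSelector = std::function<bool(const Element*)>;

// "Compiler": decides once which checks are needed and chains them together.
CompiledSelector compile(const Selector& selector)
{
    CompiledSelector next = [](const Element*) { return true; };
    // Build from the leftmost compound back to the one tested on the element.
    for (auto compound = selector.rbegin(); compound != selector.rend(); ++compound) {
        CompiledSelector rest = next;
        if (compound != selector.rbegin())
            rest = [next](const Element* e) { return next(e->parent); };
        for (const Condition& condition : *compound) {
            const std::string value = condition.value;
            CompiledSelector after = rest;
            if (condition.kind == Condition::Tag)
                rest = [value, after](const Element* e) { return e->tag == value && after(e); };
            else
                rest = [value, after](const Element* e) { return e->className == value && after(e); };
        }
        // Each compound starts with a null check, so the checks above can dereference freely.
        next = [rest](const Element* e) { return e && rest(e); };
    }
    return next;
}
```

Calling compile() once and then invoking the returned matcher for every element mirrors the compile-then-test flow described above. Nested std::function calls are not actually fast, so the sketch shows where the decisions move, not the speed-up.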

There is beauty in simplicity

Although one might think that employing a JIT always makes execution faster, it is a fallacy. The truth is adding a compiler starts by making everything slower, and the compiler makes up for it by creating very fast machine code. The overall process is only a gain when the combined execution time of the compiler and compiled code is smaller than the execution time of the interpreter (here, SelectorChecker).

When the workload is small, the time taken by the compiler is bigger than the gain. For example, let’s say we have a JIT compiler that is 4 times slower than SelectorChecker, but the compiled code is 4 times as fast as SelectorChecker. Here is the time diagram of one execution:

With this kind of timing, we can run 5 full queries on the old C++ selector checker and still be faster than the JIT.

When the JIT compiler is fast enough and the workload is large enough, the compiled version wins:
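The trade-off can be worked out with the numbers from the example above. The following C++ sketch is only arithmetic: the Costs struct and function names are made up for illustration, and costs are expressed in units of "one SelectorChecker run" (compiling costs 4, a compiled run costs 0.25).

```cpp
#include <cassert>

// All costs are in units of one interpreted SelectorChecker run.
struct Costs {
    double interpreterRun; // one match with the interpreter
    double compile;        // one-time cost of compiling the selector
    double compiledRun;    // one match with the compiled selector
};

double interpreterTotal(const Costs& c, int queries)
{
    return c.interpreterRun * queries;
}

double jitTotal(const Costs& c, int queries)
{
    return c.compile + c.compiledRun * queries;
}

// Smallest number of queries for which compiling first is strictly cheaper.
int firstWinningQueryCount(const Costs& c)
{
    // Without this the JIT could never catch up and the loop would not end.
    assert(c.compiledRun < c.interpreterRun);
    int queries = 1;
    while (jitTotal(c, queries) >= interpreterTotal(c, queries))
        ++queries;
    return queries;
}
```

With Costs{1.0, 4.0, 0.25}, five queries cost 5 units interpreted and 5.25 units with the JIT, so the interpreter is still ahead; from the sixth query on (6 versus 5.5) the compiled version wins. A cheaper compile step moves that crossover earlier.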

This constraint is also a reason why benchmarks running for a long time can be misleading: they can hide slow compilers. JIT compilers can help achieve great throughput for long-running programs, but no real web page behaves that way. The latency introduced by compilation also has the potential to become a disaster for animations.

Does this mean we shot ourselves in the foot by making something that is only fast in benchmarks? Not really, we fixed that problem too :)

There are several ways to mitigate the latency introduced by a JIT compiler. JavaScriptCore uses multiple advanced subsystems to reach that goal. So far, the Selector JIT can get away with a simple solution: make the compiler extremely fast.

There are two key parts to the speed of this compiler.

  1. First, the compiler is very simple. Making optimizations can take a lot of time, so we decided to optimize very little. The generated binary is not perfect but it is fast to produce.
  2. The second trick is to use very fast binary generation. To do that, the compiler is built on top of JavaScriptCore’s infrastructure. JavaScriptCore has tools to generate binaries extremely quickly, and we use that directly in WebCore.

In the most recent versions of the JIT, the compilation phase is within one order of magnitude of a single execution of SelectorChecker. Given that even small pages have dozens of selectors and hundreds of elements, it becomes easy to reclaim the time taken by the compiler.

How fast is it?

To give an idea of order of magnitude, I have prepared a little microbenchmark for this blog. It tests various use cases, including things that used to be slow on WebKit.

On my Retina MacBook Pro, the benchmark runs in about 1100 milliseconds on a WebKit from December, and in less than 500 milliseconds on today’s WebKit nightly. A gain of 2x is generally what we expect on common selectors.

Obviously, the speed gains depend greatly on the page. Gains are sometimes much larger if the old WebKit was hitting one of the slow paths, or could be smaller for selectors that are either trivial or not compiled. I expect a lot to change in the future and I hope we will get even more feedback to help shape the future of CSS performance.

What about querySelector?

The functions querySelector() and querySelectorAll() currently share a large part of infrastructure with style resolution. In many cases, both functions will also enjoy the CSS JIT Compiler.

Typically, the querySelector API is used quite differently from style resolution. As a result, we optimize it separately so that each subsystem can be the fastest for its specific use cases. A side effect of this is that querySelector does not always give a good picture of selector performance for style resolution, and vice versa.

How can you help?

There is ongoing work to support everything SelectorChecker can handle. Currently, some pseudo types are not supported by the JIT compiler and WebKit falls back to the old code. The missing pieces are being added little by little.

There are many opportunities to help make CSS faster. The existing benchmarks for CSS are extremely limited; there is nothing like JSBench for CSS. As a result, the input we get from performance problems on real websites is immensely valuable.

If you are a web developer, or a WebKit enthusiast, try your favorite website with WebKit Nightly. If you run into performance problems with CSS, please file a bug on WebKit’s bug tracker. So far, every single bug that has been filed about the CSS JIT has been hugely useful.

Finally, if you are interested in the implementation details, everything is open source and available on webkit.org. You are welcome to help make the web better :)

You can send questions to @awfulben on Twitter. For more in-depth discussions, you can write an email to webkit-help (or maybe file a bug report).

Advanced layout made easy with CSS regions

Posted by Beth Dakin on Wednesday, October 30th, 2013 at 12:02 pm

Co-written by Beth Dakin and Mihnea-Vlad Ovidenie

CSS regions are an exciting technology that makes it easier than ever to create rich, magazine-like layouts within web content. Regions have been under development in WebKit for a while now, and we’re delighted to tell you that they are available for use in Safari on iOS 7, Safari 7 on Mavericks, and Safari 6.1 on Mountain Lion.

Magazine-like layout

So I wrote this little article for my personal blog:

Document in regions

That’s cool and all, but wouldn’t it be so much cooler if it had a more interesting layout like this?

Document in regions

So fab! Without regions, achieving a layout like this is a pain. You have to figure out exactly which parts of the article can fit into each box and then hard-code the article content into the appropriate boxes. And after all that work, the design will get totally messed up if the user changes the font size! The layout looks cool, but doing it this way is a lot of work, and it isn’t even a little bit flexible.

Regions make achieving this layout as easy as pie. They allow authors to indicate that some sections of content are intended to define an overall layout template for a portion of the document and that other sections of markup represent the content that is intended to fill that template. The semantically-related content that will flow through the template is called a “named flow.” In our example above, the named flow is the text of my article. Once it has been named, the named flow is distributed into disjointed containers called regions, which can be positioned in any way to achieve the desired layout.

Our simple example only scratches the surface of what you can do with regions. We’ll get to more sophisticated applications later, but first let’s take a closer look at the code.

What is a named flow?

A named flow is a collection of HTML elements extracted from the document’s normal flow and displayed separately. Any HTML element can be part of a named flow. When an element is collected in a named flow, all of its children are collected with it.

You identify a collection of HTML elements as a named flow by using the CSS property -webkit-flow-into. In our example, the named flow will be the elements that contain the text of our article:

<style>
    #flow-content { -webkit-flow-into: pizza-manifesto; }
</style>
<div id="flow-content"><h1>Pizza is amazing</h1>...</div>

Our example only has one, but a document can have any number of named flows, each with its own name.

Flowing into regions

A region is a block-level element that displays content from a named flow instead of its own content. Regions can have any size and can be positioned anywhere in the document. They are not required to be siblings or to be positioned next to each other in the layout.

A region consumes content from a single named flow. Most of the time, to achieve an interesting layout, there will be more than one region associated with a named flow, and when that is the case those regions form a region chain. When content from a named flow does not fit into a region, the content simply flows into the next region in the chain.
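To make the idea of a region chain concrete, here is a toy C++ model. It is not how WebKit lays out regions: it assumes content can be counted in whole lines and that each region holds a fixed number of them, and FlowResult and distribute are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct FlowResult {
    std::vector<int> linesPerRegion; // lines shown by each region, in chain order
    int leftover;                    // lines that did not fit in any region
};

// Fills each region in chain order; whatever does not fit moves on to the next one.
FlowResult distribute(int contentLines, const std::vector<int>& regionCapacities)
{
    FlowResult result;
    result.leftover = contentLines;
    for (std::size_t i = 0; i < regionCapacities.size(); ++i) {
        assert(regionCapacities[i] >= 0);
        int taken = std::min(result.leftover, regionCapacities[i]);
        result.linesPerRegion.push_back(taken);
        result.leftover -= taken;
    }
    return result;
}
```

For example, 10 lines flowed into three regions of 4 lines each end up as 4, 4 and 2 lines with nothing left over, while the same content in two regions of 3 lines leaves 4 lines without a region to go to.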

Making an element a region is as easy as adding the -webkit-flow-from CSS property. In our example, the regions are the elements that form the layout template for the document’s overall design:

<style>
    .region { -webkit-flow-from: pizza-manifesto; }
</style>
<div class="region" id="region-1"></div>
<img src="pizza.jpg" width=512 height=342>
<div class="region" id="region-2"></div>
<div class="region" id="region-3"></div>
<div class="region" id="region-4"></div>
<div class="region" id="region-5"></div>

Take a look at the code for the actual document to see the code for the regions side-by-side with the code for the named flow.

One key thing to remember about regions is that they are only visual containers. Region elements do not become the DOM parents of the elements flowed inside them; they only establish the bounding boxes that visually constrain the flow content.

Advanced regions features

One cool feature in the CSS Regions specification is region styling. With region styling, a designer can style the content based on which region it ends up flowing through. For example, if you wanted to change the color of the text displayed in the second region of the article flow, you could do so with region styling:

<style>
    @-webkit-region #region-2 {
        p { color: green; }
    }
</style>

The extra styles are dynamically applied behind the scenes whenever the layout of the article content in the regions changes. So for example, if the user resizes the browser window and different pieces of content end up flowing through the styled region, the content will update dynamically. At this time, you can only style regions with the CSS properties color and background-color, but we intend to progressively add support for more properties, so stay tuned! In the meantime, check out this version of our article that uses region styling.

There is also a whole object model available for interacting with regions and named flows from within JavaScript. The proposed API will make it even easier to create fluid designs that adapt to layout changes. For example, authors can use the API to determine whether or not there are enough regions to display the content from the named flow. Handy stuff!

Dreaming with regions

CSS regions are powerful, and when they are combined with other advanced CSS features like shapes, filters, flexible boxes, transforms, and media queries, incredibly sophisticated designs can emerge.

Back in February, during a CSS regions pattern rodeo hosted by CodePen, Tyler Fry and Joshua Hibbert created some awesome regions demos. Tyler won the contest with his reading carousel made out of regions and transforms, and Joshua created an exploding book, featuring a nice hover effect when opening the book.

The Adobe WebPlatform team has created some very compelling demos with regions in partnership with National Geographic. Check out this article that seamlessly integrates text and photographs to create a flexible design. Adobe has also created a demo that is so cutting edge you will need to download a WebKit Nightly to view it properly. This beautiful prototype uses regions, so the article content breaks up automatically across the different containers, and if font size or window size changes, or if the user zooms in, everything reflows automatically. Check out the source code here!

We are so excited about regions as a technology and excited that they are already available for use in shipping browsers. We plan to continue to refine our implementation in WebKit and to add additional features, so be sure to check back for improvements.

WebKit and C++11

Posted by Anders Carlsson on Friday, September 6th, 2013 at 2:36 pm

I am happy to announce that as of r155146, we now require our various ports to build with compilers that support some C++11 features. This means that we don’t need to use the COMPILER_SUPPORTS() macro to conditionally use these features anymore. These are:

  • Type inference
  • Static assertions
  • Move semantics

We’ve chosen these three features because they are well-supported in recent versions of the compilers we use: clang, MSVC and GCC.

What does this mean for people writing code? Here are three code examples where these three features come in handy:

Type inference

Type inference using the auto keyword will automatically deduce the type of a variable based on its initializer. This is especially useful for iterators. This loop:

HashMap<OriginStack, OwnPtr<ExecutionCounter> >::const_iterator end = m_counters.end();
for (HashMap<OriginStack, OwnPtr<ExecutionCounter> >::const_iterator iter = m_counters.begin(); iter != end; ++iter) {
    ...
}

becomes

for (auto it = m_counters.begin(), end = m_counters.end(); it != end; ++it) {
    ...
}

Unfortunately, the new range-based for syntax is not supported by all compilers, but this is definitely a step in the right direction.
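For readers who want to try this outside of WebKit, here is a small self-contained sketch of the same loop shape. It uses std::map from the standard library rather than WTF’s HashMap, and totalCount is a made-up function name.

```cpp
#include <cassert>
#include <map>
#include <string>

// Adds up every value in the map. Because `counters` is const, `auto` deduces
// std::map<std::string, int>::const_iterator for both `it` and `end`.
int totalCount(const std::map<std::string, int>& counters)
{
    int total = 0;
    for (auto it = counters.begin(), end = counters.end(); it != end; ++it)
        total += it->second;
    return total;
}
```

Both variables in the for statement must deduce to the same type for this to compile, which is the case here since begin() and end() return the same iterator type.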

Static assertions

static_assert is a way to declare compile-time assertions. If an assertion is false, the compiler will produce an error. WTF already has a COMPILE_ASSERT macro that provides this functionality, but static_assert produces better error messages.

COMPILE_ASSERT(sizeof(AtomicString) == sizeof(String), atomic_string_and_string_must_be_same_size);

gives the error

/Source/WTF/wtf/text/AtomicString.cpp:43:1: error: 'dummyatomic_string_and_string_must_be_same_size' declared as an array with a negative size
COMPILE_ASSERT(sizeof(AtomicString) == sizeof(String), atomic_string_and_string_must_be_same_size);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /Source/WTF/wtf/text/AtomicString.cpp:23:
In file included from /Source/WTF/config.h:62:
In file included from /Source/WTF/wtf/FastMalloc.h:25:
In file included from /Source/WTF/wtf/PossiblyNull.h:29:
/Source/WTF/wtf/Assertions.h:324:60: note: expanded from macro 'COMPILE_ASSERT'
#define COMPILE_ASSERT(exp, name) typedef int dummy##name [(exp) ? 1 : -1]

Whereas

static_assert(sizeof(AtomicString) == sizeof(String), "AtomicString and String must have the same size");

gives

/Source/WTF/wtf/text/AtomicString.cpp:43:1: error: static_assert failed "AtomicString and String must have the same size"
static_assert(sizeof(AtomicString) == sizeof(String), "AtomicString and String must have the same size");
^             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Move semantics (rvalue references and move constructors)

Move semantics can provide improved performance when passing objects by value by moving the data instead of copying it. What it means for WebKit is that we can stop using out parameters in functions that return Vectors. For example:

void HTMLFormElement::getNamedElements(const AtomicString& name, Vector<RefPtr<Node> >& namedItems)
{
    // http://www.whatwg.org/specs/web-apps/current-work/multipage/forms.html#dom-form-nameditem
    elements()->namedItems(name, namedItems);

    HTMLElement* elementFromPast = elementFromPastNamesMap(name);
    if (namedItems.size() == 1 && namedItems.first() != elementFromPast)
        addToPastNamesMap(toHTMLElement(namedItems.first().get())->asFormNamedItem(), name);
    else if (elementFromPast && namedItems.isEmpty())
        namedItems.append(elementFromPast);
}

becomes

Vector<RefPtr<Node>> HTMLFormElement::namedElements(const AtomicString& name)
{
    // http://www.whatwg.org/specs/web-apps/current-work/multipage/forms.html#dom-form-nameditem
    Vector<RefPtr<Node>> namedItems = elements()->namedItems(name);

    HTMLElement* elementFromPast = elementFromPastNamesMap(name);
    if (namedItems.size() == 1 && namedItems.first() != elementFromPast)
        addToPastNamesMap(toHTMLElement(namedItems.first().get())->asFormNamedItem(), name);
    else if (elementFromPast && namedItems.isEmpty())
        namedItems.append(elementFromPast);

    return namedItems;
}

Note that this may have been true in the past in some cases too, due to the named return value optimization, but now it’s safe to do this for all Vectors with a zero inline capacity, as well as for HashMap and HashSet!
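The same pattern can be tried with the standard library. The sketch below uses std::vector instead of WTF’s Vector, and evenValues and moveReusesBuffer are hypothetical helpers written for this illustration.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Returns its result by value instead of filling an out parameter. On return
// the local vector is moved (or the copy is elided), not copied element by element.
std::vector<int> evenValues(const std::vector<int>& values)
{
    std::vector<int> result;
    for (auto it = values.begin(), end = values.end(); it != end; ++it) {
        if (*it % 2 == 0)
            result.push_back(*it);
    }
    return result;
}

// Shows what a move does: the destination takes over the source's buffer.
bool moveReusesBuffer()
{
    std::vector<int> source(1000, 7);
    const int* buffer = source.data();
    std::vector<int> destination = std::move(source);
    return destination.data() == buffer && destination.size() == 1000;
}
```

After the std::move, source is left in a valid but unspecified state, so the sketch only inspects destination.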

Move semantics is an interesting topic that I hope to cover further in another blog post, so I’ll only mention it briefly here.

One more thing

Astute readers may have noticed another C++11 feature in the previous example that we can now use. With C++11 there is no need to use a space between right angle brackets when closing template argument lists! This means that

OwnPtr<Vector<RefPtr<Node> > > m_childNodes;

becomes

OwnPtr<Vector<RefPtr<Node>>> m_childNodes;

Personally I’m really excited about using these features and I think they will be useful throughout the codebase. In time we’re going to start requiring even more C++11 features but this is a good start.

Reference Radness: What’s up().with().all().this() in WebKit lately?

Posted by Andreas Kling on Tuesday, August 27th, 2013 at 2:36 pm

About a month ago, I had a moment of clarity and started converting WebKit code to using references instead of pointers when passing around objects that are known to exist. At first I was just playing around, but it gradually got more serious and other people started chipping in patches too.

Darin suggested I write this blog post to clear up any confusion there might be about why/when to use references, so let’s cut to the chase! There are two main reasons for using e.g. Frame& instead of Frame*. Reason number one:

It documents that we are referring to an existing Frame object, and that it’s safe to call member functions on it.

Compare this example:

// WebKit in 2012, so gaudy!
return m_page->mainFrame()->eventHandler()->mousePressed();

…to this:

// WebKit in 2013, so fab!
return m_page.mainFrame().eventHandler().mousePressed();

In the old version, it’s not clear that m_page, mainFrame() and eventHandler() are non-null. In fact, you’d need a pretty good understanding of the WebCore object model to know which pointers could be null and which ones couldn’t. It was typical for functions to be littered with checks like this:

if (!m_page)
    return false;
if (!m_page->mainFrame())
    return false;
if (!m_page->mainFrame()->eventHandler())
    return false;

…which brings me to reason number two:

It exposes unnecessary null checks by turning them into compile errors.

C++ doesn’t let you null check references, so you have no choice but to remove them. This is awesome, because it means smaller and faster code, both binary and source wise. The CPU doesn’t have to spend time checking if the object is really there, and you don’t have to spend time worrying about what to do if it isn’t. Everyone wins!

So when should you be using references?

  • If you were going to return a pointer, but you know it will never be null, make it a reference!
  • If you take a pointer argument, but you don’t want to handle it being null, make it a reference!
  • If your class has a pointer member that never changes after construction, make it a reference!
  • BUT if your code has a RefPtr<Frame>, note that switching to a Frame& would no longer ref/deref the Frame, which may not be what you want!
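Here is a compact sketch of these guidelines. The classes are hypothetical, stripped-down stand-ins that only borrow the names Page, Frame and EventHandler from the examples above; they are not the WebCore classes.

```cpp
#include <cassert>

class EventHandler {
public:
    explicit EventHandler(bool pressed) : m_pressed(pressed) { }
    bool mousePressed() const { return m_pressed; }
private:
    bool m_pressed;
};

class Frame {
public:
    explicit Frame(bool pressed) : m_eventHandler(pressed) { }
    // A Frame always has an EventHandler, so the accessor returns a reference.
    EventHandler& eventHandler() { return m_eventHandler; }
private:
    EventHandler m_eventHandler;
};

class Page {
public:
    explicit Page(Frame& mainFrame) : m_mainFrame(mainFrame) { }
    Frame& mainFrame() { return m_mainFrame; }
private:
    Frame& m_mainFrame; // never changes after construction, so a reference member
};

// Pointer parameter: null is a legal argument, so it has to be handled.
bool mousePressedOrFalse(Page* page)
{
    if (!page)
        return false;
    return page->mainFrame().eventHandler().mousePressed();
}

// Reference parameter: the signature documents that a Page exists.
bool mousePressed(Page& page)
{
    return page.mainFrame().eventHandler().mousePressed();
}
```

Note that the reference member in this toy Page does not keep the Frame alive; that is the same caveat as the RefPtr<Frame> point in the list above.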

My dream is that one day, if I see a -> or a *dereference, there will also be a null check in the preceding lines (or a comment explaining why it’s safe to dereference). This may not be entirely realistic, but let’s see how far we can get.

Improved support for high-resolution displays with the srcset image attribute

Posted by Dean Jackson on Monday, August 12th, 2013 at 11:33 am

WebKit now supports the srcset attribute on image (img) elements (official specification from the W3C). This allows you, the developer, to specify higher-quality images for your users who have high-resolution displays, without penalizing the users who don’t. Importantly, it also provides a graceful fallback for browsers that don’t yet support the feature.

See the new feature in action. Note that you’ll want a recent nightly build of WebKit. I’ve also included part of the demo inline below.

Example of the srcset attribute. The image contains a coloured striped pattern with some inline text that indicates which of the candidate images were selected.

As you may know, WebKit has supported the -webkit-image-set CSS function for more than a year now. It lets CSS properties that take images provide a list of candidate image urls, each with a modifier such as “2x”. This allows the browser to choose the best image for the user’s device. Before this feature, if you wanted to support high-resolution displays you had a few options, each with some downsides. You could duplicate your CSS. You could use JavaScript to query the device pixel ratio and update image resources after the page has loaded, possibly triggering multiple image loads. Or you could hard-code a higher-resolution image and thus penalize some users who were downloading more data than necessary. And if you were providing images of different resolution, there was the added pain of specifying related CSS properties such as the background-size or slices for border-image. It was annoying. Thankfully -webkit-image-set solved this by allowing you to write one simple rule and have the browser do the work of deciding which image to use and therefore which to download. (Hmmm… maybe -webkit-image-set deserves its own blog post! :) )

The srcset attribute on img is very similar to -webkit-image-set. In fact, you can think of it as the markup equivalent to the CSS feature. Just as -webkit-image-set takes a list of candidate images, the new srcset attribute on your image elements takes a list of candidates. It’s designed to be backwards compatible: browsers that don’t support the attribute ignore it, and continue to use the src value. Meanwhile, browser engines such as WebKit can look at the srcset and decide which image best suits your user’s device. In most cases you won’t need anything more than this:

<img src="normal-image.jpg" srcset="better-image.jpg 2x">

Notice the “2x” after “better-image.jpg”? That tells the browser that if you’re on a display with two or more device pixels per CSS pixel, it should use “better-image.jpg” instead of “normal-image.jpg”. And if you’re not on a high-resolution display, the browser will fall back to the value specified in src. You can also specify another candidate for 1x displays in srcset (as shown in the example).
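The rule in the previous paragraph can be written down as a few lines of C++. This is only a model of the behaviour described in this post, not WebKit’s selection code or the algorithm from the specification; Candidate and chooseImage are hypothetical names, and treating src as a 1x candidate is an assumption of the sketch.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Candidate {
    std::string url;
    double density; // the "2x" in srcset becomes 2.0
};

// Picks the densest candidate the display can make use of, falling back to src.
std::string chooseImage(const std::string& src, const std::vector<Candidate>& candidates, double devicePixelRatio)
{
    assert(devicePixelRatio > 0);
    std::string best = src;
    double bestDensity = 1.0; // assumption: src counts as a 1x image
    for (auto it = candidates.begin(), end = candidates.end(); it != end; ++it) {
        if (it->density <= devicePixelRatio && it->density >= bestDensity) {
            best = it->url;
            bestDensity = it->density;
        }
    }
    return best;
}
```

With the markup above, a device pixel ratio of 2 or 3 yields better-image.jpg, while 1 or 1.5 yields normal-image.jpg.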

You can read more about srcset in the official specification. Note that at the moment WebKit only supports the resolution modifiers (e.g. 1x, 2x, 3x). As with any new feature in WebKit there may be bugs, so please be on the lookout for anything that doesn’t behave as expected. In particular, you should verify that WebKit is downloading the minimal resources for a page, because that’s one of the goals of this feature.

Special thanks go to WebKit community members Romain Perier and Yoav Weiss who made important contributions (r153624, r153733) to this feature.