Surfin' Safari

Last week in WebKit: Benjamin left me in charge of choosing a title

Posted by Benjamin Poulain on Monday, August 12th, 2013 at 2:06 am

User visible changes

Sam Weinig removed the JavaScript APIs for Microdata. Microdata failed to gather support from both authors and engine developers. The removal of the APIs had been proposed on webkit-dev and got unanimous support.

Timothy Hatcher modified the rendering of SVG images onto <canvas> to allow the canvas to remain untainted in some cases. Specifically, if an SVG image does not embed foreign content (e.g. HTML), rendering in onto a canvas will not taint the canvas. Any foreign object in the SVG still cause the canvas to be tainted.

And some less dramatic changes:

Engine changes

Any questions? Suggestion for the blogs? Chirp our way to @awesomekling and @awfulben.

Unusual speed boost: size matters

Posted by Benjamin Poulain on Thursday, August 8th, 2013 at 3:43 pm

One of my concerns last year was that while WebKit was getting faster on common benchmarks, some operations were getting slower as a byproduct of new features and optimizations.

WebKit is a fast-moving project, you see new features and optimizations coming daily. A consequence of most of those changes is a rapidly increasing binary size. But does binary size really matter? On a patch-by-patch basis, the growth is harmless and barely detectable. But give it a few months, and you’ll have thousands of patches which together aggregate to a heavier, slower engine.

The following chart shows the evolution of WebCore’s binary size last year. It took about 1.5 months to grow an extra megabyte

The growth of WebKit’s binary size affects speed in 3 ways: memory locality, startup time and memory use.

The memory locality of the instructions affects the CPU caches; the more the code is spread out, the more time the CPU cores are waiting for something to do. Typically, we want to reduce the number of cache misses, being from the instruction-cache, TLB, page faults, etc. Since cache sizes and memory speeds do not increase as fast as our binaries are growing, we cannot count on new hardware to compensate. We need to address the issue in software.

Startup time is impacted by two factors: the most important is loading the binary from disk (which is really slow compared to the speed of a CPU). Memory locality also comes into play because all the caches are cold.

Memory use is quite direct, the more code required by WebKit, the more instructions page loads take place. The memory footprint of the binary is a relatively minor problem; everything isn’t wired into real memory all the time, and a few megabytes increase is a small amount compared to how much memory is used by modern web pages.

How do we fix that? No two patches have been much alike, so I’ll discuss three of the bigger principles below.

Better code

Interestingly, we discovered that in some cases the compiler was doing a “bad job” because the C++ code was preventing the compiler from optimizing. This typically happens when developers try to save a few lines of code instead of writing more explicit algorithms.

A small artificial example will illustrate this better than an explanation. Let’s take the following code:

inline void updateCachedWidth() {
    m_cachedWidth = computeWidth();
    m_cachedWidth *= deviceScaleFactor();
}

From this, the compiler will generate something like:

    call computeWidth()
    store the result at the address of m_cachedWidth
    call deviceScaleFactor()
    load m_cachedWidth from its address
    multiply the two values
    store the result at the address of m_cachedWidth

You may wonder: why is the compiler storing the value in the middle instead of keeping it in a register? This is a common problem: the compiler doesn’t know if the function call will cause m_cachedWidth to be modified, therefore it must re-load the value from memory instead of keeping it in the register.

With a tiny modification:

inline void updateCachedWidth() {
    double newWidth = computeWidth();

    newWidth *= deviceScaleFactor();
    m_cachedWidth = newWidth;
}

We get what we expect:

    call computeWidth()
    keep the result in one available register.
    call deviceScaleFactor()
    multiply the result with the value in register
    store the result at the address of m_cachedWidth

Inline functions

Inline functions are one of the greatest tools for performance, yet one of the most dangerous.

On one hand, inlining gets rid of the register spilling and a jump. This means better register allocation, fewer branches for the CPU to track, more optimization opportunities, etc. When the inlined code is small, the resulting binary is both smaller and more efficient.

On the other hand, if the inlined code is larger than the calling overhead, we end up with bigger binary and the result may or may not be faster than the non-inlined code.

Inlining bigger functions is walking on eggs, you always run the risk of blowing up the instruction cache and making the program much slower than it could be.

There appears to be some tendency to over-inline code to get rid of functions on profilers. While this works great on powerful development computers with huge CPU caches, it does not always fare so well on laptops and embedded systems. It’s also hard to investigate since the problem doesn’t show so clearly in profilers.

Excessive inlining was addressed in two ways. First, for smaller inline functions that are widespread, we worked hard on making them smaller by using different instructions or changing the way their algorithm works. For the bigger functions, we either uninlined them, or split them into a hot inline path and a colder out-of-line function.

Static initializer

Static variables were sometimes costing a lot in binary size without helping the performance at runtime.

Let’s take a simple example:

Object* someObject()
{
    static Object* someObject = new Object();
    return someObject;
}

…which translates into these ARM Thumb instructions:

__Z10someObjectv:
00000ef8	    b5b0	push	{r4, r5, r7, lr}
00000efa	f2401510	movw	r5, 0x110
00000efe	    af02	add	r7, sp, #8
00000f00	f2c00500	movt	r5, 0x0
00000f04	    447d	add	r5, pc
00000f06	    7828	ldrb	r0, [r5, #0]
00000f08	    2801	cmp	r0, #1
00000f0a	    d106	bne.n	0xf1a
00000f0c	f24000fc	movw	r0, 0xfc
00000f10	f2c00000	movt	r0, 0x0
00000f14	    4478	add	r0, pc
00000f16	    6804	ldr	r4, [r0, #0]
00000f18	    e00d	b.n	0xf36
00000f1a	    2008	movs	r0, #8
00000f1c	f000e834	blx	0xf88
00000f20	    4604	mov	r4, r0
00000f22	f7ffffb3	bl	__ZN6ObjectC1Ev
00000f26	f24000de	movw	r0, 0xde
00000f2a	    2101	movs	r1, #1
00000f2c	f2c00000	movt	r0, 0x0
00000f30	    7029	strb	r1, [r5, #0]
00000f32	    4478	add	r0, pc
00000f34	    6004	str	r4, [r0, #0]
00000f36	    4620	mov	r0, r4
00000f38	    bdb0	pop	{r4, r5, r7, pc}

Pretty crazy huh? The code first checks if the object is initialized, then loads the address or initializes the memory (and the object) depending on the results.

Many statics in WebKit exist for good reason – they are objects that need to stick around for the lifetime of the process. But some have been added just for convenience, and we strive to replace those with more straightforward use of the memory.

End results

With the tools in hand, we had a pretty good run at improving the code. As we have seen before, WebCore was the worst offender, let’s see how its size evolved over the last year:

The first month is the uncontrolled growth presented in the introduction. After that the growth slows down as various refactoring introduce dents in the chart. Most of the improvements coming out of those refactoring were quickly overturned by the natural growth of WebCore.

Mid-April, things starts to get serious. There are two majors dents. The first one is the result of several improvements to inlining. The second big drop is the result of removing features and cleaning code that came from the Chromium project.

After that we see a downward trend as engineers better balance code improvements with new features.

Keep in mind that many people were busy adding new features the whole time. Some parts of WebCore grew significantly, others shrank to compensate.

WebKit had some less dramatic changes:

The WebKit library is smaller to begin with. The big drop in October is the result of enabling C++11 by default. Some refactoring allowed the removal of some Objective-C code, but overall changes were quite small.

JavaScriptCore had a steady growth over the last year:

JavaScriptCore gained a lot of new optimizations from the optimizing JIT tier which caused its binary to slowly grow.

Conclusion

Binary size is only an indirect indicator of the overall performance, but looking at it sometimes helps to find big issues that were hidden by the translation from the algorithm in C++ to the machine code.

From the last few months, we at least got some guidelines for future code to avoid blowing up the binary size:

  • Try to be explicit in your coding to help the compiler understand what you are doing.
  • Carefully consider whether you should be inlining large blocks of code.
  • Do not use static initializers for code that is run infrequently and/or have trivial initialization.

Any good binary optimization story? Send it my way to @awfulben.

Lately in WebKit: At least bi-weekly is something!

Posted by Andreas Kling on Tuesday, August 6th, 2013 at 2:01 pm

Hello surfers! What’s been up?

The JavaScriptCore team’s four-tier JIT branch has been merged into trunk by Oliver Hunt. While we’re not blazing up to the 4th level just yet, development has moved to trunk and the future looks very interesting!

Cool user-visible hacks

Cool engine hacks

Cool special mention

That should be it for this time around. As always, you can honk at @awesomekling and @awfulben on twitter with questions/suggestions for these blogs.

Last two weeks in WebKit: breaking code with prefixes

Posted by Benjamin Poulain on Monday, July 22nd, 2013 at 2:17 pm

Last week I had the blog ready and I sent it to Andreas for review. He pointed out that the post was a little short, and that it would be better to flesh it out a bit. I wanted to fix that quickly and publish the blog mid-week. Unfortunately, I ended up having a crazy week and I never got the time to update the post. So here we are, a little late, the last (two) weeks in WebKit:

Exposed changes

Engine hacking

Renáta Hodován published the first blog post about Fuzzinator. The post introduces what Fuzzinator is and the building blocks it uses to find problems in browsers.

Another good news regarding the blogs is Brent Fulgham fixed planet.webkit.org to include the WebKit blog. This makes will it easier for everyone to have a single source of information for WebKit changes. Some old blogs have been removed recently, removing about 8.8 million lines from the planet.

Any comments? Praise? Send it our way to @awesomekling and @awfulben.

Last Week in WebKit: Programmers programming programs

Posted by Andreas Kling on Monday, July 8th, 2013 at 6:49 pm

Hello from WebKit land!

It’s been a few weeks since I’ve blogged here, but now I’m back! Let’s try a new format for the blog this week. Something terse but stylish. Perhaps something like…

Simon Fraser avoided a whole bunch of work.

152212 - Avoid doing work in RenderBox::outlineBoundsForRepaint() when the repaintContainer is this
152213 - Avoid calling isSimpleContainerCompositingLayer() an extra time
152214 - Avoid calling into Objective-C every time we get the scrollbar width
152235 - Don't set z-index: 0 on lots of elements with -webkit-overflow-scrolling: touch

A nice collection of optimizations from Simon this week, proving once again that laziness (and profiling) pays off.

Christophe Dumez made numerous improvements to the SVG object model.

152120 - Update SVG interfaces to stop inheriting from SVGURIReference and SVGTests
152156 - Merge SVGLangSpace into SVGElement
152299 - Remove SVGStyledLocatableElement class
152343 - Move SVGTests attributes parsing to SVGGraphicsElement
152350 - Stop inheriting SVGFilterPrimitiveStandardAttributes in SVG
152404 - Have SVGTextContentElement inherit SVGGraphicsElement

These updates bring us six steps closer to the latest WebIDL and SVG specifications, specifically getting rid of multiple inheritance in the SVG object model. The increase in code sharing is a welcome side-effect!

Bem Jones-Bey added basic support for rectangle shape-outside.

152122 - [CSS Shapes] New positioning model: basic support for rectangle shape-outside

Bem writes: “With the latest Editor’s Draft of the CSS Shapes specification, shape-outside no longer affects the positioning of floats. This is the first step in changing the implementation to reflect this specification change.”

Kangil Han has been adding more isFooElement()/toFooElement() helpers and using them across WebKit.

152149 - Adopt is/toHTMLInputElement for code cleanup
152203 - Adopt is/toHTMLImageElement for code cleanup
152208 - Adopt is/toHTMLOptionElement for code cleanup
152211 - Adopt is/toHTMLOptGroupElement for code cleanup
152218 - Adopt toHTMLTextAreaElement for code cleanup
152290 - is/toHTMLStyleElement should use Element* for its argument
152353 - Append overloading function for isFooElement
152387 - Adopt is/toHTMLMapElement for code cleanup
152440 - Adopt is/toHTMLTableElement for code cleanup

I totally love these. isFooElement() looks way nicer than hasTagName(fooTag) and toFooElement() bakes an element type-check assertion into debug builds instead of blindly static_cast‘ing.

Geoff Garen removed some unwanted O(n2) behavior affecting .innerText.

152306 - plainText() is O(N^2)

Anyone removing something O(n2) is automatically awesome and gets a mention in the blog. 3Cheers4Geoff!

Frédéric Wang made some improvements to our MathML support.

152235 - Add Support for mspace element
152140 - Implement parsing of MathML lengths.

It’s so great to see MathML getting some love. Go Frédéric!

Andreas Kling (me!) made the CSS parser split rules with more than 8K selectors into multiple rules.

152453 - REGRESSION(r125294): A style rule with more than 8192 selectors can cause style corruption.

Still here? Time for some shameless self-promotion! I made this change to fix a popular browser extension that was thinking a little farther outside the box than I had expected when I introduced a 12-bit selector index (to save memory.)

All right, that’ll be it for this week, thanks for tuning in! I’m @awesomekling on twitter if you have feedback/questions about this blog.