WebKit Page Cache II – The unload Event

Sep 21, 2009

by Brady Eidson

Previously I touched on what exactly the Page Cache does and outlined some of the improvements we’re working on.

This post is geared towards web developers and is therefore even more technical than the last.

In this article I’d like to talk more about unload event handlers, why they prevent pages from going into the Page Cache, and what can be done to make things better.

Load/Unload Event Handlers

Web developers can make use of the load and unload events to do work at certain points in the lifetime of a web page.

The purpose of the load event is quite straightforward: To perform initial setup of a new page once it has loaded.

The unload event is comparatively mysterious. Whenever the user leaves a page it is “unloaded” and scripts can do some final cleanup.

The mysterious part is that “leaving the page” can mean one of a few things:

The user closes the browser tab or window, resulting in the destruction of the visible page.
The browser navigates from the old page to a new page, resulting in the destruction of the old visible page.

The Page Cache makes this even more interesting by adding a new navigation possibility:

The browser navigates from the old page to a new page, but the old visible page is suspended, hidden, and placed in the Page Cache.

The Status Quo

Unload event handlers are meant to do some final cleanup when the visible page is about to be destroyed. But if the page goes into the Page Cache it becomes suspended, is hidden, and is not immediately torn down. This brings up interesting complications.

If we fire the unload event when going into the Page Cache, then the handler might be destructive and render the page useless when the user returns.

If we fire the unload event every time a page is left, including each time it goes into the Page Cache and when it is eventually destroyed, then the handler might do important work multiple times that it was critical to only do once.

If we don’t fire the unload event when going into the Page Cache, then we face the possibility that the page will be destroyed while it is suspended and hidden, and the unload handler might never be run.

If we don’t fire the unload event when going into the Page Cache but consider firing it whenever the suspended page is eventually destroyed, then we’re considering the possibility of doing something that’s never been done before: Executing scripts that belong to an invisible web page that has had its “pause” button pressed.

There’s all sorts of obstacles in making this work well including technological hurdles, security concerns, and user-experience considerations.

Since there is no clear solution for handling such pages the major browsers vendors have all come to the same conclusion: Don’t cache these pages.

How You Can Help

Web developers have a few things they can do to help their pages be cacheable.

One is to only install the unload event handler if the code is relevant to the current browser. For example, we’ve seen unload handlers similar to the following:

    function unloadHandler()
    {
        if (_scriptSettings.browser.isIE) {
            // Run some unload code for Internet Explorer
            ...
        }
    }

In all browsers other than Internet Explorer this code does nothing, but its mere existence potentially slows down their user experience. This developer should’ve done the browser check before installing the unload handler.

Another way developers can improve things is to only install the unload event handler when the page has a need to listen for it, then remove it once that reason has passed.

For example the user might be working on a draft of a document so the developer installs an unload handler to make sure the draft gets saved before the page is left. But they also start a timer to automatically save it every minute or so. If the timer fires, the document draft is saved, and the user doesn’t make any further changes, the unload handler should be removed.

Particularly savvy developers might consider a third option.

A Replacement For Unload

Some time ago Mozilla approached this problem differently by inventing a replacement for load/unload events.

The load and unload events are meant to be fired exactly once, and this is the underlying cause of the problem. The pageshow/pagehide events – which we’ve implemented in WebKit as of revision 47824 – address this.

Despite their name the pageshow/pagehide events don’t have anything to do with whether or not the page is actually visible on the screen. They won’t fire when you minimize the window or switch tabs, for example.

What they do is augment load/unload to work in more situations involving navigation. Consider this example of how load/unload event handlers might be used:

    <html>
    <head>
    <script>

    function pageLoaded()
    {
        alert("load event handler called.");
    }

    function pageUnloaded()
    {
        alert("unload event handler called.");
    }

    window.addEventListener("load", pageLoaded, false);
    window.addEventListener("unload", pageUnloaded, false);

    </script>
    <body>
    <a href="http://www.webkit.org/">Click for WebKit</a>
    </body>
    </html>

Click here to view this example in a new window, in case you can’t guess what it does.

Try clicking the link to leave the page then press the back button. Pretty straightforward.

The pageshow/pagehide fire when load/unload do, but also have one more trick up their sleeve.

Instead of firing only at the single discrete moment when a page is “loaded” the pageshow event is also fired when pages are restored from the Page Cache.

Similarly the pagehide event fires when the unload event fires but also when a page is suspended into the Page Cache.

By including an additional property on the event called “persisted” the events tell the page whether they represent the load/unload events or saving/restoring from the Page Cache.

Here’s the same example using pageshow/pagehide:

    <html>
    <head>
    <script>

    function pageShown(evt)
    {
        if (evt.persisted)
            alert("pageshow event handler called.  The page was just restored from the Page Cache.");
        else
            alert("pageshow event handler called for the initial load.  This is the same as the load event.");
    }

    function pageHidden(evt)
    {
        if (evt.persisted)
            alert("pagehide event handler called.  The page was suspended and placed into the Page Cache.");
        else
            alert("pagehide event handler called for page destruction.  This is the same as the unload event.");
    }

    window.addEventListener("pageshow", pageShown, false);
    window.addEventListener("pagehide", pageHidden, false);

    </script>
    <body>
    <a href="http://www.webkit.org/">Click for WebKit</a>
    </body>
    </html>

Click here to view this example in a new window, but make sure you’re using a recent WebKit nightly.

Remember to try clicking the link to leave the page then press the back button.

Pretty cool, right?

What These New Events Accomplish

The pagehide event is important for two reasons:

It enables web developers to distinguish between a page being suspended and one that is being destroyed.
When used instead of the unload event, it enables browsers to use their page cache.

It’s also straightforward to change existing code to use pagehide instead of unload. Here is an example of testing for the onpageshow attribute to choose pageshow/pagehide when supported, falling back to load/unload when they’re not:

    <html>
    <head>
    <script>

    function myLoadHandler(evt)
    {
        if (evt.persisted) {
            // This is actually a pageshow event and the page is coming out of the Page Cache.
            // Make sure to not perform the "one-time work" that we'd normally do in the onload handler.
            ...
            
            return;
        }
    
        // This is either a load event for older browsers, 
        // or a pageshow event for the initial load in supported browsers.
        // It's safe to do everything my old load event handler did here.
        ...
    }

    function myUnloadHandler(evt)
    {
        if (evt.persisted) {
            // This is actually a pagehide event and the page is going into the Page Cache.
            // Make sure that we don't do any destructive work, or work that shouldn't be duplicated.
            ...
            
            return;
        }
    
        // This is either an unload event for older browsers, 
        // or a pagehide event for page tear-down in supported browsers.
        // It's safe to do everything my old unload event handler did here.
        ...
    }

    if ("onpagehide" in window) {
        window.addEventListener("pageshow", myLoadHandler, false);
        window.addEventListener("pagehide", myUnloadHandler, false);
    } else {
        window.addEventListener("load", myLoadHandler, false);
        window.addEventListener("unload", myUnloadHandler, false);
    }

    </script>
    <body>
    Your content goes here!
    </body>
    </html>

Piece of cake!

How You Can Help: Revisited

To reiterate, we’ve now identified three great ways web developers can help the Page Cache work better:

Only install the event handler if the code is relevant to the current browser.
Only install the event handler once your page actually needs it.
If supported by the browser, use pagehide instead.

Web developers that willfully ignore any or all these options are primarily accomplishing one thing:
Forcing their users into “slow navigation mode.”

I say this both as a browser engineer and a browser user: That stinks!

The Plot Thickens

But now that we’ve covered what savvy and polite web developers can do to help in the future, we need to further scrutinize the current state of the web.

Browsers treat the unload handler as sacred because it is designed to do “important work.” Unfortunately many popular sites have unload event handlers that decidedly do not “do important work.” I commonly see handlers that:

Always update some cookie for tracking, even though it’s already been updated.
Always send an XHR update of draft data to a server, even though it’s already been sent.
Do nothing that could possible persist to any future browsing session.
That are empty. They literally do nothing.

Since these misbehaved pages are very common and will render improvements to WebKit’s Page Cache ineffective a few of us started to ask the question:

What would actually happen if we simply started admitting these pages to the Page Cache without running the unload event handler first?

What would break?

Can we detect any patterns to determine whether an unload event handler is “important” or not?

Our Experiment

You never know for sure until you try.

Starting in revision 48388 we’ve allowed pages with unload handlers into the Page Cache. If a user closes the window while the page is visible, the unload event will fire as usual. But the unload event will not be fired as normal when the user navigates away from the page. If the user closes the window while the page is suspended and in the Page Cache, the unload event handler will never be run.

What this means for users is that their navigation experience could be noticeably smoother and quicker in the common case. What this means for developers is that we’re consciously deciding not to run some of their code and their web application might break.

For users and developers alike – Please leave your feedback, observations, or suggestions in the bug tracking this experiment.

And remember this is just an experiment. No one is planning to ship this drastic change in behavior in a production product. But the Page Cache is such an important part of browser performance that we’re willing to push the envelope a little to improve it a lot.

We want to learn what breaks. We want to know if we can heuristically determine if an unload handler is truly critical or not. We want to know if we can detect certain patterns in some types of unload handlers and treat them differently. And, perhaps most importantly, we want to evangelize.

At least one popular Javascript library has already adopted some of the advice we’ve given to help improve the landscape on the web. If just a few more developers for popular sites or libraries take notice of this experiment and change their code then the web will be a much friendlier place for all of us.