Surfin' Safari

WebKit Page Cache II – The unload Event

Posted by Brady Eidson on Monday, September 21st, 2009 at 4:29 pm

Previously I touched on what exactly the Page Cache does and outlined some of the improvements we’re working on.

This post is geared towards web developers and is therefore even more technical than the last.

In this article I’d like to talk more about unload event handlers, why they prevent pages from going into the Page Cache, and what can be done to make things better.

Load/Unload Event Handlers

Web developers can make use of the load and unload events to do work at certain points in the lifetime of a web page.

The purpose of the load event is quite straightforward: To perform initial setup of a new page once it has loaded.

The unload event is comparatively mysterious. Whenever the user leaves a page it is “unloaded” and scripts can do some final cleanup.

The mysterious part is that “leaving the page” can mean one of a few things:

  1. The user closes the browser tab or window, resulting in the destruction of the visible page.
  2. The browser navigates from the old page to a new page, resulting in the destruction of the old visible page.

The Page Cache makes this even more interesting by adding a new navigation possibility:

  3. The browser navigates from the old page to a new page, but the old visible page is suspended, hidden, and placed in the Page Cache.

The Status Quo

Unload event handlers are meant to do some final cleanup when the visible page is about to be destroyed. But if the page goes into the Page Cache it becomes suspended, is hidden, and is not immediately torn down. This brings up interesting complications.

If we fire the unload event when going into the Page Cache, then the handler might be destructive and render the page useless when the user returns.

If we fire the unload event every time a page is left, including each time it goes into the Page Cache and when it is eventually destroyed, then the handler might repeat important work that must only be done once.

If we don’t fire the unload event when going into the Page Cache, then we face the possibility that the page will be destroyed while it is suspended and hidden, and the unload handler might never be run.

If we don’t fire the unload event when going into the Page Cache but consider firing it whenever the suspended page is eventually destroyed, then we’re considering the possibility of doing something that’s never been done before: Executing scripts that belong to an invisible web page that has had its “pause” button pressed.

There are all sorts of obstacles to making this work well, including technological hurdles, security concerns, and user-experience considerations.

Since there is no clear solution for handling such pages, the major browser vendors have all come to the same conclusion: Don’t cache these pages.

How You Can Help

Web developers have a few things they can do to help their pages be cacheable.

One is to only install the unload event handler if the code is relevant to the current browser. For example, we’ve seen unload handlers similar to the following:

    function unloadHandler()
    {
        if (_scriptSettings.browser.isIE) {
            // Run some unload code for Internet Explorer
        // ...
        }
    }

In all browsers other than Internet Explorer, this code does nothing, but its mere existence keeps the page out of those browsers’ Page Cache and potentially slows down their users’ experience. This developer should’ve done the browser check *before* installing the unload handler.
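Following that advice, the browser check moves out of the handler and in front of the registration. Here is a minimal sketch; the function name `installUnloadHandlerIfNeeded` is hypothetical, and it assumes a settings object shaped like the example’s `_scriptSettings`. The event target is a parameter (it would be `window` in a real page) so the logic can be exercised on its own:

```javascript
// Install the IE-only unload handler only in browsers where it does something.
// `settings` stands in for the example's _scriptSettings object, and `target`
// is `window` in a real page.
function installUnloadHandlerIfNeeded(target, settings, unloadHandler)
{
    if (!settings.browser.isIE) {
        // No unload listener is registered, so other browsers remain free
        // to put this page in their Page Cache.
        return false;
    }

    if (target.addEventListener)
        target.addEventListener("unload", unloadHandler, false);
    else
        target.attachEvent("onunload", unloadHandler); // IE 8 and earlier

    return true;
}

// In a page, the handler itself no longer needs its own browser check:
// installUnloadHandlerIfNeeded(window, _scriptSettings, unloadHandler);
```

Because the check runs once at setup time, browsers other than Internet Explorer never see an unload listener at all.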

Another way developers can improve things is to only install the unload event handler when the page has a need to listen for it, then remove it once that reason has passed.

For example, the user might be working on a draft of a document, so the developer installs an unload handler to make sure the draft gets saved before the page is left. But they also start a timer to automatically save it every minute or so. If the timer fires and saves the draft, and the user doesn’t make any further changes, the unload handler should be removed.
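One way to sketch this pattern, assuming a hypothetical `saveDraft` callback that does the actual saving (an XHR to the server, say). The names `createDraftGuard`, `editor`, and `sendDraftToServer` are illustrative only, not part of any library:

```javascript
// Keep an unload listener installed only while there is unsaved draft data.
// `saveDraft` is a hypothetical callback that persists the draft; `target`
// is `window` in a real page.
function createDraftGuard(target, saveDraft)
{
    let dirty = false;

    function onUnload()
    {
        save();
    }

    // Call this whenever the user edits the draft.
    function markDirty()
    {
        if (dirty)
            return;
        dirty = true;
        target.addEventListener("unload", onUnload, false);
    }

    // Called by the autosave timer and by the unload handler itself.
    function save()
    {
        if (!dirty)
            return;
        saveDraft();
        dirty = false;
        // Nothing left to save: drop the listener until the next edit.
        target.removeEventListener("unload", onUnload, false);
    }

    return { markDirty: markDirty, save: save, isDirty: () => dirty };
}

// In a page (the one-minute interval is illustrative):
// const guard = createDraftGuard(window, sendDraftToServer);
// editor.addEventListener("input", guard.markDirty, false);
// setInterval(guard.save, 60 * 1000);
```

The listener exists only while there is unsaved work, so once the timer has saved the draft the page is no longer excluded from the Page Cache because of an unload handler.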

Particularly savvy developers might consider a third option.

A Replacement For Unload

Some time ago Mozilla approached this problem differently by inventing a replacement for load/unload events.

The load and unload events are meant to be fired exactly once, and this is the underlying cause of the problem. The pageshow/pagehide events – which we’ve implemented in WebKit as of revision 47824 – address this.

Despite their names, the pageshow/pagehide events don’t have anything to do with whether or not the page is actually visible on the screen. They won’t fire when you minimize the window or switch tabs, for example.

What they do is augment load/unload to work in more situations involving navigation. Consider this example of how load/unload event handlers might be used:

    <html>
    <head>
    <script>

    function pageLoaded()
    {
        alert("load event handler called.");
    }

    function pageUnloaded()
    {
        alert("unload event handler called.");
    }

    window.addEventListener("load", pageLoaded, false);
    window.addEventListener("unload", pageUnloaded, false);

    </script>
    <body>
    <a href="http://www.webkit.org/">Click for WebKit</a>
    </body>
    </html>

Click here to view this example in a new window, in case you can’t guess what it does.

Try clicking the link to leave the page, then press the back button. Pretty straightforward.

The pageshow/pagehide events fire when load/unload do, but they also have one more trick up their sleeve.

The pageshow event fires not only at the single discrete moment when a page is “loaded” but also whenever the page is restored from the Page Cache.

Similarly, the pagehide event fires when the unload event fires, but also when a page is suspended into the Page Cache.

Each event also carries an additional property called “persisted” that tells the page which case it represents: false for the ordinary load/unload case, true for restoring from or saving into the Page Cache.

Here’s the same example using pageshow/pagehide:

    <html>
    <head>
    <script>

    function pageShown(evt)
    {
        if (evt.persisted)
            alert("pageshow event handler called.  The page was just restored from the Page Cache.");
        else
            alert("pageshow event handler called for the initial load.  This is the same as the load event.");
    }

    function pageHidden(evt)
    {
        if (evt.persisted)
            alert("pagehide event handler called.  The page was suspended and placed into the Page Cache.");
        else
            alert("pagehide event handler called for page destruction.  This is the same as the unload event.");
    }

    window.addEventListener("pageshow", pageShown, false);
    window.addEventListener("pagehide", pageHidden, false);

    </script>
    <body>
    <a href="http://www.webkit.org/">Click for WebKit</a>
    </body>
    </html>

Click here to view this example in a new window, but make sure you’re using a recent WebKit nightly.

Remember to try clicking the link to leave the page, then press the back button.

Pretty cool, right?

What These New Events Accomplish

The pagehide event is important for two reasons:

  1. It enables web developers to distinguish between a page being suspended and one that is being destroyed.
  2. When used instead of the unload event, it enables browsers to use their page cache.

It’s also straightforward to change existing code to use pagehide instead of unload. Here is an example of testing for the onpagehide attribute to choose pageshow/pagehide when supported, falling back to load/unload when they’re not:

    <html>
    <head>
    <script>

    function myLoadHandler(evt)
    {
        if (evt.persisted) {
            // This is actually a pageshow event and the page is coming out of the Page Cache.
            // Make sure to not perform the "one-time work" that we'd normally do in the onload handler.
            // ...

            return;
        }

        // This is either a load event for older browsers,
        // or a pageshow event for the initial load in supported browsers.
        // It's safe to do everything my old load event handler did here.
        // ...
    }

    function myUnloadHandler(evt)
    {
        if (evt.persisted) {
            // This is actually a pagehide event and the page is going into the Page Cache.
            // Make sure that we don't do any destructive work, or work that shouldn't be duplicated.
            // ...

            return;
        }

        // This is either an unload event for older browsers,
        // or a pagehide event for page tear-down in supported browsers.
        // It's safe to do everything my old unload event handler did here.
        // ...
    }

    if ("onpagehide" in window) {
        window.addEventListener("pageshow", myLoadHandler, false);
        window.addEventListener("pagehide", myUnloadHandler, false);
    } else {
        window.addEventListener("load", myLoadHandler, false);
        window.addEventListener("unload", myUnloadHandler, false);
    }

    </script>
    <body>
    Your content goes here!
    </body>
    </html>

Piece of cake!
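If you’d rather keep that feature test in one reusable place, the registration can be factored into a small helper. This is a sketch with hypothetical names (`addPageLifecycleListeners`, `isPageCacheTransition`); it relies on the same `"onpagehide" in window` test as the example above, and on `persisted` being undefined for plain load/unload events:

```javascript
// Choose pageshow/pagehide when the target supports them, otherwise fall
// back to load/unload. Returns the event names used so callers can check
// which path was taken. `target` is `window` in a real page.
function addPageLifecycleListeners(target, onShow, onHide)
{
    const names = ("onpagehide" in target)
        ? { show: "pageshow", hide: "pagehide" }
        : { show: "load", hide: "unload" };

    target.addEventListener(names.show, onShow, false);
    target.addEventListener(names.hide, onHide, false);
    return names;
}

// One handler can serve both kinds of event: `persisted` is true only for
// Page Cache transitions and is undefined on plain load/unload events.
function isPageCacheTransition(evt)
{
    return evt.persisted === true;
}
```

Keeping the `persisted` check in one function means the “do the one-time work or not” decision reads the same way in both handlers.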

How You Can Help: Revisited

To reiterate, we’ve now identified three great ways web developers can help the Page Cache work better:

  1. Only install the event handler if the code is relevant to the current browser.
  2. Only install the event handler once your page actually needs it.
  3. If supported by the browser, use pagehide instead.

Web developers who willfully ignore any or all of these options are primarily accomplishing one thing: forcing their users into “slow navigation mode.”

I say this both as a browser engineer and a browser user: That stinks!

The Plot Thickens

But now that we’ve covered what savvy and polite web developers can do to help in the future, we need to further scrutinize the current state of the web.

Browsers treat the unload handler as sacred because it is designed to do “important work.” Unfortunately many popular sites have unload event handlers that decidedly do not “do important work.” I commonly see handlers that:

  • Always update some cookie for tracking, even though it’s already been updated.
  • Always send an XHR update of draft data to a server, even though it’s already been sent.
  • Do nothing that could possibly persist to any future browsing session.
  • Are empty. They literally do nothing.

Since these misbehaved pages are very common and render improvements to WebKit’s Page Cache ineffective, a few of us started to ask the question:

What *would* actually happen if we simply started admitting these pages to the Page Cache without running the unload event handler first?

What would break?

Can we detect any patterns to determine whether an unload event handler is “important” or not?

Our Experiment

You never know for sure until you try.

Starting in revision 48388 we’ve allowed pages with unload handlers into the Page Cache. If a user closes the window while the page is visible, the unload event will fire as usual. But the unload event will not be fired as normal when the user navigates away from the page. If the user closes the window while the page is suspended and in the Page Cache, the unload event handler will never be run.

What this means for users is that their navigation experience could be noticeably smoother and quicker in the common case. What this means for developers is that we’re consciously deciding not to run some of their code and their web application might break.

For users and developers alike, please leave your feedback, observations, or suggestions in the bug tracking this experiment.

And remember this is just an experiment. No one is planning to ship this drastic change in behavior in a production product. But the Page Cache is such an important part of browser performance that we’re willing to push the envelope a little to improve it a lot.

We want to learn what breaks. We want to know if we can heuristically determine if an unload handler is truly critical or not. We want to know if we can detect certain patterns in some types of unload handlers and treat them differently. And, perhaps most importantly, we want to evangelize.

At least one popular JavaScript library has already adopted some of the advice we’ve given to help improve the landscape on the web. If just a few more developers for popular sites or libraries take notice of this experiment and change their code, then the web will be a much friendlier place for all of us.

6 Responses to “WebKit Page Cache II – The unload Event”

  1. doekman Says:

    I would call it the “Page Flip Cache”…

  2. Jake Archibald Says:

    BBC Glow has avoided adding an unload listener since 1.6 to avoid breaking this cache.

    As for my thoughts on the issue, here’s a partial reposting from https://bugs.webkit.org/show_bug.cgi?id=29021

    You could save the page to the page cache, then fire the unload event and navigate away. If back is clicked, the page will be returned in the state it was BEFORE unload was fired. This gives you the best of both worlds. (this was suggested in Chrome’s bug tracker)

    Also, (and webkit may do this already) you need to make sure the element that the mouse is currently over has its mouseout event fired, else you might get sticky hover effects.

    Jake.

  3. Yuan Song Says:

    To fire the unload event whenever the suspended page is eventually destroyed could be a good idea. I don’t think executing scripts in an invisible page is a problem, since the ability of executing scripts in the background can bring other potential benefits in the future.

  4. randallfarmer Says:

    As a user, I like more aggressive page caching. One approach might be to look for a small class of onunload handlers that are generally safe to ignore, like Prototype’s and empty handlers.

    As a Web developer, I’ve sometimes specifically wanted to turn off the back/forward cache. For example, I sometimes have an onsubmit handler — not an onunload! — that leaves the page in an unusable state, e.g., by disabling the submit button. (That means my form is broken if the user starts to submit then immediately hits Esc.)

    When the user hits Back, I’d rather they see the page as it was at load time (but with form data filled in), not as it was after submit. In an ideal world, I’d want them to see the page as it was just before they hit Submit, but that’s not feasible to do.

    Right now, I avoid the page cache by adding an empty onunload handler to disable the page cache, but that will break. As alternatives, I could add a cache-control header, or call window.location.reload() in onpageshow if evt.persisted. Both approaches have the downside of forcing the page’s HTML to be redownloaded as well as skipping the page cache. What do you all think is the best thing to do if I want to opt out of the page cache but am OK with the browser using the cached HTML?

    I know there are more elegant options than opting out of the page cache, like just not messing with the form in my onsubmit handler or “fixing” the form in onpageshow. Still, it seems interesting to ponder what to do if you want to opt out of the page cache; surely someone out there will.

  5. Brady Eidson Says:

    @randalfarmer
    It would stink if you had to take some action to disable the page cache full-time when you only really want it disabled in this one edge case. As mentioned in the post, developers should only install unload handlers when they actually need them to be installed. Do you install an unload handler full-time, or do you install it only in onsubmit?

    If there were some other programmatic way to disable the PageCache, that would still be preferable to the unload handler. Semantically, the “unload handler” and “I want to disable the PageCache!” are two completely different things.

    Opera invented “history.navigationMode” for this reason. We might consider implementing that (https://webkit.org/b/29739) but I would shudder if developers just started flipping the switch at page load and left it on the whole time!

    Of course, maybe that wouldn’t be much different than the situation today…

  6. randallfarmer Says:

    @Brady:

    In my defense, the pages I’m thinking of are essentially one big form, and if you leave the page, you’re usually either submitting or never coming back (e.g., closing the window). The onunload is there full-time for these pages right now (I might fix!), but navigating away without submitting and then coming back is rare enough that slowing down that case is bearable, if sloppy.

    Thanks much for considering history.navigationMode and for wading through my long-winded comment. I think you distilled it right: some folks could use an “explicit, semantically precise” way to tell the browser they’ve destroyed the DOM.

    You’re also right that we’ll inevitably sometimes use it in ways that make the page slower than is strictly necessary — sorry. Browser developers put up with a lot of nonsense from Web developers — we owe you one. Or rather, we owe you many many. :)