Apple style span is gone

This week, I committed WebKit changes r92823 and r93001. They’re perhaps the most important changesets I’ve ever committed to the WebKit codebase, because with them WebKit no longer produces wrapping style spans on copy and paste, and no longer produces class="Apple-style-span" at all. In fact, these are two changes I’ve wanted to make ever since I started working on WebKit’s editing component in the summer of 2009.

Introduction to Apple style spans

Apple-style-span is an HTML span element with the class “Apple-style-span”. It is created whenever WebKit applies style to text with CSS. For example, document.execCommand('HiliteColor', false, 'blue'); may produce:

<span class="Apple-style-span" style="background-color: #0000ff;">hello world</span>

if “hello world” was selected. The original intent was to let WebKit tell the spans it had added itself apart from those created by authors, so that it could avoid removing or modifying elements that authors had created and meant to keep.

We also use an Apple-style-span to wrap the copied contents to preserve the style of the copied content.  If you copy “hello world” on this page, for example, WebKit puts the following markup into the pasteboard on Mac:

<span style="border-collapse: separate; color: rgb(0, 0, 0); font-family: Times; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-border-horizontal-spacing: 0px; -webkit-border-vertical-spacing: 0px; -webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; font-size: medium; "><span style="color: rgb(51, 51, 51); font-family: 'Lucida Grande', Verdana, Arial; font-size: 12px; line-height: 18px; ">hello world</span></span>

Problems with Apple style spans

However, avoiding the modification of spans not created by WebKit turned out to be ineffective at best, because the editing component had to add and remove many other kinds of elements, and WebKit also had to work with elements generated by other browsers and CMS editors. Also, avoiding the removal of spans without class="Apple-style-span" made the markup progressively more verbose over time, because sometimes we had to cancel the style added by those elements, e.g. <b><span style="font-weight: normal;">unbolded text</span></b>. This was particularly apparent in mail clients that use WebKit as the editor, such as Apple’s Mail or Gmail (if the user happens to use a WebKit-based browser). In some cases, an email consisting of 3 lines of text consumed 3MB of HTML because of nested spans created by WebKit and other mail clients.

An Apple-style-span that wraps the copied contents can get far worse if the copied contents include block nodes.  Consider the following markup which annotates “This is title” to be a level-1 header:

<h1>This is title</h1>

When “This is title” is copied, WebKit puts the following markup in the pasteboard:

<span style="color: #000000; font-family: Times; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; font-size: medium; "><h1>This is title</h1></span>

Notice that the h1 is wrapped in a span!  In addition, WebKit used to wrap contents in two spans to retain the document’s style separately prior to r86983.  Here, font-family: sans-serif was set on the body element and therefore stored in a separate span below:

<span style="border-collapse: separate; color: #000000; font-family: Times; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-border-horizontal-spacing: 0px; -webkit-border-vertical-spacing: 0px; -webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; font-size: medium; "><span style="font-family: sans-serif; "><h1>This is title</h1></span></span>

If we paste the above example right where the br element is in the following markup:

<h1><br></h1>

WebKit produces this:

<h1><span style="font-weight: normal; font-size: medium; "><h1>This is title</h1></span></h1>

Here, the span between the two nested h1 elements cancels the style of the outer h1, because the span preserves the style of the container from which the contents were copied; i.e. the style immediately outside of <h1>This is title</h1>. This is horrible because neither the span nor the extra h1 adds any semantic or visual information to the page, and the result is invalid in all of HTML4.01, XHTML1.0, and HTML5.

A Two-Year Project to Remove Apple style spans

When I started working as an intern at Google in the summer of 2009, this problem caught my attention and I decided to investigate ways to fix it. However, ApplyStyleCommand, which implements inline style application commands such as execCommand('bold') and execCommand('italic'), as well as markup.cpp and ReplaceSelectionCommand, which are responsible for copy and paste respectively, all relied heavily on the class name “Apple-style-span”. In particular, ReplaceSelectionCommand detected the wrapping spans generated by markup.cpp on copy and treated them very differently from other elements. I soon realized that removing Apple style spans requires the following 3 steps:

  1. Improve ApplyStyleCommand not to depend on Apple style spans
  2. Improve copy and paste code not to use Apple style spans
  3. Remove Apple style spans

Since I was an intern at the time and had only a couple of weeks left, I decided to focus on step 1. So I fixed various bugs in ApplyStyleCommand and refactored the code.

When I came back to Google as a full-time employee a year later, I continued to fix and refactor this class. As a result, I devised a style application algorithm which is now partially adopted by Aryeh’s editing spec. It’s a three-phase algorithm, described below:

  1. Remove conflicting styles (e.g. if we’re italicizing text, then remove all instances of font-style properties with values other than italic).
  2. For each inline run, remove all styles that match the style being applied (e.g. if we’re italicizing text, then we remove all font-style properties, em, and i).
  3. Wrap each inline run with an appropriate element or a span with an appropriate style attribute; or add appropriate properties to an existing element that wraps each run.

I’m quite proud of this algorithm since it produces very clean markup in the end (although the current WebKit implementation has a bug in pushing down styles).
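
The three phases above can be sketched on a toy model. This is only an illustration under stated assumptions: InlineRun and applyItalic are hypothetical names, a run is modeled as flat text with a property map and a tag set rather than a DOM subtree, and phase 3 always uses an i element. None of this is WebKit’s actual ApplyStyleCommand code.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy model (an assumption for illustration, not WebKit's data structures):
// an inline run is text plus the inline CSS properties and the presentational
// tags that currently style it.
struct InlineRun {
    std::string text;
    std::map<std::string, std::string> properties; // e.g. font-style -> normal
    std::set<std::string> tags;                    // e.g. em, i, b
};

// Italicizes every run by following the three phases in order.
void applyItalic(std::vector<InlineRun>& runs)
{
    // Phase 1: remove conflicting styles, i.e. font-style with a value
    // other than italic.
    for (auto& run : runs) {
        auto it = run.properties.find("font-style");
        if (it != run.properties.end() && it->second != "italic")
            run.properties.erase(it);
    }

    // Phase 2: for each run, remove styles that match the one being applied
    // (font-style properties, em, and i) so the style is not duplicated.
    for (auto& run : runs) {
        run.properties.erase("font-style");
        run.tags.erase("em");
        run.tags.erase("i");
    }

    // Phase 3: wrap each run with the appropriate element.
    for (auto& run : runs)
        run.tags.insert("i");
}
```

After the call, each run carries exactly one source of italics and keeps unrelated styles such as color or b untouched, which is the sense in which the resulting markup stays clean.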

After I had made some progress in refactoring ApplyStyleCommand, I also started cleaning up the DOM serialization code in markup.cpp, which is responsible for generating the two wrapping spans. But there were three obstacles I had to deal with:

  1. There were two conflicting createMarkup functions, one used for copy and another used for innerHTML, and they shared code by calling common functions instead of through a class hierarchy. This made it hard to modify the interface of each function and do the refactoring necessary to avoid adding wrapping style spans.
  2. The createMarkup used for copy was a 250-line function that serialized the range, determined the highest ancestor to serialize, and added wrapping spans. It was extremely hard to see which variable or condition depended on what.
  3. Various functions in markup.cpp manipulated CSSMutableStyleDeclaration, but their intent and their implications for the paste code were not obvious.

To address points 1 and 2, I decided to do a massive refactoring of markup.cpp. Since Darin had already introduced MarkupAccumulator (Darin always has the best ideas for refactoring!) for the innerHTML version of createMarkup, I decided to introduce StylizedMarkupAccumulator, which inherits from MarkupAccumulator, for the copy version of createMarkup. After the refactoring, markup.cpp started looking really clean and nice (note that abarth extracted MarkupAccumulator.cpp shortly before I finished all the refactoring). In fact, StylizedMarkupAccumulator provided a perfect abstraction for getting rid of wrapping spans, and the various refactorings made it clear that this was feasible.
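
To illustrate sharing serialization code through a class hierarchy rather than through free functions, here is a minimal sketch. SimpleMarkupAccumulator and SimpleStylizedAccumulator are hypothetical, heavily simplified stand-ins: the real MarkupAccumulator and StylizedMarkupAccumulator have different members and signatures, and placing the style on the element itself is just one possible choice for the example.

```cpp
#include <string>
#include <utility>

// Hypothetical stand-in for an innerHTML-style serializer.
class SimpleMarkupAccumulator {
public:
    virtual ~SimpleMarkupAccumulator() = default;

    // Serializes one element with text content and appends it to the result.
    void appendElement(const std::string& tag, const std::string& text)
    {
        m_markup += "<" + tag + attributesFor(tag) + ">" + text + "</" + tag + ">";
    }

    const std::string& markup() const { return m_markup; }

protected:
    // Hook that subclasses override to add attributes to the start tag.
    virtual std::string attributesFor(const std::string&) const { return std::string(); }

private:
    std::string m_markup;
};

// Hypothetical stand-in for a copy-style serializer: it reuses the base class
// and only adds inline style, with no wrapping span.
class SimpleStylizedAccumulator : public SimpleMarkupAccumulator {
public:
    explicit SimpleStylizedAccumulator(std::string style)
        : m_style(std::move(style))
    {
    }

protected:
    std::string attributesFor(const std::string&) const override
    {
        if (m_style.empty())
            return std::string();
        return " style=\"" + m_style + "\"";
    }

private:
    std::string m_style;
};
```

With this shape, the copy path differs from the innerHTML path only in the overridden hook, so each interface can change without touching the other.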

Now I had to address point 3. To get rid of “Apple-style-span”, I had to fully understand how WebKit preserves styles and how various parts of the editing component manipulate and interpret the style information. Meanwhile, from my prior experience with ApplyStyleCommand, I had realized that having various parts of the editing component directly manipulate CSSMutableStyleDeclaration is problematic because of tricky properties like background-color and text-decoration. Even the seemingly simple font-weight is hard to deal with because it can take numeric values such as 700 and 400 or keywords such as bold and normal. So I introduced a new layer of abstraction, called EditingStyle, between the editing component and the CSS component to centralize all style manipulation code in one place. I’ve been extremely happy with this ongoing refactoring effort, as it has been reducing code duplication and has caught many hidden bugs.
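
The font-weight problem can be made concrete with a small sketch. The helpers numericFontWeight and sameFontWeight are hypothetical names invented for this example and are not part of EditingStyle; the sketch assumes only the CSS rule that normal corresponds to 400 and bold to 700, and it deliberately gives up on relative keywords such as bolder.

```cpp
#include <string>

// Hypothetical helper: maps a CSS font-weight value to a number so that
// "bold" and "700" (or "normal" and "400") compare equal.
// Returns 0 for values this sketch does not handle (e.g. "bolder").
int numericFontWeight(const std::string& value)
{
    if (value == "normal")
        return 400;
    if (value == "bold")
        return 700;
    // Accept the numeric values 100, 200, ..., 900.
    if (value.size() == 3 && value[0] >= '1' && value[0] <= '9'
        && value[1] == '0' && value[2] == '0')
        return (value[0] - '0') * 100;
    return 0;
}

// Two declarations are treated as equivalent when both are recognized and
// normalize to the same number.
bool sameFontWeight(const std::string& a, const std::string& b)
{
    int left = numericFontWeight(a);
    int right = numericFontWeight(b);
    return left != 0 && left == right;
}
```

Comparing raw strings would treat bold and 700 as different styles; keeping the normalization in one place is the kind of duplication a central style abstraction avoids.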

Now, it was about time. I had addressed all 3 points that blocked me from getting rid of wrapping style spans on copy. So I started my epic attempt to get rid of wrapping style spans in May 2011. This was not an easy job because we use the copy and paste code as a part of some other editing commands, and in fact, I spent almost an entire week just creating a prototype. Since I normally submit 5 or more patches a week, spending an entire week on one patch that couldn’t even be submitted for review was very unusual. But it paid off in the end. I was able to come up with a patch that gets rid of wrapping spans and does not regress a single test. Now, recall my list of things to do in order to remove Apple style spans:

  1. Improve ApplyStyleCommand not to depend on Apple style spans
  2. Improve copy and paste code not to use Apple style spans
  3. Remove Apple style spans

Yes, I was left with only step 3 when I landed the patch for bug 34564 this Wednesday. So I went ahead and finished off step 3 of this two-year project:

  • Bug 66091 – Share code between isStyleSpanOrSpanWithOnlyStyleAttribute, isUnstyledStyleSpan, isSpanWithoutAttributesOrUnstyleStyleSpan and replaceWithSpanOrRemoveIfWithoutAttributes
  • Bug 12248 – Apple-style-span class seems unnecessary

And there you go. As of revision 93001, WebKit no longer produces Apple style spans. My (and perhaps your) dream has come true.

Acknowledgements

Of course, none of this could have happened without the support of the following people and the entire WebKit community, whom I sincerely thank:

  • Darin Adler
  • Enrica Casucci
  • Eric Seidel
  • Julie Parent
  • Justin Garcia
  • Levi Weintraub
  • Ojan Vafai
  • Tony Chang