Apple style span is gone
Posted by Ryosuke Niwa on Monday, August 15th, 2011 at 10:38 pm
This week, I committed WebKit changes r92823 and r93001. They’re perhaps the most important changesets I’ve ever committed to the WebKit codebase because they made WebKit stop producing wrapping style spans on copy and paste and stop adding class="Apple-style-span". In fact, these are two changes I’ve wanted to make ever since I started working on WebKit’s editing component in the summer of 2009.
Introduction to Apple style spans
Apple-style-span is an HTML span element with the class “Apple-style-span”. It is created whenever WebKit applies style to text using CSS. For example, document.execCommand('HiliteColor', false, 'blue'); may produce:
<span class="Apple-style-span" style="background-color: #0000ff;">hello world</span>
if “hello world” was selected. The original intent was to let WebKit differentiate the spans it added itself from those created by authors, so that it could avoid removing or modifying elements that authors had created and meant to keep.
We also use an Apple-style-span to wrap the copied contents to preserve the style of the copied content. If you copy “hello world” on this page, for example, WebKit puts the following markup into the pasteboard on Mac:
<span style="border-collapse: separate; color: rgb(0, 0, 0); font-family: Times; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-border-horizontal-spacing: 0px; -webkit-border-vertical-spacing: 0px; -webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; font-size: medium; "><span style="color: rgb(51, 51, 51); font-family: 'Lucida Grande', Verdana, Arial; font-size: 12px; line-height: 18px; ">hello world</span></span>
Problems with Apple style spans
However, avoiding the modification of spans not created by WebKit turned out to be ineffective at best, because the editing component had to add and remove so many other elements, and WebKit also had to work with elements generated by other browsers and CMS editors. Also, avoiding the removal of spans without class="Apple-style-span" caused the markup to get progressively more verbose over time, because sometimes we had to cancel the style added by those elements, e.g. <b><span style="font-weight: normal;">unbolded text</span></b>. This was particularly apparent in mail clients that used WebKit as the editor, such as Apple’s Mail or Gmail (if the user happens to use a WebKit-based browser). In some cases, an e-mail consisting of 3 lines of text consumed 3MB in HTML because of nested spans created by WebKit and other mail clients.
An Apple-style-span that wraps the copied contents can get far worse if the copied contents include block nodes. Consider the following markup which annotates “This is title” to be a level-1 header:
<h1>This is title</h1>
When “This is title” is copied, WebKit puts the following markup in the pasteboard:
<span style="color: #000000; font-family: Times; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; font-size: medium; "><h1>This is title</h1></span>
Notice that the h1 is wrapped in a span! In addition, WebKit used to wrap contents in two spans to retain the document’s style separately prior to r86983. Here, font-family: sans-serif was set on the body element and therefore stored in a separate span below:
<span style="border-collapse: separate; color: #000000; font-family: Times; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-border-horizontal-spacing: 0px; -webkit-border-vertical-spacing: 0px; -webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; font-size: medium; "><span style="font-family: sans-serif; "><h1>This is title</h1></span></span>
If we paste the above example right where the br element is in the following markup:
<h1><br></h1>
WebKit produces this:
<h1><span style="font-weight: normal; font-size: medium; "><h1>This is title</h1></span></h1>
Here, the span between the two nested h1 elements is canceling the style of the outer h1, because the span preserves the style of the container from which the contents were copied, i.e. the style immediately outside of <h1>This is title</h1>. This is horrible because neither the span nor the extra h1 adds any semantic or visual information to the page, and the result is invalid under HTML 4.01, XHTML 1.0, and HTML5 alike.
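To make the nesting problem concrete, here is a minimal sketch in Python that flags the two issues in the pasted markup above: a block element inside a span, and a heading inside another heading. It uses only the standard library's html.parser; the names NestingChecker and find_nesting_problems, and the abbreviated tag sets, are assumptions made for this illustration and have nothing to do with WebKit's own code or with a real validator.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
# Abbreviated, illustrative set of block-level tags (an assumption of this sketch).
BLOCK_TAGS = HEADINGS | {"p", "div", "ul", "ol", "table", "blockquote"}
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class NestingChecker(HTMLParser):
    """Records block-in-span and heading-in-heading nesting."""

    def __init__(self):
        super().__init__()
        self.stack = []      # currently open elements, outermost first
        self.problems = []   # human-readable descriptions, in document order

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS and "span" in self.stack:
            self.problems.append(f"<{tag}> inside <span>")
        if tag in HEADINGS:
            for open_tag in self.stack:
                if open_tag in HEADINGS:
                    self.problems.append(f"<{tag}> inside <{open_tag}>")
                    break
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Close everything up to and including the matching open element.
            while self.stack.pop() != tag:
                pass


def find_nesting_problems(markup):
    checker = NestingChecker()
    checker.feed(markup)
    checker.close()
    return checker.problems
```

Feeding it the pasted result shown above reports both problems, while the original `<h1>This is title</h1>` reports none.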
A Two-Year Project to Remove Apple style spans
When I started working as an intern at Google in the summer of 2009, this problem caught my attention and I decided to investigate ways to fix it. However, ApplyStyleCommand, which implements inline style application commands such as execCommand('bold') and execCommand('italic'), and markup.cpp and ReplaceSelectionCommand, which are responsible for copy and paste respectively, all relied heavily on the class name “Apple-style-span”. In particular, ReplaceSelectionCommand detected the wrapping spans generated by markup.cpp on copy and treated them very differently from other elements. I soon realized that removing Apple style spans requires the following 3 steps:
- Improve ApplyStyleCommand not to depend on Apple style spans
- Improve copy and paste code not to use Apple style spans
- Remove Apple style spans
Since I was an intern at the time and had only a couple of weeks left, I decided to focus on step 1. So I fixed various bugs in ApplyStyleCommand and refactored the code.
When I came back to Google as a full-time employee a year later, I continued to fix and refactor this class. As a result, I devised a style application algorithm which is now partially adopted by Aryeh’s editing spec. It’s a three-phase algorithm, described below:
- Remove conflicting styles (e.g. if we’re italicizing text, then remove all instances of font-style properties with values other than italic).
- For each inline run, remove all styles that match the style being applied (e.g. if we’re italicizing text, then we remove all font-style properties, em, and i).
- Wrap each inline run with an appropriate element or a span with an appropriate style attribute; or add appropriate properties to an existing element that wraps each run.
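The three phases can be sketched on a toy model. The Python below is a simplified illustration, not WebKit's ApplyStyleCommand: the Element class, the function names, and the choice of wrapping with an i element are all assumptions made for this example, and it handles a single inline run that is italicized in full. In this reduced setting phase 2 would subsume phase 1; they are kept separate only to mirror the description above.

```python
from dataclasses import dataclass, field

ITALIC_TAGS = {"em", "i"}


@dataclass
class Element:
    """Toy inline element: a tag, inline style properties, and children."""
    tag: str
    style: dict = field(default_factory=dict)
    children: list = field(default_factory=list)  # Element or str items


def remove_conflicting(nodes):
    """Phase 1: drop font-style declarations whose value is not italic."""
    for node in nodes:
        if isinstance(node, Element):
            if node.style.get("font-style") not in (None, "italic"):
                del node.style["font-style"]
            remove_conflicting(node.children)


def remove_matching(nodes):
    """Phase 2: drop remaining font-style declarations, em, and i."""
    result = []
    for node in nodes:
        if isinstance(node, str):
            result.append(node)
            continue
        node.style.pop("font-style", None)
        node.children = remove_matching(node.children)
        if node.tag in ITALIC_TAGS and node.style:
            node.tag = "span"  # keep unrelated styles such as color
        if node.tag in ITALIC_TAGS or (node.tag == "span" and not node.style):
            result.extend(node.children)  # unwrap the now-redundant element
        else:
            result.append(node)
    return result


def apply_italic(run):
    """Phase 3: wrap the cleaned run in one element carrying the style."""
    remove_conflicting(run)
    return [Element("i", children=remove_matching(run))]


def serialize(nodes):
    """Render the toy tree back to markup for inspection."""
    out = []
    for node in nodes:
        if isinstance(node, str):
            out.append(node)
            continue
        style = "; ".join(f"{k}: {v}" for k, v in node.style.items())
        attr = f' style="{style}"' if style else ""
        out.append(f"<{node.tag}{attr}>{serialize(node.children)}</{node.tag}>")
    return "".join(out)
```

Because the old declarations and elements are stripped before wrapping, the run ends up with exactly one element that carries the italic style, rather than a growing stack of spans that cancel one another.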
I’m quite proud of this algorithm myself since it produces very clean markup at the end (the current WebKit implementation has a bug in pushing down styles).
After I had made some progress in refactoring ApplyStyleCommand, I started cleaning up DOM serialization code in markup.cpp as well which is responsible for generating two wrapping spans. But there were a couple of obstacles I had to deal with:
- There were two conflicting createMarkup functions, one used for copy and another used for innerHTML, and they shared code by means of calling functions instead of a class hierarchy. This made it hard to modify the interface of each function and do the refactoring necessary to avoid adding wrapping style spans.
- The createMarkup used for copy was a 250-line function that serialized the range, determined the highest ancestor to serialize, and added wrapping spans. This made it extremely hard to see which variable or condition depended on what.
- Various functions in markup.cpp manipulated CSSMutableStyleDeclaration, but their intentions and their implications for the paste code were not obvious.
To address points 1 and 2, I decided to do a massive refactoring of markup.cpp. Since darin had already introduced MarkupAccumulator (Darin always has the best idea for refactoring!) for the innerHTML version of createMarkup, I decided to introduce StylizedMarkupAccumulator that inherits from MarkupAccumulator for the copy version of createMarkup. After the refactoring, markup.cpp started looking really clean and nice (Note that abarth extracted MarkupAccumulator.cpp shortly before I finished all the refactoring). In fact, StylizedMarkupAccumulator provided a perfect abstraction for getting rid of wrapping spans, and various refactoring made clear that this is feasible.
Now I had to address point 3. To get rid of “Apple-style-span”, I had to fully understand how WebKit preserves styles and how various parts of the editing component manipulate and interpret the style information. Meanwhile, from my prior experience with ApplyStyleCommand, I had realized that having various parts of the editing component manipulate CSSMutableStyleDeclaration directly is problematic because of tricky properties like background-color and text-decoration. Even the seemingly simple font-weight is hard to deal with because it can take numeric values such as 700 and 400 or keywords such as bold and normal. So I introduced a new layer of abstraction, called EditingStyle, between the editing component and the CSS component to centralize all style manipulation code in one place. I’ve been extremely happy with this ongoing refactoring effort, as it has been reducing code duplication and has caught many hidden bugs.
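The font-weight case shows why a single place for style comparison helps. The Python sketch below is illustrative only and is not EditingStyle's interface; the function names are assumptions. It relies on CSS defining the keyword normal as 400 and bold as 700, and it deliberately refuses the relative keywords bolder and lighter, which cannot be resolved without knowing the inherited weight.

```python
# Keyword-to-number mapping defined by CSS for font-weight.
KEYWORD_WEIGHTS = {"normal": 400, "bold": 700}


def normalize_font_weight(value):
    """Map a font-weight keyword or numeric string to its numeric weight."""
    text = value.strip().lower()
    if text in KEYWORD_WEIGHTS:
        return KEYWORD_WEIGHTS[text]
    if text.isdigit():
        return int(text)
    # "bolder" and "lighter" depend on the parent's weight, so they
    # cannot be normalized in isolation.
    raise ValueError(f"cannot normalize font-weight {value!r} without context")


def same_font_weight(a, b):
    """Compare two font-weight values after normalization."""
    return normalize_font_weight(a) == normalize_font_weight(b)
```

With a helper like this, callers no longer compare raw strings, so same_font_weight("bold", "700") is true even though the two spellings differ.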
Now, it was about time. I had addressed all 3 points that blocked me from getting rid of wrapping style spans on copy. So I started my epic attempt to get rid of wrapping style spans in May, 2011. This was not an easy job because we use copy and paste code as a part of some other editing commands, and in fact, I spent almost an entire week just to create a prototype. Since I normally submit 5 or more patches a week, spending an entire week on one patch that can’t even be submitted for a review was very unusual. But it paid off at the end. I was able to come up with a patch that gets rid of wrapping spans and does not regress a single test. Now, recall my list of things to do in order to remove Apple style spans:
- Improve ApplyStyleCommand not to depend on Apple style spans (done)
- Improve copy and paste code not to use Apple style spans (done)
- Remove Apple style spans
Yes, I was left with only step 3 when I landed the patch for bug 34564 this Wednesday. So I went ahead and finished off step 3 of this two-year project:
- Bug 66091 – Share code between isStyleSpanOrSpanWithOnlyStyleAttribute, isUnstyledStyleSpan, isSpanWithoutAttributesOrUnstyleStyleSpan and replaceWithSpanOrRemoveIfWithoutAttributes
- Bug 12248 – Apple-style-span class seems unnecessary
And there you go. As of revision 93001, WebKit no longer produces Apple style spans. My (and perhaps your) dream has come true.
Acknowledgements
Of course, all of this could not happen without support from the following people and the entire WebKit community, whom I sincerely thank:
- Darin Adler
- Enrica Casucci
- Eric Seidel
- Julie Parent
- Justin Garcia
- Levi Weintraub
- Ojan Vafai
- Tony Chang
August 15th, 2011 at 11:49 pm
This is fantastic. Implementing an editor has been rife with pitfalls and incompatibilities since the feature started emerging in various browsers; anything that can make this landscape more sane is bound to be a boon for better-quality editors.
August 16th, 2011 at 1:02 am
Thanks for the great write-up!
August 17th, 2011 at 11:39 am
Sweeet. I got a tiny taste of what working on a rich-text editor must be like from perusing the (much more barebones) QRichText source way back when. Taking a selection that crosses random tag boundaries and copying/pasting as valid HTML that can work in another document, styling a selection, etc. sounds a lot more straightforward than it is. (Also, what happens when someone presses the down-arrow key in an editor is like two lines of code, right? Haha, I kid.)
My coworkers have to deal with HTML from WYSIWYG editors with lots of redundant styling code, which I more or less accepted as needed to make cutting-and-pasting of styles work how folks expected. One said he didn’t know quite what Apple-style-span was there for, but “good riddance” when I told him about its impending demise as this patch gets out to users. So, good work.
It’s also great that there’s some sort of stab at drafting an editing standard underway. Maybe less hacky and browser-specific editing JS libraries are in our collective future.
In conclusion, the Internet owes you a beer, and I hope more cool stuff (and not just editing-related cool stuff) is coming.
August 19th, 2011 at 1:21 pm
Good work.
I worked on a wysiwyg JS editor (private code) once and those spans always made me grin.
Even though I knew it would take some work to purge them, I wouldn’t have assumed it was such a huge task.
August 20th, 2011 at 12:52 pm
Thank you Ryosuke and the others who helped you work on this. I’ve been building an editor (XHTML 1.1, JavaScript/DOM, PHP validator, XML-to-BB and BB-to-XML parsers) entirely on my own sans iframe/framesworks and having to interpret all sorts of differing implementations between various rendering engines and messy code has been absolutely time consuming so I can’t express how much I appreciate that you put the effort in to cleaning this up. I can’t wait to try out a nightly. Thanks again!
August 21st, 2011 at 8:55 pm
w00t!!