
Global named HTML anchors do not generate PDF bookmarks

Andi0661
Hi,
I have a generated HTML file with many links. Prince-XML 8.0 (free edition) converts it beautifully to PDF, but does not generate PDF links for HTML links to global named anchors (i.e. "a" elements that are direct children of the "body" element).

The W3C HTML 4.0 spec (Strict DTD) states that "a" elements are inline elements and, as such, are not permitted as direct children of the "body" element. However, every browser I have tested the HTML file with so far has accepted them, and links to such global anchors worked fine. My testing included the recent versions of FF, IE, Opera, Safari, and Chrome.

Prince-XML seems to behave as if it drops such global named anchors: any links referencing them are rendered as plain text, not as PDF links.

When I add a wrapper element containing text around the global named anchor, the PDF links generated by Prince-XML work (the markup is then legal HTML with respect to the rule quoted above).

Example of HTML for which Prince-XML generates plain text instead of a PDF link:
<body>
<a name="x1"></a>
<p>Link to <a href="#x1">x1</a></p>
</body>


Example of HTML with a wrapper element for which Prince-XML properly generates a PDF link:
<body>
<div style="visibility: hidden; font-size: 0; line-height: 0;">&#xA0;
<a name="x1"></a>
</div>
<p>Link to <a href="#x1">x1</a></p>
</body>

The div element needed some text content for this to work, so I used a non-breaking space character (&#xA0;). It seems Prince-XML "optimizes" away a div element without such text content, along with the named anchor it contains (this is just my interpretation based on the link behavior). "visibility: hidden" alone was not sufficient to suppress the "div" element entirely (it still produced an empty line), so I also had to add "font-size: 0" and "line-height: 0" (I did not verify whether one of them alone would have been sufficient).

I changed the HTML generator to remove some global named anchors, because I could use heading ids instead; for the remaining ones, I used the wrapping approach explained above, so my immediate problem is solved.

However, with large HTML files like mine (>800 links), it took me a while to figure out the root cause (global named anchors), so I wanted to raise this here for the benefit of others and make some suggestions regarding Prince-XML:

1. I don't think it is OK to "optimize" the named anchor away when it is wrapped inside a "div" element without additional text content. After all, this is legal HTML.

2. When Prince-XML internally drops or ignores illegal HTML constructs, it should report this as an error or warning. After all, nested "p" elements are already reported by Prince-XML (which caused me to clean up my sloppy use of nested "p" elements in the HTML generator, so thanks for that :wink: ).

3. Could you consider relaxing the strict interpretation of the HTML spec with respect to global named anchors (i.e. unwrapped ones), toward the more tolerant handling supported by most browsers?
The current behavior causes HTML that works in most browsers to fail with Prince-XML.

I can supply the HTML and PDF files if needed.

Kind regards,
Andy
mikeday
Thanks for the detailed write-up; this should be helpful to other readers of the forum. There is a known issue where links to empty anchor elements in a block context will not work. In fact, this may also affect links to empty spans, e.g. <span id="foo"></span>, if they occur in a block context. The workaround is similar to what you describe: either move the id to a non-empty element, or add some "fake text" around the anchor. In general, I think placing the id attribute on a more semantically relevant element, like a heading or document section, is the best strategy. However, we do plan to fix this limitation in a future release of Prince.
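To illustrate the suggested workaround, here is a minimal sketch contrasting the affected pattern with ids placed on non-empty elements. The ids ("x1", "x2") and the heading and section text are hypothetical placeholders, and the empty-span case is included only because it is described above as possibly affected, not because it was tested here:

```html
<!-- Pattern described above: empty link targets directly in a block context.
     Links to these may be rendered as plain text instead of PDF links. -->
<body>
<a name="x1"></a>
<span id="x2"></span>
<p>Links to <a href="#x1">x1</a> and <a href="#x2">x2</a></p>
</body>

<!-- Suggested alternative: put the id on a non-empty, semantically
     relevant element, such as a heading or a section wrapper. -->
<body>
<h2 id="x1">Heading for x1</h2>
<div id="x2">
<p>Content of section x2</p>
</div>
<p>Links to <a href="#x1">x1</a> and <a href="#x2">x2</a></p>
</body>
```

This also avoids the hidden-div wrapper and its font-size/line-height styling, since the link target is an element that already has visible content.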