
same-document link and xml:base="..." attribute

jablko
How do I make a same-document link when I'm already using an xml:base="..." attribute?

Here's my HTML input: http://www.sfu.ca/~jdbates/tmp/css/201001030/manual.html

And here's the PDF I generated with Prince 7.0: http://www.sfu.ca/~jdbates/tmp/css/201001030/manual.pdf

An example from the input:

<a href="#d1e90" title="UM-1.1" shape="rect">UM-1.1 What is ICA-AtoM?</a>


I want this to be a same-document link to the section with id "d1e90", but in the generated PDF, clicking this link instead opens http://ica-atom.org/docs/index.php?title=UM-1#d1e90 in a browser : (

I think it's because of this attribute: xml:base="http://ica-atom.org/docs/index.php?title=UM-1"

If I drop this attribute, then in the generated PDF clicking the link jumps to the section with id "d1e90" (which is what I want) and doesn't open a browser.

Unfortunately, without the xml:base="..." attribute, clicking other links doesn't open the correct URLs and images are missing : (

How do I make a same-document link without dropping the xml:base="..." attribute?

According to RFC 3986, section 4.4 (http://tools.ietf.org/html/rfc3986#section-4.4):

When a URI reference refers to a URI that is, aside from its fragment
component (if any), identical to the base URI (Section 5.1), that
reference is called a "same-document" reference. The most frequent
examples of same-document references are relative references that are
empty or include only the number sign ("#") separator followed by a
fragment identifier.

In this case my understanding is that the xml:base="http://ica-atom.org/docs/index.php?title=UM-1" attribute defines the "base URI". The relative "URI reference" "#d1e90" resolves to http://ica-atom.org/docs/index.php?title=UM-1#d1e90. Except for the fragment component, this is identical to the base URI, so it is a same-document reference?
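The resolution and comparison described above can be checked with Python's standard `urllib.parse` module. This is a minimal sketch of the RFC 3986 reasoning only; it says nothing about what Prince or any PDF reader actually does internally.

```python
from urllib.parse import urljoin, urldefrag

# Values taken from the post: the wiki page's xml:base and the link's href.
base = "http://ica-atom.org/docs/index.php?title=UM-1"
href = "#d1e90"

# Resolve the relative reference against the base URI
# (urljoin closely follows the RFC 3986 section 5 resolution algorithm).
target = urljoin(base, href)

# RFC 3986 section 4.4: a same-document reference is one whose target,
# aside from the fragment, is identical to the base URI.
target_without_fragment, fragment = urldefrag(target)
base_without_fragment, _ = urldefrag(base)
same_document = target_without_fragment == base_without_fragment

print(target)         # http://ica-atom.org/docs/index.php?title=UM-1#d1e90
print(fragment)       # d1e90
print(same_document)  # True
```

By this reading the link is a same-document reference, which is why the question is whether the viewer or the PDF generator should treat it as an internal jump.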

When a same-document reference is dereferenced for a retrieval
action, the target of that reference is defined to be within the same
entity (representation, document, or message) as the reference;
therefore, a dereference should not result in a new retrieval action.

So clicking this link should jump to the section with id "d1e90", not open a browser?

Is this a problem with Prince, with my PDF reader, or with my interpretation of the RFC?
mikeday
The situation here is that only part of the document has that base URL, correct? This means that the rest of the document, including the root element, has a different base URL. Having multiple base URLs for the same document isn't a scenario we expect to handle. Can you use the --baseurl command-line option to specify the same base URL for the whole document instead?
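To illustrate how one document ends up with several base URLs, here is a minimal sketch of how an XML Base-aware processor could compute the effective base of each element: each xml:base value is resolved against the base inherited from the parent. The two-page document, the second page's URL and id (`UM-2`, `d2e10`), and the `file:///tmp/manual.html` location are hypothetical stand-ins, not taken from the actual manual.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# xml:base lives in the predefined XML namespace; ElementTree reports it
# under this expanded attribute name.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

# Hypothetical cut-down version of the combined document: two wiki pages,
# each wrapped in a div carrying its own xml:base.
DOC = """<html>
<body>
<div xml:base="http://ica-atom.org/docs/index.php?title=UM-1"><p id="d1e90">UM-1.1</p></div>
<div xml:base="http://ica-atom.org/docs/index.php?title=UM-2"><p id="d2e10">UM-2.1</p></div>
</body>
</html>"""

def in_scope_bases(elem, inherited, out):
    """Append (tag, id, effective base URL) for elem and its descendants,
    in document order."""
    own = elem.get(XML_BASE)
    base = urljoin(inherited, own) if own is not None else inherited
    out.append((elem.tag, elem.get("id"), base))
    for child in elem:
        in_scope_bases(child, base, out)
    return out

# Assumption: the file was loaded from this made-up location, so that is the
# base the root element inherits (a single document-wide base URL, like the
# one --baseurl is described as setting above).
document_url = "file:///tmp/manual.html"
root = ET.fromstring(DOC)
for tag, elem_id, base in in_scope_bases(root, document_url, []):
    print(tag, elem_id, base)
```

The root and body keep the document URL, while everything inside each div gets that page's wiki URL, so no single base URL covers the whole document.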
jablko
Thanks Mike -

That's correct: the document is a combination of several wiki pages, each with a different xml:base="..." attribute, so each base URL applies to only part of the document.

By specifying the --baseurl option, I can make the same-document links in one wiki page work, but I can't make the same-document links in all pages work at the same time : (

Is there an alternative approach that Prince does handle?

May I open a feature request for Prince to support documents composed of content with different base URLs?
mikeday
I tested a document like this:
<html>
  <body>
    <div xml:base="http://example.com/doc1.html">
      <p id="first">This is the first paragraph!</p>
      <p><a href="#first">This</a> is a link to the first paragraph.</p>
    </div>
    <div xml:base="http://example.com/doc2.html">
      <p id="second">This is the second paragraph!</p>
      <p><a href="#second">This</a> is a link to the second paragraph.</p>
    </div>
  </body>
</html>

and the links resolved correctly as internal links. Does your document have a similar structure to this one?

Update: Cancel that; I was parsing the document as HTML, so the xml:base attribute was being ignored. Once the document is parsed as XML, the problem can be replicated. :)
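The parse mode matters because only an XML parser treats xml:base as a base URL. The sketch below parses the test document above as XML with Python's `xml.etree.ElementTree`, resolves each link against its own in-scope base, and applies one possible rule for treating a link as internal: the resolved URL, minus its fragment, must equal the in-scope base of an element whose id is that fragment. The rule and the `file:///tmp/test.xml` document URL are assumptions for illustration; this is not how Prince implements it.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin, urldefrag

XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

# The test document from this thread, parsed as XML so xml:base is honoured.
DOC = """<html>
<body>
<div xml:base="http://example.com/doc1.html">
<p id="first">This is the first paragraph!</p>
<p><a href="#first">This</a> is a link to the first paragraph.</p>
</div>
<div xml:base="http://example.com/doc2.html">
<p id="second">This is the second paragraph!</p>
<p><a href="#second">This</a> is a link to the second paragraph.</p>
</div>
</body>
</html>"""

def walk(elem, base, targets, links):
    """Collect (in-scope base, id) for elements with an id, and the
    resolved URL of every <a href>, using each element's in-scope base."""
    own = elem.get(XML_BASE)
    if own is not None:
        base = urljoin(base, own)
    if elem.get("id") is not None:
        targets.add((base, elem.get("id")))
    if elem.tag == "a" and elem.get("href") is not None:
        links.append(urljoin(base, elem.get("href")))
    for child in elem:
        walk(child, base, targets, links)

def classify(doc, document_url):
    """Hypothetical rule: a link is internal when (URL minus fragment,
    fragment) matches (in-scope base, id) of some element in the document."""
    targets, links = set(), []
    walk(ET.fromstring(doc), document_url, targets, links)
    result = {}
    for url in links:
        without_fragment, fragment = urldefrag(url)
        internal = (without_fragment, fragment) in targets
        result[url] = "internal" if internal else "external"
    return result

for url, kind in classify(DOC, "file:///tmp/test.xml").items():
    print(kind, url)
```

Under this rule both links in the test document come out internal. Moving `<a href="#second">` into the first div would make it resolve to `http://example.com/doc1.html#second`, which matches no element, so it would be treated as external.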
jablko
Yes exactly - my document has a similar structure
mikeday
Today we have released Prince 7.1, which includes better support for internal links in documents with multiple base URLs.