Cross links when generating PDF from multiple files

elephant
While generating a PDF from multiple XHTML files (a website) I tried to keep the links from one page to the other intact. The links are not anchor links on one page but pretty basic hrefs. E.g. http://localhost/a/ has a link to http://localhost/b/ like:

// file a/index.html
<a href="/b/">


Most links are "path-absolute" so starting with "/" but some may also be relative like "../c/". The links are all referring to a directory only as the actual "file" is not always index.html but may also be generated or dynamic like index.jsp or similar.

I generate the PDF by feeding all URLs directly into Prince from a simple Python script, which ends up making a call like:

import subprocess
subprocess.call(['prince', 'http://localhost/a/', 'http://localhost/b/', '-o', 'test.pdf'])


In the resulting PDF, all links are then external URL links, so e.g. the link on "/a/" points to "http://localhost/b/". Effectively the PDF links are unusable, and for now I have disabled them using prince-link: none in the stylesheet.

I figured there should be a way to let Prince rewrite all links with something like prince-link: url(target-url), but I don't know how to use it. Is there a way to find out the target-url of the page that e.g. "/b/" is generated to?

The only workaround I could think of is to rewrite all links to something like a href="#b" and give page "b" a body/@id, but I hope this can be done more easily with a few lines of CSS?

Thanks for any hint.
mikeday
Prince 6.0 rev 3 has a limitation affecting links without fragment IDs when processing multiple documents. Links with fragment IDs such as index.html#foo should be correctly converted to internal links, but links without fragment IDs will not. This issue will be fixed in the next maintenance release.
elephant
mikeday wrote:
Prince 6.0 rev 3 has a limitation affecting links without fragment IDs when processing multiple documents. Links with fragment IDs such as index.html#foo should be correctly converted to internal links, but links without fragment IDs will not. This issue will be fixed in the next maintenance release.


What can I do now? If I put a bogus fragment ident after each link, would it work? How would I have to write the stylesheet then (and after the issue is fixed)?

Thanks!
mikeday
At the moment you would need to add a fragment identifier after each link, but it would need to be a valid fragment identifier, i.e. the document you are linking to would need to have an id attribute somewhere with that value. This restriction will be lifted in the next release.
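
As a rough illustration of that workaround, here is a minimal Python sketch that appends a fragment identifier to every link that lacks one and gives the body element of the target document a matching id. The function names and the "top" id are hypothetical, and the regular expressions only cope with simple markup. Note that it rewrites every a href, including links to external sites, so a real script would need to filter those out.

```python
import re

def add_fragment(html, fragment="top"):
    """Append #fragment to each <a href="..."> that has no fragment yet."""
    def fix(match):
        prefix, quote, url = match.group(1), match.group(2), match.group(3)
        if "#" in url:
            return match.group(0)  # already has a fragment, leave untouched
        return f"{prefix}{quote}{url}#{fragment}{quote}"
    return re.sub(r'(<a\b[^>]*?\bhref=)(["\'])(.*?)\2', fix, html)

def add_body_id(html, fragment="top"):
    """Give <body> the id the rewritten links point at, unless it has an id."""
    return re.sub(r"<body(?![^>]*\bid=)", f'<body id="{fragment}"', html, count=1)
```

Both functions would be applied to every input document before it is handed to Prince, so that each rewritten link has a valid target.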
mikeday
The new release is out now with a fix for this issue, and links between documents should work correctly now.
Jellby
In version 7.0b1, links between files do indeed usually work, but apparently not when the files don't share the same parent directory.

I have these files:

a/test1.html:
<html>
<body>
<p id="p1">This is a paragraph in file 1.</p>
<p>A <a href="../b/test2.html">link</a> to file 2.</p>
<p>A <a href="../b/test2.html#p2">link</a> to a paragraph in file 2.</p>
</body>
</html>


b/test2.html:
<html>
<body>
<p id="p2">This is a paragraph in file 2.</p>
</body>
</html>


And I compile with:

prince a/test1.html b/test2.html -o test.pdf

The links in the resulting PDF point to local files, not to page 2 of the PDF. This does not happen if both test1.html and test2.html sit in directory a/, though.
mikeday
It also works if you run the command from the 'a' directory:
$ cd a
$ prince test1.html ../b/test2.html -o test.pdf

Clearly the issue here is that the paths "a/../b/test2.html" and "b/test2.html" are not comparing as equal, as the ".." is not being normalised. I'll add this to the roadmap.
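
To make the comparison problem concrete, here is a small Python sketch of the idea. It only illustrates lexical path normalisation and is not how Prince implements it; same_document is a hypothetical helper name.

```python
import posixpath

def same_document(linking_file, href, input_path):
    """Resolve href against the linking file's directory, then compare
    the normalised result with the normalised input path."""
    resolved = posixpath.join(posixpath.dirname(linking_file), href)
    return posixpath.normpath(resolved) == posixpath.normpath(input_path)

# Without normalisation the two spellings differ as plain strings:
raw = posixpath.join("a", "../b/test2.html")
print(raw == "b/test2.html")
print(same_document("a/test1.html", "../b/test2.html", "b/test2.html"))
```

The first comparison fails because the ".." is still in the string, while the second succeeds once both sides are normalised. posixpath.normpath works purely on the text of the path, so it does not account for symbolic links.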
mikeday
Prince 7.0 is out and supports normalisation of path names, so this problem should now be fixed. Thanks for letting us know. :)
fbrzvnrnd
I think there is still a problem if:

We have a folder called 'example' containing multiple XHTML files. The files link to each other using this syntax: <a href="../example/filename.xhtml">

This syntax is the standard Sigil one, so every time I try to build a PDF from a Sigil ePub I have to search and replace the "../example/" part. But I think it is a Prince problem.

f.
mikeday
How are you invoking Prince on the input files? Can you do something like this:
$ prince example/chap1.html example/chap2.html -o out.pdf

Or pass the full absolute paths to the input documents, so that the ../example can be resolved correctly.
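
If Prince is being driven from a script, the second suggestion could look roughly like the Python sketch below. prince_command is a hypothetical helper, and the file names are the ones from the example above. It only builds the argument list, so the call itself is left as a comment.

```python
import os

def prince_command(inputs, output):
    """Build a Prince command line with every input given as an absolute path."""
    absolute_inputs = [os.path.abspath(path) for path in inputs]
    return ["prince", *absolute_inputs, "-o", output]

cmd = prince_command(["example/chap1.html", "example/chap2.html"], "out.pdf")
# To run it: import subprocess; subprocess.call(cmd)
```

os.path.abspath resolves the paths against the current working directory, so the script still needs to be started from the directory that contains the 'example' folder.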