Trouble rendering HTML 4.01 Transitional (with CSS) to PDF

algorithmetics
Hello,

I have W3C-valid HTML 4.01 Transitional files (with in-line CSS) and would like to be able to convert them to PDF format using Prince, but am having some trouble: the generated PDF does not mimic the HTML/CSS I see in Firefox, Chrome, or Internet Explorer on Windows XP. The PrinceXML homepage advertises complete support for HTML; as this is valid HTML, I am wondering where I am making a mistake and would appreciate some advice.

I've tried leaving the CSS in the <style> tags as well as in-line and it does not seem to make a difference.

I did notice that the Prince PDF output is very close to how the page displays when I give it a <!DOCTYPE of HTML 4.01 Strict or of XHTML. Is there a setting available to force Prince to parse it as HTML 4.01 Transitional rather than any of the other formats? I was planning on purchasing a license for Prince but am hesitating while I try to fix this problem.

Thank you in advance for any suggestions.

algorithmetics
Here's an example of a page I've been trying to convert:
http://gilson.algorithmetics.com/view1

W3C Validation (with neither warnings nor errors):
http://validator.w3.org/check?uri=http://gilson.algorithmetics.com/view1

PDF output (generated using PHP exec() to call Prince at the command line using URL-passed arguments on a remote Debian server):
http://lafayette13.dyndns.org/prince/prince-php5-r5/convert.php?input=http://gilson.algorithmetics.com/view1&output=1.pdf
mikeday
If you check the page with the W3C CSS validator, you will find that lengths given for the width, height, top, left, bottom, and right properties must have a unit, e.g. px or pt, and cannot simply be numbers.
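
As a rough illustration of the difference (the class names and values below are made up for the example and are not taken from the linked page), a unitless length next to its valid equivalent might look like this. Browsers in quirks mode generally treat a bare number as pixels, which is probably why the page looked fine there and only changed under a Strict or XHTML doctype.

```css
/* Hypothetical class names and values, for illustration only. */

/* Invalid CSS: non-zero lengths with no unit.
   A standards-mode renderer is expected to ignore these declarations. */
.line-invalid {
  position: absolute;
  top: 120;
  left: 72;
  width: 400;
}

/* Valid CSS: every non-zero length carries a unit such as px or pt. */
.line-valid {
  position: absolute;
  top: 120px;
  left: 72px;
  width: 400px;
}
```

A length of 0 is the one case that does not need a unit.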
algorithmetics
That was quite a silly mistake. I'm sorry for wasting your time on something so obvious.

I am, however, noticing some new errors. I'm converting some pages from PDF to HTML, with each line of text in a separate div. Not all lines are reproduced perfectly, and some overlap, making the text unreadable in places. Here's an example:

HTML/CSS page:
http://gilson.algorithmetics.com/view.php?id=48

PDF version:
http://lafayette13.dyndns.org/prince/prince-php5-r5/48.pdf

Is there something I can do to avoid this or is it just a consequence of the complicated HTML structure?

mikeday
This appears to be an issue with Prince, where line-height applied to inline elements like span is not having the desired effect. Applying line-height to the containing div element solves the problem. We will investigate this issue for a future release of Prince.
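
A minimal sketch of that workaround, assuming each line of text is a span inside a div with a class of "line" (the class name and the 12pt value are placeholders, not taken from the actual page):

```css
/* Hypothetical selector and value, for illustration only. */

/* Before: line-height set on the inline span,
   which is the case that does not render as intended. */
div.line span {
  line-height: 12pt;
}

/* After: the same line-height moved to the containing div. */
div.line {
  line-height: 12pt;
}
```

If the styles are written in-line, the same change applies: move the line-height declaration from the span's style attribute to the style attribute of its parent div.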
algorithmetics
Your suggestion has been implemented and is successful. Thank you very much for your help.

Any complicated software will initially have some small bugs to be ironed out over time. As long as there is a temporary fix, it is not a major issue. Thus, I am very impressed by the quality of the output produced by the Prince HTML to PDF conversion software and would like to compliment its developers on their great work.
mikeday
Thanks! We appreciate it :)