create PDFs from HTML generated by the pdf2htmlEX library

Inchara
I am unable to convert my HTML to visually accurate PDFs. Browsers render it correctly, but Prince does not. My hypothesis is that the fonts are not loaded properly for the HTML (generated by the pdf2htmlEX library), which refers to WOFF files via @font-face rules. Prince seems to fall back to system fonts, resulting in documents with slightly larger text that gets cut off at the right edge and overlaps.

Logs
Converting document...
used font: Times New Roman, Regular
loading font: /*/f1.woff
used font: DejaVu Sans, Book
loading font: /*/f6.woff

When I disable system fonts and try to force Prince to use only the fonts declared via @font-face, I get the following message: warning: Ensure fonts are available on the system or load them via a @font-face rule.

mikeday
Is it failing to load the f1.woff? Are there any error messages?
Inchara
No, there are no error messages. The PDF gets generated, but the text overlaps and content gets cut off at the right edge. The HTML is absolutely positioned, as generated by the pdf2htmlEX library.

Following are the details and logs. Do the logs look like partial loading of fonts? I am not sure what is causing the loss of visual fidelity. This renders well in browsers.

Font-face loading in css

.ff0{font-family:sans-serif;visibility:hidden;}
@font-face{font-family:ff1;src:url(f1.woff)format("woff");}.ff1{font-family:ff1;line-height:1.695312;font-style:normal;font-weight:normal;visibility:visible;}
@font-face{font-family:ff2;src:url(f2.woff)format("woff");}.ff2{font-family:ff2;line-height:1.172852;font-style:normal;font-weight:normal;visibility:visible;}
@font-face{font-family:ff3;src:url(f3.woff)format("woff");}.ff3{font-family:ff3;line-height:1.202148;font-style:normal;font-weight:normal;visibility:visible;}
@font-face{font-family:ff4;src:url(f4.woff)format("woff");}.ff4{font-family:ff4;line-height:0.850586;font-style:normal;font-weight:normal;visibility:visible;}
@font-face{font-family:ff5;src:url(f5.woff)format("woff");}.ff5{font-family:ff5;line-height:0.981934;font-style:normal;font-weight:normal;visibility:visible;}
@font-face{font-family:ff6;src:url(f6.woff)format("woff");}.ff6{font-family:ff6;line-height:1.456055;font-style:normal;font-weight:normal;visibility:visible;}
@font-face{font-family:ff7;src:url(f7.woff)format("woff");}.ff7{font-family:ff7;line-height:0.769531;font-style:normal;font-weight:normal;visibility:visible;}
@font-face{font-family:ff8;src:url(f8.woff)format("woff");}.ff8{font-family:ff8;line-height:1.456055;font-style:normal;font-weight:normal;visibility:visible;}
@font-face{font-family:ff9;src:url(f9.woff)format("woff");}.ff9{font-family:ff9;line-height:1.534180;font-style:normal;font-weight:normal;visibility:visible;}
@font-face{font-family:ffa;src:url(fa.woff)format("woff");}.ffa{font-family:ffa;line-height:1.112305;font-style:normal;font-weight:normal;visibility:visible;}
@font-face{font-family:ffb;src:url(fb.woff)format("woff");}.ffb{font-family:ffb;line-height:0.946777;font-style:normal;font-weight:normal;visibility:visible;}

Logs

Converting document...
used font: Times New Roman, Regular
loading font: f1.woff
used font: DejaVu Sans, Book
loading font: f6.woff
used font: DejaVu Serif, Book
loading font: f2.woff
used font: Calibri, Regular
loading font: f3.woff
used font: Calibri, Bold
loading font: f5.woff
used font: Calibri, Bold Italic

List of fonts in the directory

f1.woff: DejaVu Sans Book
f2.woff: Calibri Regular
f3.woff: Calibri Bold
f4.woff: Cambria Bold
f5.woff: Calibri Bold Italic
f6.woff: DejaVu Serif Book
f7.woff: Georgia Italique
f8.woff: DejaVu Serif Book
f9.woff: DejaVu Serif Bold
fa.woff: Times New Roman Normal
fb.woff: Times New Roman Italique
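
One way to read the log against the directory listing is sketched below. It assumes (this is an interpretation, not something the log states) that each "used font" line describes the file named in the "loading font" line just before it. The helper name pair_loaded_fonts is made up for this example. Under that reading, every file that was loaded resolves to the font the directory listing says it contains, and the leading "Times New Roman, Regular" line is not tied to any WOFF file.

```python
# Log lines and font names copied from the post above.
LOG = """\
used font: Times New Roman, Regular
loading font: f1.woff
used font: DejaVu Sans, Book
loading font: f6.woff
used font: DejaVu Serif, Book
loading font: f2.woff
used font: Calibri, Regular
loading font: f3.woff
used font: Calibri, Bold
loading font: f5.woff
used font: Calibri, Bold Italic
"""

DIRECTORY = {
    "f1.woff": "DejaVu Sans Book",
    "f2.woff": "Calibri Regular",
    "f3.woff": "Calibri Bold",
    "f4.woff": "Cambria Bold",
    "f5.woff": "Calibri Bold Italic",
    "f6.woff": "DejaVu Serif Book",
    "f7.woff": "Georgia Italique",
    "f8.woff": "DejaVu Serif Book",
    "f9.woff": "DejaVu Serif Bold",
    "fa.woff": "Times New Roman Normal",
    "fb.woff": "Times New Roman Italique",
}


def pair_loaded_fonts(log):
    """Pair each 'loading font' file with the next 'used font' line (assumed pairing)."""
    pairs = {}
    pending = None
    for line in log.splitlines():
        kind, _, value = line.partition(": ")
        if kind == "loading font":
            pending = value.strip()
        elif kind == "used font" and pending is not None:
            # Drop the comma so "DejaVu Sans, Book" compares with "DejaVu Sans Book".
            pairs[pending] = value.replace(",", "").strip()
            pending = None
    return pairs


pairs = pair_loaded_fonts(LOG)
for name, used in pairs.items():
    status = "match" if DIRECTORY.get(name) == used else "MISMATCH"
    print(f"{name}: {used} ({status})")

never_loaded = sorted(set(DIRECTORY) - set(pairs))
print("never loaded:", never_loaded)
```

The files that never appear in the log (f4, f7, f8, f9, fa, fb) are the ones worth checking next; the log alone does not say whether they failed to load or were simply not needed.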
wezm
I did some experimentation with the pdf2htmlEX tool. It generates print styles by default, which Prince honours. These print styles set the width of the page to 793.700800pt and height to 1122.519733pt.
One pt is 1 inch / 72, so that means the pages come out 11" by 15.6".
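
The conversion above can be checked with a few lines of Python. This is only the arithmetic, using the two values from the print styles; the function name pt_to_inches is just for illustration.

```python
PT_PER_INCH = 72.0   # 1 pt = 1/72 inch
MM_PER_INCH = 25.4


def pt_to_inches(pt):
    """Convert a length in points to inches."""
    return pt / PT_PER_INCH


# Page size taken from the pdf2htmlEX print styles.
width_in = pt_to_inches(793.700800)
height_in = pt_to_inches(1122.519733)
width_mm = width_in * MM_PER_INCH
height_mm = height_in * MM_PER_INCH

print(f"{width_in:.2f} in x {height_in:.2f} in")
print(f"{width_mm:.1f} mm x {height_mm:.1f} mm")
```

In metric terms that is roughly 280 mm by 396 mm, noticeably larger than A4 (210 mm by 297 mm).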

If you toggle print styles in a browser (Firefox in my case) you see the same effect of larger text, matching Prince.



Looking at the source code of pdf2htmlEX it seems it should be possible to influence the scaling with the --zoom and --font-size-multiplier arguments, but while they change the screen styles the print styles remain the same.

One workaround I came up with was to counteract the increased size by applying a scale in Prince:

html { transform: translate(-26mm, 0) scale(0.74); }


With that stored in scale.css Prince can be run as follows to produce content that fits on the page:

prince --page-size a4 -s scale.css input.html
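
As a rough check on the scale factor, the sketch below computes how much the declared page has to shrink to fit A4, the size passed to --page-size. The helper fit_scale is hypothetical and only does the division. Both axes come out at about 0.75, so the 0.74 above sits just under the exact fit. 0.75 is also the ratio of a CSS px to a pt (72/96), although nothing in this thread confirms that this is where the oversized values come from.

```python
PT_PER_INCH = 72.0
MM_PER_INCH = 25.4

A4_MM = (210.0, 297.0)                    # ISO A4, matching --page-size a4
DECLARED_PT = (793.700800, 1122.519733)   # from the pdf2htmlEX print styles


def fit_scale(declared_pt, target_mm):
    """Scale factor that shrinks a declared length (pt) to a target length (mm)."""
    declared_mm = declared_pt / PT_PER_INCH * MM_PER_INCH
    return target_mm / declared_mm


scale_w = fit_scale(DECLARED_PT[0], A4_MM[0])
scale_h = fit_scale(DECLARED_PT[1], A4_MM[1])

# Use the smaller factor so the content fits in both directions.
scale = min(scale_w, scale_h)
print(f"width: {scale_w:.4f}, height: {scale_h:.4f}, use: {scale:.2f}")
```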


One issue remains: I'm only seeing one page of output in Prince when there should be two. I believe this is because the page-container element has these styles on it:

#page-container {
  position: absolute;
  top: 0;
  left: 0;
  margin: 0;
  padding: 0;
  border: 0;
}


Because Prince is page-based, it constrains that absolutely positioned element to the size of a single page, so only one page comes out.
Inchara
Thanks for the scaling tip! I used 0.75 and provided the exact page height and width, which fixed the scaling issue.
But I still see overlapping text. Do you think this could be a font issue? I do not see all the fonts being loaded in the logs (as noted in my message above, only some fonts are loaded via @font-face).
When I run with --no-system-fonts I get an error: Unable to find any available fonts.
I wonder if the text positions are off due to the difference in fonts. Have you seen this before? I really appreciate your response.
  1. original.png (356.1 kB)
    original PDF
  2. prince_generated.png (587.4 kB)
    Prince-generated PDF
wezm
Interesting. I didn't see that behaviour in my testing, but I see from your screenshot that different fonts are being selected. If you could supply a sample document that reproduces the issue I can investigate further.
Inchara
Thank you so much, wezm! I truly appreciate your time. I am trying to convert lorem_mini_2_accessible.html to PDF. I have also attached a PDF generated by Prince using prince_fixes.css, along with the corresponding logs.
  1. lorem_mini_2.zip (1.1 MB)
    HTML and CSS
wezm
Hi Inchara, the fonts in your sample document look ok, but some things are certainly mis-positioned. The fancy.min.css stylesheet has the following lines at the end:
h1, h2, h3, h4, h5, h6, p, ul, ol, li, section, article, title, main  {
    all: unset
}

However, Prince does not yet support the 'all' property. We've added it to the roadmap, but don't have an expected timeline for it. In the meantime, the layout can be improved quite a bit by replicating some of what 'all: unset' would do. Try adding this CSS to prince_fixes.css:
h1, h2, h3, h4, h5, h6, p, ul, ol, li, section, article, title, main {
    margin: 0;
}
mikeday
The "all" shorthand property is now supported in Prince pre-release builds.