Prince 14.3: still issues with huge data:URLs

Johann
20 Jun 2022

In our documents we embed all images as data: URLs, but we have always had problems with huge (several MB) PNG images.

The release notes for Prince 14.3 announced improvements in decoding data: URLs, but the conversion of some of our documents still fails!

For example: we have an image tag with a data: URL attribute that starts at column 13725 and ends at column 20479816 of the same line.

The conversion fails with errors:

prince: test.xhtml:1011: error: internal error: Huge input lookup
prince: test.xhtml:1011: error: attributes construct error
prince: test.xhtml:1011: error: Couldn't find end of Start Tag img line 1011
prince: test.xhtml:1011: error: Premature end of data in tag div line 1011
prince: test.xhtml: error: could not load input file
prince: error: failed to load all input documents
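
To narrow down which images trigger this, a small script can list the data: URLs that exceed a given size. This is only a sketch: the 10 million character threshold is an assumption about the XML parser's default limit, and `find_large_data_urls` and `ASSUMED_LIMIT` are names made up for illustration.

```python
import re

# Assumption: the parser rejects single values above roughly 10 million
# characters by default. Adjust this if your setup behaves differently.
ASSUMED_LIMIT = 10_000_000

# Matches src/href attributes whose quoted value is a data: URL.
DATA_URL_RE = re.compile(
    r"""(?:src|href)\s*=\s*(["'])(data:[^"']*)\1""", re.IGNORECASE
)


def find_large_data_urls(markup, limit=ASSUMED_LIMIT):
    """Return (line_number, length) for each data: URL longer than `limit`."""
    hits = []
    for match in DATA_URL_RE.finditer(markup):
        url = match.group(2)
        if len(url) > limit:
            # Line numbers are 1-based, counted up to the start of the attribute.
            line_number = markup.count("\n", 0, match.start()) + 1
            hits.append((line_number, len(url)))
    return hits


if __name__ == "__main__":
    # Tiny stand-in for a real document; the limit is lowered so it triggers.
    sample = '<div>\n<img src="data:image/png;base64,' + "A" * 50 + '"/>\n</div>'
    for line_number, length in find_large_data_urls(sample, limit=40):
        print(f"line {line_number}: data: URL of {length} characters")
```

For a real file, read it with `open("test.xhtml", encoding="utf-8").read()` and call the function with the default limit.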

mikeday
21 Jun 2022

It appears to be a limit of the XML parser, which we should be able to fix. In the meantime, perhaps you could use HTML instead with "-i html"?

Johann
21 Jun 2022

Thanks - that parses the big images!

But it does not immediately resolve the issue for our documents, because we also have the CSS styles inline. That leads to other error messages like:

prince: test.xhtml:style:786: warning: parse error in selector at '&'

As a workaround I currently urge every team member to import reasonably sized images.

mikeday
21 Jun 2022

I'm curious where that error is coming from. Are you using XML entities in the style element?

Johann
21 Jun 2022

It's simply this line:

h2+div.figure&gt;img, h3+div.figure&gt;img, h4+div.figure&gt;img { max-height: 210mm; }

In this case it is the > selector, which is escaped as &gt; in our XHTML.

mikeday
21 Jun 2022

Oh right, in HTML these characters do not need to be escaped inside the <style> element while in XML they do.
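
If the documents have to go through the HTML parser for now, one possible workaround is to unescape the entities inside the style elements before conversion. The following is a rough sketch, not something Prince provides: `unescape_style_blocks` is a made-up helper, and it assumes the style elements are plain `<style>...</style>` blocks without CDATA sections.

```python
import html
import re

# Captures the opening tag, the body, and the closing tag of each <style> element.
STYLE_RE = re.compile(
    r"(<style\b[^>]*>)(.*?)(</style\s*>)", re.DOTALL | re.IGNORECASE
)


def unescape_style_blocks(markup):
    """Turn entities such as &gt; into literal characters, but only inside <style>."""

    def fix(match):
        opening, body, closing = match.groups()
        return opening + html.unescape(body) + closing

    return STYLE_RE.sub(fix, markup)


if __name__ == "__main__":
    sample = (
        "<style>h2+div.figure&gt;img { max-height: 210mm; }</style>"
        "<p>a &gt; b</p>"
    )
    # Entities outside <style> are left untouched.
    print(unescape_style_blocks(sample))
```

The result is no longer well-formed XML if the styles contain < or &, so it is only meant for input that is then parsed as HTML.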

mikeday
14 Feb 2024

We have addressed this issue in the latest build. Please let us know if you experience any further issues with large XML documents.

(Please note that when running Prince on Linux it may be necessary to pass the new --xml-parse-huge option on the command line to enable this behaviour. Prince uses libxml2, which caps the size of XML documents to protect against the "billion laughs" entity attack, where a small document can expand when parsed to consume an unexpectedly large amount of memory. A better mechanism to protect against this attack was added in libxml2 version 2.11.0, which allows Prince to safely enable the option by default. In the meantime, please exercise care when using this option with untrusted XML input that could potentially consume more memory than anticipated.)

Johann
14 Feb 2024

Great - Thanks!
