Stop Prince if image fails to download

joemasilotti
Is it possible to kill Prince if an image (or any remote resource) fails to download? If not, what is the best way to check if a resource fails?

Our HTML to PDF conversion includes user-submitted images, some of which occasionally fail to download (or time out). If an image doesn't load, the PDF is "wrong" and the conversion needs to be retried. We don't want to show a broken PDF to our end users.

Ideally, the exit code would be non-zero. But it doesn't look like that's the case. Should I be parsing the output for warnings? If so, is there a recommended pattern I could grep for?

Thanks in advance,
Joe
mikeday
Currently, converting this document with structured logging:
<img src="notfound.jpg">
<img src="http://example.com/notfound.jpg">

gives this output:
msg|wrn|notfound.jpg|can't open input file: No such file or directory
msg|wrn|http://example.com/notfound.jpg|Could not resolve host: example.com
fin|success

If the images are all remote, it may be sufficient to check for any warning message whose location is an HTTP or HTTPS URL.
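A minimal Python sketch of that check, assuming the structured log has been captured as text and that each message line has the four pipe-separated fields shown above (type, level, location, message). The function name is hypothetical, and treating err-level lines the same as warnings is an assumption rather than documented Prince behavior.

```python
from typing import List, Tuple
from urllib.parse import urlparse


def remote_resource_warnings(log_text: str) -> List[Tuple[str, str]]:
    """Return (location, message) for each warning about an HTTP(S) resource."""
    failures = []
    for line in log_text.splitlines():
        # Message lines look like: msg|wrn|<location>|<message>
        parts = line.split("|", 3)
        if len(parts) != 4 or parts[0] != "msg" or parts[1] not in ("wrn", "err"):
            continue  # not a warning/error message (e.g. "fin|success")
        location, message = parts[2], parts[3]
        # Local paths such as "notfound.jpg" have no scheme and are skipped.
        if urlparse(location).scheme in ("http", "https"):
            failures.append((location, message))
    return failures
```

With the log above, only the example.com warning is returned; a non-empty result means the PDF should be discarded and regenerated.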
joemasilotti
Unfortunately, unrecognized or ignored CSS prints out the same msg|wrn log line. Our customers are able to enter a small snippet of CSS to change their background color and such, so I don't think this is an option for us. Is there any way to get image failures to print msg|err or similar?

mikeday
You can disable the CSS warnings with --no-warn-css. Would that help?
joemasilotti
Good idea, I can add that.

Are there any other warnings that can be generated? Right now I'm parsing the log file for msg|wrn|http and assuming that any match is an HTTP failure.
mikeday
That should catch all of them, including "Unknown image format", which can occur when the server incorrectly redirects to an HTML "not found" page instead of returning a 404.
mikeday
The other solution would be to download customer logos in advance and cache them locally.
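A rough sketch of the download-and-cache approach in Python, using only the standard library. It assumes the application can fetch each customer image once (for example, when it is uploaded or first referenced) and then point the img src at the cached file. The function name, the cache layout (a SHA-256 hash of the URL plus the original extension) and the 5-second timeout are illustrative choices, not anything Prince requires.

```python
import hashlib
import os
import shutil
import urllib.parse
import urllib.request
from pathlib import Path


def cache_remote_file(url, cache_dir, timeout=5.0):
    """Return a local copy of url, downloading it only on the first call."""
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    # One file per URL: SHA-256 of the URL plus the original extension (e.g. ".png").
    suffix = Path(urllib.parse.urlparse(url).path).suffix
    target = cache / (hashlib.sha256(url.encode("utf-8")).hexdigest() + suffix)
    if target.exists():
        return target  # cache hit: no network request at all
    partial = target.with_name(target.name + ".part")
    try:
        # Download to a temporary name so a failed download never looks cached.
        with urllib.request.urlopen(url, timeout=timeout) as response, open(partial, "wb") as out:
            shutil.copyfileobj(response, out)
        os.replace(partial, target)
    except Exception:
        partial.unlink(missing_ok=True)
        raise
    return target
```

Because a cache hit never touches the network, a conversion that only references cached files cannot be delayed by a slow or missing remote server.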
joemasilotti
Got it. That might be possible too.

Do you know of any other log messages that start with the same string and could trigger false positives?
mikeday
Not unless you are linking to other HTTP resources, although you may wish to catch those too.
joemasilotti
The only other HTTP resources we are linking to are external fonts and CSS files, which, now that I think about it, should also "fail" the PDF. So I think that will work! Thanks for the help.
joemasilotti
After some more digging, I don't think this will work for us.

We need the PDF generation to stop as soon as it encounters an error. Even with the HTTP timeout flag set, our server times out when a few images can't be downloaded. For example, if five images in a row time out, that adds up to 25 seconds, which times out our server.

Do you have any other recommendations?
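One way to approximate "stop as soon as it encounters an error" is to run Prince as a child process, read its structured log line by line as it is written, and kill the process on the first warning about an HTTP or HTTPS resource, with an overall deadline as a backstop. This Python sketch assumes Prince writes structured log lines to stderr as they occur (check how your Prince version and logging options behave); the function names and the 30-second default are hypothetical.

```python
import subprocess
import threading
from urllib.parse import urlparse


def is_remote_failure(line):
    """True for structured-log lines like msg|wrn|http://...|message."""
    parts = line.rstrip("\n").split("|", 3)
    return (len(parts) == 4 and parts[0] == "msg" and parts[1] in ("wrn", "err")
            and urlparse(parts[2]).scheme in ("http", "https"))


def run_until_remote_failure(argv, deadline_seconds=30.0):
    """Run argv, watching its stderr; stop early on a remote-resource failure.

    Returns (ok, reason). stdout is discarded, so the PDF must be written
    to a file rather than to stdout."""
    proc = subprocess.Popen(argv, stdout=subprocess.DEVNULL, stderr=subprocess.PIPE,
                            encoding="utf-8", errors="replace")
    timed_out = threading.Event()

    def on_deadline():
        # Overall backstop, independent of any per-request HTTP timeout.
        timed_out.set()
        proc.kill()

    timer = threading.Timer(deadline_seconds, on_deadline)
    timer.start()
    try:
        for line in proc.stderr:
            if is_remote_failure(line):
                return False, "remote resource failed: " + line.strip()
        if timed_out.is_set():
            return False, "deadline exceeded"
        if proc.wait() != 0:
            return False, "exit code %d" % proc.returncode
        return True, "ok"
    finally:
        timer.cancel()
        if proc.poll() is None:
            proc.kill()  # stop the conversion immediately
        proc.wait()
        proc.stderr.close()
```

For example, run_until_remote_failure(["prince", "--structured-log=normal", "--no-warn-css", "input.html", "-o", "output.pdf"]); the --structured-log=normal spelling is an assumption, so confirm it against prince --help. When the function returns False, any partially written output file should be discarded.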
mikeday
Unfortunately making one HTTP request wait for others is inherently risky, as a slow server could make you wait too long even if there is no error. The usual solution is to queue the job and run it in the background, so your server can give an immediate response and then update when the PDF is ready. Another option would be to download the customer images ahead of time, if that is possible.
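If the images can be downloaded ahead of time, fetching them concurrently also changes the arithmetic: five sequential 5-second timeouts cost 25 seconds, whereas concurrent requests that all time out finish in roughly one timeout. A Python sketch under that assumption follows; the function names, worker count and deadlines are hypothetical. Note that urllib's timeout applies to individual socket operations rather than the whole download, which is why an overall deadline is enforced separately.

```python
import concurrent.futures
import urllib.request


def fetch(url, timeout):
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_all(urls, per_request_timeout=5.0, overall_timeout=8.0, max_workers=8):
    """Fetch all urls concurrently; returns (bodies, failures) keyed by URL."""
    bodies, failures = {}, {}
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=max_workers)
    futures = {pool.submit(fetch, url, per_request_timeout): url for url in urls}
    try:
        for future in concurrent.futures.as_completed(futures, timeout=overall_timeout):
            url = futures[future]
            try:
                bodies[url] = future.result()
            except Exception as exc:  # DNS failure, HTTP error, socket timeout, ...
                failures[url] = repr(exc)
    except concurrent.futures.TimeoutError:
        # Anything still pending when the overall deadline passes counts as failed.
        for url in futures.values():
            if url not in bodies and url not in failures:
                failures[url] = "overall deadline exceeded"
    finally:
        # Don't block on stragglers; queued-but-unstarted downloads are cancelled.
        pool.shutdown(wait=False, cancel_futures=True)
    return bodies, failures
```

Any URL in failures means the PDF would be incomplete, so the application can report the problem without running Prince at all. A server that redirects to an HTML "not found" page will still produce a body here, so checking the Content-Type before caching would be a sensible addition.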
joemasilotti
Background jobs definitely make sense and are something we are exploring. We are trying to avoid them because our customers make a few tweaks and want to view their PDF quickly. They make a small batch of consecutive changes, trying to get the output looking right before moving on and actually printing it. So quick response times are important, and we are worried that pushing to a background job would slow this down.

Another related question: is it possible to embed fonts directly into the PDF? We are running on Heroku, where it isn't feasible to install fonts on the system. We are currently downloading them with the recommended @font-face declarations, but that makes a network request. Ideally we would embed them the way you recommended for images, using data URLs.
mikeday
It is complicated if customers can edit the CSS directly and include links to images on remote servers. Ideally, if they entered images through the UI, you could download and cache them, eliminating any delay in producing the PDF.

You can embed fonts directly in a style sheet with data: URLs (base64 encoded) if it is not feasible to store them as local files.
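A small Python sketch of generating such a rule from a local font file. The function name and the extension-to-media-type table are assumptions for illustration, and it only handles .ttf, .otf and .woff files.

```python
import base64
from pathlib import Path

# Assumed mapping from file extension to the data: URL media type.
FONT_MIME_TYPES = {".ttf": "font/ttf", ".otf": "font/otf", ".woff": "font/woff"}


def font_face_rule(family, font_path):
    """Return an @font-face rule with the font file inlined as a base64 data: URL."""
    path = Path(font_path)
    mime = FONT_MIME_TYPES[path.suffix.lower()]
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return (
        "@font-face {\n"
        f'  font-family: "{family}";\n'
        f'  src: url("data:{mime};base64,{encoded}");\n'
        "}\n"
    )
```

The returned string can be prepended to the style sheet passed to Prince. Base64 makes the font about a third larger, so this suits a handful of fonts rather than large families.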