Convert output from pdftohtml back to PDF with Prince?

have
Does anyone have experience handling the output of the pdftohtml tool to create a new PDF?

The thing is, we receive a PDF file and would like to replace content (text, prices) in it and then create a new PDF.

pdftohtml takes an -xml option and creates nice XML like:

<text top="478" left="493" width="43" height="38" font="2">Some text label</text>


(complete XML output attached)

How could we use Prince to position and style an element like this?
Attachment: drinks.xml (1.9 kB)
howcome
Interesting challenge. Try running this on the command line:
prince -j --script=https://www.princexml.com/howcome/2021/pdf2xml/script.js https://www.princexml.com/howcome/2021/pdf2xml/drinks.xml -o foo.pdf > drinks.html

The small script converts the XML to presentational HTML which can be processed with Prince.
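For readers who want to see the general idea outside Prince, here is a minimal Python sketch of the same kind of conversion. It is not the script linked above: the function name text_to_div, the class name derived from the font attribute, and the choice of px units are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape


def text_to_div(elem, scale=1.0):
    """Turn one pdftohtml <text> element into an absolutely positioned <div>.

    Hypothetical helper: `scale` is a multiplier applied to the XML
    coordinates, and the CSS class "f<font id>" is an invented convention.
    """
    top = float(elem.get("top")) * scale
    left = float(elem.get("left")) * scale
    # itertext() also collects text inside nested inline elements,
    # but any inline styling they carry is dropped in this sketch.
    content = escape("".join(elem.itertext()))
    return (
        f'<div class="f{elem.get("font")}" '
        f'style="position:absolute;top:{top:g}px;left:{left:g}px;'
        f'white-space:nowrap">{content}</div>'
    )


sample = (
    '<text top="478" left="493" width="43" height="38" font="2">'
    "Some text label</text>"
)
html_fragment = text_to_div(ET.fromstring(sample))
print(html_fragment)
```

Replacing a price would then amount to changing the element's text before generating the HTML, and the resulting page can be passed to Prince like any other HTML file.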

Some work remains. In particular, the font sizes and positions must be multiplied by a scale factor, and the <image> element still needs to be processed.
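One possible way to pick the multiplier is to compare the page height reported in the XML with the height of the page you want Prince to produce. The Python sketch below assumes this approach; the numbers 1263 (XML page height) and 842 (target height in points, roughly A4) are hypothetical and are not taken from drinks.xml.

```python
def scale_factor(xml_page_height, target_height_pt):
    """Ratio that maps XML units onto points of the target page."""
    return target_height_pt / xml_page_height


def scale_box(top, left, factor):
    """Apply the factor to a top/left pair, rounded to two decimals."""
    return round(top * factor, 2), round(left * factor, 2)


# Hypothetical values: an XML page 1263 units high, an 842pt target page.
factor = scale_factor(1263, 842)

# Coordinates from the sample <text> element in the question.
top_pt, left_pt = scale_box(478, 493, factor)
print(factor, top_pt, left_pt)
```

The same factor would be applied to the font sizes so that text and positions stay in proportion.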

have
Thanks Håkon! :-) Looks promising! I will take a closer look at it.

Appreciate your input!