UTF-8 encoding problem

Emil Stenström
Hi,
I just tried playing around with Prince and it certainly looks good. I found one problem though. I have a basic HTML 4 document in UTF-8 that I would like to convert to PDF, but it seems all non-ASCII characters get displayed incorrectly.

I have a testcase up at: [problem solved]
And the generated PDF: [problem solved]

Is this a known problem? What can I do to sidestep it?

(Another minor annoyance is that Prince doesn't remember the last directory I picked input files from. Using the last directory would be much better imo.)

Emil Stenström
Update: It seems it renders incorrectly when viewed on the web too, but not if I save the file locally and open it. The server seems to send a Content-Type with charset ISO-8859-1, which causes the rendering error there, but this shouldn't be an issue locally, should it?
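
For anyone checking the same thing, here is a minimal Python sketch of how the charset in a Content-Type header could be read, and of what UTF-8 bytes look like when decoded as ISO-8859-1. The function name charset_from_content_type and the sample header values are made up for illustration; they are not taken from the actual server.

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()


# Hypothetical header values, for illustration only.
declared = charset_from_content_type("text/html; charset=ISO-8859-1")
missing = charset_from_content_type("text/html")

# UTF-8 bytes decoded with the wrong charset: each non-ASCII character
# turns into two unrelated Latin-1 characters.
garbled = "ö".encode("utf-8").decode("iso-8859-1")
```

If the declared charset does not match the encoding the file was actually saved in, the garbling shown in the last line is the kind of output to expect in a browser.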
mikeday
Hi Emil,

Your HTML file begins with a UTF-8 Byte Order Mark (BOM), which unfortunately is not recognised by the HTML parser that we are using and causes the rest of the file to be misinterpreted.

However, a leading BOM is perfectly valid in XML, so converting the file to XHTML (e.g. changing <meta...> to <meta.../>) will solve the problem.

The UTF-8 BOM is usually only added to files edited in Notepad, so one other alternative would be to copy the file contents into another text editor and save from there.
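
If re-saving from another editor is inconvenient, the BOM can also be removed with a few lines of script. The following is only a sketch in Python, not something Prince provides; the helper name strip_utf8_bom and the sample document are hypothetical.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (bytes EF BB BF), if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Hypothetical sample: a small UTF-8 document saved with a BOM in front.
sample = codecs.BOM_UTF8 + "<html><body>Stenström</body></html>".encode("utf-8")
cleaned = strip_utf8_bom(sample)
```

The same function could be applied to the bytes read from a file opened in binary mode, writing the result back before running the conversion. Python's built-in "utf-8-sig" codec is another option when the file is being decoded to text anyway, since it skips a leading BOM on read.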

We'll try to fix this issue in a future release of Prince.

Best regards,

Michael
Emil Stenström
@mikeday: Thanks! I'll have a look at that. Appreciate the speedy reply.

[Edit: It worked like you said, thanks]
mikeday
Today we have released Prince 6.0 rev 8, which correctly processes HTML documents beginning with a UTF-8 byte order mark.