HTML5 and meta charset UTF-8


I don't know if this is a bug or not; however, when I use HTML5's meta charset ...
<meta charset="UTF-8" />
... I get wrong characters in the PDF for à è é ì ò ù and so on.
Fortunately this problem doesn't occur using the "old" meta charset
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
This is a limitation of our current HTML parser. We are in the process of developing a new HTML5 parser that should solve the issue.
What is the ETA on the new HTML5 parser engine? I was looking to buy Prince to help meet a fast corporate timeline; however, we are using HTML5 pretty heavily on the pages that need to become PDF.
We are currently testing the new parser and improving its performance. Which particular aspects of HTML5 are you using? Most of the new elements, like <article>, work fine with our existing parser; they only trigger a warning.
On a short timeline, this could be worked around temporarily with a stream editor, by replacing the actual non-ASCII characters with their HTML entities until the new parser is in place.
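For anyone who wants to try the character-replacement workaround, here is a minimal Python sketch rather than a stream-editor one-liner. It assumes the input is UTF-8 text, and it emits numeric character references (e.g. &#232;) instead of named entities, which HTML parsers treat the same way. The function name escape_non_ascii is hypothetical, not part of Prince.

```python
def escape_non_ascii(html: str) -> str:
    """Return html with every non-ASCII character replaced by a numeric
    character reference; ASCII characters (including markup) are untouched."""
    # The built-in "xmlcharrefreplace" error handler turns e.g. "è" into "&#232;".
    return html.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


# Illustrative input using the characters from the original report.
sample = "<p>à è é ì ò ù</p>"
escaped = escape_non_ascii(sample)
print(escaped)  # <p>&#224; &#232; &#233; &#236; &#242; &#249;</p>
```

The output is pure ASCII, so it no longer matters which encoding the parser assumes for the document.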

Changing the meta tag would be a lot easier; after that, all the characters will be parsed without problems.
Hi, I ran into this problem too, and found this post via Google. Our HTML code is being upgraded to HTML5.
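If the pages are generated and you cannot easily edit the templates, the meta tag swap could also be done as a preprocessing step before the HTML is handed to Prince. This is only a rough sketch: the names HTML5_META and use_legacy_meta are hypothetical, and the regular expression only covers the simple form of the tag shown in this thread, not every variant the HTML5 spec allows.

```python
import re

# Matches the HTML5 short form, e.g. <meta charset="UTF-8"> or <meta charset='utf-8' />.
# It does not match the older http-equiv form, so running it twice is harmless.
HTML5_META = re.compile(
    r"""<meta\s+charset\s*=\s*["']?([A-Za-z0-9_\-]+)["']?\s*/?>""",
    re.IGNORECASE,
)


def use_legacy_meta(html: str) -> str:
    """Rewrite the HTML5 charset declaration into the http-equiv form."""
    def repl(match: re.Match) -> str:
        charset = match.group(1).lower()
        return (
            '<meta http-equiv="content-type" '
            f'content="text/html; charset={charset}" />'
        )
    return HTML5_META.sub(repl, html)


page = '<head><meta charset="UTF-8" /></head>'
converted = use_legacy_meta(page)
print(converted)
```

With Prince 9 or later this step should not be needed, per the release note at the end of the thread.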

Replacing the meta tag worked for us to set the encoding correctly for now.

Mike, if you see this, do you plan for your HTML5 parser to handle this tag?

Yes, the new HTML5 parser will handle this.
We have now released Prince 9, which uses our new HTML5 parser with support for the new charset declaration syntax:
<meta charset="UTF-8">

(Although in this case it is unnecessary, as UTF-8 is the default encoding).