HTML5 and meta charset UTF-8

steve27
Hi,

I don't know if this is a bug or not; however, using HTML5's meta charset ...
<meta charset="UTF-8" />
... in the pdf I get wrong characters for à è é ì ò ù and so on.
Fortunately, this problem doesn't occur using the "old" meta charset:
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
mikeday
This is a limitation of our current HTML parser. We are in the process of developing a new HTML5 parser that should solve the issue.
nhanna
What is the ETA on the new HTML5 parser engine? I was looking to buy Prince to help meet a fast corporate timeline; however, we are using HTML5 pretty heavily on the pages that need to become PDF.
mikeday
We are currently testing the new parser and improving performance. Which particular aspects of HTML5 are you using? Most of the new elements like <article> etc. work fine with our existing parser; they only trigger a warning.
StoneCypher
On a short timeline, this could be gotten around temporarily with sed or another such stream editor, by replacing the actual foreign characters with their HTML entities until the new parser is in place.
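
For anyone who would rather not write the sed expressions by hand, here is a rough sketch of the same idea in Python instead of sed. It assumes the page has already been read in as correctly decoded UTF-8 text, and it emits numeric character references (such as &#232;) rather than named entities. The function name escape_non_ascii is made up for this example and is not part of Prince.

```python
def escape_non_ascii(html: str) -> str:
    """Replace every non-ASCII character with a numeric character reference.

    Hypothetical helper for illustration; ASCII text, including all
    markup, passes through unchanged.
    """
    # The "xmlcharrefreplace" error handler turns each character that
    # cannot be encoded as ASCII into &#NNN; (decimal code point).
    return html.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    print(escape_non_ascii("<p>perché</p>"))  # prints <p>perch&#233;</p>
```

Because the output is pure ASCII, the result no longer depends on which charset declaration the parser honours.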

John Haugeland is http://fullof.bs/

mikeday
Changing the meta tag would be a lot easier; after that, all the characters will be parsed without problems.
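
If the pages are generated and the tag cannot easily be changed at the source, the swap could be done as a preprocessing step. The following is only a sketch under that assumption: the regular expression covers the common spellings of the short form, not every variant a full HTML parser would accept, and the name use_legacy_meta is hypothetical.

```python
import re

# Matches the HTML5 short form, e.g. <meta charset="UTF-8" /> or
# <meta charset=utf-8>, and captures the charset name.
HTML5_META = re.compile(
    r'<meta\s+charset\s*=\s*["\']?([A-Za-z0-9_.:-]+)["\']?\s*/?>',
    re.IGNORECASE,
)


def use_legacy_meta(html: str) -> str:
    """Rewrite the short charset declaration into the http-equiv form."""

    def repl(match: re.Match) -> str:
        charset = match.group(1)
        return (
            '<meta http-equiv="content-type" '
            f'content="text/html; charset={charset}" />'
        )

    return HTML5_META.sub(repl, html)


if __name__ == "__main__":
    print(use_legacy_meta('<head><meta charset="UTF-8" /></head>'))
```

A tag that is already in the http-equiv form does not match the pattern, so running the step twice leaves the document unchanged.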
rpilkey
Hi, I ran into this problem too, and found this post via Google. Our HTML code is being upgraded to HTML5.

Replacing the meta tag worked for us to set the encoding correctly for now.

Mike, if you see this, do you plan for your HTML5 parser to handle this tag?

Thanks,
Roger
mikeday
Yes, the new HTML5 parser will handle this.
mikeday
We have now released Prince 9, which uses our new HTML5 parser with support for the new charset declaration syntax:
<meta charset="UTF-8">

(Although in this case it is unnecessary, as UTF-8 is the default encoding).