Character encoding problem in Windows Vista, Prince v6.0r5

roychen
Hello,

While attempting to convert an XHTML file to a PDF, I got warnings like the following:

prince: warning: no glyphs for character U+0099, fallback to U+2122.

There are a few others like these too, and as far as I can tell, the affected characters are em-dashes and the Unicode apostrophe.

From a few other posts on this forum, it seems that such messages show up on Linux when the appropriate fonts aren't installed. However, I'm running the Windows version of Prince, and should have all the Microsoft TrueType fonts installed.

Any help would be much appreciated.
mikeday
Sounds like the file has a declared encoding of ISO-8859-1, but has had text in another encoding pasted into it, or has been edited in an application that does not use ISO-8859-1. Try this:
<?xml version="1.0" encoding="Windows-1252"?>
roychen
Hello mike, firstly, thanks for your help with fixing those links previously.

My document had the header

<?xml version="1.0" encoding="UTF-8"?>


and changing the encoding to Windows-1252 didn't work.

Could it be because I saved the XHTML file as ASCII (ISO-8859-1), and I need to save it as a Unicode file?
mikeday
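That could be it. Somewhere along the line the encoding has been jumbled: byte 0x99 is not a printable character in ISO-8859-1 (it falls in the C1 control range), but in Windows-1252 it is the trademark sign (U+2122), which matches the fallback in the warning.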
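
For what it's worth, the mismatch is easy to see by decoding the same bytes both ways. This is only a minimal Python sketch, not anything Prince does internally; the `decode_both` helper and the choice of sample bytes (0x92, 0x97, 0x99) are assumptions made for illustration.

```python
# Bytes 0x80-0x9F are C1 control codes in ISO-8859-1 but mostly
# printable punctuation in Windows-1252 (Python codec name: cp1252).

def decode_both(byte_value: int) -> tuple[str, str]:
    """Decode one byte as ISO-8859-1 and as Windows-1252 (hypothetical helper)."""
    raw = bytes([byte_value])
    return raw.decode("latin-1"), raw.decode("cp1252")

# 0x92 and 0x97 are the curly apostrophe and em dash in Windows-1252;
# 0x99 is the trademark sign from the warning above.
for byte_value in (0x92, 0x97, 0x99):
    as_latin1, as_cp1252 = decode_both(byte_value)
    print(f"0x{byte_value:02X}: ISO-8859-1 -> U+{ord(as_latin1):04X}, "
          f"Windows-1252 -> U+{ord(as_cp1252):04X}")
```

Warnings that mention code points in the U+0080 to U+009F range usually point to Windows-1252 bytes being read as ISO-8859-1, or a similar mix-up.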
roychen
Well, turns out it was simply because I didn't replace a "&" character with the HTML entity "&amp;".

Finally managed to figure it out after spending so much time working on the file's encoding.

Thanks for your help mike.
mikeday
That's funny, was there a &#99 in the document or something?
roychen
Hmm, I'm pretty sure there weren't any HTML entities in my document.

My document's encoding is ANSI as UTF-8 (UTF-8 without BOM).

Apparently, the problem is caused by having an ampersand character in the <meta> tag in the <head> section (I used those to set the author and subject metadata for the output PDF).

For example:

<head>
<title>Testing 123</title>
<meta name="author" content="Roy Chen" />
<meta name="subject" content="Testing & testing" />
</head>


Replacing the ampersand with &amp; fixes this issue. Perhaps you all would like to look into it.
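
In case it helps anyone else, here is a rough way to catch this kind of thing before running the conversion. It is only a sketch in Python using the standard library's XML parser, which I'm assuming is at least as strict about bare ampersands as whatever Prince uses; `HEAD_TEMPLATE` and `is_well_formed` are names I made up for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical template based on the <head> snippet above.
HEAD_TEMPLATE = (
    '<head><title>Testing 123</title>'
    '<meta name="subject" content="{}" /></head>'
)

def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

subject = "Testing & testing"

# A bare "&" in an attribute value is not well-formed XML.
bad = HEAD_TEMPLATE.format(subject)

# escape() turns "&" into "&amp;"; the extra mapping also covers
# double quotes, since the value sits inside a quoted attribute.
good = HEAD_TEMPLATE.format(escape(subject, {'"': "&quot;"}))

print(is_well_formed(bad))
print(is_well_formed(good))
print(ET.fromstring(good).find("meta").get("content"))
```

The parser hands back the original "Testing & testing" text from the escaped version, so the metadata value itself is unchanged.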

Thank you once again.
mikeday
That's interesting, it must have confused the encoding detection somehow. The resulting error message is not very convenient, though. :)
roychen
You don't say.. :)

Well, at least it's resolved, thanks!