Avoid Identity-H encoding?

david2
Hi,

I'm evaluating solutions for converting HTML to PDF, and one of the requirements I was given is that the generated PDF must not have Identity-H encoded fonts. While evaluating the non-commercial version of Prince, the PDFs I've generated have had Identity-H encoded fonts when I've used OpenType fonts but not when I've used TrueType fonts.

I searched for solutions and the most relevant information I found was the following discussion:

http://www.princexml.com/forum/topic/1823/disabling-automatic-ligatures

I added the following property to our CSS:

* { font-variant: prince-opentype(ccmp) }

The result was that one OpenType font was no longer Identity-H encoded, but the other OpenType font remained Identity-H encoded.

Are there any other solutions I could try to avoid Identity-H encoded fonts, other than using TrueType fonts?

Thanks,
David
mikeday
Disabling ligatures is necessary, but the other issue is to avoid using any characters outside the basic Roman character set, as this will force Prince to switch encodings. If you email me (mikeday@yeslogic.com) a sample PDF file I can take a look and see which character is causing the switch.

However, where does the requirement to avoid Identity encoding come from? Since Prince generates ToUnicode maps for all fonts, the choice of character encoding should not affect the accessibility of the file, for example search and copy/paste should still work fine.
david2
mikeday wrote:
the other issue is to avoid using any characters outside the basic Roman character set, as this will force Prince to switch encodings.

Okay, thanks for the tip. The problem was the ½ character entity reference.

However, where does the requirement to avoid Identity encoding come from?

The PDFs will be printed, and I was told that I should avoid Identity-H encoding because it can cause problems when printing.

Since I need to avoid Identity-H encoding, it seems like I should use TrueType fonts if I end up using Prince to generate the PDFs. In the context of Prince, is there any reason why I should use OpenType fonts instead (which would apparently require us to avoid using HTML character entity references)?

Thanks again for your help.

David
mikeday
Technically TrueType fonts don't really exist any more, there are only "OpenType fonts with TrueType outlines" (.ttf files) and "OpenType fonts with PostScript/CFF outlines" (.otf files).

Either of these fonts can contain OpenType substitution tables for ligatures, which will trigger the use of Identity encoding in Prince unless you disable it with the font-variant property.

And regardless of which kind of font you use, Prince will switch to Identity encoding if you use a character outside the basic Roman character set.

So to avoid the Identity encoding, simply disable ligatures and use basic Roman characters. However, most people have no problems whatsoever with this encoding, so I would recommend double checking if it is really an issue with your printer.
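
To find the characters that could trigger the switch before running Prince, something like the following may help. It is only a sketch: it assumes the "basic Roman character set" can be approximated by Python's `mac_roman` codec (MacRoman is the encoding named later in this thread), and Prince's actual rule may differ. The function name `chars_outside_mac_roman` is made up for illustration.

```python
from __future__ import annotations

import html


def chars_outside_mac_roman(text: str) -> list[str]:
    """Return the distinct characters that Python's mac_roman codec cannot encode.

    Assumption: mac_roman stands in for Prince's "basic Roman character set".
    Entity references such as &frac12; are decoded first, since they end up
    as ordinary characters in the rendered document.
    """
    decoded = html.unescape(text)
    found: list[str] = []
    for ch in decoded:
        try:
            ch.encode("mac_roman")
        except UnicodeEncodeError:
            if ch not in found:
                found.append(ch)
    return found


if __name__ == "__main__":
    sample = "Add 1&frac12; cups of flour"
    for ch in chars_outside_mac_roman(sample):
        # Report each offending character with its code point.
        print(f"U+{ord(ch):04X} {ch!r}")
```

Running this over the text content of the HTML lists candidates such as the ½ character from this thread; anything it reports is worth checking, though an empty result is not a guarantee that Prince will avoid Identity encoding.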
david2
mikeday wrote:
Technically TrueType fonts don't really exist any more, there are only "OpenType fonts with TrueType outlines" (.ttf files) and "OpenType fonts with PostScript/CFF outlines" (.otf files).

I didn't know this. Thanks for the info.

And regardless of which kind of font you use, Prince will switch to Identity encoding if you use a character outside the basic Roman character set.

Do you know why Prince uses Identity-H encoding when the input HTML has the ½ character entity reference and uses OpenType fonts but not when the HTML uses TrueType fonts?

Thanks,
David
mikeday
Difficult to say without seeing the PDF; which fonts are you using? Do both of them show the 1/2 glyph, or does it get replaced with a question mark?
david2
mikeday wrote:
which fonts are you using?

I've tested Prince using 10-20 fonts, and the results are always the same. When I use a TrueType font (e.g., Helvetica World) for the text that contains the ½ character entity reference, I see the following PDF properties in Adobe Reader:

HelveticaWorld (embedded subset)
Type: TrueType
Encoding: Built-in

When I use an OpenType font (e.g., Myriad Pro) for the text that contains the ½ character entity reference, I see the following PDF properties in Adobe Reader:

MyriadPro-Regular (embedded)
Type: Type 1 (CID)
Encoding: Identity-H

Do both of them show the 1/2 glyph

Yes.

Thanks,
David
mikeday
This is because Prince currently cannot subset OpenType fonts with PostScript/CFF (Type 1) outlines, but it can subset OpenType fonts with TrueType outlines. Fonts that have been subset do not use any encoding, which for some reason is considered to be different to the identity encoding used for fonts which have not been subset. The mind boggles.

Anyway, if you disable font subsetting with the --no-subset-fonts option then both fonts will show up as Identity-H. On the other hand, if you remove the fraction character, the encoding should change to "Roman" for both fonts.

You may wish to use OpenType fonts with TrueType outlines given that Prince can subset them, and the resulting PDF file will be slightly smaller.
joelmeador
We've been having issues with this on DocRaptor recently due to the way Typekit delivers fonts. Is OTF subsetting planned for the future? How does that relate to the --force-identity-encoding option in Prince 11, if at all?
mikeday
The --force-identity-encoding option simply does what it states: forces the use of identity encoding even when the text could have used MacRoman encoding. Subsetting for OpenType fonts with CFF outlines would be nice, but looks like it will take a bit of work.
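
For anyone scripting Prince, the two options mentioned in this thread can be toggled when assembling the command line. This is a minimal sketch, not an official wrapper: the helper name `build_prince_command`, its parameters, and the file names are hypothetical, and it assumes a `prince` executable on the PATH that accepts `-o` for the output file.

```python
from __future__ import annotations


def build_prince_command(
    html_path: str,
    pdf_path: str,
    subset_fonts: bool = True,
    force_identity: bool = False,
    prince_bin: str = "prince",  # assumed executable name
) -> list[str]:
    """Build an argument list for Prince using the options from this thread."""
    cmd = [prince_bin]
    if not subset_fonts:
        # Disable font subsetting (option mentioned earlier in the thread).
        cmd.append("--no-subset-fonts")
    if force_identity:
        # Force identity encoding (option mentioned for Prince 11).
        cmd.append("--force-identity-encoding")
    cmd += [html_path, "-o", pdf_path]
    return cmd


if __name__ == "__main__":
    # Hypothetical file names, printed rather than executed.
    print(build_prince_command("input.html", "output.pdf", subset_fonts=False))
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; generating the same document with different option combinations and comparing the font properties in a PDF viewer is one way to see which setting affects the reported encoding.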
mikeday
The latest Prince builds now support subsetting for CFF fonts! :D
joelmeador
Awesome!