
Problems in Unicode font placement (Hebrew vowels)

KalH
Hi,

If I convert (for example) Genesis chapter 7 in Hebrew to PDF with Prince, then the various marks (vowels and cantillation marks) around the Hebrew characters are often misplaced, compared to viewing the web page in Chrome:



I have tried viewing the PDF in different viewers, including Chrome's built-in one, and they all show exactly the same misplaced marks as above, so the problem lies in the PDF itself, not in the viewer. The marks are correct if the PDF is generated with Chrome > Print > Save as PDF.

This is very unfortunate. Is there a way to solve this?

Thank you.

Classical Hebrew fonts, such as the free Shlomo fonts, are needed to display the cantillation marks correctly.

These are other free and widely used Hebrew fonts:

Ezra SIL fonts
SBL Hebrew fonts
mikeday
Thanks, we will take a look at this. We have spent some time with Ezra SIL, but there must be some further interaction of the OpenType positioning rules we have missed.
KalH
Great, looking forward to this. The Shlomo fonts above are derived from and very similar to Ezra SIL.

It seems to be an issue with some positioning rules that are applied after the marks have been placed on the letter. For example, in the third word of the image above, לְנֹ֔חַ, the double dot and the single dot normally do not overlap (as in the Chrome version above). But if I edit the HTML file and remove the last letter of לְנֹ֔חַ, i.e. replace it with לְנֹ֔, the double dot and the single dot do overlap, exactly as in the Prince version above.

So the normal, non-overlapping positioning rule apparently depends on the next letter being present, and Prince does not seem to apply that rule at all, whether or not the next letter is present.

Exactly the same behavior occurs when typing in Wordpad. For some reason the issue does not come up with Arial, but it does with Ezra SIL/Ezra SIL SR, Shlomo and SBL Hebrew, all widely used standard fonts designed specifically for Classical Hebrew.

Thank you.

EDIT: If it's any help, the standard keyboard layout to type all these letters and marks is Biblical Hebrew (Tiro).

Edited by KalH

mikeday
Okay, here is a simplified sample document:
<html dir="rtl">
<body>
<p style="font-size: 400%; font-family: Ezra SIL">
&#x5d9;&#x5b9;&#x5d0;
</p>
</body>
</html>

The document contains three Hebrew characters, in logical order:
  • U+5D9 HEBREW LETTER YOD
  • U+5B9 HEBREW POINT HOLAM
  • U+5D0 HEBREW LETTER ALEF
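As a quick check of the list above, Python's standard unicodedata module reports each character's name and general category: the holam is category Mn (non-spacing mark), while the two letters are Lo (other letter). A small sketch using only the standard library:

```python
import unicodedata

# Same three code points as the HTML entities, in logical (typing) order.
text = "\u05d9\u05b9\u05d0"

def describe(s):
    """Return (code point, general category, Unicode name) for each character."""
    return [(f"U+{ord(c):04X}", unicodedata.category(c), unicodedata.name(c)) for c in s]

for row in describe(text):
    print(*row)
```

Note that nothing at this level says which letter the holam should visually sit over; that decision is left to the font's positioning rules.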
The holam is a non-spacing mark that is logically attached to the preceding yod, and that is where it appears when using older fonts like Times New Roman:
times.png

However, when using the Ezra SIL font, the browsers move the holam above the alef instead, while Prince leaves it above the yod:
ezra.png

If the browsers are correct, Ezra must have an OpenType positioning rule that applies to the holam in this context. [To be continued!]
mikeday
Continuing, the three Hebrew characters in the simplified sample document are mapped to the following sequence of glyphs in the Ezra SIL font:
284 263 275

These are still in logical order, so 284, the glyph index for the yod, is the glyph that will be displayed in the right-most position.
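The relationship between logical order and display order can be sketched in a few lines of Python. The character-to-glyph table below contains only the three glyph IDs quoted above for Ezra SIL (it is not the font's real cmap), and the simple reversal is an assumption that only holds for a single right-to-left run; mixed-direction text needs the full Unicode Bidirectional Algorithm.

```python
# Glyph IDs quoted above for Ezra SIL; this tiny table is illustrative, not the real cmap.
EZRA_CMAP = {0x05D9: 284, 0x05B9: 263, 0x05D0: 275}

def to_glyphs(text, cmap):
    """Map characters to glyph IDs, keeping logical (typing) order."""
    return [cmap[ord(c)] for c in text]

def visual_left_to_right(glyphs):
    """For a single right-to-left run, drawing left to right is the reverse of logical order."""
    return list(reversed(glyphs))

logical = to_glyphs("\u05d9\u05b9\u05d0", EZRA_CMAP)
print(logical)                        # [284, 263, 275]
print(visual_left_to_right(logical))  # [275, 263, 284]: yod (284) ends up right-most
```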

There are no OpenType substitution rules in Ezra that apply to these glyphs, so next we check for OpenType positioning rules to see how the glyphs should be arranged relative to each other.

Currently the only applicable rule Prince finds is a mark-to-base rule that anchors glyph 263 to 284, attaching the holam to the yod, not the alef. However, Ezra is an unusually complex OpenType font, with a lot of contextual rules that only apply in certain situations, so maybe there is a positioning rule that Prince is skipping incorrectly?
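The arithmetic behind a mark-to-base attachment (GPOS lookup type 4) is simple once the anchors are known: the mark is moved so that its anchor point lands exactly on the base glyph's anchor point. A minimal Python sketch, with made-up anchor coordinates (the real values are in Ezra SIL's GPOS table) and ignoring the advance-width bookkeeping a real layout engine also has to do:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Anchor:
    x: int  # font units
    y: int

def attach_mark_to_base(base_origin, base_anchor, mark_anchor):
    """Return the mark glyph's origin so that its anchor lands on the base glyph's anchor.

    Invariant of a mark-to-base attachment:
        mark_origin + mark_anchor == base_origin + base_anchor
    """
    bx, by = base_origin
    return (bx + base_anchor.x - mark_anchor.x, by + base_anchor.y - mark_anchor.y)

# Hypothetical values for illustration only.
yod_origin = (1000, 0)
yod_top_anchor = Anchor(150, 700)
holam_anchor = Anchor(40, 600)

print(attach_mark_to_base(yod_origin, yod_top_anchor, holam_anchor))  # (1110, 100)
```

Whichever base glyph is chosen determines which anchor is used, which is why attaching the holam to the yod rather than the alef visibly moves it.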
mikeday
Okay, this looks like a possibility: the OpenType GPOS table in the Ezra SIL font contains a chaining contextual positioning subtable that matches input glyph 263 (holam) when it is followed by glyph 275 (alef). This then applies another lookup record which contains a mark-to-base attachment, which is where things get awkward.

Normally mark-to-base attachments apply to two glyphs, with glyph1 as the base and glyph2 as the mark. However, following the chaining rule the first glyph is the holam mark, and the second is the alef base, so when we apply them in this order the rule doesn't match.
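The ordering problem can be shown with a deliberately naive model (an illustration only, not Prince's actual code): if a mark-to-base lookup is applied to a pair of adjacent glyphs and expects the base first, then the pair handed over by the chaining rule, holam followed by alef, never matches.

```python
# Hypothetical glyph classes for the sample; IDs are the ones quoted for Ezra SIL above.
MARKS = {263}        # holam
BASES = {284, 275}   # yod, alef

def pairwise_mark_to_base(glyph1, glyph2):
    """Naive model: glyph1 must be the base and glyph2 the mark, as in logical order."""
    return glyph1 in BASES and glyph2 in MARKS

print(pairwise_mark_to_base(284, 263))  # True: yod followed by holam
print(pairwise_mark_to_base(263, 275))  # False: holam followed by alef, as the chaining rule presents them
```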

If we temporarily tweak it so that the rule does match in that situation, then the alef gets treated as a mark and attached to the holam which is attached to the yod, which isn't what we want at all.

This is going to need some more thought. :|
KalH
EDIT: I only saw your new post after posting; this is a reply to your previous post.

Yes, I think that's true: there is a positioning rule that Prince is incorrectly skipping. AFTER the alef (275) is typed/present/processed, Prince should go back and apply a rule to the holam. That is what both Wordpad (in real time) and Chrome do.

The holam in יֹא should definitely be above the alef (between the alef and the yod) and not directly above the yod. This is what I get in all fonts (Times New Roman, Ezra SIL) in browsers and in Wordpad.

(However, unlike yod+holam, in vav+holam וֹ the holam sits directly above the vav, as it should. Ezra SIL and Times New Roman both know this and treat yod and vav differently.)

times new roman vs ezra.png


Ezra behaves differently from Times New Roman here, in exactly the same way as in my previous post when removing the last letter of לְנֹ֔חַ:

As shown in the image, if I type yod then holam in Wordpad (without typing the alef yet), the holam appears directly above the yod in Ezra SIL (wrong for now), whereas in Times New Roman it appears above and to the left of the yod (already right). But yod+holam alone makes no sense, so Ezra SIL expects another letter to follow; when I type that expected letter, the alef, Ezra SIL moves the holam to the left and both fonts end up giving the same correct result.

So the two fonts appear to differ in when the rules are applied: Ezra SIL applies the rule only after (or if) the expected next letter is typed/present/processed, whereas Times New Roman (in this case) applies all rules immediately. This may be why Ezra SIL is considered more precise (as a font designed specifically for Classical Hebrew): there are cases where the next glyph changes the typography of the preceding one, or even of several preceding ones.

So I think that after Prince has processed the alef, it should apply a rule to the preceding glyph (the holam), possibly in relation to the glyph before that (the yod).

The לְנֹ֔חַ case shows that rules may need to reach several glyphs back: when the last letter (het ח) is typed, the placement of the TWO preceding marks (single dot = holam, double dot = zaqef qatan) changes, as the image in this post shows. This is what Chrome does, and Wordpad also does it in real time: typing the het immediately moves the holam and the zaqef qatan.

For reference, the sequence לְנֹ֔חַ consists of:
  • letter: lamed U+05DC
  • dots below: sheva U+05B0
  • letter: nun U+05E0
  • single dot: holam U+05B9
  • double dot: zaqef qatan U+0594
  • letter: het U+05D7
  • line below: patah U+05B7
(The line below, patah, looks slightly different in Ezra and Times New Roman in the image above. That's normal: it's a patah furtive, and its typography varies.)
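Grouping these code points by base letter makes the point concrete: the het starts its own cluster, yet it changes the placement of marks in the previous cluster, so positioning cannot be decided one cluster at a time. A small Python sketch, using unicodedata.combining() as a simple stand-in for proper grapheme-cluster segmentation:

```python
import unicodedata

# The word from the list above, built from its code points in logical order.
word = "\u05dc\u05b0\u05e0\u05b9\u0594\u05d7\u05b7"

def clusters(s):
    """Group each base letter with the combining marks that follow it."""
    groups = []
    for ch in s:
        if unicodedata.combining(ch) and groups:
            groups[-1] += ch   # combining mark: attach to the current cluster
        else:
            groups.append(ch)  # base letter: start a new cluster
    return groups

for g in clusters(word):
    print(" + ".join(unicodedata.name(c) for c in g))
```

This should print three clusters: lamed with sheva, nun with holam and zaqef qatan, and het with patah.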

Edited by KalH

mikeday
Right, by using the TTX tool to dump the OpenType GPOS table, I have been able to confirm that Ezra SIL has two positioning rules that apply in this situation. First, the rule that attaches the holam mark to the preceding base letter, which can be alef, yod, or many others:
<Lookup index="35">
  <LookupType value="4"/>
  <LookupFlag value="1"/>
    <MarkBasePos index="0" Format="1">
      <MarkCoverage Format="1">
        <Glyph value="holam"/>
      </MarkCoverage>
      <BaseCoverage Format="2">
        <Glyph value="alef"/>
        <Glyph value="bet"/>
        <Glyph value="gimel"/>
        ...   
        <Glyph value="yod"/>

Then the more complicated chaining contextual rule, which applies to holam marks that are followed by an alef, and redirects back to the previous simple rule:
<ChainContextPos index="0" Format="3">
  <InputCoverage index="0" Format="1">
    <Glyph value="holam"/>
  </InputCoverage>
  <LookAheadCoverage index="0" Format="1">
    <Glyph value="alef"/>
  </LookAheadCoverage>
  <PosLookupRecord index="0">
    <SequenceIndex value="0"/>
    <LookupListIndex value="35"/>
  </PosLookupRecord>
</ChainContextPos>
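For readers unfamiliar with chaining contextual rules, the matching step of a Format 3 (coverage-based) ChainContextPos subtable can be sketched in Python. It only decides whether the rule matches at a given glyph; on a match, the PosLookupRecord with SequenceIndex 0 would then apply lookup 35 to the glyph at that position (the holam). This is a simplified illustration under stated assumptions: it ignores LookupFlag-based glyph skipping and is not how Prince or any particular engine implements it.

```python
def chain_format3_matches(glyphs, i, backtrack, input_, lookahead):
    """Return True if a coverage-based chaining context rule matches at glyph index i.

    backtrack, input_ and lookahead are lists of coverage sets. As in OpenType,
    backtrack[0] covers the glyph immediately before i, backtrack[1] the one before
    that, and so on; lookahead coverage starts right after the input sequence.
    """
    if i < len(backtrack) or i + len(input_) + len(lookahead) > len(glyphs):
        return False
    for k, cov in enumerate(backtrack):
        if glyphs[i - 1 - k] not in cov:
            return False
    for k, cov in enumerate(input_):
        if glyphs[i + k] not in cov:
            return False
    start = i + len(input_)
    for k, cov in enumerate(lookahead):
        if glyphs[start + k] not in cov:
            return False
    return True

HOLAM, ALEF, YOD = 263, 275, 284  # glyph IDs quoted earlier in the thread
rule = dict(backtrack=[], input_=[{HOLAM}], lookahead=[{ALEF}])

print(chain_format3_matches([YOD, HOLAM, ALEF], 1, **rule))  # True: holam followed by alef
print(chain_format3_matches([YOD, HOLAM], 1, **rule))        # False: no alef after the holam
```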

Prince can't handle this combination for two reasons. Firstly, we expect mark glyphs to always follow base glyphs. But the second rule matches a mark that precedes a base glyph, and tries to apply a mark-to-base attachment. We don't like that. Secondly, even if we do apply the rule, there is a clash between the rule trying to attach the holam to the yod and the rule trying to attach it to the alef, and I don't think we can resolve the priorities properly at this time.

Now that we understand the issue, we can start figuring out a plan of attack. But it's clearly not going to be a simple fix, as it involves changing the fundamentals of how we apply OpenType GPOS rules.

Edited by mikeday

KalH
Is it really necessary to apply the OpenType glyph positioning rules by hand?

I have used DirectWrite and I didn't have to look up anything in any OpenType GPOS table because the function GetGlyphPlacements "Places glyphs according to the font and the writing system's rendering rules". So applying all these rules is done automatically by DirectWrite, which has "Support for the advanced typography features of OpenType fonts".

GetGlyphPlacements returns a glyphAdvances array (advance width of each glyph) and a glyphOffsets array (horizontal and vertical offset for each glyph).
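What a PDF writer would do with per-glyph advances and offsets can be sketched as follows: each glyph is drawn at the current pen position plus its offset, and then the pen moves by the glyph's advance. This is a library-agnostic illustration, not DirectWrite code (DirectWrite's DWRITE_GLYPH_OFFSET uses advanceOffset and ascenderOffset fields); it assumes glyphs are listed in left-to-right drawing order with y offsets pointing upwards, which a right-to-left run would need to account for.

```python
def absolute_positions(advances, offsets, origin_x=0, baseline_y=0):
    """Turn per-glyph advances and (dx, dy) offsets into absolute drawing positions.

    Assumes left-to-right drawing order and upward-pointing y offsets.
    Marks typically have a zero advance, so they do not move the pen.
    """
    positions = []
    pen = origin_x
    for advance, (dx, dy) in zip(advances, offsets):
        positions.append((pen + dx, baseline_y + dy))
        pen += advance
    return positions

# Hypothetical values: a letter, a zero-advance mark shifted up and back, then another letter.
print(absolute_positions([500, 0, 450], [(0, 0), (-200, 120), (0, 0)]))
# [(0, 0), (300, 120), (500, 0)]
```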

Aren't these glyph placement numbers enough information when writing to a PDF, replacing all that complicated GPOS table lookup work?

This would give Prince complete support for the OpenType standard, making it usable by all the people interested in Biblical Hebrew (of whom there are many: not just Jews but also religious Protestants in the US, seminary students and teachers, etc.). It also seems likely that some fonts for Western scripts use these complicated rules as well, e.g. for special kerning, ligatures or other context-sensitive shaping, so without full OpenType support even Latin-script text could be rendered incorrectly.

Alternatively, Chrome gives the correct result for לְנֹ֔חַ and for יֹא

<html dir="rtl">
<body>
<p style="font-size: 400%; font-family: Ezra SIL">
&#x5d9;&#x5b9;&#x5d0;
</p>
<p style="font-size: 400%; font-family: Ezra SIL">
&#x5DC;&#x5B0;&#x5E0;&#x5B9;&#x594;&#x5D7;&#x5B7;
</p>
</body>
</html>



when using Print > Save As PDF. Chrome is open source, so it might be useful to look at its code to see how Chrome/WebKit applies the OpenType rules when writing to PDF: does it apply them by hand, or does it use an existing function similar to GetGlyphPlacements?
mikeday
DirectWrite is a Windows-specific API, and Prince runs on MacOS X and Linux as well, so we need alternative approaches. There are other open-source libraries available, such as Pango.
KalH
Thank you for looking at this issue. Pango sounds great: it would make it unnecessary to reinvent the wheel and debug complex glyph positioning and font rendering issues.

Have you made any tests with Pango in Prince yet? I'm looking forward to seeing whether it solves the Biblical Hebrew problems.
mikeday
It will take time. :)
KalH
OK, thanks for making such a great program and good luck with the Pango conversion (or whatever text engine you decide to use)!
KalH
Any progress on this? I'd like to know if I have to find another solution or if Prince is likely to give correct results with Hebrew vowels soon.

Thanks.
mikeday
Probably not soon, no. We have so much to do. :(
mikeday
It took longer than we hoped, but Prince 10 includes a fix for this OpenType issue affecting the Ezra SIL font. Sorry for the delay, and thank you for letting us know about the issue! :)