Solaiman Lipi font in Bangla is not being rendered properly

Initiator
27 Aug 2014

I am using PrinceXML 9.0 with Ubuntu 14.04 on a 64-bit Intel machine.

The Bangla font SolaimanLipi seems to be rendered incorrectly. Please see the attached screenshot, which shows the misaligned letters.

It would be wonderful to have a fix for this!

SolaimanLipiBroken.jpg (51.5 kB): The misaligned letters have been circled.

mikeday
27 Aug 2014

Is this font installed by default on Ubuntu? Do you know which package it comes from?

Initiator
27 Aug 2014

No, this font isn't available by default. I obtained it from http://www.omicronlab.com/bangla-fonts.html

mikeday
27 Aug 2014

Thanks, can you attach a short sample HTML document that displays incorrectly?

Initiator
28 Aug 2014

Hi Mike, I am attaching a .txt file containing text that shows the problem. You can download the font from this link: http://www.omicronlab.com/download/fonts/SolaimanLipi_20-04-07.ttf

You might need to open the file in Notepad, as it appears garbled in the browser.

text.txt (0.4 kB): Text with the issue.

Edited 28 Aug 2014 by Initiator

mikeday
28 Aug 2014

Thanks, I will take a look at this.

mikeday
29 Aug 2014

It seems that there are some control characters in that text, specifically U+200C "zero width non-joiner" which is interfering with OpenType shaping. If they are removed, then the glyphs align correctly. Do you know if they are there for a reason, or perhaps they were added accidentally by some other system?
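Because these characters are invisible in most viewers, one way to check whether a fragment contains them is to scan its code points directly. The sketch below is a hypothetical helper (the name `findInvisibleChars` and the set of characters it reports are assumptions, not part of Prince); it can be run in Node.js or a browser console on text copied from the original HTML, since the characters do not survive into the PDF.

```javascript
// Hypothetical helper: report invisible characters that can affect shaping
// or spacing. This set is an assumption; extend it as needed.
const INVISIBLE = {
  0x00a0: "NO-BREAK SPACE",
  0x200c: "ZERO WIDTH NON-JOINER",
  0x200d: "ZERO WIDTH JOINER",
};

function findInvisibleChars(text) {
  const found = [];
  let index = 0;
  // Iterate by code point so positions count characters, not UTF-16 units.
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    if (cp in INVISIBLE) {
      found.push({
        index,
        codePoint: "U+" + cp.toString(16).toUpperCase().padStart(4, "0"),
        name: INVISIBLE[cp],
      });
    }
    index += 1;
  }
  return found;
}

// Example: KA + ZWNJ + KHA reports one ZERO WIDTH NON-JOINER at index 1.
console.log(findInvisibleChars("\u0995\u200C\u0996"));
```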

Initiator
29 Aug 2014

I am not sure why those control characters have crept in. I am using WordPress to add content, by the way.

Which software did you use to observe them?

I tried using Notepad++ on Windows. I copied the problematic text from the PDF, pasted it into Notepad++ and then chose 'Show All Characters'. I saw nothing strange.

mikeday
29 Aug 2014

The characters will not show up in the PDF, only in the original HTML. You can remove them with a CSS rule like this:

p { prince-text-replace: '\200c' '' }

However, it would be nice to know why they are there in the first place, and whether they are important.

Initiator
29 Aug 2014

The CSS rule partly fixed the problem. I still see misalignment in the letters.

I've attached a screenshot indicating where the misalignment starts. As you'll notice, the offending letters appear slightly lower than the rest.

At this point, I am unable to ascertain why the control characters are there. I'll update this thread once I have the information.

still_misaligned.png (4.2 kB): Misalignment

mikeday
30 Aug 2014

Can you attach the text for the misaligned letters? Perhaps there are other control characters that Prince needs to handle differently.

Initiator
30 Aug 2014

Here goes!

text.txt (0.3 kB): More misaligned text.

Initiator
1 Sep 2014

Hi Mike.

Just wondering whether you've had a chance to investigate this further.

Thanks!

mikeday
1 Sep 2014

It seems that even after the U+200C control characters are removed, there is still a vertical alignment issue for the character U+09B9 (BENGALI LETTER HA) but only when followed/combined with U+09CD (BENGALI SIGN VIRAMA). For some reason this mark results in the combined glyph moving down and breaking the baseline. We will continue our investigation.
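For reference, the problematic cluster is HA (U+09B9) immediately followed by VIRAMA (U+09CD). A hypothetical helper like the one below (the name `findHaVirama` is an assumption) locates that sequence in a string, which may help when cutting a failing document down to a minimal test case.

```javascript
// The cluster discussed above: BENGALI LETTER HA + BENGALI SIGN VIRAMA.
const HA = "\u09B9";
const VIRAMA = "\u09CD";

// Hypothetical helper: return the UTF-16 offsets where HA + VIRAMA occurs.
// Bengali is in the BMP, so these offsets also equal code point offsets.
function findHaVirama(text) {
  const offsets = [];
  let pos = text.indexOf(HA + VIRAMA);
  while (pos !== -1) {
    offsets.push(pos);
    pos = text.indexOf(HA + VIRAMA, pos + 1);
  }
  return offsets;
}

// "ব্রহ্ম" (BA, VIRAMA, RA, HA, VIRAMA, MA) contains HA + VIRAMA at offset 3.
console.log(findHaVirama("\u09AC\u09CD\u09B0\u09B9\u09CD\u09AE")); // offsets: [3]
```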

mikeday
1 Sep 2014

It seems that the glyphs themselves are printed at different heights, without any OpenType positioning adjustments being made at all. That is very strange, as one would assume that the glyphs would all appear at the same vertical position by default.

Initiator
1 Sep 2014

All recent browsers render these characters correctly, so will you be working on a fix for this in PrinceXML?

mikeday
1 Sep 2014

Yes, as soon as we figure out exactly what is going on.

It seems to be related to the interplay of two different OpenType substitution features: "half" and "haln". Both of these substitution features apply to the sequence of ha+virama (halanta/hasanta?) characters mentioned earlier. However, the "half" feature produces the expected correct glyph, and the "haln" feature produces a glyph that is pushed down slightly. I don't know why, but this appears to be an error in the font, and can be confirmed with a font editor.

Still, the browsers make it work. It seems they are applying the "half" feature and Prince is not. The OpenType recommendations for Bengali are to apply both features, with "half" first. However, it appears that we only apply it to "pre-base" consonants, and we don't consider the ha character to be a pre-base consonant in this context. This may not be correct; unfortunately we are not experts in every script that we have implemented.

Checking the error console in Firefox, I get these errors:

downloadable font: GDEF: Invalid offset to glyph classes, table discarded (font-family: "SolaimanLipi" style:normal weight:normal stretch:normal src index:0)
source: file:///home/mikeday/bangla/SolaimanLipi_20-04-07.ttf
downloadable font: Layout: Bad lookup flags 8 (font-family: "SolaimanLipi" style:normal weight:normal stretch:normal src index:0)
source: file:///home/mikeday/bangla/SolaimanLipi_20-04-07.ttf
downloadable font: Layout: Failed to parse lookup 11 (font-family: "SolaimanLipi" style:normal weight:normal stretch:normal src index:0)
source: file:///home/mikeday/bangla/SolaimanLipi_20-04-07.ttf
downloadable font: GSUB: Failed to parse lookup list table, table discarded (font-family: "SolaimanLipi" style:normal weight:normal stretch:normal src index:0)
source: file:///home/mikeday/bangla/SolaimanLipi_20-04-07.ttf

mikeday
1 Sep 2014

Sorry, pressed post a bit too quickly there. Continuing: it seems that the font has some serious problems. Firefox doesn't like it, and I can't convince Chrome to load it at all. Given that other Bengali fonts appear to be working fine, this may be more of a font problem than a Prince problem.

Initiator
1 Sep 2014

Cool! Thanks for the long explanation.

Initiator
1 Sep 2014

Mike, I wanted to ask: which Bengali fonts work well with PrinceXML? I'll replace SolaimanLipi with one of those.

Thanks!

mikeday
2 Sep 2014

On Ubuntu, Prince will use "Ani" and "Mukti Narrow" from the ttf-bengali-fonts package.

Initiator
2 Sep 2014

I tried using Ani, and attempted to remove the control character U+200C using the CSS rule:

p { prince-text-replace: '\200c' '' }

This has no effect.

What could I be missing? Please refer to the sample text I've attached.

Thanks for all your help!

new_zwnj_issue.txt (0.1 kB)

mikeday
2 Sep 2014

Do you get similar results for the document I have attached below?

new.html (0.3 kB)
new.pdf (16.9 kB)

Initiator
2 Sep 2014

Not really, the PDF you attached has different problems.

I am attaching a picture showing a strange vertical line that apparently results from U+200C.

strange_vertical_line.png (5.2 kB)

mikeday
2 Sep 2014

Right. But the prince-text-replace rule removes that line when I test it here (e.g. new.html). What are the remaining problems?

Initiator
2 Sep 2014

I am trying to remove U+200D as well. What is the correct way to remove multiple control characters?

mikeday
2 Sep 2014

You can have multiple replacement pairs like this:

p { prince-text-replace: '\200C' '' '\200D' '' }
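If the content can be cleaned before it reaches Prince (for example while exporting it from WordPress), the same two characters can be stripped from the text itself. This is a minimal sketch; the name `stripJoiners` is hypothetical. Like the CSS rule, it removes every ZWNJ and ZWJ, including any that were inserted deliberately, since these joiners can be used in Bengali text to control how conjuncts and half forms are rendered.

```javascript
// Remove ZERO WIDTH NON-JOINER (U+200C) and ZERO WIDTH JOINER (U+200D).
// Similar in effect to prince-text-replace: '\200C' '' '\200D' '',
// except that it applies to any string rather than only to p elements.
function stripJoiners(text) {
  return text.replace(/[\u200C\u200D]/g, "");
}

console.log(stripJoiners("\u0995\u200C\u0996\u200D").length); // 2
```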

Initiator
2 Sep 2014

That works. But in another place, I've noticed &nbsp; in the HTML source, which results in garbled text in the PDF.

I cannot remove &nbsp; altogether, since that would delete all of those spaces from the entire page.

Is there a regex I can use?

mikeday
2 Sep 2014

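If you have a specific text fragment, you could replace "a\A0 b" with "a b" (in CSS, the space after \A0 terminates the escape, so the pattern is "a", a no-break space, then "b"). If you want true regular expressions, you will need to do this with JavaScript.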
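As a sketch of the JavaScript approach, the script below walks the document's text nodes and replaces a no-break space (U+00A0, which is what &nbsp; becomes once the HTML is parsed) with an ordinary space only when it sits between two Bengali characters (U+0980 to U+09FF). Restricting the replacement to Bengali neighbours is an assumption about which occurrences cause the garbling; adjust the pattern as needed. The helper names are hypothetical, Prince only runs document scripts when JavaScript is enabled (for example with the `--javascript` command-line option), and whether Prince 9's DOM support allows text nodes to be modified this way is untested here.

```javascript
// Replace NBSP with a normal space only between two Bengali characters.
// The lookahead does not consume the following character, so runs like
// KA NBSP KHA NBSP GA are fully handled in a single pass.
const BENGALI_NBSP = /([\u0980-\u09FF])\u00A0(?=[\u0980-\u09FF])/g;

function fixBengaliNbsp(text) {
  return text.replace(BENGALI_NBSP, "$1 ");
}

// Recursively apply the replacement to every text node under `node`.
function fixTextNodes(node) {
  if (node.nodeType === 3) { // TEXT_NODE
    node.nodeValue = fixBengaliNbsp(node.nodeValue);
    return;
  }
  for (let i = 0; i < node.childNodes.length; i++) {
    fixTextNodes(node.childNodes[i]);
  }
}

// Only touch the DOM when running inside a document (Prince or a browser).
if (typeof document !== "undefined" && document.documentElement) {
  fixTextNodes(document.documentElement);
}
```

In a document, this could go in a script element at the end of the body so that the text it rewrites has already been parsed.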
