Solaiman Lipi font in Bangla is not being rendered properly

Initiator
I am using PrinceXML 9.0 with Ubuntu 14.04 on a 64-bit Intel machine.

A Bangla font called SolaimanLipi seems to be rendered incorrectly. Please take a look at the screenshot attached to see the misaligned letters.

It would be wonderful to have a fix for this!
  1. SolaimanLipiBroken.jpg (51.5 kB)
    The misaligned letters have been circled
mikeday
Is this font installed by default on Ubuntu? Do you know which package it comes from?
Initiator
No, this font isn't available by default. I obtained it from http://www.omicronlab.com/bangla-fonts.html
mikeday
Thanks, can you attach a short sample HTML document that displays incorrectly?
Initiator
Hi Mike, I am attaching a txt file with the text that shows the problem. You can apply the font from this link: http://www.omicronlab.com/download/fonts/SolaimanLipi_20-04-07.ttf

You might need to open the file in Notepad - it appears garbled in the browser.
  1. text.txt (0.4 kB)
    Text with the issue.

mikeday
Thanks, I will take a look at this.
mikeday
It seems that there are some control characters in that text, specifically U+200C "zero width non-joiner" which is interfering with OpenType shaping. If they are removed, then the glyphs align correctly. Do you know if they are there for a reason, or perhaps they were added accidentally by some other system?
Initiator
I am not sure why those control characters have crept in. I am using WordPress to add content, by the way.

Which software did you use to observe them?

I tried using Notepad++ in Windows. I copied the problematic text from the PDF, pasted it in Notepad++ and then chose 'Show All Characters'. I saw nothing strange.
mikeday
The characters will not show up in the PDF, only in the original HTML. You can remove them with a CSS rule like this:
p { prince-text-replace: '\200c' '' }

However, it would be nice to know why they are there in the first place, and whether they are important. :)
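
One way to observe such characters in the original HTML is to scan the text for invisible format characters and report their code points. The following is only a minimal JavaScript sketch, not the tool used in this thread; the function name findFormatCharacters and the particular set of characters it looks for are assumptions chosen for illustration.

```javascript
// Hypothetical helper: list invisible format characters found in a string.
// The character set below is an assumption: U+200B to U+200D
// (zero width space, non-joiner, joiner), U+2060 (word joiner)
// and U+FEFF (byte order mark / zero width no-break space).
function findFormatCharacters(text) {
  var pattern = /[\u200B-\u200D\u2060\uFEFF]/g;
  var hits = [];
  var match;
  while ((match = pattern.exec(text)) !== null) {
    var hex = match[0].charCodeAt(0).toString(16).toUpperCase();
    hits.push({ index: match.index, codePoint: 'U+' + hex });
  }
  return hits;
}
```

Running it over the text copied from the HTML source (not from the PDF) should return one entry per hidden character, with its position in the string.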
Initiator
The CSS rule partly fixed the problem. I still see misalignment in the letters.

I've attached a screenshot indicating where the misalignment starts. The offending letters appear slightly lower than the rest, as you'll be able to notice.

At this point, I am unable to ascertain why the control characters are there. I'll update this thread once I have the information.
  1. still_misaligned.png (4.2 kB)
    Misalignment
mikeday
Can you attach the text for the misaligned letters? Perhaps there are other control characters that Prince needs to handle differently.
Initiator
Here goes!
  1. text.txt (0.3 kB)
    More misaligned text.
Initiator
Hi Mike.

Just wondering if you got a chance to investigate this further.

Thanks!
mikeday
It seems that even after the U+200C control characters are removed, there is still a vertical alignment issue for the character U+09B9 (BENGALI LETTER HA), but only when it is followed by U+09CD (BENGALI SIGN VIRAMA). For some reason this mark results in the combined glyph moving down and breaking the baseline. We will continue our investigation.
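
To check whether a given piece of text contains this sequence at all, a small search over the string is enough. This is a sketch in JavaScript under stated assumptions: the function name findHaVirama is hypothetical, and it only looks for U+09B9 immediately followed by U+09CD, which is the sequence described in this post.

```javascript
// Hypothetical helper: return the index of every HA + VIRAMA pair in a string.
function findHaVirama(text) {
  var sequence = '\u09B9\u09CD'; // BENGALI LETTER HA, BENGALI SIGN VIRAMA
  var positions = [];
  var from = text.indexOf(sequence);
  while (from !== -1) {
    positions.push(from);
    // Continue searching after the pair that was just found.
    from = text.indexOf(sequence, from + sequence.length);
  }
  return positions;
}
```

An empty result would suggest that any remaining misalignment in that text comes from something other than this sequence.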
mikeday
It seems that the glyphs themselves are printed at different heights, without any OpenType positioning adjustments being made at all. That is very strange, as one would assume that the glyphs would all appear at the same vertical position by default.
Initiator
All recent browsers render these characters correctly, so you'll probably be working to apply a fix in PrinceXML for this?
mikeday
Yes, as soon as we figure out exactly what is going on. :)

It seems to be related to the interplay of two different OpenType substitution features: "half" and "haln". Both of these substitution features apply to the sequence of ha+virama (halanta/hasanta?) characters mentioned earlier. However, the "half" feature produces the expected correct glyph, and the "haln" feature produces a glyph that is pushed down slightly. I don't know why, but this appears to be an error in the font, and can be confirmed with a font editor.

Still, the browsers make it work. It seems they are applying the "half" feature and Prince is not. The OpenType recommendations for Bengali are to apply both features, with "half" first. However, it appears that we only apply it to "pre-base" consonants, and we don't consider the ha character to be a pre-base consonant in this context. This may not be correct; unfortunately we are not experts in every script that we have implemented. :)

Checking the error console in Firefox, I get these errors:
downloadable font: GDEF: Invalid offset to glyph classes, table discarded (font-family: "SolaimanLipi" style:normal weight:normal stretch:normal src index:0)
source: file:///home/mikeday/bangla/SolaimanLipi_20-04-07.ttf
downloadable font: Layout: Bad lookup flags 8 (font-family: "SolaimanLipi" style:normal weight:normal stretch:normal src index:0)
source: file:///home/mikeday/bangla/SolaimanLipi_20-04-07.ttf
downloadable font: Layout: Failed to parse lookup 11 (font-family: "SolaimanLipi" style:normal weight:normal stretch:normal src index:0)
source: file:///home/mikeday/bangla/SolaimanLipi_20-04-07.ttf
downloadable font: GSUB: Failed to parse lookup list table, table discarded (font-family: "SolaimanLipi" style:normal weight:normal stretch:normal src index:0)
source: file:///home/mikeday/bangla/SolaimanLipi_20-04-07.ttf

mikeday
Sorry, pressed post a bit too quickly there. Continuing: it seems that the font has some serious problems. Firefox doesn't like it, and I can't convince Chrome to load it at all. Given that other Bengali fonts appear to be working fine, this may be more of a font problem than a Prince problem.
Initiator
Cool! Thanks for the long explanation.
Initiator
Mike, I just got back to ask, which exact Bengali fonts work fine with PrinceXML? I'll replace SolaimanLipi with one of those.

Thanks!
mikeday
On Ubuntu, Prince will use "Ani" and "Mukti Narrow" from the ttf-bengali-fonts package.
Initiator
I tried using Ani, and attempted to remove the control character U+200C using the CSS rule:

p { prince-text-replace: '\200c' '' }


This has no effect.

What could I be missing? You could refer to the sample text I've attached.

Thanks for all your help!
  1. new_zwnj_issue.txt (0.1 kB)
mikeday
Do you get similar results for the document I have attached below?
  1. new.html (0.3 kB)
  2. new.pdf (16.9 kB)
Initiator
Not really, the PDF you attached has different problems.

I am attaching a pic with an indication of a strange vertical line that is apparently resulting from U+200C.
  1. strange_vertical_line.png (5.2 kB)
mikeday
Right. But the prince-text-replace rule removes that line when I test it here (e.g. new.html). What are the remaining problems?
Initiator
I am trying to remove U+200D as well. What is the correct way to remove multiple control characters?
mikeday
You can have multiple replacement pairs like this:
prince-text-replace: '\200C' '' '\200D' ''
Initiator
That works. But in another place, I've noticed &nbsp; in the HTML source, which results in garbled text in the PDF.

I cannot remove &nbsp; since that will delete all spaces from the entire page.

Is there a regex I can use?
mikeday
If you had a specific text fragment, you could replace "a\A0 b" with "a b" (\A0 is the CSS escape for the non-breaking space, U+00A0). If you want true regular expressions, then you will need to do this with JavaScript.
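
A minimal sketch of the JavaScript route follows. It rests on several assumptions that are not confirmed in this thread: that the troublesome non-breaking spaces are the ones sitting between two Bengali characters (U+0980 to U+09FF), that turning them into ordinary spaces is the desired result, and that the script engine exposes the basic DOM properties nodeType, nodeValue, firstChild and nextSibling. The function names replaceInnerNbsp and applyToTextNodes are hypothetical.

```javascript
// Hypothetical helper: turn U+00A0 into a normal space, but only when it
// sits between two characters of the Bengali block (an assumed pattern).
function replaceInnerNbsp(text) {
  // The lookahead keeps the following character available for the next
  // match, so runs like "X nbsp Y nbsp Z" are handled in one pass.
  return text.replace(/([\u0980-\u09FF])\u00A0(?=[\u0980-\u09FF])/g, '$1 ');
}

// Hypothetical helper: apply a string transform to every text node
// below the given node, using only basic DOM properties.
function applyToTextNodes(node, transform) {
  if (node.nodeType === 3) { // 3 is the DOM constant for a text node
    node.nodeValue = transform(node.nodeValue);
    return;
  }
  for (var child = node.firstChild; child; child = child.nextSibling) {
    applyToTextNodes(child, transform);
  }
}
```

Once the document has loaded, a call such as applyToTextNodes(document.body, replaceInnerNbsp) would rewrite the matching text in place while leaving every other space on the page untouched. The regular expression is the part to adapt to whatever pattern actually surrounds the problem characters.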