
Arabic font issue since updating to Prince 9

rosebud
Hi!
We have some reports that we've been generating with Prince 7 for a while. We offer these reports in many languages, including ar-EG and jp-JP. We are in the process of upgrading to Prince 9, and we're seeing some issues with our Arabic and Japanese text.
Our Arabic text is rendering as isolated block letters, and I was told that Arabic needs to be rendered with connected (cursive) letterforms, or else it doesn't make sense.
Arabic on production, running Prince 7 (fonts loaded):
Screen Shot 2014-09-23 at 2.34.29 PM.png

Arabic on QA, running Prince 9 (fonts loaded):
Screen Shot 2014-09-23 at 2.34.47 PM.png

Our Japanese text is showing up as random characters, and our Chinese (zh-Hans-CN) text is showing up with different characters (although a co-worker from China said that even though the characters look different, they have the same meaning).

Japanese on production, running Prince 7 (fonts loaded):
Screen Shot 2014-09-23 at 2.38.30 PM.png

Japanese on QA, running Prince 9 (fonts loaded):
Screen Shot 2014-09-23 at 2.38.44 PM.png

As part of the Prince upgrade, we are starting with fresh, clean job servers, so they don't have all the fonts the old job servers had. In the past we loaded a bunch of fonts to get things working and never went back to see which ones were really necessary. We're trying to be leaner with our new setup.

The main difference I see is that the font lists from the new job servers don't show CID or the Identity-H encoding. Are those needed to render the Arabic, Japanese and Chinese fonts correctly?
Thank you!
rosebud
I apologize if my question and background information don't make complete sense; I didn't know how else to word it.
Thank you!
mikeday
Which operating system are you running Prince on?

For Arabic, what does the text look like? Are you getting any warning messages from Prince about being unable to find glyphs for certain characters?
rosebud
Prince 9 running on: Linux **** 2.6.32-431.20.3.el6.x86_64 #1 SMP Fri Jun 6 18:30:54 EDT 2014 x86_64 x86_64 x86_64 GNU/Linux
Prince 7 running on: Linux **** 2.6.18-371.9.1.el5 #1 SMP Tue May 13 06:52:49 EDT 2014 x86_64 x86_64 x86_64 GNU/Linux

Here is how a piece of Arabic text renders in Prince 9:
Screen Shot 2014-09-24 at 10.37.54 AM.png

The same piece of text as rendered in Prince 7:
Screen Shot 2014-09-24 at 10.39.02 AM.png

I'm not getting any warning messages when I run Prince 9 from the command line.
mikeday
We have seen this issue affecting Arabic before, and it involved an apparently broken version of the Microsoft Core Fonts.

On CentOS systems, an easy way to install the Microsoft fonts is:
yum install curl cabextract xorg-x11-font-utils fontconfig
rpm -i https://downloads.sourceforge.net/project/mscorefonts2/rpms/msttcore-fonts-installer-2.6-1.noarch.rpm


Alternatively, you could install the kacst-fonts package (yum install kacst-fonts) and update the Prince fonts.css to include these fonts for Arabic text.
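A minimal sketch of such a fonts.css addition, modeled on the existing @font-face rules in Prince's fonts.css. The family names "KacstBook" and "KacstOne" are assumptions about what the kacst-fonts package installs; the exact names should be confirmed on the server (for example with fc-list) before use.

```css
/* Hypothetical additions to Prince's fonts.css.
   ASSUMPTION: kacst-fonts exposes families named "KacstBook" and "KacstOne".
   Ranges: Arabic, Arabic Supplement, Arabic Presentation Forms-A and -B. */
@font-face {
    font-family: serif;
    unicode-range: U+0600-06FF, U+0750-077F, U+FB50-FDFF, U+FE70-FEFF;
    src: local("KacstBook") /* Arabic */
}

@font-face {
    font-family: sans-serif;
    unicode-range: U+0600-06FF, U+0750-077F, U+FB50-FDFF, U+FE70-FEFF;
    src: local("KacstOne") /* Arabic */
}
```

Because these rules attach to the generic serif and sans-serif families, they would only come into play when the document's font-family list falls through to a generic family for the Arabic characters; a list that starts with Arial may still take its Arabic glyphs from Arial first.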
rosebud
Our sysadmin says the installed fonts are identical on the two machines; the only difference he can see is in the encoding. On current production (running Prince 7) there is a "(CID)" next to the font type, and the encoding is Identity-H instead of the default.

Could this difference in encoding between the two Prince versions (CID and Identity-H vs. default) be the culprit?

Thank you!
mikeday
The encoding depends on what characters in the font have been used within that particular PDF. For example, Prince will use MacRomanEncoding if only a subset of Latin characters have been used.

Can you try converting a simple test document, e.g. containing a single Arabic word, using Times New Roman?
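For such a test, a stylesheet that names only Times New Roman keeps any other listed font from being tried first. This is a hypothetical test.css, assumed to be applied to a small HTML file containing the Arabic word (for example with `prince test.html --style=test.css`).

```css
/* Hypothetical test.css for a one-word Arabic test document.
   Only Times New Roman is named, so no other listed font is tried
   before times.ttf for the Arabic characters. */
* {
    font-family: "Times New Roman";
}
```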
rosebud
Here is a test doc generated on one of our new jobservers, running Prince 9, using Times New Roman.
arabicTimes.pdf
mikeday
Unfortunately this PDF file has been modified by MacOS X?
rosebud
That must have happened after I scp'd it from the job server to my desktop and opened it to make sure I was attaching the correct PDF to this thread. I will try again tomorrow and attach it right after copying it, without opening it locally. Should I zip it first to keep OS X from modifying it? Sorry about that :(
Thank you!
mikeday
Yes, that might help. Can you also check the size of times.ttf (or Times_New_Roman.ttf) on the server? It should be 330412 bytes.
rosebud
Here is a gzipped copy of the PDF.
arabicTimes.pdf.gz
I will get back to you on the size of times.ttf.

Thank you!
rosebud
Here are our times.ttf installs:

On current production (still running Prince 7, showing Arabic correctly):

[root@jsprod1b msttcorefonts]# ll times.ttf
-rw-r--r-- 1 root root 330412 Jul 17 2006 times.ttf

On the new jdev boxes (running Prince 9, Arabic missing ligatures):

[root@jdev1 msttcore]# ll times.ttf
-rw-r--r-- 1 root root 406760 Nov 15 2006 times.ttf


Our sysadmin added: "The times.ttf files are different sizes on the 2 servers. The MS fonts were installed exactly as he recommended, and the result is consistent across all 4 servers (including the 2 new prod job servers I am working on), so it's probably not one bad install. The font might just have been updated; they do have different creation dates."
rosebud
I searched Google for both file sizes:
https://www.google.com/search?q=406760++times.ttf&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a&channel=fflb#rls=org.mozilla:en-US:official&channel=fflb&q=406760++times.ttf+330412++times.ttf

And found this:
http://sourceforge.net/projects/mscorefonts2/files/rpms/

The byte size of the times.ttf that you suggest we use matches this:

Viewing cabinet: times32.exe
 File size | Date       Time     | Name
-----------+---------------------+-------------
        58 | 11.07.2000 16:49:20 | fontinst.inf
    330412 | 11.05.2000 11:45:18 | Times.TTF
    333900 | 11.05.2000 11:45:20 | Timesbd.TTF
    238612 | 11.05.2000 11:45:22 | Timesbi.TTF
    247092 | 11.05.2000 11:45:22 | Timesi.TTF
     45568 | 12.11.1998 19:43:34 | FONTINST.EXE

The times.ttf we have installed on our new machines matches this:

Viewing cabinet: EUupdate.EXE
 File size | Date       Time     | Name
-----------+---------------------+-------------
    364676 | 15.11.2006 11:02:26 | Arial.ttf
    349232 | 15.11.2006 11:02:24 | ArialBd.ttf
    225848 | 15.11.2006 11:02:14 | ArialBI.ttf
    206708 | 15.11.2006 11:02:16 | ArialI.ttf
     45568 | 12.11.1998 19:43:34 | fontinst.exe
    406760 | 15.11.2006 11:02:30 | Times.ttf
    395104 | 15.11.2006 11:02:28 | TimesBd.ttf
    237720 | 15.11.2006 11:02:18 | TimesBI.ttf
    246400 | 15.11.2006 11:02:16 | TimesI.ttf
    186192 | 06.12.2006 12:57:58 | Verdana.ttf
    153540 | 06.12.2006 12:57:58 | Verdanab.ttf
    175012 | 06.12.2006 12:58:00 | Verdanai.ttf
    173840 | 06.12.2006 12:58:02 | Verdanaz.ttf
    141836 | 09.10.2006 14:49:34 | trebucit.ttf
    136172 | 09.10.2006 14:49:32 | trebuc.ttf
    126376 | 09.10.2006 14:49:32 | trebucbd.ttf
    133716 | 09.10.2006 14:49:32 | trebucbi.ttf
       216 | 27.04.2007 16:06:38 | fontinst.inf

************
When I shared this with our sysadmin, he replied:
"The rpm they list there is the one I installed. If you read closely, you will see that all the listed cab files are included in the rpm install. It appears the EUupdate overwrites a few of the fonts, including Arial. Odd that they would put conflicting font files in one install."
rosebud
Mike, do you think our issue could be caused by the EUupdate version of Times.ttf overwriting the 330412-byte times.ttf?
Thank you!
mikeday
Yes, I do. Can you try with the 330412-byte font?
rosebud
We copied Arial and Times from our production machines (running Prince 7, with Arabic appearing correctly) to our new Prince 9 machines and generated these two PDFs.
Arial:
arabicArial.pdf.gz

Times:
arabicTimes2.pdf.gz
mikeday
That looks good, yes? :)
rosebud
It does look good. But when I opened the Arabic PDFs in a few different programs and environments, I noticed differences.
When I open the two test PDFs in Preview on OS X, the ligatures are rendered differently.
Screen Shot 2014-10-03 at 10.05.28 AM.png

On my old Linux machine I opened the gzipped files from above, and there they look identical.
arabicUbuntu.png

I then opened a PDF generated in our new Prince 9 environment and viewed it in the browser (Firefox v31.0). That PDF uses "font-family: Arial, Verdana, Helvetica, sans-serif, AR PL KaitiM GB;". The Arabic ligatures were missing in the PDF as viewed in the browser. I saved that same PDF from the browser and opened it in Adobe Reader, where the ligatures were correct.
Screen Shot 2014-10-03 at 10.04.26 AM.png

Left: Firefox. Right: Adobe Reader, showing the same PDF downloaded from Firefox.

I guess what I'm wondering is whether there is a way to make the Arabic appear consistently no matter what is viewing the PDF (Preview, Firefox, Adobe Reader, etc.). Does this have anything to do with embedded subset vs. fully embedded fonts? TrueType (CID) vs. plain TrueType? Identity-H encoding vs. built-in?

Thank you!
mikeday
This is getting very confusing. In your first screenshot, you are comparing arabicTimes.pdf, which is broken, with arabicArial.pdf, which is working. You said you are loading both of these in MacOS X Preview.

In your second screenshot, you are loading them on Ubuntu. The arabicArial.pdf on the right looks the same as on MacOS X, but the PDF on the left also seems to be Arial, not Times New Roman. However, I can't see the filenames in the screenshot.

It should not matter where you view the PDF; it should look the same in all viewers. Only where it was generated should matter, as that determines which fonts are used.

Are you using the same PDF file to test each viewer? Please avoid resaving the PDF from different programs, as this could potentially confuse the results.
rosebud
I was using the same file, and on each machine I opened it directly from this forum thread. I must have messed something up.

I have a question about Japanese fonts now :)

For example, we have a character that is showing up as a question mark:
http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=6226

In prince-9.0r2-macosx/lib/prince/style/fonts.css, I saw that the Japanese fonts are restricted to specific Unicode ranges:

@font-face {
    font-family: serif;
    unicode-range: U+3040-309F, U+30A0-30FF, U+4E00-9FBF;
    src: local("Hiragino Mincho Pro") /* Japanese */
}

@font-face {
    font-family: sans-serif;
    unicode-range: U+3040-309F, U+30A0-30FF, U+4E00-9FBF;
    src: local("Hiragino Kaku Gothic Pro") /* Japanese */
}


What are the unicode-ranges for?

Thank you!
mikeday
The issue here is Han unification: U+6226 is a Chinese character as well as a Japanese character. By default, Prince uses Chinese fonts for these characters, so the unicode-range is there to restrict the Japanese fonts to Hiragana and Katakana, which are exclusively Japanese scripts. To use the Japanese fonts instead of the Chinese fonts, you can remove the unicode-range. However, that will make it difficult if you also process Chinese text. The best way is to tag the text as Japanese or Chinese and then select an appropriate font.
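As a minimal sketch of the tagging approach, assuming the HTML marks text with lang attributes, the standard CSS :lang() selector can pick the font per language. "Hiragino Mincho Pro" is the Japanese font named in the Mac OS X fonts.css above and "AR PL KaitiM GB" is the Chinese font already used in this thread; on the Linux job servers, an installed Japanese font would have to be substituted for Hiragino.

```css
/* Text tagged lang="ja" or lang="ja-JP" prefers a Japanese font, so unified
   Han characters such as U+6226 are drawn with Japanese glyph forms.
   ASSUMPTION: the named fonts are installed where Prince runs. */
:lang(ja) {
    font-family: "Hiragino Mincho Pro", serif;
}

/* Text tagged lang="zh", "zh-Hans-CN", etc. prefers a Chinese font. */
:lang(zh) {
    font-family: "AR PL KaitiM GB", serif;
}
```

:lang(ja) matches "ja" and any "ja-..." tag, but not "jp-JP": the language code for Japanese is "ja", so documents tagged "jp-JP" would not be picked up by this rule.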
rosebud
Thank you Mike! That was a great explanation!