Arial Unicode and italic

henrike
Hi!

Using PrinceXml 6.

I'm trying to use Arial Unicode MS in italic. I do get italic, but the font is not Arial; it's some serif font. If I use bold instead, it's ignored, i.e. not displayed as bold in the PDF.
I have to use a Unicode font because the HTML is in several different languages (Russian, Japanese, Polish etc.).

This is the style/configuration I use (for italic):

.italicstyle
{
    font-family: "ARIALUNI";
    /*font-family: @Arial Unicode MS;*/
    font-style: italic;
}

@font-face
{
    font-family: "ARIALUNI";
    src: local("Arial Unicode MS") format("truetype");
}


Why is that? How do I specify an italic (or bold) Unicode Arial font?
mikeday
Are you running on Windows, MacOS X or Linux?

Is there any reason you need to use the @font-face rule, and can't refer to "Arial Unicode MS" as the font-family directly?
henrike
I'm running on Windows.

No, there's no reason (well, there actually IS a reason, but nothing that matters here...).

It doesn't matter if I refer to "Arial Unicode MS" directly: same result.
henrike
More info (perhaps simpler to understand)...

css:

.mystyle
{
    font-family: Arial;
}

html:

<span class="mystyle"><i>Дапта</i></span>

This is rendered (in the PDF, with PrinceXml) as italic Cyrillic, but in a serif font.

But...
css:

.mystyle
{
    font-family: Arial;
    font-style: italic;
}

html:

<span class="mystyle"><i>Дапта</i></span>

This is rendered (in the PDF, with PrinceXml) as Cyrillic in a serif font, BUT NOT italic.

Both are rendered correctly (italic Arial Cyrillic) in the browser.

Doesn't this have to do with some glyph list?
mikeday
Okay, the "Arial Unicode MS" font is not available in bold or italic variants. However, the regular "Arial" font is, so what you can do is use both:
font-family: Arial, Arial Unicode MS, sans-serif;

That way Latin text will use the Arial font, with bold and italic, and text in other scripts such as Chinese will fall back to Arial Unicode MS, without bold and italic.
henrike
Sorry, that's not an option. I want to display italic/bold text in e.g. Cyrillic; I don't want a fallback.
I'm sure it's possible to create a PDF with Cyrillic bold or italic. Isn't it just that PrinceXml can't handle it?

The reason I used "Arial Unicode MS" in the first place was that PrinceXml produced question marks when using "plain" Arial for e.g. Cyrillic characters.
mikeday
Arial Unicode MS does not have bold or italic, and Prince won't attempt to artificially embolden it or slant it. May I ask which platform you are running on? The regular Arial font on Windows XP does appear to support Cyrillic, and includes bold and italic variants.
henrike
I'm running on Windows XP.

You're absolutely right, Arial Unicode MS only supports roman (upright) letters, i.e. no italic or bold.
Thus, as you said, the browser (e.g. Internet Explorer) and other applications such as MS Word must do some kind of slanting or emboldening of their own. Because, as you say, Arial does not seem to support Cyrillic. But I can still start MS Word, paste some Cyrillic characters using that font, and make them bold italic. And the browser can display Arial in Cyrillic italic/bold. I don't know how that's done.

But I still wonder... in Russia they must be able to produce PDFs with bold/italic text that I can download and view. See the example here: http://www.kfw.de/DE_Home/Bankengruppe_RU.pdf

Sorry for being stubborn, but I really need a solution to this.
mikeday
Actually, I find that the Arial font on Windows XP does support Cyrillic, and I can make that work with Prince or in Internet Explorer. If you only specify Arial and use Cyrillic, do you get question marks in the Prince output? How big is arial.ttf on your system? It should be around 367 KB.
henrike
My Arial.ttf is 359 KB. The problem is not just Cyrillic; there's Greek, Polish etc. (Asian languages don't have italic).

If I only specify Arial, I don't get question marks (that had to do with me not specifying UTF-8 as the charset in the HTML). I do get Cyrillic, but not in Arial; it seems to be some serif font (Times Roman?).
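
To illustrate the charset point in general terms, here is a minimal Python sketch of what can happen to Cyrillic text when UTF-8 bytes are read as, or forced into, a single-byte Western encoding. It is only an illustration: the function names are made up for this example, Latin-1 is an assumed stand-in for whatever default encoding applies, and it says nothing about how Prince handles encodings internally.

```python
# Illustration only: generic Python string handling, not Prince internals.

SAMPLE = "Дапта"  # the Cyrillic test string used earlier in this thread


def misread_as_latin1(text: str) -> str:
    """Encode as UTF-8, then decode the bytes as if they were Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def force_into_latin1(text: str) -> bytes:
    """Encode into Latin-1, substituting '?' for characters it cannot hold."""
    return text.encode("latin-1", errors="replace")


if __name__ == "__main__":
    garbled = misread_as_latin1(SAMPLE)
    # Each Cyrillic letter is two bytes in UTF-8, so the misread text is longer.
    print(len(SAMPLE), len(garbled))
    # Latin-1 has no Cyrillic, so every letter is replaced.
    print(force_into_latin1(SAMPLE))
```

Declaring the charset explicitly in the HTML (as was eventually done here) avoids relying on any such default.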
mikeday
Perhaps you need to install multilanguage support on Windows? Try checking your language settings. Looking at the description for Arial on MyFonts.com it lists that Cyrillic is supported, and on our Windows XP machine it appears to support Cyrillic, and the Arial font installed by the msttcorefonts package on Linux also supports Cyrillic. So I'm afraid I'm really at a loss here. :)
henrike
Have you tried it yourself? Did you get Arial Cyrillic to work in PrinceXml on Windows? If you did, then the problem is mine :) . If you didn't, I'm sure you will get the same result as I do (i.e. Cyrillic characters, but NOT Arial).
mikeday
Yes I did, and it worked: if I say "font-family: Arial" then I can get sans-serif Cyrillic, including bold and italic variants. If I say "font-family: Arial Unicode MS", then I get Cyrillic, but not bold, and the italic falls back to Times New Roman.
henrike
Great! :D I'm going to investigate what I'm missing then...
henrike
Got it!

The problem is really weird. And I do think it's a problem with PrinceXml - please confirm!

It depends on whether I save the CSS file with or without the UTF-8 BOM.

Without the BOM it works great. If I save the CSS file with the BOM it doesn't work, but I can work around it by specifying a "dummy" CSS class first in the file.

I think you have an issue with the UTF-8 BOM, both in the CSS file and the HTML/XML file.
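
A minimal sketch of how one might check for and remove a UTF-8 BOM from a CSS or HTML file before conversion, assuming Python is available. The helper names are hypothetical and not part of Prince; the only fact relied on is that the UTF-8 BOM is the three bytes EF BB BF at the very start of the file.

```python
import codecs
from pathlib import Path


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the byte string starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the bytes without a leading UTF-8 BOM; other input is unchanged."""
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data


def strip_bom_in_place(path: str) -> bool:
    """Rewrite the file without its BOM. Returns True if a BOM was removed."""
    file = Path(path)
    data = file.read_bytes()
    if not has_utf8_bom(data):
        return False
    file.write_bytes(strip_utf8_bom(data))
    return True
```

Calling strip_bom_in_place("style.css") (the file name is a placeholder) would rewrite the stylesheet only if it actually starts with a BOM. Saving the file without a BOM from the editor achieves the same thing at the source.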
mikeday
Good point, a UTF-8 BOM at the beginning of the CSS file should be ignored. I'll add this issue to the roadmap for the next release.
henrike
This also affects the HTML/XML files. A UTF-8 BOM in those files also messes up the conversion to PDF.
mikeday
XML should be fine, but the HTML parser that we are using does have a problem with HTML files that begin with a UTF-8 BOM.
mikeday
We have now released Prince 6.0 rev 6, which correctly handles a BOM at the beginning of CSS files and XML files; unfortunately it still causes character encoding problems for HTML files at this time.
mikeday
Today we have released Prince 6.0 rev 8, which correctly processes HTML documents beginning with a UTF-8 byte order mark.