Bangla text is messed up in Firefox 33.1.1

Initiator
Hi Mike,

I've come across an interesting issue in Firefox 33.1.1.

I generated a PDF using an alpha build of PrinceXML (http://www.princexml.com/download/prince_20140909-1_ubuntu14.04_amd64.deb).

The PDF looks fine when I download it or view it inline within Firefox.

But when I upload the generated PDF via WordPress and view it again within Firefox, I notice the text is messed up, as indicated in the screenshot.

What could be going wrong?
  1. ebook-fire fox.jpg (161.8 kB)
    Firefox Bangla letter rendering
mikeday
If you download the file again from WordPress, has it changed at all? Does it look correct in other PDF viewers?
Initiator
If I download the file from WordPress and view it in Adobe Acrobat Reader or the latest IE and Chrome's inbuilt PDF viewers, it looks perfect.

The problem appears only in Firefox 33.1.1 on Windows 8.

I am attaching part of the text that is resulting in the problem, which I saved in Notepad with ANSI encoding.

Do you think the problem crept in while the text was copy-pasted between various editors, for instance Notepad to MS Word and then into WordPress' post editor?
  1. problematicText.txt (0.8 kB)
    Problematic text
mikeday
I'm not sure, but once it is converted to PDF it really shouldn't matter, so this seems like a bug in Firefox pdf.js.
Initiator
Another interesting thing is that this problem doesn't happen with other text. So I suspect the "journey" taken by the text might have something to do with it, like saving in MS Word in a certain format, then moving to Notepad, and so on. I guess the best workaround would be to work out which "journey" is safe and stick to it.
mikeday
Yes. If you would like us to take a look at it, please attach a small HTML document that replicates the problem.
Initiator
I am attaching a sample HTML file.

Steps to replicate:
1) Generate PDF from sample file using PrinceXML.
2) Upload the generated PDF to WordPress site.
3) View the uploaded PDF in the Firefox 33.1.1 (Windows 8) PDF viewer.
4) Notice the garbled text as indicated in the first post of this thread.
  1. BanglaText.html (20.5 kB)
    Bangla Text in Firefox

mikeday
We need to eliminate step 2. If you download it from WordPress and save it locally, has the file changed?
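
One way to answer the "has the file changed?" question without relying on how any viewer renders it is to compare checksums of the two copies. The following is only a minimal sketch: the function names and the example file names are made up for illustration, and it says nothing about why a file might differ, only whether it does.

```python
import hashlib
from pathlib import Path


def sha256_of(path: str) -> str:
    """Return the SHA-256 hex digest of the file at `path`."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read in chunks so large PDFs do not need to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_identical(original: str, downloaded: str) -> bool:
    """True if both files have exactly the same bytes (by SHA-256)."""
    return sha256_of(original) == sha256_of(downloaded)


# Hypothetical usage (file names are placeholders):
# files_identical("from_prince.pdf", "from_wordpress.pdf")
```

If the digests match, the upload did not alter the PDF, and step 2 can be dropped from the reproduction steps.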
Initiator
If I download it from WordPress and save it locally, the text looks perfect in Adobe Acrobat Reader.
mikeday
Yes, but what about in Firefox? You can open local PDF files in Firefox too.
Initiator
I don't have access to the problematic machine at the moment. I'll update this thread as soon as I do.
Initiator
Mike, I have confirmed that local PDF files in Firefox look garbled, like that screenshot.
mikeday
Can you attach one of the garbled PDF files, just to make sure I am testing the same file?
Initiator
Here's the file!
  1. The_FIrst_Step_PDF.pdf (304.6 kB)
    First Step

mikeday
This PDF file has been modified by "Adobe PDF Library 11.0".
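
For anyone wanting to check this themselves, a rough way to see which tools have touched a PDF is to look for `/Producer` entries in the raw bytes. This is a sketch under loud assumptions: it only finds literal (parenthesised) strings in uncompressed parts of the file, so it can miss values stored as hex strings, inside compressed object streams, or only in XMP metadata. The function name is made up for illustration.

```python
import re
from typing import List

# Matches "/Producer (some text)" with backslash-escaped characters allowed
# inside the parentheses. Assumption: the value is a PDF literal string.
PRODUCER_PATTERN = re.compile(rb"/Producer\s*\(((?:\\.|[^\\)])*)\)")


def find_producers(pdf_bytes: bytes) -> List[str]:
    """Return every literal /Producer value found in the raw PDF bytes."""
    return [
        match.group(1).decode("latin-1")
        for match in PRODUCER_PATTERN.finditer(pdf_bytes)
    ]


# Hypothetical usage (file name is a placeholder):
# with open("some.pdf", "rb") as handle:
#     print(find_producers(handle.read()))
```

An incrementally updated file may contain more than one entry, which is itself a hint that the PDF was modified after it was first generated.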
Initiator
Mike, you have my apologies. I have now uploaded the correct PDF. The earlier one was generated by PrinceXML and modified. Both have the issue.

mikeday
Thanks, I can replicate the issue with Firefox 34 on Linux.

Can you also attach the HTML input featuring the exact text used to create this PDF? (Or at least one paragraph from it).
Initiator
The HTML file is here: http://www.princexml.com/forum/post/13492/attachment/BanglaText.html

I attached it in an earlier post in this thread.
mikeday
Thanks. Somehow, when I convert it, I get the same text but there is no problem in Firefox. I'm wondering if some of the CSS is causing the problem, as I don't have access to the style sheets. Are you applying the prince-text-replace property at all?

I've attached a very simple test document that just has a small bit of text using the AdorshoLipi font. This does not cause any problems for me when I view the PDF in Firefox. Does it work for you?
  1. new.html (0.1 kB)
Initiator
I am using this in Prince:
<style>
*{prince-text-replace:\'\200c\' \'\200b\' \'\200d\' \'\200b\'}
</style>


I will test with the sample text you've attached and let you know my findings.
mikeday
Right, none of those characters are in the sample text, so it might not be related.

By the way, why are the single quotes escaped? It should be:
prince-text-replace: '\200c' '\200b' '\200d' '\200b'

That will replace U+200C and U+200D with U+200B.
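
To make the mapping concrete, here is a small Python sketch of the same character substitution: U+200C (zero width non-joiner) and U+200D (zero width joiner) both become U+200B (zero width space). It only illustrates the effect on a string and is not how Prince implements the property; the names below are made up for illustration.

```python
ZWNJ = "\u200c"  # zero width non-joiner
ZWJ = "\u200d"   # zero width joiner
ZWSP = "\u200b"  # zero width space

# Translation table mirroring the pairs given to prince-text-replace.
REPLACEMENTS = str.maketrans({ZWNJ: ZWSP, ZWJ: ZWSP})


def replace_joiners(text: str) -> str:
    """Replace U+200C and U+200D with U+200B, leaving other text untouched."""
    return text.translate(REPLACEMENTS)
```

Text that contains neither joiner character passes through unchanged, which is consistent with the property being unrelated to the sample text.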
Initiator
The single quotes are escaped because I had written this within PHP tags, and forgot to remove them here :)

By the way, I've filed a bug with Mozilla for the PDF rendering issue:

https://bugzilla.mozilla.org/show_bug.cgi?id=1108301
mikeday
Great, it looks like they will be able to fix the issue.