No word spacing

dgcarlson
When converting documents with Prince, I have found that each word is a separate text element and that the words do not have a space at the end. Each text element is positioned using a transform matrix. Is there a way to get Prince to output a space at the end of each word?
mikeday
No, not really. What is the need for this? Is it related to text selection on ebook readers, as came up in this thread?
dgcarlson
Not exactly. We are pulling the text out of the PDF and sending it to a text-to-speech engine. It reads each line as a single word since there are no spaces. So is there no way to add a space between words? I have opened several PDFs in a Cos-view program, and it looks like every word is added with a separate BT, using a matrix to position it, so it seems safe to assume that you know each word at the time you build the PDF.
mikeday
If you are creating the PDF from HTML content, can you send the HTML to text-to-speech instead? :)

Are you using your own code to extract text from the PDF, or a third-party tool? I'm wondering if tagged PDF might be easier for you to process.
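
If it is your own code, here is a rough sketch of how spaces could be put back on the extraction side from the word positions alone. It is only an illustration: the Word class, the join_words function, and both thresholds are hypothetical names and values, not part of Prince or of any PDF library, and it assumes your extractor already gives you each word's text, its x/y offset from the text matrix, and its width, with all words on a line sharing the same y.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical record for one positioned text element.
    text: str
    x: float      # horizontal offset taken from the text matrix
    y: float      # vertical offset (baseline) taken from the text matrix
    width: float  # width of the word in the same units as x


def join_words(words, line_tolerance=2.0, gap_threshold=0.5):
    """Rebuild plain text from positioned words.

    Inserts a newline when the baseline changes by more than
    line_tolerance, and a space when the horizontal gap to the
    previous word exceeds gap_threshold. Both values are assumed
    defaults and would need tuning for real documents.
    """
    # PDF y coordinates grow upward, so sort top to bottom, then left to right.
    ordered = sorted(words, key=lambda w: (-w.y, w.x))
    parts = []
    prev = None
    for w in ordered:
        if prev is not None:
            if abs(w.y - prev.y) > line_tolerance:
                parts.append("\n")
            elif w.x - (prev.x + prev.width) > gap_threshold:
                parts.append(" ")
        parts.append(w.text)
        prev = w
    return "".join(parts)


sample = [
    Word("Hello", 10.0, 700.0, 30.0),
    Word("world", 43.0, 700.0, 28.0),
    Word("Next", 10.0, 686.0, 22.0),
]
print(join_words(sample))
```

With the sample above, the gap between "Hello" and "world" is 3 units, so a space is inserted, and "Next" starts a new line because its baseline is 14 units lower.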
dgcarlson
Unfortunately, option 1 is out. Nice try, though :)

The issue is that we receive content that is already in PDF format, so we cannot change the way we process PDFs. So far, after over 300,000 documents, we have not found any other documents that cause this issue with our TTS engine.

Is there no way on your end to add a space to the end of the words?