Trouble generating pdfs with large tables

ghazuria
I am having problems generating PDFs that contain tables with more than 3500 rows. The HTML string that gets sent to Prince is about 7,300,000 characters long, and I use the passthru function to generate the PDF. Smaller tables generate without problems, although tables larger than 1000 rows take a very long time and prince.exe memory usage goes well above 100MB (close to 400MB for a PDF that includes a table with ~3000 rows).

I was wondering if there is a limit on how long the string can be that gets passed to the convert function or any other reason why it wouldn't work for a table that size.

Any help would be appreciated. Thanks.
mikeday
There might be a limitation in PHP once the strings get too big. Are you able to write them to a temporary file and run Prince on that instead?
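
As a rough illustration of the temporary-file approach suggested above, here is a minimal sketch in Python rather than PHP. The function names are hypothetical, and it assumes the Prince executable is on the PATH as `prince` and accepts the usual `prince input.html -o output.pdf` form. Capturing stderr is an addition that may help show why no PDF was produced.

```python
import os
import subprocess
import tempfile


def build_prince_command(html_path, pdf_path, prince_bin="prince"):
    """Return the argument list for converting one HTML file to PDF."""
    # Assumption: Prince is invoked as `prince input.html -o output.pdf`.
    return [prince_bin, html_path, "-o", pdf_path]


def write_temp_html(html):
    """Write the HTML string to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".html")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(html)
    return path


def convert_via_temp_file(html, pdf_path, prince_bin="prince"):
    """Run Prince on a temporary file instead of passing the string directly."""
    html_path = write_temp_html(html)
    try:
        result = subprocess.run(
            build_prince_command(html_path, pdf_path, prince_bin),
            capture_output=True,
            text=True,
        )
        # A non-zero return code or text on stderr hints at why no PDF appeared.
        return result.returncode, result.stderr
    finally:
        # Always clean up the temporary HTML file, even if Prince fails.
        os.remove(html_path)
```

Calling `convert_via_temp_file(html, "report.pdf")` would return the exit code and any error text, which separates a string-passing problem from a failure inside the converter itself.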
ghazuria
Thanks for the quick reply.

According to PHP documentation, there is no set limit for a string.

I changed the function to use the html file instead and it still does the exact same thing. Files smaller than a certain size came up very fast and without problem. Those that were really large (~4.5MB html file and greater) took an extremely long time and produced no pdf.

Anything else that it might be?
mikeday
Hmm, perhaps it is running out of memory and the process is being killed? Could you email me one of these big HTML documents (preferably compressed) so that we can try to reproduce the problem here?
pkmiec
We actually have the same problem. In our case, when Prince processes large HTML files (~4000 rows, ~4.5 megs) it takes about 320 megs of memory. Memory stays at around 60 megs for most of the run and only starts to grow at the end. Since we have klimit set at 300 megs, Prince fails with an out-of-memory error. Increasing the klimit is one solution.

However, I am also wondering how HTML choices affect prince with regard to speed and memory usage. I already saw the other post about HTML vs. XML, but I am more after:

* Does Prince prefer CSS or inlined styles? I am guessing memory usage would go up but speed would increase with inlined styles.

* Would defining column widths help Prince lay out large tables?

In some cases, we found it useful to split the generated PDF into chunks. We use Prince to generate each chunk and then use pdftk to stitch the chunks together. Of course this doesn't work when you care about page numbers or have very large tables. Are there other tricks we can play to help Prince deal with very large tables?
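
The chunk-and-stitch workaround described above could be sketched roughly as follows, in Python. The helper names and the chunk size are hypothetical; the only external assumption is pdftk's `cat` operation in the form `pdftk a.pdf b.pdf cat output merged.pdf`. Each chunk would still have to be rendered to its own PDF by Prince before stitching.

```python
def chunk_rows(rows, chunk_size):
    """Split a list of table rows into consecutive chunks of at most chunk_size."""
    return [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]


def build_pdftk_command(chunk_pdfs, output_pdf, pdftk_bin="pdftk"):
    """Return the argument list that concatenates the chunk PDFs in order."""
    # Assumption: pdftk is on the PATH and supports `cat output <file>`.
    return [pdftk_bin, *chunk_pdfs, "cat", "output", output_pdf]
```

As noted in the post, page numbers restart in every chunk, so this only suits documents where continuous numbering does not matter.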
mikeday
It is possible to improve the speed with which Prince processes long documents, particularly documents containing thousands of elements such as table cells. One method that we have found is removing any unused id attributes, as thousands of unnecessary id attributes cause a slowdown in Prince. This is a limitation that we will be fixing in a future release.
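
One way to drop unused id attributes before conversion is sketched below in Python. This is a simplified, regex-based illustration and not part of Prince: the function name is hypothetical, it only treats ids targeted by `href="#..."` links as used, and any id referenced from CSS selectors or scripts would have to be passed in through `keep`. A real HTML parser would be more robust on unusual markup.

```python
import re

# Matches an id attribute with a double- or single-quoted value.
ID_ATTR = re.compile(r'\s+id\s*=\s*(?:"([^"]*)"|\'([^\']*)\')', re.IGNORECASE)
# Matches in-document links such as href="#section1".
FRAGMENT_REF = re.compile(r'href\s*=\s*["\']#([^"\']+)["\']', re.IGNORECASE)


def strip_unused_ids(html, keep=()):
    """Remove id attributes that no in-document link (or `keep` entry) refers to."""
    used = set(keep) | set(FRAGMENT_REF.findall(html))

    def replace(match):
        value = match.group(1) if match.group(1) is not None else match.group(2)
        # Keep the attribute only if something refers to this id.
        return match.group(0) if value in used else ""

    return ID_ATTR.sub(replace, html)
```

For example, a row written as `<tr id="r1">` would be reduced to `<tr>` unless `r1` is linked to or listed in `keep`.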

Memory usage is much more difficult to improve, as it increases at the end as you noticed due to the way in which styles are manipulated internally as the PDF file is generated. Unfortunately the only way to improve this is to reduce the number of elements in the document.

We have raised the priority of this issue on the roadmap and will address it immediately after the next maintenance release. Send me an email (perhaps next week?) if you would like to try an early build of Prince and evaluate performance improvements.
pkmiec
Wow. You guys are super quick!

I'd be very interested in trying out the functionality. I'll get back to you in a couple of weeks (no pressure :)).
mikeday
Today we have released Prince 6.0 rev 7, which improves performance and reduces memory usage for documents with many elements, such as table cells.
pkmiec
I am amazed! I did some unscientific tests with rev 6 and rev 7 ..

My HTML test file (13 gigs) contained a table with 70+k rows. With rev 6, memory shot up well over 2 gigs and the process basically never finished due to massive swapping. With rev 7, memory peaked at 1.1 gigs and Prince finished in 7 minutes. The resulting PDF had 2860 pages (7 gigs).

This is a huge improvement.

Great work!!
mikeday
A 13 gigabyte HTML file? Really?? :)