Using better compression and PDF size optimization

IZh
Hi!

The Prince PDF generator is excellent, but it produces large documents.
For example, the web-developer PDF version of the HTML5 standard is 20 MB.

There are open-source PDF optimizers, such as:
https://code.google.com/p/pdfsizeopt/
(There is a white paper describing the methods it uses.)

After running pdfsizeopt, the document shrinks to 7.5 MB without losing quality.

I think it is worth it.

Thank you.

jim_albright
How long does it take to compress and decompress?

Jim Albright
Wycliffe Bible Translators

IZh
It depends on which options are used and what is inside your document.
My original file is 20,929,761 bytes (with 177 PNG images in it). The document is mostly text.

I tested pdfsizeopt with different options (without Multivalent):
Options                                Time     Size (bytes)
--use-pngout=false --use-jbig2=false   19.8s    7,803,424
--use-pngout=false --use-jbig2=true    19.8s    7,803,402
--use-pngout=true  --use-jbig2=false   3m27.6s  7,764,166
--use-pngout=true  --use-jbig2=true    3m32.5s  7,764,146


As you can see, most of the reduction is achieved even with all optional utilities disabled.
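
To put numbers on that, here is a small sketch that works out the savings from the figures in the table. The function and variable names are mine, purely for illustration; only the sizes and timings come from the measurements above.

```python
# Figures copied from the table above; sizes are in bytes.
ORIGINAL_SIZE = 20_929_761  # unoptimized PDF

# (pngout enabled, jbig2 enabled) -> (seconds, optimized size in bytes)
RUNS = {
    (False, False): (19.8, 7_803_424),
    (False, True): (19.8, 7_803_402),
    (True, False): (3 * 60 + 27.6, 7_764_166),
    (True, True): (3 * 60 + 32.5, 7_764_146),
}


def reduction_percent(size, original=ORIGINAL_SIZE):
    """Return how much smaller the optimized file is, in percent."""
    return 100.0 * (1.0 - size / original)


def pngout_tradeoff(runs=RUNS):
    """Extra seconds spent and extra bytes saved by enabling pngout (jbig2 off)."""
    fast_time, fast_size = runs[(False, False)]
    slow_time, slow_size = runs[(True, False)]
    return slow_time - fast_time, fast_size - slow_size


if __name__ == "__main__":
    for (pngout, jbig2), (seconds, size) in RUNS.items():
        print(f"pngout={pngout!s:5} jbig2={jbig2!s:5} "
              f"{seconds:6.1f}s  {reduction_percent(size):.1f}% smaller")
    extra_seconds, extra_bytes = pngout_tradeoff()
    print(f"pngout: {extra_seconds:.1f}s more for {extra_bytes:,} bytes less")
```

By these figures the fast run already removes about 62.7% of the file, and enabling pngout spends roughly 188 more seconds to save another 39,258 bytes.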

jbig2 runs fast, but it targets bilevel (1-bit) images and my document has very few images it can help with, so it has almost no effect on either size or speed.

The pngout PNG optimizer is very slow. If you repeatedly generate the PDF from the same source document, it may make sense to optimize the original PNGs before embedding them in the PDF and to skip pngout during PDF optimization. pngout can considerably reduce the size of some PNGs (typically those with large single-colored areas), but it needs time to try different compression options to find the best one.
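
A rough sketch of that pre-optimization step might look like the following. It assumes a `pngout` executable is on the PATH and that calling it with a single file name rewrites that file in place; the helper names and the directory layout are hypothetical, so check them against your own setup.

```python
import subprocess
from pathlib import Path


def pngout_commands(image_dir, pngout="pngout"):
    """Build one pngout command per PNG found under image_dir (recursively)."""
    pngs = sorted(Path(image_dir).rglob("*.png"))
    return [[pngout, str(path)] for path in pngs]


def optimize_pngs(image_dir, pngout="pngout"):
    """Run pngout once over every source PNG, before the PDF is generated."""
    for command in pngout_commands(image_dir, pngout):
        # Assumption: a non-zero exit status is not necessarily fatal
        # (for example, when a file cannot be made any smaller),
        # so failures are not raised here.
        subprocess.run(command, check=False)
```

Since the source images rarely change, this slow step only has to run once, and each later PDF build can then use `--use-pngout=false`.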

I think Prince could apply some generic PDF optimization techniques that would reduce the size considerably without losing any visual quality.

As for decompression, I don't quite understand the question. I don't know how much time a PDF reader needs to open the optimized version; I haven't compared it.

I optimize documents for the web, so size does matter. I don't think the optimized document will open significantly slower than the original on modern PCs.

And it could be offered as a Prince command-line option that selects the level of optimization to use. ;-)

mikeday
Which input HTML document created the 20 MB output file?

IZh
http://www.whatwg.org/specs/web-apps/current-work/
This is the WHATWG version of the HTML5 specification.
It is a large document (about 6 MB, as I remember). ;-)

mikeday
Thanks. The PDF size seems to come from the number of links in the document, which results in many annotation objects. The 20 MB PDF file will shrink to 6 MB if you disable links:
:link { prince-link: none !important }

Perhaps in the future we can improve this by using compressed object streams in the PDF, but for the time being it will be necessary to run pdfsizeopt as an external command.
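
As a minimal sketch of that external step, the two tools could be chained from a small script. This assumes `prince` and `pdfsizeopt` are both on the PATH and that pdfsizeopt takes the input and output file names as its last two arguments; the function names, the intermediate file name, and the optional style sheet argument are illustrative only.

```python
import subprocess


def build_pipeline(html, final_pdf, raw_pdf="raw.pdf", extra_css=None):
    """Return the two commands: Prince first, then pdfsizeopt on its output."""
    prince_cmd = ["prince", html, "-o", raw_pdf]
    if extra_css:
        # Optional style sheet, e.g. one containing the link-disabling rule.
        prince_cmd += ["-s", extra_css]
    # Flags taken from the fast run measured earlier in this thread.
    pdfsizeopt_cmd = [
        "pdfsizeopt",
        "--use-pngout=false",
        "--use-jbig2=false",
        raw_pdf,
        final_pdf,
    ]
    return [prince_cmd, pdfsizeopt_cmd]


def run_pipeline(html, final_pdf, raw_pdf="raw.pdf", extra_css=None):
    """Run both steps in order, stopping at the first failure."""
    for command in build_pipeline(html, final_pdf, raw_pdf, extra_css):
        subprocess.run(command, check=True)
```

For example, `run_pipeline("spec.html", "spec.pdf")` would write the unoptimized PDF to `raw.pdf` and the optimized one to `spec.pdf`.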