Forum Bugs

Memory usage in Control mode

struijenid
Hi,

Our Java application starts a Prince process (v13.5) in control mode and sends it job/dat/pdf chunks to convert several HTML + CSS + JavaScript files into a single PDF.

We have noticed that the memory claimed by Prince while in control mode is not freed after end-chunks are sent. Memory usage continues to grow until the Prince process has consumed about 1-2GB of RAM.

Is this expected behavior? Is there configuration we can change in order to let Prince free memory it no longer needs? Can we somehow control the amount of RAM reserved for processing?
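
For reference, we frame each chunk roughly like the sketch below before writing it to Prince's stdin. The header layout (tag, space, byte length, newline, then payload) is our reading of the control-protocol docs, so treat it as an assumption; the job JSON itself is elided here.

```java
import java.nio.charset.StandardCharsets;

// Sketch of control-protocol chunk framing: a tag, a space, the payload
// length in bytes, a newline, then the payload bytes. The header layout
// is an assumption based on our reading of the docs; "{}" is just a
// placeholder for the real job JSON.
public class Chunk {
    static byte[] frame(String tag, byte[] payload) {
        byte[] header = (tag + " " + payload.length + "\n").getBytes(StandardCharsets.UTF_8);
        byte[] chunk = new byte[header.length + payload.length];
        System.arraycopy(header, 0, chunk, 0, header.length);
        System.arraycopy(payload, 0, chunk, header.length, payload.length);
        return chunk;
    }

    public static void main(String[] args) {
        byte[] job = frame("job", "{}".getBytes(StandardCharsets.UTF_8));
        System.out.print(new String(job, StandardCharsets.UTF_8)); // job 2\n{}
    }
}
```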

Thanks!

EDIT: When pushed further, RAM consumption climbs to 4 GB, after which the Prince process crashes. On a machine with more RAM available, Prince manages to complete the job, claiming up to 5.1 GB. Our output is a concatenation of 1000 copies of the same 30-page document.

EDIT 2: When we keep running Prince in control mode, RAM is not freed even after our big output is generated. Subsequent HTML to PDF conversions consume more and more RAM until the system can no longer give Prince any more memory and it crashes.

Edited by struijenid

mikeday
We will investigate the memory usage situation for this extremely large document.

In the meantime you may wish to simply restart the control process for each job; the overhead of doing so is measured in milliseconds, which should not be a problem for jobs of this size.
struijenid
Hi mikeday,

Thanks for your response. For the record: we are fine with a large document requiring a large amount of RAM in order to complete. What we are surprised about is that even after said large document is generated, the Prince process keeps holding on to memory.

We expected that memory to be freed once an end-chunk is sent...

Kind regards

Edited by struijenid

mikeday
Yes, we would expect that too! However, even if the process holds on to the "high watermark" of memory that it has allocated for the largest document processed so far, memory usage is not supposed to grow indefinitely with subsequent smaller documents. That is definitely something we would like to eliminate.
mikeday
May I ask which operating system you are using and which Prince version you have installed? Also do your test documents include any bitmap images or SVG?
struijenid
I have reproduced this on Ubuntu 18.04, Windows Server 2019, and Windows 10.

I have attached the HTML to this message for your reference.

EDIT: Forgot to attach the images. The second image used is too large to upload to this forum; see https://eoimages.gsfc.nasa.gov/images/imagerecords/74000/74393/world.topo.200407.3x21600x21600.B2.jpg
  1. 4037e0ee-6b7c-442f-95cf-9d0885ebe80b.html (42.2 kB)
  2. ocean-creatures-size-comparison.jpg (3.1 MB)

Edited by struijenid

mikeday
Thank you. So you were concatenating this document many times to produce one enormous HTML file that you then fed to Prince, or were you running Prince with many copies of this input document and letting Prince do the concatenation?
struijenid
Indeed, we concatenate that HTML 1000 times into a single big HTML file, let's say 1000repeats.html. Afterwards we try to render a small HTML file 1000 times separately. This small HTML is also attached (sorry, I should have thought of this earlier).
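
A simplified sketch of how the 1000repeats.html file is built: repeat the body of the attached document 1000 times inside one wrapper page. The body extraction here is naive string slicing, purely for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustration of building 1000repeats.html: the <body> content of the
// attached document is repeated N times inside a single wrapper page.
// The body extraction below is naive string slicing, not real parsing.
public class Repeats {
    static String repeatBody(String html, int times) {
        int start = html.indexOf("<body>") + "<body>".length();
        int end = html.indexOf("</body>");
        String body = html.substring(start, end);
        StringBuilder sb = new StringBuilder("<html><body>");
        for (int i = 0; i < times; i++) {
            sb.append(body);
        }
        return sb.append("</body></html>").toString();
    }

    public static void main(String[] args) throws IOException {
        String html = Files.readString(Path.of(args[0]));
        Files.writeString(Path.of("1000repeats.html"), repeatBody(html, 1000));
    }
}
```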

In other words, we sent chunks as shown below:
job
<job-json to 1000repeats.html>
pdf

job
<job-json to small.html>
job
<job-json to small.html>
<repeat this job a total of 1000 times>

end


We notice some memory is being freed, but not nearly as much as we were hoping for. Running the following chunk sequence eventually lets our machine run out of RAM too:
job
<job-json to 1000repeats.html>
pdf

job
<job-json to 1000repeats.html>
pdf

job
<job-json to 1000repeats.html>
pdf

<run several times and OOM occurs>
  1. small.html (5.5 kB)
mikeday
Thanks, we will investigate this issue.
struijenid
Uploaded references to the large images we used
mikeday
We are still investigating ways of working around this issue, complicated by the fact that returning memory to the operating system can lead to fragmentation issues on Linux, which may cause problems of their own.

In the meantime I think it would be best to restart the Prince control process after large jobs (for example by calling stop() and then start() in the Java wrapper) as this will guarantee that memory usage is controlled and adds very little overhead, in fact it may even be faster for very large documents.
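
A sketch of that workaround: recycle the control process after any job whose input exceeds a size threshold. ControlProcess below is a stand-in interface for the Java wrapper's PrinceControl (the wrapper offers stop() and start() as noted above; the convert() signature used here is an assumption).

```java
import java.io.IOException;
import java.io.OutputStream;

// Sketch of the suggested workaround: restart the control process after
// any job whose input exceeds a size threshold, so its high-watermark
// memory is released. ControlProcess stands in for the Java wrapper's
// PrinceControl; the convert() signature is an assumption.
public class RecyclingConverter {
    interface ControlProcess {
        void start() throws IOException;
        void stop() throws IOException;
        void convert(String inputPath, OutputStream out) throws IOException;
    }

    private final ControlProcess control;
    private final long thresholdBytes;

    public RecyclingConverter(ControlProcess control, long thresholdBytes) {
        this.control = control;
        this.thresholdBytes = thresholdBytes;
    }

    // Pure decision helper, kept separate so the policy is easy to test.
    static boolean shouldRecycle(long inputBytes, long thresholdBytes) {
        return inputBytes >= thresholdBytes;
    }

    public void convert(String inputPath, long inputBytes, OutputStream out)
            throws IOException {
        control.convert(inputPath, out);
        if (shouldRecycle(inputBytes, thresholdBytes)) {
            control.stop();   // release the process and its high-watermark memory
            control.start();  // millisecond-scale overhead per the reply above
        }
    }
}
```

With an unpredictable mix of job sizes, a byte threshold like this means small invoice-style jobs keep the persistent process's low overhead, while only the rare large job pays the restart cost.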

(The main reason for having one persistent process was to reduce overhead when converting large numbers of very small documents, such as single-page invoices; in that situation the startup overhead could be significant enough to matter.)
struijenid
Thank you for the status update. Given the use case, guaranteeing contiguity of the address space does indeed seem far from trivial :)

Our application renders a mixture of large numbers of small PDFs and small numbers of large PDFs. With no way of predicting which kind of request will come next, we implemented your suggested workaround a few weeks ago.

As expected, the performance hit was much more noticeable when rendering thousands of small PDFs than when rendering a few large ones (worst case roughly 20-25% slower). This is acceptable for the time being, but we continue to look forward to a fix!
ChristophS
We noticed growing memory usage even when converting the same 11 KiB input file over and over again. I'm attaching
  • a Dockerfile (includes download of the input file),
  • a shell script,
  • a Java program which uses PrinceControl,
  • and the resulting memory sizes for 100,000 conversions (docker run --rm prince-test ./prince-test sample-page.html 100000). The script output was converted to CSV with another script not attached here.

I'm no expert in interpreting the numbers taken from /proc/$pid/status. I have read that VmSize is quite misleading, but overall this looks suspicious to me.
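
For what it's worth, this is roughly how our script pulls a single field out of the status file. VmRSS (the resident set) is usually more meaningful than VmSize, which also counts reserved-but-unused address space.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of extracting one kB-valued field (e.g. VmRSS or VmSize) from
// the text of /proc/<pid>/status. Lines look like "VmRSS:     7890 kB".
public class ProcStatus {
    private static final Pattern KB_FIELD =
            Pattern.compile("^(\\w+):\\s+(\\d+) kB$", Pattern.MULTILINE);

    // Returns the value of the given field in kB, or -1 if absent.
    static long fieldKb(String statusText, String field) {
        Matcher m = KB_FIELD.matcher(statusText);
        while (m.find()) {
            if (m.group(1).equals(field)) {
                return Long.parseLong(m.group(2));
            }
        }
        return -1;
    }
}
```

Feed it the contents of the status file, e.g. `Files.readString(Path.of("/proc/" + pid + "/status"))`.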

BTW, the reason we stumbled upon the memory usage was this: using a Spring Boot service with 1000 Undertow worker threads and a pool of 750 PrinceControl instances, we started a load test of 1000 curl requests (via GNU parallel), each doing the same three sequential PrinceControl.convert calls. This succeeded in 144 seconds. Repeating the same 1000 requests did not finish; instead it led to vast memory usage and near-100% CPU usage by kswapd0.
  1. PrinceTest.java (1.2 kB)
    PrinceControl caller
  2. prince-test (0.7 kB)
    bash script
  3. prince-test.docker (1.0 kB)
    Dockerfile
  4. prince-test_20200828T112018.csv (27.0 kB)
    results with 100K conversions
mikeday
Thank you, we will investigate.