Problems with convert_string_to_passthru on Windows

whittaker007
Hi there,

We are running Prince 8.1 on a server running Ubuntu 10, which serves PDF versions of content on our Drupal web site using a custom module. I have set it up so that I can create PDFs directly using convert_string_to_passthru, and indirectly using convert_string_to_file and generating a link to the file in the HTML output.

Because the content may change and we are not expecting a large volume of PDF requests we would much prefer to use the convert_string_to_passthru method to provide direct downloads rather than saving the PDFs on the server.

This is working perfectly on Mac and Ubuntu browsers, but on Windows XP running Chrome or IE8 (I haven't had a chance to test Windows 7 or 8 or other browsers on XP) the resulting file opened in Adobe Acrobat Reader reports the following error:

"Acrobat may not display the page correctly. Please contact the person who created the PDF document to correct the problem."

And indeed we get no content in the PDF except the page background colour.

The same process using the convert_string_to_file method does produce a valid PDF on XP, so I figured the problem must be in the way we are serving the files using the convert_string_to_passthru method. I have checked the headers and cleaned the output buffer, but nothing seems to help.

The log file reports no errors during this process:

Wed Jul 31 15:03:15 2013: ---- begin
Wed Jul 31 15:03:15 2013: Loading document...
Wed Jul 31 15:03:16 2013: Converting document...
Wed Jul 31 15:03:16 2013: finished: success
Wed Jul 31 15:03:16 2013: ---- end


This is the PHP code we are currently using to output the PDF:

  switch ($output_mode) {
    case OUTPUT_SCREEN:
      drupal_deliver_html_page($input_node);
    break;

    case OUTPUT_DOWNLOAD:
      ob_start();
      $result = $prince->convert_string_to_passthru($input_html);
      $pdf = ob_get_contents();
      ob_end_clean();

      if (!strlen($result)) {
        // handle errors
      } else {
        header("Content-Transfer-Encoding: binary");
        // header("Pragma: public"); // required
        // header("Expires: 0");
        header("Cache-Control: must-revalidate, post-check=0, pre-check=0");
        // header("Cache-Control: private", FALSE); // required for certain browsers
        header("Content-Type: application/pdf");
        header('Content-Disposition: attachment; filename="'.$output_filename.'";');
        // header("Content-Length: ".$fsize);
        print $pdf;
        ob_end_flush(); //now the headers are sent
      }
    break;

    case OUTPUT_FILE:
      // check the output folder exists, create it if it doesn't
      // file_prepare_directory() takes $dir by reference, so we can't pass
      // the constant directly
      $dir = DOWNLOAD_PATH;
      file_prepare_directory($dir, FILE_CREATE_DIRECTORY);
      @$prince->convert_string_to_file($input_html, $output_realpath, $msgs);
      $input_node = '<h3><a href="' . $output_url . '?' . time() . '">Download the PDF</a></h3><br /><hr /><br />' . $msgs . $input_node;
      drupal_deliver_html_page(drupal_render_page($input_node));
    break;
  }


Any idea what the problem is or how to fix it?
mikeday
One possibility that springs to mind is that Acrobat embedded in IE often makes partial byte range requests for PDF files to load them incrementally, and that might be messing things up. If you are able to check the web server logs when accessing the PDF from IE, you should see some numbers on the request line indicating it is attempting to retrieve only part of the file.

Another way to test this is to right-click the link and select save as or download, or to change the content type to "application/octet-stream" so that it doesn't go to Acrobat. If saving the file to the desktop and then loading it with Acrobat works fine, then it is probably the byte range issue. Hopefully there is a way to work around this with HTTP headers of some kind.
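
For anyone wanting to check for byte range requests from inside PHP rather than in the web server logs, here is a minimal sketch. It assumes the request arrives with a standard Range header, which PHP exposes as $_SERVER['HTTP_RANGE']; the function name describe_range_request is made up for illustration. In an access log, a partial response would normally show up with status 206 instead of 200.

```php
<?php
/**
 * Report whether a request asked for the whole resource or only part of it.
 * $server is expected to look like PHP's $_SERVER array.
 * (describe_range_request is a hypothetical helper, not part of Prince or Drupal.)
 */
function describe_range_request(array $server): string
{
    if (!isset($server['HTTP_RANGE'])) {
        return 'full';
    }
    // A Range header typically looks like "bytes=0-1023".
    return 'partial: ' . $server['HTTP_RANGE'];
}

// Example: write the result to the PHP error log for the current request.
error_log(describe_range_request($_SERVER));
```

If the log only ever shows "full" for the failing downloads, byte ranges are probably not the cause.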
whittaker007
Hi Mike,

This is happening on both Chrome and IE. With Content-Disposition: attachment the file is saved to disk first and then opened automatically in Adobe Acrobat. With this method there is also no download link to right-click.

The incomplete download may be a possibility - is there a way to calculate an appropriate value for the Content-Length header from the output buffer contents created by the return from Prince?

mikeday
You can use ob_get_length() to get the length of the output buffer.
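
A minimal sketch of how that could fit together, assuming the PDF bytes are printed to standard output the way convert_string_to_passthru does. The fake_passthru function is a made-up stand-in so the example runs without Prince, and capture_output is a hypothetical helper name.

```php
<?php
// Hypothetical stand-in for $prince->convert_string_to_passthru(),
// which prints the PDF bytes instead of returning them.
function fake_passthru(): bool
{
    echo "%PDF-1.4\n...binary data...\n%%EOF\n";
    return true;
}

/**
 * Run a callable that prints its result, capture the printed bytes,
 * and return them together with their length in bytes.
 */
function capture_output(callable $producer): array
{
    ob_start();
    $ok = $producer();
    $length = ob_get_length(); // number of bytes currently in the buffer
    $body = ob_get_clean();    // fetch the contents and close the buffer
    return ['ok' => $ok, 'length' => $length, 'body' => $body];
}

$captured = capture_output('fake_passthru');

// Only send headers when running under a web server, not on the command line.
if ($captured['ok'] && PHP_SAPI !== 'cli') {
    header('Content-Type: application/pdf');
    header('Content-Length: ' . $captured['length']);
    echo $captured['body'];
}
```

Note that ob_get_length() has to be called before the buffer is closed; once the buffer has been cleaned, the length is no longer available.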
whittaker007
Thanks, I'm testing these right now. Same issue on Firefox, so pretty safe to say it's browser independent. Using application/octet-stream did not help. Nor does adding the Content-Length header.

The response headers are identical on Windows and Mac. The Content-Length header says the size is 190569 bytes (about 190.6k), which is what the downloaded file size is on the Mac, but Windows reports the downloaded file size as 187k, though that could be a difference in byte conversion (190569 / 1000 ≈ 190.6 on Mac, but 190569 / 1024 ≈ 186.1 on Windows).

This theory also seems borne out by the fact that the same PDF produced on the server and then downloaded is exactly the same size. So I'm pretty sure it's not an incomplete download, unless the difference is of a few bytes. And the one produced on the server first, then downloaded does work on Windows.
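
The unit conversion behind that comparison can be checked with a few lines of PHP. This is only the arithmetic; it assumes one system divides by 1000 and the other by 1024, and says nothing about how either one rounds the figure for display.

```php
<?php
$bytes = 190569; // Content-Length value reported above

$decimal_kb = round($bytes / 1000, 1); // 1 kB  = 1000 bytes
$binary_kb  = round($bytes / 1024, 1); // 1 KiB = 1024 bytes

printf("%.1f kB (decimal), %.1f KiB (binary)\n", $decimal_kb, $binary_kb);
// prints "190.6 kB (decimal), 186.1 KiB (binary)"
```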

whittaker007
It's definitely something to do with the delivery, not Prince. I tried a method where I created the file on the server using convert_string_to_file, then read that file in with fopen and fpassthru with identical results.

There must be something added by the Drupal delivery which is messing up the file response. I tried putting a PHP die; after the ob_end_flush(); call, and that ends up delivering the octet stream to the screen instead of downloading. I need to figure out a clean way to terminate further processing and deliver the response.
mikeday
When you view the PDF in Firefox is it using Acrobat, or Firefox's new built-in PDF viewer?

Update: never mind, I didn't see your subsequent post.

mikeday
Perhaps it is appending extra HTML to the end of the PDF. You can check this by opening it in a text editor.
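
If a text editor is awkward for a binary file, the same check can be scripted. This is a rough sketch, assuming the PDF ends with the usual %%EOF marker; trailing_after_eof is a made-up helper name, and the file name in the usage comment is hypothetical.

```php
<?php
/**
 * Return whatever follows the last "%%EOF" marker in $data (whitespace
 * trimmed), or null if the marker does not appear at all.
 */
function trailing_after_eof(string $data): ?string
{
    $pos = strrpos($data, '%%EOF');
    if ($pos === false) {
        return null;
    }
    return trim(substr($data, $pos + strlen('%%EOF')));
}

// Usage (hypothetical file name):
//   $extra = trailing_after_eof(file_get_contents('download.pdf'));
//   An empty string means nothing was appended after the end of the PDF.
```

A non-empty result that looks like HTML would point to the page template being appended to the response.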
whittaker007
Would you be able to take a look? I've attached an example PDF which opens fine on Mac and Linux, but not on Windows (at least not Windows XP).

Attachment: Helping you plan for retirement (10).pdf (190.6 kB), Buggy PDF
mikeday
It looks like this is a Prince issue, something strange about one of the patterns in the PDF that is making Acrobat very unhappy. Would you be able to email me (mikeday@yeslogic.com) the input HTML document?
mikeday
Another possibility is to disable PDF compression, which might make it a bit easier to tell what is going on.
mikeday
It turns out that it was SVG patterns nested inside patterns, which was not supported properly in Prince 8.1 and resulted in a PDF file that Acrobat could not load. Prince 9 has support for nested patterns, which fixes the problem.
whittaker007
Thanks Mike!