broken PDF, XREF table, Error: /undefined in /BXlevel

gryszkalis
Hi,
I'm testing princexml for creating large PDF files (over 5000 pages, from over 1000 source HTML files; a typical result is around 50 MB).

It works great, but sometimes the generated PDF seems to be broken:
- Acrobat Reader 8 on Linux displays a "rebuilding" message before opening, but the contents look good
- an older Reader (on Windows) refuses to open the file
- the Ghostscript pdfopt tool displays:
   **** Warning:  An error occurred while reading an XREF table.
   **** The file has been damaged.  This may have been caused
   **** by a problem while converting or transfering the file.
   **** Ghostscript will attempt to recover the data.
   **** Warning:  There are objects with matching object and generation
   **** numbers.  The accuracy of the resulting image is unknown.
Error: /undefined in /BXlevel
Operand stack:
   --nostringval--   --dict:4/5(L)--   Next   --nostringval--   50725196   2562   0   --nostringval--   Title   (\376\377\000W\000e\000z\000w\000a\000n\000i\000e\000 \000d\000o\000 \000z\000a\000p\001B\000a\000t\000y)   Parent   --nostringval--   Dest   --nostringval--   --nostringval--   X421324   0   8   --dict:11/11(ro)(G)--   (n)
Execution stack:
   %interp_exit   .runexec2   --nostringval--   --nostringval--   --nostringval--   2   %stopped_push   --nostringval--   --nostringval--   --nostringval--   false   1   %stopped_push   1846   1   3   %oparray_pop   1845   1   3   %oparray_pop   1829   1   3   %oparray_pop   1723   1   3   %oparray_pop   --nostringval--   %errorexec_pop   .runexec2   --nostringval--   --nostringval--   --nostringval--   2   %stopped_push   --nostringval--   --nostringval--   --nostringval--   --nostringval--   --nostringval--   --nostringval--   %array_continue   --nostringval--   --nostringval--   %array_continue   --nostringval--   --dict:4/5(L)--   --nostringval--   2   %dict_continue   --nostringval--   --nostringval--   --nostringval--   --nostringval--   false   1   %stopped_push   --nostringval--   %loop_continue   --nostringval--
Dictionary stack:
   --dict:1147/1684(ro)(G)--   --dict:1/20(G)--   --dict:108/200(L)--   --dict:76/200(G)--   --dict:278/300(ro)(G)--   --dict:20/25(L)--   --dict:9/40(L)--
Current allocation mode is local
Last OS error: 2
Current file position is 36516
GPL Ghostscript 8.64: Unrecoverable error, exit code 1


Note: most files are generated fine; I'd guess about 1 in 10 is faulty.

princexml displays no warnings during processing. The log looks like this:
Wed Jun 10 13:51:22 2009: ---- begin
Wed Jun 10 13:51:22 2009: loading HTML input: 00_eaecde438d6a2fcc4745fb35b4032b6f_0.html
...
Wed Jun 10 13:51:42 2009: loading image: 12d1ee8c5ac41714e59ff4387bec755f.gif
Wed Jun 10 13:51:42 2009: loading image: 0011087.jpg
Wed Jun 10 13:51:42 2009: loading image: 0017947.jpg
...
Wed Jun 10 13:52:19 2009: used font: Arial, Regular
Wed Jun 10 13:52:19 2009: used font: Arial, Bold
Wed Jun 10 13:52:19 2009: used font: Arial, Italic
Wed Jun 10 13:52:19 2009: used font: Georgia, Regular
Wed Jun 10 13:52:56 2009: loading image: 0025088.jpg
...
Wed Jun 10 13:52:56 2009: ---- end
gryszkalis
More info:

I compared a broken PDF with a valid one, and the broken part is:

<</Producer (Prince 6.0 \(www.princexml.com\))
/Title (Untitled)>>
endobj
xref
0 6964
0000000000 65535 f
0000000016 00000 n
0000200885 00000 n
0000222539 00000 n

...

0007379865 00000 n
0007381596 00000 n
--00000000000000018689
Content-Type: application/pdf
Content-Range: bytes 184022-50585853/50725222
075006C002E0020004C006500670069006F006E00F30077002000390033002F00390035>
/Parent 2011 0 R
/Dest [1845 0 R /XYZ 0 826 0]
/Prev 2560 0 R
/Next 2562 0 R>>
endobj
2562 0 obj
<</Title <FEFF00570065007A00770061006E0069006500200064006F0020007A006100700142006100740079>
/Parent 2011 0 R
/Dest [1845 0 R /X421324 00000 n
0007423055 00000 n
0007428473 00000 n
0007430204 00000 n

...

0050585768 00000 n
trailer
<</Info 6963 0 R
/Size 6964
/Root 1 0 R>>
startxref
50585854
%%EOF


while in the valid PDF the "0007428473 00000 n" sequence is not damaged.

I removed the offending piece manually and I still get "Warning: An error occurred while reading an XREF table." but no error (this seems expected, as I removed the broken part).
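
For anyone who wants to locate this kind of damage without diffing two 50 MB files by hand, here is a minimal Python sketch. It assumes a classic `xref` table like the one quoted above (not a cross-reference stream) and a single `xref` section in the file; the function name `malformed_xref_lines` is made up for illustration. It simply reports every line between `xref` and `trailer` that is neither a subsection header ("first count") nor a well-formed entry ("10 digits, 5 digits, n or f").

```python
import re

# A classic xref entry: 10-digit offset, 5-digit generation, 'n' or 'f'.
ENTRY = re.compile(rb"\d{10} \d{5} [nf]")
# A subsection header: first object number and entry count, e.g. "0 6964".
SUBSECTION = re.compile(rb"\d+ \d+")


def malformed_xref_lines(data: bytes):
    """Return the lines between 'xref' and 'trailer' that are not valid xref lines.

    Assumption: the first line consisting only of 'xref' starts the table
    (the 'startxref' line does not match, since it does not begin with 'xref').
    """
    start = re.search(rb"^xref[ \t]*\r?$", data, re.MULTILINE)
    if start is None:
        return []
    end = data.find(b"trailer", start.end())
    if end == -1:
        end = len(data)

    bad = []
    for line in data[start.end():end].splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines are ignored in this sketch
        if ENTRY.fullmatch(line) or SUBSECTION.fullmatch(line):
            continue
        bad.append(line)
    return bad
```

Calling `malformed_xref_lines(open("broken.pdf", "rb").read())` (the file name is a placeholder) on a file damaged like the one above should list the boundary line, the two `Content-` headers, and the fragments of outline objects that landed inside the table. An empty result only means the table lines look well formed, not that the offsets are correct.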
gryszkalis
I forgot to mention that I used 6.0r7 on FreeBSD.

Please tell me if I can provide you with more information.
mikeday
Is this text actually in the PDF file:
Content-Type: application/pdf
Content-Range: bytes 184022-50585853/50725222

This looks like something inserted by the browser; are you downloading the PDF over HTTP and then saving it?
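
A quick way to check this for any given file is to scan its raw bytes for such header lines. The following is a minimal Python sketch, not anything Prince provides; the function name `find_content_range_headers` is hypothetical, and it only looks for `Content-Range: bytes first-last/total` lines of the form quoted above.

```python
import re

# Matches a part header such as
# "Content-Range: bytes 184022-50585853/50725222" on a line of its own.
CONTENT_RANGE = re.compile(
    rb"^Content-Range: bytes (\d+)-(\d+)/(\d+)\r?$", re.MULTILINE
)


def find_content_range_headers(data: bytes):
    """Return (offset, first, last, total) for each Content-Range line in data."""
    return [
        (m.start(), int(m.group(1)), int(m.group(2)), int(m.group(3)))
        for m in CONTENT_RANGE.finditer(data)
    ]
```

Running it over the bytes of both the locally generated file and the downloaded copy shows which of the two actually contains the header. If only the downloaded copy has hits, the text was most likely added in transit rather than by the PDF generator.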
gryszkalis
It's the original file generated by princexml. Note that princexml can generate such a header (I don't know what for, though):

$ strings ./lib/prince/bin/prince | grep Content-Range

Content-Range:
Content-Range: bytes %s%lld/%lld
Content-Range: bytes %s/%lld
mikeday
Hmm, that's strange. The only place that Content-Range would be used is libcurl, but we only use libcurl to make HTTP requests for images and such, and we never make partial downloads. So when you generated this PDF, it didn't pass through HTTP at all? It was generated from the command line and immediately saved to a file?
gryszkalis
Excuse me for the late answer. I tried to find some details and it looks like something mysterious happened :) The file magically healed itself. I still have a broken copy (downloaded via HTTP!), but a few minutes before you asked your last question I would have bet my head that the original (not downloaded) file was broken too.

Anyway, I cannot reproduce the problem now. Thanks for the support; I'll do more testing next week or so, and we'll see if it's OK.

greetings