Prince PDF without EOF?

nishantvarma
Are there any scenarios where Prince might not write the EOF marker to the PDF blob? Some of our stored PDFs don't seem to have an EOF marker, and Prince XML is the only thing we use to generate them. I just wanted to know whether Prince could ever return content without the EOF marker.
mikeday
The last few bytes of the PDF file should always be "%%EOF", so perhaps some buffers have been dropped in the server wrapper somewhere?
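A quick way to find affected files on the Python side is to test the tail of each blob before storing it. This is a minimal sketch: `has_eof_marker` is a hypothetical helper, and it deliberately ignores trailing whitespace after the marker, since a final end-of-line after `%%EOF` is common.

```python
def has_eof_marker(blob, window=1024):
    """Return True if the PDF bytes end with the %%EOF marker.

    Only the last `window` bytes are inspected, and trailing ASCII
    whitespace (e.g. a final CR/LF after %%EOF) is ignored.
    """
    tail = blob[-window:].rstrip()
    return tail.endswith(b"%%EOF")


# Usage: flag suspicious blobs before storing them.
print(has_eof_marker(b"%PDF-1.4\n1 0 obj\nendobj\n%%EOF\n"))  # True
print(has_eof_marker(b"%PDF-1.4\n1 0 obj\nendobj\n"))  # False
```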
nishantvarma
Here is our architecture:

- Prince installed and deployed via Tomcat.
- A `urllib2` call from our web app, which runs on Zope (Python).

Can you please clarify what you mean by the server wrapper?
mikeday
Are you using the Prince Java wrapper with Tomcat? Which convert method are you calling? It would help to test one file and check that the web app is receiving exactly the number of bytes that it should.
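One way to do that check from the Python side is to compare the `Content-Length` header with the bytes actually read. This is a minimal sketch in Python 3 (`urllib.request`, the successor of the `urllib2` module used in this thread); `length_status` and `fetch_checked` are hypothetical names, and it assumes the Tomcat servlet sets `Content-Length`, which the thread does not confirm.

```python
import http.client
import urllib.request


def length_status(content_length, body):
    """Compare a Content-Length header value with the bytes received.

    Returns None when there is no header to compare against (for example,
    a chunked response), otherwise True or False.
    """
    if content_length is None:
        return None
    return len(body) == int(content_length)


def fetch_checked(url, timeout=60):
    """Fetch `url`; return (body, status) where status comes from length_status()."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        content_length = response.headers.get("Content-Length")
        try:
            body = response.read()
        except http.client.IncompleteRead as exc:
            # The connection closed before Content-Length bytes arrived;
            # keep the partial data so it can be inspected.
            body = exc.partial
    return body, length_status(content_length, body)
```

In Python 3, `response.read()` raises `http.client.IncompleteRead` when the connection closes before `Content-Length` bytes arrive, which is why the sketch catches it. Without a `Content-Length` header, a truncated body cannot be detected this way. An HTTP 500 still surfaces as `urllib.error.HTTPError` from `urlopen`.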
nishantvarma
Yes, we use the Java wrapper. We call `prince.setHTML(true)` and `prince.convert(input, out)` in our code to get the data. This doesn't happen for all PDFs: we print a lot of PDFs and only very few of them have the problem, so we are investigating the root cause.

About testing one file and checking that it receives the exact number of bytes: the problem doesn't happen every time, so are you suggesting that we fire a lot of requests, or something like that?
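Firing many requests is a reasonable way to reproduce an intermittent truncation. This is a minimal sketch with a hypothetical `stress_test` helper that takes any zero-argument `fetch` callable (so it can wrap the real HTTP call to the Tomcat service) and applies the end-of-file check to every response; the request count of 500 is an arbitrary assumption.

```python
import itertools


def stress_test(fetch, n=500):
    """Call fetch() n times and return the attempts that look wrong.

    `fetch` is any zero-argument callable returning the PDF bytes, e.g. a
    wrapper around urllib.request.urlopen(url).read(). Exceptions (such as
    an HTTP 500) are recorded instead of being raised.
    """
    failures = []
    for attempt in range(n):
        try:
            blob = fetch()
        except Exception as exc:
            failures.append((attempt, "error: %r" % (exc,)))
            continue
        if not blob.rstrip().endswith(b"%%EOF"):
            failures.append((attempt, "missing EOF marker, %d bytes" % len(blob)))
    return failures


# Usage with a fake fetcher that alternates good and truncated output:
fake = itertools.cycle([b"%PDF-1.4\n%%EOF\n", b"%PDF-1.4\ntruncated"])
print(stress_test(lambda: next(fake), n=4))
# [(1, 'missing EOF marker, 18 bytes'), (3, 'missing EOF marker, 18 bytes')]
```

Recording errors and truncations in the same list makes it easy to see whether the two line up, which is exactly the correlation in question here.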

I am not a network expert, but shouldn't the TCP/IP connection guarantee that? We don't need to check whether the response bytes match when we use Tomcat, right? I am not sure whether I need to delve into such details.

Please let me know your thoughts; I will research further.

Meanwhile, I also reported the issue at http://www.princexml.com/forum/topic/3059/how-do-i-understand-a-http-500-error-better; it also happens intermittently, and we are investigating it again now by adding logging.

mikeday
Every link in the chain needs to be checked. If the Prince convert method is returning true, then the output buffer should include the complete PDF file. But the Tomcat code needs to make sure it is correctly flushing the stream or closing the response handle (if that is necessary), and the Python code needs to make sure it isn't doing something funny with the response buffer. If you check each point where the data is passed from one system to another, eventually you will find the culprit.
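Checking each hop is easier if every layer logs the same fingerprint of the data it passes on. This is a minimal sketch for the Python (Zope) side only; `trace_blob` and the `pdf-trace` logger name are hypothetical, and the Java servlet would need its own equivalent (length plus SHA-256 of the output buffer) written in Java.

```python
import hashlib
import logging

logger = logging.getLogger("pdf-trace")


def trace_blob(stage, blob):
    """Log a fingerprint of `blob` at a named stage and return it.

    Logging the same length and SHA-256 digest at every hop shows which
    hop first changes the bytes: the first stage whose fingerprint
    differs from the previous one is the culprit.
    """
    fingerprint = {
        "stage": stage,
        "length": len(blob),
        "sha256": hashlib.sha256(blob).hexdigest(),
        "tail": blob[-16:],  # last bytes, where %%EOF should appear
    }
    logger.info("%(stage)s: %(length)d bytes, sha256=%(sha256)s, tail=%(tail)r",
                fingerprint)
    return fingerprint
```

Calling `logging.basicConfig(level=logging.INFO)` once at startup makes these lines visible, for example with `trace_blob("zope-after-read", blob)` right after `handle.read()`.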
nishantvarma
Thanks. I will have a look and possibly add debug code to catch the issue in each layer.

nishantvarma
Hi Mike,

There is a 100% correlation between the HTTP 500 IOException mentioned here http://www.princexml.com/forum/topic/3059/how-do-i-understand-a-http-500-error-better and the blobs without EOF. Basically, we retry like this in Python:

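```python
import logging
import urllib2

retries = 2  # default number of attempts
while retries:
    try:
        handle = urllib2.urlopen(url)
        blob = handle.read()
        break  # success: without this the loop never ends
    except Exception:
        logging.exception("PDF request failed")  # logs the traceback
        retries -= 1
        if not retries:
            raise
```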


`retries` is 2 by default. Whenever an HTTP 500 happens, the code goes to the exception handler and logs it. The HTTP 500 is caused by some environment factor that we haven't identified yet. On the next try, more often than not, we get an HTTP 200 response with a corrupt blob, and every corrupt blob is preceded by an HTTP 500 error. The only question for us is whether the problem comes from the Prince/Tomcat side or from the Python logic above.
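The loop above accepts whatever `handle.read()` returns once no exception is raised, so a truncated HTTP 200 response is stored as-is. One option, sketched here in Python 3 under the assumption that a blob without `%%EOF` should count as a failed attempt, is to validate before accepting. `fetch_pdf`, `_read_url` and the `read_url` parameter are hypothetical names; `read_url` is injectable only so the function can be tried without a live server. This only works around the symptom; the HTTP 500 still needs its root cause found.

```python
import logging
import urllib.request

logger = logging.getLogger(__name__)


def _read_url(url):
    """Default fetcher: perform the HTTP request and read the whole body."""
    with urllib.request.urlopen(url) as handle:
        return handle.read()


def fetch_pdf(url, retries=2, read_url=_read_url):
    """Fetch a PDF, retrying on errors and on blobs that lack the EOF marker."""
    for attempt in range(1, retries + 1):
        try:
            blob = read_url(url)
        except Exception:
            # e.g. urllib.error.HTTPError for the intermittent HTTP 500
            logger.exception("attempt %d/%d for %s failed", attempt, retries, url)
            continue
        if blob.rstrip().endswith(b"%%EOF"):
            return blob  # looks complete: stop retrying
        logger.warning("attempt %d/%d for %s: %d bytes without EOF marker",
                       attempt, retries, url, len(blob))
    raise IOError("no complete PDF from %s after %d attempts" % (url, retries))
```

Raising after the last attempt, instead of returning a truncated blob, keeps incomplete PDFs out of storage.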

The main thing is to fix the HTTP 500, which we believe is due to some environment factor.

mikeday
500 is an Internal Server Error; are there any errors showing in the Tomcat logs?
nishantvarma
It shows the 500 response. However, the exact error isn't getting logged because the logger module is not working as expected. We are fixing that so that we can find the real root cause of the Internal Server Error. It happens sporadically and seems to be an issue with server resources or the environment. I will post an update once we find the cause of the error.
