
Generate PDF from HTML string

jehrenzweig
I'm trying to generate a PDF from an HTML string. I'm using Prince 8.1 and the .NET wrapper in a console application... when I write the HTML code to a file and use that as the input, it works. However, when I try to convert using the same HTML code as a string, it fails.

Here's the code I'm using to test:
using System;
using System.IO;
using System.Text;

namespace PrinceXmlTest
{
	class Program
	{
		static void Main(string[] args)
		{
			var prince = new Prince(@"C:\Program Files (x86)\Prince\Engine\bin\prince.exe");
			prince.SetLog(@"C:\Program Files (x86)\Prince\Engine\bin\log.txt");
			prince.SetHTML(true);

			const string directoryPath = @"C:\pdfs";
			const string inputFilename = "test.html";
			const string fileTestOutputFilename = "file-test.pdf";
			const string stringTestOutputFilename = "file-string.pdf";

			// Generate all the file paths needed to generate PDFs.
			string inputFilePath = String.Format(@"{0}\{1}", directoryPath, inputFilename);
			string fileTestOutputPath = String.Format(@"{0}\{1}", directoryPath, fileTestOutputFilename);
			string stringTestOutputPath = String.Format(@"{0}\{1}", directoryPath, stringTestOutputFilename);

			// Get the contents of the input file.
			string inputFileContents;

			using (var streamReader = new StreamReader(inputFilePath, Encoding.UTF8))
			{
				inputFileContents = streamReader.ReadToEnd();
			}

			// SUCCESS! Convert an HTML document into a PDF file.
			//prince.Convert(inputFilePath, fileTestOutputPath);

			// FAILURE! Convert an HTML string into a PDF file.
			prince.Convert(inputFileContents, stringTestOutputPath);
		}
	}
}


Here is the HTML code I've written to the input file ("test.html"):
<!DOCTYPE html>
<html>
	<head>
		<title>Test</title>
	</head>
	<body>
		Ahoy!
	</body>
</html>


And here is the log file output when it tries to process the HTML string:
Fri Feb 22 13:07:54 2013: ---- begin
Fri Feb 22 13:07:54 2013: Loading document...
Fri Feb 22 13:07:54 2013: warning: failed to load external entity "<!DOCTYPE html>  <html>  	<head>  		<title>Test</title>  	</head>  	<body>  		Ahoy!  	</body>  </html>"
Fri Feb 22 13:07:54 2013: <!DOCTYPE html>  <html>  	<head>  		<title>Test</title>  	</head>  	<body>  		Ahoy!  	</body>  </html>: error: could not load input file
Fri Feb 22 13:07:54 2013: error: no input documents to process
Fri Feb 22 13:07:54 2013: finished: failure
Fri Feb 22 13:07:54 2013: ---- end


I'm wondering if there's a bug in the DLL code that tells prince.exe to parse the string as HTML. The warnings in the log file look like they pertain to parsing the HTML string, but the string is definitely valid -- I ran it through the W3C Validator ( http://validator.w3.org/#validate_by_input+with_options ) and it passed, so I can't think of any other reason that it would be producing such warnings.

Any ideas?
mikeday
You need to call ConvertString, not just Convert. Otherwise the string argument will be treated as a filename.
jehrenzweig
Derp... well that makes more sense. I eventually got things to work by using the Convert() method overload that takes a Stream object, but I expected I was doing something fundamentally wrong. Thank you!
jehrenzweig
Also, I just realized why I didn't use ConvertString() in the first place: it only allows you to output the PDF as a Stream object. There's no overload that allows you to pass in a string pdfPath parameter to save the generated PDF to.

I got around it by writing the following extension method, which converts a string into a Stream object, and then using it to call the Convert() method.
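using System.IO;

// Extension methods must live in a static class; the class name here is arbitrary.
public static class StringExtensions
{
	// Writes the string into an in-memory stream and rewinds it so it can be read from the start.
	public static Stream ToStream(this string s)
	{
		var stream = new MemoryStream();
		// StreamWriter defaults to UTF-8. It is deliberately not disposed,
		// because disposing it would also close the underlying stream.
		var writer = new StreamWriter(stream);
		writer.Write(s);
		writer.Flush();
		stream.Position = 0;
		return stream;
	}
}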


And here's some code to illustrate how I used it:

var prince = new Prince(@"C:\Program Files (x86)\Prince\Engine\bin\prince.exe");
var html = "<html><head></head><body>Hello!</body></html>";
var outputFilePath = @"C:\docs\html.pdf";
prince.Convert(html.ToStream(), outputFilePath);
mikeday
Good point, maybe we could add another method in the future, ConvertStringToFile() or something like that. But your workaround looks like a good solution.
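
For anyone who wants something like that today, here is a minimal sketch of such a helper. It is not part of the Prince wrapper: the class name PrinceFileHelper and the method ConvertStringToFile are hypothetical, and the delegate parameter assumes that ConvertString takes the HTML string plus an output Stream and returns a bool, as described earlier in this thread. The helper opens a FileStream on the target path and hands it to whatever conversion delegate is passed in.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the Prince .NET wrapper.
public static class PrinceFileHelper
{
	// convertString stands in for a call such as prince.ConvertString(html, stream);
	// its (string, Stream) -> bool shape is an assumption about the wrapper.
	public static bool ConvertStringToFile(
		Func<string, Stream, bool> convertString, string html, string pdfPath)
	{
		// FileMode.Create creates the file, or overwrites it if it already exists.
		using (var output = new FileStream(pdfPath, FileMode.Create, FileAccess.Write))
		{
			return convertString(html, output);
		}
	}
}
```

Assuming the signature above, usage would look something like PrinceFileHelper.ConvertStringToFile(prince.ConvertString, html, @"C:\docs\html.pdf"). Taking a delegate keeps the helper independent of the wrapper assembly, so it can be tried out with a stand-in converter first.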