Support for custom AFRelationship when attaching files to PDF

bohwaz
I sent an e-mail to the support address but haven't received a reply yet.

Currently, when attaching a file to a PDF, Prince uses "Unspecified" for "AFRelationship":

/AFRelationship /Unspecified


The specification for e-invoices in France and Germany (Factur-X/ZUGFeRD) requires this to be either "Data" or "Alternative" for the attached XML file, and "Supplement" for any other (optional) attachment.

E-invoices will be mandatory for all companies here from 2026.

I'm using this command to generate the file:

prince invoice.html -o invoice.pdf --pdf-profile="PDF/A-3a" --pdf-xmp=Factur-X_extension_schema.xmp --attach=factur-x.xml


To be able to generate PDF files compliant with the Factur-X specification, it would be nice if Prince allowed the "AFRelationship" to be specified separately for each attached file.

Thank you.
mikeday
Sorry for the delay in replying. We will add an option for specifying the attachment file relationship type.
bohwaz
No worries, I was just wondering if my email reached you :)

Thanks, great to hear this!
mikeday
We have added new command-line options to the latest Prince builds for file attachments with a specified AFRelationship field:
--attach-data=FILE
Attach a file to the PDF with the AFRelationship key set to Data.

--attach-source=FILE
Attach a file to the PDF with the AFRelationship key set to Source.

--attach-alternative=FILE
Attach a file to the PDF with the AFRelationship key set to Alternative.

--attach-supplement=FILE
Attach a file to the PDF with the AFRelationship key set to Supplement.

--attach-unspecified=FILE
Attach a file to the PDF with the AFRelationship key set to Unspecified.

All of these options can be used multiple times, similarly to the existing --attach option, and the attachment file relationship can also be specified in the job JSON like this:
{
  "url": "/path/to/xmp1.xml",
  "filename": "xmp1.txt",
  "description": "Some XMP metadata",
  "relationship": "Data"
}
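
For anyone scripting this, here is a minimal Python sketch of how the new options could be assembled into a command line. The option names are the ones listed above; the helper name build_prince_command and the attachment "terms.pdf" are hypothetical and only there for illustration. The command is built as a list but not executed.

```python
# Relationship values and the matching command-line options listed above.
ATTACH_FLAGS = {
    "Data": "--attach-data",
    "Source": "--attach-source",
    "Alternative": "--attach-alternative",
    "Supplement": "--attach-supplement",
    "Unspecified": "--attach-unspecified",
}


def build_prince_command(html, pdf, attachments, extra_args=()):
    """Return an argv list for Prince (hypothetical helper, not part of Prince).

    attachments is a list of (path, relationship) pairs.
    """
    cmd = ["prince", html, "-o", pdf, *extra_args]
    for path, relationship in attachments:
        if relationship not in ATTACH_FLAGS:
            raise ValueError(f"unknown AFRelationship: {relationship!r}")
        cmd.append(f"{ATTACH_FLAGS[relationship]}={path}")
    return cmd


# "terms.pdf" is a made-up supplementary attachment.
cmd = build_prince_command(
    "invoice.html",
    "invoice.pdf",
    [("factur-x.xml", "Alternative"), ("terms.pdf", "Supplement")],
    extra_args=["--pdf-profile=PDF/A-3a"],
)
print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run as-is, which avoids shell quoting problems with file names.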
bohwaz
This is great!

I just tried it, but I have another issue: the Factur-X format requires the XML file to be attached as text/xml, and Prince is attaching it as application/octet-stream.

There are other requirements (listed below). I think the Params and ModDate requirements are already met by Prince, but I'm not sure about the "Names object tree".

But I'm not sure it's worth going down the path of making Prince conform to everything needed to produce these files, as I also found a way to transform the PDF generated by Prince into a compliant PDF+XML file using Ghostscript: https://artifex.com/blog/creating-zugferd-documents-with-ghostscript

It just feels like there isn't much missing from Prince to generate these PDF+XML documents correctly, so if it's possible it would be nice, but if not I'll stick to Ghostscript.

Requirements from Factur-X specification:

The invoice data in the XML format is embedded using a file specification dictionary. In order to do this,
a valid MIME type must be specified for the document to be embedded. The MIME type for Factur-X is
always text/xml.

The embedded file’s stream dictionary should contain a Params key. Params refers to a dictionary
containing at least a ModDate indicating the last modification date of the embedded file.

The embedded document must also be included in the Names object tree so as to enable compliant PDF
tools to represent the file together with additional information.

As a basic principle, several files can be embedded in the PDF/A-3 document, thereby enabling information
documents relating to the invoice check to be packaged together with the invoice data document in the
PDF/A-3. To identify, at PDF level, which of the embedded files is the invoice data document, the name of
the invoice data document must be included in the corresponding metadata attribute.
The XML file is always embedded with the name "factur-x.xml". The only exception to this is the reference
profile XRECHNUNG, where the name must be “xrechnung.xml”. As an option, additional supporting
documents may be embedded.

6.2.1 Embedding relationship

In the PDF/A-3 standard, an embedded file can principally relate to the whole (PDF) document (document
level) or to a particular page (page level). Irrespective of the type of relationship, the file specification
dictionary can be found in either the Document dictionary or the Page dictionary. The relationship link is
established by use of an array called AF (for Associated Files), which is entered in the respective dictionaries and contains a reference to the file specification dictionary.

In Factur-X 1.0 standard, the structured invoice dataset is always provided in factur-x.xml file or a reference profile such as xrechnung.xml (see chapter 7.7) file embedded in PDF/A-3 document. The "document level" is therefore the relationship type to be selected. This does not affect the embedding of other documents and files supporting the invoice.

6.2.2 Data relationship

In addition to the relationship type, ISO 19005-3 requires a data relationship to be specified, i.e. the
relationship between the embedded document and the PDF part, i.e. its visualization. This data relationship is expressed by the AFRelationship tag and may have one of the following values:

• Data: the embedded file contains data which is used for the visual representation in the PDF part,
e.g. for a table or a graph.

• Source: the embedded file contains the source data for the visual representation derived therefrom in
the PDF part, e.g. a PDF file created via an XSL transformation from an (embedded) XML source file or the MS Word file from which the PDF file was created.

• Alternative: this data relationship should be used if the embedded data are an alternative
representation of the PDF contents.

• Supplement: this data relationship is used if the embedded file serves neither as the source nor as
the alternative representation, but the file contains additional information, e.g. on easier automatic
processing.

• Unspecified: this data relationship term applies where none of the data relationships above
apply, or where there is an unknown data relationship.

Note:
There are no technical consequences within the PDF file from specifying the data relationship. In particular,
this means that specifying a Source data relationship, for instance, does not suggest that the contents of
the embedded data and the invoice image are identical. Instead, they provide the invoice with an indication
of how the role of the embedded data should be understood.
If the visual representation contains more invoicing data than the XML structured file (especially for
MINIMUM and BASIC WL profiles), the Data value must be used. It indicates that the XML structured file
contains invoicing information that is strictly identical to what is shown in the visual representation to enable
an automatic invoice process.
If the visual representation has been built from the XML structured file, the Source value can be used. It
indicates that the source file is the full structured XML file and that the visual representation, which
consequently contains strictly the same invoicing information as the structured file, has been built from this
structured XML file attached in the PDF (“factur-x.xml” or “xrechnung.xml”).

Finally, if the XML structured file and the visual representation contain both strictly the same invoicing
information and constitute two alternative presentations of an identical invoice content, the
Alternative value must be used. This indicates that the fiscally relevant content of both representations
is identical, and that the XML file is merely an alternative and independent form of representation which is
better suited to machine processing (copies of a document with identical contents). For the use of Factur-x in
Germany (ZUGFERD 2.2.x = Factur-X 1.0), it is imperative to use the value Alternative in conjunction
with the permissible profiles BASIC, EN 16931, EXTENDED and XRECHNUNG.
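
One possible reading of the rules quoted above, as a small Python sketch. The function names and the for_germany and built_from_xml flags are hypothetical, and the decision order is an interpretation of the quoted note, not something the specification spells out in this form.

```python
# Profiles for which the quoted text makes Alternative imperative in Germany.
GERMAN_ALTERNATIVE_PROFILES = {"BASIC", "EN 16931", "EXTENDED", "XRECHNUNG"}
# Profiles where the visual part typically carries more data than the XML.
REDUCED_PROFILES = {"MINIMUM", "BASIC WL"}


def embedded_xml_name(profile):
    """File name the XML must be embedded under, per the quoted text."""
    if profile.strip().upper() == "XRECHNUNG":
        return "xrechnung.xml"
    return "factur-x.xml"


def suggest_af_relationship(profile, built_from_xml=False, for_germany=False):
    """Suggest an AFRelationship value (an interpretation, not normative)."""
    name = profile.strip().upper()
    if name in REDUCED_PROFILES:
        # The PDF shows more invoicing data than the XML contains.
        return "Data"
    if for_germany and name in GERMAN_ALTERNATIVE_PROFILES:
        return "Alternative"
    if built_from_xml:
        # The visual representation was generated from the embedded XML.
        return "Source"
    return "Alternative"


print(embedded_xml_name("EN 16931"), suggest_af_relationship("EN 16931", for_germany=True))
```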
mikeday
We have added the ability to specify the file attachment MIME type in the latest build. It will also default based on the file extension, so filenames ending in ".xml" will be embedded with the "text/xml" MIME type automatically. Hope this helps!
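
To illustrate the behaviour described here, a rough Python sketch of an extension-based default with an explicit override. This is not Prince's code: the table only contains the ".xml" case mentioned in this thread, and falling back to application/octet-stream for everything else is an assumption based on the earlier behaviour reported above.

```python
import os

# Illustrative table only; the real list Prince uses is not shown in this thread.
DEFAULT_MIME_TYPES = {".xml": "text/xml"}
# Assumed fallback, matching what was observed before the change.
FALLBACK_MIME_TYPE = "application/octet-stream"


def attachment_mime_type(filename, explicit=None):
    """Pick a MIME type: an explicit value wins, else guess from the extension."""
    if explicit:
        return explicit
    extension = os.path.splitext(filename)[1].lower()
    return DEFAULT_MIME_TYPES.get(extension, FALLBACK_MIME_TYPE)


print(attachment_mime_type("factur-x.xml"))
```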
bohwaz
I just tried it and can see in the PDF file that the MIME type is correct, but the official validator still doesn't find the XML file… I'm contacting the organization responsible for the validator to understand what's going on.

Thank you for your help!
bohwaz
Hello Mike,

I think I know what's going on here.

In the PDF generated by Prince, I can see:

<</Names [(Attach000) 9 0 R]>>


And in the original PDF from the Factur-X organization I see this:


<<
/Type /Catalog
/Pages 1 0 R
/AF 9 0 R
/Metadata 8 0 R
/Names <<
/EmbeddedFiles <<
/Names [ (factur\055x\056xml) 5 0 R ]
>>


It seems that the file name recognized in the Prince PDF file is "Attach000" and not "factur-x.xml" as it should be. This is what https://pdf.hyzyla.dev shows when I open both PDF files (see attachments).
  1. scr_1ef53b595a1a.jpg (24.7 kB)
    From PrinceXML PDF file
  2. scr_c62b4eff97c8.jpg (38.0 kB)
    From official factur-x example
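
Note that the name in the reference file is written with octal escapes: in a PDF literal string, a backslash followed by one to three octal digits stands for a single character, so \055 is "-" and \056 is ".". A small Python sketch of that decoding follows. The function name is hypothetical and this is not a full PDF string parser (line continuations and nested parentheses are ignored).

```python
import re

# Single-character escapes allowed in PDF literal strings.
_SIMPLE_ESCAPES = {
    "n": "\n", "r": "\r", "t": "\t", "b": "\b", "f": "\f",
    "(": "(", ")": ")", "\\": "\\",
}
# A backslash followed by 1-3 octal digits or one of the simple escape letters.
_ESCAPE_RE = re.compile(r"\\([0-7]{1,3}|[nrtbf()\\])")


def decode_pdf_literal(body):
    """Decode backslash escapes in the text between the parentheses."""
    def replace(match):
        token = match.group(1)
        if token[0] in "01234567":
            # Octal character code; high-order overflow is dropped.
            return chr(int(token, 8) & 0xFF)
        return _SIMPLE_ESCAPES[token]

    return _ESCAPE_RE.sub(replace, body)


print(decode_pdf_literal(r"factur\055x\056xml"))  # factur-x.xml
```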
bohwaz
I have been using this Python library: https://github.com/akretion/factur-x

This is what led me to the wrong file name:

./facturx-pdfextractxml -l debug /tmp/w.pdf /tmp/a.xml   
2024-09-23 16:33:05,090 [DEBUG] get_xml_from_pdf with factur-x lib 3.1
2024-09-23 16:33:05,090 [DEBUG] Searching for filenames ['factur-x.xml', 'zugferd-invoice.xml', 'ZUGFeRD-invoice.xml', 'order-x.xml']
2024-09-23 16:33:05,092 [DEBUG] pdf_root={'/Type': '/Catalog', '/Pages': IndirectObject(2, 0, 140191879847312), '/AF': [IndirectObject(9, 0, 140191879847312)], '/Names': {'/EmbeddedFiles': IndirectObject(12, 0, 140191879847312)}, '/ViewerPreferences': {'/DisplayDocTitle': True}, '/Outlines': IndirectObject(13, 0, 140191879847312), '/StructTreeRoot': IndirectObject(14, 0, 140191879847312), '/MarkInfo': {'/Marked': True}, '/Metadata': IndirectObject(15, 0, 140191879847312), '/OutputIntents': [IndirectObject(16, 0, 140191879847312)]}
2024-09-23 16:33:05,092 [DEBUG] embeddedfiles_node={'/Names': ['Attach000', IndirectObject(9, 0, 140191879847312)]}
2024-09-23 16:33:05,092 [DEBUG] embeddedfiles=['Attach000', IndirectObject(9, 0, 140191879847312)]
2024-09-23 16:33:05,092 [DEBUG] embeddedfiles_by_two=[('Attach000', IndirectObject(9, 0, 140191879847312))]
2024-09-23 16:33:05,092 [DEBUG] found filename=Attach000
2024-09-23 16:33:05,093 [INFO] Returning an XML file False
2024-09-23 16:33:05,093 [DEBUG] Content of the XML file: False
2024-09-23 16:33:05,093 [WARNING] File /tmp/a.xml has not been created
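
Judging from the debug output, the lookup appears to work roughly like the Python sketch below: the flat /Names array is read as alternating key and value entries, and the key is compared against the expected file names. This is a reconstruction from the log, not the library's actual code. Plain dicts and strings stand in for the PDF objects, and only a flat /Names array is handled (a real name tree can also use /Kids).

```python
# File names the log says the extractor searches for.
EXPECTED_NAMES = [
    "factur-x.xml",
    "zugferd-invoice.xml",
    "ZUGFeRD-invoice.xml",
    "order-x.xml",
]


def find_invoice_entry(embeddedfiles_node, expected=EXPECTED_NAMES):
    """Return (name, filespec) for the first expected name, or None."""
    names = embeddedfiles_node.get("/Names", [])
    # The array alternates key, value, key, value, ...
    pairs = list(zip(names[0::2], names[1::2]))
    for name, filespec in pairs:
        if name in expected:
            return name, filespec
    return None


# Strings such as "9 0 R" stand in for indirect object references.
prince_node = {"/Names": ["Attach000", "9 0 R"]}
reference_node = {"/Names": ["factur-x.xml", "5 0 R"]}

print(find_invoice_entry(prince_node))     # None
print(find_invoice_entry(reference_node))  # ('factur-x.xml', '5 0 R')
```

With the key "Attach000" nothing matches, which would explain why no XML file is returned even though the attachment itself is present.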
mikeday
We've made some changes in the latest build to use the attachment filename in the PDF name tree. This is a strange requirement, as the filename is already present in the attachment dictionary, but it seems to be what the tools expect. Does this help?