Using lastbox.pageNum with floating objects

jenniferflint
5 Jul 2023

Hi. We are running into a significant problem when a PDF ends with one or more floating objects. In those cases, our JavaScript is not returning an accurate value for lastbox.pageNum since it doesn't appear to know where the floating object ends up. Is there a different way to query this box location in JavaScript so that we can retrieve the proper value for this item?

I've attached a ZIP sample that contains a snippet of the HTML, CSS and PDF to show when this problem occurs. Also included in the ZIP is a copy of the JavaScript log file, which shows start-page:1 and end-page:1 because the script cannot find the correct lastbox.pageNum in this 6-page file. The JavaScript that currently creates this log is also included. Any help you can provide would be greatly appreciated.

float-problem.zip (224.2 kB)

pieter.lamers
5 Jul 2023

I can confirm the problematic behavior. When you select the boxes of a part of a document, e.g. with `document.getElementById('c14').getPrinceBoxes()`, the floating boxes in that part are not included in the selection. This affects not only the floats at the end, but also floating table pages halfway through the document. @mikeday is that a bug or a feature?

mikeday
12 Jul 2023

Right, we will investigate this issue with page floats in the box tree.

mikeday
15 Jul 2023

Consider this simple example of a page float:

<div id="container">
Container
<div id="page-float" style="float: bottom">
Float
</div>
</div>

The page-float element is a child of the container element in the DOM tree, so the elements are nested. However, the boxes that these elements generate are not nested: the container element generates boxes that flow in the main page body area, while the page-float element generates boxes that flow in the bottom page float region.

This is demonstrated by this script that prints the boxes from both:

<script>
Prince.trackBoxes = true;
Prince.addEventListener("complete", checkPages, false);

function checkPages() {
    console.log("container boxes");
    dumpBoxes(" - ", document.getElementById("container").getPrinceBoxes());
    console.log("page-float boxes");
    dumpBoxes(" - ", document.getElementById("page-float").getPrinceBoxes());
}

function dumpBoxes(indent, bs) {
    for (let i = 0; i < bs.length; ++i) {
        let b = bs[i];
        switch (b.type) {
            case "TEXT":
                console.log(indent+b.type+" "+JSON.stringify(b.text)+" on page "+b.pageNum);
                break;

            default:
                console.log(indent+b.type+" on page "+b.pageNum);
                break;
        }
        dumpBoxes("  "+indent, b.children);
    }
}
</script>

Producing the following output:

container boxes
 - BOX on page 1
   - BOX on page 1
     - LINE on page 1
       - TEXT "Container" on page 1
page-float boxes
 - BOX on page 1
   - LINE on page 1
     - TEXT "Float" on page 1

You can see that recursively inspecting the boxes for the container element does not find the boxes for the page-float element, as they have been taken out of the normal flow and placed elsewhere.

In order to find the boxes generated by page floats, you will need to either find the DOM element for the page float first and check its boxes, or start from the page BODY box and work downwards through its children to find the FLOATS box.
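
As a rough sketch of the first approach (not tested against the attached sample), you could look up the container and each page float element separately and take the highest pageNum over all of their boxes. The helper names `maxPageNum` and `lastPageOf` and the list of ids are assumptions for illustration; substitute whatever identifies the page floats in your own document.

```javascript
// Highest pageNum found in a list of boxes, including their descendants.
// Returns 0 if the list is empty.
function maxPageNum(boxes) {
    let max = 0;
    for (let i = 0; i < boxes.length; ++i) {
        let b = boxes[i];
        if (b.pageNum > max) max = b.pageNum;
        let m = maxPageNum(b.children || []);
        if (m > max) max = m;
    }
    return max;
}

// Last page occupied by any of the given elements. Page float elements
// must be passed explicitly, because their boxes are not nested inside
// the boxes of their DOM parent.
function lastPageOf(elements) {
    let max = 0;
    for (let i = 0; i < elements.length; ++i) {
        let m = maxPageNum(elements[i].getPrinceBoxes());
        if (m > max) max = m;
    }
    return max;
}

// Only register the listener when running inside Prince.
if (typeof Prince !== "undefined") {
    Prince.trackBoxes = true;
    Prince.addEventListener("complete", function () {
        // Assumption: these ids name the container and its page floats.
        let ids = ["container", "page-float"];
        let elements = [];
        for (let i = 0; i < ids.length; ++i) {
            let e = document.getElementById(ids[i]);
            if (e) elements.push(e);
        }
        console.log("end-page:" + lastPageOf(elements));
    }, false);
}
```

The same `maxPageNum` helper should also work for the second approach, if you pass it the FLOATS boxes found under each page's BODY box instead of the boxes of a DOM element.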
