
Merge repeated cross-links?

rlpowell
So I have an index that gets generated; by the time it reaches Prince it looks like this:

              <dt>girls' school</dt>
              <dd>
                <dl>
                  <dt>little: <a class="indexterm" href="#idm217009053168">Three-part tanru grouping with
    bo</a>, <a class="indexterm" href="#idm217009048368">Three-part tanru grouping with
    bo</a>, <a class="indexterm" href="#idm217009044128">Three-part tanru grouping with
    bo</a></dt>
                </dl>
              </dd>


The problem is that all of those are on the same page, so we get:

girls' school:
little: 18, 18, 18


which is pretty unfortunate. Any ideas for how I could fix that? Hell, can I regex it out of the final PDF? :D
mikeday
You can attack this with JavaScript, but it's not simple, as described in this mega-thread.
rlpowell
Yeah, I saw that, but that thread is about generating the index in Prince, which is a different thing, so I wasn't sure how applicable it was.
rlpowell
Uh. OK, according to you in that thread:

"Unfortunately you can't do this yet. Generated content does not show up in the DOM, and once document conversion is finished, JavaScript cannot trigger reconversion. We're still considering different ways of doing this."

I take this to mean that what I see in the PDF (i.e. "little: 18, 18, 18") is not, at any point, accessible to the JavaScript.

So how does that thread help me? I have no interest in using JavaScript to *generate* an index; I only want to weed duplicates out of the *already generated* index, after the links have been turned into page numbers by Prince.
rlpowell
"However, there are still difficulties coalescing multiple references to identical page numbers, which we are still working on." -- it's now three years later; how did that work go? :)
mikeday
You can do it with script functions, but it requires a reasonable level of ingenuity:
<span style="content: prince-script(index_entry, target-counter(url(#term1), page), target-counter(url(#term2), page), ...)"></span>
rlpowell
Why would I want to use target counters when my goal is to modify the already-generated index, not generate a new one?
rlpowell
So I've tried deleting index elements that repeat the same page number, first in a handler registered with Prince.addEventListener("complete", shaveix, false) and then in one registered with window.addEventListener("load", shaveix, false).

In the former case, the .removeChild appears to be completely ignored.

In the latter case, elements don't have associated page numbers, so there's no basis on which to do the removal.

So what I'm hearing is that this literally *cannot be done*, that if I want an index without repeats, I need to throw away everything generated by the other tools in my stack, and generate an index using JavaScript. Is that correct?

Do you understand why, having already generated a perfectly working index, the idea of rewriting it is not very appealing? -_-
rlpowell
WRT your span example: again, this is using stuff auto-generated by previous parts of the process; please read the example I gave. My URLs are not "cows", they are "#idm217009053168". I don't see any way to do what I want from anything you said or anything in the giant thread, and I don't think it's an unreasonable thing that I want. Please help.
mikeday
The markup in your example looked like this:
              <dt>girls' school</dt>
              <dd>
                <dl>
                  <dt>little: <a class="indexterm" href="#idm217009053168">Three-part tanru grouping with
    bo</a>, <a class="indexterm" href="#idm217009048368">Three-part tanru grouping with
    bo</a>, <a class="indexterm" href="#idm217009044128">Three-part tanru grouping with
    bo</a></dt>
                </dl>
              </dd>

You could use JavaScript to change it to this:
              <dt>girls' school</dt>
              <dd>
                <dl>
                  <dt>little: <span style="content: prince-script(index_term, target-counter(url(#idm217009053168), page), target-counter(url(#idm217009048368), page), target-counter(url(#idm217009044128), page))"></span></dt>
                </dl>
              </dd>

This would then call the index_term function and pass three arguments (18, 18, 18) or whatever the page numbers are, so that the function can combine them if necessary and return the final text that will get printed for that index term.
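
The rewrite from links to a single prince-script span could be sketched roughly as follows. This is only an illustration, not code from the thread: the helper names buildIndexContent and rewriteIndexEntry are made up, and it assumes the indexterm links are direct children of the dt, that everything after the first link is more links and ", " separators, and that the script runs from a load handler, before Prince lays out the document.

```javascript
// Hypothetical helper: builds the CSS `content` value for one index entry.
// `funcName` is the name the script function is registered under.
function buildIndexContent(funcName, hrefs) {
  var args = [];
  for (var i = 0; i < hrefs.length; i++) {
    // Each link target becomes one target-counter() argument.
    args.push("target-counter(url(" + hrefs[i] + "), page)");
  }
  return "prince-script(" + funcName + ", " + args.join(", ") + ")";
}

// Hypothetical helper: replaces the indexterm links in one <dt> with a <span>.
// Assumes the links are direct children of the <dt> and trail its label text.
function rewriteIndexEntry(doc, dt) {
  var hrefs = [];
  var first = null;
  for (var node = dt.firstChild; node; node = node.nextSibling) {
    if (node.nodeType === 1 && node.getAttribute("class") === "indexterm") {
      if (!first) first = node;
      hrefs.push(node.getAttribute("href"));
    }
  }
  if (!first) return; // no index links in this entry, leave it alone

  // Drop the first link and everything after it (links and ", " separators).
  while (first.nextSibling) dt.removeChild(first.nextSibling);
  dt.removeChild(first);

  var span = doc.createElement("span");
  span.setAttribute("style",
      "content: " + buildIndexContent("index_term", hrefs));
  dt.appendChild(span);
}

// Only wire this up when a DOM is actually present.
if (typeof window !== "undefined" && window.addEventListener) {
  window.addEventListener("load", function () {
    var dts = document.getElementsByTagName("dt");
    for (var i = 0; i < dts.length; i++) {
      rewriteIndexEntry(document, dts[i]);
    }
  }, false);
}
```

Keeping the string building in its own function means the generated style value can be checked outside Prince before running it over a whole index.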
rlpowell
Ah, I see. That'll be interesting for the index elements that have a hundred or more items, but seems doable. I'll let you know if I get it working.
mikeday
The trick only works because JavaScript functions have an arguments array that allows them to take an arbitrary number of arguments. :)
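
A minimal sketch of what such an index_term function might look like, assuming it only needs to drop repeated page numbers and join the rest with commas. The body is a guess at a reasonable implementation, not something posted in the thread, and the values are converted with String() because it is an assumption here whether Prince passes them as numbers or strings.

```javascript
// Takes any number of page numbers and returns them with repeats removed,
// e.g. the three arguments (18, 18, 18) collapse to a single "18".
function index_term() {
  var seen = {};   // page numbers already emitted
  var pages = [];  // unique page numbers, in first-seen order
  for (var i = 0; i < arguments.length; i++) {
    var page = String(arguments[i]);
    if (!seen.hasOwnProperty(page)) {
      seen[page] = true;
      pages.push(page);
    }
  }
  return pages.join(", ");
}

// Register the function so prince-script(index_term, ...) can call it.
// Guarded so the file can also be loaded outside Prince.
if (typeof Prince !== "undefined" && Prince.addScriptFunc) {
  Prince.addScriptFunc("index_term", index_term);
}
```

Because the loop reads from arguments rather than named parameters, the same function works for an entry with three links or with a hundred.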
rlpowell
Thanks for the help! I still think it shouldn't have been *that* hard, but all's well that ends well.
mikeday
Nicely done! :D
fbrzvnrnd
thank you +1