Create an index... maybe I misunderstand things?

michaeldupuis
Greetings,

I'm trying to create an index, but what I'm doing doesn't seem to be correct, so I think maybe I'm misinterpreting how to do things. Let's say I have several references to the string "London" in my document. In my index, I'd normally want to see the page numbers for all the "London" references.

I do this by wrapping each instance of the word "London" with the dfn tag like so:
<dfn class="london-tag">London</dfn>

I then have an index (just simple testing) that looks like this (I'm using the Designing sample as my reference):
<div class="index" id="index-h-1">
<ul class="index"><li><a href="#london-tag">London</a></li></ul>
</div>

What I find, though, is that I don't get the page numbers. I seem to get a count, or sometimes a single page number (or what looks like one), but often I get 0 or 1 (that is in my real document; I've removed things here for clarity). I also get an error that "london-tag" is defined more than once, so clearly I'm doing something wrong.

So, what I'm wondering is, am I actually building things correctly? Should I actually see an entry for London with 3 page numbers after it, if things get parsed correctly?
mikeday
Internal links reference id attributes, not class attributes. However, ids must be unique within the document. This means that you would need to use an id like "index-london1" on the first element, "index-london2" on the second, and so on, and then link to them all from the index.

This kind of index structure is time consuming and error-prone to maintain by hand. An easier solution is to use a class attribute the way you are currently doing, and then use an XSLT transform or Perl script to add the unique ids and generate appropriate links.
michaeldupuis
Thanks Mike,

mikeday wrote:
Internal links reference id attributes, not class attributes. However, ids must be unique within the document. This means that you would need to use an id like "index-london1" on the first element, "index-london2" on the second, and so on, and then link to them all from the index.

Oh, yeah, not class. That was a typo on my part; I meant id, not class. Sorry for the confusion.

mikeday wrote:
This kind of index structure is time consuming and error-prone to maintain by hand. An easier solution is to use a class attribute the way you are currently doing, and then use an XSLT transform or Perl script to add the unique ids and generate appropriate links.

I am actually generating everything dynamically, so I just need to make sure that I do things correctly. Is it even possible, though, to have an index listing that looks like this if "London" is on pages 3, 10, and 25?

London 3, 10, 25

considering that we're really dealing with links to ids (at least I think so)? From what I understand now, you're limited to something like

London 3
London 10
London 25
mikeday
You can do that; you would just need to generate your document dynamically, like this:
text ... <span id="tag-london1">London</span> ...
text ... <span id="tag-london2">London</span> ...

Index:
London <a href="#tag-london1"></a> <a href="#tag-london2"></a>

And then add a CSS rule that uses target-counter to style the links in the index as page number cross-references.
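
A rule along these lines might do it. This is only a sketch: it assumes the index is wrapped in an element with class "index", as in the first post, and that each entry's links are the last children of their parent. The comma rule is an illustrative addition, not something confirmed in this thread.

```css
/* Assumed selector: links inside an element with class "index". */
.index a::after {
    /* Print the number of the page holding the element the href points to. */
    content: target-counter(attr(href), page);
}

/* Illustrative addition: append a comma to every link except the last
   one in its parent, so an entry can read "London 3, 10, 25". */
.index a:not(:last-child)::after {
    content: target-counter(attr(href), page) ", ";
}
```

The second rule is more specific than the first, so it overrides it for every link that is not the last child of its parent.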
michaeldupuis
Hi Mike,

Thanks, worked perfectly. For some reason I wasn't thinking in terms of links like I should have been (duh?). Your suggestion cleared things up for me.
jbarrus
I did this same thing but I have a question about the formatting. My list of page numbers has a space after each number. Can the space after each page number be eliminated?

The output in the PDF looks like this:
computer, 12 , 13 , 13 , 19 , 22 , 22 , 23

css:
#index a::after {content: ", " target-counter(attr(href), page);}


markup:
computer<a href="#indexterm-1"/><a href="#indexterm-2"/>


For anyone trying to implement this who might otherwise spend hours getting the XSL to create an index, here is what I did (most likely not the best way).

In the markup, each term I want indexed is surrounded with <indexterm></indexterm>

In the xsl I have

at the top:
<xsl:key name="indexlist" match="indexterm" use="text()" />


in the main template section at the bottom
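<div id="index">
	<ol>
	<xsl:for-each select="descendant::indexterm">
		<!-- Sort by the term's own text; select="indexterm" would look for child elements and sort nothing. -->
		<xsl:sort select="."/>
		<!-- Muenchian grouping: keep only the first indexterm for each distinct text. -->
		<xsl:if test="count(. | (key('indexlist', text())[1])) = 1">
			<li>
				<!-- $ucletters and $lcletters are assumed to be defined elsewhere in the stylesheet. -->
				<xsl:value-of select="translate(text(), $ucletters, $lcletters)"/>
				<!-- Emit one empty link per occurrence of this term. -->
				<xsl:for-each select="key('indexlist', text())">
					<xsl:element name="a"><xsl:attribute name="href">#indexterm-<xsl:number level="any"/></xsl:attribute></xsl:element>
				</xsl:for-each>
			</li>
		</xsl:if>
	</xsl:for-each>
	</ol>
</div>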


and this as a template:
<xsl:template match="indexterm">
	<xsl:element name="span">
		<xsl:attribute name="id">indexterm-<xsl:number level="any"/></xsl:attribute>
		<xsl:value-of select="text()"/>
	</xsl:element>
</xsl:template>


The output ends up something like this:
<div id="index"><ol><li>computer<a href="#indexterm-1"/><a href="#indexterm-2"/><a href="#indexterm-3"/><a href="#indexterm-5"/><a href="#indexterm-6"/><a href="#indexterm-7"/><a href="#indexterm-8"/></li><li>renaissance<a href="#indexterm-4"/></li></ol></div>
jbarrus
Additionally, many of the terms occur more than once on a page. Is there any way to make the css exclude duplicate pages?
mikeday
For the spaces, check that the XSLT processor is not adding spaces after each <a> element. When I make a test document here following your example I don't seem to get any spaces before the commas.

Unfortunately at the moment I don't think there is any way to avoid duplicate page references. We will need to introduce a new generated content mechanism to work around this problem for indexing.
somnath
Hi Mike,

Warm wishes to all for the New Year :-)

Repeating the same question: is there a way yet to suppress duplicate page numbers in the output PDF?

Also, I have a few indexterms with links in a chapter numbered with roman numerals (i, ii, iii, ...). The output index list in the PDF shows decimal numbers instead of roman. Can you comment on this?


Thanks,
Somnath
mikeday
To answer your second question first, target-counter() can take a format parameter of "lower-roman" just like counter() can, to control the output format. This is necessary as the same counter can be used with different formats.
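
For example, a rule like the following would print index references in lower-case roman numerals. It is a sketch that assumes the #index wrapper from the XSLT example earlier in the thread. As written it applies to every link in the index, not only those pointing into the roman-numbered chapter.

```css
/* Assumed selector: links inside the element with id "index". */
#index a::after {
    /* The third argument sets the counter style used for the
       referenced page number. */
    content: target-counter(attr(href), page, lower-roman);
}
```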

We don't yet have a mechanism for providing a unique list of page references for indices. Clearly this is important, and we're going to be working on related issues this year, so hopefully we'll be able to solve this problem.
memrich
We are currently evaluating Prince as a possible processor for our learning books. We tried to generate an index and stumbled upon the problem described here:

mikeday wrote:
We don't yet have a mechanism for providing a unique list of page references for indices. Clearly this is important, and we're going to be working on related issues this year, so hopefully we'll be able to solve this problem.

This is very important for us. Since this post is 5 years old, is there a solution available yet? Or is there a workaround?

thanks,
Marco Emrich
webmasters akademie Nürnberg GmbH
http://www.webmasters.de
mikeday
5 months, not 5 years! :)

We're making progress, and I've moved this issue up the roadmap.
memrich
Uuuups, I looked at the Joined-Date - sorry.

And thanks for the good news.
bookdev
An automatic index generator along the lines of LaTeX's makeindex would be super helpful. It's great to see that's on the road map.
mikeday
The next major release of Prince will support JavaScript for doing document transformations like index generation and building tables of contents.
bookdev
Great. Thanks for the update, Mike.
Frank
Various wrote:
Also, I have a few indexterms with links in a chapter numbered with roman numerals (i, ii, iii, ...). The output index list in the PDF shows decimal numbers instead of roman. Can you comment on this?

target-counter() can take a format parameter of "lower-roman" just like counter() can, to control the output format. This is necessary as the same counter can be used with different formats.


Prince is great. It is doing everything I need it to do. I use it to convert a 120-page blog to a PDF document.

As a bonus, I am wondering whether you can elaborate on how to enhance the style sheets so that the index uses roman numerals when pointing to pages numbered with roman numerals. That is, how can most references be formatted as integers in the index, while references to items in the preface, table of contents, and other front-matter pages are formatted with roman numerals? That is the question.

Thank You and please keep up the great work!

Frank
mikeday
At the moment there would need to be a class on the link indicating which section it points to, or you could use JavaScript to follow the link, then work up through the parent elements and see where it is. A bit fiddly, I must admit.
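
The class-based approach could look roughly like this. It is a sketch only: the class name "frontmatter" and the "index" wrapper class are assumptions, and whatever generates the index would have to add the class to each link that points into the front matter.

```css
/* Default: decimal page numbers for all index links. */
.index a::after {
    content: target-counter(attr(href), page);
}

/* Links marked with the hypothetical class "frontmatter" point into
   pages numbered i, ii, iii, so format their references to match. */
.index a.frontmatter::after {
    content: target-counter(attr(href), page, lower-roman);
}
```

An index entry would then mix both kinds of link, for example an `<a class="frontmatter" href="...">` for a reference in the preface next to plain `<a href="...">` links for the main text.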
Frank
mikeday wrote:
At the moment there would need to be a class on the link indicating which section it points to, or you could use JavaScript to follow the link, then work up through the parent elements and see where it is. A bit fiddly, I must admit.


Thanks for the hint Mike. I've been working on it, but haven't gotten anywhere so far. So, until I can find the right way, I just added "(ii)" and "(iv)" in the index after the decimal page numbers for the Table of Contents and the Preface. Not even close to perfect. But a little better than showing just decimal 2 & 4 for frontmatter page numbers.

Keep up the great work!

Frank