Index generation (using PrinceXML 8)

mikeday
It works for me on Linux, but it generates a 2044 page PDF file.
sfinktah
I tested it on Linux as well (vs. OSX), and it stops at the same point.

Ahhhh....


I just returned to that shell session, and it had finished. It must just have been taking a nap. It takes just under 5 minutes, and runs in under two without the JavaScript. My apologies.

Yes, 2048 pages. Perfect candidate for being split :)

The last line it outputs, prince: @@CHAPTERS: chapters=( "3..64" .... ), is read by a shell script that uses pdfjam (which in turn uses pdflatex) to split the PDF into three beautiful 700-page volumes.

#!/usr/bin/env bash
# vim: set ts=3 sts=48 sw=3 cc=76 et fdm=marker: # **** IGNORE ******
get_range() { RANGE= # <-- OUTPUT                  **** THIS   ******
   local rstart rend i arr=( "$@" )  # ported from **** JUNK   ******                        
   for (( i=0 ; i < $# ; i++ )); do  # http://stackoverflow.com   
      (( rstart = arr[i] ))          # /a/2270987/912236          
      rend=$rstart; while (( arr[i+1] - arr[i] == 1 )); do
      (( rend = arr[++i] )); done; (( rstart == rend )) && 
      RANGE+=" $rstart" || RANGE+=" $rstart-$rend"; done; } # }}}

INPUT=${1:-jcv_vsm.html}  # Take input from command line, default to the sample
BASE=${INPUT%.html}
OUTPUT="$BASE.pdf"

# Take the cover/copyright notice from the 
# final output document, and insert it before
# each volume.
#
# You can specify other pages from other sources
# here too.
#    eg:  INSERTS=( "cover-a4.pdf" "1" )

INSERTS=( "$OUTPUT" "1-2" )

# Join the chapters up to a maximum of MAX_SECTION_PAGE_COUNT

MAX_SECTION_PAGE_COUNT=$(( 732 ))

# It takes a while to generate 2,000 pages so 
# for testing and demonstration purposes, we'll
# use a document that has already completed.

SAMPLE_PRINCE_OUTPUT='prince: @@CHAPTERS: chapters=(  "3..64" "65..66" \
"67..80" "81..226" "227..230" "231..232" "233..234" "235..260" \
"261..298" "299..334" "335..336" "337..378" "379..414" "415..446" \
"447..512" "513..626" "627..702" "703..704" "705..838" "839..852" \
"853..880" "881..900" "901..914" "915..918" "919..936" "937..998" \
"999..1000" "1001..1086" "1087..1092" "1093..1100" "1101..1112" \
"1113..1130" "1131..1134" "1135..1136" "1137..1204" "1205..1248" \
"1249..1290" "1291..1402" "1403..1444" "1445..1602" "1603..1826" \
"1827..1978" "1979..1998" "1999..2010" "2011..2026" "2027..2044" )'

# Else, we could run prince and grab the output
# SAMPLE_PRINCE_OUTPUT=$(
# prince                                        \
#   --script jcv-chapters.js                    \
#  -s jcv-chapters.css                          \
#  jcv_vsm.html                                 \
#  -vo jcv_vsm.pdf                              \
#  2>&1                                         \
#  | tee /dev/stderr                            \
#  | grep 'prince: @@CHAPTERS'                  \
# )

# Extract the chapter information from the output
# string.

chapter_vardef=${SAMPLE_PRINCE_OUTPUT##*@@CHAPTERS: chapters=}
declare -a PRINCE_CHAPTER_LIST="$chapter_vardef"

# Now we're going to expand out each chapter 
# into individual pages, then copy those pages
# to a numbered PDF.  Yes, we could have
# just used a range, but this way is cooler.
# e.g. we can easily count how many pages there
# are in a group, and do things with odd/even
# CHAPTERS, or factors of 4 or 8 (for pageup's)

# Iterate through all the CHAPTERS, and join 
# them into volumes.

VOLUME_NUMBER=1
SECTION_PAGE_COUNT=0
SECTION_PAGE_LIST="${INSERTS[@]} "

for key in "${!PRINCE_CHAPTER_LIST[@]}"
do
   chapter=${PRINCE_CHAPTER_LIST[$key]}
   
   # Expand the chapter range into page numbers
   declare -a 'CHAPTER_PAGE_LIST=({'"$chapter"'})'

   # Work out total page count
   CHAPTER_PAGE_COUNT=${#CHAPTER_PAGE_LIST[@]}
      
   # Compress list of pages into a range (don't laugh)
   get_range "${CHAPTER_PAGE_LIST[@]}" # put into $RANGE
   if (( CHAPTER_PAGE_COUNT + SECTION_PAGE_COUNT 
      >= MAX_SECTION_PAGE_COUNT ))
   then  # have we exceeded the maximum page length? if so, output
         # everything until now. zero pad volume number.
      printf -v part %02d $VOLUME_NUMBER

      echo "Making part $part with $SECTION_PAGE_COUNT pages"
      pdfjam $SECTION_PAGE_LIST --outfile "$BASE-part$VOLUME_NUMBER.pdf" 

      (( VOLUME_NUMBER ++ ))
      (( SECTION_PAGE_COUNT = CHAPTER_PAGE_COUNT ))
      SECTION_PAGE_LIST="${INSERTS[@]} $OUTPUT $RANGE "
   else
      SECTION_PAGE_LIST+="$OUTPUT $RANGE "
      (( SECTION_PAGE_COUNT += CHAPTER_PAGE_COUNT ))
   fi
done

# We have to deal with the pages left over yet!

printf -v part %02d $VOLUME_NUMBER 

echo "Making part $part with $SECTION_PAGE_COUNT pages"
pdfjam $SECTION_PAGE_LIST --outfile "$BASE-part$VOLUME_NUMBER.pdf" 

echo Done.
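
The grouping logic in the script above can be tried without pdfjam or a 2,000-page PDF. What follows is only a sketch under a few assumptions: plan_volumes is a hypothetical helper, not part of the script. It derives each chapter's page count arithmetically from the "start..end" string instead of expanding it into a page list, it ignores the inserted cover pages, and it skips the flush when nothing has been collected yet (the original does not have that guard).

```bash
#!/usr/bin/env bash
# Dry run: group "start..end" chapter ranges into volumes and print the plan.
# plan_volumes is a hypothetical name used for illustration only.
plan_volumes() {
   local max=$1; shift
   local volume=1 count=0 first="" last="" chapter start end pages
   for chapter in "$@"; do
      start=${chapter%%..*}          # text before ".."
      end=${chapter##*..}            # text after ".."
      pages=$(( end - start + 1 ))
      # Same comparison as the script: flush before adding a chapter that
      # would reach the limit (extra guard: only if something is collected).
      if (( count > 0 && count + pages >= max )); then
         echo "part $volume: pages $first-$last ($count pages)"
         volume=$(( volume + 1 ))
         count=0
         first=""
      fi
      if [[ -z $first ]]; then first=$start; fi
      last=$end
      count=$(( count + pages ))
   done
   # Whatever is left over becomes the final volume.
   if (( count > 0 )); then
      echo "part $volume: pages $first-$last ($count pages)"
   fi
}

plan_volumes 10 "1..4" "5..8" "9..12"
# part 1: pages 1-8 (8 pages)
# part 2: pages 9-12 (4 pages)
```

Calling it as plan_volumes 732 "${PRINCE_CHAPTER_LIST[@]}" after the declare line in the script would show the split before any PDF is written.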

Edited by sfinktah

howcome
I couldn't get the one-pass solution above to work, but here's a minimal two-pass solution where the document is adorned with a TOC and an index. On a Linux command line, this should work:

wget http://www.princexml.com/howcome/2015/index/musick.html -O foo.html; prince --javascript foo.html >>foo.html; prince --javascript foo.html; evince -p 6 foo.pdf
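
For readers who want to see the steps separately, the one-liner can be unrolled as below. This is a sketch, not a tested recipe: two_pass_index is a hypothetical wrapper name, the comments describe only what the shell redirections do, and the assumption that the document's own script prints the data needed for the second pass is my reading of the command, not something stated in the post. The evince viewer step is left out.

```bash
#!/usr/bin/env bash
# two_pass_index is a hypothetical wrapper around the one-liner above.
# Usage: two_pass_index URL [local-html-file]
two_pass_index() {
   local url=$1 html=${2:-foo.html}

   # Always start from a fresh download, so that running the function
   # twice does not append the first-pass output a second time.
   wget "$url" -O "$html" || return

   # Pass 1: Prince lays out the document and runs its JavaScript.
   # Anything printed on stdout is appended to the HTML file itself.
   # (Assumption: the document's script prints what pass 2 needs.)
   prince --javascript "$html" >> "$html" || return

   # Pass 2: lay out the document again, now including the appended data.
   # Prince writes the PDF next to the input, e.g. foo.pdf.
   prince --javascript "$html"
}
```

It would be invoked as two_pass_index http://www.princexml.com/howcome/2015/index/musick.html, stopping early if the download or the first pass fails.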


I'd be interested in learning how it can be converted to a one-pass solution.
RichardForrester
I'm looking into making an automatic Table of Authorities for legal briefs. Programmatically, it should work much like an index, so I'm here catching up on the available solutions.

This thread is pretty old. Is this still the preferred approach?

I would think target-counters() (http://www.w3.org/TR/css-gcpm-3/#funcdef-target-counters) would be used.

Edited by RichardForrester

mikeday
JavaScript can create an index, although two passes will be necessary if you wish to coalesce references to identical page numbers.
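
To make "coalesce" concrete: an entry that is referenced on pages 12, 12, 13, 14 and 20 should read 12-14,20 in the finished index. The sketch below shows only that folding step, in bash and outside Prince. coalesce_pages is a hypothetical helper, and it is not how Prince or any of the scripts in this thread do it.

```bash
#!/usr/bin/env bash
# coalesce_pages (hypothetical helper): takes page numbers in any order,
# drops duplicates, and folds consecutive runs into "start-end" ranges.
coalesce_pages() {
   local pages out=() i start end
   # sort -n orders numerically, -u removes repeated page numbers.
   # Word splitting is safe here because the values are plain integers.
   pages=( $(printf '%s\n' "$@" | sort -nu) )
   for (( i = 0; i < ${#pages[@]}; i++ )); do
      start=${pages[i]}
      end=$start
      # Extend the run while the next page is exactly one higher.
      while (( i + 1 < ${#pages[@]} && pages[i+1] == end + 1 )); do
         end=${pages[++i]}
      done
      if (( start == end )); then
         out+=( "$start" )
      else
         out+=( "$start-$end" )
      fi
   done
   local IFS=,
   echo "${out[*]}"
}

coalesce_pages 12 14 12 13 20   # prints 12-14,20
```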

The target-counters() generated content function refers to multiple values of a scoped hierarchical counter, e.g. a section number like 2.1.3, where each level is still called "section".
abaker87
RichardForrester wrote:
I'm looking into making an automatic Table of Authorities for legal briefs. Programmatically, it should work much like an index, so I'm here catching up on the available solutions.

This thread is pretty old. Is this still the preferred approach?


Richard, we use PrinceXML for PDF rendering and TOA generation, so I can confirm that the two-pass JavaScript approach will work.
sfinktah
All the files hosted on my server (nt4.com) were missing for a number of years. I have just managed to restore them (mostly)... nm2.js is still lost.

Thus, this example should now work again to create a one-pass (ish) index of anything:

prince --script http://nt4.com/js/jquery --script http://nt4.com/js/jquery.highlight --script http://nt4.com/js/jquery.tinysort --script http://nt4.com/js/underscore --script http://nt4.com/js/underscore-string --script http://nt4.com/nm.js --script http://nt4.com/msn-index.js http://www.princexml.com/doc/9.0/javascript/ -s http://nt4.com/nm.css -vo t.pdf

howcome
@sfinktah, thanks for following up and providing running code -- it works for me!

Meanwhile, I've written a simple guide on how to generate indexes:

https://css4.pub/2022/indexes/