Re: [greenstone-users] Can I count no of pages in my collection of PDF files?

From Katherine Don
Date Mon, 12 Mar 2007 15:55:18 +1300
Subject Re: [greenstone-users] Can I count no of pages in my collection of PDF files?
In-Reply-To (c387a3b60703052005x6d2b9dd3s9470c1301ffc11d9-mail-gmail-com)
Hi Ata

Does this number of pages (roughly) equal the numsections number in your
collection's build.cfg? If so, just use the number from build.cfg, then
you don't need to worry about building the classifier.
Otherwise, I don't know of any other method for determining this.
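
If you would rather not look the number up by hand after every build, a
small script can pull it out for you. This is only a rough sketch: it
assumes build.cfg holds one whitespace-separated "name value" pair per
line, and the function names and sample values are made up for
illustration (334 is just the example figure used elsewhere in this thread).

```python
def read_build_stats(cfg_text):
    """Parse 'name value' lines (assumed build.cfg layout) into a dict."""
    stats = {}
    for line in cfg_text.splitlines():
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) == 2:
            stats[parts[0].lower()] = parts[1].strip()
    return stats


def count_sections(cfg_text):
    """Return numsections as an int, or None if it is missing or not numeric."""
    value = read_build_stats(cfg_text).get("numsections")
    return int(value) if value is not None and value.isdigit() else None


# Hypothetical build.cfg contents, for illustration only.
sample = "builddate 1173668118\nnumdocs 12\nnumsections 334\nnumwords 98765\n"
print(count_sections(sample))
```

In practice you would read the text from your collection's build.cfg
instead of the sample string.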

For the next release, I have added in _numsections_ and _numwords_
macros which get set to the numsections and numwords values from
build.cfg. If you want to do this yourself, edit
src/recpt/pageaction.cpp, look for the line
disp.setmacro ("numdocs", displayclass::defaultpackage, cinfo->numDocs);
and add a second one after it:
disp.setmacro ("numsections", displayclass::defaultpackage, cinfo->numSections);

Then you'll need to recompile.

Cheers,
Katherine

Ata ur Rehman wrote:

> Dear Katherine
>
> I am going to add new documents to my collection once every two or
> three days, so I have adopted a workaround. I made a new browsing
> classifier for ex.Title, and in the format features the VList statement
> for that classifier consists of only the following:
> {if}{[NumDocs], <br>[NumDocs]}
>
> This always gives me a long column of page counts (the number of pages
> for each title on one line). I copy the whole column, paste it into
> MS Excel and find the total with the =sum(...) function. After that I
> delete the classifier. Is there another solution?
>
> Ata
>
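
The spreadsheet step described above could also be scripted. A minimal
sketch, assuming the copied column is plain text with one count per line;
the function name and the sample numbers are hypothetical:

```python
import re


def total_pages(pasted_text):
    """Sum every line that consists only of a whole number; skip the rest."""
    total = 0
    for line in pasted_text.splitlines():
        match = re.fullmatch(r"\s*(\d+)\s*", line)
        if match:
            total += int(match.group(1))
    return total


# Made-up column of counts, including a blank line as copied text often has.
sample = "12\n7\n\n30\n"
print(total_pages(sample))  # 49
```

Lines that are blank or contain anything other than digits are ignored, so
stray text picked up while copying does not break the sum.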
> On 3/6/07, Katherine Don <kjdon@cs.waikato.ac.nz> wrote:
>
>>
>>
>> Hi Ata
>>
>> I don't think there is a macro for number of pages. If you look in your
>> collection's build.cfg file, you'll see numdocs and numsections entries.
>> If your collection has one section per page, then the number of sections
>> should be equal to the number of pages.
>> numdocs gets turned into the _numdocs_ macro, but numsections doesn't.
>>
>> Are you rebuilding the collection very often? What you could do is hard
>> code the number of sections into the collectionextra, e.g. No. of Pages: 334.
>> You would need to change this text every time you added more documents to
>> the collection.
>>
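
Changing the hard-coded number by hand after each rebuild is easy to
forget, so the edit could be scripted. A rough sketch, assuming the
collectionextra text contains a line of the form "No. of Pages: N"; the
function name, the pattern and the sample text are illustrative
assumptions, not part of Greenstone:

```python
import re

# Matches "No. of Pages:" or "No of Pages:" plus whatever token follows it.
PAGE_LINE = re.compile(r"(No\.? of Pages:)[ \t]*\S*", re.IGNORECASE)


def set_page_count(collect_cfg_text, count):
    """Return the text with the first 'No. of Pages:' value set to count."""
    new_text, replaced = PAGE_LINE.subn(
        lambda m: f"{m.group(1)} {count}", collect_cfg_text, count=1
    )
    if replaced == 0:
        raise ValueError("no 'No. of Pages:' line found")
    return new_text


# Hypothetical collect.cfg fragment, for illustration only.
sample = 'collectionmeta collectionextra [l=en] "\nNo. of Pages: 334\n"\n'
print(set_page_count(sample, 350))
```

The new count would come from numsections in build.cfg, and the result
would be written back to collect.cfg before the collection is served again.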
>> Regards,
>> Katherine
>>
>> Ata ur Rehman wrote:
>> Oh. Actually, if I put [NumPages] in the format features, I get the
>> number of pages in a single document, which is not the goal. The goal is
>> to get the total number of pages in all the documents, so I am trying to
>> put [NumPages] in collect.cfg as given below:
>>
>> collectionmeta collectionextra [l=en] "
>> No of Pages: [NumPages]
>> "
>>
>> I want to show the total number of pages in the entire collection on
>> the About page of this collection.
>>
>>
>> Ata
>>
>>
>> On 2/21/07, shaoqun@cs.waikato.ac.nz <shaoqun@cs.waikato.ac.nz > wrote:
>> > Hello Ata,
>> >
>> > Can you see it in the gli's Enrich panel (ex.NumPages)? Where did you put
>> > [NumPages]? It should go into a format statement (gli->format), not
>> > macro files. Please ignore _1_ and pages, as they have nothing to do
>> > with the number of pages.
>> >
>> > Regards
>> > Shaoqun
>> >
>> > > Thank you very much Shaoqun for your prompt response. I built the
>> > > index at section level and made PDFPlug use the sections option,
>> > > but the output for the following code:
>> > >
>> > > No of pages: pages <br>
>> > > No of pages: _1_ <br>
>> > > No of pages: [NumPages]
>> > >
>> > > is as under:
>> > >
>> > > No of pages: pages
>> > > No of pages:
>> > > No of pages: [NumPages]
>> > >
>> > >
>> > > :(
>> > >
>> > > Ata
>> > >
>> > > On 2/20/07, sw64@cs.waikato.ac.nz <sw64@cs.waikato.ac.nz> wrote:
>> > >>
>> > >> hello Ata,
>> > >>
>> > >> Sorry, I got it wrong again. The number of pages is stored as
>> > >> extracted metadata [NumPages]; you can see it in gli's Enrich panel
>> > >> and refer to it in a format statement.
>> > >>
>> > >> Regards
>> > >> Shaoqun
>> > >>
>> > >>
>> > >> > Hello Ata,
>> > >> > It should be _1_ rather than pages.
>> > >> >
>> > >> > Regards
>> > >> > Shaoqun
>> > >> >
>> > >> >>> Hello Ata,
>> > >> >>>
>> > >> >>> You can get it from the pages macro in document.dm, and make
>> > >> >>> sure you build the index on the section level and use the
>> > >> >>> use-sections option of PDFPlug.
>> > >> >>>
>> > >> >>> Regards
>> > >> >>> Shaoqun
>> > >> >>>
>> > >> >>>> Dear All
>> > >> >>>>
>> > >> >>>> I have a collection of PDF files on Windows 2003 server using
>> > >> >>>> GSDL 2.71. I can count the number of documents with _numdocs_.
>> > >> >>>> Now each document can have more than one page. How can I count
>> > >> >>>> the number of pages in these documents? Is there any macro, or
>> > >> >>>> another idea?
>> > >> >>>>
>> > >> >>>>
>> > >> >>>> Regards,
>> > >> >>>>
>> > >> >>>> Ata ur Rehman,
>> > >> >>>>
>> > >> >>>> Librarian,
>> > >> >>>> Akhter Hameed Khan Resource Center (AHKRC),
>> > >> >>>> NRSP-Institute of Rural Management,
>> > >> >>>> F-6/4, Islamabad
>> > >> >>>>
>> > >> >>>> Ph: +92 51 2822752
>> > >> >>>> +92 51 2822792
>> > >> >>>>
>> > >> >>>> http://www.irm.edu.pk/
>> > >> >>>> _______________________________________________
>> > >> >>>> greenstone-users mailing list
>> > >> >>>> greenstone-users@list.scms.waikato.ac.nz
>> > >> >>>> https://list.scms.waikato.ac.nz/mailman/listinfo/greenstone-users