Re: [greenstone-users] Dead link & multiple creators

From Leon White
Date Tue, 30 May 2006 14:56:06 +1200
Subject Re: [greenstone-users] Dead link & multiple creators
In-Reply-To (447BACA7-2090305-cs-waikato-ac-nz)
Hi Michael,

yes, it seems to contain the ex.* metadata. More interestingly, it seems to specifically omit the dc.Creator metadata, although as you can see from the previously included screenshot, that metadata has definitely been entered. I tried removing the data in the GLI and entering it again, which had no effect. Directly modifying the metadata in the XML file and rebuilding didn't work either, I guess because the build process overwrites whatever is in the archives folder. Here are my two doc.xml files from the earlier screenshot:

BAD (dc.Creator usually comes after dc.Language):
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE Archive SYSTEM "http://greenstone.org/dtd/Archive/1.0/Archive.dtd">
<Archive>
<Section>
  <Description>
    <Metadata name="gsdldoctype">indexed_doc</Metadata>
    <Metadata name="Language">en</Metadata>
    <Metadata name="Encoding">utf8</Metadata>
    <Metadata name="dc.Type">Conference Paper</Metadata>
    <Metadata name="dc.Language">English</Metadata>
    <Metadata name="dc.Source">Governance in Pacific States Development Research Symposium</Metadata>
    <Metadata name="dc.Format">PDF</Metadata>
    <Metadata name="dc.Publisher">Pacific Institute of Advanced Studies in Development and Governance, University of the South Pacific</Metadata>
    <Metadata name="dc.Subject">governance</Metadata>
    <Metadata name="dc.Subject">development research</Metadata>
    <Metadata name="dc.Subject">economic growth</Metadata>
    <Metadata name="dc.Subject">sustainable development</Metadata>
    <Metadata name="dc.Date">20030930</Metadata>
    <Metadata name="GENERATOR">pdftohtml 0.36</Metadata>
    <Metadata name="Creator">Pramendra Sharma and Mahendra Reddy</Metadata>
    <Metadata name="Title">Governance-Growth Nexus: An exposition and some examples from Fiji's Financial Sector</Metadata>
    <Metadata name="URL">http://E:/Greenstone/gsdl/collect/dig-gov/tmp/PramendraSharmaandMahendraReddyGovernanceGrowthNexusAnexpositionandsomeexamplesfromFiji'sFinancialSector.html</Metadata>
    <Metadata name="gsdlsourcefilename">importGovernance ConferenceGovernance, Economic Growth, Sustainable DevelopmentPramendra Sharma and Mahendra Reddy - Governance-Growth Nexus- An exposition and some examples from Fiji's Financial Sector.pdf</Metadata>
    <Metadata name="gsdlconvertedfilename">tmpPramendraSharmaandMahendraReddyGovernanceGrowthNexusAnexpositionandsomeexamplesfromFiji'sFinancialSector.html</Metadata>
    <Metadata name="Source">Pramendra Sharma and Mahendra Reddy - Governance-Growth Nexus- An exposition and some examples from Fiji's Financial Sector.pdf</Metadata>
    <Metadata name="Plugin">PDFPlug</Metadata>
    <Metadata name="FileSize">270915</Metadata>
    <Metadata name="FileFormat">PDF</Metadata>
    <Metadata name="srclink">&lt;a href=&quot;/gsdl/collect/gsarch/index/assoc/[archivedir]/doc.pdf&quot;&gt;</Metadata>
    <Metadata name="srcicon">View the PDF document</Metadata>
    <Metadata name="/srclink">&lt;/a&gt;</Metadata>
    <Metadata name="Date">20060518</Metadata>
    <Metadata name="NumPages">17</Metadata>
    <Metadata name="Identifier">HASH99a7ffe247c508cd966910</Metadata>
    <Metadata name="assocfilepath">HASH99a7.dir</Metadata>
    <Metadata name="gsdlassocfile">doc.pdf:application/pdf:</Metadata>
  </Description>
  <Content>


GOOD:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE Archive SYSTEM "http://greenstone.org/dtd/Archive/1.0/Archive.dtd">
<Archive>
<Section>
  <Description>
    <Metadata name="gsdldoctype">indexed_doc</Metadata>
    <Metadata name="Language">en</Metadata>
    <Metadata name="Encoding">utf8</Metadata>
    <Metadata name="dc.Type">Conference Paper</Metadata>
    <Metadata name="dc.Language">English</Metadata>
    <Metadata name="dc.Creator">Parmod Chand</Metadata>
    <Metadata name="dc.Creator">Michael White</Metadata>
    <Metadata name="dc.Source">Governance in Pacific States Development Research Symposium</Metadata>
    <Metadata name="dc.Format">PDF</Metadata>
    <Metadata name="dc.Coverage">Fiji</Metadata>
    <Metadata name="dc.Publisher">Pacific Institute of Advanced Studies in Development and Governance, University of the South Pacific</Metadata>
    <Metadata name="dc.Subject">governance</Metadata>
    <Metadata name="dc.Subject">development research</Metadata>
    <Metadata name="dc.Subject">economic growth</Metadata>
    <Metadata name="dc.Subject">sustainable development</Metadata>
    <Metadata name="dc.Date">20030930</Metadata>
    <Metadata name="GENERATOR">pdftohtml 0.36</Metadata>
    <Metadata name="Creator">Parmod Chand and Michael White</Metadata>
    <Metadata name="Title">Accountability Of The Private Sector To The People Of Fiji - Regulation Through Global Pressure And Professional Interests</Metadata>
    <Metadata name="URL">http://E:/Greenstone/gsdl/collect/dig-gov/tmp/ParmodChandandMichaelWhiteAccountabilityOfThePrivateSectorToThePeopleOfFiji.html</Metadata>
    <Metadata name="gsdlsourcefilename">importGovernance ConferenceGovernance, Economic Growth, Sustainable DevelopmentParmod Chand and Michael White - Accountability Of The Private Sector To The People Of Fiji.pdf</Metadata>
    <Metadata name="gsdlconvertedfilename">tmpParmodChandandMichaelWhiteAccountabilityOfThePrivateSectorToThePeopleOfFiji.html</Metadata>
    <Metadata name="Source">Parmod Chand and Michael White - Accountability Of The Private Sector To The People Of Fiji.pdf</Metadata>
    <Metadata name="Plugin">PDFPlug</Metadata>
    <Metadata name="FileSize">305672</Metadata>
    <Metadata name="FileFormat">PDF</Metadata>
    <Metadata name="srclink">&lt;a href=&quot;/gsdl/collect/gsarch/index/assoc/[archivedir]/doc.pdf&quot;&gt;</Metadata>
    <Metadata name="srcicon">View the PDF document</Metadata>
    <Metadata name="/srclink">&lt;/a&gt;</Metadata>
    <Metadata name="Date">20060519</Metadata>
    <Metadata name="NumPages">20</Metadata>
    <Metadata name="Identifier">HASH8ddfcbd6db3da2c4ce9b89</Metadata>
    <Metadata name="assocfilepath">HASH8ddf.dir</Metadata>
    <Metadata name="gsdlassocfile">ParmodChandandMichaelWhiteAccountabilityOfThePrivateSectorToThePeopleOfFiji-1_1.jpg:image/jpeg:</Metadata>
    <Metadata name="gsdlassocfile">ParmodChandandMichaelWhiteAccountabilityOfThePrivateSectorToThePeopleOfFiji-2_1.jpg:image/jpeg:</Metadata>
    <Metadata name="gsdlassocfile">ParmodChandandMichaelWhiteAccountabilityOfThePrivateSectorToThePeopleOfFiji-2_2.jpg:image/jpeg:</Metadata>
    <Metadata name="gsdlassocfile">doc.pdf:application/pdf:</Metadata>
  </Description>
  <Content>

Weird?
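
A minimal sketch of how the two doc.xml files could be compared automatically, assuming complete, well-formed files (the listings above are cut off at <Content>). The helper names and the abbreviated BAD/GOOD samples are hypothetical and only mirror the structure shown above; this is not part of Greenstone.

```python
import xml.etree.ElementTree as ET


def metadata_by_name(xml_text):
    """Map each metadata name in the top-level <Description> to its values."""
    root = ET.fromstring(xml_text)  # root is the <Archive> element
    values = {}
    for meta in root.findall("./Section/Description/Metadata"):
        # Tolerate stray whitespace around names and values.
        name = (meta.get("name") or "").strip()
        values.setdefault(name, []).append((meta.text or "").strip())
    return values


def names_only_in_second(first_xml, second_xml):
    """Return metadata names present in the second file but not the first."""
    first = metadata_by_name(first_xml)
    second = metadata_by_name(second_xml)
    return sorted(set(second) - set(first))


# Abbreviated, hypothetical samples modelled on the two listings above.
BAD = """<Archive><Section><Description>
<Metadata name="dc.Language">English</Metadata>
<Metadata name="Creator">Pramendra Sharma and Mahendra Reddy</Metadata>
</Description><Content></Content></Section></Archive>"""

GOOD = """<Archive><Section><Description>
<Metadata name="dc.Language">English</Metadata>
<Metadata name="dc.Creator">Parmod Chand</Metadata>
<Metadata name="dc.Creator">Michael White</Metadata>
<Metadata name="Creator">Parmod Chand and Michael White</Metadata>
</Description><Content></Content></Section></Archive>"""

print(names_only_in_second(BAD, GOOD))
```

Run against the real files (read from disk and passed in as text), this would list every metadata name the BAD document lacks, which makes a missing dc.Creator easy to spot across many documents.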

Cheers
Leon

On 5/30/06, Michael Dewsnip <mdewsnip@cs.waikato.ac.nz> wrote:
Hi Leon,

Please find the doc.xml file for a problem document in the collection
"archives" directory. Does it include the extracted metadata correctly?

Regards,

Michael



Leon White wrote:

> Hi Katherine, List,
>
> thanks for your elegant solution, and sorry about this long mail, but
> I am experiencing some very strange behaviour... I added dc.Creator
> metadata only to those publications with multiple authors, leaving the
> rest with the automatically extracted ex.Creator metadata. For the most part
> this seems to be working perfectly, but there are some very strange
> exceptions (bugs?).
>
> After a full build of the collection, returning to the 'Enrich' tab
> shows that certain documents do not have any ex.Creator metadata,
> indeed no ex.* metadata at all. This cannot be true, because the
> titles and other stuff I use from the ex.* set is visible in the
> collection frontend, yet it is exactly these publications which are
> reverting to the old "<author1> and <author2>" display. Why do some
> documents not get any ex metadata?? I have triple checked that the
> PDFs do indeed have their metadata entered correctly. See the two
> attached images for my view of the GLI 'Enrich' tab immediately after
> a build.
>
> For examples search my collection
> <http://www.rkb.usp.ac.fj/gsdl/cgi-bin/library.exe?site=localhost&a=p&p=about&c=dig-gov&ct=0&l=en&w=utf-8%2520rtekeep= >
> for 'chand' (an example of things working properly) and 'pramendra'
> (an example of the wrong behaviour). This can also clearly be seen in
> the list of authors.
>
> Once this problem is fixed is it possible to adjust the
> {Or}{[sibling:dc.Creator],[sibling:ex.Creator]} line so that it adds
> an 'and' or a few nbsp's between the list of siblings? The separation
> of metadata should be transparent to the end user.
>
> Thank you very much and I'm looking forward to your reply,
> Leon
>
> On 5/29/06, *Katherine Don* <kjdon@cs.waikato.ac.nz
> <mailto:kjdon@cs.waikato.ac.nz>> wrote:
>
>     Hi Leon
>     Is your problem that the ex.Creator has a value like "smith, jones"
>     and you are getting a bookshelf "smith, jones" but you want two
>     bookshelves, "smith" and "jones"?
>
>     We don't have any way of splitting metadata, so your best solution
>     would be to assign dc.Creator metadata to those documents whose
>     extracted metadata lists multiple authors, and use
>     AZCompactList -metadata dc.Creator,ex.Creator
>     If you then want to display the Creator for the document node (and not
>     just in the bookshelf), then use
>     {Or}{[sibling:dc.Creator],[sibling:ex.Creator]}
>     You want dc first so that it will use that if it's there, otherwise
>     use ex.Creator.
>     sibling will display all values of multivalued metadata.
>
>     Hope this helps,
>     Katherine
>
>     > Specifically, my problem is as follows: I am importing metadata such as
>     > ex.Title and ex.Creator from the PDF document properties through
>     > PDFPlug. I want to generate an AZCompactList from the ex.Creator data.
>     > However, some documents have multiple authors entered, which means these
>     > authors appear both on their own and once again together with their
>     > co-authors on the list. Ideally the two authors should appear
>     > individually in the list with the document credited to each of them
>     > exactly once. Can I establish an override along the lines of
>     > {Or}{[ex.Creator],[dc.Creator]} for this, and manually enter author
>     > data for documents with multiple authors?
>     >
>
>
>
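
The fallback Katherine describes in the quoted message (use dc.Creator if it is there, otherwise ex.Creator, with one bookshelf per value) can be modelled with a short sketch. This only illustrates the behaviour as described in this thread; it is not Greenstone's implementation, and the function names, the shortened titles and the " and " separator are assumptions made for the example.

```python
def creators_for_display(metadata):
    """Return the dc.Creator values if any exist, otherwise the ex.Creator values."""
    dc_values = metadata.get("dc.Creator", [])
    if dc_values:
        return dc_values
    return metadata.get("ex.Creator", [])


def build_bookshelves(documents):
    """Group document titles under each creator value chosen by the fallback."""
    shelves = {}
    for title, metadata in documents.items():
        for creator in creators_for_display(metadata):
            shelves.setdefault(creator, []).append(title)
    return shelves


def join_creators(values, separator=" and "):
    """Join multiple creator values into one display string."""
    return separator.join(values)


# Hypothetical data modelled on the two documents discussed in the thread.
documents = {
    "Accountability Of The Private Sector": {
        "dc.Creator": ["Parmod Chand", "Michael White"],
        "ex.Creator": ["Parmod Chand and Michael White"],
    },
    # dc.Creator is missing here, so the combined ex.Creator string is used.
    "Governance-Growth Nexus": {
        "ex.Creator": ["Pramendra Sharma and Mahendra Reddy"],
    },
}

for creator, titles in build_bookshelves(documents).items():
    print(creator, "->", titles)
print(join_creators(creators_for_display(documents["Accountability Of The Private Sector"])))
```

In this model a document whose dc.Creator values are dropped falls back to the single combined ex.Creator string, which matches the "<author1> and <author2>" display reported for the problem documents.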