Primo Central Index from ExLibris as a black box for the busy librarian

One of the main fields of expertise for libraries, especially research libraries, is metadata. Libraries, or more precisely librarians, know what problems arise when metadata is insufficient, and we also know what effort it takes to keep it clean.

At the IGELU 2013 conference in Berlin, the opening keynote speaker Michael Cotta-Schønberg, Director of the University Library in Copenhagen and Deputy Director of the Danish National Library, mentioned a profession of the future: metadata cleaner. We already have them, but so far they have mainly been called cataloguers; we also have digital librarians working with metadata. In the late 60s, computer programmer Henriette Avram invented the MARC format to give metadata some structure. It still dominates in libraries, but as a format it is really not enough to describe electronic documents; that's why XML, as a tool for different applications, has made its entry into the (library) world.

The need for a central data well is huge in the library world, and libraries with the discovery tool Primo have Primo Central Index (PCI). It is still in its infancy, though, and far from optimal. Librarians love metadata, librarians love having different opinions on metadata, librarians also love autonomy; still, librarians love cooperating with other librarians to solve common problems, and if they don't agree they love making local adjustments. It's like the Linux world: if you don't like the latest developments in your favourite distribution, you can clone it and make your own version, thanks to FOSS!

So, how does this relate to Primo Central? In Primo you can build your own pipes, meaning you harvest, for example, your own catalogue or your own repository. Fine. You set up your own normalization rules, specifying which fields from the source format (MARC, MODS, DC etc.) should be written to which PNX fields. This takes some effort, and for smaller libraries, even if they have the competence, it is time consuming. So, let ExLibris harvest your repository into Primo Central Index and set up the normalization rules. Fine.
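To make the idea of a normalization rule concrete, here is a minimal sketch in Python. This is not Primo's actual rule syntax (rules are configured in the Back Office); the `normalize` function and the Dublin Core field names are just a hypothetical illustration of mapping source fields into PNX sections:

```python
# Hypothetical sketch of what a normalization rule does: map fields from a
# source record (here a Dublin Core record as a dict) into sections of a
# PNX record. NOT Primo's real rule syntax, just the mapping idea.

def normalize(dc_record):
    """Map a Dublin Core source record to a minimal PNX-like structure."""
    pnx = {"display": {}, "search": {}, "sort": {}}

    # dc:title -> display/title, search/title and sort/title
    title = dc_record.get("dc:title", "")
    pnx["display"]["title"] = title
    pnx["search"]["title"] = title
    pnx["sort"]["title"] = title

    # dc:creator -> display/creator and search/creatorcontrib
    creator = dc_record.get("dc:creator", "")
    pnx["display"]["creator"] = creator
    pnx["search"]["creatorcontrib"] = creator

    # dc:date -> display/creationdate (year) and a sortable form in sort
    date = dc_record.get("dc:date", "")                  # e.g. "2008-04-03"
    pnx["display"]["creationdate"] = date[:4]            # year only for display
    pnx["sort"]["creationdate"] = date.replace("-", "")  # "20080403"
    return pnx

record = {"dc:title": "Example title",
          "dc:creator": "Doe, Jane",
          "dc:date": "2008-04-03"}
print(normalize(record)["sort"]["creationdate"])  # prints 20080403
```

The point is that one rule set covers a whole source, not one record at a time; change the rule and every record from that source is interpreted differently.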

But after a while, even the busy librarian will probably discover some bad metadata. The busy librarian reports it to ExLibris, and ExLibris fixes it after a while. The busy librarian sees even more crappy data and reports it. This time ExLibris says: "Our strategy isn't to change that rule, and because this research database will be used by other customers we have decided not to change the strategic….". The busy librarian gets irritated by this bad interpretation of the metadata. He or she can't see the source record from PCI to find arguments for how the normalization rules (NR) should be changed, though she can guess; busy librarians are great at guessing. Lastly, the busy librarian can't, technically, set up her own NR matching PCI within the Primo solution.

Let's make things clearer with an example. The busy librarian finds a record in PCI deriving from the Swedish article database Articlesearch (swe. Artikelsök). The busy librarian is able to view the PNX:

<?xml version="1.0" encoding="UTF-8"?><record xmlns="http://www.exlibrisgroup.com/xsd/primo/primo_nm_bib" xmlns:prim="http://www.exlibrisgroup.com/xsd/primo/primo_nm_bib" xmlns:sear="http://www.exlibrisgroup.com/xsd/jaguar/search">
<control>
<sourcerecordid>1947274</sourcerecordid>
<sourceid>btj</sourceid>
<recordid>TN_btj1947274</recordid>
<sourcesystem>Other</sourcesystem>
</control>
<display>
<type>newspaper_article</type>
<title>Skilj på dumheterna : islamofober är inte alltid rasister</title>
<creator>&lt;span>Ambjörnsson&lt;/span>, &lt;span>Ronny&lt;/span></creator>
<ispartof>DAGENS NYHETER, 2008</ispartof>
<subject>Rasism ; Antisemitism ; Främlingsfientlighet</subject>
<source>ArtikelSök (BTJ)&lt;img src="http://fe.p.prod.primo.saas.exlibrisgroup.com:1701/primo_library/libweb/BTJ_logo_134x31.jpg" style="vertical-align:middle;margin-left:7px"></source>
<snippet/>
</display>
<links>
<openurl>$$Topenurl_article</openurl>
<openurlfulltext>$$Topenurlfull_article</openurlfulltext>
<addlink>$$Uhttp://www.dn.se$$EView_the_periodical</addlink>
</links>
<search>
<creatorcontrib>Ambjörnsson, Ronny</creatorcontrib>
<title>Skilj på dumheterna : islamofober är inte alltid rasister</title>
<subject>Rasism</subject>
<subject>Antisemitism</subject>
<subject>Främlingsfientlighet</subject>
<general>1947274</general>
<general>ArtikelSök (BTJ)</general>
<sourceid>btj</sourceid>
<recordid>btj1947274</recordid>
<issn>&lt;Issn/></issn>
<rsrctype>newspaper_article</rsrctype>
<creationdate>2008</creationdate>
<addtitle>DAGENS NYHETER</addtitle>
<searchscope>btj</searchscope>
<scope>btj</scope>
</search>
<sort>
<title>Skilj på dumheterna : islamofober är inte alltid rasister</title>
<author>Ambjörnsson, Ronny</author>
<creationdate>20080403</creationdate>
</sort>
<facets>
<frbrgroupid>984107692</frbrgroupid>
<frbrtype>6</frbrtype>
<creationdate>2008</creationdate>
<topic>Rasism</topic>
<topic>Antisemitism</topic>
<topic>Främlingsfientlighet</topic>
<collection>ArtikelSök (BTJ)</collection>
<prefilter>newspaper_articles</prefilter>
<rsrctype>newspaper_articles</rsrctype>
<creatorcontrib>Ambjörnsson, Ronny</creatorcontrib>
<jtitle>Dagens Nyheter</jtitle>
</facets>
<frbr>
<t>99</t>
</frbr>
<delivery>
<delcategory>Remote Search Resource</delcategory>
<fulltext>no_fulltext_linktorsrc</fulltext>
</delivery>
<ranking>
<booster1>1</booster1>
<booster2>1</booster2>
<pcg_type>aggregator</pcg_type>
</ranking>
<addata>
<au>Ambjörnsson, Ronny</au>
<atitle>Skilj på dumheterna : islamofober är inte alltid rasister</atitle>
<jtitle>DAGENS NYHETER</jtitle>
<date>20080403</date>
<risdate>20080403</risdate>
<spage>&lt;PageReference/></spage>
<epage>&lt;PageReference/></epage>
<issn>&lt;Issn/></issn>
<format>unknown</format>
<genre>unknown</genre>
<ristype>GEN</ristype>
</addata>
</record>

The busy librarian finds out that the PNX record has flaws: the date of the article is nowhere to be found in the display section. The date does appear in addata, which is used for references, and as creationdate in <sort>, but that is for sorting purposes. The busy librarian wonders why. The busy librarian can't view the source record in Primo Back Office (PBO). Bad. The busy librarian goes to Articlesearch to find out if they have some kind of source record, and finds a field for date: 2008-04-03
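The flaw is easy to demonstrate with a few lines of stdlib Python against a trimmed, hypothetical excerpt of the PNX record above (only the sections relevant to the date are kept):

```python
# Where does the article date live in the PNX? Check the sort, addata and
# display sections of a trimmed excerpt of the record shown above.
import xml.etree.ElementTree as ET

PNX_XML = """<record xmlns="http://www.exlibrisgroup.com/xsd/primo/primo_nm_bib">
  <display>
    <type>newspaper_article</type>
    <ispartof>DAGENS NYHETER, 2008</ispartof>
  </display>
  <sort><creationdate>20080403</creationdate></sort>
  <addata><date>20080403</date></addata>
</record>"""

ns = {"p": "http://www.exlibrisgroup.com/xsd/primo/primo_nm_bib"}
root = ET.fromstring(PNX_XML)

# The date is there for sorting and for citations (addata)...
print(root.findtext("p:sort/p:creationdate", namespaces=ns))  # 20080403
print(root.findtext("p:addata/p:date", namespaces=ns))        # 20080403

# ...but the display section carries no creationdate at all:
print(root.findtext("p:display/p:creationdate", namespaces=ns))  # None
```

So the source clearly delivers the date; the normalization rules simply never write it into display, which is exactly the kind of thing a local rule could fix in minutes.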

[Screenshot: the Artikelsök source record with its date field]

The busy librarian reports this to ExLibris via Salesforce, and they fix it after some weeks. The busy librarian also reports that he or she wants to include the commentary (swe. kommentar) field. ExLibris says no. Fine. The busy librarian says: I want to set up my own local normalization rule, as I can with my own pipes. ExLibris says: you can't, it's not a feature we have; you have to follow us in this case. I say: bad. The busy librarian says: then I will deactivate Articlesearch in PCI and set up my own pipe in Primo. ExLibris says: fine, but you need to pay for the number of records you harvest in your pipes. The busy librarian says: I don't want to pay when you give me bad interpretations of metadata and no option to make local normalization rules.

Well, this was just a scenario. I was at IGELU 2013 in Berlin, where there was a session about Primo Central by Rachel Kessler. My question was why customers can't view the source records in PCI. Shlomo Sanders answered that they aren't always allowed to show them because of contracts.

I asked Shlomo the same question, just a bit more in depth, in a post-conference e-mail:

When I asked for options to view source records in PCI, you told us at IGELU that because of contracts we can't view them. But if the library as a customer subscribes to, for example, Web of Science, there should be no problem for that library to view the source record. And take all this free data, like our repository Diva: there are no contract issues with that data?

Shlomo: Perhaps not, but we cannot decide that by ourselves and must discuss this with each vendor separately. A very daunting task!

My addition here: remember that this is just for viewing purposes, inside the IP-address-recognized, password-protected Back Office of Primo. Not downloading the record or anything.

My next question at the Primo Central session concerned libraries' need for a feature to edit normalization rules for PCI records, to which he said: "Wow, it's 500 million records."

I asked a follow-up to that via e-mail: How could that be a problem? You don't have one normalization rule for each record, just one for each source, like the Swedish Diva. It's the normalization rules of, for example, Diva in PCI that we want to edit locally. Of course we can report cases again and again in Salesforce concerning metadata, but what if ExLibris says: "We don't want to interpret the data the way you prefer"?

Shlomo: Perhaps the answer given in the Q&A was not clear enough. We have ~500 million records and plan to increase that significantly. It is not feasible for us to have multiple different Normalized version of these records. We are responsible to update the data in PC and this is done based on the frequency of the data we get from the vendors.

We discuss Normalization, as needed, with Institutions that have made their IR available and will expand these discussion as needed. The suggestion to talk to native non-English speakers when handling non-English data sets (not just IR) is a good idea and we will be following up on this.

The same questions about Primo Central were raised at the questions-and-answers session at IGELU, when Gilad Gal, as I understood it, said there were storage and performance issues involved in showing the source record in PCI. Shlomo denies this in e-mail:

Shlomo: We are not worried about storage or performance issue related to accessing Source Records of PCI. The issue is contractual.

My last question via e-mail: Does ExLibris still want PCI to become a black box?

Shlomo: No! but we do have contracts. Where possible we will enable access in the future to the data as RDF. We will look into making Source Records of IRs available through URIs.

This issue of RDF and URIs was also presented by Shlomo at the SWIG linked open data session at IGELU. We will surely have to come back to that issue.

Finally, you busy librarian working with Primo: do you want ExLibris PCI to show you the source records in the PNX viewer? Do you want the possibility to set up your own rules for PCI records? If so, well, let's work for it!


4 responses

  1. We welcome your input. Please work with us to improve the normalization rules that apply to the specific collection that you know better than we do.

    We have initiated a change in our Primo Central procedures such that the Primo Central team will work with customer focal points on Normalization Rule related issues. This is especially important for regional specific data sets and data sets that may be in languages that our teams are not fluent.

    Shlomo

  2. Nice! "…change in our Primo Central procedures such that the Primo Central team will work with customer focal points on Normalization Rule related issues". And how will this be done in practice? I presume it can't be arranged solely via Salesforce, or can it? In Sweden we would appreciate a dialogue to improve sources like Diva, Swepub and Artikelsök. In that case we need to view the source records for Artikelsök, for example; otherwise it will come down to the busy librarian's guesses.

    What source format is used for Diva? Dublin Core? We would appreciate sweMODS. Our plan in Umeå is to change our local pipe for Diva to sweMODS; DC has looooads of limitations. We can evaluate the source records in our own pipe thanks to the PNX viewer. Is the source format for Swepub MARC-XML or just MARC?

  3. I just had a case of bad metadata in Primo Central records and received this answer from ExLibris support: "This information is coming from the vendor and therefore displayed in the PNX". I asked for the source record, to be able to check the "vendor data" and see where the problem derives from, the vendor data or the ExLibris rules implementation, and maybe suggest some changes. I received this answer: "I've consulted with our PR-team and they stated that we can't provide source records to any third party outside of ExLibris".

    ExLibris keeps coming back to the claim that, due to contracts, they can't provide the source record, even if the library subscribes to that source and could theoretically retrieve the source record directly from the vendor. That's a pity.

    What has happened to the "..change in our Primo Central procedures such that the Primo Central team will work with customer focal points on Normalization Rule related issues" that Shlomo wrote about in September? If ExLibris puts the blame on vendor data, I want to see the proof, in this case the source record.
