Primo Central Index from ExLibris as a black box for the busy librarian

One of the main fields of expertise for libraries, especially research libraries, is metadata. Libraries, or more exactly librarians, know what problems arise when metadata is insufficient, and we also know what effort it takes to keep it clean.

At the IGELU 2013 conference in Berlin, the opening keynote speaker Michael Cotta-Schønberg, Director of the University Library in Copenhagen and Deputy Director of the Danish National Library, mentioned a future profession: metadata cleaner. We already have them, but so far they've mainly been called cataloguers, though we also have digital librarians working with metadata. In the late 60s the computer programmer Henriette Avram invented the MARC format to give metadata some structure. It still dominates in libraries, but as a format it's really not enough to describe electronic documents, which is why XML, as a tool for different applications, has made its entry into the (library) world.

The need for a central data well is huge in the library world, and libraries with the discovery tool Primo have Primo Central Index (PCI). It's still in its infancy, though, and far from optimal. Librarians love metadata, librarians love having different opinions on metadata, librarians also love autonomy; still, librarians love to cooperate with other librarians to solve common problems, but if they don't agree they love to make local adjustments. It's like in the Linux world: if you don't like the latest developments of your favourite distribution, you can clone it and make your own version, thanks to FOSS!

So, how does this relate to Primo Central? In Primo you can build your own pipes, meaning you harvest, for example, your own catalogue or your own repository. Fine. You set up your own normalization rules, specifying which fields from a given source format (MARC, MODS, DC etc.) should be written to which PNX fields (a minimal sketch of what such a mapping amounts to follows below). This takes some effort, and for smaller libraries, even if they have the competence, it's time consuming. So, let ExLibris harvest your repository into Primo Central Index and set up the normalization rules. Fine.
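To make the idea concrete, here is a minimal sketch in Python of the kind of source-field-to-PNX-field decisions a normalization rule encodes. Everything here is hypothetical and for illustration only; real Primo normalization rules are configured in Back Office, not written as code, and the field names below are just examples.

# Illustrative sketch only: real normalization rules are configured in
# Primo Back Office, not written as code. This hypothetical mapping shows
# the kind of source-field -> PNX-section/field decisions the rules encode.
SOURCE_TO_PNX = {
    "dc:title":   ("display", "title"),
    "dc:creator": ("display", "creator"),
    "dc:date":    ("display", "creationdate"),
    "dc:subject": ("facets", "topic"),
}

def normalize(source_record):
    """Build a minimal PNX-like structure from a flat source record."""
    pnx = {}
    for src_field, values in source_record.items():
        if src_field not in SOURCE_TO_PNX:
            continue  # unmapped source fields are simply dropped
        section, pnx_field = SOURCE_TO_PNX[src_field]
        pnx.setdefault(section, {}).setdefault(pnx_field, []).extend(values)
    return pnx

# A repository record with a proper date field keeps its date in display:
print(normalize({"dc:title": ["Skilj på dumheterna"], "dc:date": ["2008-04-03"]}))

The whole dispute below boils down to who gets to edit a table like SOURCE_TO_PNX: for your own pipes you can, for PCI you can't.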

But after a while even the busy librarian will probably discover some bad metadata. The busy librarian reports it to ExLibris, and ExLibris fixes it after a while. The busy librarian sees even more crappy data and reports it. This time ExLibris says: “Our strategy isn't to change that rule, and because this research database will be used by other customers we have decided not to change the strategic….” The busy librarian gets irritated by this bad interpretation of the metadata. He or she can't see the source record in PCI to find arguments for how the normalization rules (NR) should be changed, though she can guess; busy librarians are great at guessing. The last thing is that the busy librarian can't, technically, within the Primo solution, set up their own NR matching PCI.

Let's make things clearer with an example. The busy librarian finds a record in PCI deriving from the Swedish article database Articlesearch (Swedish: Artikelsök). The busy librarian is able to view the PNX:

<?xml version="1.0" encoding="UTF-8"?><record xmlns="http://www.exlibrisgroup.com/xsd/primo/primo_nm_bib" xmlns:prim="http://www.exlibrisgroup.com/xsd/primo/primo_nm_bib" xmlns:sear="http://www.exlibrisgroup.com/xsd/jaguar/search">
<control>
<sourcerecordid>1947274</sourcerecordid>
<sourceid>btj</sourceid>
<recordid>TN_btj1947274</recordid>
<sourcesystem>Other</sourcesystem>
</control>
<display>
<type>newspaper_article</type>
<title>Skilj på dumheterna : islamofober är inte alltid rasister</title>
<creator>&lt;span>Ambjörnsson&lt;/span>, &lt;span>Ronny&lt;/span></creator>
<ispartof>DAGENS NYHETER, 2008</ispartof>
<subject>Rasism ; Antisemitism ; Främlingsfientlighet</subject>
<source>ArtikelSök (BTJ)&lt;img src="http://fe.p.prod.primo.saas.exlibrisgroup.com:1701/primo_library/libweb/BTJ_logo_134x31.jpg" style="vertical-align:middle;margin-left:7px"></source>
<snippet/>
</display>
<links>
<openurl>$$Topenurl_article</openurl>
<openurlfulltext>$$Topenurlfull_article</openurlfulltext>
<addlink>$$Uhttp://www.dn.se$$EView_the_periodical</addlink>
</links>
<search>
<creatorcontrib>Ambjörnsson, Ronny</creatorcontrib>
<title>Skilj på dumheterna : islamofober är inte alltid rasister</title>
<subject>Rasism</subject>
<subject>Antisemitism</subject>
<subject>Främlingsfientlighet</subject>
<general>1947274</general>
<general>ArtikelSök (BTJ)</general>
<sourceid>btj</sourceid>
<recordid>btj1947274</recordid>
<issn>&lt;Issn/></issn>
<rsrctype>newspaper_article</rsrctype>
<creationdate>2008</creationdate>
<addtitle>DAGENS NYHETER</addtitle>
<searchscope>btj</searchscope>
<scope>btj</scope>
</search>
<sort>
<title>Skilj på dumheterna : islamofober är inte alltid rasister</title>
<author>Ambjörnsson, Ronny</author>
<creationdate>20080403</creationdate>
</sort>
<facets>
<frbrgroupid>984107692</frbrgroupid>
<frbrtype>6</frbrtype>
<creationdate>2008</creationdate>
<topic>Rasism</topic>
<topic>Antisemitism</topic>
<topic>Främlingsfientlighet</topic>
<collection>ArtikelSök (BTJ)</collection>
<prefilter>newspaper_articles</prefilter>
<rsrctype>newspaper_articles</rsrctype>
<creatorcontrib>Ambjörnsson, Ronny</creatorcontrib>
<jtitle>Dagens Nyheter</jtitle>
</facets>
<frbr>
<t>99</t>
</frbr>
<delivery>
<delcategory>Remote Search Resource</delcategory>
<fulltext>no_fulltext_linktorsrc</fulltext>
</delivery>
<ranking>
<booster1>1</booster1>
<booster2>1</booster2>
<pcg_type>aggregator</pcg_type>
</ranking>
<addata>
<au>Ambjörnsson, Ronny</au>
<atitle>Skilj på dumheterna : islamofober är inte alltid rasister</atitle>
<jtitle>DAGENS NYHETER</jtitle>
<date>20080403</date>
<risdate>20080403</risdate>
<spage>&lt;PageReference/></spage>
<epage>&lt;PageReference/></epage>
<issn>&lt;Issn/></issn>
<format>unknown</format>
<genre>unknown</genre>
<ristype>GEN</ristype>
</addata>
</record>

The busy librarian finds out that the PNX record has flaws: where is the date of the article? There is a date in <addata>, which is used for references, and a creationdate in <sort>, but that one is for sorting purposes; the display section has no date at all. The busy librarian wonders why. The busy librarian can't view the source record in Primo Back Office (PBO). Bad. So the busy librarian goes to Articlesearch to find out if they have some kind of source record, and finds a field for the date: 2008-04-03 (a small check of the PNX above is sketched after the screenshot).

[Screenshot: the Artikelsök source record, with the date field 2008-04-03]
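For the curious, the flaw is easy to verify mechanically. This is a minimal sketch, assuming the PNX above has been saved locally as pnx.xml (a hypothetical filename); it just lists which sections of the record actually carry a date field.

# Quick check of the PNX above, saved locally as pnx.xml:
# in which sections does a date field actually appear?
import xml.etree.ElementTree as ET

NS = "{http://www.exlibrisgroup.com/xsd/primo/primo_nm_bib}"
root = ET.parse("pnx.xml").getroot()

for section in ("display", "sort", "facets", "addata"):
    sec = root.find(NS + section)
    dates = [el.text for el in sec
             if el.tag in (NS + "creationdate", NS + "date")]
    print(section, "->", dates or "no date field")

# Prints roughly:
#   display -> no date field
#   sort -> ['20080403']
#   facets -> ['2008']
#   addata -> ['20080403']

The full date is in there, twice even, but never in the display section that the end user actually sees.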

The busy librarian reports this to ExLibris via Salesforce and they fix it after some weeks. The busy librarian also reports that he or she wants to include the commentary (Swedish: kommentar) field. ExLibris says no. Fine. The busy librarian says: I want to set up my own local normalization rule, as I can with my own pipes. ExLibris says: you can't, it's not a feature we have; you have to follow us in this case. I say: bad. The busy librarian says: then I will deactivate Articlesearch in PCI and set up my own pipe in Primo. ExLibris says: fine, but you need to pay for the number of records you harvest in your pipes. The busy librarian says: I don't want to pay when you give me bad interpretations of metadata and no option to make local normalization rules.

Well, this was just a scenario. I was at IGELU 2013 in Berlin, where there was a session about Primo Central by Rachel Kessler. My question was why customers can't view the source records of PCI. Shlomo Sanders answered that they aren't always allowed to do that because of contracts.

I asked the same question, just a bit more in-depth, to Shlomo in a post-conference e-mail:

When I asked for options to view source records in PCI, you told us at IGELU that because of contracts we can't view them. But if the library as a customer is subscribing to, for example, Web of Science, there should be no problem for the library to view the source record. And take all this free data, like our repository Diva: there are no contract issues with that data?

Shlomo: Perhaps not, but we cannot decide that by ourselves and must discuss this with each vendor separately. A very daunting task!

My supplement here: remember that this is just for viewing purposes, behind IP-address recognition and the password-protected Back Office of Primo. Not downloading the record or anything like that.

My next question at the Primo Central session was about the libraries' need for a feature to edit normalization rules for PCI records, to which he said: Wow, it's 500 million records.

I asked a follow-up to that via e-mail: How could that be a problem? You don't have one normalization rule for each record, just one for each source, like the Swedish Diva. It's the normalization rules of, for example, Diva in PCI that we want to edit locally. Of course we can report cases again and again in Salesforce concerning metadata, but what if ExLibris says: “We don't want to interpret the data in the way you prefer”?

Shlomo: Perhaps the answer given in the Q&A was not clear enough. We have ~500 million records and plan to increase that significantly. It is not feasible for us to have multiple different Normalized version of these records. We are responsible to update the data in PC and this is done based on the frequency of the data we get from the vendors.

We discuss Normalization, as needed, with Institutions that have made their IR available and will expand these discussion as needed. The suggestion to talk to native non-English speakers when handling non-English data sets (not just IR) is a good idea and we will be following up on this.

The same questions about Primo Central were raised at the questions-and-answers session at IGELU, when Gilad Gal, as I understood it, said there were storage and performance issues with showing the source record in PCI. Shlomo denies this in an e-mail:

Shlomo: We are not worried about storage or performance issue related to accessing Source Records of PCI. The issue is contractual.

My last question via e-mail: Does ExLibris still want PCI to become a black box?

Shlomo: No! but we do have contracts. Where possible we will enable access in the future to the data as RDF. We will look into making Source Records of IRs available through URIs.

This issue about RDF and URIs was also presented by Shlomo at the SWIG linked open data session at IGELU. We will surely have to come back to that issue.

Finally, you busy librarians working with Primo: do you want ExLibris PCI to show you the source records in the PNX viewer? Do you want the possibility to set up your own rules for PCI records? If so, let's work for it!


Anti-open source document from SirsiDynix leaked

Awesome librarian Jessamyn West at Librarian.net is blogging about an anti-open-source document from the SirsiDynix corporation, available from WikiLeaks. About the document:

“This document was released only to a select number of existing customers of the company SirsiDynix, a proprietary library automation software vendor. It has not been released more broadly specifically because of the misinformation about open source software and possible libel per se against certain competitors contained therein”.

In the document I have found a lot of interesting statements:

“…it should be noted that it is rare for completely open source projects to be successful. Rather than focusing on best-in-class software choice decision-making, these projects often end up being archipelagos of systems driven by a philosophical principle that is anti-proprietary”.

Apache, Linux, Firefox, Drupal, WordPress… Need we say more?

“..the number of Linux desktops is meager compared to Microsoft Windows desktops. By choosing a Linux desktop, a user closes the door on some software because it may never be created for or ported to Linux. Add to this the major changes in allied systems that require an adaptation for the ILS and the issue grows exponentially”.

Times are a-changing…

“While some open source ILS companies are offering hosted solutions, these solutions are not at the scale or professionalism of a proprietary SaaS solution, nor do they provide the service level agreements or service expectations that SirsiDynix commits to”.

Was that addressed to LibLime and others? Or even Acquia for Drupal or Canonical for Ubuntu?

“Generally, the available open source ILS platforms have less than half of the features and functions of any SirsiDynix ILS”.

It took ExLibris at least two years to implement a quite common option in their Aleph system that I know a programmer could have fixed in less than a day.

“Proprietary software has more features. Period. Proprietary software is much more user-friendly”.

WordPress is veeeery user-friendly. SharePoint is not user-friendly. RT is not user-friendly. Mac OS is user-friendly. It's not about proprietary or open source when it comes to user-friendliness.

“SirsiDynix consultants have written custom API programs since the company introduced the Application Programming Interface (API) nearly 20 years ago”.

I read on the Swedish library blog Betabib this observation from the conference Computers in Libraries 2009:

“He [Stephen Abram] addressed the issue of SirsiDynix's long experience with APIs. But when talking to some of their customers, they looked at me questioningly”.

Even more from the document:

“Some of the most security-conscious entities, like the United States Department of Defense, restrict the use of open source software for fear that it could pose a terrorist opportunity”.

Why did the White House choose open source Drupal? Aren’t they afraid of terrorists?

“Finally, one of the biggest claims of open source proponents is that it is more reliable. They argue that since any programmer can find and fix bugs, the software will be repaired and improved more quickly. There is, however, no guarantee that the bug you want fixed will engage a member of the community to fix it”.

Maybe the IT folks at the White House think they can fix bugs better and even faster within the community than behind the walls of a proprietary software corporation?

From the end of the document:

“We’ve [SirsiDynix] led the development of some of the most advanced features and capabilities of ILS platforms. So we know a thing or two about what it takes for library systems to be successful. While we encourage the development of open formats, we must discourage libraries from jumping headlong into an open source platform to operate their ILS system on. At the current production cycle, jumping into open source would be dangerous, at best”.

Every ILS solution is more or less dangerous, to speak SirsiDynix language. It's just that I like it when I can see what is dangerous and can share this dangerousness within an open community. Thanks, WikiLeaks!

Article on another open source library system called Evergreen

Thanks to Ronald van Dieën from Rotterdam, Netherlands, who pointed me to this article about the open source library system Evergreen: “Librarians stake their future on open source”, published 21 December 2006 at Linux.com.

A group of librarians at Georgia Public Library Service (GPLS) have developed their own open source library system, which they call Evergreen; version 1.0 was released in November 2006. It's written in C, JavaScript and Perl, licensed under the GPL, runs on servers with Linux/Apache, uses a PostgreSQL database (ohh, not Swedish-developed MySQL ;-) and so on. Evergreen covers GPLS's network of libraries, called PINES, which includes 252 member libraries. You can try the PINES catalog.

PINES Program Director Julie Walker says to Linux.com:

“It has really been the easiest conversion I’ve ever been through in my 25 years of working in libraries,”

“Our Sirsi system ran on a great big Sun server that was quite expensive. We poured a lot of money into that over the years to continue to upgrade it, plus the housing of it was very expensive. [Evergreen] runs on a Linux cluster, which is a lot less expensive. Also, we’re not paying licensing fees anymore. When you’re talking 252 libraries, which is what we are today, that’s the great big savings.”

A study that PINES conducted in 2002 showed that if all of the GPLS libraries had to buy a new system, it would cost more than $15 million, plus about $5 million a year for maintenance. GPLS runs PINES for a lean $1.6 million a year.

Librarian Brad LaJeunesse, PINES System Administrator with GPLS, says to Linux.com about another open source library system, Koha:

“[Koha] wasn’t built with the scalability or deep organizational hierarchy that PINES requires. It would work fine for a 10-branch library system, but not for a statewide system.”

Good point if it’s applicable!

RedLightGreen merging with WorldCat from OCLC

RedLightGreen (RLG) has decided to merge with WorldCat because they think it's better: “…developing and supporting a single product rather than continuing to support two”. The citation formatting from RLG is not yet supported in WorldCat but will be in 2007. Other functions found in both RLG and WorldCat are: FRBRization of results, ranking (not just relevance) and faceted display.
Read more about the features and the incorporation of RedLightGreen into OCLC's WorldCat. RLG will end November 1, 2006.

Three more libraries use open source library system Koha

Three more special library collections have migrated to the open source Integrated Library System Koha, according to LibLime, a leader in open source solutions for libraries. The libraries are the Native Village of Afognak Library in Alaska, USA; the Alaska Statewide Mentor Project, also in Alaska, USA; and the Childcare Resource and Research Unit, a resource room at the University of Toronto, Canada. Try a demo of the Koha web OPAC at LibLime.
I wonder if there are any libraries in Sweden using open source library systems? Mostly it's proprietary systems like Voyager, Aleph, Axiell, Horizon etc. If you know of libraries with open source library systems in Europe, just write a comment!