With the rising popularity of Open Access, organizations expect their OAI repositories to be highly dependable. The repository must be able to deal with millions of records and respond quickly to frequent requests from Service Providers.
The Meresco community followed these developments by continuously improving Meresco’s OAI components. During this process, compliance to the OAI-PMH specification grew to near 100% and new specialized indexes were added to keep query response times well under one second.
Back in 2007 the first OAI-PMH repository components were implemented in the LOREnet project. The 16 components were reduced to 8 in the OpenER project for the Open University. These 8 components still exists but some of them were significantly refactored to keep up with load and volume requirements. End 2008, Berkely DB replaced Lucene, making it respond much faster in the presence of from and until request parameters. In 2009, huge amounts of sets in the LOREnet project required an even more specialized index to maintain query response times.
Today, several multi-million repositories are in use by, among others, Sound & Vision (Beeld en Geluid) and the University of Tilburg (UvT). These two are examples of stand-alone repository implementations. LOREnet and EduRep are examples of repositories integrated in, respectively, a portal and a search engine.
Indexes and Storage
Initially, creating a repository was straightforward using Meresco’s existing storage and Lucene index components. The new specialized indexes for OAI were also made available as reusable components. This extends the range of available indexes, which are now: Full text (Lucene), Facets, Range and Dictionary (BerkelyDB and BurstTrie).
Repositories, Search Engines and Archives
Using the available index and storage components, a repository is just as easily created as a Search Engine or a complete Archive. After all, these are quite similar things. Any repository needs a storage, but also an index for maintaining it. Similarly every search engine needs a index but also a storage to obtain the result records from. And an archive is yet another combination of storage and index, but with different intentions.