Design and Development of a Web Based Digital Repository for Scholarly Communication

  • ABSTRACT

    Institutional repositories are essential research infrastructure for research‐based universities. A properly dimensioned institutional repository has the potential to increase research impact and enhance the visibility of an institution through its scholarly outputs. The aim of the study reported in this paper was to design and develop a web‐based digital repository for scholarly communication, using NM‐AIST as a case study. The system was developed using open source software. Findings from system validation tests show that the system is a viable solution to the major challenges encountered in managing and sharing scholarly information at the institution.


  • KEYWORDS

    Institutional Repositories, Research Infrastructure, Scholarly Communications, Web‐based, Open Source Software

  • 1. Introduction

    An institutional repository (IR) is a system that collects, preserves, manages, and provides access to intellectual products of a community (Hockx, 2006). Institutional intellectual products may include faculty work, student theses and dissertations, e‐journals, datasets, and so on. IRs provide a mechanism for an institution to showcase its scholarly output, centralize and introduce efficiencies to the stewardship of digital documents of value, and respond proactively to the escalating crisis in scholarly communication (Gibbons, 2004). The availability of open‐source repository systems has encouraged and led to the proliferation of IRs worldwide, particularly among academic and research institutions. The following are the benefits behind establishing IRs:

    The growing trend towards online scholarly communication and the lack of scholarly content management systems among universities have made digital repositories more important for the collection and distribution of scholarly materials (Budapest, 2002; Chan, 2004; Lynch, 2005). Today, digital repositories are used at academic institutions to store and disseminate the scholarly outputs of universities (Lynch and Lippincott, 2005).

    In the beginning, repository systems were developed as a hosted online solution for collecting, preserving, and disseminating the scholarship of universities, colleges, and other research institutions. More recently, software has been developed and repositories have evolved into a publication platform for institutions to showcase their scholarship, including articles, books, theses, dissertations, and journals. The number of repository platforms has also increased, and the choice of which to use depends on benefits and technical criteria (Bankier, 2014).

    The idea behind the establishment of repository software platforms was that the software be open source and locally installed. This approach offered developers unlimited flexibility to customize the platforms, which made interoperability a problem. The platforms have now been enhanced to include features that require no extra customization. Potential high maintenance costs also led many institutions to move to open source software.

    Today, institutional repository platforms have richer feature sets than ever before. The software is openly available and has wide support from the global community of developers. Universities are free to compare different platforms according to the features that best address their needs and that would make their repositories more successful (Armbruster and Romary, 2009). Generally, an institutional repository centralizes, preserves, and makes accessible the scholarly works generated by an academic institution, and forms part of a larger global system of repositories which are indexed in a standardized way and searchable using a common interface (Sefton, 2009).

    While reviewing the status of open access repositories in Tanzania, Mgonzo and Zaipuna (2014) reported that the attitudes and web usage behaviour of users have an impact on the performance of IRs. In related work, the lack of a resource sharing policy and the lack of proper digital asset management systems have been pointed out as the major factors that hinder the adoption of open access repositories in Tanzania (Mgonzo and Zaipuna, 2014). As a response to these challenges, this paper presents the design and implementation of DSpace@NM‐AIST, a web‐based digital repository for scholarly communication proposed for The Nelson Mandela African Institution of Science and Technology (NM‐AIST). The system is implemented using DSpace repository software. The intention of this paper is not to show how good the design of the proposed system is, but how well it addresses the challenges identified and how usage behaviour affects its success.

    The paper is organised into seven sections. An overview of the DSpace Repository System is given in Section 2. Section 3 covers Materials and Methods, Section 4 presents the Results, Section 5 presents System Design, Section 6 covers System Implementation, and Section 7 gives the Conclusion and Recommendations.

    2. Overview of DSpace Repository System

    DSpace is open source repository software typically used for creating open access repositories for scholarly and published digital content. A repository is a system for delivering digital content to end‐users. Global statistics show that DSpace is the most widely used open source repository software for institutional and open access repositories. High use of the software has been observed in universities and research‐based institutions as a way to provide access to research output, scholarly publications, and more (Smith et al., 2003). Usage statistics show that out of 2792 repositories worldwide, 1159 (42%) use DSpace (OpenDOAR, 2014). This is the main reason why DSpace was chosen to implement DSpace@NM‐AIST. Its suitability as a stable repository system is another factor that favoured its choice (Lewis et al., 2012).

    DSpace supports Qualified Dublin Core metadata by default and is oriented towards open standards and protocols; it therefore fully supports the Open Archives Initiative Protocol for Metadata Harvesting (OAI‐PMH). The search engine is based on Lucene, a popular and powerful open‐source engine. DSpace has proven to be a solid repository platform since its launch, which is why it remains promising and competitive among other software platforms such as EPrints, the next most widely used, which currently accounts for 381 (14%) of the 2792 repositories worldwide (OpenDOAR, 2014).
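
    To make the harvesting protocol concrete, the following is a minimal sketch of how a harvester might build an OAI‐PMH request URL. The six verbs and the `oai_dc` metadata prefix (unqualified Dublin Core, which every OAI‐PMH repository must support) come from the OAI‐PMH 2.0 specification; the endpoint URL and the helper name `build_oai_request` are assumptions made for illustration and do not describe the NM‐AIST deployment.

```python
from urllib.parse import urlencode

# The six request verbs defined by OAI-PMH 2.0.
OAI_VERBS = {
    "Identify", "ListMetadataFormats", "ListSets",
    "ListIdentifiers", "ListRecords", "GetRecord",
}


def build_oai_request(base_url, verb, **arguments):
    """Return the URL for an OAI-PMH request (hypothetical helper)."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    # The verb is sent as an ordinary query parameter next to its arguments.
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


# Hypothetical endpoint, used only for illustration.
BASE_URL = "https://repository.example.org/oai/request"

# Ask for all records, expressed as unqualified Dublin Core.
url = build_oai_request(BASE_URL, "ListRecords", metadataPrefix="oai_dc")
print(url)
```

    A harvester would issue an HTTP GET on such a URL and parse the XML response; that step is omitted here so the sketch runs without network access.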

    At the highest level, content in DSpace is organized into communities. At an institutional level, communities could be departments, labs, research centers, or schools. Each community, in turn, has collections that contain logically‐related material in the form of items and their files. For example, a technical report series might be a collection; it contains items, each a grouping of content and metadata that users access as scholarly material. An item may take the form of a research article, a thesis or dissertation, or a technical report together with a dataset used in the experiments described by the report. Communities and collections are used within DSpace to give the repository an easy‐to‐navigate structure, often representing an institution’s organizational makeup.
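
    The community, collection, and item hierarchy described above can be sketched as a small data model. This is an illustrative model only, not DSpace's internal classes: the class and field names, and the sample community and file names, are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """A grouping of metadata and files (hypothetical model)."""
    title: str
    metadata: dict = field(default_factory=dict)
    files: list = field(default_factory=list)


@dataclass
class Collection:
    """Logically related items, e.g. a technical report series."""
    name: str
    items: list = field(default_factory=list)


@dataclass
class Community:
    """Top-level unit, e.g. a department, lab, or school."""
    name: str
    collections: list = field(default_factory=list)

    def count_items(self):
        # Total number of items across all collections of this community.
        return sum(len(c.items) for c in self.collections)


# A technical report stored together with the dataset it describes.
report = Item(
    title="Sample technical report",
    metadata={"dc.type": "Technical Report"},
    files=["report.pdf", "dataset.csv"],
)
reports = Collection(name="Technical Reports", items=[report])
school = Community(name="Example School", collections=[reports])

print(school.count_items())
```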

    The DSpace repository architecture follows a three‐layer model, which is composed of a presentation layer, a repository management layer, and a storage layer (Gao and Krogstie, 2010). The storage layer consists of a relational database for storing metadata and a bitstream storage module for storing content data. The repository management layer contains the modules that perform the business logic of the system. The presentation layer of the DSpace platform is the services layer representing the Web user interface (Bass et al., 2002). Figure 1 illustrates the DSpace system architecture.
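
    The separation of concerns in the three‐layer model can be illustrated with a toy sketch. The classes below are not DSpace code or its API: the names are hypothetical, in‐memory dictionaries stand in for the relational database and the bitstream store, and the only business rule shown (an item must have a title) is an assumption chosen for illustration.

```python
class StorageLayer:
    """Keeps metadata and content apart, as the storage layer does."""

    def __init__(self):
        self.metadata = {}    # stands in for the relational database
        self.bitstreams = {}  # stands in for the bitstream store

    def save(self, item_id, metadata, content):
        self.metadata[item_id] = metadata
        self.bitstreams[item_id] = content

    def load(self, item_id):
        return self.metadata[item_id], self.bitstreams[item_id]


class RepositoryManagementLayer:
    """Holds the business logic; talks only to the storage layer."""

    def __init__(self, storage):
        self.storage = storage
        self._next_id = 1

    def submit(self, metadata, content):
        # Assumed rule for illustration: every item needs a title.
        if not metadata.get("title"):
            raise ValueError("an item must have a title")
        item_id = self._next_id
        self._next_id += 1
        self.storage.save(item_id, metadata, content)
        return item_id

    def retrieve(self, item_id):
        return self.storage.load(item_id)


class PresentationLayer:
    """Formats items for users; talks only to the management layer."""

    def __init__(self, manager):
        self.manager = manager

    def render_item(self, item_id):
        metadata, content = self.manager.retrieve(item_id)
        return f"{metadata['title']} ({len(content)} bytes)"


manager = RepositoryManagementLayer(StorageLayer())
ui = PresentationLayer(manager)
new_id = manager.submit({"title": "Sample report"}, b"hello")
print(ui.render_item(new_id))
```

    Each layer depends only on the one below it, so the storage back end could be swapped without touching the presentation code.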

    3. Materials and Methods

    This section describes the study methodology and materials used. A case study approach was adopted. A comparative survey of repository software was undertaken to select the best repository software to use. Data used in the survey were collected from the Directory of Open Access Repositories (OpenDOAR). Visual Paradigm for UML software was used to describe the system overview through DFDs. Several supporting open source software packages were chosen and integrated into the repository system. These include Apache Tomcat, Apache Ant, Apache Maven, the PostgreSQL relational database, and the Java Development Kit (JDK).
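
    Before building a system from these components, it can help to confirm that the supporting tools are installed. The following is a minimal sketch of such a check; the command names (`java`, `ant`, `mvn`, `psql`) are assumptions about a typical installation, and Apache Tomcat is left out because it is usually run as a service rather than found on the PATH.

```python
import shutil

# Assumed command names for each prerequisite on a typical installation.
REQUIRED_TOOLS = {
    "JDK": "java",
    "Apache Ant": "ant",
    "Apache Maven": "mvn",
    "PostgreSQL client": "psql",
}


def find_tools(tools=REQUIRED_TOOLS):
    """Map each tool name to its executable path, or None if not found."""
    return {name: shutil.which(command) for name, command in tools.items()}


if __name__ == "__main__":
    for name, path in find_tools().items():
        status = path if path else "NOT FOUND"
        print(f"{name}: {status}")
```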

    4. Results

    In choosing the best software to implement the repository system at NM‐AIST, the literature was consulted. Bankier (2014) and Armbruster and Romary (2009) report high usage of DSpace, followed by EPrints, which is second in popularity but not as widely used. Results from the survey conducted revealed that out of 2729 repositories worldwide, DSpace has the highest usage (42.3%), followed by EPrints (14.0%). The results are summarized in Table 1.

    5. System Design

    A system design should specify in detail how the parts of an information system should be implemented. For DSpace@NM‐AIST, dataflow diagrams (DFDs) were used in the design. DFDs are one of the main methods for analyzing data‐oriented systems because they emphasize the logic underlying the system. DFDs are common tools for structuring information (Valacich et al., 2012). They illustrate how data is processed by a system in terms of inputs and outputs, are used to create an overview of the system, and can also be used to visualize data processing. There are two common notations for representing DFDs: the Gane‐Sarson notation and the Yourdon & Coad notation. In this paper, Gane‐Sarson