GT20: Taking Data to the Next Level
Wednesday, 06/Jun/2018:
3:30pm - 5:30pm

Session Chair: Holly Mercer, University of Tennessee
Location: Ballroom A
Ballroom A is the largest single-room meeting space in the SUB and fits the whole conference. Live streaming is available.

A Prototype for the Institutional Research Data Index

Sara Mannheimer, Jason A. Clark, James Espeland

Montana State University, United States of America

Most out-of-the-box institutional repository systems don’t provide the workflows and metadata features required for research data. Consequently, many libraries now support two institutional repository systems (one for publications and one for research data), even though there are nearly a thousand data repositories in the United States, many of which provide services and policies that ensure their trustworthiness and suitability for institutional research data. Libraries are either increasing spending by purchasing data repository solutions from vendors, or replicating work by building, customizing, and managing individual instances of data repository software. This presentation suggests a potential solution to this issue: a prototype for an open source Institutional Research Data Index (IRDI) that promotes discovery and reuse of institutional datasets through automatic metadata harvesting and search engine optimization. IRDI could lead to a single, unified index for academic institutional research data. A unified data index would have three key impacts: increasing discovery, reuse, and citation of open research data; reinforcing the idea that research data is a legitimate scholarly product; and promoting community-wide systems that require less resource expenditure.
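
As an illustration of the search engine optimization idea, one common way to make a dataset landing page discoverable is to embed a schema.org Dataset description as JSON-LD. The sketch below is not taken from the IRDI prototype; the input record, its field names, the DOI, and the URL are hypothetical placeholders, and a real harvested record would carry many more properties.

```python
import json


def dataset_to_jsonld(record):
    """Build a schema.org Dataset description as a JSON-LD string.

    `record` is a hypothetical harvested metadata record; its keys are
    assumptions made for this sketch, not a real repository schema.
    """
    doc = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": record["title"],
        "description": record["description"],
        "identifier": record["doi"],
        "url": record["landing_page"],
    }
    return json.dumps(doc, indent=2)


# Placeholder values for illustration only.
example_record = {
    "title": "Example institutional dataset",
    "description": "A placeholder description of a research dataset.",
    "doi": "10.1234/example-dataset",
    "landing_page": "https://example.org/datasets/1",
}

jsonld = dataset_to_jsonld(example_record)
print(jsonld)
```

The resulting string would typically be placed in a script element of type application/ld+json on the dataset's index page so that crawlers can read it.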

Introducing ReDBox 2: a user-centred community-driven open source Data Management application, with services to provision and track research workspaces

Peter Sefton1, Gavin Kennedy2

1University of Technology Sydney, Australia; 2Queensland Cyber Infrastructure Foundation

ReDBox (Research Data Box) is a mature open source software application that has been in use in Australian institutions since 2010 for the planning, management and storage of research data. ReDBox is managed by the Queensland Cyber Infrastructure Foundation (QCIF), a not-for-profit eResearch organisation. It was initially funded through Australian eResearch infrastructure grants, and is now sustained by QCIF via a support-based subscription service and client-funded customisation projects. ReDBox 2 renews the base software platform and incorporates new data lifecycle services and functions. This new development has been undertaken by QCIF and is funded by the University of Technology Sydney and Research Data Services.

We will describe how the architecture of the platform supports the research data lifecycle, highlighting two innovative features: research data management planning and a service catalogue for provisioning workspaces such as file-shares. We will cover the evolution of the project from its start as a linked-data research data registry to its current form, which has a researcher-centric focus on supporting data management activities across the research lifecycle. The presentation will provide insights into the sustainability model used to support the development of the open source software, and into plans for the future.

Data2Paper: Giving Researchers Credit for their Data

Neil Stephen Jefferies1, Anusha Ranganathan2, Fiona Murphy3, Thomas Ingraham4, Hollydawn Murray4

1University of Oxford, United Kingdom; 2Digital Nest Ltd; 3University of Reading; 4F1000Research

In a project that started as part of the Jisc Data Spring Initiative, a team of stakeholders (publishers, data repository managers, coders) has developed a simple ‘one-click’ process for submitting data papers related to material in a DataCite/ORCID compliant repository. Data papers cover methodological detail that is not otherwise captured and published in traditional journal articles and/or dataset metadata. As such, they can improve the findability, reusability and reproducibility of the underlying dataset, as well as provide additional scope for citation and credit. The aim is to provide a positive incentive for data deposit.

DataCite and ORCID information is transferred from a data repository via a SWORD-based API to a cloud-based helper application based on the Fedora/Samvera platform. There, the user can select a journal and download a paper template partially filled with metadata drawn from DataCite and ORCID. Later, the completed text of the data paper is combined with additional metadata to generate a package suitable for automatic transfer into a journal submission platform without further user interaction. By reusing metadata from ORCID and DataCite that has already been entered and curated, the process is both simplified and made less error prone.
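
The template pre-fill step can be sketched as follows. This is a minimal illustration rather than the Data2Paper implementation: the record layouts, field names, DOI, ORCID iD, and template text are all hypothetical, and the SWORD transfer and journal submission steps are not modelled.

```python
from string import Template

# Hypothetical, heavily simplified records; real DataCite and ORCID
# metadata are richer and structured differently.
datacite_record = {
    "doi": "10.1234/example",          # placeholder DOI
    "title": "Example dataset",
    "publication_year": 2018,
}
orcid_record = {
    "orcid": "0000-0000-0000-0000",    # placeholder ORCID iD
    "name": "A. Researcher",
}

# Hypothetical journal template; $methods is left for the author to write.
PAPER_TEMPLATE = Template(
    "Title: Data paper for \"$title\"\n"
    "Author: $name (ORCID: $orcid)\n"
    "Dataset DOI: $doi ($publication_year)\n"
    "\n"
    "Methods:\n"
    "$methods\n"
)


def prefill_template(datacite, orcid):
    """Fill the template with whatever metadata is already available."""
    fields = {**datacite, **orcid}
    # safe_substitute leaves placeholders without a value untouched,
    # so the author can see what still has to be written.
    return PAPER_TEMPLATE.safe_substitute(fields)


draft = prefill_template(datacite_record, orcid_record)
print(draft)
```

Because the identifiers and names come from records that were already curated, the author only needs to supply the parts marked by the remaining placeholders.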

Curation and Publication Pipelines for Simulation Datasets in DesignSafe-CI, an Open Platform and Repository for Natural Hazards Engineering Data

Maria Esteva1, Craig Jansen1, Arduino Pedro2, Kulasekaran Sivakumar1, Balandrano Coronel Josue1

1Texas Advanced Computing Center, University of Texas at Austin, United States of America; 2Civil and Environmental Engineering, University of Washington, United States

Publishing simulation datasets in open repositories is challenging because the iterative nature of the simulation process generates many large files, and because complex documentation is required to describe them. DesignSafe-CI (DS-CI) is an open data management, analysis, and repository platform for natural hazards engineering data. To design curation and publication pipelines for simulation data, we worked closely with domain experts, using interactive interface mock-ups and mapping them to real data cases. Based on their input, we created a data model that captures the main processes, data components, and documentation involved in simulations. The data model was the foundation for the design of interactive functionalities to select, categorize, and register relations between input and output files, and of metadata to describe the simulations. The interactive functions are implemented within workspaces so that teams can curate data progressively. The dataset landing pages show the structure of the simulation to make the data easier to understand and access. The new simulation publications will be evaluated through continuous feedback from the experts and the broader community of users. Involving them in the design process increased their interest in publishing and reusing data.
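
A data model of the kind described above, in which files are categorized and relations between inputs and outputs are registered, might look roughly like the sketch below. It is not the DesignSafe-CI data model: the class names, fields, categories, and file paths are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class DataFile:
    path: str
    category: str  # "input" or "output" in this simplified sketch


@dataclass
class Simulation:
    title: str
    description: str = ""
    files: List[DataFile] = field(default_factory=list)
    # Each relation is an (input path, output path) pair.
    relations: List[Tuple[str, str]] = field(default_factory=list)

    def add_file(self, path, category):
        """Categorize a file as a simulation input or output."""
        if category not in ("input", "output"):
            raise ValueError("category must be 'input' or 'output'")
        self.files.append(DataFile(path, category))

    def relate(self, input_path, output_path):
        """Register that an output file was produced from an input file."""
        categories = {f.path: f.category for f in self.files}
        if categories.get(input_path) != "input":
            raise ValueError(f"{input_path} is not a registered input")
        if categories.get(output_path) != "output":
            raise ValueError(f"{output_path} is not a registered output")
        self.relations.append((input_path, output_path))

    def outputs_of(self, input_path):
        """List the outputs related to a given input, in registration order."""
        return [out for inp, out in self.relations if inp == input_path]


# Hypothetical file names for illustration.
sim = Simulation(title="Example simulation")
sim.add_file("model/config.txt", "input")
sim.add_file("results/run1.csv", "output")
sim.add_file("results/run2.csv", "output")
sim.relate("model/config.txt", "results/run1.csv")
sim.relate("model/config.txt", "results/run2.csv")
print(sim.outputs_of("model/config.txt"))
```

Recording relations explicitly is what would allow a landing page to present the structure of a simulation rather than a flat file listing.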

