RDM, Big Data, and Reproducibility: Teaching Open Science Locally
Anna Sackmann, Rick Jaffe
University of California, Berkeley, United States of America
UC-Berkeley’s Research Data Management (RDM) Program is an instructional partner in a five-year, NIH Big Data to Knowledge (T32) grant that teaches predoctoral students data science approaches to biomedical research. To incorporate RDM-related topics, instructors have developed a flexible, three-part curriculum that spans the health and physical sciences with a focus on the big data lifecycle and sustainable practices for open science and reproducibility. Learning materials address the FAIR principles, repository selection, data documentation, data publication, and guidelines for identifying and handling sensitive data.
The effort spotlights the challenge of translating rich RDM knowledge and practice to learners in a particular local context. Instructors wrestled with anticipating the students' varying levels of background knowledge, providing in-depth training outside of an active project, and generating excitement about long-term data reuse despite its central role in the biomedical sciences. These struggles notwithstanding, the project has immersed the RDM Program in big data, expanding its capacity in this important new component of data management. Additionally, the program has developed a collaborative relationship that promises to continue beyond the grant and to influence how reproducibility, open science, and data management are taught in UC-Berkeley's Data Science communities.
DataCrate - a progress report on creating a data packaging format for research data
University of Technology Sydney, Australia
This presentation is about DataCrate, a developing standard for research data packaging, which will be at version 0.2 or higher by the time of OR 2018. The purpose of DataCrate is to allow distribution of data sets via a single file (using ZIP, TAR, or a disc image format as appropriate) and/or via a URL with integrity checks. Another goal is to be able to host a data set on a web server with appropriate access controls, in a way that lets people inspect it via an HTML page which summarises the data set and (optionally) describes its contents in detail, file by file or directory by directory.
The aims are to maximise the utility of the data for researchers (including researchers' 'future selves'): given a DataCrate package, a researcher should be able to tell what it is, how the data may be used, and what each file contains; metadata should be exposed as widely as possible to enable discovery; and packages should support automated ingest into repositories or catalogues.
DataCrate builds on existing standards, including schema.org vocabulary, ontologies from the SPAR suite, and BagIt for data packaging.
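To make the combination of standards concrete, the sketch below packages a directory as a single ZIP containing a schema.org JSON-LD catalogue and a BagIt-style SHA-256 payload manifest. This is an illustration of the general approach, not the published DataCrate specification: the file name `CATALOG.json`, the `data/` payload layout, and the use of `MediaObject` for files are assumptions made for the example.

```python
import hashlib
import json
import zipfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def make_crate(src_dir: Path, out_zip: Path, name: str, description: str) -> None:
    """Package src_dir as one ZIP with a schema.org JSON-LD catalogue
    and a BagIt-style SHA-256 manifest over the data/ payload."""
    src_dir = Path(src_dir)
    files = [p for p in sorted(src_dir.rglob("*")) if p.is_file()]

    # JSON-LD catalogue describing the dataset and each file (schema.org terms)
    catalog = {
        "@context": "http://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "hasPart": [
            {"@type": "MediaObject", "name": str(p.relative_to(src_dir))}
            for p in files
        ],
    }

    # BagIt-style payload manifest: "<checksum>  data/<relative path>"
    manifest = "".join(
        f"{sha256_of(p)}  data/{p.relative_to(src_dir)}\n" for p in files
    )

    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as z:
        z.writestr("bagit.txt",
                   "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n")
        z.writestr("manifest-sha256.txt", manifest)
        z.writestr("CATALOG.json", json.dumps(catalog, indent=2))
        for p in files:
            z.write(p, f"data/{p.relative_to(src_dir)}")
```

A consumer of such a package can verify integrity by recomputing the checksums in `manifest-sha256.txt`, and a web server could render `CATALOG.json` as the summary HTML page described above.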
Sustaining the momentum, moving the DataVault project to a service
Claire Knowles1, Mary McDerby2, Robin Rice1, Thomas Higgins2
1University of Edinburgh, United Kingdom; 2University of Manchester, United Kingdom
The challenges of implementing a resilient service for the long-term storage and management of research data, integrated into a wider research data management service, will be discussed, focusing on:
How the DataVault service fits into the wider Research Data Service portfolios of the University of Edinburgh and University of Manchester.
How the DataVault is being implemented as a sustainable service, including: policy decisions, pilot users, and ‘learning as a service’.
The challenges and decisions made to ensure the sustainability of research data deposits to the DataVault, including infrastructure decisions, review processes, system interoperability, and metadata.
How the development of the project in the open between two partner institutions has impacted the software development.