Developer track 2: Data
Wednesday, 28/Jun/2017:
3:30pm - 5:00pm

Session Chair: Heather Todd, University of Queensland
Location: Queen's Ballroom
Hilton Brisbane

Mining linked data from text for discovery

Conal Tuohy, Australia

APO (Australia Policy Online) is a curated repository of public policy and practice resources (grey literature and a range of other content types) from academia, government, think tanks, civil society organisations and industry. To produce additional discovery metadata for items in its collection, APO is experimenting with automatic methods: initially Named Entity Recognition, and subsequently topic modelling.

In the experimental system, developed as part of an Australian Research Council LIEF grant, texts are extracted from the repository (a Drupal 7 instance), the texts are mined, the mined dataset is expressed as RDF (Linked Data), and finally the dataset is re-integrated with the repository.

To maximise flexibility and to facilitate reuse of both the software and the metadata, the metadata creation system is loosely coupled with the APO repository, and the metadata is expressed in RDF.

The presentation will focus on the overall system architecture, and on the challenges of using RDF as a way to express metadata which has been generated using standard text mining tools.
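
As a rough illustration of the "express as RDF" step, the sketch below turns a handful of entity mentions into N-Triples using only the Python standard library. The mention tuples, the example.org URIs, the entity types and the choice of dcterms:subject as the linking property are all assumptions made for the example; they are not taken from the APO system.

```python
from urllib.parse import quote

# Placeholder namespace and property choices (assumptions, not APO's data model).
EX = "http://example.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
DCT_SUBJECT = "http://purl.org/dc/terms/subject"


def escape_literal(text):
    """Escape backslashes, quotes and newlines for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def entity_uri(name, kind):
    """Mint a URI for a named entity from its type and surface form."""
    return f"{EX}entity/{kind.lower()}/{quote(name.replace(' ', '_'))}"


def mentions_to_ntriples(mentions):
    """Convert (document URI, entity name, entity type) tuples to N-Triples lines."""
    triples = set()  # a set, so an entity seen in many documents is described once
    for doc, name, kind in mentions:
        ent = entity_uri(name, kind)
        triples.add(f"<{doc}> <{DCT_SUBJECT}> <{ent}> .")
        triples.add(f"<{ent}> <{RDF_TYPE}> <{EX}vocab/{kind}> .")
        triples.add(f'<{ent}> <{RDFS_LABEL}> "{escape_literal(name)}" .')
    return sorted(triples)


# Hypothetical output of a Named Entity Recognition run over two documents.
mentions = [
    ("http://example.org/node/1", "Brisbane", "Location"),
    ("http://example.org/node/2", "Brisbane", "Location"),
    ("http://example.org/node/1", "Australian Research Council", "Organisation"),
]
ntriples = mentions_to_ntriples(mentions)
print("\n".join(ntriples))
```

Because the output is plain RDF, a loosely coupled consumer such as the repository could load it without knowing which text mining tool produced it.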

Visualising Research Graph using Neo4j and Gephi

Amir Aryani (1), Jingbo Wang (2), Hao Zhang (1), Andy Xiang (1), Zhaolian Zhou (1), Kun Wang (1)

(1) Australian National University, Australia; (2) National Computational Infrastructure (NCI)

The goal of this presentation is to provide insight into the potential interoperability between open scholarship systems. We demonstrate how to export publication metadata from a DSpace repository and link this information to ORCIDs (researchers), funding records (grants) and research data using the Research Graph model and open source software. Furthermore, we demonstrate how to load this information into a Neo4j graph database, which enables us to run queries such as finding publications related to a grant across multiple degrees of separation. Finally, we will use the Gephi visualisation tool to plot the large graph and identify clusters of research activity.
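
To make the "multiple degrees of separation" query concrete, here is a small in-memory stand-in written in plain Python rather than against Neo4j itself (in Neo4j, a query of this kind would typically be written as a variable-length path pattern in Cypher). The node identifiers and edges are invented for illustration and do not come from the Research Graph dataset.

```python
from collections import deque

# Hypothetical links between grants, researchers, datasets and publications.
edges = [
    ("grant:G1", "researcher:R1"),
    ("researcher:R1", "publication:P1"),
    ("researcher:R1", "dataset:D1"),
    ("dataset:D1", "publication:P2"),
    ("researcher:R2", "publication:P3"),
]


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (a, b) links."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def related_publications(adj, start, max_hops):
    """Breadth-first search: publications within max_hops links of start."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in adj.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return {n: d for n, d in dist.items() if n.startswith("publication:")}


adj = build_adjacency(edges)
result = related_publications(adj, "grant:G1", 3)
print(result)
```

Raising or lowering `max_hops` widens or narrows the neighbourhood of the grant; publications with no path to the grant (here, the hypothetical `publication:P3`) are never returned.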

Static Repositories for Research Data

Peter Sefton, Michael Lake, Michael Lynch

University of Technology Sydney

This technical presentation discusses the use of static, file-system-based repositories for research data, drawing on (a) recent trends in web publishing, where static websites are being used to reduce complexity and risk in content management, (b) researcher practice, where repositories of data are often organised using files rather than a repository application, particularly where data needs to be used in a High Performance Computing (HPC) environment, and (c) historical approaches to repository development such as the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) Static Repository.

One of the authors (Lake) developed a simple static repository system for speleological (cave) data in NSW, known as the Cave Archive and Versioning Experiment (CAVE). We will use Lake's repository as an example and show how the techniques he developed for collecting metadata about static files and building a static website from them can be generalised, using high-quality linked-data metadata.
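
The general idea of collecting metadata about static files and building a static website from them can be sketched in a few lines of standard-library Python. This is not the CAVE implementation: the function names, the chosen metadata fields (relative path, size, SHA-256 checksum) and the single `index.html` page are assumptions for illustration, and a real system would add richer linked-data descriptions.

```python
import hashlib
import html
from pathlib import Path
from urllib.parse import quote


def collect_metadata(root):
    """Walk a directory tree and record basic metadata for every file."""
    root = Path(root)
    records = []
    for path in sorted(root.rglob("*")):
        # Skip directories and the generated index page itself.
        if path.is_file() and path.name != "index.html":
            data = path.read_bytes()
            records.append({
                "path": path.relative_to(root).as_posix(),
                "bytes": len(data),
                "sha256": hashlib.sha256(data).hexdigest(),
            })
    return records


def render_index(records, title="Static data repository"):
    """Render the metadata records as a single static HTML page."""
    items = "\n".join(
        f'<li><a href="{quote(r["path"])}">{html.escape(r["path"])}</a> '
        f'({r["bytes"]} bytes, sha256 {r["sha256"][:12]})</li>'
        for r in records
    )
    return (
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8">'
        f"<title>{html.escape(title)}</title></head>\n"
        f"<body><h1>{html.escape(title)}</h1>\n<ul>\n{items}\n</ul></body></html>\n"
    )


def build_site(root):
    """Collect metadata under root and write index.html alongside the data."""
    records = collect_metadata(root)
    (Path(root) / "index.html").write_text(render_index(records), encoding="utf-8")
    return records
```

Calling `build_site` on a data directory leaves the files where they are, so the same tree can still be read directly in an HPC job, and the generated page can be served by any plain web server.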

