Publishing Data: Workflows Case Statement
Researchers are increasingly encouraged or required to make their research data available for reuse, but they often feel there are insufficient incentives for submitting and publishing data, which results in low submission rates. Moreover, even when research data are preserved and submitted, they often come with only a bare minimum of metadata, which inhibits reuse.
Why is this? Established and emerging workflows exist for selected disciplines that enable the publishing of data, and some provide credit via citation mechanisms. In most disciplines, however, researchers are simply not aware of such workflows, and these workflows may not be applicable without significant modification. Information about workflows is therefore crucial for researchers, and for the stakeholders supporting them, to understand the options available for practicing open science. Workflows that provide persistence, quality control and access are all crucial for improving discoverability and for enabling efficient and reliable reuse of research data.
The objectives of this Working Group are to analyze a representative range of existing and emerging workflows and standards for data publishing, including deposit and citation, and to provide reference models and implementations for use in new workflows.
We will report on:
- Investigation and classification of current workflows for publishing data, including a brief gap analysis across disciplines for the identified use cases.
- Identification of a smaller set of reference models covering a range of such workflows, to include:
  - when and where QA/QC and data peer review fit into the publishing process (the broader subject of peer review itself is proposed for a future, separate working group)
  - the roles of researchers, institutions, data centers, publishers, funders, service providers and the wider community in the data publishing process
  - key barriers for the identified use cases
- Selection of key use cases and organizations in which components of a reference model can be applied and promoted to the wider community, working closely with other WGs under the Publishing Data Interest Group.
We will build on the work of major past and current initiatives in which many of the working group members played leading roles. While these initiatives focused mainly on mature examples, particularly in the Earth Sciences, this group will address a much broader, multidisciplinary range of use cases. It will classify workflow steps, components and roles, and eventually produce generic workflow models that, toward the end of the project, will be ready for testing and application, particularly in the context of the ICSU World Data System and major science publishers.
The deliverables will be:
- A report summarizing the results of the investigation of current workflows, including the gap analysis
- A classification of a representative range of workflow models, identifying in each case the stakeholders involved and their roles and responsibilities, and including where possible the likely associated resource and cost implications (working with the proposed RDA-WDS Costs of Publishing Data WG)
- Reference models summarizing the key characteristics of each class of workflow
- Application of key components of a reference model to one or more existing use cases, to illustrate the benefits of the reference model to researchers and organizations and the associated implications for the Working Groups on Costs, Publishing Services and Bibliometrics
Research communities and their institutions are considering (or, in fewer cases, already implementing) workflows on their campuses or on platforms such as discipline-specific or national data centers that allow their users to share and publish research data. Many of them have to reinvent the wheel because there is no central resource or knowledge base to guide their efforts. Generic workflows, individual use cases and best practices for publishing data would help them establish appropriate solutions, which might include local systems for data deposit.
Research data are usually part of a network of scholarly objects, such as documentation, lab books and journal articles. Such clusters of information are expected to continue evolving and to become more complex, as users expect to navigate seamlessly within them and to discover and access related information without major additional effort. Supporting this requires a detailed understanding of the workflows and publishing outlets available today. The main challenge will be identifying generic model elements while accounting for discipline-specific features. We will cover the relevant steps of the research lifecycle, from depositing data in repositories or dedicated data centers to data articles and data journals. Identifying the steps in publishing workflows, and who is responsible for each task, can ease the uncertainty of those encountering data publication for the first time and offer guidance on the emerging and established tools used in more advanced data-sharing communities.
In classifying current workflows, we will establish general models that accommodate the specific practices of individual communities. The result will be reference models and components offering guidance to the wider community, from beginners to more advanced data publishers. This resource will be of use to any stakeholder group involved in publishing data. Repositories are often unaware of journal workflows and vice versa; understanding the other parts of this complex endeavor helps each party see its role in the wider context. Such understanding is also useful in setting up mechanisms to link data and publications. Librarians have a role here in helping researchers find repositories that suit both their discipline and their publishing intentions, and in offering guidance on the respective journal and repository workflows. The working group's membership comprises representatives of all these stakeholder groups, ensuring coverage of the wide range of use cases, best practices and viewpoints already emerging.
An important part of the workflow analysis will cover the use of persistent identifiers, in particular Digital Object Identifiers (DOIs), which enable persistent links between digital objects as well as accurate data citation. Data publication that enables data citation is a key incentive for making data accessible. Furthermore, persistent identifiers provide an interoperable framework across platforms, publishers, repositories and other services.
In the second phase, we plan to build on this analysis by testing real implementations of generic workflow model components in new scenarios. This offers mutual benefit: providers can test the applicability of their tools and raise awareness of them, while those implementing new workflows become able to offer their communities the benefits of publishing data.
Who will benefit
The main beneficiaries of the analysis and subsequent testing carried out by this working group are researchers and the main stakeholders involved in publishing and managing data and in supporting scholarly communication. Better services and strategies for joint workflows will in turn influence the wider research community. Discoverability and reuse of data will be enhanced, particularly through the unique collaboration among all relevant service groups participating in this working group.
Authors will have clear channels for publishing data and, crucially, will be able to derive credit from adhering to best practice in managing and sharing their data. Funders will be able to track the research data they have funded, measure its impact and guard against unnecessary duplication. Researchers will be able to work faster and gain deeper insights beyond their immediate subject domain. Librarians and data center experts will become an integral part of the ecosystem, for example through their expertise in cataloguing and metadata production; for them, reference models for workflows serve as templates for ingest, QA, archiving and dissemination. Publishers and other service providers can use reference models to link data with publications and to provide innovative solutions that enhance access to and analysis of the published data. Workflows for exchanging data and metadata between the stakeholders who hold them will help policy makers, funders and the public better ensure that the data underpinning published research are being made accessible cost-effectively. Policy makers and the public will be able to navigate the knowledge landscape with increased confidence in its veracity.
After the first phase, the identified use cases, mapped to generic model elements where appropriate, will allow a unique assessment of today's data publishing workflows. This will directly inform and influence the work of all participating stakeholder groups, from repository providers to journal editors. These first steps enable information exchange across stakeholder groups and thus facilitate the adoption of best practices from other disciplines or of joint workflows.
The proposed test implementations of generic workflow model components in new scenarios will benefit providers and users as described above, and will help communities meet key international and national government mandates to promote and incentivize data sharing for the benefit of all.
Working closely with the associated RDA-WDS Publishing Data Working Groups proposed on Bibliometrics, Costs and Publishing Services, we will identify the role and implementation of emerging metrics and impact/assessment tools in our test workflows and disseminate best practice. This will further the development of data-aware incentive systems in research.