PLOS’ New Data Policy: Public Access to Data

February 24, 2014 Liz Silva Aggregators Open Access

UPDATE 7 MARCH: Please see new blog post

UPDATE 26 FEBRUARY: A flurry of interest has arisen around the revised PLOS data policy that we announced in December and which will come into effect for research papers submitted next month. We are gratified to see a huge swell of support for the ideas behind the policy, but we note some concerns about how it will be implemented and how it will affect those preparing articles for publication in PLOS journals. We’d therefore like to clarify a few points that have arisen and once again encourage those with concerns to check the details of the policy or our FAQs, and to contact us if we have not covered them.

Is the policy about what to share, or about how and where to share it?

There is nothing new in the policy about what types and forms of data should be shared. As we said in December, “PLOS journals have requested data be available since their inception, but we believe that providing more specific instructions for authors regarding appropriate data deposition options, and providing more information in the published article as to how to access data, is important for readers and users of the research we publish.” As we have further clarified, the Data Policy states that the ‘minimal dataset’ consists “of the dataset used to reach the conclusions drawn in the manuscript with related metadata and methods, and any additional data required to replicate the reported study findings in their entirety. This does not mean that authors must submit all data collected as part of the research, but that they must provide the data that are relevant to the specific analysis presented in the paper.” The ‘minimal dataset’ does not mean, for example, all data collected in the course of research, or all raw image files, or early iterations of a simulation or model before the final model was developed. We continue to request that the authors provide the “data underlying the findings described in their manuscript”. Precisely what form those data take will depend on the norms of the field and the requests of reviewers and editors, but the type and format of data being requested will continue to be the type and format PLOS has always required.

What is changing is that authors need to indicate where the data are housed, at the time of submission. We want reviewers, editors and readers to have that information transparently available when they read the article. We strongly encourage deposition in subject area repositories (such as GenBank for sequences, clinicaltrials.gov for clinical trials data, and PDB for structures) where those exist, and in unstructured repositories such as Dryad or FigShare where there is no appropriate subject-domain repository. Some institutions provide appropriate centralized repositories for their researchers’ data. We recognize that small amounts of data may be wholly included within the article itself, as they are now, and that for some other smaller data types it might be most appropriate to include Supplementary Files with the article, although we would also like to ensure these files are used optimally.

What if my dataset is too large for any of these solutions?

We appreciate that some people now work with datasets that are too large for any of these solutions, and we would like to work with them to develop methods of sharing suited to these instances. Authors in this situation should submit their manuscripts, noting the details, and we will work with them to arrive at a solution.

What about human patient data?

As with some other types of data, it is often not ethical or legal to share patient data universally, so we provide guidance on the routes available to authors of such data, and we encourage anyone with concerns of this type to contact the journal they would like to submit to, or the data team at [email protected].

Concerns about someone else benefiting from the data

Some raise the concern that, having collected data, they want to be the ones to analyze it and benefit from it. In our view, this sentiment applies to the period before publication. But after publication (in particular, after publication in an Open Access journal) the data should be available for re-use by others. This is not just our view: many institutions and funding agencies (e.g. NIH) now make data sharing a requirement. We understand that some authors will not want to share data, just as some choose not to make their articles available Open Access, but trust that most authors publish their work precisely in order to allow others to benefit from it.

Liz Silva, PLOS ONE
Theo Bloom, PLOS Biology
Emma Ganley, PLOS Biology
Maggie Winker, PLOS Medicine

ORIGINAL POST: Access to research results, immediately and without restriction, has always been at the heart of PLOS’ mission and the wider Open Access movement. However, without similar access to the data underlying the findings, the article can be of limited use. For this reason, PLOS has always required that authors make their data available to other academic researchers who wish to replicate, reanalyze, or build upon the findings published in our journals.

In an effort to increase access to this data, we are now revising our data-sharing policy for all PLOS journals: authors must make all data publicly available, without restriction, immediately upon publication of the article. Beginning March 3rd, 2014, all authors who submit to a PLOS journal will be asked to provide a Data Availability Statement, describing where and how others can access each dataset that underlies the findings. This Data Availability Statement will be published on the first page of each article.

~~What do we mean by data?~~

“Data are any and all of the digital materials that are collected and analyzed in the pursuit of scientific advances.” Examples could include spreadsheets of original measurements (of cells, of fluorescent intensity, of respiratory volume), large datasets such as next-generation sequence reads, verbatim responses from qualitative studies, software code, or even image files used to create figures. Data should be in the form in which it was originally collected, before summarizing, analyzing or reporting.

What do we mean by publicly available?

All data must be in one of three places:

the body of the manuscript; this may be appropriate for studies where the dataset is small enough to be presented in a table
in the supporting information; this may be appropriate for moderately-sized datasets that can be reported in large tables or as compressed files, which can then be downloaded
in a stable, public repository that provides an accession number or digital object identifier (DOI) for each dataset; there are many repositories that specialize in specific data types, and these are particularly suitable for very large datasets

Do we allow any exceptions?

Yes, but only in specific cases. We are aware that it is not ethical to make some datasets fully public, such as private patient data or specific information relating to endangered species. Some authors also obtain data from third parties and therefore do not have the right to make that dataset publicly available. In such cases, authors must state that “Data is available upon request”, and identify the person, group or committee to whom requests should be submitted. The authors themselves should not be the only point of contact for requesting data.

Where can I go for more information?

The revised data sharing policy, along with more information about the issues associated with public availability of data, can be reviewed in full at:

http://www.plos.org/data-access-for-the-open-access-literature-ploss-data-policy/

http://www.plos.org/update-on-plos-data-policy/

Image: Open Data stickers by Jonathan Gray