Why should I share my final research data?
Data sharing achieves many important goals for the scientific community, such as:
reinforcing open scientific inquiry;
encouraging diversity of analysis and opinion;
promoting new research, testing of new or alternative hypotheses and methods of analysis;
supporting studies on data collection methods and measurement;
facilitating education of new researchers;
enabling the exploration of topics not envisioned by the initial investigators;
permitting the creation of new datasets by combining data from multiple sources.
Who benefits from data sharing?
Everyone benefits, including investigators, funding agencies, the scientific community, and, most importantly, the public. Data sharing provides more effective use of NIH resources by avoiding unnecessary duplication of data collection. It also conserves research funds to support more investigators. The initial investigator benefits, because as the data are used and published more broadly, the initial investigator's reputation grows.
Is data sharing widely accepted as a good practice?
National scientific organizations have made a commitment to the sharing and archiving of data through their ethical codes (e.g., the American Sociological Association) or publication policies (e.g., the American Psychological Association). More than 15 years ago, the National Academy of Sciences described the benefits of sharing data. (See http://books.nap.edu/catalog/2033.html) For many years, the National Science Foundation (NSF) Economics Program has required data underlying an article arising from an NSF grant to be placed in a public archive. Similar expectations exist at the National Institute of Justice. Moreover, many scientific journals require that authors make available the data included in their publications. In the biological sciences, protein and DNA sequences are made available to researchers through data archives, such as GenBank. Since 1996, NIH has required data sharing in several areas, such as DNA sequences, mapping information, and crystallographic coordinates.
What do you mean by final research data?
By "final research data", we mean recorded factual material commonly accepted in the scientific community as necessary to validate research findings. Final research data do not include laboratory notebooks, partial datasets, preliminary analyses, drafts of scientific papers, plans for future research, peer review reports, communications with colleagues, or physical objects, such as gels or laboratory specimens.
Does "final research data" include data that were not originally produced under an NIH grant or contract?
Sometimes. For example, where NIH support is sought to transform or link datasets (as opposed to producing new data), the investigator should include a data-sharing plan in the application.
What do you mean by unique data?
By "unique data", we mean data that cannot be readily replicated. Examples of studies producing unique data include: large surveys that are too expensive to replicate; studies of unique populations, such as centenarians; studies conducted at unique times, such as a natural disaster; and studies of rare phenomena, such as rare metabolic diseases.
What kinds of data are candidates for sharing?
Potentially all kinds of data are candidates for sharing, but unique data are especially important. Some biologic sciences already have data-sharing plans in place, such as genetic mapping. But other basic science data are also amenable to sharing. Data from human subjects (e.g., surveys, clinical studies) also can be shared if the identity and privacy of research participants can be protected.
Can you give me some examples of data that have been shared?
Examples of shared epidemiologic data include the Framingham Heart Study, the Honolulu Heart Program, the Atherosclerosis Risk in Communities Study, Epidemiology of Chronic Disease in the Oldest Old, and the Iowa 65+ Rural Health Study. Examples of shared data from clinical trials include the Asymptomatic Cardiac Ischemia Pilot, the Intermittent Positive Pressure Breathing Study, and the Safety and Efficacy Trial of Zidovudine for Asymptomatic HIV Infected Individuals. Examples of shared datasets from the basic sciences include a growing number of genome sequences and maps, as well as protein and nucleotide databases (see ENTREZ at http://www.ncbi.nlm.nih.gov/Database/index.html and other resources for molecular biology at the National Center for Biotechnology Information at http://www.ncbi.nlm.nih.gov).
Data from my studies are generated from a very small number of rats, and I publish the final data. Am I expected to provide these data to other investigators as well?
Publishing these final data constitutes an acceptable mechanism for sharing data.
How soon after data collection am I obliged to share the final data?
Recognizing that the value of data often depends on their timeliness, data sharing should occur in a timely fashion. NIH expects the timely release and sharing of data to be no later than the acceptance for publication of the main findings from the final dataset. This time point will be influenced by the nature of the data collected. Data from small studies can be analyzed and submitted for publication relatively quickly. If data from large epidemiologic or longitudinal studies are collected over several discrete time periods or waves, data should be released in waves as data become available or main findings from waves of the data are published. NIH recognizes that the investigators who collected the data have a legitimate interest in benefiting from their investment of time and effort. NIH continues to expect that the initial investigators may benefit from the first and continuing use, but not from prolonged exclusive use. While NIH also understands that an institution's desire to exercise its intellectual property rights may justify a need to delay disclosure of research findings, a delay of 30 to 60 days is generally viewed as a reasonable period for such activity.
Does data sharing pertain only to published data?
No. Data-sharing plans should encompass all data from funded research that can be shared without compromising individual subjects' rights and privacy, regardless of whether the data have been used in a publication. Furthermore, data sharing prior to the publication of major results is encouraged in many instances, for example, when data are collected to provide a resource for the scientific community (as in the case of many large surveys).
Due to circumstances beyond my control (an earthquake!), I was unable to recontact a substantial portion of the sample in my longitudinal study. I was planning to put my data in an archive, but the resulting high rate of attrition makes the data minimally useful. Should I still archive the final dataset?
Investigators need to find a balance between the value of the final data and the costs associated with archiving. If the data are of limited usefulness, then it is probably not worth the expense and effort of putting them in an archive. However, if the investigator has published results based on this dataset, then the dataset should be shared.
I am preparing an SBIR application. Am I required to submit a data-sharing plan?
Yes. The specific nature of the data you will collect will determine whether or not you may share the final dataset. If the final data are not amenable to sharing, for example, if they are proprietary, then you need to explain this in your application. Under the Small Business Act, SBIR grantees may withhold their data for 4 years after the end of the award. The Small Business Act provides authority for NIH to protect from disclosure and nongovernmental use all SBIR data developed from work performed under an SBIR funding agreement for a period of 4 years after the closeout of either a Phase I or Phase II grant unless NIH obtains permission from the awardee to disclose these data. The data rights protection period lapses only upon expiration of the protection period applicable to the SBIR award, or by agreement between the small business concern and NIH.
I don't want to share my data, which were generated under an NIH grant. Can I be forced to do so?
When the PI and the authorized institutional official sign the face page of an NIH application, they are assuring compliance with policies and regulations governing research awards. NIH expects grantees to follow these rules and to conduct the work described in the application. Thus, if an application describes a data-sharing plan, NIH expects that plan to be enacted. In some instances, NIH may make data sharing a term and condition of award.
Under specific circumstances, your data also may be accessible through the Freedom of Information Act (FOIA). If your competitive grant was awarded after April 17, 2000 and if your data were cited in a Federal regulation or administrative order, then your data may also be accessible through FOIA. (See http://grants.nih.gov/grants/policy/a110/a110_guidance_dec1999.htm).
Will the data-sharing plan affect the priority score of my application?
No. Reviewers will not factor the proposed data-sharing plan into the determination of scientific merit or priority score. Program staff is responsible for overseeing the data-sharing policy and for assessing the appropriateness and adequacy of the proposed data-sharing plan. Program concerns must be resolved prior to making any award.
My research, which seeks support from both the public and private sectors, will involve proprietary data. How do I deal with the data-sharing issue in my application?
NIH recognizes that there may be circumstances where a cofunder has requested restrictions on data sharing as a condition of funding. These restrictions should be identified in the application and a proposal made about how data from the cofunded project will be shared. Should you believe that you are unable to share any of the data, your justification will be considered by NIH program staff.
I'm a busy investigator. I don't have time to process requests for my data. What should I do?
In addition to publishing small datasets, there are several alternatives to responding to each separate request to share data (e.g., putting data in an archive or restricted access facility, and setting up a web site for data access). Archives and data enclaves provide technical assistance for users with questions or problems and may spare busy investigators time.
Can I share data with colleagues under my own auspices?
Yes. Your data-sharing plans should indicate the criteria for deciding who can receive your data and whether or not you will place any conditions on their use. Data should be made as widely and freely available as possible while safeguarding the confidentiality of the data and privacy of participants. You should not place limits on the questions or methods others might pursue nor should you require co-authorship as a condition for receiving the data.
Should the data source be cited or acknowledged in papers that rely on shared data?
It is appropriate to acknowledge the source of data upon which a manuscript is based. Many investigators include this information in the methods and/or reference sections of their manuscripts. Journals generally include an acknowledgement section, in which the authors can recognize people who helped them gain access to the data. However, you should check the policies of the journal to which you plan to submit.
Should I consider contributing my research data to a data archive?
Maybe. Archives are organizations that collect and distribute data. They understand what is needed to prepare data for wider distribution and documentation for users. They provide stable, reliable, and cost-effective means for distributing data. They also provide protections for the dataset and technical assistance for requestors.
Where can I find guidance on preparing data for sharing and archiving?
Guidance is available from a variety of sources. For example, the Inter-University Consortium for Political and Social Research at the University of Michigan has prepared an excellent set of guidelines for preparing data for archiving. While these guidelines were written with social science data in mind, they are broadly applicable. See http://www.icpsr.umich.edu/files/ICPSR/access/dataprep.pdf
For molecular biology information, the National Center for Biotechnology Information (NCBI), a division of the National Library of Medicine (NLM) at the National Institutes of Health, is ready to assist researchers who have genome-specific and molecular data to submit. For more information about submitting and accessing NCBI data, see the NCBI Website at http://www.ncbi.nlm.nih.gov/Genbank/index.html.
How do I pay for preparing data for sharing and archiving?
NIH recognizes that it takes time and money to prepare data for sharing. You can request funds for data archiving and sharing as part of your grant application for collecting the data. If you have already collected the data, you may want to ask your NIH Project Officer about a competitive or administrative supplement. NIH recommends that you consider procedures and costs for data sharing during the application process rather than after the data have been collected.
Should I address data sharing in my NIH application?
Yes. By the October 1, 2003 application receipt date, NIH requests that all extramural applicants seeking $500,000 or more in direct costs in any one year provide a data-sharing plan in their applications.
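The threshold described above can be illustrated with a minimal sketch. The function name and the sample budgets are hypothetical and serve only as illustration; the sketch assumes you have a list of requested direct costs, one figure per budget year, and it checks only whether a plan is requested in the application.

```python
# Threshold taken from the answer above: $500,000 or more in direct costs
# in any one year.
PLAN_THRESHOLD_DOLLARS = 500_000


def plan_requested(direct_costs_by_year):
    """Return True if any single year's direct costs meet the threshold.

    direct_costs_by_year is a hypothetical input: one dollar amount per
    budget year of the application.
    """
    return any(cost >= PLAN_THRESHOLD_DOLLARS for cost in direct_costs_by_year)


# Illustrative budgets only, not real applications.
print(plan_requested([450_000, 500_000, 300_000]))  # one year reaches the threshold
print(plan_requested([499_999, 499_999]))           # no year reaches the threshold
```

Note that the test is applied year by year, not to the total across all years.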
What do I need to include in my application and where do I put the information about data sharing?
Scientists submitting grant, cooperative agreement, or contract applications should include a data-sharing plan, or provide justification for the absence of such a plan, in a brief paragraph to be placed immediately after the Research Plan Section (i.e., immediately after PHS 398 Section I. Letters of Support in the Research Plan Section of their application) so it does not count toward the application page limit. Additional information on data sharing might be included in other sections of the application, as appropriate. For example, if you are producing a large dataset that will become an important resource for the scientific community, you probably want to mention this in the significance section. If you are requesting funds to prepare, document, and archive the data, you would want to include relevant information in the budget and budget justification sections. In the Human Subjects section of the application, you should discuss the potential risks to research participants posed by data sharing and steps you will take to address those risks.
The informed consent form for my recently completed study states explicitly that only my research team will see the data provided and that we will not share the data. Am I now expected to share it?
No, but if you plan to collect additional data from those subjects under a grant with a data-sharing plan, you should revise the consent procedure to be consistent with the data-sharing plan. In preparing and submitting a data-sharing plan during the application process, investigators should avoid developing or relying on consent processes that promise research participants not to share data with other researchers. Such promises should not be made routinely or without adequate justification described in the data-sharing plan.
How can I protect the privacy of my subjects?
It is the responsibility of the investigators, their IRB, and their institution to protect the rights of participants and the confidentiality of their data. Data should be redacted to strip all individual identifiers, and effective strategies should be adopted to minimize risk of disclosing a participant's identity. Options to protect privacy include: withholding part of the data, statistically altering the data in ways that will not compromise secondary analyses, requiring researchers who seek data to commit to protect privacy and confidentiality, and providing data access in a controlled site, sometimes referred to as a data enclave. Some investigators use hybrid methods, releasing a redacted dataset for general use but providing access to more sensitive data through a user contract or data enclave. In most instances, sharing data is possible without compromising participant confidentiality and privacy.
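As a rough illustration of two of the options above (stripping individual identifiers and altering a value so that rare cases stand out less), here is a minimal sketch. The column names, the salt, and the age cap of 90 are all assumptions made for the example, not NIH requirements, and steps like these do not by themselves rule out deductive disclosure.

```python
import hashlib

# Hypothetical set of direct-identifier fields; each study would define its own.
DIRECT_IDENTIFIERS = {"name", "address", "phone", "ssn"}


def redact_record(record, salt):
    """Return a copy of record without direct identifiers.

    The participant ID, if present, is replaced by a salted SHA-256 digest
    (truncated to 12 hex characters) so that rows can still be linked within
    the released dataset without exposing the original ID.
    """
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "participant_id" in cleaned:
        raw = (salt + str(cleaned["participant_id"])).encode("utf-8")
        cleaned["participant_id"] = hashlib.sha256(raw).hexdigest()[:12]
    return cleaned


def top_code_age(record, cap=90):
    """Return a copy of record with age capped at `cap` (an assumed value)."""
    altered = dict(record)
    if "age" in altered:
        altered["age"] = min(altered["age"], cap)
    return altered


# Illustrative record only; not real participant data.
record = {"participant_id": 17, "name": "Jane Doe", "age": 97, "score": 4.2}
released = top_code_age(redact_record(record, salt="project-secret"))
print(released)
```

The salt should be kept private by the research team; anyone who knows it could recompute the digests from a list of candidate IDs.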
Can institutions and investigators subject to the Federal Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule share data in accord with the NIH Data Sharing policy?
Yes. NIH recognizes that data sharing may be complicated or limited, in some cases, by institutional policies or local IRB rules, as well as by local, state and Federal laws and regulations like the Privacy Rule. To protect the rights and privacy of people who participate in NIH-sponsored research, data intended for broader use should be free of identifiers that would permit linkages to individual research participants, and exclude variables that could lead to deductive disclosure of the identity of individual subjects. When data sharing is limited, applicants should explain such limitations in their data sharing plans.
I collect data on sensitive and, sometimes, illegal behaviors. Are these data too sensitive to be shared?
Not necessarily. The collection of sensitive data does not preclude sharing. For example, the National Center for Chronic Disease Prevention and Health Promotion at CDC operates the Youth Risk Behavior Surveillance System (YRBSS), available at http://www.cdc.gov/nccdphp/dash/yrbs/, which provides data on six health risk behaviors among youth: unintentional injuries and violence, tobacco use, alcohol and other drug use, sexual behaviors, dietary behaviors, and physical activity. Similarly, data from the National Survey of Family Growth, which includes statistical data on family life, marriage and divorce, contraception, sexual experience, pregnancy, and infertility, can be obtained from the National Center for Health Statistics.
Sensitive data can be shared so long as appropriate privacy safeguards are in place. Investigators must determine if and how the rights and privacy of the subjects can be protected. And investigators collecting data on sensitive and illegal behaviors should obtain a Certificate of Confidentiality (http://grants.nih.gov/grants/policy/coc/) to protect against the involuntary release of data that could identify research participants.
Can data from a clinical trial be shared?
It depends. Participants' privacy must be protected in accord with all applicable laws and regulations. Clinical trial datasets are frequently rich in items that could potentially identify individual subjects. For example, many early phase trials use small samples, which make it difficult to protect the privacy of the participants. Researchers who are planning clinical trials and intend to share the resulting data should think carefully about the study design, the informed consent documents, and the structure of the resulting data prior to the initiation of the study.
There are many precedents for sharing of clinical trial data. For example, data from a number of clinical trials supported by the National Heart, Lung, and Blood Institute (NHLBI) are available for research use (see http://www.nhlbi.nih.gov/resources/deca/directry.htm). The National Institute of Allergy and Infectious Diseases (NIAID) also lists the clinical trial datasets that it has made available for public use through the National Technical Information Service (NTIS) (see http://www.niaid.nih.gov/research/aidsdata.htm).
Are data on DNA and protein sequences archived?
Yes. For example, GenBank (http://www.ncbi.nih.gov/Genbank/) and Entrez (http://www.ncbi.nlm.nih.gov/Entrez/) archive gene sequencing data. The sharing of materials, data, and software in a timely manner has been an essential element in the rapid progress that has been made in the genetic analysis of mammalian genomes.
I did not request support for sharing data in my application, which was funded. Can I charge requestors for the costs associated with sharing the data?
Yes, as long as such costs are reasonable and reflect the actual costs of complying with the request. These expenses for preparing and shipping the data might include costs of personnel, computing time, supplies, and other directly related expenses. NIH requirements for accountability for various types of income under NIH grants are specified elsewhere (see http://grants.nih.gov/grants/policy/nihgps_2003/NIHGPS_Part8.htm#_Toc54600138).
I am working on a select pathogen and cannot share the data for reasons of national security. Is this an acceptable reason for not sharing?
If I am required to submit a revised data-sharing plan, what do I need to do?
As is the case with PIs who submit any additional or revised application material, your revised data-sharing plan must be signed by your institutional official and by you.
I want to request a dataset from a recent publication. How do I do this?
You should check the publication to see if reference is made to an archive, an enclave, or a Website where the data might be available. If no such information is provided, you may wish to send a letter to the PI to see if the data are available for sharing, and where you might be able to get the data and associated documentation.
I am a PI on a P30 center grant with a budget in excess of $500,000 (direct costs) in each year. Some of the research projects that collect survey data benefit by the infrastructure support provided by the P30 but these research projects are not funded by NIH. Am I still expected to share data from these research grants?
If any NIH support (i.e., partial support) is provided for resource development, even if those research resources were developed primarily with non-NIH funds, then those research resources must be shared in line with NIH policy as if NIH funded the entire project. It should be emphasized that although a data sharing plan is only required of grants awarding direct costs of $500,000 or more in any one year, data sharing itself (without a specific plan submission) continues to be a requirement of all NIH-funded grants. If the P30 maintains core resources that actually house and are the final repository of the data, e.g., a high throughput array analysis core, then any project using the center’s resources would be subject to the center’s data sharing plan.