DATA MANAGEMENT AND SHARING PLAN

Element 1: Data Type
A. Types and amount of scientific data expected to be generated in the project:
Owing to the multidisciplinary nature of this ICEMR, multi-format data from various sources, including environmental sensors, genomics, GIS, remote sensing, entomological surveys, and structured demographic and clinical information, will be captured electronically through the proposed data management system. In addition, this study will provide site-specific, representative cross-sectional data on a target sample of 2,500 participants across the four study sites in Thailand and Malaysia.
We expect to generate the following data file types and formats during this project: tabular (.CSV), portable document format (PDF), images (.TIFF), and ESRI shapefile (.SHP). The total size of the data collected is projected to be 300 GB.
B. Scientific data that will be preserved and shared, and the rationale for doing so:
The raw data will be stored within this project's proposed data management system. The cleaned, item-level data for all variables will be shared openly through selected data repositories, along with example quantifications and transformations from initial raw data. All personally identifiable information will be removed before sharing. Final files used to generate specific analyses to answer the Specific Aims and related results will also be shared. The rationale for sharing only cleaned data is to foster ease of data reuse.
C. Metadata, other relevant data, and associated documentation:
To facilitate the interpretation and reuse of the data, the survey questionnaire, data dictionary, collection protocol, and other associated documentation will be generated and deposited into a repository along with all shared datasets. The data dictionary will define and describe all variables in the dataset and how they link to the survey questionnaire. The protocol will include study design, method description, and tools. Documentation will be provided in PDF format.
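The link between a shared dataset and its data dictionary can be checked mechanically before deposit. The following is a minimal sketch only, assuming the dictionary is available as a mapping from variable name to description; the function name, variable names, and sample rows are hypothetical and are not taken from the project's actual instruments.

```python
import csv
import io


def undocumented_columns(csv_text, dictionary):
    """Return dataset columns that have no entry in the data dictionary."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # first row holds the variable names
    return [col for col in header if col not in dictionary]


# Hypothetical data dictionary: variable name -> description.
dictionary = {
    "participant_id": "Study-assigned identifier (not personally identifying)",
    "age_years": "Age at enrollment, in years",
}

# Hypothetical export with one variable missing from the dictionary.
sample = "participant_id,age_years,rdt_result\n001,34,positive\n"

print(undocumented_columns(sample, dictionary))  # ['rdt_result']
```

A check of this kind could be run on each exported file so that every shared variable is described before the dataset and its documentation are deposited together.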
Element 2: Related Tools, Software and/or Code:
All scientific data will be exported from the system database and shared in .CSV or .PDF format, which can be accessed and manipulated without specialized tools. Imaging and geospatial data, however, will be made available in .TIFF or .SHP format, which requires GIS tools such as ArcGIS Pro (proprietary, esri.com) or QGIS (free, qgis.org) to access and manipulate.
Element 3: Standards:
Formal data format standards for malaria research data have not yet been widely adopted. However, several standards from medical and health science have been implemented. Whenever possible, we will adopt the NIH-Endorsed Common Data Elements (CDEs) standard to structure and organize our data. Terminologies and variables not defined by CDEs will be determined according to best practices based on community agreements, general medical standards, and previous experience.
Element 4: Data Preservation, Access, and Associated Timelines
A. Repository where scientific data and metadata will be archived:
All scientific data and associated documentation described above in the “data to be shared” section will be deposited into VEuPathDB (veupathdb.org), GenBank (ncbi.nlm.nih.gov/genbank), or Dryad (datadryad.org), according to the data domain, and will be made available for public use or under restricted access as appropriate.
B. How scientific data will be findable and identifiable:
The repositories named in the section above provide metadata, persistent unique identifiers (PIDs) or digital object identifiers (DOIs), and long-term access. These repositories are supported by NIH, and shared scientific datasets are available under an Open Data License. All datasets will be indexed by the Clarivate (formerly Thomson Reuters) Data Citation Index, Scopus, and Google Dataset Search so the global research community can find and access the underlying data.
C. When and how long the scientific data will be made available:
All scientific data generated from this project will be made available after cleaning and quality assurance or at the time of acceptance of the initial publication, whichever occurs first. In general, genomic data will be released within three months after the data have been acquired. All scientific data will be accessible for a minimum of three years following the closeout of this grant.
Element 5: Access, Distribution, or Reuse Considerations
A. Factors affecting subsequent access, distribution, or reuse of scientific data:
Participants will consent to the broad sharing of de-identified data. De-identified data will be made available for reuse through the repositories described above. In addition, the scientific datasets will be available under an open data license.
B. Whether access to scientific data will be controlled:
Given the sensitive nature of the dataset, only de-identified data will be made freely available for public use in the data repositories mentioned above. Users who download and reuse the data agree to adhere to the open data license. In addition, public data users must register with the data repository and agree to the Terms of Use, which are designed to protect study participants by limiting data use to scientific research and statistical analysis and by prohibiting attempts to investigate or identify specific study participants.
C. Protections for privacy, rights, and confidentiality of human research participants:
To ensure participant consent for data sharing, IRB paperwork and informed consent documents will include language describing the plans for data management and sharing, describing the motivation for sharing, and explaining that personally identifying information will be removed. When the ICEMR-related datasets are stored in public repositories, additional safeguards will be taken to eliminate personal identifiers and protected health information (PHI) and to conceal specific dates and location information. All human subject data handling will comply with the NIH Clinical Terms of Award, the U.S. HIPAA Privacy and Security Rules, and IRB-approved protocol guidelines.
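The safeguards described above (removing identifiers, concealing specific dates, and concealing precise locations) can be illustrated with a short sketch. It is an illustration under stated assumptions, not the project's de-identification procedure: the field names, the date-shifting approach, and the one-decimal coordinate rounding are all hypothetical choices made for the example.

```python
from datetime import date, timedelta

# Hypothetical direct-identifier fields; the real list would come from
# the project's data dictionary and IRB-approved protocol.
DIRECT_IDENTIFIERS = {"name", "national_id", "phone"}
COORDINATE_FIELDS = {"latitude", "longitude"}


def deidentify(record, date_shift_days, coord_decimals=1):
    """Return a copy of record with identifiers dropped, dates shifted,
    and coordinates coarsened (assumed techniques, for illustration)."""
    cleaned = {}
    for key, value in record.items():
        if key in DIRECT_IDENTIFIERS:
            continue  # drop direct identifiers entirely
        if isinstance(value, date):
            # shift every date by the same offset to hide the true date
            cleaned[key] = value - timedelta(days=date_shift_days)
        elif key in COORDINATE_FIELDS:
            # round coordinates to hide the exact location
            cleaned[key] = round(value, coord_decimals)
        else:
            cleaned[key] = value
    return cleaned


# Hypothetical record, not real participant data.
record = {
    "name": "Example Person",
    "visit_date": date(2024, 3, 15),
    "latitude": 13.7563,
    "longitude": 100.5018,
    "parasite_density": 1200,
}
shared = deidentify(record, date_shift_days=10)
```

In this sketch a single offset is applied to every date in a record, so intervals between visits are preserved while the true calendar dates are not exposed.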
Element 6: Oversight of Data Management and Sharing
The Project Director (PD) and the Data Management Core Director will oversee lab and team data management and sharing activities. In addition, the PD, Core Director, and data management team will meet monthly to address broader issues of Data Management and Sharing Plan compliance, oversight, and reporting as part of the general stewardship, reporting, and compliance processes.