DATA MANAGEMENT AND SHARING PLAN
Element 1: Data Type
A. Types and amount of scientific data expected to be generated in the project:
As detailed in the Research Strategy Section of Project 1 and Project 2, we propose the generation of three types of data for this project:
- Ecological data on mosquito distribution and density for larval and adult malaria vectors from the 4 study sites. We expect that several hundred larval habitats and human dwellings will be sampled every year.
- Genomic data of mosquitoes and malaria parasites: mitochondrial COI and Cyt b gene sequences for Anopheles stephensi mosquitoes; microsatellite marker genotypes for mosquitoes and malaria parasites; Pfcpmp amplicon deep sequencing of malaria parasites; and antimalarial drug resistance markers and Pfhrp2/3 gene deletions. We plan to generate mitochondrial gene sequences and microsatellite genotypes for >1,000 mosquitoes (depending on the number of An. stephensi-positive sites), and to genotype ~2,000 parasite isolates for microsatellites, antimalarial drug resistance markers, and Pfhrp2/3 gene deletions.
- Demographic, clinical, and malaria infection data from health centers and hospitals under malaria passive case detection (PCD). We anticipate that more than 10,000 PCD records will be obtained each year. Clinical and laboratory data will be entered by each site into the DFdiscover secure electronic data capture (EDC) system. Each user will be given role-specific access to the EDC, controlled through individual usernames and passwords. Research-specific laboratory results will be recorded in the central laboratory’s database and merged with the final individual-level data for the study.
B. Scientific data that will be preserved and shared, and the rationale for doing so:
- Ecological data on mosquito distribution and density that will be preserved include sampling date, location, vector species name, density, trapping methods, environmental variables recorded during field sampling (e.g., predators, land use and land cover), and intervention method used.
- Genomic data of mosquitoes and malaria parasites: The genetic data will be deposited in GenBank or VectorBase, including species name, sampling location, date, marker name, and environmental variables associated with mosquito collection. Genomic data sets will be provided in FASTQ, CRAM, and VCF formats.
- Clinical data: Clinical data that will be preserved and shared include demographic data, insurance status, medical history, medications, lab tests performed by the clinical site and central laboratory, and physical exam data, among other data pertinent to the study. Clinical data from case report forms will be preserved and shared. Clinical data sets will be submitted to ClinEpiDB in .CSV format.
C. Metadata, other relevant data, and associated documentation:
The protocol, case report forms, data dictionary, and code book will be made accessible in data repositories where data are shared. For data submitted to ClinEpiDB, variable-level metadata will be provided using the ClinEpiDB Codebook, which is a templated data dictionary, and will include details of Common Data Elements, definitions, and standards used for data collection and sharing.
Element 2: Related Tools, Software and/or Code
Clinical and laboratory data will be collected in the electronic data capture system (DFdiscover) and analyzed using statistical packages in SAS. For genomic and ecological data analysis, we will use publicly available software. If new analytical methods are developed, open-source code will be made available through a GitHub repository.
Element 3: Standards
Malaria data from PCD will be standardized to CDISC format whenever possible. Data will be collected electronically on tablets running ODK-compliant software. Shared data will be de-identified, and original data will be maintained at the investigator’s institution. We will also consult ClinEpiDB to ensure that our data organization structure is consistent with their standards for convenient data sharing. Genomic data will conform to the online submission standards for GenBank or VectorBase.
Element 4: Data Preservation, Access, and Associated Timelines
A. Repository where scientific data and metadata will be archived:
Malaria clinical data from PCD will be deposited to ClinEpiDB. Genomic data will be deposited to GenBank or VectorBase.
B. How scientific data will be findable and identifiable:
Clinical and laboratory data will be findable and identifiable using a DOI created by ClinEpiDB. Genomic data will be findable and identifiable using study accession numbers and sequence record accession numbers generated by NCBI.
C. When and how long the scientific data will be made available:
The study team will begin to submit the clinical data 12 months after the project begins. The research community will have access to data as soon as ClinEpiDB, GenBank or VectorBase release them after approval. Data will be preserved within the repositories for at least three years following the completion of the grant.
Element 5: Access, Distribution, or Reuse Considerations
A. Factors affecting subsequent access, distribution, or reuse of scientific data:
Clinical data will be shared with controlled access in ClinEpiDB for general research use. Consent for broad data sharing will be obtained from all research participants.
B. Whether access to scientific data will be controlled:
Clinical data and mosquito and parasite genomic data will be shared through restricted access for general research use (i.e., made available by a data repository only after approval). Clinical and genomic summary data can be shared through unrestricted access.
C. Protections for privacy, rights, and confidentiality of human research participants:
Informed consent documents used for the proposed research will include explicit language to inform participants that residual specimens, including DNA and plasma, may be stored in a biorepository for other scientific investigators. The informed consent will contain language permitting secondary use with broad data sharing under controlled access with general research use restrictions in ClinEpiDB. Participants will not be contacted or asked to re-consent for future sharing of, or access to, data through repositories.
The ClinEpiDB data dictionaries and the ICEMR do not permit personally identifiable information to be shared. Data will be de-identified by removing all HIPAA identifiers prior to sharing.
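As an illustration of the de-identification step, the following minimal Python sketch drops direct-identifier columns from records before they are exported to CSV. The column names and the example record are hypothetical and are not taken from the study's case report forms. Removing columns is only one part of de-identification: dates, small geographic units, and the remaining HIPAA identifier categories would need separate handling, which this sketch does not attempt.

```python
import csv
import io

# Hypothetical identifier columns (assumption for illustration only);
# the real list would be derived from the study's case report forms.
DIRECT_IDENTIFIERS = {"name", "phone", "address", "national_id", "date_of_birth"}


def deidentify(rows, identifiers=DIRECT_IDENTIFIERS):
    """Return copies of the records with identifier columns removed."""
    return [
        {key: value for key, value in row.items() if key.lower() not in identifiers}
        for row in rows
    ]


def to_csv(rows):
    """Serialize records to CSV text, using the first record's keys as the header."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up example record, not study data.
records = [
    {
        "study_id": "S001",
        "name": "Example Person",
        "phone": "000-0000",
        "age_years": "34",
        "rdt_result": "positive",
    },
]
shared = deidentify(records)
csv_text = to_csv(shared)
```

Keeping the identifier list in one place makes it straightforward to review against the repository's requirements before each submission.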
Element 6: Oversight of Data Management and Sharing
Data will be submitted by a project data manager from the PI’s project team. The data manager will oversee data collection, analysis, storage, and sharing. Compliance with the plan will be monitored by the PI routinely. The PI will conduct monthly meetings with key study personnel to ensure the timeliness of data entry and will review data to ensure quality of data entry. The PI will ensure data are submitted and shared according to this DMSP.
Validation Schedule:
Clinical data will be harmonized and validated by the data manager monthly.
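The kinds of checks a monthly validation pass might run can be sketched as follows. This is an assumption-laden illustration, not the study's validation procedure: the field names (study_id, visit_date, age_years), the age range, and the specific rules are hypothetical, and in practice such checks could equally be configured in the EDC or written in SAS.

```python
from datetime import date


def validate_record(record, today=None):
    """Return a list of problems found in one PCD record (empty if none)."""
    today = today or date.today()
    problems = []
    if not record.get("study_id"):
        problems.append("missing study_id")
    try:
        # Dates are assumed to be ISO 8601 strings (YYYY-MM-DD).
        visit = date.fromisoformat(record.get("visit_date") or "")
        if visit > today:
            problems.append("visit_date in the future")
    except ValueError:
        problems.append("invalid visit_date")
    age = record.get("age_years")
    # The 0-120 range is an illustrative plausibility bound, not a study rule.
    if not isinstance(age, (int, float)) or not 0 <= age <= 120:
        problems.append("age_years out of range")
    return problems


def validate_batch(records, today=None):
    """Map record index to its problems, including duplicate study IDs."""
    seen = set()
    report = {}
    for index, record in enumerate(records):
        problems = validate_record(record, today)
        study_id = record.get("study_id")
        if study_id and study_id in seen:
            problems.append("duplicate study_id")
        if study_id:
            seen.add(study_id)
        if problems:
            report[index] = problems
    return report


# Made-up example records, not study data.
example = [
    {"study_id": "S001", "visit_date": "2024-03-01", "age_years": 34},
    {"study_id": "S001", "visit_date": "2024-13-01", "age_years": 150},
]
report = validate_batch(example, today=date(2024, 6, 1))
```

A report keyed by record index lets the data manager return specific queries to each site rather than rejecting a whole batch.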