Quantitative data
Data used when a researcher is trying to quantify a problem, or address the 'what' or 'how many' aspects of a research question
Can be counted and, therefore, easily conveyed via charts and graphs
Easier for other researchers to verify the conclusions of quantitative data analysis
Common quantitative collection methods
Experiments
Systematic observations (e.g. lab results)
Number-based questions on surveys (e.g. exercise minutes per week)
Number-based questions in interviews
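Counted data are easy to summarize and chart. The minimal sketch below computes descriptive statistics and bin counts (the numbers a bar chart would display) for a number-based survey question. The response values and the bin boundaries are hypothetical, chosen only for illustration.

```python
import statistics
from collections import Counter

# Hypothetical answers to "How many minutes do you exercise per week?"
minutes_per_week = [0, 30, 45, 60, 60, 90, 120, 150, 150, 150, 200, 300]

# Hypothetical bins as (label, lower bound inclusive, upper bound exclusive).
BINS = [("<60", 0, 60), ("60-149", 60, 150), ("150+", 150, float("inf"))]


def summarize(values):
    """Return basic descriptive statistics for numeric survey responses."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }


def bin_counts(values, bins=BINS):
    """Count responses per bin; these counts are what a bar chart would plot."""
    counts = Counter()
    for value in values:
        for label, low, high in bins:
            if low <= value < high:
                counts[label] += 1
                break
    return dict(counts)


print(summarize(minutes_per_week))
print(bin_counts(minutes_per_week))
```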
Qualitative data
Data that describes qualities or characteristics
Collected using questionnaires, interviews, or observation
Frequently appears in narrative form
Coding
Allows the researcher to categorize qualitative data to identify themes that correspond with the research questions and to perform qualitative analysis
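Coding is normally an interpretive task done by the researcher, but the bookkeeping around it can be illustrated. The sketch below assumes a hypothetical codebook that links each theme to indicator keywords, tags open-ended responses with the matching themes, and counts how often each theme occurs. Plain keyword matching is a deliberate simplification (it would, for example, match "easy" inside "uneasy") and is not a substitute for reading the responses.

```python
from collections import Counter

# Hypothetical codebook: each theme is linked to indicator keywords.
CODEBOOK = {
    "cost": ["expensive", "price", "afford"],
    "usability": ["confusing", "easy", "navigate"],
    "trust": ["privacy", "trust", "data"],
}


def code_response(text, codebook=CODEBOOK):
    """Return the sorted themes whose keywords appear (as substrings) in a response."""
    lowered = text.lower()
    return sorted(
        theme for theme, keywords in codebook.items()
        if any(keyword in lowered for keyword in keywords)
    )


def theme_frequencies(responses, codebook=CODEBOOK):
    """Count how many responses were tagged with each theme."""
    counts = Counter()
    for text in responses:
        counts.update(code_response(text, codebook))
    return dict(counts)


# Hypothetical open-ended survey answers.
responses = [
    "The app is easy to use but too expensive.",
    "I don't trust what happens to my data.",
    "Menus were confusing to navigate.",
]
print(theme_frequencies(responses))
```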
Common qualitative collection methods
Individual interviews
Focus groups
Observations (e.g. watching how people interact with an app)
Open-ended questions on surveys
Types of research data
Observational data
Experimental data
Simulation data
Derived/compiled data
Observational data
Are captured through observation of a behavior or activity
Collected using methods such as human observation, open-ended surveys, or wearable sensors that monitor patients' physiological parameters
Because observational data is captured in real time, it would be very difficult or impossible to recreate if lost
Experimental data
Are collected through active intervention by the researcher, who alters a variable to produce and measure change or to create a difference
Typically allow the researcher to determine a causal relationship, and findings can often be generalized to a larger population
Often reproducible, but reproducing them can be expensive
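Random assignment is one common way an experiment isolates the effect of the variable the researcher alters. The minimal sketch below allocates participants to two equal-sized groups; the group names, participant IDs, and fixed seed are assumptions for illustration. Recording the seed makes the allocation itself reproducible, even when re-running the whole experiment would be costly.

```python
import random


def randomize(participant_ids, seed=2024):
    """Randomly allocate participants to 'treatment' or 'control'.

    A copy of the ID list is shuffled with a seeded generator, then split in half.
    With an odd number of participants, the control group gets the extra person.
    """
    rng = random.Random(seed)  # same IDs and seed always give the same allocation
    ids = list(participant_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    return {pid: ("treatment" if i < half else "control") for i, pid in enumerate(ids)}


participants = [f"P{n:02d}" for n in range(1, 11)]  # hypothetical IDs P01..P10
allocation = randomize(participants)
print(allocation)
```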
Simulation data
Are generated by imitating the operation of a real-world process or system over time using computer test models
Used to try to determine what would, or could, happen under certain conditions
The test model is often as, or even more, important than the data generated from the simulation
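To illustrate why the model can matter as much as its output, the sketch below runs a simple Monte Carlo simulation of a screening program: how many positive tests might 1,000 screened people produce under an assumed prevalence, sensitivity, and specificity? All parameter values are hypothetical. The function returns the model parameters and random seed together with the result, because the simulated numbers are only meaningful, and only regenerable, alongside them.

```python
import random
import statistics
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ScreeningModel:
    """Hypothetical model parameters for a screening simulation."""
    population: int = 1000
    prevalence: float = 0.05
    sensitivity: float = 0.90
    specificity: float = 0.95


def simulate_once(model, rng):
    """Simulate one screening round and return the number of positive results."""
    positives = 0
    for _ in range(model.population):
        if rng.random() < model.prevalence:  # this person has the condition
            positives += int(rng.random() < model.sensitivity)  # true positive
        else:
            positives += int(rng.random() >= model.specificity)  # false positive
    return positives


def run_simulation(model, runs=200, seed=1):
    """Repeat the simulation and keep the model and seed next to the result."""
    rng = random.Random(seed)
    results = [simulate_once(model, rng) for _ in range(runs)]
    return {
        "model": asdict(model),
        "seed": seed,
        "runs": runs,
        "mean_positives": statistics.mean(results),
    }


print(run_simulation(ScreeningModel()))
```

With these parameters the expected count is 1000 × (0.05 × 0.90 + 0.95 × 0.05) = 92.5, so the simulated mean should land close to that value.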
Derived/compiled data
Involves using existing data points, often from different data sources, to create new data through some sort of transformation
Can usually be replaced if lost, but may be very time-consuming (and expensive) to do so
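As a small example of derived data, the sketch below combines heights from one hypothetical source with weights from another, matched on a participant ID, and derives body mass index (weight in kilograms divided by height in meters squared). Keeping the transformation as code is one way to make derived data replaceable if lost, because the values can be regenerated from the original sources. The source dictionaries and IDs are invented for illustration.

```python
# Hypothetical source A: height in centimeters, keyed by participant ID.
heights_cm = {"P01": 170.0, "P02": 182.0, "P03": 158.0}
# Hypothetical source B: weight in kilograms, keyed by participant ID.
weights_kg = {"P01": 65.0, "P02": 90.0, "P04": 70.0}


def derive_bmi(heights_cm, weights_kg):
    """Derive BMI (kg/m^2) for participants present in both sources.

    Participants found in only one source are skipped rather than guessed.
    """
    bmi = {}
    for pid in sorted(heights_cm.keys() & weights_kg.keys()):
        height_m = heights_cm[pid] / 100
        bmi[pid] = round(weights_kg[pid] / height_m ** 2, 1)
    return bmi


print(derive_bmi(heights_cm, weights_kg))
```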
Steps in creating a data planning framework
Identify your data needs
Collect data
Document the process
Organize data
Storage and backup
Access for project collaborators
Preserve and share
Collecting the data
When gathering data, it's essential to create a plan detailing the reasons, timing, and methods for collection
Various approaches exist, such as lab experiments, observations, interviews, or focus groups
Planning ensures that collected data aligns with research objectives, enhancing accuracy and integrity by providing control and insight into the collection process
Using existing data
Utilizing existing data can speed up research significantly, but it may not perfectly match your needs
Evaluation of data quality and documentation is essential to ensure suitability for your research
Combination of data collection and data re-use
Combining strategies can lessen shortcomings in existing data
When merging datasets, ensure consistency in variables, especially across different sources with varying measurement standards, such as temperature (Fahrenheit vs. Celsius)
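The temperature example can be made concrete. The sketch below assumes each hypothetical record states the unit it was measured in, converts Fahrenheit readings to Celsius with C = (F - 32) × 5/9, and rejects records with an unrecognized unit instead of silently merging them.

```python
def fahrenheit_to_celsius(temp_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32) * 5 / 9


# Hypothetical records from two sites that used different measurement standards.
site_a = [{"id": "A1", "temp": 98.6, "unit": "F"}, {"id": "A2", "temp": 100.4, "unit": "F"}]
site_b = [{"id": "B1", "temp": 37.2, "unit": "C"}]


def harmonize(records):
    """Return new records with every temperature expressed in Celsius."""
    harmonized = []
    for record in records:
        if record["unit"] == "F":
            temp_c = fahrenheit_to_celsius(record["temp"])
        elif record["unit"] == "C":
            temp_c = record["temp"]
        else:
            raise ValueError(f"Unknown unit {record['unit']!r} in record {record['id']}")
        harmonized.append({"id": record["id"], "temp_c": round(temp_c, 1)})
    return harmonized


merged = harmonize(site_a) + harmonize(site_b)
print(merged)
```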
Potential data sources
Other researchers
Government data
Organization data
Data repositories
Data repositories
Are curated spaces for storing research data
Contributors may include individual researchers, organizations, and government agencies
Benefits: data are findable, reusable, citable, and preserved
Keeping data organized
Proper data organization makes it easier to use research data
Folder structure should be logical to facilitate access to files
File and folder naming should be consistent, descriptive, and concise
Version control is important to track changes
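One way to keep file names consistent, descriptive, and concise is to generate them rather than type them. The sketch below follows a hypothetical convention (ISO date, project, short description, zero-padded version number) and works out the next version number from the names already in a folder. It is a lightweight aid, not a replacement for a version control system such as Git.

```python
import re
from datetime import date

# Matches the version suffix in names such as "..._v02.csv".
VERSION_PATTERN = re.compile(r"_v(\d+)\.[^.]+$")


def make_filename(project, description, version, extension, on=None):
    """Build a name like '2024-03-01_heartstudy_baseline-survey_v02.csv'."""
    on = on or date.today()
    # Lowercase the description and replace runs of other characters with hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    return f"{on.isoformat()}_{project.lower()}_{slug}_v{version:02d}.{extension}"


def next_version(existing_names):
    """Return the next version number, assuming one dataset's files per folder."""
    versions = []
    for name in existing_names:
        match = VERSION_PATTERN.search(name)
        if match:
            versions.append(int(match.group(1)))
    return max(versions, default=0) + 1


existing = [
    "2024-03-01_heartstudy_baseline-survey_v01.csv",
    "2024-03-05_heartstudy_baseline-survey_v02.csv",
    "notes.txt",
]
print(make_filename("HeartStudy", "Baseline Survey", next_version(existing), "csv",
                    on=date(2024, 3, 8)))
```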
Data stewardship
All activities required to ensure that digital research data are findable, accessible, interoperable, and reusable (FAIR) in the long term, including data management, archiving, and reuse by third parties
FAIR
Findable: The data should be uniquely and persistently identifiable and other researchers should be able to find the data
Accessible: The conditions under which the data can be used should be clear to humans and computers
Interoperable: Data should be machine-readable and use terminologies, vocabularies, or ontologies that are commonly used in the field
Reusable: Data should be compliant with the above and sufficiently well described with metadata and provenance information so that the data sources can be linked or integrated with other data sources and enable proper citation
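A simple way to apply the FAIR principles in practice is to check a dataset's metadata before depositing it. The sketch below maps each principle to a few metadata fields that support it and reports which ones are missing. The field names and the example record are illustrative assumptions, not taken from any particular metadata standard or repository.

```python
# Illustrative mapping from each FAIR principle to supporting metadata fields.
# These field names are assumptions, not a specific metadata standard.
FAIR_FIELDS = {
    "Findable": ["identifier", "title", "keywords"],
    "Accessible": ["license", "access_conditions"],
    "Interoperable": ["file_format", "vocabulary"],
    "Reusable": ["description", "provenance", "citation"],
}


def missing_fair_fields(record):
    """Return, per principle, the supporting fields that are absent or empty."""
    gaps = {}
    for principle, fields in FAIR_FIELDS.items():
        missing = [field for field in fields if not record.get(field)]
        if missing:
            gaps[principle] = missing
    return gaps


# Hypothetical metadata record for a dataset about to be deposited.
record = {
    "identifier": "hypothetical-persistent-id-0001",
    "title": "Baseline survey, exercise minutes per week",
    "keywords": ["exercise", "survey"],
    "license": "CC-BY-4.0",
    "file_format": "CSV",
    "description": "Number-based survey responses from a hypothetical cohort.",
}
print(missing_fair_fields(record))
```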
Responsibilities as a clinical researcher
The formal responsibility for personal data lies with your research institute, which is accountable for having adequate policies, facilities, and expertise around data stewardship
Decisions on data stewardship will affect how you can process, analyse, preserve, and share your research data in the future
Steps in preparing a study
Study design and registration
Re-using existing data
Collaborating with patients
Creating a data management plan (DMP)
Describing the operational workflow
Statistical analysis plans are obligatory for randomized controlled trials (RCTs). It is preferable to create this plan before collecting data because doing so facilitates proper study design.
Data Options Overview
Primary data
Secondary data
Primary Data
Collected directly for research, often through interviews or surveys
Secondary Data
Gathered for other purposes but repurposed for research
Secondary Data Sources
Electronic health records (EHRs)
Paper-based records
Administrative data
Pharmacy data
Regulatory data
Repurposed Trial Data
Registries
Systematic data collections with various purposes (e.g., disease-specific, exposure-specific)
EHR Data
Captures clinical encounter details but lacks standardization and may be fragmented across different facilities
Paper-Based Records
Valuable for patient-reported information and data validation, especially when electronic records are unavailable
Administrative Data
Generated for insurance reimbursement and coded using ICD and CPT systems; the validity of the codes should be checked before they are used in research
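Administrative data are often used by selecting patients through diagnosis codes. The sketch below flags hypothetical claims carrying ICD-10 codes for type 1 or type 2 diabetes mellitus (categories E10 and E11) and sets aside codes that fail a basic ICD-10 format check, such as an ICD-9 style code mixed into the data. The format check is only syntactic: whether a well-formed code actually reflects the patient's condition (coding validity) has to be assessed separately, for example by chart review.

```python
import re

# Letter, digit, alphanumeric, then an optional dot and 1 to 4 more characters.
ICD10_FORMAT = re.compile(r"^[A-Z][0-9][0-9A-Z](\.[0-9A-Z]{1,4})?$")
DIABETES_PREFIXES = ("E10", "E11")  # type 1 and type 2 diabetes mellitus in ICD-10

# Hypothetical claim lines: (patient ID, diagnosis code as recorded).
claims = [
    ("P01", "E11.9"),
    ("P02", "I10"),     # well-formed, but not a diabetes code
    ("P03", "e10.65"),  # lowercase entry, normalized before checking
    ("P04", "250.00"),  # ICD-9 style code, fails the ICD-10 format check
]


def screen_claims(claims):
    """Split claims into diabetes cases and codes that fail the format check."""
    cases, invalid = set(), []
    for pid, raw_code in claims:
        code = raw_code.strip().upper()
        if not ICD10_FORMAT.match(code):
            invalid.append((pid, raw_code))
        elif code.startswith(DIABETES_PREFIXES):
            cases.add(pid)
    return cases, invalid


print(screen_claims(claims))
```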
Pharmacy Data
Includes outpatient prescription claims and dispensing records, useful for medication adherence and cost studies
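Medication adherence is often estimated from dispensing records with the proportion of days covered (PDC): the share of days in an observation period on which the patient had medication on hand. The sketch below is a simplified version that assumes each record gives a dispense date and a days' supply, and it takes the union of overlapping supplies; some PDC definitions instead shift early refills forward. The fill dates are hypothetical.

```python
from datetime import date, timedelta


def proportion_of_days_covered(fills, start, end):
    """Share of days in [start, end] on which medication was on hand.

    fills: list of (dispense_date, days_supply). Overlapping supplies are
    unioned here rather than shifted forward.
    """
    period_days = (end - start).days + 1
    covered = set()
    for dispensed, days_supply in fills:
        for offset in range(days_supply):
            day = dispensed + timedelta(days=offset)
            if start <= day <= end:
                covered.add(day)
    return len(covered) / period_days


# Hypothetical dispensing records for one patient over a 90-day window.
fills = [(date(2024, 1, 1), 30), (date(2024, 2, 15), 30)]
pdc = proportion_of_days_covered(fills, date(2024, 1, 1), date(2024, 3, 30))
print(round(pdc, 2))
```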
Regulatory Data
The FDA stores data from regulatory submissions; these data are increasingly being converted for research use, though they come mainly from efficacy trials
Repurposed Trial Data
Data collected for clinical research, available upon request for comparative effectiveness research (CER) studies
Data Planning Framework
1. Identify data needs
2. Collect data
3. Document the process
4. Organize data
5. Storage and backup
6. Access for collaborators
7. Preserve and share data long-term
Quantitative Data
Numbers and measurements, like counting how many people have a certain condition
Qualitative Data
Descriptions and qualities, like people's feelings or experiences
Data stewardship is like taking care of valuable information in research. It means making sure this information stays easy to find, use, and share for a long time.
As a researcher, it's your job to look after this data from the start of your study to when you share it with others. This includes keeping it safe and following rules about privacy.