Tutorial 2: Data collection


  • Quantitative data
    Data used when a researcher is trying to quantify a problem, or address the 'what' or 'how many' aspects of a research question
  • Quantitative data
    • Can be counted and, therefore, easily conveyed via charts and graphs
    • Easier for other researchers to verify the conclusions of quantitative data analysis
  • Common quantitative collection methods
    • Experiments
    • Systematic observations (e.g. lab results)
    • Number-based questions on surveys (e.g. exercise minutes per week)
    • Number-based questions in interviews
  • Qualitative data
    Data that describe qualities or characteristics
  • Qualitative data
    • Collected using questionnaires, interviews, or observation
    • Frequently appears in narrative form
  • Coding
    Allows the researcher to categorize qualitative data to identify themes that correspond with the research questions and to perform qualitative analysis
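As a rough illustration of coding, the sketch below tags free-text responses with themes from a keyword codebook and then counts the themes. The codebook, theme names, and responses are all hypothetical. In practice, coding is usually done by trained human coders, so keyword matching is only a simplified stand-in.

```python
from collections import Counter

# Hypothetical codebook: theme -> keywords that signal it (illustrative assumption).
CODEBOOK = {
    "access": ["appointment", "waiting", "travel"],
    "cost": ["expensive", "afford", "price"],
    "communication": ["explain", "listen", "understand"],
}


def code_response(text, codebook=CODEBOOK):
    """Return the sorted list of themes whose keywords appear in the text."""
    lowered = text.lower()
    return sorted(
        theme
        for theme, keywords in codebook.items()
        if any(keyword in lowered for keyword in keywords)
    )


def theme_counts(responses, codebook=CODEBOOK):
    """Count how many responses were assigned each theme."""
    counts = Counter()
    for response in responses:
        counts.update(code_response(response, codebook))
    return counts


# Made-up interview excerpts, used only to exercise the functions.
responses = [
    "The waiting time for an appointment was far too long.",
    "I could not afford the medication, and nobody would explain the options.",
    "The nurse took time to listen.",
]
for response in responses:
    print(code_response(response), "-", response)
print(dict(theme_counts(responses)))
```

Once responses carry theme labels, the themes can be compared against the research questions.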
  • Common qualitative collection methods
    • Individual interviews
    • Focus groups
    • Observations (e.g. watching how people interact with an app)
    • Open-ended questions on surveys
  • Types of research data
    • Observational data
    • Experimental data
    • Simulation data
    • Derived/compiled data
  • Observational data
    • Are captured through observation of a behavior or activity
    • Collected using methods such as human observation, open-ended surveys, or wearable sensors that monitor patients' physiological parameters
    • Because observational data are captured in real time, they would be very difficult or impossible to recreate if lost
  • Experimental data
    • Are collected through active intervention by the researcher, who alters a variable to produce and measure a change or difference
    • Typically allow the researcher to determine a causal relationship, and the results can usually be projected to a larger population
    • Often reproducible, though reproducing them can be expensive
  • Simulation data
    • Are generated by imitating the operation of a real-world process or system over time using computer test models
    • Used to try to determine what would, or could, happen under certain conditions
    • The test model is often as important as, or even more important than, the data generated from the simulation
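A minimal sketch of simulation data, assuming a hypothetical clinic-booking model: each booked patient shows up independently with a fixed probability, and the simulation estimates how often arrivals exceed capacity. The function name, parameters, and numbers are illustrative assumptions, not taken from any real study.

```python
import random


def simulate_overbooked_days(n_booked, capacity, show_prob, n_days, seed=0):
    """Estimate the fraction of days on which more patients arrive than the
    clinic can see, simulating each booked patient independently."""
    rng = random.Random(seed)  # fixed seed so the run can be regenerated
    overbooked = 0
    for _ in range(n_days):
        # Count the booked patients who actually show up on this simulated day.
        arrivals = sum(rng.random() < show_prob for _ in range(n_booked))
        if arrivals > capacity:
            overbooked += 1
    return overbooked / n_days


fraction = simulate_overbooked_days(
    n_booked=22, capacity=20, show_prob=0.85, n_days=10_000, seed=42
)
print(f"Estimated share of overbooked days: {fraction:.3f}")
```

Keeping the model code and the seed makes it possible to regenerate the output, which is one reason the model can matter more than the generated data.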
  • Derived/compiled data
    • Are created by transforming existing data points, often drawn from different data sources, into new data
    • Can usually be recreated if lost, but doing so may be very time-consuming (and expensive)
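As a small illustration of derived data, the sketch below combines two hypothetical sources (weights and heights keyed by patient ID) and derives a new variable, body mass index. The patient IDs and values are made up.

```python
# Hypothetical source 1: weights in kilograms, keyed by patient ID.
weights_kg = {"p01": 70.0, "p02": 85.5, "p03": 60.2}
# Hypothetical source 2: heights in metres, keyed by patient ID.
heights_m = {"p01": 1.75, "p02": 1.80, "p04": 1.65}


def derive_bmi(weights, heights):
    """Derive BMI = weight (kg) / height (m)^2 for every patient
    that appears in both sources."""
    shared_ids = sorted(weights.keys() & heights.keys())
    return {pid: round(weights[pid] / heights[pid] ** 2, 1) for pid in shared_ids}


bmi = derive_bmi(weights_kg, heights_m)
print(bmi)
```

Patients missing from either source are dropped here. A real project would need to decide and document how to handle such gaps.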
  • Steps in creating a data planning framework
    • Identify your data needs
    • Collect data
    • Document the process
    • Organize data
    • Store and back up data
    • Provide access for project collaborators
    • Preserve and share
  • Collecting the data
    • When gathering data, it's essential to create a plan detailing the reasons, timing, and methods for collection
    • Various approaches exist, such as lab experiments, observations, interviews, or focus groups
    • Planning ensures that collected data aligns with research objectives, enhancing accuracy and integrity by providing control and insight into the collection process
  • Using existing data
    • Utilizing existing data can speed up research significantly, but it may not perfectly match your needs
    • Evaluation of data quality and documentation is essential to ensure suitability for your research
  • Combination of data collection and data re-use
    • Combining new data collection with re-use of existing data can offset shortcomings in the existing data
    • When merging datasets, ensure consistency in variables, especially across different sources with varying measurement standards, such as temperature (Fahrenheit vs. Celsius)
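The Fahrenheit vs. Celsius case can be sketched as follows: records from two hypothetical sites are converted to a single unit before being merged. The record layout (`patient_id`, `temp`, `unit`) and the sample values are assumptions made for illustration.

```python
def fahrenheit_to_celsius(temp_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32) * 5 / 9


def harmonize(records):
    """Return records with every temperature expressed in Celsius."""
    harmonized = []
    for record in records:
        unit = record["unit"].upper()
        if unit == "C":
            value = record["temp"]
        elif unit == "F":
            value = fahrenheit_to_celsius(record["temp"])
        else:
            # Fail loudly rather than silently mixing unknown units.
            raise ValueError(f"Unknown temperature unit: {record['unit']!r}")
        harmonized.append(
            {"patient_id": record["patient_id"], "temp_c": round(value, 1)}
        )
    return harmonized


# Made-up records: site A reports Celsius, site B reports Fahrenheit.
site_a = [{"patient_id": "a1", "temp": 37.0, "unit": "C"}]
site_b = [{"patient_id": "b1", "temp": 98.6, "unit": "F"}]

merged = harmonize(site_a + site_b)
print(merged)
```

Raising an error on an unrecognized unit is a deliberate choice in this sketch, so that inconsistent sources are caught at merge time instead of during analysis.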
  • Potential data sources
    • Other researchers
    • Government data
    • Organization data
    • Data repositories
  • Data repositories
    • Are curated spaces for storing research data
    • Contributors may include individual researchers, organizations, and government agencies
    • Benefits: data are findable, reusable, citable, and preserved
  • Keeping data organized
    • Proper data organization makes it easier to use research data
    • Folder structure should be logical to facilitate access to files
    • File and folder naming should be consistent, descriptive, and concise
    • Version control is important to track changes
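One possible way to keep file names consistent, descriptive, and concise is to generate them from a fixed pattern. The pattern below (ISO date, project, description, zero-padded version) is an assumed convention chosen for illustration, not one prescribed here.

```python
import re
from datetime import date


def slugify(text):
    """Lowercase the text and replace runs of non-alphanumeric characters with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_filename(project, description, collected_on, version, extension):
    """Build a name of the form YYYY-MM-DD_project_description_vNN.ext."""
    return (
        f"{collected_on.isoformat()}_{slugify(project)}_"
        f"{slugify(description)}_v{version:02d}.{extension}"
    )


name = build_filename("Sleep Study", "Baseline survey", date(2024, 3, 15), 2, "csv")
print(name)  # 2024-03-15_sleep-study_baseline-survey_v02.csv
```

Starting with an ISO date makes files sort chronologically, and the version suffix gives a lightweight way to track changes when no version control system is in use.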
  • Data stewardship
    All activities required to ensure that digital research data are findable, accessible, interoperable, and reusable (FAIR) in the long term, including data management, archiving, and reuse by third parties
  • FAIR
    • Findable: The data should be uniquely and persistently identifiable and other researchers should be able to find the data
    • Accessible: The conditions under which the data can be used should be clear to humans and computers
    • Interoperable: Data should be machine-readable and use terminologies, vocabularies, or ontologies that are commonly used in the field
    • Reusable: Data should be compliant with the above and sufficiently well described with metadata and provenance information so that the data sources can be linked or integrated with other data sources and enable proper citation
  • Responsibilities as a clinical researcher
    • The formal responsibility for personal data lies with your research institute, which is accountable for having adequate policies, facilities, and expertise around data stewardship
    • Decisions on data stewardship will affect how you can process, analyse, preserve, and share your research data in the future
  • Steps in preparing a study
    • Study design and registration
    • Re-using existing data
    • Collaborating with patients
    • Creating a data management plan (DMP)
    • Describing the operational workflow
  • Statistical analysis plans are obligatory for RCTs. It is preferable to create the plan before collecting data, because doing so supports proper study design.
  • Data Options Overview
    • Primary data
    • Secondary data
  • Primary Data
    Collected directly for research, often through interviews or surveys
  • Secondary Data
    Gathered for other purposes but repurposed for research
  • Secondary Data Sources
    • Electronic health records (EHRs)
    • Paper-based records
    • Administrative data
    • Pharmacy data
    • Regulatory data
    • Repurposed Trial Data
  • Registries
    Systematic data collections with various purposes (e.g., disease-specific, exposure-specific)
  • EHR Data
    Captures clinical encounter details but lacks standardization and may be fragmented across different facilities
  • Paper-Based Records
    Valuable for patient-reported information and data validation, especially when electronic records are unavailable
  • Administrative Data
    Generated for insurance reimbursement and coded using the ICD and CPT systems; the validity of the coding needs to be checked
  • Pharmacy Data
    Includes outpatient prescription claims and dispensing records, useful for medication adherence and cost studies
  • Regulatory Data
    The FDA stores data from regulatory submissions. These data are increasingly being converted for research use, though they come mainly from efficacy trials
  • Repurposed Trial Data
    Data collected for clinical research, available upon request for comparative effectiveness research (CER) studies
  • Data Planning Framework
    1. Identify data needs
    2. Collect data
    3. Document the process
    4. Organize data
    5. Storage and backup
    6. Access for collaborators
    7. Preserve and share data long-term
  • Quantitative Data
    Numbers and measurements, like counting how many people have a certain condition
  • Qualitative Data
    Descriptions and qualities, like people's feelings or experiences
  • Data stewardship is like taking care of valuable information in research. It's making sure this information stays easy to find, use, and share for a long time.
  • As a researcher, it's your job to look after this data from the start of your study to when you share it with others. This includes keeping it safe and following rules about privacy.