part1

    • Chemical informatics
      The application of information technology to chemistry
    • Chemometrics
      The application of statistical methods to chemical data in order to derive predictive models or descriptors
    • Chemoinformatics, chemometrics and chemical informatics are related but distinct fields of research
    • The term "chemoinformatics" was introduced in 1998 by Brown, who defined it as the combination of "all the information resources that a scientist needs to optimize the properties of a ligand to become a drug"
    • Chemoinformatics emphasizes computer-based decision support and relevance to drug discovery, whereas chemical informatics lacks this specific drug discovery focus
    • Spectrum of chemoinformatics
      • Chemical data collection, analysis, and management
      • Data representation and communication
      • Molecular modeling and simulation
      • Structure-activity relationship analysis
      • Virtual screening and compound selection
      • Reaction informatics
    • As an interdisciplinary field, chemoinformatics involves computational scientists, chemists, and biologists
    • Information resources that chemoinformatics combines, according to Brown's definition
      All the resources a scientist needs to optimize the properties of a ligand to become a drug
    • Chemoinformatics
      The application of information technology to chemistry, with a specific focus on drug discovery
    • Chemical informatics
      The application of information technology to chemistry, without a specific drug discovery focus
    • It is increasingly difficult to distinguish between chemoinformatics, chemical informatics, and chemometrics, particularly as far as method development is concerned
    • Spectrum of chemoinformatics
      • Chemical data collection, analysis, and management
      • Data representation and communication
      • Database design and organization
      • Chemical structure and property prediction (including drug-likeness)
      • Molecular similarity and diversity analysis
      • Compound or library design and optimization
      • Database mining
      • Compound classification and selection
      • Qualitative and quantitative structure-activity and structure-property relationships
      • Information theory applied to chemical problems
      • Statistical models and descriptors in chemistry
      • Prediction of in vivo compound characteristics
    • Chemoinformatics includes all concepts and methods designed to interface theoretical and experimental programs involving small molecules
    • The evolution of chemoinformatics as an independent discipline will depend largely on its ability to demonstrate a measurable impact on experimental chemistry programs, whether in pharmaceutical research or elsewhere
    • Hierarchy of bio- and chemoinformatics research
      • Bioinformatics
        • DNA sequence
        • Protein sequence
        • Sequence similarity
        • Family
        • Structure
        • Interaction
        • Function
        • Intervention
      • Chemoinformatics
        • Molecular composition
        • Connectivity (graph)
        • Molecular similarity
        • Chemotype
        • Structure
        • Interaction
        • Specific activity
        • Drug
    • Many algorithms and computational techniques used in chemoinformatics are also applied in bioinformatics
    • Informatics research and development in the life sciences is expected to become much more global in the future
    • Scientific origins of chemoinformatics
      • Quantitative structure-activity relationship (QSAR) analysis
      • Chemical structure storage and retrieval
      • 2D substructure and 3D pharmacophore searching
      • Clustering methods for chemical applications
      • Molecular similarity analysis
      • Molecular diversity and dissimilarity analysis
    • Molecular descriptors
      Computational descriptors of molecular structure, physical or chemical properties, or pharmacophores
    • Chemical space
      An n-dimensional reference space into which molecular data sets are projected for analysis or design
    • Types of molecular descriptors
      • Physical properties
      • Atom and bond counts
      • Pharmacophore features
      • Charge descriptors
      • Connectivity and shape descriptors
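Atom and bond counts are the simplest of the descriptor types listed above. The sketch below assumes a hypothetical toy molecule representation (heavy-atom element symbols plus a bond list), not the molecule object of any real toolkit, and turns each molecule into a numeric vector, i.e. a point in a small chemical reference space. The choice of five counts is illustrative only.

```python
# Minimal sketch: molecules are a hypothetical toy structure (element list plus
# bond list), not the input format of any real chemoinformatics toolkit.
from collections import Counter

def atom_bond_counts(elements, bonds):
    """Return a small descriptor vector of atom and bond counts.

    elements: element symbols of the heavy atoms, e.g. ["C", "C", "O"]
    bonds: (i, j, order) tuples indexing into elements
    """
    counts = Counter(elements)
    return [
        len(elements),                                  # heavy-atom count
        counts.get("C", 0),                             # carbon count
        counts.get("N", 0) + counts.get("O", 0),        # heteroatoms (N + O only, an assumption)
        len(bonds),                                     # bond count
        sum(1 for _, _, order in bonds if order == 2),  # double-bond count
    ]

# Ethanol, heavy atoms only: C-C-O
ethanol = atom_bond_counts(["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
# Acetic acid, heavy atoms only: C-C(=O)-O
acetic_acid = atom_bond_counts(["C", "C", "O", "O"], [(0, 1, 1), (1, 2, 2), (1, 3, 1)])
print(ethanol)      # [3, 2, 1, 2, 0]
print(acetic_acid)  # [4, 2, 2, 3, 1]
```

Each vector is one point in a five-dimensional descriptor space; which axes to use is decided case by case, as the next card notes.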
    • There are no generally preferred descriptor spaces for chemoinformatics applications and it is usually required to generate reference spaces for specific applications on a case-by-case basis
    • Similar Property Principle
      Molecules having similar structures and properties should also exhibit similar activity
    • Similarity coefficients
      • Tanimoto coefficient
      • Dice coefficient
      • Cosine coefficient
    • Molecular similarity, dissimilarity, and diversity
      • Similar molecules can be identified by application of distance functions and analysis of nearest neighbors in chemical space
      • Dissimilar molecules can be identified by maximizing the distance between them in chemical space
      • Molecular diversity refers to the overall spread of a compound collection in chemical space
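The three ideas above can be sketched with a single distance function. This assumes Euclidean distance over hypothetical 2D descriptor vectors. Nearest neighbors identify similar molecules, and mean pairwise distance serves here as one simple (assumed) measure of a collection's overall spread.

```python
import math
from itertools import combinations

def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbors(query, library, k=2):
    """Names of the k library compounds closest to the query (most similar)."""
    ranked = sorted(library, key=lambda name: euclidean(query, library[name]))
    return ranked[:k]

def mean_pairwise_distance(library):
    """Average distance over all compound pairs: an assumed spread measure."""
    pairs = list(combinations(library.values(), 2))
    return sum(euclidean(a, b) for a, b in pairs) / len(pairs)

# Hypothetical 2D descriptor values for four compounds
library = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (0.0, 2.0), "D": (5.0, 5.0)}
print(nearest_neighbors((0.2, 0.1), library))   # ['A', 'B']
print(round(mean_pairwise_distance(library), 2))  # 4.09
```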
    • Molecular similarity analysis
      The hallmarks of
    • Molecular similarity assessment
      1. Descriptor combinations are encoded as bit strings (fingerprints)
      2. Each test molecule is assigned a characteristic bit pattern
      3. Pair-wise molecular similarity is quantified by the overlap of bit strings using similarity metrics (coefficients)
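The three steps can be sketched as follows. The five feature definitions and the descriptor dictionaries are hypothetical; each bit records whether one descriptor condition holds. Real fingerprints use far more bits. The fingerprint is stored as a Python int, and overlap is the number of bits set in both.

```python
# Hypothetical feature definitions: bit k is set if FEATURES[k] holds.
FEATURES = [
    lambda d: d["n_atoms"] > 10,
    lambda d: d["n_rings"] >= 1,
    lambda d: d["n_hbond_donors"] >= 1,
    lambda d: d["n_hbond_acceptors"] >= 2,
    lambda d: d["has_halogen"],
]

def fingerprint(descriptors):
    """Step 2: assign a molecule its characteristic bit pattern (as an int)."""
    bits = 0
    for position, test in enumerate(FEATURES):
        if test(descriptors):
            bits |= 1 << position
    return bits

def bit_overlap(fp_i, fp_j):
    """Step 3: number of bits set in both fingerprints."""
    return bin(fp_i & fp_j).count("1")

mol_1 = {"n_atoms": 12, "n_rings": 1, "n_hbond_donors": 1, "n_hbond_acceptors": 3, "has_halogen": False}
mol_2 = {"n_atoms": 14, "n_rings": 2, "n_hbond_donors": 0, "n_hbond_acceptors": 2, "has_halogen": True}
fp_1, fp_2 = fingerprint(mol_1), fingerprint(mol_2)
print(format(fp_1, "05b"), format(fp_2, "05b"), bit_overlap(fp_1, fp_2))  # 01111 11011 3
```

The raw overlap is then normalized by a similarity coefficient so that values fall between zero and one.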
    • Similarity metrics (coefficients)
      • Shown in Table 1.4
    • ni and nj
      Number of bits set on (value 1) in the fingerprints of molecules i and j, respectively
    • nij
      Number of bits set on in the fingerprints of both molecules
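Written with ni, nj and nij, the standard binary forms of the three coefficients named earlier are: Tanimoto = nij / (ni + nj - nij), Dice = 2 nij / (ni + nj), Cosine = nij / sqrt(ni nj). The sketch below assumes these are the forms listed in Table 1.4. Returning 0.0 when both fingerprints are empty is a convention chosen here, not part of the definitions.

```python
import math

def tanimoto(n_i, n_j, n_ij):
    """Tanimoto coefficient: n_ij / (n_i + n_j - n_ij)."""
    union = n_i + n_j - n_ij
    return n_ij / union if union else 0.0   # empty-fingerprint convention (assumed)

def dice(n_i, n_j, n_ij):
    """Dice coefficient: 2 n_ij / (n_i + n_j)."""
    total = n_i + n_j
    return 2 * n_ij / total if total else 0.0

def cosine(n_i, n_j, n_ij):
    """Cosine coefficient: n_ij / sqrt(n_i * n_j)."""
    product = n_i * n_j
    return n_ij / math.sqrt(product) if product else 0.0

def bit_counts(fp_i, fp_j):
    """Return (n_i, n_j, n_ij) for two fingerprints stored as Python ints."""
    popcount = lambda x: bin(x).count("1")
    return popcount(fp_i), popcount(fp_j), popcount(fp_i & fp_j)

counts = bit_counts(0b01111, 0b11011)   # n_i = 4, n_j = 4, n_ij = 3
print(round(tanimoto(*counts), 3), round(dice(*counts), 3), round(cosine(*counts), 3))  # 0.6 0.75 0.75
```

All three give one for identical bit patterns and zero when no bits are shared, which matches the value range on the next card.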
    • Similarity coefficient values
      Range from zero (no overlap; no similarity) to one (complete overlap; identical or very similar molecules)
    • The most widely used metric in chemoinformatics is the Tanimoto coefficient
    • Molecular similarity
      Identified by application of distance functions and analysis of nearest neighbors in chemical space
    • Molecular diversity
      Attempts either to select mutually different compounds from a given population or to evenly populate a given chemical space with candidate molecules
    • Diversity selection and design
      Uses distance functions either to select compounds lying at least a predefined minimum distance from one another or to maximize the average inter-compound distance
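The minimum-distance variant can be sketched as a greedy pass. This assumes Euclidean distance over hypothetical descriptor vectors. A compound is kept only if it lies at least d_min from every compound already kept. Because the pass is greedy, the result depends on the order in which compounds are considered; the card itself does not prescribe a procedure.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def select_by_min_distance(compounds, d_min):
    """Greedily keep compounds lying at least d_min from all compounds kept so far.

    compounds: list of (name, descriptor_vector) in the order they are considered
    """
    selected = []
    for name, vector in compounds:
        if all(euclidean(vector, kept) >= d_min for _, kept in selected):
            selected.append((name, vector))
    return [name for name, _ in selected]

# Hypothetical 2D descriptor values
compounds = [("A", (0.0, 0.0)), ("B", (0.5, 0.0)), ("C", (2.0, 0.0)), ("D", (2.0, 1.5))]
print(select_by_min_distance(compounds, d_min=1.0))  # ['A', 'C', 'D']
```

B is rejected because it lies only 0.5 from A; a smaller d_min admits more compounds into the diverse subset.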
    • Cell-based diversity selection and design
      1. Dividing descriptor axes into evenly spaced value intervals (binning) to produce n-dimensional subsections (cells) of chemical space
      2. Selecting a representative compound from each populated cell or populating cells as evenly as possible with computed molecules
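The binning steps can be sketched as follows. This assumes a hypothetical 2D descriptor space with known axis ranges split into n_bins equal intervals per axis. Values at or beyond an axis end are clamped into the outermost bin. Keeping the first compound met in each cell is one simple choice of representative. The fraction of populated cells gives a rough coverage figure.

```python
def cell_of(vector, lows, highs, n_bins):
    """Map a descriptor vector to its cell: one bin index per descriptor axis."""
    index = []
    for x, lo, hi in zip(vector, lows, highs):
        width = (hi - lo) / n_bins                 # evenly spaced value intervals
        i = int((x - lo) // width)
        index.append(min(max(i, 0), n_bins - 1))   # clamp values at or beyond the axis ends
    return tuple(index)

def pick_per_cell(compounds, lows, highs, n_bins):
    """Keep the first compound encountered in each populated cell."""
    representatives = {}
    for name, vector in compounds:
        representatives.setdefault(cell_of(vector, lows, highs, n_bins), name)
    return representatives

# Hypothetical 2D descriptor space, each axis spanning 0 to 4 and split into 4 bins
compounds = [("A", (0.2, 0.3)), ("B", (0.8, 0.1)), ("C", (1.5, 0.2)), ("D", (3.9, 2.5))]
reps = pick_per_cell(compounds, (0.0, 0.0), (4.0, 4.0), 4)
print(reps)                 # {(0, 0): 'A', (1, 0): 'C', (3, 2): 'D'}
print(len(reps) / 4 ** 2)   # fraction of cells populated: 0.1875
```

A and B share cell (0, 0), so only A represents it. Empty cells mark regions of chemical space that a design effort could try to fill.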
    • Molecular diversity is a global concept, while molecular similarity analysis explores pair-wise relationships
    • Dissimilarity
      The inverse of molecular similarity, addressing which molecule in a collection is most dissimilar from a given compound or set of compounds
    • Dissimilarity-based compound selection
      1. Selecting a seed compound, calculating the dissimilarity between the seed and all other compounds, and selecting the most dissimilar one
      2. Repeating the process, relative to all compounds selected so far, until a subset of the desired size is obtained
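The procedure can be sketched on fingerprints as follows. Two assumptions are made here. Dissimilarity is taken as 1 minus the Tanimoto coefficient. Once several compounds are selected, a candidate's dissimilarity to the selected set is its dissimilarity to the closest selected compound, a MaxMin-style rule that the card does not specify. The fingerprints are hypothetical 6-bit patterns stored as ints.

```python
def tanimoto_bits(fp_i, fp_j):
    """Tanimoto coefficient n_ij / (n_i + n_j - n_ij) for int-encoded fingerprints."""
    n_ij = bin(fp_i & fp_j).count("1")
    union = bin(fp_i | fp_j).count("1")   # equals n_i + n_j - n_ij
    return n_ij / union if union else 0.0

def dissimilarity_to_set(fp, selected_fps):
    """Assumed MaxMin rule: 1 - Tanimoto to the closest selected compound."""
    return min(1.0 - tanimoto_bits(fp, s) for s in selected_fps)

def select_dissimilar(fingerprints, seed, subset_size):
    """Grow a subset from a seed by repeatedly adding the most dissimilar compound."""
    selected = [seed]
    candidates = sorted(set(fingerprints) - {seed})   # sorted for deterministic ties
    while candidates and len(selected) < subset_size:
        selected_fps = [fingerprints[name] for name in selected]
        best = max(candidates, key=lambda name: dissimilarity_to_set(fingerprints[name], selected_fps))
        selected.append(best)
        candidates.remove(best)
    return selected

# Hypothetical 6-bit fingerprints
fps = {"A": 0b111100, "B": 0b111000, "C": 0b000111, "D": 0b110011}
print(select_dissimilar(fps, seed="A", subset_size=3))  # ['A', 'C', 'D']
```

C is picked first because it shares only one bit with the seed; D then beats B because B is very close to A.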
    • High-dimensional chemical spaces may often be too complex for meaningful and interpretable analyses