part1

Cards (134)

  • Chemical informatics
    The application of information technology to chemistry
  • Chemometrics
    The application of statistical methods to chemical data in order to derive predictive models or descriptors
  • Chemoinformatics, chemometrics and chemical informatics are related but distinct fields of research
  • The term "chemoinformatics" was introduced in 1998 by Brown, who defined it as the combination of "all the information resources that a scientist needs to optimize the properties of a ligand to become a drug"
  • Chemoinformatics focuses on computer-aided decision support and relevance to drug discovery, while chemical informatics lacks the specific drug discovery focus
  • Spectrum of chemoinformatics
    • Chemical data collection, analysis, and management
    • Data representation and communication
    • Molecular modeling and simulation
    • Structure-activity relationship analysis
    • Virtual screening and compound selection
    • Reaction informatics
  • As an interdisciplinary field, chemoinformatics involves computational scientists, chemists, and biologists
  • Chemoinformatics
    The application of information technology to chemistry, with a specific focus on drug discovery
  • Chemical informatics
    The application of information technology to chemistry, without a specific drug discovery focus
  • It is increasingly difficult to distinguish between chemoinformatics, chemical informatics, and chemometrics, particularly as far as method development is concerned
  • Spectrum of chemoinformatics
    • Chemical data collection, analysis, and management
    • Data representation and communication
    • Database design and organization
    • Chemical structure and property prediction (including drug-likeness)
    • Molecular similarity and diversity analysis
    • Compound or library design and optimization
    • Database mining
    • Compound classification and selection
    • Qualitative and quantitative structure-activity or structure-property relationships
    • Information theory applied to chemical problems
    • Statistical models and descriptors in chemistry
    • Prediction of in vivo compound characteristics
  • Chemoinformatics includes all concepts and methods designed to interface theoretical and experimental programs involving small molecules
  • The evolution of chemoinformatics as an independent discipline will depend largely on its ability to demonstrate a measurable impact on experimental chemistry programs, whether these are in pharmaceutical research or elsewhere
  • Hierarchy of bio- and chemoinformatics research
    • Bioinformatics
      • DNA sequence
      • Protein sequence
      • Sequence similarity
      • Family
      • Structure
      • Interaction
      • Function
      • Intervention
    • Chemoinformatics
      • Molecular composition
      • Connectivity (graph)
      • Molecular similarity
      • Chemotype
      • Structure
      • Interaction
      • Specific activity
      • Drug
  • Many algorithms and computational techniques used in chemoinformatics are also applied in bioinformatics
  • Informatics research and development in the life sciences is expected to become much more global in the future
  • Scientific origins of chemoinformatics
    • Quantitative structure-activity relationship (QSAR) analysis
    • Chemical structure storage and retrieval
    • 2D substructure and 3D pharmacophore searching
    • Clustering methods for chemical applications
    • Molecular similarity analysis
    • Molecular diversity and dissimilarity analysis
  • Molecular descriptors
    Computational descriptors of molecular structure, physical or chemical properties, or pharmacophores
  • Chemical space
    n-dimensional reference space into which molecular data sets are projected for analysis or design
  • Types of molecular descriptors
    • Physical properties
    • Atom and bond counts
    • Pharmacophore features
    • Charge descriptors
    • Connectivity and shape descriptors
  • There are no generally preferred descriptor spaces for chemoinformatics applications; reference spaces usually have to be generated case by case for each specific application
  • Similar Property Principle
    Molecules having similar structures and properties should also exhibit similar activity
  • Similarity coefficients
    • Tanimoto coefficient
    • Dice coefficient
    • Cosine coefficient
  • Molecular similarity, dissimilarity, and diversity
    • Similar molecules can be identified by application of distance functions and analysis of nearest neighbors in chemical space
    • Dissimilar molecules can be identified by maximizing the distance between them in chemical space
    • Molecular diversity refers to the overall spread of a compound collection in chemical space
  • Molecular similarity analysis
    The hallmarks of
  • Molecular similarity assessment
    1. Descriptor combinations expressed as bit strings (fingerprints)
    2. Test molecule assigned characteristic bit pattern
    3. Pair-wise molecular similarity quantified by overlap of bit strings using similarity metrics (coefficients)
  • Similarity metrics (coefficients)
    • Shown in Table 1.4
  • ni and nj
    Number of bits set on for molecules i and j, respectively
  • nij
    Number of bits in common to both molecules
  • Similarity coefficient values
    Range from zero (no overlap; no similarity) to one (complete overlap; identical or very similar molecules)
  • The most widely used metric in chemoinformatics is the Tanimoto coefficient
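
The three coefficients named above can be written in terms of ni, nj and nij. In their standard binary forms, Tanimoto = nij / (ni + nj - nij), Dice = 2·nij / (ni + nj), and Cosine = nij / sqrt(ni · nj). The sketch below is illustrative only: it assumes fingerprints are given as equal-length strings of "0" and "1", the function names are hypothetical, and returning 0.0 when no bits are set is an arbitrary convention.

```python
from math import sqrt


def bit_counts(fp_i, fp_j):
    """Return (ni, nj, nij) for two equal-length bit strings such as '1101'."""
    if len(fp_i) != len(fp_j):
        raise ValueError("fingerprints must have the same length")
    ni = fp_i.count("1")   # bits set on in molecule i
    nj = fp_j.count("1")   # bits set on in molecule j
    nij = sum(1 for a, b in zip(fp_i, fp_j) if a == "1" and b == "1")  # bits in common
    return ni, nj, nij


def tanimoto(fp_i, fp_j):
    ni, nj, nij = bit_counts(fp_i, fp_j)
    denom = ni + nj - nij
    return nij / denom if denom else 0.0  # assumption: 0.0 when no bits are set


def dice(fp_i, fp_j):
    ni, nj, nij = bit_counts(fp_i, fp_j)
    denom = ni + nj
    return 2 * nij / denom if denom else 0.0


def cosine(fp_i, fp_j):
    ni, nj, nij = bit_counts(fp_i, fp_j)
    denom = sqrt(ni * nj)
    return nij / denom if denom else 0.0
```

For example, the bit strings "110100" and "100110" give ni = 3, nj = 3 and nij = 2, so the Tanimoto coefficient is 2 / (3 + 3 - 2) = 0.5, while Dice and Cosine both evaluate to 2/3.
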
  • Molecular similarity
    Identified by application of distance functions and analysis of nearest neighbors in chemical space
  • Molecular diversity
    Attempts to either select different compounds from a given population or evenly populate a given chemical space with candidate molecules
  • Diversity selection and design (distance-based)
    Using distance functions to select compounds that lie at least a predefined minimum distance from all others, or to maximize the average inter-compound distance
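
A minimal sketch of the minimum-distance variant, assuming each compound is a tuple of numeric descriptor values and that Euclidean distance is an acceptable distance function (both are assumptions, as is the hypothetical function name). The result of this greedy pass depends on the order of the input.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def select_min_distance(points, min_dist):
    """Greedy pass: keep a compound only if it lies at least min_dist
    away from every compound that has already been kept."""
    selected = []  # indices of kept compounds
    for idx, p in enumerate(points):
        if all(dist(p, points[s]) >= min_dist for s in selected):
            selected.append(idx)
    return selected
```

With the points (0, 0), (0.1, 0), (1, 0), (1, 0.05) and (3, 3) and a minimum distance of 0.5, the second and fourth points are rejected because each sits too close to a compound already kept.
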
  • Diversity selection and design (cell-based)
    1. Dividing descriptor axes into evenly spaced value intervals (binning) to produce n-dimensional subsections (cells) of chemical space
    2. Selecting a representative compound from each populated cell or populating cells as evenly as possible with computed molecules
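
The two binning steps can be sketched as follows. This is an illustration under stated assumptions: the descriptor ranges (lows, highs) are known in advance with each high greater than its low, every axis uses the same number of bins, and the first compound encountered in a cell serves as its representative. All names are hypothetical.

```python
def cell_index(point, lows, highs, bins):
    """Map a descriptor vector to its cell: one bin number per axis."""
    cell = []
    for x, lo, hi in zip(point, lows, highs):
        width = (hi - lo) / bins          # evenly spaced value intervals
        k = int((x - lo) / width)
        cell.append(min(max(k, 0), bins - 1))  # clamp values on or beyond the range limits
    return tuple(cell)


def select_one_per_cell(points, lows, highs, bins):
    """Return {cell: index of the first compound that falls into that cell}."""
    representatives = {}
    for idx, p in enumerate(points):
        representatives.setdefault(cell_index(p, lows, highs, bins), idx)
    return representatives
```

Only populated cells appear in the result, so the number of selected compounds can never exceed the number of cells, bins raised to the power of the number of descriptor axes.
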
  • Molecular diversity is a global concept, while molecular similarity analysis explores pair-wise relationships
  • Dissimilarity
    The inverse of molecular similarity, addressing which molecule in a collection is most dissimilar from a given compound or set of compounds
  • Dissimilarity-based compound selection
    1. Initially selecting a seed compound, then calculating dissimilarity between the seed and all others and selecting the most dissimilar one
    2. Repeating the process to obtain a subset of desired size
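
The selection loop above can be sketched as shown here. The card does not say how dissimilarity to an already selected set is measured after the first pick; this sketch assumes a MaxMin-style rule (pick the compound whose nearest selected neighbor is farthest away), which is one common choice and not necessarily the one intended. Fingerprints are assumed to be sets of on-bit positions, dissimilarity is taken as 1 minus the Tanimoto coefficient, and the function names are hypothetical.

```python
def tanimoto_dissimilarity(a, b):
    """1 - Tanimoto coefficient for two sets of on-bit positions."""
    union = len(a | b)
    return 1.0 - (len(a & b) / union if union else 0.0)


def dissimilarity_selection(fps, seed, size):
    """Start from a seed compound and repeatedly add the most dissimilar one."""
    selected = [seed]
    # For each compound: smallest dissimilarity to any compound selected so far.
    nearest = [tanimoto_dissimilarity(fp, fps[seed]) for fp in fps]
    while len(selected) < min(size, len(fps)):
        candidates = [i for i in range(len(fps)) if i not in selected]
        best = max(candidates, key=lambda i: nearest[i])  # MaxMin pick (assumed rule)
        selected.append(best)
        for i, fp in enumerate(fps):
            nearest[i] = min(nearest[i], tanimoto_dissimilarity(fp, fps[best]))
    return selected
```

Starting from compound 0 in a set where compound 2 shares no bits with it, compound 2 is picked first; later picks favor whichever remaining compound is least similar to everything already chosen.
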
  • High-dimensional chemistry spaces are often too complex to allow meaningful and interpretable analyses