T_Model Cards

Cards (43)

  • Trained machine learning models are increasingly used to perform high-impact tasks in areas such as law enforcement, medicine, education, and employment
  • In order to clarify the intended use cases of machine learning models and minimize their usage in contexts for which they are not well suited, we recommend that released models be accompanied by documentation detailing their performance characteristics
  • Model cards
    Short documents accompanying trained machine learning models that provide benchmarked evaluation in a variety of conditions, such as across different cultural, demographic, or phenotypic groups (e.g., race, geographic location, sex, Fitzpatrick skin type) and intersectional groups (e.g., age and race, or sex and Fitzpatrick skin type) that are relevant to the intended application domains
  • Model cards
    Also disclose the context in which models are intended to be used, details of the performance evaluation procedures, and other relevant information
  • While the focus is primarily on human-centered machine learning models in the application fields of computer vision and natural language processing, this framework can be used to document any trained machine learning model
  • Example model cards are provided for two supervised models
    One trained to detect smiling faces in images, and one trained to detect toxic comments in text
  • Model cards are proposed as a step towards the responsible democratization of machine learning and related artificial intelligence technology, increasing transparency into how well artificial intelligence technology works
  • The goal is to encourage those releasing trained machine learning models to accompany model releases with similar detailed evaluation numbers and other relevant documentation
  • Not only does this practice improve model understanding and help to standardize decision-making processes for invested stakeholders, but it also encourages forward-looking model analysis techniques.
  • Slicing the evaluation across groups functions to highlight errors that may fall disproportionately on some groups of people, and accords with many recent notions of mathematical fairness.
  • Including group analysis as part of the reporting procedure prepares stakeholders to begin to gauge the fairness and inclusion of future outcomes of the machine learning system.
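A minimal sketch of what slicing an evaluation across groups could look like, assuming each evaluation record is a dict with hypothetical keys `label`, `prediction`, and one key per attribute. The function name, the record layout, and the toy data are all illustrative assumptions, not taken from the source. Passing one attribute gives a unitary slice; passing several gives an intersectional slice.

```python
from collections import defaultdict


def false_positive_rate_by_group(records, group_keys):
    """Return the false positive rate for each group defined by group_keys.

    records: iterable of dicts with hypothetical keys 'label' (0 or 1),
             'prediction' (0 or 1), plus one key per attribute.
    group_keys: tuple of attribute names. One name yields a unitary slice,
                several names yield an intersectional slice.
    """
    false_positives = defaultdict(int)
    negatives = defaultdict(int)
    for record in records:
        if record["label"] == 0:  # only true negatives can become false positives
            group = tuple(record[key] for key in group_keys)
            negatives[group] += 1
            if record["prediction"] == 1:
                false_positives[group] += 1
    return {group: false_positives[group] / count
            for group, count in negatives.items()}


# Made-up toy data, for illustration only.
records = [
    {"label": 0, "prediction": 1, "age": "young", "sex": "F"},
    {"label": 0, "prediction": 0, "age": "young", "sex": "F"},
    {"label": 0, "prediction": 0, "age": "young", "sex": "M"},
    {"label": 0, "prediction": 0, "age": "old", "sex": "F"},
    {"label": 0, "prediction": 1, "age": "old", "sex": "M"},
    {"label": 1, "prediction": 1, "age": "old", "sex": "M"},
]

unitary = false_positive_rate_by_group(records, ("age",))
intersectional = false_positive_rate_by_group(records, ("age", "sex"))
```

In this toy data the unitary slice by age hides a gap that the intersectional slice by age and sex exposes, which is the kind of disproportionate error the cards above describe.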
  • Model reporting is an approach for responsible, transparent, and accountable practices in machine learning.
  • If model card reporting becomes standard, potential users can compare and contrast different models in a well-informed way.
  • Future research could include creating robust evaluation datasets and protocols for the types of disaggregated evaluation we advocate for in this work, for example, by including differential privacy mechanisms so that individuals in the testing set cannot be uniquely identified by their characteristics.
  • Model card
    Discloses information about a trained machine learning model, including how it was built, what assumptions were made during its development, what type of model behavior different cultural, demographic, or phenotypic population groups may experience, and an evaluation of how well the model performs with respect to those groups
  • Model card sections
    • Model Details
    • Intended Use
    • Factors
    • Metrics
    • Evaluation Data
    • Training Data
    • Quantitative Analyses
    • Ethical Considerations
    • Caveats and Recommendations
  • Model Details
    Basic information about the model, including the person or organization developing it, model date, version, type, training algorithms, parameters, fairness constraints, features, paper/resource for more information, citation details, license, and where to send questions or comments
  • Intended Use
    Use cases that were envisioned during development, including primary intended uses, primary intended users, and out-of-scope use cases
  • Factors
    Demographic or phenotypic groups, environmental conditions, technical attributes, or others that could impact model performance
  • Relevant factors
    Foreseeable salient factors for which model performance may vary, and how they were determined
  • Evaluation factors
    Factors being reported and why they were chosen
  • Metrics
    Measures of model performance, decision thresholds, and approaches to uncertainty and variability
  • Model performance measures
    Measures of model performance that were selected and why
  • Decision thresholds
    If used, what they are and why they were chosen
  • Variation approaches
    How measurements and estimations of metrics are calculated, e.g. standard deviation, variance, confidence intervals, KL divergence
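As one possible illustration of a variation approach, the sketch below reports a metric together with a confidence interval. It assumes the metric is a simple proportion (for example, accuracy on one slice) and uses the normal approximation; the function name and the counts are hypothetical, and other interval methods may suit small samples better.

```python
import math


def proportion_confidence_interval(successes, total, z=1.96):
    """Normal-approximation confidence interval for a proportion.

    z=1.96 corresponds to roughly 95% coverage. The result is clipped
    to the valid range [0, 1].
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical counts: 90 correct predictions out of 100 examples.
low, high = proportion_confidence_interval(90, 100)
```

Reporting the interval alongside the point estimate makes it easier to judge whether a gap between two groups is larger than the sampling noise.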
  • Evaluation Data
    Details on the dataset(s) used for the quantitative analyses, including the datasets, motivation, and preprocessing
  • Training Data
    Details on the dataset(s) used for training the model, mirroring the Evaluation Data section if possible, or providing minimal allowable information on the distribution over various factors
  • Quantitative Analyses
    Unitary and intersectional results
  • Ethical Considerations
    Considerations of the ethical implications of the model
  • Caveats and Recommendations
    Limitations of the model and recommendations for its use
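The nine model card sections can be sketched as a simple record type. This is an illustrative structure only: the class name, the field names, the use of free-text strings, and the `missing_sections` helper are assumptions made here, not a format defined by the source.

```python
from dataclasses import dataclass, fields


@dataclass
class ModelCard:
    """Hypothetical container with one free-text field per section."""
    model_details: str = ""
    intended_use: str = ""
    factors: str = ""
    metrics: str = ""
    evaluation_data: str = ""
    training_data: str = ""
    quantitative_analyses: str = ""
    ethical_considerations: str = ""
    caveats_and_recommendations: str = ""

    def missing_sections(self):
        """Return the names of sections that are still empty."""
        return [f.name for f in fields(self)
                if not getattr(self, f.name).strip()]


# Placeholder content, for illustration only.
card = ModelCard(
    model_details="Hypothetical smile detector, v0.1",
    intended_use="Illustration only",
)
missing = card.missing_sections()
```

A check like `missing_sections` is one way to flag an incomplete card before a model is released.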
  • When analysing markets, a range of assumptions are made about the rationality of economic agents involved in the transactions
  • The Wealth of Nations was published
    1776
  • Rational
    (in classical economic theory) economic agents are able to consider the outcome of their choices and recognise the net benefits of each one
  • Consumers act rationally by
    Maximising their utility
  • Producers act rationally by
    Selling goods/services in a way that maximises their profits
  • Workers act rationally by
    Balancing welfare at work with consideration of both pay and benefits
  • Governments act rationally by
    Placing the interests of the people they serve first in order to maximise their welfare
  • Groups assumed to act rationally
    • Consumers
    • Producers
    • Workers
    • Governments
  • Rationality in classical economic theory is a flawed assumption as people usually don't act rationally
  • Marginal utility
    The additional utility (satisfaction) gained from the consumption of an additional product
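A short worked example of marginal utility, computed as the change in total utility from consuming one more unit. The utility figures are made up for illustration, and the function name is hypothetical.

```python
def marginal_utilities(total_utility):
    """Return the marginal utility of each additional unit consumed.

    total_utility[n] is the total utility after consuming n units,
    so the marginal utility of unit n is total_utility[n] - total_utility[n - 1].
    """
    return [total_utility[n] - total_utility[n - 1]
            for n in range(1, len(total_utility))]


# Made-up total utility after consuming 0, 1, 2, 3, and 4 units.
total_utility = [0, 10, 18, 24, 27]
mu = marginal_utilities(total_utility)  # [10, 8, 6, 3]
```

In this made-up example each extra unit adds less satisfaction than the one before it.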