XAI-lecture 3

Cards

  • Univ.-Prof. Dr. Marc Streit is a lecturer on Explainable AI.
  • Explainable AI aims to explain the decisions of complex algorithms and machine learning models.
  • Explaining algorithms involves storytelling principles such as author-driven vs. reader-driven, Martini glass structure, interactive slideshow, and drill-down story.
  • Explaining algorithms can involve topics like sorting algorithms, clustering algorithms, Bayes’ theorem, and decision trees.
  • Dimensionality reduction is a technique used to transform high-dimensional data into a space with fewer dimensions, such as 2D or 3D.
  • Dimensionality reduction techniques include PCA, UMAP, and t-SNE (a comparison sketch follows the card list).
  • Embedding-based trajectories visualize sequences of states as connected paths in a projected embedding space.
  • Visualizing embeddings, typically as 2D scatterplots, is a crucial part of working with dimensionality reduction techniques.
  • The ChemInformatics Model Explorer (CIME) supports exploratory analysis of chemical model explanations.
  • The goal of dimensionality reduction is to reveal patterns and clusters of similar or dissimilar data.
  • Dimensionality reduction is used in various domains, including document categorization, protein disorder prediction, drug discovery, and machine learning model debugging.
  • Disadvantages of dimensionality reduction: the semantics of the original single dimensions are hard to preserve, the resulting dimensions are hard to understand and interpret, and the projection error is not visible, which can inspire false confidence.
  • The embedding of all books on Wikipedia is an example of projecting data from nD to (1D)/2D/3D in order to reveal patterns and clusters of similar or dissimilar data.
  • Various dimensionality reduction techniques and algorithms exist, each with its own strengths and weaknesses.
  • A dataset can consist of images, words, or other items represented as high-dimensional vectors.
  • Projecting to 1D space can mean arranging artworks along a single axis according to their average pixel brightness (a brightness sketch follows the card list).
  • Projecting to 2D space based on image brightness gives the pieces more room to spread out.
  • Depth cues enable us to perceive 2D images as 3D objects.
  • Vis Excursion: Be Careful with 3D!
  • t-Distributed Stochastic Neighbor Embedding (t-SNE) is non-linear, produces highly clustered, visually striking embeddings, and captures local structure well; however, it may sacrifice global structure in favor of preserving local distances, is computationally expensive, requires hyperparameters (notably the perplexity) that strongly influence the quality of the embedding, and is non-deterministic.
  • A benefit of representing data points or items in projections is that viewers can perceive the shape of the resulting point clouds.
  • Dimensionality reduction techniques include linear approaches such as principal component analysis (PCA) and multidimensional scaling (MDS), and non-linear approaches such as t-distributed stochastic neighbor embedding (t-SNE), uniform manifold approximation and projection (UMAP), and self-organizing maps (SOM).
  • The cons of PCA include that it is a linear reduction, which limits the structure it can capture, and that its projections may show less clearly separated clusters than those of non-linear algorithms.
  • Embeddings can be useful, but patterns in them must be interpreted with care: hyperparameters really matter, cluster sizes in a t-SNE plot mean nothing, distances between clusters might not mean anything, and random noise doesn't always look random (a noise sketch follows the card list).
  • Categorical attributes or features can be represented as color (hue) and shape.
  • The pros of Principal Component Analysis (PCA) are that it is relatively computationally cheap, that the fitted embedding model can be saved and reused to project new data points into the reduced space (a sketch follows the card list), and that its output can be used to cluster data.
  • Items can also be represented by other marks, such as an image or a glyph.
  • Ordered attributes or features can be represented as size and color (brightness/saturation); a sketch of these encoding channels follows the card list.
  • Rauber et al. used dimensionality-reduction projections to visualize the inter-epoch evolution of a neural network's learned representations.
  • Sainburg et al. (2020) proposed Parametric UMAP, which uses neural networks to learn UMAP embeddings.
  • The disadvantages of UMAP include hyperparameters (such as the number of neighbors and the minimum distance) that influence the quality of the embedding, and the fact that the algorithm is non-deterministic (a sketch follows the card list).
  • The time curve signature of a surveillance video of a street shows passing pedestrians as outliers.
  • Hinterreiter et al. (2020) also projected neural network training trajectories using UMAP.
  • Hinterreiter et al. (2020) proposed the Projection Space Explorer tool, which uses UMAP projections to visualize data.
  • In his BSc thesis, Moritz Schöfl used UMAP projections to visualize the execution trajectories of sorting algorithms.
  • Bach et al. (2015) introduced time curves; the time curve signatures of different Wikipedia articles show reverts as loops and edit wars as oscillations (a time-curve sketch follows the card list).
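
A minimal sketch of the three techniques named in the cards (PCA, t-SNE, UMAP), assuming scikit-learn and the umap-learn package are installed; the digits dataset merely stands in for any high-dimensional data:

```python
# Sketch: projecting the same high-dimensional data to 2D with PCA, t-SNE, and UMAP.
# The digits dataset is a stand-in for any dataset of high-dimensional vectors.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
import umap

X, y = load_digits(return_X_y=True)  # 1797 samples, 64 dimensions

pca_2d = PCA(n_components=2).fit_transform(X)                   # linear
tsne_2d = TSNE(n_components=2, perplexity=30).fit_transform(X)  # non-linear, local structure
umap_2d = umap.UMAP(n_components=2).fit_transform(X)            # non-linear, local + some global
```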
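A sketch of the 1D brightness projection from the artworks card; the image file names are hypothetical placeholders:

```python
# Sketch: "projecting to 1D" by reducing each image to its average pixel brightness.
import numpy as np
from PIL import Image

paths = ["artwork_01.jpg", "artwork_02.jpg", "artwork_03.jpg"]  # hypothetical files
images = [np.asarray(Image.open(p).convert("L"), dtype=float) for p in paths]

# Each image becomes a single number: its mean brightness (a 1D embedding).
brightness = [img.mean() for img in images]
order = np.argsort(brightness)  # arrange the artworks from darkest to brightest
for i in order:
    print(paths[i], round(brightness[i], 1))
```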
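A sketch of the t-SNE caveats: projecting pure random noise with different perplexity values shows how much the hyperparameter matters and how noise can look clustered. Assumes scikit-learn and matplotlib:

```python
# Sketch: t-SNE on pure Gaussian noise; low perplexity fakes clusters in noise.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
noise = rng.normal(size=(500, 50))  # 500 points of pure random noise in 50D

fig, axes = plt.subplots(1, 3, figsize=(12, 4))
for ax, perplexity in zip(axes, [2, 30, 100]):
    emb = TSNE(perplexity=perplexity, random_state=0).fit_transform(noise)
    ax.scatter(emb[:, 0], emb[:, 1], s=5)
    ax.set_title(f"perplexity={perplexity}")  # structure here is an artifact
plt.show()
```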
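A sketch of the PCA advantage named in the cards, that the fitted model can be reused to project new data points (classic t-SNE in scikit-learn offers no such transform step); the data here is synthetic:

```python
# Sketch: fit PCA once, then project previously unseen points with the same model.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X_train = rng.normal(size=(1000, 64))
X_new = rng.normal(size=(10, 64))       # data that arrives later

pca = PCA(n_components=2).fit(X_train)  # learn the linear projection once
train_2d = pca.transform(X_train)
new_2d = pca.transform(X_new)           # same linear map, no refitting needed
```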
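A sketch of the encoding cards: color hue and mark shape for a categorical attribute, mark size for an ordered attribute; the projected coordinates here are synthetic stand-ins:

```python
# Sketch: encoding a categorical attribute as hue and shape, an ordered one as size.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
xy = rng.normal(size=(90, 2))                     # stand-in for projected coordinates
category = rng.integers(0, 3, size=90)            # categorical attribute
importance = rng.uniform(10, 100, size=90)        # ordered attribute

markers = ["o", "s", "^"]                         # shape encodes the category ...
colors = ["tab:blue", "tab:orange", "tab:green"]  # ... and so does hue
for c in range(3):
    mask = category == c
    plt.scatter(xy[mask, 0], xy[mask, 1], marker=markers[c], color=colors[c],
                s=importance[mask])               # size encodes the ordered attribute
plt.show()
```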
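A sketch of the UMAP caveats, assuming the umap-learn package; n_neighbors and min_dist are its main hyperparameters, and fixing random_state makes runs reproducible:

```python
# Sketch: UMAP hyperparameters shape the embedding; runs differ unless seeded.
from sklearn.datasets import load_digits
import umap

X, _ = load_digits(return_X_y=True)

emb_local = umap.UMAP(n_neighbors=5, min_dist=0.05).fit_transform(X)    # local detail
emb_global = umap.UMAP(n_neighbors=100, min_dist=0.5).fit_transform(X)  # global layout
emb_fixed = umap.UMAP(random_state=42).fit_transform(X)  # reproducible, at some speed cost
```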
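A sketch of the time-curve idea from the Bach et al. cards, using multidimensional scaling on pairwise distances between snapshots and connecting them in time order; the random-walk "revisions" are hypothetical stand-ins for article or video states:

```python
# Sketch: a time curve — project snapshots to 2D and connect them chronologically,
# so loops (reverts) and oscillations (edit wars) become visible.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import MDS
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(3)
snapshots = np.cumsum(rng.normal(size=(40, 20)), axis=0)  # hypothetical 20D states

dist = squareform(pdist(snapshots))  # pairwise distances between all snapshots
curve = MDS(n_components=2, dissimilarity="precomputed").fit_transform(dist)

plt.plot(curve[:, 0], curve[:, 1], "-o", markersize=3)  # connect in time order
plt.show()
```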