Monitoring

Cards (72)

  • Cloud Monitoring Architecture Overview:
    Data Collection Layer:
    • Collects metrics, logs, and traces from cloud-based systems.
    • Includes Google Cloud Services like GKE, GCE, App Engine, etc.
    Data Storage Layer:
    • Stores collected data and routes it to the visualization and analysis layer.
    • Utilizes Cloud Monitoring API for triaging and storing metrics.
    Data Analysis and Visualization Layer:
    • Analyzes data to identify problems and trends.
    • Features dashboards, uptime checks, alerting policies, and notifications.
  • Platform Monitoring:
    • Commonly used for gaining visibility into the performance of Google Cloud services.
    • System metrics are collected automatically, without any user effort.
    Google Cloud Metrics:
    • Over 1,500 metrics collected across 100+ Google Cloud services.
    • Example: Compute Engine reports over 25 unique metrics for each virtual machine instance.
    Prometheus-Based Monitoring for GKE:
    • Some users prefer Prometheus-based monitoring.
    • Google Managed Prometheus (GMP) enables leveraging Prometheus data for GKE cluster and workload metrics.
  • Cloud Ops Agent for Compute Engine:
    • Recommended for Compute Engine.
    • Supports 30+ plugins for various open-source and ISV software.
    • Based on OpenTelemetry standards.
    • Collects in-process metrics and third-party application metrics.
    Custom Applications and Ops Agent:
    • Custom applications can leverage OTEL client libraries to instrument their code.
    • The Ops Agent collects custom metrics and integrates them into Cloud Monitoring.
  • Partner Products Integration:
    • Suggests using Datadog or New Relic for expanded software support.
    • Partner products can collect system metrics from Google Cloud using native API-based integrations.
    Cloud Monitoring and Bindplane Integration:
    • Provides a unified view for hybrid cloud.
    • Bindplane by Blue Medora integrates with over 150 popular data sources.
    Cost Considerations:
    • Bindplane metrics and logs are chargeable, similar to other Cloud monitoring charges.
  • Application Monitoring on GKE
  • Compute Engine Monitoring with Ops Agent
  • Hybrid Monitoring with BindPlane
  • Cloud Ops Agent for Compute Engine:
    • In-Process and Third-Party Metrics Collection: The Cloud Ops Agent is recommended for applications or workloads in Compute Engine. It collects in-process metrics and extends support for capturing metrics from third-party applications running within VMs.
    • Plugin Support: With support for over 30 plugins, the Ops Agent accommodates a wide array of open-source and ISV software, offering a flexible approach to metric collection.
  • Ops Agent and OpenTelemetry Standards:
    • OpenTelemetry Standards Basis: The Ops Agent is built on OpenTelemetry standards, ensuring compatibility and interoperability with various systems and applications.
    • Custom Application Telemetry: Custom applications can leverage OTEL client libraries to instrument their code, generating the necessary telemetry data. The Ops Agent seamlessly collects and integrates these custom metrics into Cloud Monitoring.
    • Monitoring Settings: In Google Cloud, when accessing monitoring settings for a project, the metric scope typically encompasses only the project currently being viewed.
    • Metric Scope in Scoping Projects:
    • Project as Metric Scope: Upon creating a Google Cloud project, it becomes a scoping project hosting a metric scope.
    • Scope Functions: It stores configurations for alerts, uptime checks, dashboards, and monitoring groups specifically tailored for that scope.
    • Project-Specific Monitoring:
    • Scope Creation per Project: Each project creates its own metric scope, managing its monitoring configuration independently.
  • Monitoring Relationship Options:
    • Centralized vs. Local Monitoring: A decision must be made on whether to centralize data storage and access or allow each project to monitor itself.
  • Local Monitoring:
    • Advantages:
    • Clear Separation: Projects are monitored individually, ensuring a clear and distinct separation.
    • Access Convenience: Access to development and monitoring resources is simplified when all are in the same place.
    • Automation Ease: Monitoring becomes a standard part of the project setup, facilitating automation.
    • Disadvantages:
    • Limited Visibility: For larger applications spanning multiple projects, there is limited visibility into overall application performance.
  • Centralized Monitoring:
    • Adding Projects to Scope: Multiple projects can be added to a single metric scope, providing visibility into monitoring data for all projects in that scope.
    • Production Deployment Recommendation:
    • Dedicated Project for Configuration: It is recommended to create a dedicated project to host monitoring configuration data. The metric scope of this project is then used to set up monitoring for other projects with actual resources.
  • Centralized Monitoring:
    • Advantages:
    • Single Pane of Glass: Provides a unified view for all related resources or projects through a single metric scope.
    • Environment Comparison: Facilitates easy comparison between non-production and production environments.
    • Disadvantages:
    • Access Challenges: Any user with IAM permissions to access Cloud Monitoring can view metrics for all environments, potentially impacting the separation of monitoring responsibilities.
    • Role Assignment Challenges: The role assigned to a person or project applies universally to all projects monitored by that metric scope, potentially diminishing the separation of responsibilities among different teams.
  • IAM Permissions and Metric Scope:
    • Access Control Implications: IAM permissions related to Cloud Monitoring apply universally to all projects monitored by the metric scope.
    • Role Assignment Impact: A role assigned to an individual or project applies equally to all projects within the metric scope.
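The access implication above can be illustrated with a small toy model. This is a sketch only: the `MetricsScope` class, its methods, and the project and user names are hypothetical and are not part of any Google Cloud API. It simply shows that a viewer role granted on the scope exposes every project the scope monitors.

```python
from dataclasses import dataclass, field


@dataclass
class MetricsScope:
    """Toy model of a metric scope hosted by a scoping project (hypothetical)."""
    scoping_project: str
    monitored_projects: set = field(default_factory=set)
    viewers: set = field(default_factory=set)

    def __post_init__(self):
        # The scoping project always monitors itself.
        self.monitored_projects.add(self.scoping_project)

    def add_project(self, project_id):
        """Centralized monitoring: add another project to this scope."""
        self.monitored_projects.add(project_id)

    def grant_viewer(self, user):
        """Grant a monitoring viewer role on the scope."""
        self.viewers.add(user)

    def visible_projects(self, user):
        # A role on the scope applies to ALL monitored projects at once.
        return set(self.monitored_projects) if user in self.viewers else set()


scope = MetricsScope("monitoring-host")
scope.add_project("app-staging")
scope.add_project("app-prod")
scope.grant_viewer("dev@example.com")
print(sorted(scope.visible_projects("dev@example.com")))
```

In this model there is no way to let a user see `app-staging` but not `app-prod` through the same scope, which is the separation-of-responsibilities trade-off described above.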
  • Tool-Specific Considerations:
    • Metric Scope Exclusivity: A metric scope applies only to Cloud Monitoring and the Google Cloud resources it monitors.
    • Other Tools: Tools like Cloud Logging, Error Reporting, and Application Performance Management tools are project-based and operate independently of metric scope or monitoring IAM rules.
    • Time Series Basis:
    • General Concept: Monitoring data is recorded in time series, representing data points over time.
    • Components of Time Series:
    • Metric Field:
    • Metric Label and Type: Describes the metric with label values and type, specifying available labels and data representation.
    • Resource Field:
    • Resource Label and Monitored Resource: Records label values and the specific resource from which data was collected.
    • Value Type Field:
    • Measurement Data Type: Specifies the data type for measurements, such as Boolean, 64-bit integer, double, or string.
    • Distribution Measurement: For distribution measurements, the value type is "Distribution."
    • Metric Kind and Value Interpretation:
    • Metric Kind: Describes the kind of metric data - gauge, delta, or cumulative.
    • Value Interpretation: The metric kind tells you how to interpret each value relative to the others in the series.
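To make the difference between metric kinds concrete, here is a minimal sketch, assuming made-up sample values and a hypothetical helper name. A cumulative metric reports a running total, while a delta metric reports the change over each interval, so one can be derived from the other.

```python
def cumulative_to_deltas(values):
    """Turn consecutive CUMULATIVE readings into per-interval DELTA values."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


# Hypothetical running total of requests served, sampled four times.
requests_total = [100, 130, 130, 175]
print(cumulative_to_deltas(requests_total))  # [30, 0, 45]
```

A gauge, by contrast, is a point-in-time reading (for example, CPU utilization), so each value stands on its own and no subtraction is needed.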
  • Example of Time Series:
    • Complete Time Series Example:
    • Structure Overview: Presents a detailed example of a single time series covering a one-minute interval.
    • Fields Covered: Metric field, resource field, metric kind, value type, and points (array of timestamp and values).
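The fields listed above can be sketched as a Python dictionary shaped like the JSON form of a Cloud Monitoring time series. The project, instance, label values, timestamps, and the measurement itself are invented for illustration; only the overall field layout is intended to mirror the API.

```python
time_series = {
    # Metric field: the metric type plus its label values.
    "metric": {
        "type": "compute.googleapis.com/instance/cpu/utilization",
        "labels": {"instance_name": "example-vm"},
    },
    # Resource field: the monitored resource the data came from.
    "resource": {
        "type": "gce_instance",
        "labels": {
            "project_id": "example-project",
            "instance_id": "1234567890",
            "zone": "us-central1-a",
        },
    },
    "metricKind": "GAUGE",   # gauge, delta, or cumulative
    "valueType": "DOUBLE",   # data type of each measurement
    # Points: an array of time intervals and values.
    "points": [
        {
            "interval": {
                "startTime": "2024-01-01T00:01:00Z",
                "endTime": "2024-01-01T00:01:00Z",
            },
            "value": {"doubleValue": 0.42},
        }
    ],
}

for point in time_series["points"]:
    print(point["interval"]["endTime"], point["value"]["doubleValue"])
```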
  • Custom Dashboards and Monitoring Options:
    • Default Dashboards: Google Cloud auto-creates dashboards for Compute Engine VMs or GKE clusters based on resource types.
    • Dashboard Builder: Use Dashboard Builder to visualize specific application metrics and filter them based on your requirements.
    • Ad Hoc Chart Creation: Initiate chart creation using Metrics Explorer, which is suitable for exploring data temporarily.
    • Metrics Explorer Functions: Save charts to custom dashboards, share charts via URL, and view chart configurations as JSON.
  • Metrics Explorer Interface:
    • Metrics Explorer Components:
    • Configuration Region: Pick metric and its options.
    • Chart Display: Displays the metric.
    • Display Panel: Configures axis, sets thresholds, and more.
    • Chart Widget Types and Analysis Modes: Widget type and analysis mode determine how the chart displays data.
  • Filtering and Grouping in Metrics Explorer:
    • Filtering:
    • Filter Criteria: Use filter criteria for resource group, name, resource label, and metric label.
    • Filtering Impact: Reduces data on the chart, improving signal-to-noise ratio.
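As a rough sketch of how such filter criteria combine, the helper below builds a filter string in the `metric.type = "..." AND resource.labels.x = "..."` form used by Cloud Monitoring filters. The function name and the example label values are assumptions made for illustration.

```python
def build_filter(metric_type, resource_labels=None, metric_labels=None):
    """Join a metric type and label equality tests into one filter string."""
    clauses = [f'metric.type = "{metric_type}"']
    for key, value in sorted((resource_labels or {}).items()):
        clauses.append(f'resource.labels.{key} = "{value}"')
    for key, value in sorted((metric_labels or {}).items()):
        clauses.append(f'metric.labels.{key} = "{value}"')
    return " AND ".join(clauses)


cpu_filter = build_filter(
    "compute.googleapis.com/instance/cpu/utilization",
    resource_labels={"zone": "us-central1-a"},
    metric_labels={"instance_name": "example-vm"},
)
print(cpu_filter)
```

Each added clause narrows the set of matching time series, which is what reduces the data drawn on the chart.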
  • Metrics Explorer Grouping and Alignment:
    • Combining Time Series: Combine time series by specifying grouping and a function.
    • Alignment: Align time series to regularize raw data in time, necessary for aggregation.
    • Alignment Period and Function: Set alignment period (default is one minute) and function (e.g., sum, mean).
    • Alignment Options Override: Override defaults using alignment options like alignment function and min alignment period.
    • Realigning on Chart Changes: Time series must be realigned when changing time intervals or zoom levels.
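The two steps above, alignment followed by combining, can be sketched in plain Python. This is a simplified illustration and not how Cloud Monitoring is implemented: the function names, the bucketing rule, and the sample data are all assumptions.

```python
from collections import defaultdict
from statistics import mean


def align(points, period_s, aligner=mean):
    """Bucket (timestamp_s, value) points into fixed periods, then reduce each bucket."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % period_s].append(value)
    return {start: aligner(vals) for start, vals in sorted(buckets.items())}


def combine(aligned_series, reducer=sum):
    """Combine several aligned series into one by reducing the values in each period."""
    merged = defaultdict(list)
    for series in aligned_series:
        for start, value in series.items():
            merged[start].append(value)
    return {start: reducer(vals) for start, vals in sorted(merged.items())}


# Two hypothetical VMs reporting at irregular, unsynchronized times.
vm_a = [(0, 10), (30, 20), (60, 30), (90, 50)]
vm_b = [(15, 1), (45, 3), (75, 5)]

aligned = [align(vm_a, 60), align(vm_b, 60)]   # 60 s alignment period, mean aligner
total = combine(aligned)                        # sum across the group
print(total)  # {0: 17, 60: 45}
```

The raw points of the two series never share a timestamp, so they cannot be summed directly. After alignment each series has exactly one value per period, which is why alignment is a prerequisite for aggregation.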
  • Legend Configuration:
    • Legend Overview:
    • Sorting and Configuration: Click legend column headers to sort. Legend columns are configurable.
    • Chart Widget Types and Analysis Modes:
    • Analysis Modes: Three modes - Standard, Stats, X-ray - affect how time series are displayed on the chart.
    • Threshold Option: Adds a horizontal line as a visual reference for a chosen threshold value.
  • Legend Alias and Customization:
    • Legend Alias Field: Customize descriptions for time series in the chart legend.
    • Default vs. Custom Descriptions: By default, descriptions are created from labels; use the Legend Alias field for customization.
    • Legend Templates: Enter plain text and templates, and add expressions for dynamic descriptions.
    • Building Legend Templates: Templates provide flexibility in constructing meaningful descriptions.
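A legend template mixes plain text with expressions that are replaced by label values. The sketch below assumes the `${...}` placeholder form (for example `${resource.labels.zone}`) and uses a hypothetical `render_legend` helper with invented label values to show how a dynamic description could be produced.

```python
import re


def render_legend(template, series):
    """Replace ${dotted.path} placeholders with values looked up in a series dict."""
    def lookup(match):
        node = series
        for part in match.group(1).split("."):
            if not isinstance(node, dict) or part not in node:
                return match.group(0)  # leave unknown placeholders untouched
            node = node[part]
        return str(node)

    return re.sub(r"\$\{([^}]+)\}", lookup, template)


series = {
    "metric": {"labels": {"instance_name": "web-1"}},
    "resource": {"labels": {"zone": "us-central1-a"}},
}
legend = render_legend(
    "CPU ${metric.labels.instance_name} (${resource.labels.zone})", series
)
print(legend)  # CPU web-1 (us-central1-a)
```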
  • Cloud Monitoring Architecture:
    1. Data Collection Layer:
    • Collects metrics, logs, and traces from cloud-based systems.
    • Includes Google Cloud services such as Google Kubernetes Engine, Google Compute Engine, App Engine, etc.
    2. Data Storage Layer:
    • Stores collected data and routes it to the configured visualization and analysis layer.
    • Includes the Cloud Monitoring API to triage collected metrics for further analysis.
    3. Data Analysis and Visualization Layer:
    • Analyzes collected data to identify problems and trends.
    • Presents analyzed data in an easily understandable format.
    • Comprises various features within Cloud Monitoring, including:
    • Dashboards for visualizing data.
    • Uptime checks for monitoring applications.
    • Alerting policies for configuring alerts.
    • Notifications to alert of events that need attention.
  • One of the most common uses of Cloud Monitoring is platform monitoring. Black-box monitoring of the platform gives users visibility into the performance of their Google Cloud services. On Google Cloud this is enabled by default, and system metrics are collected automatically without any user effort. Cloud Monitoring is the recommended solution for platform monitoring.
    • System metrics from Google Cloud are available at no cost to customers. These metrics provide information about how the service is operating. Over 1,500 metrics are collected automatically across more than 100 Google Cloud services. For example, Compute Engine reports over 25 unique metrics for each virtual machine (VM) instance.
    • Customers who use third-party products for monitoring, for example in traditional enterprises, and who want to aggregate their Google Cloud metrics into those partner products can use the Cloud Monitoring API to feed these metrics into them.
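As a hedged sketch of what reading metrics through the Cloud Monitoring API involves, the snippet below only builds the URL for a v3 `timeSeries` list request. It does not send the request or handle authentication, and the helper name, project ID, and time range are assumptions for illustration.

```python
from urllib.parse import urlencode


def time_series_list_url(project_id, metric_filter, start_time, end_time):
    """Build (but do not send) a URL for listing time series in a project."""
    base = f"https://monitoring.googleapis.com/v3/projects/{project_id}/timeSeries"
    query = urlencode({
        "filter": metric_filter,
        "interval.startTime": start_time,
        "interval.endTime": end_time,
    })
    return f"{base}?{query}"


url = time_series_list_url(
    "example-project",
    'metric.type = "compute.googleapis.com/instance/cpu/utilization"',
    "2024-01-01T00:00:00Z",
    "2024-01-01T01:00:00Z",
)
print(url)
```

A real integration would attach OAuth credentials and page through the results; partner products typically wrap these calls in their own native integrations.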
  • Google Managed Prometheus (GMP): GMP is part of Cloud Monitoring and makes GKE cluster and workload metrics available as Prometheus data. It can ingest monitoring data exposed in Prometheus format, supports a PromQL-compatible query language, natively integrates the Prometheus expression browser, and provides Prometheus-compatible rule evaluation. For application workloads in GKE, we recommend that customers use Google Managed Prometheus.
  • For applications or workloads deployed in Compute Engine, customers should use the Cloud Ops Agent to collect in-process metrics and metrics from third-party applications that run in their VMs. The Ops Agent currently supports more than 30 plugins for different open-source and ISV software, and it collects richer, more fine-grained OS-level metrics for Windows and many flavors of Linux.
    • The Ops Agent is based on OpenTelemetry standards, so custom applications developed by customers can leverage OTEL client libraries to instrument their code and generate the needed telemetry.
    • The Ops Agent can collect these custom metrics and make them available in Cloud Monitoring as well. While this ecosystem of third-party plugins will continue to expand, users who need support for other software products or services can consider a partner product like Datadog or New Relic. Partner products can collect system metrics from Google Cloud by using native API-based integrations.
  • With Google's partner BindPlane by Blue Medora, you can import monitoring and logging data from both on-premises VMs and other cloud providers, such as Amazon Web Services (AWS), Microsoft Azure, Alibaba Cloud, and IBM Cloud into Google Cloud. There are no additional licensing costs for using BindPlane. BindPlane metrics are imported into Monitoring as custom metrics, which are chargeable. Likewise, BindPlane logs are charged at the same rate as other Cloud Logging logs.
    • When you go to the monitoring settings for a project, you can see that the current metrics scope contains only a single project: the one you are currently viewing.
    • When you create a Google Cloud project, that project hosts a metrics scope and becomes the scoping project for that scope.
    • It stores the alerts, uptime checks, dashboards, and monitoring groups that you configure for the scope. For example, if you create a staging project and then access monitoring, you can see the metrics for the resources in the staging project.
    • This happens for every project you create. Each project creates a metrics scope for itself and hosts monitoring configuration for itself. But what if you want to centralize how that data is stored and how it's accessed?
  • Since it's possible for one metrics scope to monitor multiple projects, and also a project can be monitored from only a single metrics scope, you will have to decide which relationship will work best for your organizational culture, and this particular project.