Data Solutions

Cards (30)

  • Storage Services and Features have different SLAs:
    • Cloud Spanner and Firestore SLAs also vary by configuration, with multi-regional deployments offering higher availability than single-region configurations. This is where requirements are extremely important, as they will help inform the storage choice.
    • The availability SLAs are typically defined per month. Monthly Uptime Percentage means the total number of minutes in a month, minus the number of minutes of downtime suffered from all downtime periods in a month, divided by the total number of minutes in a month.
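    • As a quick sanity check, that definition can be turned into arithmetic. Here is a minimal sketch in Python; the month length, SLA target, and downtime figures are illustrative, not a specific service’s SLA:
```python
# Monthly Uptime Percentage = (total minutes - downtime minutes) / total minutes
total_minutes = 30 * 24 * 60        # 43,200 minutes in a 30-day month
target = 0.9999                     # illustrative 99.99% availability target
allowed_downtime = total_minutes * (1 - target)
print(f"Downtime budget: {allowed_downtime:.2f} minutes")   # ~4.32 minutes

downtime = 10                       # suppose 10 minutes of downtime occurred
uptime_pct = (total_minutes - downtime) / total_minutes
print(f"Monthly Uptime Percentage: {uptime_pct:.5%}")       # 99.97685%
```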
  • Durability represents the likelihood that data is not lost. Depending on the storage solution, durability is a shared responsibility: Google Cloud’s responsibility is to ensure that data remains durable in the event of a hardware failure, while your responsibility is to perform backups of your data. For example, Cloud Storage provides 11 nines (99.999999999%) of durability, and versioning is an available feature. However, it’s your responsibility to determine when to use versioning. It is recommended that you turn versioning on and archive older versions as part of an object lifecycle management policy.
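    • For illustration, a minimal sketch of enabling versioning plus such a lifecycle policy with the google-cloud-storage Python client; the bucket name and age thresholds are placeholders:
```python
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("example-backups-bucket")  # placeholder bucket name

# Keep older generations of objects when they are overwritten or deleted.
bucket.versioning_enabled = True

# Lifecycle policy (illustrative thresholds): archive noncurrent versions
# after 30 days, then delete them after a year.
bucket.add_lifecycle_set_storage_class_rule(
    storage_class="ARCHIVE", age=30, is_live=False
)
bucket.add_lifecycle_delete_rule(age=365, is_live=False)
bucket.patch()
```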
  • Data Durability:
    • For other storage services, achieving durability usually means taking backups of data. For disks, this means snapshots, so snapshot jobs should be scheduled. For Cloud SQL, you can create a backup at any time (on-demand), as sketched below. This could be useful if you are about to perform a risky operation on your database, or if you need a backup and do not want to wait for the backup window.
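    • A minimal sketch of requesting an on-demand Cloud SQL backup through the Cloud SQL Admin API (via the google-api-python-client discovery client); the project and instance IDs are placeholders:
```python
# Requires: pip install google-api-python-client
from googleapiclient import discovery

service = discovery.build("sqladmin", "v1")

# Placeholders: substitute your own project and instance IDs.
operation = service.backupRuns().insert(
    project="my-project",
    instance="my-sql-instance",
    body={"description": "pre-migration on-demand backup"},
).execute()
print(operation.get("operationType"), operation.get("status"))
```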
  • Data Durability:
    • For Cloud SQL, Google Cloud also provides automated backups, point-in-time recovery, and optionally a failover server. You can create on-demand backups for any instance, whether the instance has automatic backups enabled or not. To improve durability, SQL database backups should be run regularly.
    • Spanner and Firestore provide automatic replication, but you should still run export jobs that write the data to Cloud Storage, as sketched below.
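    • A minimal sketch of such an export for Firestore with the firestore_admin_v1 client; the project ID and destination bucket are placeholders:
```python
from google.cloud import firestore_admin_v1

client = firestore_admin_v1.FirestoreAdminClient()

# Placeholders: substitute your project ID and a destination bucket.
operation = client.export_documents(
    request={
        "name": "projects/my-project/databases/(default)",
        "output_uri_prefix": "gs://example-backups-bucket/firestore-exports",
    }
)
print("Export running as:", operation.operation.name)  # long-running operation
```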
  • The amount of data, and the number of reads and writes, are important to know when selecting a data storage service:
    • Some services, such as Bigtable and Spanner, scale horizontally by adding nodes; in contrast, Cloud SQL and Memorystore scale vertically by moving to larger machines. Other services, such as Cloud Storage, BigQuery, and Firestore, scale automatically with no limits.
  • Strong consistency is another important characteristic to consider when designing data solutions. A strongly consistent database will update all copies of data within a transaction and ensure that everybody gets the latest copy of committed data on reads. Google Cloud services providing strong consistency include Cloud Storage, Cloud SQL, Spanner, and Firestore. If an instance does not use replication, Cloud Bigtable also provides strong consistency, because all reads and writes are sent to the same cluster.
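    • To make this concrete, a minimal sketch of a read-write transaction with the google-cloud-spanner client; the instance, database, table, and account IDs are placeholders. Once the transaction commits, every subsequent read sees the new balance:
```python
from google.cloud import spanner

client = spanner.Client()
instance = client.instance("example-instance")   # placeholder instance ID
database = instance.database("example-db")       # placeholder database ID

def debit_account(transaction):
    # All mutations in this function commit atomically; once committed,
    # every subsequent read sees the new balance (strong consistency).
    transaction.execute_update(
        "UPDATE Accounts SET Balance = Balance - 100 "
        "WHERE AccountId = 'acct-1'"
    )

database.run_in_transaction(debit_account)
```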
  • Eventually Consistent:
    • Eventually consistent databases typically keep multiple copies of the same data for performance and scalability, which lets them handle large volumes of writes. They operate by updating one copy of the data synchronously and the remaining copies asynchronously, which means that not all readers are guaranteed to see the same values at a given point in time. The data will eventually become consistent, but not immediately. Bigtable and Memorystore are examples of Google Cloud data services that offer eventual consistency.
  • Replication:
    • Replication for Cloud Bigtable is eventually consistent by default. This means that when you write a change to one cluster, you will eventually be able to read that change from the other clusters in the instance, but only after the change has been replicated between the clusters.
  • Calculating the total cost per GB is important to help determine the financial implications of a choice:
    Bigtable and Spanner:
    • Bigtable and Spanner are designed for massive data sets and are not as cost-efficient for small data sets.
    Firestore:
    • Firestore is less expensive per GB stored, but the cost for reads and writes must be considered.
    Cloud Storage:
    • Cloud Storage is not as expensive but is only suitable for certain data types.
    BigQuery:
    • BigQuery storage is relatively cheap but does not provide fast access to individual records, and a cost is incurred for each query (a dry-run cost estimate is sketched after this list).
    • Here’s an example for our online travel portal, ClickTravel. We focused on the inventory, inventory uploads, ordering, and analytics services. As you can see, each of these services has different requirements that might result in choosing different Google Cloud services.
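    • For the BigQuery per-query cost point above, a minimal sketch of estimating bytes scanned with a dry run, using the google-cloud-bigquery client; the project, dataset, and table names are placeholders:
```python
from google.cloud import bigquery

client = bigquery.Client()

# dry_run=True validates the query and reports bytes scanned without
# running it (and therefore without incurring query cost).
job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
job = client.query(
    "SELECT destination, COUNT(*) AS bookings "
    "FROM `my-project.travel.orders` GROUP BY destination",  # placeholder table
    job_config=job_config,
)
gb_scanned = job.total_bytes_processed / 1e9
print(f"This query would scan {gb_scanned:.2f} GB")
```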
  • Cloud Storage and Database Portfolio
  • Cloud SQL:
    • Cloud SQL is a fixed-schema datastore with a storage limit of 30 terabytes. It is offered using MySQL, PostgreSQL, and SQL Server. These services are good for web applications such as a CMS or eCommerce site.
    Cloud Spanner:
    • Is also relational and fixed-schema but scales infinitely and can be regional or multi-regional. Example use cases include scalable relational databases greater than 30 TB that need high availability and global accessibility, like supply-chain management and manufacturing.
    Google Cloud’s NoSQL datastores are schemaless.
  • NoSQL:
    Firestore
    • Is a completely managed document datastore with a maximum document size of 1 MB. It is useful for hierarchical data; for example, game state or user profiles.
    Cloud Bigtable
    • Is also a NoSQL datastore that scales infinitely. It is good for heavy read and write events and use cases including financial services, Internet of Things, and digital advert streams.
    • For object storage, Google Cloud offers Cloud Storage. Cloud Storage is schemaless and is completely managed with infinite scale. It stores binary object data and so it's good for storing images, media serving, and backups.
    • Data warehousing is provided by BigQuery. The storage uses a fixed schema and supports completely managed SQL analysis of the data stored.
    • For in-memory storage, Memorystore provides a schemaless, managed Redis database. It is excellent for caching for web and mobile apps and for providing fast access to state in microservice architectures.
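    • As an illustration of that caching pattern, a minimal sketch using the standard redis-py client against a Memorystore for Redis endpoint; the host, key scheme, and TTL are placeholders:
```python
import json
import redis

# Placeholder host: use your Memorystore for Redis instance's IP address.
cache = redis.Redis(host="10.0.0.3", port=6379)

def get_profile(user_id, load_from_db):
    """Cache-aside lookup: try Redis first, fall back to the database."""
    key = f"profile:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    profile = load_from_db(user_id)             # e.g. a Cloud SQL query
    cache.setex(key, 300, json.dumps(profile))  # cache for 5 minutes
    return profile
```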
  • Data Solution Decision Chart:
  • You might also want to consider how to transfer data in Google Cloud. A number of factors must be considered, including cost, time, offline versus online transfer options, and security.
    • While transfer into Cloud Storage is free, there will be costs for storing the data, possibly appliance costs if a Transfer Appliance is used, and egress costs if transferring from another cloud provider.
  • Transferring data into Google Cloud:
    • If you have huge datasets, the time required for transfer across a network may be unrealistic. Even if it is realistic, the effects on your organization’s infrastructure may be damaging while the transfer is taking place.
  • For smaller or scheduled data uploads, use the Cloud Storage Transfer Service, which enables you to:
    • Move or back up data to a Cloud Storage bucket from other cloud storage providers such as Amazon S3, from your on-premises storage, or from any HTTP/HTTPS location.
    • Move data from one Cloud Storage bucket to another, so that it is available to different groups of users or applications.
    • Periodically move data as part of a data processing pipeline or analytical workflow.
  • Storage Transfer Service provides options that make data transfers and synchronization easier. For example, you can:
    • Schedule one-time transfer operations or recurring transfer operations.
    • Delete existing objects in the destination bucket if they don’t have a corresponding object in the source.
    • Delete data source objects after transferring them.
    • Schedule periodic synchronization from a data source to a data sink with advanced filters based on file creation dates, file-name filters, and the times of day you prefer to import data.
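    • For illustration, a minimal sketch of creating a recurring Amazon S3-to-Cloud Storage transfer job with the google-cloud-storage-transfer client; project and bucket names are placeholders, and the AWS credentials the connector needs are omitted here:
```python
from google.cloud import storage_transfer

client = storage_transfer.StorageTransferServiceClient()

# Placeholders: substitute your project and bucket names.
transfer_job = {
    "project_id": "my-project",
    "status": storage_transfer.TransferJob.Status.ENABLED,
    "transfer_spec": {
        "aws_s3_data_source": {
            "bucket_name": "example-s3-bucket",
            # AWS credentials omitted; supply aws_access_key per the docs.
        },
        "gcs_data_sink": {"bucket_name": "example-gcs-bucket"},
    },
    "schedule": {
        "schedule_start_date": {"year": 2024, "month": 1, "day": 1},
        # No end date: the job repeats daily until disabled.
    },
}
response = client.create_transfer_job(request={"transfer_job": transfer_job})
print("Created job:", response.name)
```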
  • Storage Transfer Service:
    • Use the Storage Transfer Service for on-premises data for large-scale uploads from your data center.
    • The Storage Transfer Service for on-premises data allows large-scale online data transfers from on-premises storage to Cloud Storage. With this service, data validation, encryption, error retries, and fault tolerance are built in. On-premises software is installed on your servers; the agent comes as a Docker container, and a connection to Google Cloud is set up. Directories to be transferred to Cloud Storage are selected in the Cloud Console.
  • Storage Transfer Service:
    • Once data transfer begins, the service parallelizes the transfer across many agents, supporting scale to billions of files and hundreds of terabytes. In the Cloud Console, a user can view detailed transfer logs and also create, manage, and monitor transfer jobs.
    • To use the Storage Transfer Service for on-premises data, a POSIX-compliant source and a network connection of at least 300 Mbps are required, along with a Docker-supported Linux server that can access the data to be transferred and has ports 80 and 443 open for outbound connections.
  • Storage Transfer Service:
    • The use case is on-premises transfer of data whose size is > 1 TB.
  • Transfer Appliance:
    • For large amounts of on-premises data that would take too long to upload, use Transfer Appliance.
    • Transfer Appliance is a secure, rackable, high-capacity storage server that you set up in your data center. You fill it with data and ship it to an ingest location, where the data is uploaded to Google. The data is secure, you control the encryption key, and Google erases the appliance after transfer is complete.
  • Transfer Appliance:
    • The process for using a Transfer Appliance is that you request an appliance, and it is shipped to you in a tamper-evident case. You transfer your data to the appliance and ship it back to Google. The data is then loaded into Cloud Storage, and you are notified that it is available. Google uses tamper-evident seals on shipping cases to and from the data ingest site. Data is encrypted to the AES-256 standard at the moment of capture. Once the transfer is complete, the appliance is erased per NIST 800-88 standards. You decrypt the data when you want to use it.
  • Transfer Appliance Process:
  • Use Transfer Appliance for large amounts of data
  • BigQuery Data Transfer Service
    • There is also a transfer service for BigQuery. The BigQuery Data Transfer Service automates data movement from SaaS applications to BigQuery on a scheduled, managed basis. The Data Transfer Service initially supports Google application sources like Google Ads, Campaign Manager, Google Ad Manager, and YouTube. There are also data connectors that allow easy data transfer from Teradata, Amazon Redshift, and Amazon S3 to BigQuery.
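    • As a sketch of scheduling such a transfer programmatically with the google-cloud-bigquery-datatransfer client; the dataset, schedule, and especially the Amazon S3 connector’s parameter keys are illustrative assumptions to check against the connector’s documentation:
```python
from google.cloud import bigquery_datatransfer

client = bigquery_datatransfer.DataTransferServiceClient()

# Placeholders throughout: project, dataset, and the S3 connector's
# parameter keys are illustrative; consult the connector docs.
transfer_config = bigquery_datatransfer.TransferConfig(
    destination_dataset_id="analytics",
    display_name="Nightly S3 import",
    data_source_id="amazon_s3",
    params={
        "data_path": "s3://example-bucket/orders/*.csv",
        "destination_table_name_template": "orders",
        "file_format": "CSV",
    },
    schedule="every 24 hours",
)
response = client.create_transfer_config(
    parent=client.common_project_path("my-project"),
    transfer_config=transfer_config,
)
print("Created transfer config:", response.name)
```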
  • Here’s an example for our online travel portal, ClickTravel:
    • For the inventory service, we will use Cloud Storage for the raw inventory uploads. Suppliers will upload their inventory as JSON data stored in text files. That inventory will then be imported into a Firestore database (a minimal import sketch follows this list).
    • The orders service will store its data in a relational database running in Cloud SQL.
    • The analytics service will aggregate data from various sources into a data warehouse, for which we’ll use BigQuery.
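    • A minimal sketch of that inventory import step, using the google-cloud-storage and google-cloud-firestore clients; the bucket name, file layout, and the unique "sku" field are assumptions for illustration:
```python
import json
from google.cloud import firestore, storage

storage_client = storage.Client()
db = firestore.Client()

# Assumptions: one JSON array of inventory items per uploaded file, e.g.
# gs://clicktravel-inventory-uploads/supplier-a/hotels.json, and a unique
# "sku" field per item to use as the Firestore document ID.
for blob in storage_client.list_blobs(
    "clicktravel-inventory-uploads", prefix="supplier-a/"
):
    items = json.loads(blob.download_as_text())
    for item in items:
        db.collection("inventory").document(item["sku"]).set(item)
```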