Module 3

Cards (142)

  • The distributed nature of Cassandra databases is a key feature with both technical and business benefits.
  • When an application is under heavy load, Cassandra databases scale readily, and distribution also guards against data loss from hardware failure in any single datacenter.
  • Technical flexibility is another benefit of a distributed design; for instance, a developer can tune the throughput of read and write queries independently.
  • The term "distributed" refers to Cassandra's ability to run across multiple machines while presenting itself to users as a single system.
  • Running Cassandra as a single node serves little purpose in production, but it is quite useful for becoming familiar with how Cassandra operates; one way to try it is sketched below.
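One quick way to run a single node is the official Docker image (assuming Docker is installed; the container name is arbitrary):

```sh
docker run --name cassandra-test -d cassandra:latest
# give the node a minute to start, then open a CQL shell inside it
docker exec -it cassandra-test cqlsh
```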
  • Cassandra is designed to handle massive data workloads across multiple nodes.
  • Its architecture is built on the understanding that hardware and system failures can and do happen.
  • By using a peer-to-peer distributed architecture across homogeneous nodes where data is spread among all nodes in the cluster, Cassandra overcomes the issue of failures.
  • Using the peer-to-peer gossip communication protocol, each node periodically exchanges state information about itself and about other nodes throughout the cluster.
  • To ensure data persistence, each node maintains a sequentially written commit log that records write activities.
  • Information is indexed and written to a memtable, an in-memory structure that mimics a write-back cache.
  • Whenever the memtable is full, its contents are flushed to disk as an SSTable data file.
  • All writes are replicated and automatically partitioned throughout the cluster.
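The write path the preceding cards describe can be sketched in miniature. This is a toy model, not Cassandra's implementation; the class name, JSON file format, and flush threshold are all invented for illustration:

```python
import json
import os

class ToyWritePath:
    """Toy model of Cassandra's write path: commit log -> memtable -> SSTable."""

    def __init__(self, data_dir, memtable_limit=4):
        os.makedirs(data_dir, exist_ok=True)
        self.data_dir = data_dir
        self.memtable = {}                  # in-memory, behaves like a write-back cache
        self.memtable_limit = memtable_limit
        self.sstable_count = 0
        # The sequentially written commit log makes a write durable before any flush.
        self.commit_log = open(os.path.join(data_dir, "commitlog"), "a")

    def write(self, key, value):
        # 1. Append to the commit log first, so the write survives a crash.
        self.commit_log.write(json.dumps({"k": key, "v": value}) + "\n")
        self.commit_log.flush()
        # 2. Index the write in the memtable.
        self.memtable[key] = value
        # 3. When the memtable is full, flush it to disk as an immutable SSTable.
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        self.sstable_count += 1
        path = os.path.join(self.data_dir, f"sstable-{self.sstable_count:04d}.json")
        with open(path, "w") as f:
            json.dump(dict(sorted(self.memtable.items())), f)  # SSTables are sorted
        self.memtable.clear()
```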
  • Cassandra periodically consolidates SSTables through a process known as compaction, which merges data files while discarding outdated data and data marked for deletion with a tombstone.
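Continuing the toy model above, compaction can be sketched as merging the SSTable files and dropping tombstoned entries. Here a tombstone is simply a None value; real compaction also honors write timestamps and gc_grace_seconds:

```python
import glob
import json
import os

TOMBSTONE = None  # toy marker that a delete would write

def compact(data_dir):
    """Merge all toy SSTables into one; newest value wins, tombstoned keys vanish."""
    paths = sorted(glob.glob(os.path.join(data_dir, "sstable-*.json")))
    merged = {}
    for path in paths:                      # zero-padded names sort oldest to newest
        with open(path) as f:
            merged.update(json.load(f))     # later files overwrite earlier values
        os.remove(path)
    live = {k: v for k, v in merged.items() if v is not TOMBSTONE}
    with open(os.path.join(data_dir, "sstable-0001.json"), "w") as f:
        json.dump(dict(sorted(live.items())), f)
```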
  • Reliability and fault tolerance in Cassandra can be achieved by replicating a single piece of data over many nodes.
  • Row copies in Cassandra are known as replicas.
  • Cassandra assigns a hash value to each partition key.
  • Consistent hashing allows distribution of data across a cluster to minimize reorganization when nodes are added or removed.
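A minimal consistent-hashing sketch follows. It is illustrative only: Cassandra actually uses the Murmur3 partitioner over a fixed 64-bit token range, whereas this toy hashes with MD5 and uses one token per node:

```python
import bisect
import hashlib

class ToyRing:
    """Each node owns the token range up to and including its own token."""

    def __init__(self, nodes):
        # One token per node for simplicity; vnodes would give each node many.
        self.tokens = sorted((self._hash(n), n) for n in nodes)

    @staticmethod
    def _hash(key):
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def owner(self, partition_key):
        token = self._hash(partition_key)
        keys = [t for t, _ in self.tokens]
        # First node whose token is >= the key's token owns it; wrap past the end.
        i = bisect.bisect_left(keys, token) % len(self.tokens)
        return self.tokens[i][1]

ring = ToyRing(["node1", "node2", "node3"])
print(ring.owner("user:42"))  # adding/removing a node moves only adjacent ranges
```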
  • In a Cassandra cluster, each node is responsible for replica ranges of other nodes' data in addition to its primary token range; with a replication factor of 3, that means two replica ranges besides its own.
  • A node is set up by default to keep the data it controls in a directory specified in the cassandra.yaml file.
  • In a production cluster deployment, you can set commitlog_directory to a different disk drive than data_file_directories.
  • The cassandra.yaml configuration file is the primary configuration file used to set a cluster's initialization parameters, table caching parameters, tuning and resource-use attributes, timeout settings, client connections, backups, and security.
  • Set dynamic snitch thresholds in the cassandra.yaml configuration file on each node.
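A few of those settings as they might appear in cassandra.yaml (values and paths are illustrative, not recommendations):

```yaml
cluster_name: 'Test Cluster'
num_tokens: 16                          # vnodes per node
data_file_directories:
    - /var/lib/cassandra/data
commitlog_directory: /var/lib/cassandra/commitlog   # ideally a separate drive
endpoint_snitch: GossipingPropertyFileSnitch
dynamic_snitch_badness_threshold: 0.1   # dynamic snitch threshold, set per node
```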
  • System keyspace table properties can be set on a per-keyspace or per-table basis using CQL from a client application such as cqlsh.
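For instance, a table's caching and tombstone-related properties can be adjusted with CQL (the keyspace and table names here are placeholders):

```sql
ALTER TABLE demo.users
  WITH caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
  AND gc_grace_seconds = 864000;
```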
  • Data replication and distribution in Cassandra go hand in hand: each table's data is identified by a primary key, which also determines the node on which that data is stored.
  • Cassandra supports the concept of a replication factor (RF), which specifies how many copies of your data should exist in the database; the RF is set per keyspace, as in the example below.
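The keyspace and datacenter names here are placeholders:

```sql
-- Three copies of every row in datacenter dc1
CREATE KEYSPACE demo
  WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3};
```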
  • Each node in the cluster is responsible for a range of data based on the hash value.
  • Various repair mechanisms are used to guarantee the consistency of all data across the cluster.
  • Any node in the cluster can receive client read or write requests.
  • A node acts as the coordinator for a client action when a client connects to it with a request.
  • The coordinator serves as a proxy between the client application and the nodes that own the requested data, as the driver sketch below illustrates.
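From the client's perspective, this means any contact point will do; a sketch using the DataStax Python driver (cassandra-driver), with placeholder addresses and schema:

```python
from cassandra.cluster import Cluster

# Whichever node handles a given request becomes the coordinator for it.
cluster = Cluster(contact_points=["10.0.0.1", "10.0.0.2"])
session = cluster.connect("demo")
row = session.execute("SELECT name FROM users WHERE id = %s", (42,)).one()
print(row)
cluster.shutdown()
```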
  • A cluster can be built from both smaller and larger machines, since the proportion of vnodes allotted to each machine in the cluster can be adjusted.
  • Virtual nodes are chosen at random and are not contiguous within a cluster.
  • A token is a 64-bit integer by default, resulting in a range of possible tokens from -2^63 to 2^63 - 1.
  • Cassandra maps every node in a cluster to one or more tokens on a continuous ring.
  • In the case of a single token per node, each node is responsible for the range of token values between the previous node's assigned token (exclusive) and its own assigned token (inclusive).
  • With vnodes, data is distributed via consistent hashing without requiring manual token generation and assignment.
  • To associate nodes with one or more tokens, Cassandra employs a consistent hashing algorithm.
  • To determine which node the data belongs to, this token value is compared with the token range each node is responsible for, as sketched below.
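A toy illustration of that comparison, plus SimpleStrategy-style replica placement by walking clockwise around the ring. The tokens are made-up values; a real cluster uses vnodes, and nodes already chosen would be skipped:

```python
import bisect

# Sorted (token, node) pairs on the ring; the values are invented.
RING = [(100, "node1"), (400, "node2"), (700, "node3")]

def replicas(token, rf=2):
    """Primary owner is the first node whose token >= the data's token;
    additional replicas are the next nodes clockwise around the ring."""
    keys = [t for t, _ in RING]
    start = bisect.bisect_left(keys, token) % len(RING)
    return [RING[(start + i) % len(RING)][1] for i in range(rf)]

print(replicas(450))  # ['node3', 'node1'] -- wraps past the end of the ring
```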
  • Rebuilding a dead node is faster because every other node in the cluster participates in streaming data to it.