DATABASE NORMALIZATION

Cards (19)

  • DATABASE NORMALIZATION
    •         Optimizing table structures
    •         Removing duplicate data entries
    •         Process of efficiently organizing data in the DB.
    •         A technique for producing a set of relations with desirable properties, given the data requirements of an enterprise.
    •         a formal method that identifies relations based on their primary key and the functional dependencies among their attributes.
  • WHY NORMALIZE?
    •         Improved speed
    •         More efficient use of space (Eliminate redundant data in a DB)
    •         Ensure data dependencies make sense
    •         Increased data integrity - decreased chance that data can get messed up due to maintenance (prevent possible corruption of DB stemming from update anomalies -insertion, deletion, modification).
    •         Tables that are not normalized are susceptible to experiencing modification anomalies
  • Insertion Anomaly
    • occurs when certain attributes cannot be inserted into the database without the presence of other attributes.
  • Update Anomaly
    • exists when one or more instances of duplicated data is updated, but not all.
  • Deletion Anomaly
    • exists when certain attributes are lost because of the deletion of other attributes.
  •  Functional dependency

    Describes the relationship between attributes in a relation.
  • Determinant
    -          attribute or set of attributes on the left hand side of the arrow. ISBN is the determinant for the given example.
  • Unnormalized form (UNF)
    -          A table that contains one or more repeating groups.
  • Repeating group
    -          an attribute or group of attributes within a table that occurs with multiple values for a single occurrence of the nominated key attributes of that table.
  • First normal form (1NF)
    A relation in which the intersection of each row and column contains one and only one value, meaning there are no repeating groups.
    First normal form (UNF → 1NF): Remove repeating groups:
    -          Enter appropriate data in the empty columns of rows. Place repeating data along with a copy of the original key attribute in a separate relation. Identifying a primary key for each of the new relations
  • Second normal form (1NF -> 2NF)
    -           A relation that is in 1NF and with no partial dependencies.
    (1NF -> 2NF): Remove partial functional dependencies.
    1NF → 2NF: the partial functionally dependent attributes are removed from the relation by placing them in a new relation along with a copy of their determinant.
  • Identify the candidate key for a relation
    -           recognize the attribute (group of attributes) that uniquely identifies each row in a relation. All of the attributes that are not part of the primary key (non-primary key attributes) should be functionally dependent on the key.
  • Partial functional dependency
    -          when a nonkey attribute is functionally dependent on part (but not all) of the primary key.
  • Third normal form (3NF)
    -          A relation that is in 1NF and 2NF, and in which transitively dependent on the primary key.
    • Remove transitive dependencies
    • 2NF → 3NF: the transitively dependent attributes are removed from the relation by placing them in a new relation along with a copy of their determinant.
    • 2NF → 3NF: the transitively dependent attributes are removed from the relation by placing them in a new relation along with a copy of their determinant.
  • FULL FUNCTIONAL DEPENDENCY
    -          where the dependent attributes are determined by the determinant attributes.
  • PARTIAL FUNCTIONAL DEPENDENCY
    -          the dependent attributes are partially determined by the determinant attributes.
  • SURROGATE KEY
    -          called a synthetic primary key.
    -          automatically generated when a new record is inserted into a table.
    -          sequential number outside of the database.
  • FEATURES OF THE SURROGATE KEY
    ·         It holds an anonymous integer.
    ·         It contains a unique value for all records of the table.
    ·         The value can never be modified by the user or application.
    ·         The surrogate key is called the factless key as it is added just for our ease of identification of unique values and contains no relevant fact(or information) that is useful for the table.
  • USE OF SURROGATE KEY IN DBMS
    • Uniqueness: Data integrity is improved by the guaranteed uniqueness of surrogate keys.
    • Stability: Since surrogate keys do not depend on any business rules or data value, they have a lower chance of changing over time.
    • Efficiency: Compared to natural keys, surrogate keys are frequently smaller and process more quickly.
    • Flexibility: In the event that the natural key changes, rows can still be uniquely identified using surrogate keys