File organization

Cards (33)

  • History of File Structures
    Early Work assumed files were on tape, access was sequential and cost of access grew with file size
  • How Can Secondary Storage Access Time be Improved?
    • By improving the File Structure, details of data representation and operations efficiency can help improve secondary storage access time
  • Overview of File Structure Design
    General Goals: Get the information needed with one access to the disk, group information to likely get everything needed with only one trip to the disk
  • History of File Structures
    The emergence of Disks and Indexes: Disks allowed for direct access as files grew large, Indexes kept a list of keys and pointers for quick searches
  • Types of storage
    • Primary Storage (Memory), Secondary Storage (Online Disk/Tape/CDRom), Tertiary Storage (Archival Data)
  • Memory versus Secondary Storage
    • Secondary storage can pack thousands of megabytes in a small physical location, while Computer Memory (RAM) is limited. Access to secondary storage is slower than Memory
  • A collection of data is placed under permanent or non-volatile storage is a file.
  • anything that you can store in a disk hard drive, tape, optical media, and any other medium which doesn’t lose the information when the power is turned off is an example of a file.
  • A collection of bytes stored on a disk or tape that physically exists in the secondary storage appearing in its file directory and known by the operating system is called a physical file
  • Logical file has a logical name used for referring to the file inside the program. This logical name is a variable inside the program.
  • A file is a collection of data placed under permanent or non-volatile storage
  • Examples of files
    • Anything that can be stored in a disk hard drive, tape, optical media, and any other medium which doesn’t lose information when the power is turned off
  • Operating system is responsible for associating a logical file in a program to a physical file in disk or tape. Writing to or reading from a file in a program is done through the operating system
  • Logical file is a "channel" that connects the program to a physical file, hiding the details of the file’s location and physical format to the program. This is what your program actually uses
  • When a program wants to use a particular file, the operating system must find the physical file and assign a logical file to it. This logical file has a logical name which is used inside the program
  • Input devices (keyboard) and output devices (screen, printer, etc) are treated as files from the program's point of view
  • File opening modes in C
    • \r = open for reading,
    • \w = open for writing (file need not to exist),
    • \a = open for appending (file need not to exist)
  • The action of moving directly to a certain position in a file is often called seeking.
  • Physical Devices are also treated as Files
  • Source_file = the logical file name in which the seek will occur
  • Deletion of Fixed Length Record
    1. Mark record as deleted
    2. Insert deleted record into AVAIL LIST
  • Deletion of Variable Length Record
    1. Mark record as deleted
    2. Insert deleted record into AVAIL LIST with size as a field
  • Placement Strategies for New Records(Variable length records)
    • First Fit
    • Best Fit
    • Worst Fit
  • Record Deletion and Storage Compaction
    1. Deletion can be done by marking a record as deleted
    2. Space for the record is not released, program must check if record is deleted
    3. After many deletions, a special program is used to squeeze (compress) the file
  • Deleting Fixed-Length Records and Reclaiming Space Dynamically
    1. Use an "AVAIL LIST", a linked list of available records
    2. A header record stores the beginning of the AVAIL LIST
    3. When a record is deleted, it is marked as deleted and inserted into the AVAIL LIST
  • Deleting Variable-Length Records and Reclaiming Space Dynamically
    1. Use an AVAIL LIST, but take care of the variable-length difficulties
    2. Records in AVAIL LIST must store size as a field
    3. Exact byte offset must be used instead of RRN
    4. Addition of records must find a large enough record in AVAIL LIST
  • Types of fragmentation: Internal Fragmentation (wasted space within a record) and External Fragmentation (space available in AVAIL LIST but too small to reuse)
  • First Fit Strategy
    AVAIL LIST is not sorted by size, first record large enough to hold new record is chosen
  • Best Fit Strategy
    AVAIL LIST is sorted by size (Ascending), smallest record large enough to hold new record is chosen
  • Worst Fit Strategy
    AVAIL LIST is sorted by decreasing order of size, largest record is used for holding new record, unused space is placed again in AVAIL LIST
  • If the added record is smaller than the item taken from AVAIL LIST by few bytes, the options are: 1) Leave the space unused within record (internal fragmentation, best-fit) or 2) Return the unused space as a new available record to AVAIL LIST (external fragmentation, worst-fit)
  • Compacting external fragmentation can be done by combining adjacent records in AVAIL LIST into a larger record, or by using a placement strategy like worst-fit to minimize fragmentation
  • Placement strategies only apply to variable-length records. For fixed-length records, first-fit and best-fit are better than worst-fit to minimize internal fragmentation. For external fragmentation, worst-fit should be considered.