Pipelining and Vector Processing

  • Pipelining
    1. Decompose a sequential process into suboperations
    2. Execute each subprocess in a special dedicated segment
    3. Operate concurrently with all other segments
    4. Associate a register with each segment
    5. Registers provide isolation between each segment
    6. Each segment consists of an input register followed by a combinational circuit
    7. Apply a clock to all registers after enough time has elapsed to perform all segment activity
  • Pipeline organization

    • Each segment has one or two registers and a combinational circuit
    • The five registers are loaded with new data every clock pulse
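The register-and-segment structure described above can be sketched as a small simulation. This is a minimal sketch, assuming a three-segment pipeline that computes A[i] * B[i] + C[i] with registers named R1 through R5; the operation, the register names, and the function `run_pipeline` are illustrative assumptions, not something the cards specify.

```python
def run_pipeline(a, b, c):
    """Simulate a 3-segment pipeline computing a[i] * b[i] + c[i].

    Assumed layout: segment 1 loads R1, R2; segment 2 loads R3 (product)
    and R4 (C operand); segment 3 loads R5 (sum). All five registers are
    loaded on the same clock pulse, so every new value is computed from
    the old register contents.
    """
    n = len(a)
    r1 = r2 = r3 = r4 = r5 = None
    results = []
    for clock in range(n + 2):  # k + (n - 1) pulses with k = 3
        # Combinational circuits: compute next values from current registers.
        new_r5 = r3 + r4 if r3 is not None else None         # segment 3: add
        new_r3 = r1 * r2 if r1 is not None else None         # segment 2: multiply
        new_r4 = c[clock - 1] if 1 <= clock <= n else None   # segment 2: load C
        new_r1 = a[clock] if clock < n else None             # segment 1: load A
        new_r2 = b[clock] if clock < n else None             # segment 1: load B
        # Clock pulse: all registers take their new values at once.
        r1, r2, r3, r4, r5 = new_r1, new_r2, new_r3, new_r4, new_r5
        if r5 is not None:
            results.append(r5)
    return results


print(run_pipeline([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```

Computing every `new_*` value before assigning any register is what models the isolation the registers provide: no segment sees a value produced in the same clock period.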
  • Any operation that can be decomposed into a sequence of suboperations of about the same complexity can be implemented by a pipeline processor
  • Task
    The total operation performed going through all the segments in the pipeline
  • To complete n tasks using a k-segment pipeline requires k+(n-1) clock cycles
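The k + (n - 1) count can be checked with a space-time diagram. The sketch below is illustrative only; the function name `space_time` and the labels T1, T2, ... are assumptions made for this example.

```python
def space_time(k, n):
    """Return one row per segment showing which task occupies it each cycle."""
    cycles = k + (n - 1)
    table = []
    for seg in range(k):
        row = []
        for cycle in range(cycles):
            task = cycle - seg  # task i reaches segment s in cycle i + s
            row.append(f"T{task + 1}" if 0 <= task < n else "--")
        table.append(row)
    return table


for seg, row in enumerate(space_time(4, 6), start=1):
    print(f"Segment {seg}: " + " ".join(f"{cell:>3}" for cell in row))
```

The first task needs k cycles to pass through every segment; each of the remaining n - 1 tasks then completes one cycle after the previous one.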
  • Nonpipeline unit that performs the same operation

    Takes a time equal to tn to complete each task
  • Speedup of pipeline processing over nonpipeline processing

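    S = (n × tn) / ((k + n - 1) × tp), where tn is the time per task in the nonpipeline unit and tp is the clock cycle time of the pipeline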
  • The theoretical maximum speedup that a pipeline can provide is k, where k is the number of segments in the pipeline
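A short numeric check shows how the speedup approaches k as the number of tasks grows. This is a sketch under the assumption that the nonpipeline unit takes tn = k × tp per task; the specific values (k = 4, tp = 20 ns) and the function name `speedup` are chosen only for illustration.

```python
def speedup(n, k, tn, tp):
    """Speedup of a k-segment pipeline over a nonpipeline unit for n tasks."""
    return (n * tn) / ((k + n - 1) * tp)


k, tp = 4, 20          # assumed: 4 segments, 20 ns clock cycle
tn = k * tp            # assumed: nonpipeline time equals k * tp
for n in (1, 10, 100, 10_000):
    print(n, round(speedup(n, k, tn, tp), 3))
```

With a single task the pipeline gives no advantage (S = 1); as n becomes much larger than k - 1, the ratio tends to tn/tp, which equals k under the assumption above.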
  • To duplicate the theoretical speed advantage of a pipeline process by means of multiple functional units, it is necessary to construct k identical units that will be operating in parallel
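  • A pipeline cannot operate at its maximum theoretical rate for two reasons: segments take different times to complete their suboperations, so the clock cycle must be long enough for the slowest segment, and it is not always correct to assume that a nonpipeline circuit has the same total delay as the equivalent pipeline circuit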
  • Areas of computer design where pipeline organization is applicable

    • Arithmetic pipeline
    • Instruction pipeline
  • Parallel processing
    A large class of techniques used to provide simultaneous data-processing tasks to increase computational speed and throughput
  • The amount of hardware and the cost increase with parallel processing
  • Levels of parallel processing complexity

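    • Lowest level: parallel versus serial operation, distinguished by the type of registers used (shift registers operate one bit at a time; parallel-load registers operate on all bits of a word at once)
    • Higher level: a multiplicity of functional units that perform identical or different operations simultaneously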
  • A multifunctional organization is usually associated with a complex control unit to coordinate all the activities among the various components
  • Ways to classify parallel processing

    • Internal organization of the processors
    • Interconnection structure between processors
    • The flow of information through the system
  • Flynn's classification of computer systems

    • Single instruction stream, single data stream (SISD)
    • Single instruction stream, multiple data stream (SIMD)
    • Multiple instruction stream, single data stream (MISD)
    • Multiple instruction stream, multiple data stream (MIMD)
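The four categories follow from two yes/no questions: is there more than one instruction stream, and is there more than one data stream? A minimal sketch, with the function name `flynn_class` assumed for illustration:

```python
def flynn_class(instruction_streams, data_streams):
    """Name the Flynn category for the given numbers of streams."""
    i = "S" if instruction_streams == 1 else "M"
    d = "S" if data_streams == 1 else "M"
    return f"{i}I{d}D"


print(flynn_class(1, 1))   # one instruction stream, one data stream
print(flynn_class(1, 8))   # one instruction stream, several data streams
```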
  • SISD organization

    • Single computer with a control unit, processor unit, and memory unit
    • Instructions executed sequentially
    • May have internal parallel processing capabilities
  • SIMD organization

    • Many processing units under the supervision of a common control unit
    • All processors receive the same instruction but operate on different data
    • The shared memory unit must contain multiple modules so that it can communicate with all the processors simultaneously
  • MISD structure is only of theoretical interest as no practical system has been constructed using this organization
  • MIMD organization

    • Computer system capable of processing several programs at the same time, e.g. multiprocessor and multicomputer systems
  • Flynn's classification emphasizes the behavioral characteristics of the computer system rather than its operational and structural interconnections
  • Pipelining does not fit Flynn's classification of parallel processing
  • Main topics of parallel processing

    • Pipeline processing
    • Vector processing
    • Array processing
  • Arithmetic pipeline

    • Found in very high speed computers
    • Used to implement floating-point operations, multiplication of fixed-point numbers, and similar computations encountered in scientific problems
    • Floating-point operations are easily decomposed into suboperations
  • Floating-point addition and subtraction pipeline

    1. Compare exponents
    2. Align mantissas
    3. Add or subtract mantissas
    4. Normalize the result
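The four suboperations above can be traced in software. The sketch below is only an illustration: it assumes decimal operands stored as (mantissa, exponent) pairs with a normalized fraction mantissa (0.1 <= |m| < 1), and the function name `fp_add` and the sample operands are assumptions made for this example. Subtraction is handled by passing a negative mantissa.

```python
from decimal import Decimal


def fp_add(x, y):
    """Add two decimal floating-point numbers given as (mantissa, exponent)."""
    (ma, ea), (mb, eb) = x, y

    # Segment 1: compare exponents; the larger one is the tentative result exponent.
    diff = ea - eb
    exp = max(ea, eb)

    # Segment 2: align mantissas by shifting the one with the smaller exponent right.
    if diff > 0:
        mb = mb.scaleb(-diff)
    elif diff < 0:
        ma = ma.scaleb(diff)

    # Segment 3: add the aligned mantissas.
    m = ma + mb

    # Segment 4: normalize the result so that 0.1 <= |m| < 1.
    while abs(m) >= 1:                                # overflow: shift right
        m = m.scaleb(-1)
        exp += 1
    while m != 0 and abs(m) < Decimal("0.1"):         # underflow: shift left
        m = m.scaleb(1)
        exp -= 1
    return m, exp


# 0.9504 x 10^3 + 0.8200 x 10^2
print(fp_add((Decimal("0.9504"), 3), (Decimal("0.8200"), 2)))
```

In hardware each commented step would be a separate segment with its own input register, so a new pair of operands could enter segment 1 while earlier pairs are still being aligned, added, or normalized.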
  • The comparator, shift, adder, subtractor, incrementer, and decrementer in the floating-point pipeline are implemented with combinational circuits
  • The clock cycle of the floating-point pipeline is 110 ns, while the nonpipeline delay is 320 ns, giving a speedup of 320/110 ≈ 2.9
  • Instruction pipeline

    • Pipeline processing can occur in the instruction stream as well as the data stream
    • Computers with complex instructions require phases like fetch, decode, effective address calculation, operand fetch, instruction execution, and result storage
  • Pipeline floating-point arithmetic delay
    tp = t3 + tr = 110 ns
  • Nonpipeline floating-point arithmetic delay
    tn = t1 + t2 + t3 + t4 + tr = 320 ns
  • Speedup of pipeline over nonpipeline floating-point arithmetic
    Speedup: 320/110 = 2.9
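The two delay formulas can be reproduced with a short calculation. The cards give only the totals, so the individual segment delays below (t1 = 60, t2 = 70, t3 = 100, t4 = 80 ns) and the register delay (tr = 10 ns) are assumed values, chosen to be consistent with tp = t3 + tr = 110 ns and tn = 320 ns; the function name `pipeline_timing` is also an assumption.

```python
def pipeline_timing(segment_delays, register_delay):
    """Return (tp, tn, speedup) for the given segment and register delays."""
    # Pipeline clock cycle: slowest segment plus the interface register delay.
    tp = max(segment_delays) + register_delay
    # Nonpipeline unit: all segment delays in series plus one register delay.
    tn = sum(segment_delays) + register_delay
    return tp, tn, tn / tp


tp, tn, s = pipeline_timing([60, 70, 100, 80], 10)  # assumed delays in ns
print(tp, tn, round(s, 1))
```

The speedup stays well below the segment count of 4 because the clock must accommodate the slowest segment, so the faster segments sit idle for part of each cycle.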
  • Computer Organization and Architecture, Chapter 4: Pipeline and Vector Processing
  • Reference: W. Stallings
  • Pipeline processing

    Can occur not only in the data stream but in the instruction stream as well
  • Computer with an instruction fetch unit and an instruction execution unit

    • Designed to provide a two-segment pipeline
  • Sequence of steps to process an instruction

    • Fetch the instruction from memory
    • Decode the instruction
    • Calculate the effective address
    • Fetch the operands from memory
    • Execute the instruction
    • Store the result in the proper place
  • Difficulties that will prevent the instruction pipeline from operating at its maximum rate

    • Different segments may take different times to operate on the incoming information
    • Some segments are skipped for certain operations
    • Two or more segments may require memory access at the same time, causing one segment to wait until another is finished with the memory
  • Four-Segment Instruction Pipeline

    • Decoding of the instruction can be combined with the calculation of the effective address into one segment
    • The instruction execution and storing of the result can be combined into one segment
  • Fig. 4-7 shows how the instruction cycle in the CPU can be processed with a four-segment pipeline
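The overlap in a four-segment instruction pipeline can be tabulated per instruction. This is a sketch of the ideal case only, assuming no branches and no memory conflicts; the abbreviations FI (fetch instruction), DA (decode and calculate effective address), FO (fetch operand), and EX (execute and store result), and the function name `instruction_timing`, are labels assumed for this example.

```python
SEGMENTS = ["FI", "DA", "FO", "EX"]  # assumed abbreviations for the four segments


def instruction_timing(n):
    """Return one row per instruction listing its segment in each clock cycle."""
    k = len(SEGMENTS)
    total_cycles = k + n - 1
    rows = []
    for i in range(n):
        row = ["  "] * total_cycles
        for s, name in enumerate(SEGMENTS):
            row[i + s] = name  # instruction i enters segment s in cycle i + s
        rows.append(row)
    return rows


for i, row in enumerate(instruction_timing(5), start=1):
    print(f"Instruction {i}: " + " ".join(row))
```

Each column of the printed table is one clock cycle; once the pipeline is full, four different instructions are in progress in every cycle.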