Pipelining and Vector Processing

Created by Reshmika

Cards (98)

Pipelining 
1. Decompose a sequential process into suboperations
2. Execute each subprocess in a special dedicated segment
3. Operate concurrently with all other segments
4. Associate a register with each segment
5. Registers isolate the segments from one another
6. Each segment consists of an input register followed by a combinational circuit
7. Apply a clock to all registers after enough time has elapsed to perform all segment activity
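The clocked behavior in steps 5 to 7 can be sketched as a small software model. This is a hypothetical simulation, not from the source: each Python function in `stages` stands in for a segment's combinational circuit, `regs[i]` stands in for the register holding that segment's output, and every register is reloaded on the same simulated clock pulse.

```python
from typing import Any, Callable, List, Optional, Tuple

def run_pipeline(stages: List[Callable[[Any], Any]],
                 inputs: List[Any]) -> Tuple[List[Any], int]:
    """Clock a linear pipeline; return (results, clock pulses used).

    stages[i] models the combinational circuit of segment i and regs[i]
    models the register holding its output. None marks an empty register,
    so stages in this sketch must not return None.
    """
    k = len(stages)
    regs: List[Optional[Any]] = [None] * k
    pending = list(inputs)
    results: List[Any] = []
    clocks = 0
    # Keep clocking while tasks are waiting or still inside segments 1..k-1.
    while pending or any(r is not None for r in regs[:-1]):
        # Every combinational circuit reads the CURRENT register contents...
        nxt: List[Optional[Any]] = [None] * k
        nxt[0] = stages[0](pending.pop(0)) if pending else None
        for i in range(1, k):
            if regs[i - 1] is not None:
                nxt[i] = stages[i](regs[i - 1])
        # ...then one clock pulse loads all registers at the same time.
        regs = nxt
        clocks += 1
        if regs[-1] is not None:
            results.append(regs[-1])
    return results, clocks

# Hypothetical 3-segment example computing a*b + c for each input triple.
example_stages = [
    lambda t: t,                      # segment 1: latch the operands
    lambda t: (t[0] * t[1], t[2]),    # segment 2: multiply
    lambda t: t[0] + t[1],            # segment 3: add
]
print(run_pipeline(example_stages, [(1, 2, 3), (4, 5, 6), (7, 8, 9)]))
```

With k = 3 segments and n = 3 tasks, the model finishes in k + (n - 1) = 5 clock pulses, because a new task enters segment 1 while earlier tasks move on.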
Pipeline organization 
Each segment has one or two registers and a combinational circuit
In the example pipeline with five registers, all five are loaded with new data on every clock pulse
Any operation that can be decomposed into a sequence of suboperations of about the same complexity can be implemented by a pipeline processor
Task 
The total operation performed going through all the segments in the pipeline
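To complete n tasks using a k-segment pipeline requires k + (n - 1) clock cycles: the first task needs k cycles to pass through all segments, and each of the remaining n - 1 tasks completes one cycle after the previous one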
Nonpipeline unit that performs the same operation 
Takes a time equal to $t_n$ to complete each task
Speedup of pipeline processing over nonpipeline processing 
$S = \frac{n t_n}{(k + n - 1) t_p}$, where $t_p$ is the clock cycle time of the pipeline
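The theoretical maximum speedup that a pipeline can provide is k, the number of segments in the pipeline: as n grows much larger than k, S approaches $t_n / t_p$, and if the nonpipeline unit takes $t_n = k t_p$, this limit equals k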
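The speedup formula and its limit can be checked numerically. This sketch assumes illustrative values not taken from the cards: k = 4 segments, a 20 ns clock, and an ideal nonpipeline unit with tn = k * tp = 80 ns.

```python
def pipeline_speedup(n: int, k: int, tn: float, tp: float) -> float:
    """Speedup S = n*tn / ((k + n - 1) * tp) of a k-segment pipeline over n tasks."""
    return (n * tn) / ((k + n - 1) * tp)

# Assumed values: k = 4 segments, tp = 20 ns, ideal nonpipeline time tn = k * tp.
k, tp = 4, 20.0
tn = k * tp
for n in (1, 10, 100, 10_000):
    # The speedup rises toward k as the number of tasks grows.
    print(n, round(pipeline_speedup(n, k, tn, tp), 3))
```

A single task gains nothing (S = 1), while long task streams push S close to, but never above, k = 4.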
To duplicate the theoretical speed advantage of a pipeline process by means of multiple functional units, it is necessary to construct k identical units that will be operating in parallel
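The pipeline cannot operate at its maximum theoretical rate because different segments may take different times (the clock period must match the slowest segment), and because a nonpipeline circuit does not necessarily have the same delay as the equivalent pipeline circuit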
Areas of computer design where pipeline organization is applicable 
Arithmetic pipeline
Instruction pipeline
Parallel processing 
A large class of techniques used to provide simultaneous data-processing tasks to increase computational speed and throughput
The amount of hardware, and hence the cost, increases with parallel processing
Levels of parallel processing complexity 
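Parallel and serial operations, distinguished by register type (shift registers operate serially, one bit at a time; registers with parallel load operate on all bits of a word simultaneously)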
Multiplicity of functional units performing identical or different operations simultaneously
A multifunctional organization is usually associated with a complex control unit to coordinate all the activities among the various components
Ways to classify parallel processing 
Internal organization of the processors
Interconnection structure between processors
The flow of information through the system
Flynn's classification of computer systems 
Single instruction stream, single data stream (SISD)
Single instruction stream, multiple data stream (SIMD)
Multiple instruction stream, single data stream (MISD)
Multiple instruction stream, multiple data stream (MIMD)
SISD organization 
Single computer with a control unit, processor unit, and memory unit
Instructions executed sequentially
May have internal parallel processing capabilities
SIMD organization 
Many processing units under the supervision of a common control unit
All processors receive the same instruction but operate on different data
Shared memory unit must contain multiple modules so that it can communicate with all the processors simultaneously
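The SIMD idea of one instruction applied to many data items can be sketched as follows. The `ProcessingElement` class and `broadcast` function are hypothetical names for this illustration; real SIMD hardware executes the broadcast instruction in all processing elements in the same cycle, whereas the loop here only simulates that.

```python
import operator
from typing import Callable, List

class ProcessingElement:
    """Hypothetical processing element with its own local memory module."""

    def __init__(self, data: List[int]) -> None:
        self.memory = list(data)

    def execute(self, op: Callable[[int, int], int], a: int, b: int, dest: int) -> None:
        # Apply the broadcast operation to this PE's own operands.
        self.memory[dest] = op(self.memory[a], self.memory[b])

def broadcast(pes: List[ProcessingElement], op: Callable[[int, int], int],
              a: int, b: int, dest: int) -> None:
    """Common control unit: issue ONE instruction that every PE executes on its own data."""
    for pe in pes:
        pe.execute(op, a, b, dest)

# Four PEs, each holding [x_i, y_i, result slot] in its local memory.
pes = [ProcessingElement([x, y, 0]) for x, y in [(1, 10), (2, 20), (3, 30), (4, 40)]]
broadcast(pes, operator.add, 0, 1, 2)
print([pe.memory[2] for pe in pes])
```

Every element receives the same "add word 0 and word 1 into word 2" instruction, but each produces a different result because each works on its own data.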
MISD structure is only of theoretical interest as no practical system has been constructed using this organization
MIMD organization 
Computer system capable of processing several programs at the same time, e.g. multiprocessor and multicomputer systems
Flynn's classification emphasizes the behavioral characteristics of the computer system rather than its operational and structural interconnections
Pipelining does not fit Flynn's classification of parallel processing
Main topics of parallel processing 
Pipeline processing
Vector processing
Array processing
Arithmetic pipeline 
Found in very high speed computers
Used for floating-point operations, multiplication of fixed-point numbers, and similar computations encountered in scientific problems
Floating-point operations are easily decomposed into suboperations
Floating-point addition and subtraction pipeline 
1. Compare exponents
2. Align mantissas
3. Add or subtract mantissas
4. Normalize the result
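The four suboperations above can be sketched for a single pair of operands. This is a minimal software illustration under stated assumptions: base-10 mantissas normalized so that 0.1 <= |m| < 1, and Python's `decimal` module used to avoid binary rounding noise. The operands 0.9504 x 10^3 and 0.8200 x 10^2 are illustrative. In the hardware pipeline each step is a separate segment with its own register, so four different additions can be in progress at once; here the steps simply run in sequence.

```python
from decimal import Decimal
from typing import Tuple

def fp_add(xm: Decimal, xe: int, ym: Decimal, ye: int) -> Tuple[Decimal, int]:
    """Add X = xm * 10**xe and Y = ym * 10**ye (subtract by passing a negative ym).

    Mantissas are assumed to be normalized decimal fractions, 0.1 <= |m| < 1.
    """
    # Segment 1: compare exponents by subtraction.
    diff = xe - ye
    # Segment 2: align mantissas by shifting the one with the smaller exponent right.
    if diff >= 0:
        ym, e = ym.scaleb(-diff), xe
    else:
        xm, e = xm.scaleb(diff), ye
    # Segment 3: add (or subtract) the aligned mantissas.
    m = xm + ym
    # Segment 4: normalize the result.
    if m == 0:
        return Decimal(0), 0
    while abs(m) >= 1:               # overflow: shift right, increment exponent
        m, e = m.scaleb(-1), e + 1
    while abs(m) < Decimal("0.1"):   # leading zeros: shift left, decrement exponent
        m, e = m.scaleb(1), e - 1
    return m, e

# Illustrative operands: X = 0.9504 x 10^3, Y = 0.8200 x 10^2
print(fp_add(Decimal("0.9504"), 3, Decimal("0.8200"), 2))
```

For these operands, alignment turns Y into 0.0820 x 10^3, the sum 1.0324 x 10^3 overflows, and normalization yields 0.10324 x 10^4.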
The comparator, shift, adder, subtractor, incrementer, and decrementer in the floating-point pipeline are implemented with combinational circuits
The pipeline floating-point arithmetic delay is 110 ns, while the nonpipeline delay is 320 ns, resulting in a speedup of 320/110 ≈ 2.9
Instruction pipeline 
Pipeline processing can occur in the instruction stream as well as the data stream
Computers with complex instructions require phases like fetch, decode, effective address calculation, operand fetch, instruction execution, and result storage
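Pipeline floating-point arithmetic delay
$t_p = t_3 + t_r = 110$ ns (segment 3 has the longest delay; $t_r$ is the register delay)
Nonpipeline floating-point arithmetic delay
$t_n = t_1 + t_2 + t_3 + t_4 + t_r = 320$ ns
Speedup
$S = 320 / 110 \approx 2.9$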
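The two delay formulas can be expressed as small functions. The individual segment delays below are hypothetical, chosen only so that the totals match the 110 ns and 320 ns on these cards with segment 3 as the slowest; the cards themselves give only the totals.

```python
from typing import List

def clock_period(segment_delays: List[int], register_delay: int) -> int:
    # The pipeline clock must accommodate the slowest segment plus one register load.
    return max(segment_delays) + register_delay

def nonpipeline_delay(segment_delays: List[int], register_delay: int) -> int:
    # Without pipelining, the suboperations run back to back, then one register load.
    return sum(segment_delays) + register_delay

# Hypothetical segment delays t1..t4 in ns and register delay tr.
t = [60, 70, 100, 80]
tr = 10
tp = clock_period(t, tr)
tn = nonpipeline_delay(t, tr)
print(tp, tn, round(tn / tp, 1))
```

Because the segments are unequal, the speedup stays well below the ideal value of 4 for a four-segment pipeline: the faster segments sit idle for part of every 110 ns cycle.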
Computer Organization and Architecture Chapter 4: Pipeline and Vector Processing
Reference: W. Stallings
Pipeline processing 
Can occur not only in the data stream but in the instruction stream as well
Computer with an instruction fetch unit and an instruction execution unit 
Designed to provide a two-segment pipeline
Sequence of steps to process an instruction 
Fetch the instruction from memory
Decode the instruction
Calculate the effective address
Fetch the operands from memory
Execute the instruction
Store the result in the proper place
Difficulties that will prevent the instruction pipeline from operating at its maximum rate 
Different segments may take different times to operate on the incoming information
Some segments are skipped for certain operations
Two or more segments may require memory access at the same time, causing one segment to wait until another is finished with the memory
Four-Segment Instruction Pipeline 
Decoding of the instruction can be combined with the calculation of the effective address into one segment
The instruction execution and storing of the result can be combined into one segment
Fig 4-7 shows how the instruction cycle in the CPU can be processed with a four-segment pipeline
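The overlap in a four-segment instruction pipeline can be sketched as a space-time table. The segment labels FI, DA, FO and EX are assumed names for the four combined segments described above (fetch instruction; decode and calculate effective address; fetch operand; execute and store). The sketch ignores the difficulties listed earlier (unequal segment times, skipped segments, memory conflicts), so every instruction advances one segment per clock.

```python
from typing import Dict, List, Optional

# Assumed labels: FI = fetch instruction, DA = decode + effective address,
# FO = fetch operand, EX = execute + store result.
SEGMENTS = ["FI", "DA", "FO", "EX"]

def space_time(n_instructions: int,
               segments: List[str] = SEGMENTS) -> List[Dict[str, Optional[int]]]:
    """For each clock cycle, map every segment to the instruction it holds (or None)."""
    k = len(segments)
    table = []
    for cycle in range(k + n_instructions - 1):
        row: Dict[str, Optional[int]] = {}
        for s, name in enumerate(segments):
            i = cycle - s  # instruction i (0-based) occupies segment s at cycle i + s
            row[name] = i + 1 if 0 <= i < n_instructions else None
        table.append(row)
    return table

for cycle, row in enumerate(space_time(3), start=1):
    print(cycle, " ".join(f"{name}:{row[name] or '-'}" for name in SEGMENTS))
```

Three instructions finish in 4 + (3 - 1) = 6 cycles instead of the 12 cycles they would need if each had to pass through all four steps before the next one started.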