Some are better suited to different types of data than others
Programmers must decide which of the data structures available is the best to use
Arrays
An indexed set of related elements
Must be finite and indexed
An array’s index often starts from zero
Must only contain elements with the same data type
Arrays can be created in many dimensions
Fields, records and files
Information is stored by computers as a series of files
Each file is made up of records which are composed of a number of fields
Abstract data types/data structures
Don‘t exist as data structures in their own right
Make use of other data structures to form a new way of storing data
Queues
An abstract data structure based on an array
The first item added to a queue is the first to be removed
Referred to as “first in, first out” (FIFO) abstract data structures
Used by computers in keyboard buffers and in the breadth first search algorithm
Linear queues
Have two pointers, a front pointer and a rear pointer
These can be used to identify where to place a new item in a queue or to identify which item is at the front of the queue
The item at the front of the queue has been in the queue for the longest
Operations that can be performed on a queue include:
Enqueue
Dequeue
IsEmpty
IsFull
Emptiness can be detected by comparing the front and rear pointers
Circular queues
A type of queue in which the front and rear pointers can move the two ends of the queue
More memory efficient than a linear queue
Priority queues
Items are assigned a priority
High priority items are dequeued before low priority items
In the case that two or more items have the same priority, the items are removed in the usual first in, first out order
Frequently used in computer systems
Stacks
A first in, last out (FILO) abstract data structure
Often based on an array
Have one pointer: a top pointer
The item at the bottom of a stack has been in the stack for the longest
Operations that can be performed include:
Push (add an item)
Pop (remove the top item)
Peek (return the value of the item at the top of the stack without actually removing the item)
The functions IsFull and IsEmpty can be executed on stacks
If a push command is executed on a full stack, a stack overflow error is thrown
A stack underflow can be caused by attempting the pop command on an empty stack
Graphs
An abstract data structure used to represent complex relationships between items within datasets
Can be used to represent networks e.g., transport networks, IT networks and the Internet
Consists of nodes (vertices) joined by edges (arcs)
A weighted graph is one in which edges are assigned a value, representing a value such as time, distance or cost
Can be represented in two different ways:
Adjacency matrices
Adjacency lists
Adjacency matrices
A tabular representation of a graph
Each of the nodes in the graph is assigned both a row and a column
Unweighted:
1 is used to show that an edge exists between two nodes, 0 indicates no connection
They have a characteristic diagonal line of 0s and display diagonal symmetry
Weighted:
Contain the weight of a connection between two nodes, if no connection exists, an arbitrarily large value is used e.g., infinity
Adjacency lists
Represent a graph using a list
For each node, a list of adjacent nodes is created
Only records existing connections in a graph
Evaluation of adjacency matrices vs lists
Adjacency matrix:
Memory inefficient: Stores every possible edge between nodes, even those that don‘t exist, almost half is repeated data
Time efficient: Allows a specific edge to be queried very quickly as can be looked up by its row and column
Well suited to dense graphs with a large number of edges
Adjacency list:
Memory efficient: Only stores the edges that exist in the graph
Time inefficient: Slow to query, as each item in a list must be searched sequentially until the desired edge is found
Well suited to sparse graphs with few edges
Trees
A tree is a connected, undirected graph with no cycles
Rooted trees
Have a root node from which all other nodes stem
The root node is usually displayed at the top of the tree in diagrams
Nodes from which other nodes stem are called parent nodes
The root node is the only node with no parent
Nodes with a parent are called child nodes
Child nodes with no children are called leaf nodes
Binary trees
A rooted tree in which each parent node has no more than two child nodes.
Hash tables
Constant time complexity
A hashing algorithm takes an input and returns a hash
A hash is unique to its input and cannot be reversed to retrieve the input value
A hash table stores key-value pairs
The key is sent to a hash function, which produces a hash
Resulting hash is the index of the key-value pair in the hash table
When an element is to be looked up in a hash table, the key is first hashed
Once the hash has been calculated, the position in the table corresponding to that hash is queried and the desired info located
Hash tables - collisions
Sometimes different inputs produce the same hash
Well-designed hash tables get around collisions by using rehashing
One simple rehashing technique is to keep on moving to the next position until an available one is found
Dictionaries
A collection of key-value pairs
Values are accessed by their associated key
One application of dictionaries is in dictionary-based compression
Vectors
Can be represented as:
lists of numbers
functions
ways of representing a point in space
If viewed as a function, a vector can be represented by using a dictionary, if viewed as a list, a one-dimensional array would be suitable. They can be visually represented as an arrow.
Vector addition
Vectors can be added to achieve translation - with arrows, vectors are added “tip to tail”. Vectors can also be added by adding each of their components.
Scalar-vector multiplication
In order to scale a vector, each of its components are multiplied by a scalar. Scaling a vector only affects its magnitude, not its direction.
Convex combination of two vectors
With two vectors, a and b, a convex combination of the two would be ax + by where x and y:
are non-zero numbers
are less than one
add to 1
The convex combination of a and b with x and y:
is formed on the line that would join the tips of a and b
splits the line in the ratio chosen for the values x and y
Dot product
A single number derived from the components of vectors.
Can be used to determine the angle between two vectors
The dot product of the vectors an and b is notated a.b (said ”a dot b”)
Static and dynamic data structures
Every data structure is either static or dynamic
Static data structures are:
fixed in size
most frequently declared in memory as a series of sequential, contiguous memory locations
Dynamic data structures
change in size to store their content
store each item of data alongside a reference to where the next item is stored in memory
require more work on the part of the computer to set up and use