SCD - Unit 1.4.2
- Created by: Freddie Frame
- Created on: 04-02-20 15:06
Unit 1.4.2 T1 - Arrays, Tuples, Records
Data Structures:
- Elementary - Char, real, Integer, Boolean
- Built-in - Strings, Arrays, Lists, Records
Structured data types are usually made of elementary e.g. a string can have letters, numbers, etc.
2D arrays (and any other) have to have their size delclared e.g. [2,2]
- Arrays with any number of dimensions can be defined, it is a set of elements with the same data type
- Array elements can be referenced using arrayname[arrayindex]
Tuple - An ordered set of values that are immutable (values cannot change)
Records - Fixed number of fields, different data types, implemented as an object and treated like a tuple, when writing records are usually written as a single line
Dynamic Data Structures - Changes size when required
Static Data Structures - Cannot change size (arrays in most languages do not automatically change size; some provide resizing functions but must be programmed)
Unit 1.4.2 T2 - Queues
4 distint operators of a queue:
- Add item at rear of queue
- Remove item from front of queue
- Check if empty
- Check if full
When items leave the queue they remain as it is wasteful of CPU cycles to remove and fill positions with blanks instead pointers are used:
- Front - Holds index (number) of next position to be removed
- Rear - Holds index of last position added
Full queues cannot be added to and empty queues cannot have items removed
A queue may be full as when initialising the max no. of values it can hold needs to be declared
Another valirable size may be needed to hold the number of items currently in the queue
Unit 1.4.2 T2 - Queues - Functions/Methods
- enQueue(item) - Adds item to rear of queue
- deQueue() - Remove and return item from the front
- isEmpty() - Inicates if queue is empty
- isFull() - Indicates if queue is full
Queues can become circular to overcome the problem of reusing spaces freed by dequeuing from the front of the array as pointers go 0 >1>2>3>4>0>1, the MOD function allows array positions to be changed
Psuedocode: (1st two are for standard queues, 2nd two are for circular queues)
Unit 1.4.2 T2 - Queues - Priority Queue
Priority queues allow items to jump the queue by assigning priorities to arriving items
Example: Each item has a priority associated with it,e.g.
- Buses ending with A are high priority
- Buses ending with B are medium priority
- Buses ending with C are low priority
92B, 64C, 142A, 25C, 87B > 142A, 92B, 87B, 64C, 25C
An algorithm such as an insertion sort can be used to insert buses into the queue. If 142A leaves and 75B joins the queue now look like this: 92B, 87B, 75B, 64C, 25C
Unit 1.4.2 T3 - Lists and Linked Lists
Lists can be static (cannot change size) or dynamic(grow and shrink)
Programming Operations for a list:
- isEmpty() - Test for empty list
- append(item) - Add a new item to end of list
- remove(item) - Remove first occourrance of an item from list
- count(item) - Return the number of occurrences of item in list
- len(list) - Return no. of items in list
- index(item) - Return position of item
- insert(pos,item) - Add new item as position pos
- pop() - Remove and return last item in list
- pop(pos) - Remove and return item at position pos
Using a dynamic structure such as a list to implement a queue does not need maxSize and isFull as it is never full
1.4.2 T3 - Linked Lists
Linked lists, like lists are dynamic abstract data structures and is also implemented as an array and uses pointers
Linked lists have nodes with 2 parts:
- Data (can be complex data structure)
- Pointer of next node
A nextfree pointer is used to show the index of the next free space in the array
A start pointer is used to point to the first node in the list (if empty this is null as with any pointers with nothing to point to)
The last node in the list has the pointer null and each pointer holds the next index of the next node depending on how it is sorted (numerical/alphabetical order)
An alphabetically ordered list may look like this
1.4.2 T3 - Linked Lists - Adding & Deleting Nodes
When adding a new node:
- Data is placed in node pointed to be nextfree
- Follow pointers to where the new node needs to be linked in
- Adjust pointers
When deleting a node:
- Adjust pointers
- Pointers of previous item before deleted node needs to point to index of new next node in list
- nextfree pointer adjusted to deleted node if list is full; else it remains the same
1.4.2 T3 - Linked lists - Peeking Ahead
Peeking:
Data and pointer in the current node (p) can be examined without altering the data
1.4.2 T4 - Stacks
Another example of an Abstract data type (ADT) such as queues, lists and linked lists
Stack is a Last in First Out (LIFO) list unlike a queue which is a First in First Out (FIFO) list
Used in many cases such as an interrupt
Opertations of modelling a stack:
- Add item to the top
- Remove item from the top
- Check if stack is full
- Check if stack is empty
Programming Operations:
- Push(item) – adds an item to the top of the stack
- Pop() – removes and returns the item on the top of the stack
- isFull()
- isEmpty()
- peek() – Returns an item from the top without removing it
- size() – Return the number of items in a stack
1.4.2 T4 - Stacks as lists
Stacks can bne implemented as a list using operations for functions such as:
- Append() – Add an item to the stack
- Pop() – remove an item from the stack
- isEmpty() – Check if the stack is empty
- len(stack) – Check the number of items in the stack
Overflow - Attempting to push onto a full stack (unless dynamic)
Underflow - Attempting to pop from a stack that is empty
Stack overflow error may be implemented in a dynamic list but as an overflow cannot happen it will only error if the computer runs out of memory
1.4.2 T4 - Call Stack
Call Stack
System level data structure (on CPU level) that provides mechanism for passing parameters and return addresses to subroutines, this is hidden from the user in high-level languages
An example of this is comparing numbers: Bigger = max(num1,num2) as the programmer doesn't need to know how the arguments are sent to the function or how the result is returned. The values and the return address are saved on the stack and the values are popped when the function completes
1.4.2 T4 - Subroutines
A subroutine can also be called using the following method:
- Parameters are saved onto stack
- Address to which execution returns after the end of the subroutine is reached and saved onto stack
- Execution is transferred to subroutine code
This is then executed as follows
- Stack space is allocated for local variables
- Subroutine code executes
- Return address is retrieved
- Parameters are popped
- Execution is transferred back to the return address
1.4.2 T5 - Hash Tables
A hash table is another ADT which can find any record in a data set almost instantly, no matter the size where the address of the data is calculated from the data itself using a hashing algorithm
A hash table is what is formed when a set of data has a hashing algorithm/hash function applied to it
A problem however is that sometimes the address of a piece of data may be the same as another e.g. Assign 1000 slots to hold records - Address - key mod (number of slots)
When the same address is generated for different primary keys this is called a hashing collision (or a synonym)
To deal with this you can add 1 to the address or putting it in the next free slot, this however can lead to clustering in the table. Alternatively the algorithm increments the skip value by adding 1,3,5,7 etc. this skip value needs to be such that all slots are included which can be easily avoided by using a table with a prime number of slots
1.4.2 T5 - Load Factor + Searching a Hash Table
As a hashing table fills up its load factor (no. of items against table size) increades and more collisions occour, for this reason a table should not have a load factor outside of 50-70% of the table size so it is big enough but no too big
Psuedocode for searching a hash table:
For items that aren’t in the next slot keep going on until the next free slot is found, this means that the value is not in the table.
Many methods exist to search but all use the mod function at some point as the hash address has to be in the range of table addresses
1.4.2 T5 - Other Hashing Methods
Mid Square:
- Square item (assuming it is numeric)
- Extract the middle digits
- Perform mod function to get an address
Folding:
- Divide item e.g. 34721062 into equal size pieces > 34, 72, 10, 62 (last piece may not be of equal size)
- Add pieces together
- Perform mod function to get an address
Dealing with alphanumeric data - can be converted to numeric by using the ASCII code for each character then apply the hashing alogoithm to the reulting digits
Dealing with deletions - itmes to be deleted must be replaced with a dummy item (e.g. Deleted/##) as when searching for an item that hashes to that spot, if not found the algoithm will continue until the value is found or a free space is located. If a new item is added to the address with deleted it will replace the dummy item
1.4.2 T5 - Dictionary
Dictionaries are an ADT made up of associated pairs
- Each pair has a key and a value
- Value is only accessed by the key
- example (python): ID = {145: "Ken", 675: "Belinda")
- Multiple items can share the same key
Used in ASCII or Unicode table in a computer system, translations program, trees and graphs
Operations:
- Add new key:value pair to the dictionary
- Delete a key:value pair from the dictionary
- Amend the value in a key:value pair
- Return the value associated with key k
- Return True or False if the key is in the dictionary
- Return length of dictionary
Address of key:vlaue paur is calculated in a hashing algorithm so they are not held in any particular sequence
For keys with multiple items enclose the items in square brackets[]
1.4.2 T6 - Graphs
A graph is another ADT that represents complex relationships
Graphs contain Nodes/Vertex, Edges/Arcs and Edge Weights
Types of graph:
- Undirected - No arrows indicating direction (bidirectional)
- Directed - Edges are all one-way
- Unweighted - Edges do not have a value associated
- Weighted - Edges which indicate a cost of traversal
1.4.2 T6 - Representing Graphs (Adjacency Matrix)
A computer needs to represent the information about distances and connections in a structured, numerical way, this is implemented using an Adjacency Matrix or an Adjacency List
Adjacency Matrix - Graph is represented by a table, each row and column in a table represents a node, The item in the cell [row, column] indicates a connection (in an unweighted graph, this can be a 1). If the graph is undirected the matrix will be symmetric, if it is weighted the values in the cells represent the weights.
- Advantages - Easier to work with and adding edges is simple
- Disadvantages - A sparse graph will leave cells empty wasting memory space
1.4.2 T6 - Representing Graphs (Adjacency List)
Adjacency List - List of nodes is created and each node points to a list of adjacent (connected) nodes
Weighted graphs can be represented as a dictionary of dictionaries, each key = node, value = dictionary of adjacent edges and weights). Example: AdjacencyList = {A: {C:7, D:13}, B:{C:4, D:5}}
1.4.2 T6 - Graph Traversal + Applications
There are 2 methods for traversing a graph:
Depth First - Follow each connection all the way to the end before going to the next one (often left first) then backtrack
Breadth first – Check each neighbouring connection before moving to the next level
Graphs are used in: Networks, states in a finite state machine, social networks, web pages, chemistry, project managment, maps
1.4.2 T7 - Graphs
Tree is another common ADT that has a root, branches and leaves, the difference in computer science is that the root is on the top
A tree is also a connected, undirected graph with no cycles (unique path from one node to any other node)
In a rooted tree a node is identified as the root that every node apart from it has one unique parent
- Node - Contains tree data
- Edge - connects 2 nodes
- Root - Only node with no incoming edges
- Child - Set of nodes that have incoming edges from the same node
- Parent - Node at the top of connecting edges
- Subtree - Set of Nodes with a parent and descendants (can also be a leaf)
- Leaf - Node with no children (at bottom)
1.4.2 T7 - Binary Trees
Binary Tree - Rooted tree where each node has a maximum of 2 children
Each node consists of: Data, Left pointer to a child, Right pointer to a child
Bulding a binary tree:
- Nodes are added in order given, first item is the root
- Starting at the root each time, if item is less than root it is added to the left, otherwise it goes to the right, this same rule applies at the root of each sub-tree
1.4.2 T7 - Trees as an array + Traversal
Trees can also be represented as an array with numbers in the left and right columns indicating the index position of the array, and a -1 indicating that there is no sub-tree
Traversal Methods:
- Pre-order – Visit root then traverse the left sub-tree, then the right sub-tree, Place dot before letter
- In-order – Traverse the left sub-tree, then visit the root, then traverse the right sub-tree, Place dot under letter
- Post-order – Traverse the left sub-tree, then traverse the right sub-tree, then visit the root, Place dot after letter
1.4.2 T7 - Tree Traversal Algorithms
In-Order - Traversing the nodes in sequential order, alphabetic or numeric, goes down left tree until -1 reached, back up up to previous node, checking right tree until at the top, then right tree traversed
Pre-Order - Used to produce Pre-fix/Polish Notation for a mathematical expression so the expression - +12 15 4 is the pre-order form of the expression 12 + 15 -4
Post-Order - Used to produce post-fix or Reverse Polish notation for an expression. e.g. expression 12 15 + 4 - is the post order form of the expression 12 + 15 - 4
Algorithm:
- Procedure traverse(p)
- print (tree[p].data) – Pre-Order Only
- if tree[p].left <> -1 then
- traverse(tree[p].left)
- endif
- print (tree[p].data) – In-Order Only
- if tree[p].right <> -1 then
- traverse(tree[p].right)
- endif
- print (tree[p].data) – Post-Order Only
- endprocedure
Comments
No comments have yet been made