[RA] Ch24 Heaps and Priority Queues

Reading Assignment: All of Programming Chapter 24 Heaps and Priority Queues

A priority queue is queue where each item has an associated priority, and the next item returned from the queue is the one with the highest priority.

Heap Concepts

Conceptually, a heap is a complete binary tree which obeys the heap ordering rule. Recall from Section 22.1.1 that a complete binary tree is one in which every level, except possibly the last one has as many nodes as possible, and that the last level is filled in from left to right. There are two ways we can define the heap ordering rule, based on whether we want efficient access to the largest element or the smallest element.

In a max-heap, the heap ordering rule says that every node is larger than its children. Similarly, in a min-heap, the heap ordering rule says that every node is smaller than its children.

Using a heap to implement a priority queue, we would want to order the heap by priority. We can then easily implement the peek operation by just looking at the root node-it will always be the highest priority item in the heap. However, we must still be able to implement enqueue or dequeue efficiently, while still maintaining the invariants of the heap.

Insertion

To enqueue an item, we must insert it into the heap. We insert by first placing the new item at the next available place in the tree(the leftmost open slot on the bottom level, or the first slot of a new level if the bottom level is full). Placing the new item in this location ensure that the tree remains complete(which is one of the rules of the heap), but may cause the heap ordering rules to be violated. We then fix the heap ordering by “bubbling up” the node. We compare the node to its parent, and if the heap ordering is violated, we swap the two. We then repeat the process with the node in its new position and its new parent. This process continues until either the node is correctly ordered with respect to its parent(which ensures that the entire heap is correctly ordered), or the node reaches the root.

Deletion

If we want to remove the root element from a heap, we can do so efficiently. The first step for removal is to swap the position of the root with the rightmost element on the last row. Now we can remove the element that was the root while still maintaining the completeness of the tree. However, we have violated the heap ordering rule by swapping an element from the bottom of the heap to the top. We must therefore repair the heap ordering by bubbling that node down the heap until it is in a correct position.

To bubble an element down, we compare it to its two children, and determine which is largest or smallest. That element should be the parent of the other two, so if it is not, the current parent must be swapped with it, and the bubble down process continues from that point. If the elements are already correctly ordered, then the heap is correctly ordered, so no other steps are required. Note that sometimes when we want to compare a node to its children, those children might not exist. In such a case, the missing children do not participate in the comparison. If a node has only one child, it is compared against that child. If it has no child, then there is nothing else to do.

Heap Array Implementation

Even though heaps are conceptually trees, they are actually implemented in arrays. We can use one of two indexing schemes, based on whether or not we want to store the root of the heap at index 0 or index 1 or the array.

While storing the root at index 0 seems like a natural choice in languages that use 0-based array indexing, we may want to instead use index 0 for a sentinel. A sentinel is a special item that is not actually part of the heap’s data, which is ordered so that it stops the bubble up process without a special case. In a min-heap, the sentinel node would be the smallest possible item of a given type, while it would be the largest possible item in a max-heap. Use of a sentinel in index 0 heap means that you do not need to explicitly check if the inserted node has become the root, as you will then compare it with the sentinel and find it always correctly ordered.

	Root at 0	Root at 1
Parent	$(i - 1)/2$	$i /2$
Left Child	$2 * i + 1$	$2 * i$
Right Child	$2 * i + 2$	$2 * i + 1$

int minIndex = right;
if (data[left] < data[right]) {
  minIndex = left;
}
if (data[minIndex] < data[current]){
  //swap with minIndex then recurse
}

STL Priority Queue

http://www.cplusplus.com/reference/queue/priority_queue/

For the third parameter of priority_queue, which is Compare, it allows us to specify the ordering we want to use in the priority queue. The default is less which just uses the < operator on whatever types it is operating on. However, we might want a different ordering for a variety of reasons.

First, STL’s priority queue always uses a max-heap(that is, when we peek or pop from it, we always get the largest element). However, we might want to get the smallest element instead. We can still use TIL’s priority queue, we just have to reverse the ordering. That is, we could give it std::grater for Compare:

1	std::priority_queue<int, vector<int>, std::greater> myPq;

Second, we might just want a different ordering than < give. We can write our own comparison class(which needs to override the operator(), taking in two const references to the types to compare, and returning a bool).

Priority Queue Use: Compression

One use of priority queues is Huffman Coding, which is a compression algorithm. A compression algorithm takes input data, and re-encodes it in a particular way in hopes of reducing how many bytes are required to represent the data. Reducing the number of bytes needed to encode the data is helpful in terms of making it take less storage space, or requiring less time to transmit it across a network.

Huffman Coding gives an algorithm to find the optimal encoding where no symbol’s encoding is a prefix of another’s. The algorithm centers around building a binary tree whose leaf nodes correspond to input symbols. Once the tree is built, we can determine the encoding for each symbol by examining the path from the root of the tree to the leaf node with that symbol.