Lab 4: Searching in a Sorted List¶
In Chapter 9 of the Math 341 textbook, you saw an example of finding (up to machine precision) the value of \(\sqrt{2}\). This same algorithm works for approximating other numbers.
Task 1¶
Adapt the code from Chapter 9 of the Math 341 textbook to create a function cube_root(K)
, which finds the cube root of any natural number \(K\), (i.e., \(\sqrt[3]{K}\)). You should only need to change two lines, one to set an appropriate initial value for the list b
, and one to change the number for which the algorithm is searching. Your code should mimic the outputs of the given program precisely.
Searching in a Sorted Space¶
What is this algorithm doing? At step \(n\), we consider the number \(x\) halfway between \(a_{n-1}\) and \(b_{n-1}\), creating two intervals: \([a_{n-1}, x]\) and \([x, b_{n-1}]\). Because \(\sqrt{2}\) is between \(a_{n-1}\) and \(b_{n-1}\), it has to lie in one of these intervals. We compare it to \(x\) to check which one it is in and then toss out the other interval. This “tossing out” is accomplished by setting either \(a_n = x\) or \(b_n = x\) and keeping \(b_n = b_{n-1}\) or \(a_n = a_{n-1}\), respectively. At each step, we are cutting in half the interval in which we are searching. See the figure below.

Why does this algorithm work? First of all, this algorithm only finishes in a finite number of steps because floats actually have limited precision. Combining this fact with an estimate for the upper and lower bound (\(a_0\) and \(b_0\)), we are really only searching within a finite set. If we had an arbitrary level of precision, we could obtain better and better approximations to \(\sqrt{2}\), but the code would never finish running. Second, our algorithm relies on our ability to compare because we were searching in a sorted space. The number system \(\mathbb R\) is an ordered field, so checking something like \(x^2 < 2\) is possible.
A sorted list has both of these qualities. For example, the list l = [0,1,2,5,9,22,57]
is both finite and ordered. How could we find the index of a given number n
in this list? As a first approach, we could start at the beginning and check each number to see if it matches n
. This works great if we are searching for 1
, but not so great if we are searching for 57
. Imagine if we had a list with millions of entries. Checking every entry would take a long time!
We will use a binary search. This algorithm is exactly like the algorithm from Task 1. It is called a binary search because each time we cut our search space in half. Here is the general idea of the algorithm. Suppose we have a sorted list l
(sorted smallest to largest) and we want to find the index of the element n
in this list. We need three other variables. For this example we call them min_index
, max_index
, and curr_index
. These are exactly analogous to \(a_{n-1}\), \(b_{n-1}\), and \(x\).
Binary Search Algorithm
Set
min_index
equal to0
and setmax_index
to belen(l) - 1
. Thencurr_index
is set to be an integer halfway between them.If
l[curr_index] = n
, then we have found it!If
l[curr_index] < n
, updatemin_index
to becurr_index + 1
so that we are now searching in the second half of the list.If
l[curr_index] > n
, updatemax_index
to becurr_index - 1
so that we are now searching in the first half of the list.With our new
min_index
andmax_index
, recalculatecurr_index
to be halfway between them and continue recursively by calling the function again with the new values for the variables.If we never find it, return
-1
.
Task 2¶
With the pseudocode given above, write a recursive binary search function called binary_search(l,n)
that takes as input a sorted list of integers l
and an integer n
to search for and returns an index at which that integer appears. Return -1
if the integer is not in the list. Here are some clarifications:
Write a function
binary_search_helper(l,n,min_index,max_index)
. Then, callbinary_search_helper
from withinbinary_search
with the initial values formin_index
andmax_index
. (It is typically good practice to define helper functions inside the function they are helping)You may assume that the list will be sorted from smallest to largest.
Do not use any built-in functions such as
index
orfind
. Do not iterate over the whole list with afor
loop or use the keywordin
to check ifn
is inl
.
>>> l = [4, 8, 15, 16, 23, 42]
>>> binary_search(l,15)
2
>>> binary_search(l, 22)
-1
Challenge¶
The pseudocode will always return an index at which n
appears, but it may not be the first index. How can you alter your code so that the index returned is always the first index? This would replicate the behavior of Python’s built-in find
function.
Timing¶
When we introduced the binary search, we mentioned that checking every element would take a long time. So, does the binary search actually go faster? If so, how much faster? We need two things to test this: 1) a linear_search
function and 2) a big list of numbers. Running both functions on a big list of numbers will let us explore how well they perform.
Task 3¶
Write a function linear_search(l,n)
that takes as input a sorted list of integers l
and an integer n
and returns the index at which the integer n
first appears. Return -1
if the integer is not in the list. Use a for
or while
loop to search linearly instead of using a binary search. You may assume that the list is sorted, which may help you know when to return -1
. (Note: this problem should be much simpler than problem 2.)
We need an ordered list of integers. Let’s use the prime numbers! The sympy
library has a useful primerange
function that will give us all of the prime numbers in a range. It returns a generator object, so we will use the list
function on it to turn it into a list. Try the following code to generate all primes between 1
and 1000
:
from sympy import primerange
a = list(primerange(1,1000))
from time import time
t = time() # saving the current time
print(binary_search(a, 823)) # running the binary search
print("Time took: ", time() - t) # current time - saved time
Compare the output for binary_search
and linear_search
.
Task 4¶
Run your code on the following examples:
Try
n = 2
,n = 191
,n = 713
,n = 997
, andn = 1000
. Is one algorithm outperforming the other?Increase the length of
a
by makinga
contain all primes with1 <= n <= 10000000
. It may take a few moments to generate. Search forn = 2
,n = 9999991
, andn = 524287
(the7
th Mersenne prime), andn = 1632913
. How are the algorithms comparing? Given an arbitraryn
, which algorithm would you expect to be faster? As the size ofa
grows, which algorithm would you expect to be faster?With the same
a
from (2), search forn = 1234567
,n = 10000000
,n = 1
, andn = 512
. Answer the same questions as in (2).