Day 11
Today
- Recursion vs Iteration in-class Design Challenge in groups
- MP3 consulting time
For Next time
- Please complete MP2 reflection and teaming survey
- Next class will be the midpoint of MP3. Take a look at the suggested milestones to make sure you’re on track. While there’s not an official midpoint submission due, we strongly recommend checking in with course staff about your ideas, plans, and questions.
Recursion vs Iteration Design Challenge in groups
Today we pilot an interaction style for class that the teaching team has been workshopping amongst ourselves. We aim to provide opportunities to go hands-on work as a class that allow us to:
- develop more skill at walking through code for small challenges to increase understanding of core concepts.
- define what constitutes a reasonable outcome with respect to a metric such as speed or memory required (computational overhead).
- practice measuring outcomes of decisions to justify one approach over another for different contexts.
- discuss options for completing a task, as one might in an internship interview session.
- internalize code that may have things you have not seen before, and incorporate your own changes.
A Context for today’s challenge
The files that you will download could be used for numerous things, but one context that might motivate our work is coming up with ways to search for an present when it has been the hottest in Boston. (As we come end an unusually hot February.)
In the following files, we are seeding data randomly instead of using actual values that we read in (or scrape from the Web).
Setting up working groups
- The whole class counts off 1 to 4.
- Half of the class (1s and 3s) moves to a set of chairs close to the west whiteboard.
- The other half of the class (2s and 4s) moves to chairs near the east whiteboard.
- An instructor sitting with the east group will explain Task E, then introduce two different ways of carrying out the task in Python.
- A different instructor sitting with the west group will explain Task W, then introduce two different ways of carrying out the task in Python.
Today’s tasks
Download the zip file recursion_vs_iteration.zip from Canvas’ files section. Unzip to create the 4 files you’ll need.
-
East 1’s code to find the maximum element in a list using recursion. use the file list_max_recursive.py fill in the code. to run the first time use time python list_max_recursive.py -s 42 10 on subsequent runs, you can make the “10” larger and try to find out how long this code takes to execute for larger values
-
East 3’s code to find the maximum element in a list using iteration. use the file list_max_iterative.py. fill in the code. to run the file time python3 list_max_iterative.py -s42 10 then try different min max values to see how long the code takes to execute (working up smaller so that runs don’t take forever)
-
West 2’s code to count nodes using recursion. use the file count_nodes_recursive.py fill in the code. to run the first time use time python count_nodes_recursive.py -s 42 3 5 on subsequent runs, you can make the “10” larger and try to find out how long this code takes to execute for larger values
-
West 4’s code to count nodes using iteration. use the file count_nodes_iterative.py fill in the code. to run the file time python3 count_nodes_iterative.py -s42 3 5 then try different min max values to see how long the code takes to execute (working up smaller so that runs don’t take forever)
Working through tasks for 30 minutes
- Students in each group will spend 10 minutes implementing the approach.
-
spend about 4 minutes testing with different values.
- groups 1 and 3 will split in half and send one half to the other team at the same board.
- try each others test cases and discuss.
Quick reflection
The students give us verbal feedback to help us improve the design decision challenge process.
Bad Kangaroos
In this Reading Journal we looked at a very common error that can be very frustrating to debug:
class Kangaroo:
def __init__(self, name, contents=[]): # Warning: mutable default argument bug!
self.pouch_contents = contents
The core “gotcha” here is that default argument values are evaluated once when the function is defined, not every time it is called.
In this case, it means that the new list is created one time, and each new Kangaroo
instance (that did not provide contents to override the default) has a reference to the same list.
This would be true of any mutable object as a default argument, not just list
(e.g. dict
, or your custom classes). Here is another implementation, showing a Python idiom commonly used to avoid this bug:
class Kangaroo:
def __init__(self, name, contents=None):
if contents == None:
contents = []
self.pouch_contents = contents
Here we’ve used None
(immutable) as a stand-in for the case when no contents have been passed and we should create an empty list. The new list is created inside the body of the __init__
method when it is called, so each Kangaroo
instance gets its own empty list vs a shared reference.
General advice: avoid using mutable objects as default values of function arguments.
Going beyond on the reading journal agenda: sorting objects
For this Reading Journal, you had to implement a print_agenda
method that displays Event
objects in chronological order.
The key challenge here was sorting the Event
s: Python provides many
techniques for sorting,
but they don’t know how to sensibly sort your custom objects without some help from you.
Let’s imagine that you have a list of Event
s.
The first design choice to make is whether you should maintain the list in sorted order at all times, or sort it only when required (e.g. whenever print_agenda
is called). There are pros and cons to each style, depending on application:
Always sorted
In this case the burden falls on your add_event
method to ensure that you insert new Event
s in chronological order (perhaps using your is_after
method). It takes a little more time to add an Event
, but you afterwards you can rely on the list being sorted at no extra cost.
Sort on demand
With this approach you have a simpler insertion process, but you must sort the list every time you need it to be in order. This may be more expensive depending on whether adding new Event
s or printing the Agenda
is more common.
Python knows how to sort built-in types (e.g. integers), but not your custom classes. Let’s say you have a Time
class:
class Time:
def __init__(self, hour=0, minute=0, second=0):
self.hour = hour
self.minute = minute
self.second = second
def to_int(self):
return 60*(60*self.hour + self.minute) + self.second
Sorting key function
Python’s list.sort
method and sorted
function both take an optional key
parameter, which expects a function that takes the object to be sorted and returns a key used to sort it. We could write such a function for Event
s:
def event_start(event):
"""Given Event object containing a start Time, return time as sortable key"""
return event.start.to_int()
and then use it to sort a list of Events:
for event in sorted(some_events, key=event_start):
... do something
Anonymous lambda functions
If you just want to sort Event
s, it can be a bit clunky to write the event_start
function solely to pass as a key
parameter.
You’ll often see people use an anonymous lambda
function (a function defined on the fly that is not bound to a name) for this purpose. Equivalently:
for event in sorted(some_events, key=lambda e: e.start.to_int())
... do something
Implement comparison operator(s)
If you implement the less-than __lt__ method for your classes (so that time1 < time2
works), then Python now knows how to sort your custom objects.
# inside class Time:
def __lt__(self, other):
return self.to_int() < other.to_int()
Agenda: __str__ vs __repr__
You might have observed the following behavior when trying to print your Time class using the __str__
method:
>>> times = [Time(6,20), Time(2,45), Time(4)]
>>> print(times[0])
06:20:00
>>> print(times)
[<__main__.Time object at 0x105b1d0b8>, <__main__.Time object at 0x105b1d0f0>, <__main__.Time object at 0x105b1d128>]
This arises because Python has two ways of converting an object into a string that are used differently. Read about each: __str__ and __repr__.
When printing collections like lists, the __repr__
method is used for the contained objects. As an example of why, let’s imagine we have a TicTacToe
class with a pretty __str__
method that gives:
X| |O
-----
|X|O
-----
O|X|
Very nice way to visualize the state of one Tic Tac Toe board - who needs a Graphical User Interface?
But, if we used that same representation for the __repr__
and printed a list of two TicTacToe
objects, it would be a mess!:
[X| |O
-----
|X|O
-----
O|X| , X| |O
-----
|X|O
-----
O|X| ]
vs
[<__main__.TicTacToe object at 0x105b1d160>, <__main__.TicTacToe object at 0x105b1d168>]
When writing __repr__
methods, a great goal is for the result to be a valid Python expression that could recreate the object (e.g. Time(6,20,0)
not 06:20:00
).
In Python speak, ideally eval(repr(obj)) == obj
.
If a class does not define a __str__
method, the __repr__
method is used instead (if it exists).
If neither exists, Python defaults to <ClassName object at 0x(memory address)>
.
In short:
- output of
__str__
should be “nicely printable- output of
__repr__
should be “information-rich and unambiguous”, and if possible should be valid Python to recreate the object
Text Mining consulting
By this point in MP3 we hope that you’ve thought of an interesting topic to investigate and tried out a couple of relevant text data sources (note that these don’t have to be the ones that you stick with for the project).
One great next step would be to generate an extensive list of interesting questions you might explore via your data set. Using this list, you can prioritize different analysis techniques that can help you address your questions. It’s helpful to have a longer list of topics than you’ll actually pursue, since as you begin to investigate you’re likely to find that some may not pan out and others may have disappointing results.
We strongly suggest that you consult with course staff to share your project idea, potential data sources, list of research questions, and analysis strategies. We can help by suggesting related alternatives you may not have considered and by estimating the relative difficulty of potential approaches.