Advent of Code 2017: Days 1 - 5

Posted on 05 December 2017 in Technology

This post is part of the series, Advent of Code 2017, where I work my way through (part of) the 2017 Advent of Code challenges in order to re-learn some of the basics of Python and reflect on some simple concepts and functionality. All my challenge code is available in cdubz/advent-of-code-2017 on GitHub.

Day 1: zip()

Challenge: Day 1: Inverse Captcha
Solution: advent-of-code-2017/01

This challenge taught me about Python's built-in zip function. The basic goal is to compare values in a list with specific positional relationships to other values (e.g. next to, X steps from, etc.) in the same list. zip assists with this task by combining multiple lists in to tuples. My initial solution used a construct similar to:

digits = [1, 2, 3, 4, 5]
total = 0
for a, b in zip(digits[::1], digits[1::1]):
    if a == b:
        total += a

This code uses zip and Python's list indexing to evaluate each element in the digits list against the next element in the list by creating and combining two lists: one starting from the first digit in digits and one starting from the second digital in digits. Here are what the different parts of this process look like:

>>> digits = [1, 2, 3, 4, 5]
>>> digits[::1]
[1, 2, 3, 4, 5]
>>> digits[1::1]
[2, 3, 4, 5]
>>> list(zip(digits[::1], digits[1::1]))
[(1, 2), (2, 3), (3, 4), (4, 5)]

Day 2: itertools.combinations()

Challenge: Day 2: Corruption Checksum
Solution: advent-of-code-2017/02

I have been a bit slow to pick up on Python's powerful iteration tools (itertools) so it took me a while (and a bit of searching, of course) to come up with a solution for this challenge. The basic goal is to evaluate all possible combinations of integers in a list. While I had some vague understanding of an efficient way to do this, my initial solution was extremely basic and inefficient:

digits = [2, 4, 5, 6, 9]
total = 0
for i in digits:
    for j in digits:
        if i != j and i % j == 0:
            total += i/j

This code has two major issues:

1. It compares every number against every other number, regardless of whether or not those numbers have already been compared. In this case, 2 is compared to 4 and 4 is compared to 2 - two comparisons where only one is needed. This issue will compound and cause greater inefficiencies with a larger list.

1. It assumes that no digit will be repeated. With, for example, digits = [2, 2, 3], the two initial values would never be evaluated because they are equal. While this wouldn't actually cause any problems with the original list provided, it is a pretty silly way to do things.

itertools.combinations can make this process more efficient without creating particularly complex code. This method takes a list and returns all possible unique combinations based on element position instead of value - meaning it solves both of the above issues. A refactoring of the above code using this method looks like this:

import itertools

digits = [2, 4, 5, 6, 9]
total = 0
for i, j in itertools.combinations(digits, 2):
    if i % j == 0:
        total += i/j
    if j % i == 0:
        total += j/i

Now, instead of a nested loop over the same list, there is only a single loop that evaluates list elements based on position instead of value.

This was also a valuable reminder about big O notation. The original code above is O(N^2), meaning it grows in complexity exponentially as the size of the list increases. The revised code, on the other hand, is O(N), so its complexity is only linearly related to the size of the list.

Day 3: Manhattan Distance

Challenge: Day 3: Spiral Memory
Solution: advent-of-code-2017/03

When I originally started this challenge I did not realize that Manhattan Distance was a link in the prompt. I had never heard this term before and a StackExchange question, How to calculate the Euclidean and Manhattan distance?, clued me in to the process for calculating the distance in a matrix. I thought about involving numpy to make things easier on myself, but decided to try solving it first without any external libraries.

The solution to this problem is very simple once the (x,y) coordinates of the number being evaluated are known (because it is being compared to 0,0). To do this I basically just brute-forced my way through creating a dictionary full of dictionaries, one for each column of the matrix, by checking for existing data in the N, S, E, and W directions of the "2D" array. With the matrix built, the solution is calculated by using the abs(x2 - x1) - abs(y2 - y1) formula.

Completing this challenge was mostly just a reminder that I need to dive in to numpy eventually. My solution is fairly clean, but horribly inefficient and I'm sure it could be improved upon greatly.

Day 4: set()

Challenge: Day 4: High-Entropy Passphrases
Solution: advent-of-code-2017/04

This challenge was another reminder about a simple and useful Python built-in type: set(). set() can be used to quickly determine if a list has duplicate items by comparing the length of the original list and its set because a set is a collection of distinct items:

>>> digits = [1, 1, 2, 3, 4, 5]
>>> digits
[1, 1, 2, 3, 4, 5]
>>> set(digits)
{1, 2, 3, 4, 5}
>>> len(digits) == len(set(digits))
False

Note that set is a class separate from list and dict. I originally attempted to check for equality between a list and set but this did not work, even with elements in the same order, because the two classes are not the same:

>>> list1 = [1, 2, 3, 4]
>>> list2 = [1, 2, 3, 4]
>>> list1 == list2
True
>>> list1
>>> [1, 2, 3, 4]
>>> set1 = set(list1)
>>> set1
{1, 2, 3, 4}
>>> list1 == set1
False

Day 5: Profiling

Challenge: Day 5: A Maze of Twisty Trampolines, All Alike
Solution: advent-of-code-2017/05

The challenge of jumping around a dict and updating its values was fairly simple. The execution of the script for part one took no time at all, but part two complicated the task and running the script ends up taking 15 - 20 seconds or more (on my very old laptop).

Curious to see if there was anything I could do to improve this, I first turned to Python's built-in profiler cProfile by running the suggested python -m cProfile <script> command:

python -m cProfile maze_part_two.py
26395586
     6 function calls in 18.451 seconds

Ordered by: standard name

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    1   18.450   18.450   18.451   18.451 maze_part_two.py:11(<module>)
    1    0.001    0.001    0.001    0.001 maze_part_two.py:12(<dictcomp>)
    1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
    1    0.000    0.000    0.000    0.000 {method 'read' of 'file' objects}
    1    0.000    0.000    0.000    0.000 {method 'split' of 'str' objects}
    1    0.000    0.000    0.000    0.000 {open}

This didn't tell me much of anything because cProfile works at the function level. Since my solution is a simple script that does not use any functions, all this really says is: "the script took 18.451 seconds to run". This would definitely be a very useful tool to quickly identify problem areas in a larger code base, but is not helpful here.

To find more detail, I turned to the line_profiler module. As the name implies, line_profiler will provide line-by-line profiling for a Python script. I still had to refactor the code slightly to put it in a function that would allow me to use line_profiler's decorator:

@profile
def run():
    [...]

if __name__ == '__main__':
    run()

The @profile decorator is what tells line_profiler to take a closer look at the function. The evaluation is a two step process - the kernprof command generates an lprof file that line_profiler can then translate in to useful output:

kernprof -l maze_part_two.py
26395586
Wrote profile results to maze_part_two.py.lprof

This took about three minutes to run because of the detailed evaluation. Next, line_profiler can translate the lprof file using python's -m flag:

python -m line_profiler maze_part_two.py.lprof
Timer unit: 1e-06 s

Total time: 191.125 s
File: maze_part_two.py
Function: run at line 13

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
                                         @profile
                                         def run():
       1          886    886.0      0.0      maze = [...]
       1            2      2.0      0.0      pos = 0
       1            1      1.0      0.0      steps = 0
       1            1      1.0      0.0      while True:
26395587     24430262      0.9     12.8          try:
26395587     29073318      1.1     15.2              move = maze[pos]
       1            3      3.0      0.0          except KeyError:
       1           72     72.0      0.0              print(steps)
       1            3      3.0      0.0              break
26395586     24881530      0.9     13.0          last_pos = pos
26395586     26706364      1.0     14.0          pos = last_pos + move
26395586     26374692      1.0     13.8          if move >= 3:
13057988     16389704      1.3      8.6              maze[last_pos] -= 1
                                                 else:
13337598     15591510      1.2      8.2              maze[last_pos] += 1
26395586     27676460      1.0     14.5          steps += 1

There we go! Much more information than cProfile provided. The "Hits" column contains the number of times each line was run, "Time" the total time spent on the line (in the timer unit (1e-06s in this case)), and the remaining columns build on that data.