# Advent of Code 2017: Days 1 - 5

Posted on 05 December 2017 in Technology

This post is part of the series, Advent of Code 2017, where I work my way through (part of) the 2017 Advent of Code challenges in order to re-learn some of the basics of Python and reflect on some simple concepts and functionality. All my challenge code is available in cdubz/advent-of-code-2017 on GitHub.

## Day 1: zip()

- Challenge: Day 1: Inverse Captcha
- Solution: advent-of-code-2017/01

This challenge taught me about Python's built-in zip function. The basic
goal is to compare values in a list with specific positional relationships to
other values (e.g. next to, X steps from, etc.) in the same list. `zip`
assists with this task by combining multiple lists in to tuples. My initial
solution used a construct similar to:

0 1 2 3 4 | digits = [1, 2, 3, 4, 5] total = 0 for a, b in zip(digits[::1], digits[1::1]): if a == b: total += a |

This code uses `zip` and Python's list indexing to evaluate each element in
the `digits` list against the next element in the list by creating and
combining two lists: one starting from the first digit in `digits` and one
starting from the second digital in `digits`. Here are what the different
parts of this process look like:

0 1 2 3 4 5 6 | >>> digits = [1, 2, 3, 4, 5] >>> digits[::1] [1, 2, 3, 4, 5] >>> digits[1::1] [2, 3, 4, 5] >>> list(zip(digits[::1], digits[1::1])) [(1, 2), (2, 3), (3, 4), (4, 5)] |

## Day 2: itertools.combinations()

- Challenge: Day 2: Corruption Checksum
- Solution: advent-of-code-2017/02

I have been a bit slow to pick up on Python's powerful iteration tools
(itertools) so it took me a while (and a bit of searching, of course) to come
up with a solution for this challenge. The basic goal is to evaluate all
possible combinations of integers in a list. While I had some vague
understanding of an efficient way to do this, my initial solution was extremely
basic and *inefficient*:

0 1 2 3 4 5 | digits = [2, 4, 5, 6, 9] total = 0 for i in digits: for j in digits: if i != j and i % j == 0: total += i/j |

This code has two major issues:

1. It compares every number against every other number, regardless of whether or not those numbers have already been compared. In this case, 2 is compared to 4 and 4 is compared to 2 - two comparisons where only one is needed. This issue will compound and cause greater inefficiencies with a larger list.

1. It assumes that no digit will be repeated. With, for example,
`digits = [2, 2, 3]`, the two initial values would never be evaluated because
they are equal. While this wouldn't actually cause any problems with the
original list provided, it is a pretty silly way to do things.

itertools.combinations can make this process more efficient without creating
particularly complex code. This method takes a list and returns all possible
*unique* combinations based on element *position* instead of value - meaning it
solves both of the above issues. A refactoring of the above code using this
method looks like this:

0 1 2 3 4 5 6 7 8 | import itertools digits = [2, 4, 5, 6, 9] total = 0 for i, j in itertools.combinations(digits, 2): if i % j == 0: total += i/j if j % i == 0: total += j/i |

Now, instead of a nested loop over the same list, there is only a single loop that evaluates list elements based on position instead of value.

This was also a valuable reminder about big O notation. The original code above is O(N^2), meaning it grows in complexity exponentially as the size of the list increases. The revised code, on the other hand, is O(N), so its complexity is only linearly related to the size of the list.

## Day 3: Manhattan Distance

- Challenge: Day 3: Spiral Memory
- Solution: advent-of-code-2017/03

When I originally started this challenge I did not realize that Manhattan Distance was a link in the prompt. I had never heard this term before and a StackExchange question, How to calculate the Euclidean and Manhattan distance?, clued me in to the process for calculating the distance in a matrix. I thought about involving numpy to make things easier on myself, but decided to try solving it first without any external libraries.

The solution to this problem is very simple once the (x,y) coordinates of the
number being evaluated are known (because it is being compared to 0,0). To do
this I basically just brute-forced my way through creating a dictionary full of
dictionaries, one for each column of the matrix, by checking for existing data
in the N, S, E, and W directions of the "2D" array. With the matrix built, the
solution is calculated by using the `abs(x2 - x1) - abs(y2 - y1)` formula.

Completing this challenge was mostly just a reminder that I need to dive in to numpy eventually. My solution is fairly clean, but horribly inefficient and I'm sure it could be improved upon greatly.

## Day 4: set()

- Challenge: Day 4: High-Entropy Passphrases
- Solution: advent-of-code-2017/04

This challenge was another reminder about a simple and useful Python built-in
type: set(). `set()` can be used to quickly determine if a list has
duplicate items by comparing the length of the original list and its set
because a set is a collection of *distinct* items:

0 1 2 3 4 5 6 | >>> digits = [1, 1, 2, 3, 4, 5] >>> digits [1, 1, 2, 3, 4, 5] >>> set(digits) {1, 2, 3, 4, 5} >>> len(digits) == len(set(digits)) False |

Note that `set` is a class separate from `list` and `dict`. I originally
attempted to check for *equality* between a `list` and `set` but this did
not work, even with elements in the same order, because the two classes are not
the same:

0 1 2 3 4 5 6 7 8 9 10 | >>> list1 = [1, 2, 3, 4] >>> list2 = [1, 2, 3, 4] >>> list1 == list2 True >>> list1 >>> [1, 2, 3, 4] >>> set1 = set(list1) >>> set1 {1, 2, 3, 4} >>> list1 == set1 False |

## Day 5: Profiling

- Challenge: Day 5: A Maze of Twisty Trampolines, All Alike
- Solution: advent-of-code-2017/05

The challenge of jumping around a `dict` and updating its values was fairly
simple. The execution of the script for part one took no time at all, but part
two complicated the task and running the script ends up taking 15 - 20 seconds
or more (on my very old laptop).

Curious to see if there was anything I could do to improve this, I first turned
to Python's built-in profiler cProfile by running the suggested
`python -m cProfile <script>` command:

0 1 2 3 4 5 6 7 8 9 10 11 12 | python -m cProfile maze_part_two.py 26395586 6 function calls in 18.451 seconds Ordered by: standard name ncalls tottime percall cumtime percall filename:lineno(function) 1 18.450 18.450 18.451 18.451 maze_part_two.py:11(<module>) 1 0.001 0.001 0.001 0.001 maze_part_two.py:12(<dictcomp>) 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects} 1 0.000 0.000 0.000 0.000 {method 'read' of 'file' objects} 1 0.000 0.000 0.000 0.000 {method 'split' of 'str' objects} 1 0.000 0.000 0.000 0.000 {open} |

This didn't tell me much of anything because cProfile works at the **function**
level. Since my solution is a simple script that does not use any functions, all
this really says is: "the script took 18.451 seconds to run". This would
definitely be a very useful tool to quickly identify problem areas in a larger
code base, but is not helpful here.

To find more detail, I turned to the line_profiler module. As the name implies, line_profiler will provide line-by-line profiling for a Python script. I still had to refactor the code slightly to put it in a function that would allow me to use line_profiler's decorator:

0 1 2 3 4 5 | @profile def run(): [...] if __name__ == '__main__': run() |

The `@profile` decorator is what tells line_profiler to take a closer look at
the function. The evaluation is a two step process - the `kernprof` command
generates an `lprof` file that line_profiler can then translate in to useful
output:

0 1 2 | kernprof -l maze_part_two.py 26395586 Wrote profile results to maze_part_two.py.lprof |

This took about three minutes to run because of the detailed evaluation. Next,
line_profiler can translate the `lprof` file using python's `-m` flag:

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 | python -m line_profiler maze_part_two.py.lprof Timer unit: 1e-06 s Total time: 191.125 s File: maze_part_two.py Function: run at line 13 Line # Hits Time Per Hit % Time Line Contents ============================================================== 13 @profile 14 def run(): 15 1 886 886.0 0.0 maze = [...] 16 1 2 2.0 0.0 pos = 0 17 1 1 1.0 0.0 steps = 0 18 1 1 1.0 0.0 while True: 19 26395587 24430262 0.9 12.8 try: 20 26395587 29073318 1.1 15.2 move = maze[pos] 21 1 3 3.0 0.0 except KeyError: 22 1 72 72.0 0.0 print(steps) 23 1 3 3.0 0.0 break 24 26395586 24881530 0.9 13.0 last_pos = pos 25 26395586 26706364 1.0 14.0 pos = last_pos + move 26 26395586 26374692 1.0 13.8 if move >= 3: 27 13057988 16389704 1.3 8.6 maze[last_pos] -= 1 28 else: 29 13337598 15591510 1.2 8.2 maze[last_pos] += 1 30 26395586 27676460 1.0 14.5 steps += 1 |

There we go! Much more information than cProfile provided. The "Hits" column contains the number of times each line was run, "Time" the total time spent on the line (in the timer unit (1e-06s in this case)), and the remaining columns build on that data.