Advent of code 2023 - Day 1: Trebuchet?!
2023-12-10
This year I try to record my attempt at solving the Advent of Code 2023 riddles. This is Day 1 - see https:adventofcode.com/2023/day/1
Part 1 #
Our first task is the following:
The newly-improved calibration document consists of lines of text; each line originally contained a specific calibration value that the Elves now need to recover. On each line, the calibration value can be found by combining the first digit and the last digit (in that order) to form a single two-digit number.
For example:
1abc2
pqr3stu8vwx
a1b2c3d4e5f
treb7uchet
In this example, the calibration values of these four lines are 12, 38, 15, and 77. Adding these together produces 142.
Consider your entire calibration document. What is the sum of all of the calibration values?
Lets start jupyter in our shell to start coding!
conda activate tf
jupyter lab --no-browser --port=8888
First, load the test document
import pandas as pd
import re
txt = pd.read_table('data/2023-12-01-1-aoc.txt', names=['code'])
txt
code
0 jjfvnnlfivejj1
1 6fourfour
2 ninevbmltwo69
3 pcg91vqrfpxxzzzoneightzt
4 jpprthxgjfive3one1qckhrptpqdc
.. ...
995 583sevenhjxlqzjgbzxhkcl5
996 81s
997 2four3threesxxvlfqfive4
998 nine6eightsevenzx9twoxc
999 hmbfjdfnp989mfivefiverpzrjs
[1000 rows x 1 columns]
Second, extract the digits. I had to wrap my head around regex matching in
python first, because I first tried pandas.extract
(which only extracts the
first match), then pandas.extractall
(which extracts all matches but puts them
into a multiindex which makes things more difficult in this case). So I settled
for the re.findall
version, which returns a list. To concatenate the elements
in the list, we take use the join
function.
txt['digits'] = txt.loc[:, 'code'].apply(
lambda x: ''.join(re.findall(r'(\d+)', x)))
txt
code digits
0 jjfvnnlfivejj1 1
1 6fourfour 6
2 ninevbmltwo69 69
3 pcg91vqrfpxxzzzoneightzt 91
4 jpprthxgjfive3one1qckhrptpqdc 31
.. ... ...
995 583sevenhjxlqzjgbzxhkcl5 5835
996 81s 81
997 2four3threesxxvlfqfive4 234
998 nine6eightsevenzx9twoxc 69
999 hmbfjdfnp989mfivefiverpzrjs 989
[1000 rows x 2 columns]
Next, combine the first and the last digit and convert the result from string to integer
txt['calibration'] = txt.loc[:, 'digits'].apply(
lambda x: int(x[0] + x[-1]))
txt
code digits calibration
0 jjfvnnlfivejj1 1 11
1 6fourfour 6 66
2 ninevbmltwo69 69 69
3 pcg91vqrfpxxzzzoneightzt 91 91
4 jpprthxgjfive3one1qckhrptpqdc 31 31
.. ... ... ...
995 583sevenhjxlqzjgbzxhkcl5 5835 55
996 81s 81 81
997 2four3threesxxvlfqfive4 234 24
998 nine6eightsevenzx9twoxc 69 69
999 hmbfjdfnp989mfivefiverpzrjs 989 99
[1000 rows x 3 columns]
Lastly, get the sum of our calibration numbers
txt.loc[:, 'calibration'].sum()
56465
Part 2 #
Now follows part two:
Your calculation isn’t quite right. It looks like some of the digits are actually spelled out with letters: one, two, three, four, five, six, seven, eight, and nine also count as valid “digits”.
Equipped with this new information, you now need to find the real first and last digit on each line. For example:
two1nine
eightwothree
abcone2threexyz
xtwone3four
4nineeightseven2
zoneight234
7pqrstsixteen
In this example, the calibration values are 29, 83, 13, 24, 42, 14, and 76. Adding these together produces 281.
What is the sum of all of the calibration values?
Okay, let’s see if we can update the pattern matching. To deal with potential
overlapping values like oneight
which contains one
as well as eight
, I
used the regex positive lookahead ?=
as described here. Because this enables
capturing overlapping values, I used \d
(one digit) instead of \d+
(one or
more digits), otherwise digits might double. Afterwards, just replace the
spelled out digits with their numerical value.
# for i, r in enumerate(txt.loc[:, 'code']):
# matches = re.findall(
# r'(?=(\d|one|two|three|four|five|six|seven|eight|nine))', r)
# result = ''.join([match for match in matches])
# result = result.replace('one', '1').replace('two', '2').replace(
# 'three', '3').replace('four', '4').replace('five', '5').replace(
# 'six', '6').replace('seven', '7').replace('eight', '8').replace(
# 'nine', '9')
# txt.loc[i, 'digits2'] = result
# txt
# a very nice alternative suggested by Tomalak:
digits = '\d one two three four five six seven eight nine'.split()
txt['digits2'] = txt.loc[:, 'code'].apply(lambda v: ''.join(
str(digits.index(m)) if m in digits else m
for m in re.findall(rf'(?=({"|".join(digits)}))', v)
))
txt
code digits calibration digits2
0 jjfvnnlfivejj1 1 11 51
1 6fourfour 6 66 644
2 ninevbmltwo69 69 69 9269
3 pcg91vqrfpxxzzzoneightzt 91 91 9118
4 jpprthxgjfive3one1qckhrptpqdc 31 31 5311
.. ... ... ... ...
995 583sevenhjxlqzjgbzxhkcl5 5835 55 58375
996 81s 81 81 81
997 2four3threesxxvlfqfive4 234 24 243354
998 nine6eightsevenzx9twoxc 69 69 968792
999 hmbfjdfnp989mfivefiverpzrjs 989 99 98955
[1000 rows x 4 columns]
Now, construct the calibration value as before…
txt['calibration2'] = txt.loc[:, 'digits2'].apply(lambda x: int(x[0] + x[-1]))
txt
code digits calibration digits2 calibration2
0 jjfvnnlfivejj1 1 11 51 51
1 6fourfour 6 66 644 64
2 ninevbmltwo69 69 69 9269 99
3 pcg91vqrfpxxzzzoneightzt 91 91 9118 98
4 jpprthxgjfive3one1qckhrptpqdc 31 31 5311 51
.. ... ... ... ... ...
995 583sevenhjxlqzjgbzxhkcl5 5835 55 58375 55
996 81s 81 81 81 81
997 2four3threesxxvlfqfive4 234 24 243354 24
998 nine6eightsevenzx9twoxc 69 69 968792 92
999 hmbfjdfnp989mfivefiverpzrjs 989 99 98955 95
[1000 rows x 5 columns]
… and get the correct sum!
txt.loc[:, 'calibration2'].sum()
55902