This year I am trying to record my attempts at solving the Advent of Code 2023 riddles. This is Day 1; see https://adventofcode.com/2023/day/1
Part 1
Our first task is the following:
The newly-improved calibration document consists of lines of text; each line originally contained a specific calibration value that the Elves now need to recover. On each line, the calibration value can be found by combining the first digit and the last digit (in that order) to form a single two-digit number.
For example:
1abc2
pqr3stu8vwx
a1b2c3d4e5f
treb7uchet
In this example, the calibration values of these four lines are 12, 38, 15, and 77. Adding these together produces 142.
Consider your entire calibration document. What is the sum of all of the calibration values?
My input file: 2023-12-01-1-aoc.txt
Let's start Jupyter in our shell to start coding!
conda activate tf
jupyter lab --no-browser --port=8888
First, load the input document:
import pandas as pd
import re
txt = pd.read_table('data/2023-12-01-1-aoc.txt', names=['code'])
txt
|   | code |
|---|---|
| 0 | jjfvnnlfivejj1 |
| 1 | 6fourfour |
| 2 | ninevbmltwo69 |
| 3 | pcg91vqrfpxxzzzoneightzt |
| 4 | jpprthxgjfive3one1qckhrptpqdc |
| … | … |
| 995 | 583sevenhjxlqzjgbzxhkcl5 |
| 996 | 81s |
| 997 | 2four3threesxxvlfqfive4 |
| 998 | nine6eightsevenzx9twoxc |
| 999 | hmbfjdfnp989mfivefiverpzrjs |
1000 rows × 1 columns
Second, extract the digits. I had to wrap my head around regex matching in Python first: I initially tried the pandas str.extract method (which only extracts the first match), then str.extractall (which extracts all matches but puts them into a MultiIndex, which makes things more difficult in this case). So I settled for the re.findall version, which returns a list. To concatenate the elements of the list, we use the join function.
txt['digits'] = txt.loc[:, 'code'].apply(
    lambda x: ''.join(re.findall(r'(\d+)', x)))
txt
|   | code | digits |
|---|---|---|
| 0 | jjfvnnlfivejj1 | 1 |
| 1 | 6fourfour | 6 |
| 2 | ninevbmltwo69 | 69 |
| 3 | pcg91vqrfpxxzzzoneightzt | 91 |
| 4 | jpprthxgjfive3one1qckhrptpqdc | 31 |
| … | … | … |
| 995 | 583sevenhjxlqzjgbzxhkcl5 | 5835 |
| 996 | 81s | 81 |
| 997 | 2four3threesxxvlfqfive4 | 234 |
| 998 | nine6eightsevenzx9twoxc | 69 |
| 999 | hmbfjdfnp989mfivefiverpzrjs | 989 |
1000 rows × 2 columns
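To make the difference between the three extraction approaches concrete, here is a minimal sketch on two lines taken from the table above. The variable names (s, first_only, all_matches, joined, regrouped) are only illustrative and not part of the original notebook.

```python
import re
import pandas as pd

# Two sample lines from the input shown above
s = pd.Series(['ninevbmltwo69', '583sevenhjxlqzjgbzxhkcl5'])

# str.extract keeps only the first match per row
first_only = s.str.extract(r'(\d+)')

# str.extractall keeps every match, but indexes the result by
# (original row, match number), i.e. a two-level MultiIndex
all_matches = s.str.extractall(r'(\d+)')

# re.findall returns a plain list per row, which join can concatenate
joined = s.apply(lambda x: ''.join(re.findall(r'\d+', x)))

# The extractall result can be collapsed again, at the cost of a groupby
regrouped = all_matches[0].groupby(level=0).agg(''.join)

print(first_only[0].tolist())   # ['69', '583']
print(joined.tolist())          # ['69', '5835']
print(regrouped.tolist())       # ['69', '5835']
```

Note that rows without any digit would simply be missing from the extractall result, which is another reason the re.findall route is easier to handle here.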
Next, combine the first and the last digit and convert the result from a string to an integer:
txt['calibration'] = txt.loc[:, 'digits'].apply(
    lambda x: int(x[0] + x[-1]))
txt
|   | code | digits | calibration |
|---|---|---|---|
| 0 | jjfvnnlfivejj1 | 1 | 11 |
| 1 | 6fourfour | 6 | 66 |
| 2 | ninevbmltwo69 | 69 | 69 |
| 3 | pcg91vqrfpxxzzzoneightzt | 91 | 91 |
| 4 | jpprthxgjfive3one1qckhrptpqdc | 31 | 31 |
| … | … | … | … |
| 995 | 583sevenhjxlqzjgbzxhkcl5 | 5835 | 55 |
| 996 | 81s | 81 | 81 |
| 997 | 2four3threesxxvlfqfive4 | 234 | 24 |
| 998 | nine6eightsevenzx9twoxc | 69 | 69 |
| 999 | hmbfjdfnp989mfivefiverpzrjs | 989 | 99 |
1000 rows × 3 columns
Lastly, get the sum of our calibration numbers:
txt.loc[:, 'calibration'].sum()
56465
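As a sanity check, the same steps can be run on the four example lines from the puzzle text, which should reproduce the stated total of 142. This is a small standalone sketch without pandas; the helper name calibration_value is made up for illustration.

```python
import re

# Worked example from the Part 1 puzzle text
example = ['1abc2', 'pqr3stu8vwx', 'a1b2c3d4e5f', 'treb7uchet']

def calibration_value(line):
    """Join all digits of a line, then combine the first and last one."""
    digits = ''.join(re.findall(r'\d+', line))
    return int(digits[0] + digits[-1])

example_values = [calibration_value(line) for line in example]
example_total = sum(example_values)

print(example_values)  # [12, 38, 15, 77]
print(example_total)   # 142
```

A line with a single digit, such as treb7uchet, uses that digit as both first and last, which is why it yields 77.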
Part 2
Now follows part two:
Your calculation isn’t quite right. It looks like some of the digits are actually spelled out with letters: one, two, three, four, five, six, seven, eight, and nine also count as valid “digits”.
Equipped with this new information, you now need to find the real first and last digit on each line. For example:
two1nine
eightwothree
abcone2threexyz
xtwone3four
4nineeightseven2
zoneight234
7pqrstsixteen
In this example, the calibration values are 29, 83, 13, 24, 42, 14, and 76. Adding these together produces 281.
What is the sum of all of the calibration values?
Okay, let’s see if we can update the pattern matching. To deal with potentially overlapping values like oneight, which contains one as well as eight, I used the regex positive lookahead (?=...) as described here. A lookahead matches without consuming any characters, so the search continues at the very next position and overlapping words are all captured. For the same reason I used \d (one digit) instead of \d+ (one or more digits); otherwise a run like 69 would be captured twice, once as 69 and once more as 9. Afterwards, just replace the spelled-out digits with their numerical value.
# for i, r in enumerate(txt.loc[:, 'code']):
#     matches = re.findall(
#         r'(?=(\d|one|two|three|four|five|six|seven|eight|nine))', r)
#     result = ''.join(matches)
#     result = result.replace('one', '1').replace('two', '2').replace(
#         'three', '3').replace('four', '4').replace('five', '5').replace(
#         'six', '6').replace('seven', '7').replace('eight', '8').replace(
#         'nine', '9')
#     txt.loc[i, 'digits2'] = result
# txt
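# a very nice alternative suggested by Tomalak:
# index 0 holds the \d pattern, so each word's list index equals its digit value
# (raw string, so that \d is not treated as an invalid string escape)
digits = r'\d one two three four five six seven eight nine'.split()
txt['digits2'] = txt.loc[:, 'code'].apply(lambda v: ''.join(
    # spelled-out words become their index; literal digits pass through as-is
    str(digits.index(m)) if m in digits else m
    for m in re.findall(rf'(?=({"|".join(digits)}))', v)
))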
txt
|   | code | digits | calibration | digits2 |
|---|---|---|---|---|
| 0 | jjfvnnlfivejj1 | 1 | 11 | 51 |
| 1 | 6fourfour | 6 | 66 | 644 |
| 2 | ninevbmltwo69 | 69 | 69 | 9269 |
| 3 | pcg91vqrfpxxzzzoneightzt | 91 | 91 | 9118 |
| 4 | jpprthxgjfive3one1qckhrptpqdc | 31 | 31 | 5311 |
| … | … | … | … | … |
| 995 | 583sevenhjxlqzjgbzxhkcl5 | 5835 | 55 | 58375 |
| 996 | 81s | 81 | 81 | 81 |
| 997 | 2four3threesxxvlfqfive4 | 234 | 24 | 243354 |
| 998 | nine6eightsevenzx9twoxc | 69 | 69 | 968792 |
| 999 | hmbfjdfnp989mfivefiverpzrjs | 989 | 99 | 98955 |
1000 rows × 4 columns
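To see why the lookahead and the single \d matter, here is a small standalone sketch comparing the variants on one of the puzzle's example lines. The variable names (words, plain, overlapping, doubled, single) and the test string x69y are only illustrative.

```python
import re

words = r'\d|one|two|three|four|five|six|seven|eight|nine'

# Without a lookahead, matching "one" consumes its final "e",
# so the overlapping "eight" is never found.
plain = re.findall(rf'({words})', 'zoneight234')

# The zero-width lookahead consumes nothing, so every position
# in the string gets its own chance to match.
overlapping = re.findall(rf'(?=({words}))', 'zoneight234')

# Inside a lookahead, \d+ captures the rest of a digit run at each
# position, so "69" is reported as "69" and then again as "9".
doubled = re.findall(r'(?=(\d+))', 'x69y')
single = re.findall(r'(?=(\d))', 'x69y')

print(plain)        # ['one', '2', '3', '4']
print(overlapping)  # ['one', 'eight', '2', '3', '4']
print(doubled)      # ['69', '9']
print(single)       # ['6', '9']
```

For this puzzle the doubling would not change the first and last character of the joined string, but using \d keeps the digits2 column an honest list of the digits found.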
Now, construct the calibration value as before…
txt['calibration2'] = txt.loc[:, 'digits2'].apply(lambda x: int(x[0] + x[-1]))
txt
|   | code | digits | calibration | digits2 | calibration2 |
|---|---|---|---|---|---|
| 0 | jjfvnnlfivejj1 | 1 | 11 | 51 | 51 |
| 1 | 6fourfour | 6 | 66 | 644 | 64 |
| 2 | ninevbmltwo69 | 69 | 69 | 9269 | 99 |
| 3 | pcg91vqrfpxxzzzoneightzt | 91 | 91 | 9118 | 98 |
| 4 | jpprthxgjfive3one1qckhrptpqdc | 31 | 31 | 5311 | 51 |
| … | … | … | … | … | … |
| 995 | 583sevenhjxlqzjgbzxhkcl5 | 5835 | 55 | 58375 | 55 |
| 996 | 81s | 81 | 81 | 81 | 81 |
| 997 | 2four3threesxxvlfqfive4 | 234 | 24 | 243354 | 24 |
| 998 | nine6eightsevenzx9twoxc | 69 | 69 | 968792 | 92 |
| 999 | hmbfjdfnp989mfivefiverpzrjs | 989 | 99 | 98955 | 95 |
1000 rows × 5 columns
… and get the correct sum!
txt.loc[:, 'calibration2'].sum()
55902
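The Part 2 approach can also be checked against the seven example lines from the puzzle text, which should add up to 281. This is a standalone sketch without pandas; the names example2, names, pattern and calibration_value2 are made up for illustration, and the word list here starts at one, so the digit value is the list index plus 1.

```python
import re

# Worked example from the Part 2 puzzle text
example2 = ['two1nine', 'eightwothree', 'abcone2threexyz', 'xtwone3four',
            '4nineeightseven2', 'zoneight234', '7pqrstsixteen']

names = 'one two three four five six seven eight nine'.split()
# Lookahead so that overlapping words such as "twone" yield both matches
pattern = re.compile(rf'(?=(\d|{"|".join(names)}))')

def calibration_value2(line):
    """Combine the first and last digit, counting spelled-out digits too."""
    found = pattern.findall(line)
    as_digits = [str(names.index(m) + 1) if m in names else m for m in found]
    return int(as_digits[0] + as_digits[-1])

example2_values = [calibration_value2(line) for line in example2]
example2_total = sum(example2_values)

print(example2_values)  # [29, 83, 13, 24, 42, 14, 76]
print(example2_total)   # 281
```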