Advent of Code 2023 - Day 1: Trebuchet?!

2023-12-10

Python

This year I am trying to record my attempts at solving the Advent of Code 2023 riddles. This is Day 1 - see https://adventofcode.com/2023/day/1

Part 1

Our first task is the following:

The newly-improved calibration document consists of lines of text; each line originally contained a specific calibration value that the Elves now need to recover. On each line, the calibration value can be found by combining the first digit and the last digit (in that order) to form a single two-digit number.

For example:

1abc2
pqr3stu8vwx
a1b2c3d4e5f
treb7uchet

In this example, the calibration values of these four lines are 12, 38, 15, and 77. Adding these together produces 142.

Consider your entire calibration document. What is the sum of all of the calibration values?

My input file: 2023-12-01-1-aoc.txt

Let's start Jupyter in our shell to start coding!

conda activate tf
jupyter lab --no-browser --port=8888

First, load the input document

import pandas as pd
import re

txt = pd.read_table('data/2023-12-01-1-aoc.txt', names=['code'])
txt

	code
0	jjfvnnlfivejj1
1	6fourfour
2	ninevbmltwo69
3	pcg91vqrfpxxzzzoneightzt
4	jpprthxgjfive3one1qckhrptpqdc
…	…
995	583sevenhjxlqzjgbzxhkcl5
996	81s
997	2four3threesxxvlfqfive4
998	nine6eightsevenzx9twoxc
999	hmbfjdfnp989mfivefiverpzrjs

1000 rows × 1 columns

Second, extract the digits. I had to wrap my head around regex matching in Python first: I initially tried pandas Series.str.extract (which only extracts the first match), then Series.str.extractall (which extracts all matches but puts them into a MultiIndex, which makes things more difficult in this case). So I settled for re.findall, which returns a list. To concatenate the elements of the list, we use the join function.

txt['digits'] = txt.loc[:, 'code'].apply(
    lambda x: ''.join(re.findall(r'(\d+)', x)))
txt

	code	digits
0	jjfvnnlfivejj1	1
1	6fourfour	6
2	ninevbmltwo69	69
3	pcg91vqrfpxxzzzoneightzt	91
4	jpprthxgjfive3one1qckhrptpqdc	31
…	…	…
995	583sevenhjxlqzjgbzxhkcl5	5835
996	81s	81
997	2four3threesxxvlfqfive4	234
998	nine6eightsevenzx9twoxc	69
999	hmbfjdfnp989mfivefiverpzrjs	989

1000 rows × 2 columns
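To illustrate the difference between keeping only the first match and collecting all of them, here is a minimal sketch that uses only the standard library re module on three lines from the table above. re.search merely stands in for "first match only" behaviour; it is an assumption for illustration and not part of the original notebook.

```python
import re

lines = [
    "ninevbmltwo69",
    "jpprthxgjfive3one1qckhrptpqdc",
    "583sevenhjxlqzjgbzxhkcl5",
]

# First match only: every digit after the first run of digits is lost.
first_only = [re.search(r"\d+", line).group() for line in lines]

# All matches: re.findall returns a list of digit runs per line,
# which ''.join glues back into a single string.
all_runs = [re.findall(r"\d+", line) for line in lines]
joined = ["".join(runs) for runs in all_runs]

print(first_only)  # ['69', '3', '583']
print(all_runs)    # [['69'], ['3', '1'], ['583', '5']]
print(joined)      # ['69', '31', '5835']
```

The joined strings agree with the digits column shown above for rows 2, 4 and 995.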

Next, combine the first and the last digit and convert the result from string to integer

txt['calibration'] = txt.loc[:, 'digits'].apply(
    lambda x: int(x[0] + x[-1]))
txt

	code	digits	calibration
0	jjfvnnlfivejj1	1	11
1	6fourfour	6	66
2	ninevbmltwo69	69	69
3	pcg91vqrfpxxzzzoneightzt	91	91
4	jpprthxgjfive3one1qckhrptpqdc	31	31
…	…	…	…
995	583sevenhjxlqzjgbzxhkcl5	5835	55
996	81s	81	81
997	2four3threesxxvlfqfive4	234	24
998	nine6eightsevenzx9twoxc	69	69
999	hmbfjdfnp989mfivefiverpzrjs	989	99

1000 rows × 3 columns
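The indexing with x[0] and x[-1] also covers lines with a single digit, because both indices then point at the same character. The sketch below wraps that step in a hypothetical helper, calibration_value (not part of the original notebook), and adds a guard for the case of an empty digit string, which would otherwise raise an IndexError.

```python
def calibration_value(digits: str) -> int:
    """Combine the first and last character of a digit string into an int.

    Hypothetical helper for illustration, not part of the original notebook.
    """
    if not digits:
        # digits[0] on an empty string would raise IndexError;
        # fail with a clearer message instead.
        raise ValueError("line contains no digits")
    return int(digits[0] + digits[-1])


print(calibration_value("5835"))  # 55: first '5', last '5'
print(calibration_value("234"))   # 24: first '2', last '4'
print(calibration_value("1"))     # 11: a single digit is both first and last
```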

Lastly, get the sum of our calibration numbers

txt.loc[:, 'calibration'].sum()
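As a sanity check, the same steps can be run on the four example lines from the puzzle text, where the expected total of 142 is known. This is a small standalone sketch without pandas; the variable names are chosen for illustration only.

```python
import re

example = ["1abc2", "pqr3stu8vwx", "a1b2c3d4e5f", "treb7uchet"]

values = []
for line in example:
    # Collect all digits of the line as one string, e.g. '12345'.
    digits = "".join(re.findall(r"\d", line))
    # First and last digit form the two-digit calibration value.
    values.append(int(digits[0] + digits[-1]))

total = sum(values)
print(values)  # [12, 38, 15, 77]
print(total)   # 142
```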

Part 2

Now follows part two:

Your calculation isn’t quite right. It looks like some of the digits are actually spelled out with letters: one, two, three, four, five, six, seven, eight, and nine also count as valid “digits”.

Equipped with this new information, you now need to find the real first and last digit on each line. For example:

two1nine
eightwothree
abcone2threexyz
xtwone3four
4nineeightseven2
zoneight234
7pqrstsixteen

In this example, the calibration values are 29, 83, 13, 24, 42, 14, and 76. Adding these together produces 281.

What is the sum of all of the calibration values?

Okay, let’s see if we can update the pattern matching. To deal with potentially overlapping values like oneight, which contains one as well as eight, I used the regex positive lookahead (?=...). Because the lookahead captures overlapping matches, I used \d (one digit) instead of \d+ (one or more digits); otherwise digits would be captured more than once. Afterwards, just replace the spelled-out digits with their numerical value.
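The effect of the lookahead is easiest to see on tiny inputs. The following sketch, using only the re module, compares a plain alternation with the lookahead version on oneight, and \d+ with \d on a run of digits; the variable names are illustrative.

```python
import re

# Without a lookahead, each match consumes its characters, so the shared
# 'e' in 'oneight' is used up by 'one' and 'eight' is never found.
plain = re.findall(r"one|eight", "oneight")

# With a capturing group inside a lookahead, the match itself is zero-width
# and the scan advances one character at a time, so both words are found.
overlapping = re.findall(r"(?=(one|eight))", "oneight")

# Inside a lookahead, \d+ matches again at every position of a digit run ...
doubled = re.findall(r"(?=(\d+))", "234")
# ... while \d yields each digit exactly once.
single = re.findall(r"(?=(\d))", "234")

print(plain)        # ['one']
print(overlapping)  # ['one', 'eight']
print(doubled)      # ['234', '34', '4']
print(single)       # ['2', '3', '4']
```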

# for i, r in enumerate(txt.loc[:, 'code']):
#     matches = re.findall(
#         r'(?=(\d|one|two|three|four|five|six|seven|eight|nine))', r)
#     result = ''.join([match for match in matches])
#     result = result.replace('one', '1').replace('two', '2').replace(
#         'three', '3').replace('four', '4').replace('five', '5').replace(
#         'six', '6').replace('seven', '7').replace('eight', '8').replace(
#         'nine', '9')
#     txt.loc[i, 'digits2'] = result
# txt

# a very nice alternative suggested by Tomalak:
digits = r'\d one two three four five six seven eight nine'.split()


txt['digits2'] = txt.loc[:, 'code'].apply(lambda v: ''.join(
    str(digits.index(m)) if m in digits else m
    for m in re.findall(rf'(?=({"|".join(digits)}))', v)
))
txt

	code	digits	calibration	digits2
0	jjfvnnlfivejj1	1	11	51
1	6fourfour	6	66	644
2	ninevbmltwo69	69	69	9269
3	pcg91vqrfpxxzzzoneightzt	91	91	9118
4	jpprthxgjfive3one1qckhrptpqdc	31	31	5311
…	…	…	…	…
995	583sevenhjxlqzjgbzxhkcl5	5835	55	58375
996	81s	81	81	81
997	2four3threesxxvlfqfive4	234	24	243354
998	nine6eightsevenzx9twoxc	69	69	968792
999	hmbfjdfnp989mfivefiverpzrjs	989	99	98955

1000 rows × 4 columns
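The list-based version works because of how the list is laid out: index 0 holds the regex class \d, so each spelled-out word sits at the index equal to its value. A minimal standalone sketch of that idea follows; to_digits is a hypothetical wrapper around the lambda, introduced here only for illustration.

```python
import re

# Index 0 holds the regex class \d, so the words land at the index equal
# to their numeric value: 'one' -> 1, ..., 'nine' -> 9.
digits = r"\d one two three four five six seven eight nine".split()

# Alternation of all entries inside a capturing lookahead.
pattern = rf'(?=({"|".join(digits)}))'


def to_digits(line: str) -> str:
    """Hypothetical helper: translate every match into a single digit."""
    return "".join(
        # Words are looked up by position; a literal digit such as '9'
        # is not an element of the list and is passed through unchanged.
        str(digits.index(m)) if m in digits else m
        for m in re.findall(pattern, line)
    )


print(digits.index("seven"))                  # 7
print(to_digits("pcg91vqrfpxxzzzoneightzt"))  # 9118
print(to_digits("2four3threesxxvlfqfive4"))   # 243354
```

Both results agree with the digits2 column for rows 3 and 997 above.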

Now, construct the calibration value as before…

txt['calibration2'] = txt.loc[:, 'digits2'].apply(lambda x: int(x[0] + x[-1]))
txt

	code	digits	calibration	digits2	calibration2
0	jjfvnnlfivejj1	1	11	51	51
1	6fourfour	6	66	644	64
2	ninevbmltwo69	69	69	9269	99
3	pcg91vqrfpxxzzzoneightzt	91	91	9118	98
4	jpprthxgjfive3one1qckhrptpqdc	31	31	5311	51
…	…	…	…	…	…
995	583sevenhjxlqzjgbzxhkcl5	5835	55	58375	55
996	81s	81	81	81	81
997	2four3threesxxvlfqfive4	234	24	243354	24
998	nine6eightsevenzx9twoxc	69	69	968792	92
999	hmbfjdfnp989mfivefiverpzrjs	989	99	98955	95

1000 rows × 5 columns

… and get the correct sum!

txt.loc[:, 'calibration2'].sum()
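The part two logic can be checked against the seven example lines from the puzzle text, for which the expected total is 281. The sketch below is a standalone variation without pandas; it uses a dictionary lookup instead of the list index, which is an illustrative choice and not the code from the notebook.

```python
import re

words = ["one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]
# Map each spelled-out word to its digit character, e.g. 'one' -> '1'.
word_to_digit = {word: str(value) for value, word in enumerate(words, start=1)}

# Single digits or spelled-out words, captured inside a lookahead so that
# overlapping words such as 'twone' yield both 'two' and 'one'.
pattern = re.compile(r"(?=(\d|" + "|".join(words) + "))")

example = [
    "two1nine",
    "eightwothree",
    "abcone2threexyz",
    "xtwone3four",
    "4nineeightseven2",
    "zoneight234",
    "7pqrstsixteen",
]

values = []
for line in example:
    # Literal digits are not keys of the dict and are kept as they are.
    found = [word_to_digit.get(m, m) for m in pattern.findall(line)]
    values.append(int(found[0] + found[-1]))

total = sum(values)
print(values)  # [29, 83, 13, 24, 42, 14, 76]
print(total)   # 281
```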