Advent of Code 2023 - Day 1: Trebuchet?!

This year I am recording my attempts at solving the Advent of Code 2023 puzzles. This is Day 1; see https://adventofcode.com/2023/day/1

Part 1

Our first task is the following:

The newly-improved calibration document consists of lines of text; each line originally contained a specific calibration value that the Elves now need to recover. On each line, the calibration value can be found by combining the first digit and the last digit (in that order) to form a single two-digit number.

For example:

1abc2
pqr3stu8vwx
a1b2c3d4e5f
treb7uchet

In this example, the calibration values of these four lines are 12, 38, 15, and 77. Adding these together produces 142.

Consider your entire calibration document. What is the sum of all of the calibration values?

My input file: 2023-12-01-1-aoc.txt

Let's start Jupyter in our shell to start coding!

conda activate tf
jupyter lab --no-browser --port=8888

First, load the input file

import pandas as pd
import re

txt = pd.read_table('data/2023-12-01-1-aoc.txt', names=['code'])
txt
     code
0    jjfvnnlfivejj1
1    6fourfour
2    ninevbmltwo69
3    pcg91vqrfpxxzzzoneightzt
4    jpprthxgjfive3one1qckhrptpqdc
...  ...
995  583sevenhjxlqzjgbzxhkcl5
996  81s
997  2four3threesxxvlfqfive4
998  nine6eightsevenzx9twoxc
999  hmbfjdfnp989mfivefiverpzrjs

1000 rows × 1 columns

Second, extract the digits. I had to wrap my head around regex matching in Python first: I initially tried pandas' Series.str.extract (which only extracts the first match), then Series.str.extractall (which extracts all matches but puts them into a MultiIndex, which makes things more difficult in this case). So I settled on re.findall, which returns a list. To concatenate the elements of the list, we use the join function.

txt['digits'] = txt.loc[:, 'code'].apply(
    lambda x: ''.join(re.findall(r'(\d+)', x)))
txt
     code                           digits
0    jjfvnnlfivejj1                 1
1    6fourfour                      6
2    ninevbmltwo69                  69
3    pcg91vqrfpxxzzzoneightzt       91
4    jpprthxgjfive3one1qckhrptpqdc  31
...  ...                            ...
995  583sevenhjxlqzjgbzxhkcl5       5835
996  81s                            81
997  2four3threesxxvlfqfive4        234
998  nine6eightsevenzx9twoxc        69
999  hmbfjdfnp989mfivefiverpzrjs    989

1000 rows × 2 columns
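
To make the difference between "first match only" and "all matches" concrete, here is a small sketch that uses only the standard library re module on the example lines from the puzzle statement. It is an illustration rather than the notebook code: re.search stands in for the first-match-only behaviour, and the variable names are made up for this example.

```python
import re

# Example lines from the puzzle statement
lines = ["1abc2", "pqr3stu8vwx", "a1b2c3d4e5f", "treb7uchet"]

# re.search stops at the first match, so later digits are lost
first_only = re.search(r"\d+", "a1b2c3d4e5f").group()

# re.findall returns every non-overlapping match as a list of strings
matches = [re.findall(r"\d+", line) for line in lines]

# ''.join glues the list elements back into one string per line
digit_strings = ["".join(m) for m in matches]

print(first_only)     # 1
print(digit_strings)  # ['12', '38', '12345', '7']
```

Note that the last line yields a single character, which is why the next step indexes both x[0] and x[-1] instead of assuming two digits.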

Next, combine the first and the last digit and convert the result from string to integer

txt['calibration'] = txt.loc[:, 'digits'].apply(
    lambda x: int(x[0] + x[-1]))
txt
     code                           digits  calibration
0    jjfvnnlfivejj1                 1       11
1    6fourfour                      6       66
2    ninevbmltwo69                  69      69
3    pcg91vqrfpxxzzzoneightzt       91      91
4    jpprthxgjfive3one1qckhrptpqdc  31      31
...  ...                            ...     ...
995  583sevenhjxlqzjgbzxhkcl5       5835    55
996  81s                            81      81
997  2four3threesxxvlfqfive4        234     24
998  nine6eightsevenzx9twoxc        69      69
999  hmbfjdfnp989mfivefiverpzrjs    989     99

1000 rows × 3 columns

Lastly, get the sum of our calibration numbers

txt.loc[:, 'calibration'].sum()
56465
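
As a sanity check, the same logic can be run against the four example lines from the puzzle statement, which should add up to 142. The following is a minimal pandas-free sketch; the function name calibration_value is hypothetical, and returning 0 for a line without any digit is an assumption of this sketch (the puzzle input for Part 1 always contains at least one digit per line).

```python
import re


def calibration_value(line: str) -> int:
    """Combine the first and last digit of a line into a two-digit number."""
    found = re.findall(r"\d", line)
    if not found:
        # Assumption: a line without digits contributes nothing
        return 0
    return int(found[0] + found[-1])


example = ["1abc2", "pqr3stu8vwx", "a1b2c3d4e5f", "treb7uchet"]
values = [calibration_value(line) for line in example]
total = sum(values)

print(values)  # [12, 38, 15, 77]
print(total)   # 142
```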

Part 2

Now follows part two:

Your calculation isn’t quite right. It looks like some of the digits are actually spelled out with letters: one, two, three, four, five, six, seven, eight, and nine also count as valid “digits”.

Equipped with this new information, you now need to find the real first and last digit on each line. For example:

two1nine
eightwothree
abcone2threexyz
xtwone3four
4nineeightseven2
zoneight234
7pqrstsixteen

In this example, the calibration values are 29, 83, 13, 24, 42, 14, and 76. Adding these together produces 281.

What is the sum of all of the calibration values?

Okay, let’s see if we can update the pattern matching. Overlapping values such as oneight contain both one and eight, but a plain alternation consumes one and can then no longer match eight. To deal with this, I wrapped the pattern in a regex positive lookahead (?=...), which matches without consuming any characters, so the search resumes at the very next character. Because the lookahead is tried at every position, I used \d (one digit) instead of \d+ (one or more digits); otherwise a run like 69 would be captured as 69 and then again as 9. Afterwards, just replace the spelled-out digits with their numerical values.

# for i, r in enumerate(txt.loc[:, 'code']):
#     matches = re.findall(
#         r'(?=(\d|one|two|three|four|five|six|seven|eight|nine))', r)
#     result = ''.join([match for match in matches])
#     result = result.replace('one', '1').replace('two', '2').replace(
#         'three', '3').replace('four', '4').replace('five', '5').replace(
#         'six', '6').replace('seven', '7').replace('eight', '8').replace(
#         'nine', '9')
#     txt.loc[i, 'digits2'] = result
# txt

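# a very nice alternative suggested by Tomalak:
# r'\d' sits at index 0, so each word's list index equals its digit value;
# the raw string avoids an invalid-escape warning for '\d'
digits = r'\d one two three four five six seven eight nine'.split()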


txt['digits2'] = txt.loc[:, 'code'].apply(lambda v: ''.join(
    str(digits.index(m)) if m in digits else m
    for m in re.findall(rf'(?=({"|".join(digits)}))', v)
))
txt
     code                           digits  calibration  digits2
0    jjfvnnlfivejj1                 1       11           51
1    6fourfour                      6       66           644
2    ninevbmltwo69                  69      69           9269
3    pcg91vqrfpxxzzzoneightzt       91      91           9118
4    jpprthxgjfive3one1qckhrptpqdc  31      31           5311
...  ...                            ...     ...          ...
995  583sevenhjxlqzjgbzxhkcl5       5835    55           58375
996  81s                            81      81           81
997  2four3threesxxvlfqfive4        234     24           243354
998  nine6eightsevenzx9twoxc        69      69           968792
999  hmbfjdfnp989mfivefiverpzrjs    989     99           98955

1000 rows × 4 columns
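
The effect of the lookahead is easiest to see on a single line. The sketch below, using only the standard library, compares a plain alternation with the lookahead version on zoneight234 from the puzzle example, and shows what would happen with \d+ inside the lookahead. The variable names are illustrative only.

```python
import re

words = "one two three four five six seven eight nine".split()
alternation = "|".join([r"\d"] + words)

line = "zoneight234"

# Plain alternation: 'one' consumes the 'e', so 'eight' is never seen
plain = re.findall(alternation, line)

# Lookahead: nothing is consumed, so overlapping words are both found
overlapping = re.findall(rf"(?=({alternation}))", line)

# \d+ inside a lookahead reports the tail of a digit run at every position
doubled = re.findall(r"(?=(\d+))", "234")

print(plain)        # ['one', '2', '3', '4']
print(overlapping)  # ['one', 'eight', '2', '3', '4']
print(doubled)      # ['234', '34', '4']
```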

Now, construct the calibration value as before…

txt['calibration2'] = txt.loc[:, 'digits2'].apply(lambda x: int(x[0] + x[-1]))
txt
     code                           digits  calibration  digits2  calibration2
0    jjfvnnlfivejj1                 1       11           51       51
1    6fourfour                      6       66           644      64
2    ninevbmltwo69                  69      69           9269     99
3    pcg91vqrfpxxzzzoneightzt       91      91           9118     98
4    jpprthxgjfive3one1qckhrptpqdc  31      31           5311     51
...  ...                            ...     ...          ...      ...
995  583sevenhjxlqzjgbzxhkcl5       5835    55           58375    55
996  81s                            81      81           81       81
997  2four3threesxxvlfqfive4        234     24           243354   24
998  nine6eightsevenzx9twoxc        69      69           968792   92
999  hmbfjdfnp989mfivefiverpzrjs    989     99           98955    95

1000 rows × 5 columns

… and get the correct sum!

txt.loc[:, 'calibration2'].sum()
55902
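
The Part 2 approach can also be checked against the seven example lines from the puzzle statement, which should sum to 281. This is a minimal sketch without pandas; the helper names to_digit and calibration_value are hypothetical, and the sketch assumes every line contains at least one digit or spelled-out digit.

```python
import re

WORDS = "one two three four five six seven eight nine".split()
# Lookahead so that overlapping words such as 'twone' yield both matches
PATTERN = re.compile(r"(?=(\d|" + "|".join(WORDS) + "))")


def to_digit(token: str) -> str:
    """Map 'one'..'nine' to '1'..'9'; leave numeric tokens unchanged."""
    return token if token.isdigit() else str(WORDS.index(token) + 1)


def calibration_value(line: str) -> int:
    # Assumption: the line has at least one match
    tokens = PATTERN.findall(line)
    return int(to_digit(tokens[0]) + to_digit(tokens[-1]))


example = [
    "two1nine", "eightwothree", "abcone2threexyz", "xtwone3four",
    "4nineeightseven2", "zoneight234", "7pqrstsixteen",
]
values = [calibration_value(line) for line in example]
total = sum(values)

print(values)  # [29, 83, 13, 24, 42, 14, 76]
print(total)   # 281
```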