This year I try to record my attempt at solving the Advent of Code 2023 riddles. This is Day 5 - see https:adventofcode.com/2023/day/5
Update
:- substitute
get_generator()
(own implementation) withrange()
(Python inbuilt) - improve grid search so that it goes through all location ranges, still starting with the lowest range
Part 1 #
Lets first read the task:
The almanac (your puzzle input) lists all of the seeds that need to be planted. It also lists what type of soil to use with each kind of seed, what type of fertilizer to use with each kind of soil, what type of water to use with each kind of fertilizer, and so on. Every type of seed, soil, fertilizer and so on is identified with a number, but numbers are reused by each category - that is, soil
123
and fertilizer123
aren’t necessarily related to each other.For example:
seeds: 79 14 55 13
seed-to-soil map:
50 98 2
52 50 48
soil-to-fertilizer map:
0 15 37
37 52 2
39 0 15
fertilizer-to-water map:
49 53 8
0 11 42
42 0 7
57 7 4
water-to-light map:
88 18 7
18 25 70
light-to-temperature map:
45 77 23
81 45 19
68 64 13
temperature-to-humidity map:
0 69 1
1 0 69
humidity-to-location map:
60 56 37
56 93 4
The almanac starts by listing which seeds need to be planted: seeds
79
,14
,55
, and13
.The rest of the almanac contains a list of maps which describe how to convert numbers from a source category into numbers in a destination category. That is, the section that starts with
seed-to-soil map
: describes how to convert a seed number (the source) to a soil number (the destination). This lets the gardener and his team know which soil to use with which seeds, which water to use with which fertilizer, and so on.Rather than list every source number and its corresponding destination number one by one, the maps describe entire ranges of numbers that can be converted. Each line within a map contains three numbers: the destination range start, the source range start, and the range length.
Consider again the example
seed-to-soil map
:
50 98 2 52 50 48
The first line has a destination range start of
50
, a source range start of98
, and a range length of2
. This line means that the source range starts at98
and contains two values:98
and99
. The destination range is the same length, but it starts at50
, so its two values are50
and51
. With this information, you know that seed number98
corresponds to soil number50
and that seed number99
corresponds to soil number51
. o The second line means that the source range starts at50
and contains48
values:50
,51
, …,96
,97
. This corresponds to a destination range starting at52
and also containing48
values:52
,53
, …,98
,99
. So, seed number53
corresponds to soil number55
.Any source numbers that aren’t mapped correspond to the same destination number. So, seed number
10
corresponds to soil number10
.So, the entire list of seed numbers and their corresponding soil numbers looks like this:
seed soil
0 0
1 1
... ...
48 48
49 49
50 52
51 53
... ...
96 98
97 99
98 50
99 51
With this map, you can look up the soil number required for each initial seed number:
- Seed number
79
corresponds to soil number81
.- Seed number
14
corresponds to soil number14
.- Seed number
55
corresponds to soil number57
.- Seed number
13
corresponds to soil number13
.The gardener and his team want to get started as soon as possible, so they’d like to know the closest location that needs a seed. Using these maps, find the lowest location number that corresponds to any of the initial seeds. To do this, you’ll need to convert each seed number through other categories until you can find its corresponding location number. In this example, the corresponding types are:
- Seed
79
, soil81
, fertilizer81
, water81
, light74
, temperature78
, humidity78
, location82
.- Seed
14
, soil14
, fertilizer53
, water49
, light42
, temperature42
, humidity43
, location43
.- Seed
55
, soil57
, fertilizer57
, water53
, light46
, temperature82
, humidity82
, location86
.- Seed
13
, soil13
, fertilizer52
, water41
, light34
, temperature34
, humidity35
, location35
.So, the lowest location number in this example is
35
.What is the lowest location number that corresponds to any of the initial seed numbers?
My input file: 2023-12-05-1-aoc.txt
Wow, this task is a mouthful…
Let’s start slowly and load the data. Our input text document contains several
maps, which are clearly separated and have a title (seed-to-soil map
etc). So
we can tell pandas where each map starts and give each map a dataframe. I got
the values for the skiprows
and nrows
argument by looking at the input file
and… counting
:)
import pandas as pd
import sys
seeds = pd.read_table('data/2023-12-05-1-aoc.txt', nrows=1, sep=' ',
header=None, index_col=0)
seeds = seeds.values.flatten()
map_opt = {'filepath_or_buffer': 'data/2023-12-05-1-aoc.txt',
'header': None, 'sep': ' ', 'dtype': 'Int64',
'names': ['dest_start', 'src_start', 'range']}
seed_soil = pd.read_table(skiprows=3, nrows=23, **map_opt)
soil_fert = pd.read_table(skiprows=28, nrows=9, **map_opt)
fert_water = pd.read_table(skiprows=39, nrows=20, **map_opt)
water_light = pd.read_table(skiprows=61, nrows=40, **map_opt)
light_temp = pd.read_table(skiprows=103, nrows=36, **map_opt)
temp_humi = pd.read_table(skiprows=141, nrows=35, **map_opt)
humi_loc = pd.read_table(skiprows=178, nrows=26, **map_opt)
maps = (seed_soil, soil_fert, fert_water, water_light, light_temp,
temp_humi, humi_loc)
print('seeds are just a numpy array:')
display(seeds)
print('The "humidity-to-location" map as an example:')
humi_loc
seeds are just a numpy array:
array([3169137700, 271717609, 3522125441, 23376095, 1233948799,
811833837, 280549587, 703867355, 166086528, 44766996,
2326968141, 69162222, 2698492851, 14603069, 2755327667,
348999531, 2600461189, 92332846, 1054656969, 169099767])The "humidity-to-location" map as an example:
dest_start | src_start | range | |
---|---|---|---|
0 | 3490144003 | 1623866227 | 218040905 |
1 | 1709610578 | 1620839197 | 3027030 |
2 | 105449249 | 586389428 | 113279526 |
3 | 1899604338 | 2167886292 | 348178199 |
4 | 1712637608 | 2678624215 | 186966730 |
5 | 218728775 | 0 | 245776251 |
6 | 2472992580 | 923734334 | 143670388 |
7 | 2616662968 | 3670169885 | 15297294 |
8 | 2247782537 | 1395629154 | 225210043 |
9 | 0 | 480940179 | 105449249 |
10 | 4113852846 | 3959729057 | 159909096 |
11 | 3322784653 | 1067404722 | 167359350 |
12 | 923734334 | 4119638153 | 175329143 |
13 | 1534964496 | 2516064491 | 162559724 |
14 | 2631960262 | 3496028440 | 140849733 |
15 | 2862695906 | 1234764072 | 97988355 |
16 | 1697524220 | 3636878173 | 12086358 |
17 | 2985646048 | 1332752427 | 62876727 |
18 | 1099063477 | 2970241510 | 435901019 |
19 | 4009202281 | 2865590945 | 104650565 |
20 | 2960684261 | 2142924505 | 24961787 |
21 | 2772809995 | 3406142529 | 89885911 |
22 | 4273761942 | 3648964531 | 21205354 |
23 | 3708184908 | 1841907132 | 301017373 |
24 | 464505026 | 245776251 | 235163928 |
25 | 3048522775 | 3685467179 | 274261878 |
My first attempt was to actually construct ranges like in the example above,
mapping out all possible sources and destinations. Python quickly informed me
that even constructing one pandas.Series
with int64
values mapping seeds to
soil would cost 64GB memory - not the best solution.
So we take a different approach. For convenience, let’s add a src_end
and a
dest_end
column to our maps:
for df in maps:
df['src_end'] = df.loc[:, 'src_start'] + df.loc[:, 'range']
df['dest_end'] = df.loc[:, 'dest_start'] + df.loc[:, 'range']
print('Again the "humidity-to-location" map as an example:')
humi_loc
Again the "humidity-to-location" map as an example:
dest_start | src_start | range | src_end | dest_end | |
---|---|---|---|---|---|
0 | 3490144003 | 1623866227 | 218040905 | 1841907132 | 3708184908 |
1 | 1709610578 | 1620839197 | 3027030 | 1623866227 | 1712637608 |
2 | 105449249 | 586389428 | 113279526 | 699668954 | 218728775 |
3 | 1899604338 | 2167886292 | 348178199 | 2516064491 | 2247782537 |
4 | 1712637608 | 2678624215 | 186966730 | 2865590945 | 1899604338 |
5 | 218728775 | 0 | 245776251 | 245776251 | 464505026 |
6 | 2472992580 | 923734334 | 143670388 | 1067404722 | 2616662968 |
7 | 2616662968 | 3670169885 | 15297294 | 3685467179 | 2631960262 |
8 | 2247782537 | 1395629154 | 225210043 | 1620839197 | 2472992580 |
9 | 0 | 480940179 | 105449249 | 586389428 | 105449249 |
10 | 4113852846 | 3959729057 | 159909096 | 4119638153 | 4273761942 |
11 | 3322784653 | 1067404722 | 167359350 | 1234764072 | 3490144003 |
12 | 923734334 | 4119638153 | 175329143 | 4294967296 | 1099063477 |
13 | 1534964496 | 2516064491 | 162559724 | 2678624215 | 1697524220 |
14 | 2631960262 | 3496028440 | 140849733 | 3636878173 | 2772809995 |
15 | 2862695906 | 1234764072 | 97988355 | 1332752427 | 2960684261 |
16 | 1697524220 | 3636878173 | 12086358 | 3648964531 | 1709610578 |
17 | 2985646048 | 1332752427 | 62876727 | 1395629154 | 3048522775 |
18 | 1099063477 | 2970241510 | 435901019 | 3406142529 | 1534964496 |
19 | 4009202281 | 2865590945 | 104650565 | 2970241510 | 4113852846 |
20 | 2960684261 | 2142924505 | 24961787 | 2167886292 | 2985646048 |
21 | 2772809995 | 3406142529 | 89885911 | 3496028440 | 2862695906 |
22 | 4273761942 | 3648964531 | 21205354 | 3670169885 | 4294967296 |
23 | 3708184908 | 1841907132 | 301017373 | 2142924505 | 4009202281 |
24 | 464505026 | 245776251 | 235163928 | 480940179 | 699668954 |
25 | 3048522775 | 3685467179 | 274261878 | 3959729057 | 3322784653 |
Now we actually compute the mapping. For each seed, we go through all mappings and in each mapping we go through each row. We find the row which contains the mapping and exctract the destination, which is the source for the next map until we reach the last map which gives us the locations.
-
Approach 1:
df.itertuples()
is a convenient way to step through apandas.DataFrame
in this example. It is faster thandf.iterrows()
and returns the row as aNamedTuple
- nice! -
Approach 2: I actually wondered if it would be faster to get all maps in one
pd.DataFrame
and then iterate through the mappings. To test this let’s construct a new DataFramemaps_df
which contains allmaps
. Since the maps have different lengths it is important to cast the datatype toInt64
, which is short forpd.Int64Dtype()
and keeps values as integers, even ifNA
values are in the same column. -
Approach 3: A third alternative I tested (not shown here) was to check if a value is in a Python
range
with thein
operator as in:if 3 in range(5):...
. This was way too slow.
# mapping version 1
def get_location(seed):
current = seed
for df in maps:
current_map = [row
for row in df.itertuples()
if ((current > row.src_start)
and (current < row.src_end))]
if len(current_map) == 0:
pass
elif len(current_map) == 1:
current = (current_map[0].dest_start
+ (current - current_map[0].src_start))
else:
raise ValueError('This should not happen!')
return current
# mapping version 2 - around 10 times slower
# you need to rename the maps_df columns so that they have a unique id
# e.g. 'src_start_1', 'src_start_2' etc
# maps_df = pd.concat(maps, axis='columns')
# def get_dest(i, src):
# return (maps_df[(src > maps_df.loc[:, f'src_start_{i}']) &
# (src < maps_df.loc[:, f'src_end_{i}'])]
# .loc[:, f'dest_start_{i}']
# .iloc[0])
# def get_location2(seed):
# dest = seed
# i = 1
# while i < 7:
# dest = get_dest(i, dest)
# i += 1
# return dest
%timeit get_location(seeds[0])
1.98 ms ± 72 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Now let’s get a list of locations:
locations = [get_location(s) for s in seeds]
locations
2493982655 | 3209845376 | 3992357533 | 4163131463 | 4104485616 | 1952252479 | 3218677354 | 388071289 | 2181441450 | 2594336315 | 4049507670 | 2084517144 | 3119633635 | 428978312 | 3518771991 | 3704555655 | 953918455 | 2107687768 | 3448046330 | 2184454689 |
Lastly, just get the minimum of all location values.
min(locations)
388071289
Part 2 #
Let’s read the task of part 2!
Everyone will starve if you only plant such a small number of seeds. Re-reading the almanac, it looks like the
seeds:
line actually describes ranges of seed numbers.The values on the initial
seeds:
line come in pairs. Within each pair, the first value is the start of the range and the second value is the length of the range. So, in the first line of the example above:
seeds: 79 14 55 13
This line describes two ranges of seed numbers to be planted in the garden. The first range starts with seed number
79
and contains14
values:79
,80
, …,91
,92
. The second range starts with seed number55
and contains13
values:55
,56
, …,66
,67
.Now, rather than considering four seed numbers, you need to consider a total of
27
seed numbers.In the above example, the lowest location number can be obtained from seed number
82
, which corresponds to soil84
, fertilizer84
, water84
, light77
, temperature45
, humidity46
, and location46
. So, the lowest location number is46
.Consider all of the initial seed numbers listed in the ranges on the first line of the almanac. What is the lowest location number that corresponds to any of the initial seed numbers?
Let’s first construct a dataframe containing the range of seeds:
seeds_df = pd.DataFrame({'start': seeds[::2],
'range': seeds[1::2],
'end': seeds[::2] + seeds[1::2]})
print(f'There are {sum(seeds_df.loc[:, "range"]):_} seeds in total')
seeds_df
There are 2_549_759_327 seeds in total
start | range | end | |
---|---|---|---|
0 | 3169137700 | 271717609 | 3440855309 |
1 | 3522125441 | 23376095 | 3545501536 |
2 | 1233948799 | 811833837 | 2045782636 |
3 | 280549587 | 703867355 | 984416942 |
4 | 166086528 | 44766996 | 210853524 |
5 | 2326968141 | 69162222 | 2396130363 |
6 | 2698492851 | 14603069 | 2713095920 |
7 | 2755327667 | 348999531 | 3104327198 |
8 | 2600461189 | 92332846 | 2692794035 |
9 | 1054656969 | 169099767 | 1223756736 |
Now - I really needed some time to finally realize, that going through all seed values is really unfeasable. So how do we deal with this problem?
In the end we need the lowest location number - thus our approach is to take
the humi_loc
map, start with the lowest location number and go up and get the
corresponding seed values. The location of the first seed value which is inside
seeds_df
is our solution.
So first we rebuild the get_location
function to a get_seed
function (we
reverse the maps
tuple with maps[::-1]
and switch src
and dest
).
def get_seed(location):
current = location
for df in maps[::-1]:
current_map = [row
for row in df.itertuples()
if ((current >= row.dest_start)
and (current < row.dest_end))]
if len(current_map) == 0:
pass
elif len(current_map) == 1:
current = (current_map[0].src_start
+ (current - current_map[0].dest_start))
else:
raise ValueError('This should not happen!')
return current
Since the Python 3 range
function does not store it’s contents in memory
(similar to a generator), it is well suited to go through these large ranges.
Lastly, we deal with the sheer amount of possible values by performing a grid
search. First, we check every millionth location. After the first match, we
stop this search and check the last million locations before the match with a
finer grid and so on. The last grid is just 1
, so we find our lowest location.
Update: We order our locations from smallest to largest with sort_values
and
go through them - but the search stops after the first match, since that will be
our lowest location.
def grid_search(start: int, end: int, grid: list[int]):
success = False
for g in grid:
print(f'Start search with grid={g}')
for l in range(start, end, g):
current = get_seed(l)
if any((current >= seeds_df.loc[:, 'start']) & (current < seeds_df.loc[:, 'end'])):
print(f'location {l} is the lowest which contains one of the given seeds ({current})')
start = l - g
end = l
success = True
break
return success
for row in humi_loc.sort_values('dest_start').itertuples():
success = grid_search(start=row.dest_start, end=row.dest_end,
grid=[1_000_000, 100_000, 1000, 1])
if success:
print('Finished search')
break
Start search with grid=1000000
location 85000000 is the lowest which contains one of the given seeds (2605777210)
Start search with grid=100000
location 84300000 is the lowest which contains one of the given seeds (2605077210)
Start search with grid=1000
location 84207000 is the lowest which contains one of the given seeds (2604984210)
Start search with grid=1
location 84206669 is the lowest which contains one of the given seeds (2604983879)
Finished search