I try to solve this year’s Advent of Code 2024 riddles in R. This is Day 1 - see https:adventofcode.com/2024/day/1
Part 1: Locations #
Lets first read the task:
Throughout the Chief’s office, the historically significant locations are listed not by name but by a unique number called the location ID. To make sure they don’t miss anything, The Historians split into two groups, each searching the office and trying to create their own complete list of location IDs.
There’s just one problem: by holding the two lists up side by side (your puzzle input), it quickly becomes clear that the lists aren’t very similar. Maybe you can help The Historians reconcile their lists?
For example:
3 4
4 3
2 5
1 3
3 9
3 3
Maybe the lists are only off by a small amount! To find out, pair up the numbers and measure how far apart they are. Pair up the smallest number in the left list with the smallest number in the right list, then the second-smallest left number with the second-smallest right number, and so on.
Within each pair, figure out how far apart the two numbers are; you’ll need to add up all of those distances. For example, if you pair up a
3
from the left list with a7
from the right list, the distance apart is4
; if you pair up a9
with a3
, the distance apart is6
.In the example list above, the pairs and distances would be as follows:
- The smallest number in the left list is
1
, and the smallest number in the right list is3
. The distance between them is2
.- The second-smallest number in the left list is
2
, and the second-smallest number in the right list is another3
. The distance between them is1
.- The third-smallest number in both lists is
3
, so the distance between them is0
.- The next numbers to pair up are
3
and4
, a distance of1
.- The fifth-smallest numbers in each list are
3
and5
, a distance of2
.- Finally, the largest number in the left list is
4
, while the largest number in the right list is9
; these are a distance5
apart.To find the total distance between the left list and the right list, add up the distances between all of the pairs you found. In the example above, this is
2 + 1 + 0 + 1 + 2 + 5
, a total distance of11
!Your actual left and right lists contain many location IDs. What is the total distance between your lists?
My input file: 2024-12-01-1-aoc.txt
First, we read the data - the fread
command from the data.table
package is
more versatile than read.delim
from base
and directly reads the data as a
data.table
, which has
some benefits.
data <- data.table::fread("2024-12-01-1-aoc.txt", header = FALSE)
head(data)
V1 | V2 |
---|---|
<int> | <int> |
41226 | 69190 |
89318 | 10100 |
59419 | 23880 |
63157 | 20193 |
81510 | 22869 |
83942 | 63304 |
Now, the idea is to get the ordered versions of V1
and V2
.
v1 <- data[order(V1)]$V1
v2 <- data[order(V2)]$V2
head(v1)
- 10188
- 10314
- 10319
- 10348
- 10408
- 10668
Coming from Python, this syntax looks a bit odd. The documentation of
data.table::setorder
helps:
setorder (and setorderv) reorders the rows of a data.table based on the columns (and column
order) provided. It reorders the table by reference and is therefore very memory efficient.
Note that queries like x[order(.)] are optimised internally to use data.table's fast order.
So, data[order(V1)]
is actually short for data.table::setorder(data, V1)
.
Then, we extract the vector by name using the $
operator, which allows to
extract elements by name.
head(data.table::setorder(data, V1)$V1)
- 10188
- 10314
- 10319
- 10348
- 10408
- 10668
The actual computation is just the sum of the absolute difference:
sum(abs(v1 - v2))
3574690
Part 2: Similarities #
Lets first read the task:
This time, you’ll need to figure out exactly how often each number from the left list appears in the right list. Calculate a total similarity score by adding up each number in the left list after multiplying it by the number of times that number appears in the right list.
Here are the same example lists again:
3 4
4 3
2 5
1 3
3 9
3 3
For these example lists, here is the process of finding the similarity score:
- The first number in the left list is
3
. It appears in the right list three times, so the similarity score increases by3 * 3 = 9
.- The second number in the left list is
4
. It appears in the right list once, so the similarity score increases by4 * 1 = 4
.- The third number in the left list is
2
. It does not appear in the right list, so the similarity score does not increase (2 * 0 = 0
).- The fourth number,
1
, also does not appear in the right list.- The fifth number,
3
, appears in the right list three times; the similarity score increases by9
.- The last number,
3
, appears in the right list three times; the similarity score again increases by9
.So, for these example lists, the similarity score at the end of this process is
31
(9 + 4 + 0 + 0 + 9 + 9
).Once again consider your left and right lists. What is their similarity score?
My idea here is simple - we first count the occurrences of V2
to be able to
check V1
against them. Here, applying the table
command works like
pandas.value_counts()
and achieves this. We convert the output to a
data.frame
and assign the V1
values as row names. Note: table
converts the
values which are counted to string, e.g. "10019"
instead of 10019
.
tab <- table(v2)
v2series <- data.frame(c(tab), row.names = names(tab))
head(v2series)
v2series["10019", ]
c.tab. | |
---|---|
<int> | |
10019 | 1 |
10100 | 1 |
10206 | 1 |
10428 | 1 |
10645 | 1 |
10972 | 1 |
1
Now, we loop through V1
and try to access the frequency in V2
(by converting
to string first, via as.character
). If nothing is found in V2
, an NA
value
is returned. For everything that is not NA, we compute the similarity score,
append it to a list, and sum in the end. Note: a list in R can not directly be
summed (I don’t know why that is) - so we have to unlist
first.
sim <- list()
for (num in v1) {
myfreq <- v2series[as.character(num), ]
if (!is.na(myfreq)) {
score <- myfreq * num
sim <- append(sim, score)
}
}
sum(unlist(sim))
22565391