Advent of Code 2020 | Days 4 to 6
This post continues the Advent Of Code 2020 series an Advent calendar of small programming puzzles for a variety of skill sets and skill levels that can be solved in any programming language you like. These are my six solutions to the ‘Advent of Code 2020’ puzzles, from day 4 to day 6, using R.
Day 4: Passport Processing
Part One
You arrive at the airport only to realize that you grabbed your North Pole Credentials instead of your passport. While these documents are extremely similar, North Pole Credentials aren’t issued by a country and therefore aren’t actually valid documentation for travel in most of the world.
It seems like you’re not the only one having problems, though; a very long line has formed for the automatic passport scanners, and the delay could upset your travel itinerary.
Due to some questionable network security, you realize you might be able to solve both of these problems at the same time.
The automatic passport scanners are slow because they’re having trouble detecting which passports have all required fields. The expected fields are as follows:
byr (Birth Year)
iyr (Issue Year)
eyr (Expiration Year)
hgt (Height)
hcl (Hair Color)
ecl (Eye Color)
pid (Passport ID)
cid (Country ID)
Passport data is validated in batch files (your puzzle input). Each passport is represented as a sequence of key:value pairs separated by spaces or newlines. Passports are separated by blank lines.
Here is an example batch file containing four passports:
ecl:gry pid:860033327 eyr:2020 hcl:#fffffd
byr:1937 iyr:2017 cid:147 hgt:183cm
iyr:2013 ecl:amb cid:350 eyr:2023 pid:028048884
hcl:#cfa07d byr:1929
hcl:#ae17e1 iyr:2013
eyr:2024
ecl:brn pid:760753108 byr:1931
hgt:179cm
hcl:#cfa07d eyr:2025 pid:166559648
iyr:2011 ecl:brn hgt:59in
The first passport is valid - all eight fields are present. The second passport is invalid - it is missing hgt (the Height field).
The third passport is interesting; the only missing field is cid
, so it looks like data from North Pole Credentials, not a passport at all! Surely, nobody would mind if you made the system temporarily ignore missing cid
fields. Treat this “passport” as valid.
The fourth passport is missing two fields, cid
and byr
. Missing cid
is fine, but missing any other field is not, so this passport is invalid.
According to the above rules, your improved system would report 2 valid passports.
Count the number of valid passports - those that have all required fields. Treat cid
as optional. In your batch file, how many passports are valid?
Solution
In R, the solution is straightforward, we process the text, transforming into a list of passport, each element has the matrix of field names and values. In this first part we just count with we have enough fields in the passport register.
library(tidyverse)
library(knitr)
library(kableExtra)
# read data as text (vector of string by lines)
input <- read_lines("data/day04_input.txt")
# change a empty line to "|" (trick to split later)
input[input==""] <- "|"
# collapses the data and splits the data
# into a list of passports
passports <- input %>%
# collapses into one string
str_c(collapse = " ") %>%
# splits each passport data
str_split("\\|") %>%
unlist() %>%
# for each passport splits the fields
str_split(" ") %>%
# pre-process the fields into a matrix (field, value)
map(function(line){
# clean empty values
line[line!=""] %>%
str_split(":", simplify = T) %>%
return()
})
# function to check if the passport register has enough fields
is.valid.passdata <- function(pass.data){
fields <- pass.data[,1]
# has all fields
if (length(fields) == 8) return(T)
# has all fields but cid
if (length(fields)==7 & length(fields[fields=="cid"])==0 ) return(T)
return(F)
}
# checks all passport data and counts the valid ones
passports %>%
map_lgl(is.valid.passdata) %>%
sum()
## [1] 254
Part two
The line is moving more quickly now, but you overhear airport security talking about how passports with invalid data are getting through. Better add some data validation, quick!
You can continue to ignore the cid field, but each other field has strict rules about what values are valid for automatic validation:
byr (Birth Year) - four digits; at least 1920 and at most 2002.
iyr (Issue Year) - four digits; at least 2010 and at most 2020.
eyr (Expiration Year) - four digits; at least 2020 and at most 2030.
hgt (Height) - a number followed by either cm or in:
- If cm, the number must be at least 150 and at most 193.
- If in, the number must be at least 59 and at most 76.
hcl (Hair Color) - a # followed by exactly six characters 0-9 or a-f.
ecl (Eye Color) - exactly one of: amb blu brn gry grn hzl oth.
pid (Passport ID) - a nine-digit number, including leading zeroes.
cid (Country ID) - ignored, missing or not.
Your job is to count the passports where all required fields are both present and valid according to the above rules. Here are some example values:
byr valid: 2002
byr invalid: 2003
hgt valid: 60in
hgt valid: 190cm
hgt invalid: 190in
hgt invalid: 190
hcl valid: #123abc
hcl invalid: #123abz
hcl invalid: 123abc
ecl valid: brn
ecl invalid: wat
pid valid: 000000001
pid invalid: 0123456789
Here are some invalid passports:
eyr:1972 cid:100
hcl:#18171d ecl:amb hgt:170 pid:186cm iyr:2018 byr:1926
iyr:2019
hcl:#602927 eyr:1967 hgt:170cm
ecl:grn pid:012533040 byr:1946
hcl:dab227 iyr:2012
ecl:brn hgt:182cm pid:021572410 eyr:2020 byr:1992 cid:277
hgt:59cm ecl:zzz
eyr:2038 hcl:74454a iyr:2023
pid:3556412378 byr:2007
Here are some valid passports:
pid:087499704 hgt:74in ecl:grn iyr:2012 eyr:2030 byr:1980
hcl:#623a2f
eyr:2029 ecl:blu cid:129 byr:1989
iyr:2014 pid:896056539 hcl:#a97842 hgt:165cm
hcl:#888785
hgt:164cm byr:2001 iyr:2015 cid:88
pid:545766238 ecl:hzl
eyr:2022
iyr:2010 hgt:158cm hcl:#b6652a ecl:blu byr:1944 eyr:2021 pid:093154719
Count the number of valid passports - those that have all required fields and valid values. Continue to treat cid as optional. In your batch file, how many passports are valid?
Solution
Same approach as part one, but here we’ll have to verify each passport field, in this time we transform the input text into a full data frame (tibble) with a passport each row and its fields in column (tidy data). After that we verify the validation of each field (at same time in the whole data.frame).
# reads data as text (vector of string by lines)
input <- read_lines("data/day04_input.txt")
# change a empty line to "|" (trick to split later)
input[input==""] <- "|"
# collapse the string and splits the data
# into a list of fields x value
passports <- input %>%
# collapses into one string
str_c(collapse = " ") %>%
# splits each passport data
str_split("\\|") %>%
unlist() %>%
# for each passport splits the fields
str_split(" ") %>%
# pre-process the fields into a tibble
map(function(line){
# remove empty strings and split into "name":"value pair matrix
pd <- str_split(line[line!=""], ":", simplify = T)
# return a list of fields
pd[,2] %>%
# build an tibble with the fields and values of the passport
split(1:length(.)) %>%
set_names(pd[,1]) %>%
as_tibble() %>%
return()
})
# puts the data into a tibble
pass.check <- tibble(
pdata = passports
) %>%
# unnests
unnest(pdata)
# let's see what we got
head(pass.check) %>%
kable() %>%
kable_styling(font_size=10)
iyr | ecl | hgt | pid | byr | hcl | eyr | cid |
---|---|---|---|---|---|---|---|
2010 | gry | 181cm | 591597745 | 1920 | #6b5442 | 2029 | 123 |
2016 | amb | 177cm | 404183620 | 1927 | #602927 | 2020 | 223 |
2014 | hzl | 166cm | 594143498 | 1998 | #a97842 | 2030 | 178 |
2018 | hzl | 157cm | 795349208 | NA | #de745c | 2024 | NA |
2018 | hzl | 159cm | 364060467 | 1978 | #18171d | 2025 | 117 |
2012 | amb | 182cm | 374679609 | 1925 | #cfa07d | 2020 | 338 |
# validates each field rule
pass.check <- pass.check %>%
# byr (Birth Year) - four digits; at least 1920 and at most 2002.
mutate( byr = as.integer(byr) ) %>%
mutate( chk.byr = (!is.na(byr) & byr>=1920 & byr<=2002) ) %>%
# iyr (Issue Year) - four digits; at least 2010 and at most 2020.
mutate( iyr = as.integer(iyr) ) %>%
mutate( chk.iyr = (!is.na(iyr) & iyr>=2010 & iyr<=2020) ) %>%
# eyr (Expiration Year) - four digits; at least 2020 and at most 2030.
mutate( eyr = as.integer(eyr) ) %>%
mutate( chk.eyr = (!is.na(eyr) & eyr>=2020 & eyr<=2030)) %>%
# hgt (Height) - a number followed by either cm or in:
mutate( hgt.fmt.chk = (!is.na(hgt) & str_detect(hgt,"[0-9]+(in|cm)")) ) %>%
# separates value of unit to check the ranges
mutate( hgt.val = as.integer(str_extract(hgt,"[0-9]+")),
hgt.unit = str_extract(hgt,"(in|cm)")) %>%
mutate( chk.hgt = case_when(
# invalid format or empty field
!hgt.fmt.chk ~ F,
# - If cm, the number must be at least 150 and at most 193.
hgt.unit == "cm" ~ (hgt.val>=150 & hgt.val<=193),
# - If in, the number must be at least 59 and at most 76.
hgt.unit == "in" ~ (hgt.val>=59 & hgt.val<=76),
T ~ F # otherwise?
)) %>%
# hcl (Hair Color) - a # followed by exactly six characters 0-9 or a-f.
mutate( chk.hcl = (!is.na(hcl) & str_detect(hcl, "#[0-9a-f]{6}") & nchar(hcl)==7) ) %>%
# ecl (Eye Color) - exactly one of: amb blu brn gry grn hzl oth.
mutate( chk.ecl = (!is.na(ecl) & ecl %in% c("amb","blu","brn","gry","grn","hzl","oth"))) %>%
# pid (Passport ID) - a nine-digit number, including leading zeroes.
mutate( chk.pid = (!is.na(pid) & str_detect(pid,"[0-9]{9}") * nchar(pid)==9) ) %>%
# cid (Country ID) - ignored, missing or not.
# check all policies
mutate( is.valid = (chk.byr & chk.iyr & chk.eyr & chk.hgt & chk.hcl & chk.ecl & chk.pid))
# let's see what we got
head(pass.check) %>%
kable() %>%
kable_styling(font_size=8)
iyr | ecl | hgt | pid | byr | hcl | eyr | cid | chk.byr | chk.iyr | chk.eyr | hgt.fmt.chk | hgt.val | hgt.unit | chk.hgt | chk.hcl | chk.ecl | chk.pid | is.valid |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2010 | gry | 181cm | 591597745 | 1920 | #6b5442 | 2029 | 123 | TRUE | TRUE | TRUE | TRUE | 181 | cm | TRUE | TRUE | TRUE | TRUE | TRUE |
2016 | amb | 177cm | 404183620 | 1927 | #602927 | 2020 | 223 | TRUE | TRUE | TRUE | TRUE | 177 | cm | TRUE | TRUE | TRUE | TRUE | TRUE |
2014 | hzl | 166cm | 594143498 | 1998 | #a97842 | 2030 | 178 | TRUE | TRUE | TRUE | TRUE | 166 | cm | TRUE | TRUE | TRUE | TRUE | TRUE |
2018 | hzl | 157cm | 795349208 | NA | #de745c | 2024 | NA | FALSE | TRUE | TRUE | TRUE | 157 | cm | TRUE | TRUE | TRUE | TRUE | FALSE |
2018 | hzl | 159cm | 364060467 | 1978 | #18171d | 2025 | 117 | TRUE | TRUE | TRUE | TRUE | 159 | cm | TRUE | TRUE | TRUE | TRUE | TRUE |
2012 | amb | 182cm | 374679609 | 1925 | #cfa07d | 2020 | 338 | TRUE | TRUE | TRUE | TRUE | 182 | cm | TRUE | TRUE | TRUE | TRUE | TRUE |
# how many is valid?
sum(1*pass.check$is.valid)
## [1] 184
Day 5: Binary Boarding
Part 1
You board your plane only to discover a new problem: you dropped your boarding pass! You aren’t sure which seat is yours, and all of the flight attendants are busy with the flood of people that suddenly made it through passport control.
You write a quick program to use your phone’s camera to scan all of the nearby boarding passes (your puzzle input); perhaps you can find your seat through process of elimination.
Instead of zones or groups, this airline uses binary space partitioning to seat people. A seat might be specified like FBFBBFFRLR
, where F
means “front
”, B
means “back
”, L
means “left
”, and R
means “right
”.
The first 7
characters will either be F
or B
; these specify exactly one of the 128 rows on the plane
(numbered 0 through 127). Each letter tells you which half of a region the given seat is in
. Start with the whole list of rows; the first letter indicates whether the seat is in the front (0 through 63) or the back (64 through 127). The next letter indicates which half of that region the seat is in, and so on until you’re left with exactly one row.
For example, consider just the first seven characters of FBFBBFFRLR
:
Start by considering the whole range, rows 0 through 127.
F means to take the lower half, keeping rows 0 through 63.
B means to take the upper half, keeping rows 32 through 63.
F means to take the lower half, keeping rows 32 through 47.
B means to take the upper half, keeping rows 40 through 47.
B keeps rows 44 through 47.
F keeps rows 44 through 45.
The final F keeps the lower of the two, row 44.
The last three
characters will be either L
or R
; these specify exactly one of the 8 columns of seats
on the plane (numbered 0 through 7). The same process as above proceeds again, this time with only three steps. L
means to keep the lower half
, while R
means to keep the upper half
.
For example, consider just the last 3 characters of FBFBBFFRLR
:
Start by considering the whole range, columns 0 through 7.
R means to take the upper half, keeping columns 4 through 7.
L means to take the lower half, keeping columns 4 through 5.
The final R keeps the upper of the two, column 5.
So, decoding FBFBBFFRLR reveals that it is the seat at row 44, column 5.
Every seat also has a unique seat ID
: multiply the row by 8, then add the column. In this example, the seat has ID 44 * 8 + 5 = 357
.
Here are some other boarding passes:
BFFFBBFRRR: row 70, column 7, seat ID 567.
FFFBBBFRRR: row 14, column 7, seat ID 119.
BBFFBBFRLL: row 102, column 4, seat ID 820.
As a sanity check, look through your list of boarding passes. What is the highest seat ID on a boarding pass?
Solution
The solution consist in interpret the seat code as a series of step-instruction (B, F, L, R…) to choose parts of array, representing the rows and columns in the airplane. The code bellow is already factorized, so I programmed a function that receives the sequence of steps (for columns or rows) and a full map of positions (columns or rows) and iterate in the steps to find the exact position.
# function to find the row (or the column) in the code instruction
# steps - array of instruction (BFLR...)
# pos.map - array to be used in the search
# lower.code - with char in the step must be interpreted as "get the lower half" of the map
findInMap <- function(steps, pos.map, lower.code){
# for each char in the steps sequence
for (step in steps){
# we must get the lower or higher half of the map
if(step==lower.code){
index <- 1:(length(pos.map)/2)
} else {
index <- ((length(pos.map)/2)+1):length(pos.map)
}
# selects the correct half
pos.map <- pos.map[index]
}
# returns the value found
return(pos.map)
}
# auxiliary funcion to transform a seat CODE into a seat ID
calcSeatId <- function(code){
# transforms the string into a char vector
seat.code <- unlist(strsplit(code,""))
# the first 7 chars identifies the row
seat.row.code <- seat.code[1:7]
# the last 3 chars identifies the column
seat.col.code <- seat.code[8:10]
# finds the right row
rnum <- findInMap(seat.row.code, 0:127, "F")
# finds the right col
cnum <- findInMap(seat.col.code, 0:7, "L")
# calculates the id
return(rnum*8+cnum)
}
# reads the input data
input <- readr::read_lines("data/day05_input.txt")
# apply the calcSeatId in the codes
ids <- sapply(input, calcSeatId)
# get the highest seat ID on a boarding pass...
max(ids)
## [1] 880
Part Two
Ding! The “fasten seat belt” signs have turned on. Time to find your seat.
It’s a completely full flight, so your seat should be the only missing boarding pass in your list. However, there’s a catch: some of the seats at the very front and back of the plane don’t exist on this aircraft, so they’ll be missing from your list as well.
Your seat wasn’t at the very front or back, though; the seats with IDs +1 and -1 from yours will be in your list.
What is the ID of your seat?
Solution
It’s a simple problem, we just verify which Seat ID is missing in the boarding pass sequence, the fligh is full so, the missing one is our number.
# what are the initial and final seat code numbers?
min.seat <- min(ids)
max.seat <- max(ids)
# generate the whole sequence between
full.seats <- min.seat:max.seat
# check if there is a difference between the full sequence and the ids we have calculated
setdiff(full.seats, ids)
## [1] 731
Day 6: Custom Customs
Part One
As your flight approaches the regional airport where you’ll switch to a much larger plane, customs declaration forms are distributed to the passengers.
The form asks a series of 26 yes-or-no questions marked a through z. All you need to do is identify the questions for which anyone in your group answers “yes”. Since your group is just you, this doesn’t take very long.
However, the person sitting next to you seems to be experiencing a language barrier and asks if you can help. For each of the people in their group, you write down the questions for which they answer “yes”, one per line. For example:
abcx
abcy
abcz
In this group, there are 6 questions to which anyone answered “yes”: a, b, c, x, y, and z. (Duplicate answers to the same question don’t count extra; each question counts at most once.)
Another group asks for your help, then another, and eventually you’ve collected answers from every group on the plane (your puzzle input). Each group’s answers are separated by a blank line, and within each group, each person’s answers are on a single line. For example:
abc
a
b
c
ab
ac
a
a
a
a
b
This list represents answers from five groups:
The first group contains one person who answered "yes" to 3 questions: a, b, and c.
The second group contains three people; combined, they answered "yes" to 3 questions: a, b, and c.
The third group contains two people; combined, they answered "yes" to 3 questions: a, b, and c.
The fourth group contains four people; combined, they answered "yes" to only 1 question, a.
The last group contains one person who answered "yes" to only 1 question, b.
In this example, the sum of these counts is 3 + 3 + 3 + 1 + 1 = 11.
For each group, count the number of questions to which anyone answered “yes”. What is the sum of those counts
Solution
The solution for this part one is really simple, we process the text separating the group responses, for each group responses we count the distinct answers and them sum up.
# reads the input data as array of lines
input <- readr::read_lines("data/day06_input.txt")
# for each empty line we change to a "|" (trick to split in groups latter)
input[input==""] <- "|"
# process the data
input %>%
# collapses into a single string
str_c(collapse = "") %>%
# now splits in groups
str_split("\\|") %>%
unlist() %>% # a vector of groups
# for each group answer (a string of all answers)
map_int(function(.x){
# splits in a char vector
.x %>%
strsplit("") %>%
unlist() %>%
# gets the unique values
unique() %>%
# returns the count
length() %>%
return()
}) %>%
# sum them
sum()
## [1] 6763
Part Two
As you finish the last group’s customs declaration, you notice that you misread one word in the instructions:
You don’t need to identify the questions to which anyone answered “yes”; you need to identify the questions to which everyone answered “yes”!
Using the same example as above:
abc
a
b
c
ab
ac
a
a
a
a
b
This list represents answers from five groups:
In the first group, everyone (all 1 person) answered "yes" to 3 questions: a, b, and c.
In the second group, there is no question to which everyone answered "yes".
In the third group, everyone answered yes to only 1 question, a. Since some people did not answer "yes" to b or c, they don't count.
In the fourth group, everyone answered yes to only 1 question, a.
In the fifth group, everyone (all 1 person) answered "yes" to 1 question, b.
In this example, the sum of these counts is 3 + 0 + 1 + 1 + 1 = 6.
For each group, count the number of questions to which everyone answered “yes”. What is the sum of those counts?
Solution
This one is more tricky. We have to keep the individual response separated, count the occurrences of a question in each individual response in the group and then check how many occurrences are equal to the number individuals in the group.
library(tidyverse)
# reads de input file as a vector of strings
input <- readr::read_lines("data/day06_input.txt")
# replace empty lines with a separator (to split it latter)
input[input==""] = "|"
# handles the dataset to create a list of groups and individuals responses
in.data <- input %>%
# concats the strings marking the individual responses
str_c(collapse = ">") %>%
# split in vector of groups
str_split("\\|") %>%
unlist() %>%
# builds a list of groups with a vector of individual responses
map(function(.x){
resp <- unlist(str_split(.x, ">"))
return(resp[resp!=""])
})
# puts the group in a tibble
tibble(
responses = in.data
) %>%
# adds an "ID" for each group
mutate( g.id = row_number() ) %>%
# for each group create a tibble if individual question
mutate( questions = map(responses, function(.r){
tibble(
# counts the number of individual responses in each group
n.group = length(.r),
question = unlist(strsplit(.r,""))
) %>%
# counts the common question between them
add_count(question) %>%
# gets the only common question
filter(n.group==n) %>%
# returns a tibble with the questions common in the individual
# responses inside agroup
select(question) %>%
distinct() %>%
return()
})) %>%
# count the common questions for each group
unnest(questions) %>%
count(g.id) %>%
# sum then up
summarise( resp=sum(n))
## # A tibble: 1 x 1
## resp
## <int>
## 1 3512
To be continued…
I’ll make the rest of puzzles in the next days and publish them here, see you…