Contents

Advent of Code 2020 | Days 4 to 6

Contents

This post continues the Advent Of Code 2020 series an Advent calendar of small programming puzzles for a variety of skill sets and skill levels that can be solved in any programming language you like. These are my six solutions to the ‘Advent of Code 2020’ puzzles, from day 4 to day 6, using R.

Day 4: Passport Processing

Part One

You arrive at the airport only to realize that you grabbed your North Pole Credentials instead of your passport. While these documents are extremely similar, North Pole Credentials aren’t issued by a country and therefore aren’t actually valid documentation for travel in most of the world.

It seems like you’re not the only one having problems, though; a very long line has formed for the automatic passport scanners, and the delay could upset your travel itinerary.

Due to some questionable network security, you realize you might be able to solve both of these problems at the same time.

The automatic passport scanners are slow because they’re having trouble detecting which passports have all required fields. The expected fields are as follows:

byr (Birth Year)
iyr (Issue Year)
eyr (Expiration Year)
hgt (Height)
hcl (Hair Color)
ecl (Eye Color)
pid (Passport ID)
cid (Country ID)

Passport data is validated in batch files (your puzzle input). Each passport is represented as a sequence of key:value pairs separated by spaces or newlines. Passports are separated by blank lines.

Here is an example batch file containing four passports:

ecl:gry pid:860033327 eyr:2020 hcl:#fffffd
byr:1937 iyr:2017 cid:147 hgt:183cm

iyr:2013 ecl:amb cid:350 eyr:2023 pid:028048884
hcl:#cfa07d byr:1929

hcl:#ae17e1 iyr:2013
eyr:2024
ecl:brn pid:760753108 byr:1931
hgt:179cm

hcl:#cfa07d eyr:2025 pid:166559648
iyr:2011 ecl:brn hgt:59in

The first passport is valid - all eight fields are present. The second passport is invalid - it is missing hgt (the Height field).

The third passport is interesting; the only missing field is cid, so it looks like data from North Pole Credentials, not a passport at all! Surely, nobody would mind if you made the system temporarily ignore missing cid fields. Treat this “passport” as valid.

The fourth passport is missing two fields, cid and byr. Missing cid is fine, but missing any other field is not, so this passport is invalid.

According to the above rules, your improved system would report 2 valid passports.

Count the number of valid passports - those that have all required fields. Treat cid as optional. In your batch file, how many passports are valid?

Solution

In R, the solution is straightforward, we process the text, transforming into a list of passport, each element has the matrix of field names and values. In this first part we just count with we have enough fields in the passport register.

library(tidyverse)
library(knitr)
library(kableExtra)

# read data as text (vector of string by lines)
input <- read_lines("data/day04_input.txt")

# change a empty line to "|" (trick to split later)
input[input==""] <- "|"

# collapses the data and splits the data 
# into a list of passports
passports <- input %>% 
  # collapses into one string
  str_c(collapse = " ") %>% 
  # splits each passport data
  str_split("\\|") %>% 
  unlist() %>% 
  # for each passport splits the fields
  str_split(" ") %>% 
  # pre-process the fields into a matrix (field, value)
  map(function(line){
    # clean empty values
    line[line!=""] %>% 
      str_split(":", simplify = T) %>% 
      return()
  })

# function to check if the passport register has enough fields
is.valid.passdata <- function(pass.data){
  fields <- pass.data[,1]
  # has all fields
  if (length(fields) == 8) return(T)
  # has all fields but cid
  if (length(fields)==7 & length(fields[fields=="cid"])==0 ) return(T)
  return(F)
}

# checks all passport data and counts the valid ones
passports %>% 
  map_lgl(is.valid.passdata) %>% 
  sum()
## [1] 254

Part two

The line is moving more quickly now, but you overhear airport security talking about how passports with invalid data are getting through. Better add some data validation, quick!

You can continue to ignore the cid field, but each other field has strict rules about what values are valid for automatic validation:

byr (Birth Year) - four digits; at least 1920 and at most 2002.
iyr (Issue Year) - four digits; at least 2010 and at most 2020.
eyr (Expiration Year) - four digits; at least 2020 and at most 2030.
hgt (Height) - a number followed by either cm or in:
 - If cm, the number must be at least 150 and at most 193.
 - If in, the number must be at least 59 and at most 76.
hcl (Hair Color) - a # followed by exactly six characters 0-9 or a-f.
ecl (Eye Color) - exactly one of: amb blu brn gry grn hzl oth.
pid (Passport ID) - a nine-digit number, including leading zeroes.
cid (Country ID) - ignored, missing or not.

Your job is to count the passports where all required fields are both present and valid according to the above rules. Here are some example values:

byr valid:   2002
byr invalid: 2003

hgt valid:   60in
hgt valid:   190cm
hgt invalid: 190in
hgt invalid: 190

hcl valid:   #123abc
hcl invalid: #123abz
hcl invalid: 123abc

ecl valid:   brn
ecl invalid: wat

pid valid:   000000001
pid invalid: 0123456789

Here are some invalid passports:

eyr:1972 cid:100
hcl:#18171d ecl:amb hgt:170 pid:186cm iyr:2018 byr:1926

iyr:2019
hcl:#602927 eyr:1967 hgt:170cm
ecl:grn pid:012533040 byr:1946

hcl:dab227 iyr:2012
ecl:brn hgt:182cm pid:021572410 eyr:2020 byr:1992 cid:277

hgt:59cm ecl:zzz
eyr:2038 hcl:74454a iyr:2023
pid:3556412378 byr:2007

Here are some valid passports:

pid:087499704 hgt:74in ecl:grn iyr:2012 eyr:2030 byr:1980
hcl:#623a2f

eyr:2029 ecl:blu cid:129 byr:1989
iyr:2014 pid:896056539 hcl:#a97842 hgt:165cm

hcl:#888785
hgt:164cm byr:2001 iyr:2015 cid:88
pid:545766238 ecl:hzl
eyr:2022

iyr:2010 hgt:158cm hcl:#b6652a ecl:blu byr:1944 eyr:2021 pid:093154719

Count the number of valid passports - those that have all required fields and valid values. Continue to treat cid as optional. In your batch file, how many passports are valid?

Solution

Same approach as part one, but here we’ll have to verify each passport field, in this time we transform the input text into a full data frame (tibble) with a passport each row and its fields in column (tidy data). After that we verify the validation of each field (at same time in the whole data.frame).

# reads data as text (vector of string by lines)
input <- read_lines("data/day04_input.txt")

# change a empty line to "|" (trick to split later)
input[input==""] <- "|"

# collapse the string and splits the data 
# into a list of fields x value
passports <- input %>% 
  # collapses into one string
  str_c(collapse = " ") %>% 
  # splits each passport data
  str_split("\\|") %>% 
  unlist() %>% 
  # for each passport splits the fields
  str_split(" ") %>% 
  # pre-process the fields into a tibble
  map(function(line){
    # remove empty strings and split into "name":"value pair matrix
    pd <- str_split(line[line!=""], ":", simplify = T)
    # return a list of fields
    pd[,2] %>% 
      # build an tibble with the fields and values of the passport
      split(1:length(.)) %>% 
      set_names(pd[,1]) %>% 
      as_tibble() %>% 
      return()
  })

# puts the data into a tibble
pass.check <- tibble(
    pdata = passports
  ) %>% 
  # unnests
  unnest(pdata)

# let's see what we got
head(pass.check) %>% 
  kable() %>% 
  kable_styling(font_size=10)
iyr ecl hgt pid byr hcl eyr cid
2010 gry 181cm 591597745 1920 #6b5442 2029 123
2016 amb 177cm 404183620 1927 #602927 2020 223
2014 hzl 166cm 594143498 1998 #a97842 2030 178
2018 hzl 157cm 795349208 NA #de745c 2024 NA
2018 hzl 159cm 364060467 1978 #18171d 2025 117
2012 amb 182cm 374679609 1925 #cfa07d 2020 338
# validates each field rule
pass.check <- pass.check %>% 
  # byr (Birth Year) - four digits; at least 1920 and at most 2002.
  mutate( byr = as.integer(byr) ) %>% 
  mutate( chk.byr = (!is.na(byr) & byr>=1920 & byr<=2002) ) %>% 
  # iyr (Issue Year) - four digits; at least 2010 and at most 2020.
  mutate( iyr = as.integer(iyr) ) %>% 
  mutate( chk.iyr = (!is.na(iyr) & iyr>=2010 & iyr<=2020) ) %>% 
  # eyr (Expiration Year) - four digits; at least 2020 and at most 2030.
  mutate( eyr = as.integer(eyr) ) %>% 
  mutate( chk.eyr = (!is.na(eyr) & eyr>=2020 & eyr<=2030)) %>% 
  # hgt (Height) - a number followed by either cm or in:
  mutate( hgt.fmt.chk = (!is.na(hgt) & str_detect(hgt,"[0-9]+(in|cm)")) ) %>% 
  # separates value of unit to check the ranges
  mutate( hgt.val = as.integer(str_extract(hgt,"[0-9]+")),
          hgt.unit = str_extract(hgt,"(in|cm)")) %>% 
  mutate( chk.hgt = case_when(
    # invalid format or empty field
    !hgt.fmt.chk ~ F,
    # - If cm, the number must be at least 150 and at most 193.
    hgt.unit == "cm" ~ (hgt.val>=150 & hgt.val<=193),
    # - If in, the number must be at least 59 and at most 76.
    hgt.unit == "in" ~ (hgt.val>=59 & hgt.val<=76),
    T ~ F # otherwise?
  )) %>% 
  # hcl (Hair Color) - a # followed by exactly six characters 0-9 or a-f.
  mutate( chk.hcl = (!is.na(hcl) & str_detect(hcl, "#[0-9a-f]{6}") & nchar(hcl)==7) ) %>% 
  # ecl (Eye Color) - exactly one of: amb blu brn gry grn hzl oth.
  mutate( chk.ecl = (!is.na(ecl) & ecl %in% c("amb","blu","brn","gry","grn","hzl","oth"))) %>% 
  # pid (Passport ID) - a nine-digit number, including leading zeroes.
  mutate( chk.pid = (!is.na(pid) & str_detect(pid,"[0-9]{9}") * nchar(pid)==9) ) %>%
  # cid (Country ID) - ignored, missing or not. 
  # check all policies
  mutate( is.valid = (chk.byr & chk.iyr & chk.eyr & chk.hgt & chk.hcl & chk.ecl & chk.pid))

# let's see what we got
head(pass.check) %>% 
  kable() %>% 
  kable_styling(font_size=8)
iyr ecl hgt pid byr hcl eyr cid chk.byr chk.iyr chk.eyr hgt.fmt.chk hgt.val hgt.unit chk.hgt chk.hcl chk.ecl chk.pid is.valid
2010 gry 181cm 591597745 1920 #6b5442 2029 123 TRUE TRUE TRUE TRUE 181 cm TRUE TRUE TRUE TRUE TRUE
2016 amb 177cm 404183620 1927 #602927 2020 223 TRUE TRUE TRUE TRUE 177 cm TRUE TRUE TRUE TRUE TRUE
2014 hzl 166cm 594143498 1998 #a97842 2030 178 TRUE TRUE TRUE TRUE 166 cm TRUE TRUE TRUE TRUE TRUE
2018 hzl 157cm 795349208 NA #de745c 2024 NA FALSE TRUE TRUE TRUE 157 cm TRUE TRUE TRUE TRUE FALSE
2018 hzl 159cm 364060467 1978 #18171d 2025 117 TRUE TRUE TRUE TRUE 159 cm TRUE TRUE TRUE TRUE TRUE
2012 amb 182cm 374679609 1925 #cfa07d 2020 338 TRUE TRUE TRUE TRUE 182 cm TRUE TRUE TRUE TRUE TRUE
# how many is valid?
sum(1*pass.check$is.valid)
## [1] 184

Day 5: Binary Boarding

Part 1

You board your plane only to discover a new problem: you dropped your boarding pass! You aren’t sure which seat is yours, and all of the flight attendants are busy with the flood of people that suddenly made it through passport control.

You write a quick program to use your phone’s camera to scan all of the nearby boarding passes (your puzzle input); perhaps you can find your seat through process of elimination.

Instead of zones or groups, this airline uses binary space partitioning to seat people. A seat might be specified like FBFBBFFRLR, where F means “front”, B means “back”, L means “left”, and R means “right”.

The first 7 characters will either be F or B; these specify exactly one of the 128 rows on the plane (numbered 0 through 127). Each letter tells you which half of a region the given seat is in. Start with the whole list of rows; the first letter indicates whether the seat is in the front (0 through 63) or the back (64 through 127). The next letter indicates which half of that region the seat is in, and so on until you’re left with exactly one row.

For example, consider just the first seven characters of FBFBBFFRLR:

Start by considering the whole range, rows 0 through 127.
F means to take the lower half, keeping rows 0 through 63.
B means to take the upper half, keeping rows 32 through 63.
F means to take the lower half, keeping rows 32 through 47.
B means to take the upper half, keeping rows 40 through 47.
B keeps rows 44 through 47.
F keeps rows 44 through 45.
The final F keeps the lower of the two, row 44.

The last three characters will be either L or R; these specify exactly one of the 8 columns of seats on the plane (numbered 0 through 7). The same process as above proceeds again, this time with only three steps. L means to keep the lower half, while R means to keep the upper half.

For example, consider just the last 3 characters of FBFBBFFRLR:

Start by considering the whole range, columns 0 through 7.
R means to take the upper half, keeping columns 4 through 7.
L means to take the lower half, keeping columns 4 through 5.
The final R keeps the upper of the two, column 5.
So, decoding FBFBBFFRLR reveals that it is the seat at row 44, column 5.

Every seat also has a unique seat ID: multiply the row by 8, then add the column. In this example, the seat has ID 44 * 8 + 5 = 357.

Here are some other boarding passes:

BFFFBBFRRR: row 70, column 7, seat ID 567.
FFFBBBFRRR: row 14, column 7, seat ID 119.
BBFFBBFRLL: row 102, column 4, seat ID 820.

As a sanity check, look through your list of boarding passes. What is the highest seat ID on a boarding pass?

Solution

The solution consist in interpret the seat code as a series of step-instruction (B, F, L, R…) to choose parts of array, representing the rows and columns in the airplane. The code bellow is already factorized, so I programmed a function that receives the sequence of steps (for columns or rows) and a full map of positions (columns or rows) and iterate in the steps to find the exact position.

# function to find the row (or the column) in the code instruction
# steps - array of instruction (BFLR...)
# pos.map - array to be used in the search
# lower.code - with char in the step must be interpreted as "get the lower half" of the map
findInMap <- function(steps, pos.map, lower.code){

  # for each char in the steps sequence
  for (step in steps){
    
    # we must get the lower or higher half of the map
    if(step==lower.code){
      index <- 1:(length(pos.map)/2)
    } else {
      index <- ((length(pos.map)/2)+1):length(pos.map)
    }
    
    # selects the correct half
    pos.map <- pos.map[index]
    
  }
  
  # returns the value found
  return(pos.map)
}

# auxiliary funcion to transform a seat CODE into a seat ID
calcSeatId <- function(code){
  # transforms the string into a char vector
  seat.code <- unlist(strsplit(code,""))
  # the first 7 chars identifies the row
  seat.row.code <- seat.code[1:7]
  # the last 3 chars identifies the column
  seat.col.code <- seat.code[8:10]
  
  # finds the right row
  rnum <- findInMap(seat.row.code, 0:127, "F")
  # finds the right col
  cnum <- findInMap(seat.col.code, 0:7, "L")
  # calculates the id
  return(rnum*8+cnum)
}

# reads the input data
input <- readr::read_lines("data/day05_input.txt")

# apply the calcSeatId in the codes
ids <- sapply(input, calcSeatId)

# get the highest seat ID on a boarding pass...
max(ids)
## [1] 880

Part Two

Ding! The “fasten seat belt” signs have turned on. Time to find your seat.

It’s a completely full flight, so your seat should be the only missing boarding pass in your list. However, there’s a catch: some of the seats at the very front and back of the plane don’t exist on this aircraft, so they’ll be missing from your list as well.

Your seat wasn’t at the very front or back, though; the seats with IDs +1 and -1 from yours will be in your list.

What is the ID of your seat?

Solution

It’s a simple problem, we just verify which Seat ID is missing in the boarding pass sequence, the fligh is full so, the missing one is our number.

# what are the initial and final seat code numbers?
min.seat <- min(ids)
max.seat <- max(ids)

# generate the whole sequence between
full.seats <- min.seat:max.seat

# check if there is a difference between the full sequence and the ids we have calculated 
setdiff(full.seats, ids)
## [1] 731

Day 6: Custom Customs

Part One

As your flight approaches the regional airport where you’ll switch to a much larger plane, customs declaration forms are distributed to the passengers.

The form asks a series of 26 yes-or-no questions marked a through z. All you need to do is identify the questions for which anyone in your group answers “yes”. Since your group is just you, this doesn’t take very long.

However, the person sitting next to you seems to be experiencing a language barrier and asks if you can help. For each of the people in their group, you write down the questions for which they answer “yes”, one per line. For example:

abcx
abcy
abcz

In this group, there are 6 questions to which anyone answered “yes”: a, b, c, x, y, and z. (Duplicate answers to the same question don’t count extra; each question counts at most once.)

Another group asks for your help, then another, and eventually you’ve collected answers from every group on the plane (your puzzle input). Each group’s answers are separated by a blank line, and within each group, each person’s answers are on a single line. For example:

abc

a
b
c

ab
ac

a
a
a
a

b

This list represents answers from five groups:

The first group contains one person who answered "yes" to 3 questions: a, b, and c.
The second group contains three people; combined, they answered "yes" to 3 questions: a, b, and c.
The third group contains two people; combined, they answered "yes" to 3 questions: a, b, and c.
The fourth group contains four people; combined, they answered "yes" to only 1 question, a.
The last group contains one person who answered "yes" to only 1 question, b.
In this example, the sum of these counts is 3 + 3 + 3 + 1 + 1 = 11.

For each group, count the number of questions to which anyone answered “yes”. What is the sum of those counts

Solution

The solution for this part one is really simple, we process the text separating the group responses, for each group responses we count the distinct answers and them sum up.

# reads the input data as array of lines
input <- readr::read_lines("data/day06_input.txt")

# for each empty line we change to a "|" (trick to split in groups latter)
input[input==""] <- "|"

# process the data
input %>% 
  # collapses into a single string
  str_c(collapse = "") %>% 
  # now splits in groups
  str_split("\\|") %>% 
  unlist() %>% # a vector of groups
  # for each group answer (a string of all answers)
  map_int(function(.x){
    # splits in a char vector
    .x %>% 
      strsplit("") %>% 
      unlist() %>% 
      # gets the unique values
      unique() %>% 
      # returns the count
      length() %>% 
      return()
  }) %>% 
  # sum them
  sum()
## [1] 6763

Part Two

As you finish the last group’s customs declaration, you notice that you misread one word in the instructions:

You don’t need to identify the questions to which anyone answered “yes”; you need to identify the questions to which everyone answered “yes”!

Using the same example as above:

abc

a
b
c

ab
ac

a
a
a
a

b

This list represents answers from five groups:

In the first group, everyone (all 1 person) answered "yes" to 3 questions: a, b, and c.
In the second group, there is no question to which everyone answered "yes".
In the third group, everyone answered yes to only 1 question, a. Since some people did not answer "yes" to b or c, they don't count.
In the fourth group, everyone answered yes to only 1 question, a.
In the fifth group, everyone (all 1 person) answered "yes" to 1 question, b.
In this example, the sum of these counts is 3 + 0 + 1 + 1 + 1 = 6.

For each group, count the number of questions to which everyone answered “yes”. What is the sum of those counts?

Solution

This one is more tricky. We have to keep the individual response separated, count the occurrences of a question in each individual response in the group and then check how many occurrences are equal to the number individuals in the group.

library(tidyverse)

# reads de input file as a vector of strings
input <- readr::read_lines("data/day06_input.txt")

# replace empty lines with a separator (to split it latter)
input[input==""] = "|"

# handles the dataset to create a list of groups and individuals responses
in.data <- input %>%
  # concats the strings marking the individual responses
  str_c(collapse = ">") %>% 
  # split in vector of groups
  str_split("\\|") %>% 
  unlist() %>% 
  # builds a list of groups with a vector of individual responses
  map(function(.x){
    resp <- unlist(str_split(.x, ">"))
    return(resp[resp!=""])
  })

# puts the group in a tibble
tibble(
    responses = in.data
  ) %>% 
  # adds an "ID" for each group
  mutate( g.id = row_number() ) %>% 
  # for each group create a tibble if individual question
  mutate( questions = map(responses, function(.r){
    tibble(
      # counts the number of individual responses in each group
      n.group  = length(.r),
      question = unlist(strsplit(.r,""))
      ) %>% 
        # counts the common question between them
        add_count(question) %>% 
        # gets the only common question
        filter(n.group==n) %>%
        # returns a tibble with the questions common in the individual
        # responses inside agroup
        select(question) %>% 
        distinct() %>% 
        return()
  })) %>% 
  # count the common questions for each group
  unnest(questions) %>% 
  count(g.id) %>% 
  # sum then up
  summarise( resp=sum(n))
## # A tibble: 1 x 1
##    resp
##   <int>
## 1  3512

To be continued…

I’ll make the rest of puzzles in the next days and publish them here, see you…