Contents

Forecasting Fantasy Games Using Monte Carlo Simulations

The football season is back, and with it the Fantasy Game! In this post, we will simulate the results and the scoring of my Fantasy League games. To do that, we’ll project the scoring of teams using Monte Carlo simulation with data scraped from sites that predicts players' performances. We will combine the various possible scores of a team’s players to estimate the team’s score distribution, and then compare with the opposing team, and finally compute each team’s chances of winning and losing.

Abstract

The season of American football is back, and with it the Fantasy, the already traditional online game which you bring your friends or coworkers to play together in a virtual league, where each member rosters NFL’s players on virtual teams and hoping that they will score well in their real life games. The real life player’s score goes to your virtual team score.

ffanalytics package

The PhD in clinical psychology and assistant professor Isaac Petersen author of the site Fantasy Football Analytics, who does projections and analysis of Fantasy results, did a great job with the ffanalytics package made available in GitHub.

This package does data scrapping in various sites that make predictions of player’s performances such as ESPN, CBS, Yahoo and the NFL website itself, after, applies the fantasy scoring rules (which can even be cutomized for your League) and calculates the score possible for each of the projections.

Finally, the package analyzes the points obtained by making performance projections of the results, aggregating in one vision the predictions of several sites. Isaac publishes weekly the ranking of projections by position for the games of the round, using some standards scoring rules.

With all the hard work of doing data scrapping and apply the rules of fantasy to calculate the score already made by the package, we can use these informations to project the results of teams scaled in fantasy leagues and to forecast game results, remaining only to obtain the teams and their rosters from Fantasy itself.

Fantasy API - Getting the Team’s Matchups and Rosters

In order to obtain the rounds of a fantasy league, we can use the Web API available by the Fantasy website. Although it has some depreciated methods they still work and serve the purpose of getting the information we want. In particular we need access the methods that tells us which games /league/matchups is schedule for a week. This API receives as input parameters the authentication token, the id of the league and the week of interest, returning the games scheduled for that week. We also will use the API /league/team/matchup that, in addition to the above parameters, also gets the team id to return the team roster.

We can invoke the API using the httr package and process the response json using jsonlite.

1
2
3
4
5
6
7
8
# Storing the Access Token and League ID locally
# I use a yalm file to avoid hard-code them 
# or eventually version them in the GitHub :)
library(yaml)

config <- yaml.load_file("../../config/config.yml")
leagueId <- config$leagueId
authToken <- config$authToken
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
# invoking the API
library(httr)
library(glue) # to easily replace vars in the url

# league/matchups url
url <- "http://api.fantasy.nfl.com/v1/league/matchups?leagueId={leagueId}&week={week}&format=json&authToken={authToken}"
week <- 5

# call the api
resp <- httr::GET(glue(url))
1
2
# Is it ok?
resp$status_code
1
## [1] 200

Once the call response is obtained, we treat the return * json * to organize the data and obtain the team rosters.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
library(jsonlite)
library(tidyverse)
library(kableExtra)

# to convert the json in a "tabular-tibble form"
resp %>% 
  httr::content(as="text") %>%
  fromJSON(simplifyDataFrame = T) %$%  
  leagues %$%
  matchups %>%
  .[[1]] %>% 
  jsonlite::flatten() %>% 
  as.tibble() -> matchups

 matchups %>% 
   select(awayTeam.id, awayTeam.name, homeTeam.name, homeTeam.id) %>% 
   kable() %>%
   kable_styling(bootstrap_options = "striped", full_width = F, font_size = 11)
awayTeam.id awayTeam.name homeTeam.name homeTeam.id
1 Change Robots Rio Claro Pfeiferians 6
7 NJ's Bugre Sorocaba Steelers 5
11 Campinas Giants Amparo Bikers 4
2 Sorocaba Wild Mules Indaiatuba Riders 3

We make new calls to the API to get the roster of each team in that week.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
# for each teamIds in the matchup
c(matchups$awayTeam.id) %>%
  map(
    function(.teamId, .week, .leagueId, .authToken, .url) {
      # make the API call
      httr::GET(glue(.url)) %>%
        httr::content(as = "text") %>%
        fromJSON(simplifyDataFrame = T) %>% # transform response body in json
        return()
    },
    .week      = week,
    .leagueId  = leagueId,
    .authToken = authToken,
    .url       = "http://api.fantasy.nfl.com/v1/league/team/matchup?leagueId={.leagueId}&teamId={.teamId}&week={.week}&authToken={.authToken}&format=json"
  )  -> rosters.json
1
2
# this is a list with the team rosters used in this week
rosters.json[[1]]$leagues$matchup$homeTeam$name
1
## [1] "Rio Claro Pfeiferians"
1
2
3
4
5
rosters.json[[1]]$leagues$matchup$homeTeam$players[[1]] %>%
  select(id, name, position, teamAbbr) %>% 
  as.tibble() %>% 
   kable() %>%
   kable_styling(bootstrap_options = "striped", full_width = F, font_size = 11)
id name position teamAbbr
2558125 Patrick Mahomes QB KC
2507164 Adrian Peterson RB WAS
2543773 James White RB NE
2508061 Antonio Brown WR PIT
2556370 Michael Thomas WR NO
2558266 George Kittle TE SF
2555430 Alex Collins RB BAL
1581 DeSean Jackson WR TB
2540158 Zach Ertz TE PHI
2540160 Jordan Reed TE WAS
2558063 Deshaun Watson QB HOU
2558865 Chris Carson RB SEA
2559169 Austin Ekeler RB LAC
2507232 Mason Crosby K GB
100011 Green Bay Packers DEF GB

With the team’s rosters (json format) we process the data to facilitate the handling.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
# auxiliar transformation to extract team roster
extractTeam <- . %>% 
  .$players %>% 
  .[[1]] %>% 
  select( src_id=id, name, position, rosterSlot, fantasyPts ) %>%
  jsonlite::flatten() %>% 
  as.tibble() %>% 
  select(-fantasyPts.week.season, -fantasyPts.week.week ) %>% 
  rename(points = fantasyPts.week.pts) %>% 
  mutate(
    src_id = as.integer(src_id), 
    points = as.numeric(points)
  )

# extract each roster
rosters.json %>% 
  map(function(.json){
    matchup <- .json$leagues$matchup
    tibble(
      home.teamId = as.integer(matchup$homeTeam$id),
      home.name   = matchup$homeTeam$name,
      home.logo   = matchup$homeTeam$logoUrl,
      home.pts    = as.numeric(matchup$homeTeam$pts),
      home.roster = list(extractTeam(matchup$homeTeam)),
      away.teamId = as.integer(matchup$awayTeam$id),
      away.name   = matchup$awayTeam$name,
      away.logo   = matchup$awayTeam$logoUrl,
      away.pts    = as.numeric(matchup$awayTeam$pts),
      away.roster = list(extractTeam(matchup$awayTeam))
    ) %>% 
      return()
  }) %>% bind_rows() -> matchups.rosters

# check the matchups QBs for each team 
matchups.rosters %>% 
  mutate( away.qb = map(away.roster, function(roster) roster %>% filter(rosterSlot=="QB")),
          home.qb = map(home.roster, function(roster) roster %>% filter(rosterSlot=="QB")) ) %>%
  unnest(away.qb, home.qb, .sep=".") %>% 
  select(away.team = away.name, away.qb.name, home.qb.name, home.team=home.name ) %>% 
   kable() %>%
   kable_styling(bootstrap_options = "striped", full_width = F, font_size = 11)
away.team away.qb.name home.qb.name home.team
Change Robots Aaron Rodgers Patrick Mahomes Rio Claro Pfeiferians
NJ's Bugre Russell Wilson Ben Roethlisberger Sorocaba Steelers
Campinas Giants Tom Brady Drew Brees Amparo Bikers
Sorocaba Wild Mules Matt Ryan Cam Newton Indaiatuba Riders

Now we have a tibble with the games between the teams and, nested in each registry, the respective rosters. Now you will need to use ffanalytis package to get the prediction performance and score of each player.

Forecast players perform

Firstly, we will use the ffanalytics package to do the data scraping of the forecasts for each player in the league made by the main sites that follow and make this type of prediction.

1
2
3
4
library(ffanalytics)
scrap <- scrape_data(pos = c("QB", "RB", "WR", "TE", "K", "DST"),
                     season = 2018,
                     week = week)

The scrape_data function returns a list by position, with the performance projections of the players in that position. This is because the predictions for each position have different attributes, for example, Kickers are evaluated by the number of field goals and distances of the kicks, and Quaterbacks by the numbers and distances of the passes.

1
2
3
4
5
6
# Quaterback Projection Attributes
scrap$QB %>%  
  filter(player=="Drew Brees") %>% 
  select(4:10) %>% 
   kable() %>%
   kable_styling(bootstrap_options = "striped", full_width = F, font_size = 11)
player team pass_att pass_comp pass_yds pass_tds pass_int
Drew Brees NO 37.60 26.10 287.00 2.00 0.70
Drew Brees NO 37.30 26.20 272.70 1.70 0.60
Drew Brees NO 38.20 26.50 305.00 1.90 0.60
Drew Brees NO 44.00 29.00 305.00 3.00 1.00
Drew Brees NO 38.20 26.60 305.00 2.00 0.60
Drew Brees NO 39.51 25.99 309.47 2.51 0.63
Drew Brees NO NA NA 283.36 1.94 0.73
1
2
3
4
5
6
# Kickers Projection Attributes
scrap$K %>%  
  filter(player=="Justin Tucker") %>% 
  select(4:10) %>% 
   kable() %>%
   kable_styling(bootstrap_options = "striped", full_width = F, font_size = 11)
player team fg fg_att fglg xp xpatt
Justin Tucker BAL 1.80 1.90 0 2.60 2.6
Justin Tucker Bal 1.90 2.00 NA 2.20 NA
Justin Tucker BAL 2.00 2.30 NA 2.30 NA
Justin Tucker BAL 2.00 NA NA 3.00 NA
Justin Tucker BAL 1.86 2.25 NA 2.29 NA
Justin Tucker Bal NA NA NA 2.50 NA
Justin Tucker BAL NA NA NA 1.94 NA

Secondly, with projections in hand, we use ffanalytics package again to calculate how many points each player will make according with each prediction scraped from the sites. However, the package does not export the function that does the this individual calculation, but it is a necessary step to calculate the projections table that the site uses in its graphics.

But the package project is in the GitHub, so, it is possible to download the code, load the scripts directly and access the function that calculates the points per player and projection site. The function is called source_points(), and is present in the script calc_projections.R. You can load the script (and its dependencies) to invoke it directly.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
# function to access 'source_points' directly
playerPointsProjections <- function(.scrap, .score_rules){
  source("../ffanalytics/R/calc_projections.R")
  source("../ffanalytics/R/stats_aggregation.R")
  source("../ffanalytics/R/source_classes.R")
  source("../ffanalytics/R/custom_scoring.R")
  source("../ffanalytics/R/scoring_rules.R")
  source("../ffanalytics/R/make_scoring.R")
  source("../ffanalytics/R/recode_vars.R")
  source("../ffanalytics/R/impute_funcs.R")
  source_points(.scrap, .score_rules)
}

# customized scoring rules
source("./score_settings.R") 
players.points <- playerPointsProjections(scrap, dudes.score.settings)
pos data_src id points
K CBS 8359 10
K CBS 12956 9
K CBS 11936 9
K CBS 6789 9
K CBS 8930 9
K CBS 11384 9

Merging Rosters and Predictions

We now have the teams rosters and the scoring projections of the sites for each player, so we need to join the datasets. But to do that it is necessary to match the players' ids. If you notice the data displayed, each player’s ID is different on each of the sites, ffanalytics package names this id as src_id, but unifies the results to a unified, identificator named id.

The teams' rosters were obtained from the fantasy site, it follows the src_id identification of the NFL, to make the merge between the two dataset it will be necessary to map the src_id of the NFL to id of ffanalytics package. We can extract this ‘ids’ mapping from NFL prediction scraped data:

1
2
3
4
5
6
7
# look the presence of both ids in the projection table
scrap$WR %>% 
  filter( data_src=="NFL" ) %>% 
  select(1:4) %>% 
  head() %>% 
   kable() %>%
   kable_styling(bootstrap_options = "striped", full_width = F, font_size = 11)
data_src id src_id player
NFL 11675 2543495 Davante Adams
NFL 12181 2552600 Nelson Agholor
NFL 10651 2530660 Kamar Aiken
NFL 11222 2540154 Keenan Allen
NFL 9308 2649 Danny Amendola
NFL 12930 2556462 Robby Anderson
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
# extracting id and src_id from all positions
scrap %>%
  map(function(dft){
    dft %>% 
      filter(data_src=="NFL") %>% 
      select(id, src_id, player, team, pos) %>% 
      return()
  }) %>% 
  bind_rows() %>%
  distinct() -> players.ids
1
2
3
4
# ID mapping
head(players.ids) %>% 
   kable() %>%
   kable_styling(bootstrap_options = "striped", full_width = F, font_size = 11)
id src_id player team pos
13589 2560955 Josh Allen BUF QB
13125 2557922 C.J. Beathard SF QB
11642 2543477 Blake Bortles JAX QB
9817 497095 Sam Bradford ARI QB
5848 2504211 Tom Brady NE QB
4925 2504775 Drew Brees NO QB

Finally we can make the predictions merging of players to the team’s rankings.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
# nest by "id" and merge with "src_id"
players.points %>% 
  select(-pos) %>% 
  group_by(id) %>% 
  nest(.key="points.range") %>%
  # merge ID with SRC_ID
  inner_join(players.ids, by = c("id")) %>%
  select(id, src_id, player, pos, team, points.range) %>% 
  # keep only "ids" at top level
  select(id, src_id, points.range) -> players.ids.points

# auxiliary function to merge roster with player points
mergePoints <- function(.roster, .points){
  .roster %>% 
    left_join(.points, by="src_id") %>% 
    return()
}

# merge points in rosters
matchups.rosters %>% 
  mutate(
    home.roster = map(home.roster, mergePoints, .points=players.ids.points),
    away.roster = map(away.roster, mergePoints, .points=players.ids.points)
  ) -> matchups.points

Note that we are using a structure of nested data.frames, i.e., we have a matchups data.frame where each line is a match. In each match there are two rosters columns (“home” and “visitor”), these columns hold another data.frame, containing the roster itself. In this data.frame, each line is a player, and for each player there is a column called points.range which also contains another data.frame, with the prediction of each site’s player scores.

1
2
3
# "father" dataframe and the first nested column
matchups.points %>% 
  select( home.name, home.roster ) 
1
2
3
4
5
6
7
## # A tibble: 4 x 2
##   home.name             home.roster      
##   <chr>                 <list>           
## 1 Rio Claro Pfeiferians <tibble [15 x 7]>
## 2 Sorocaba Steelers     <tibble [15 x 7]>
## 3 Amparo Bikers         <tibble [15 x 7]>
## 4 Indaiatuba Riders     <tibble [15 x 7]>
1
2
# seeing the first nested data.frame
matchups.points[1,]$home.roster[[1]]
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
## # A tibble: 15 x 7
##     src_id name              position rosterSlot points    id points.range     
##      <int> <chr>             <chr>    <chr>       <dbl> <int> <list>           
##  1 2558125 Patrick Mahomes   QB       QB           15.8 13116 <tibble [9 x 2]> 
##  2 2507164 Adrian Peterson   RB       RB            4.2  8658 <tibble [9 x 2]> 
##  3 2543773 James White       RB       RB           13.7 11747 <tibble [9 x 2]> 
##  4 2508061 Antonio Brown     WR       WR           22.1  9988 <tibble [9 x 2]> 
##  5 2556370 Michael Thomas    WR       WR            7.4 12652 <tibble [9 x 2]> 
##  6 2558266 George Kittle     TE       TE            8.3 13299 <tibble [9 x 2]> 
##  7 2555430 Alex Collins      RB       W/R           6.6 12628 <tibble [8 x 2]> 
##  8    1581 DeSean Jackson    WR       BN            0    9075 <tibble [4 x 2]> 
##  9 2540158 Zach Ertz         TE       BN           17   11247 <tibble [10 x 2]>
## 10 2540160 Jordan Reed       TE       BN            2.1 11248 <tibble [9 x 2]> 
## 11 2558063 Deshaun Watson    QB       BN           21   13113 <tibble [9 x 2]> 
## 12 2558865 Chris Carson      RB       BN           12.7 13364 <tibble [8 x 2]> 
## 13 2559169 Austin Ekeler     RB       BN           11.9 13404 <tibble [6 x 2]> 
## 14 2507232 Mason Crosby      K        K             3    8742 <tibble [10 x 2]>
## 15  100011 Green Bay Packers DEF      DEF           2     523 <tibble [8 x 2]>
1
2
# look the second level dataframe
matchups.points[1,]$home.roster[[1]][1,]$points.range[[1]]
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
## # A tibble: 9 x 2
##   data_src      points
##   <chr>          <dbl>
## 1 CBS             16  
## 2 ESPN            18.7
## 3 FantasyPros     22.3
## 4 FantasySharks   19.5
## 5 FFToday         20.5
## 6 FleaFlicker     24.7
## 7 NFL             19.1
## 8 NumberFire      24.6
## 9 Yahoo           18.8

Nested data.frames is a convenient model because it allows you to keep the data together and manipulate them easily.

Monte Carlo Simulation

To simulate the result of round matches, we need to simulate the score obtained by each teams and for this we will simulate the score of the team members using Monte Carlo simulation.

To simulate the players' scores we will consider that each of the players can make one of the scores projected by the forecast sites. For simplicity, in this post, we can assume that the odds are equal for any of the projected scores.

In this case the simulation of a match using Monte Carlo then consists in:

  1. For each player of the team, draws one of the possible projected numbers
  2. We sum the players' points drawed: this will be the team score
  3. Compare the score of the home team with the away team to determine who won
  4. A win is computed for the team with the highest score

This procedure is repeated N times, simulating several matchs, to determine the chances of winning a team, we sum the total number of times in which the team was a winner and divide by the total number of simulations. Thus we will have the chances of each team winning the match, once the simulations reflect the numerous combinations of scores between players and their teams.

Note that we assume that each player has equal chance of having any of the projected scores as a simulation score, more sophisticated models could consider different ranges with different probabilities between projections, including assessing the performance history of the site, but I’ll leave this considerations to another future post.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
### Auxiliary functions

# function to generate .n possible pontuations from .points.range
# it's used to generate the .n simulations to each player
simPlayer <- function(.points.range, .n){

  # just check if the points.range isn't empty
  if(is.null(.points.range)) return( vector(mode = "numeric",.n) )
  if(nrow(.points.range)==0) return( vector(mode = "numeric",.n) )

  # generate a .n vector samples from points.range
  .points.range$points %>% 
    sample(size = .n, replace = T) %>%
    return()

}

# function to add the player pontuation to the team roster dataframe
simTeam <- function(.roster, .n){
  .roster %>% 
    mutate( sim.player = map(points.range, simPlayer, .n=.n) )  %>% 
    return()
}

# this function is in charge to sum the pontuations from 
# each player to generate the .n-size vector with team pontuation
simTeamPoints <- function(.roster){
  .roster %>% 
    filter(rosterSlot!="BN") %>%  # exclude player in bench
    pull(sim.player) %>%          # get the player pontuation simulation
    bind_cols() %>%               # binds the pontuation toghether 
    as.matrix() %>%               # now we have an matrix with # players x # .n simulations
    rowSums(na.rm = T) %>%        # sum each row (simuilation) to get a .n-vector 
    return()                      # each position in this vector is a possible team pontuation
}

### Simulation Code

# number of simulations
n <- 2000

# in the matchups dataframe
matchups.points %>% 
  mutate(
    # just team nicknames to shorter legends :)
    away.nickname = gsub("([a-zA-Z\']+ )?", "", away.name),
    home.nickname = gsub("([a-zA-Z\']+ )?", "", home.name)
  ) %>% 
  mutate(
    home.roster  = map(home.roster, simTeam, .n=n), # add players simulation points
    away.roster  = map(away.roster, simTeam, .n=n), # to each roster
    home.sim.pts = map(home.roster, simTeamPoints), # computes the team simulation
    away.sim.pts = map(away.roster, simTeamPoints)  # points
  ) %>% 
  mutate( 
    home.win    = map2(home.sim.pts, away.sim.pts, function(.x,.y) (.x > .y) ), # computes the 
    away.win    = map(home.win, function(.x) (!.x)), # number of victures of each team
    home.win.prob = map_dbl(home.win, mean, na.rm = T),  # the % of victories
    away.win.prob = map_dbl(away.win, mean, na.rm = T)   # the % of victories
  ) %>%
  mutate(
    # this calculate the difference of score points in each simulation
    score.diff    = map2(home.sim.pts, away.sim.pts, function(.x,.y){.x - .y})
  ) -> simulation

Now we have a pontuation curve for each player in the roster and also the pontuation curve of each team, let’s see what are the results.

Simulation Results

Let’s compare the difference of score for each match in league, the difference of score will allow us to calculate the chances of victory that each team has, according to the amount of “winning” simulations.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
# return a summary as a tibble
summaryAsTibble <- . %>% summary() %>% as.list() %>% as.tibble()

# first, lets build team simulation summary
c("home","away") %>% 
  map(function(.prefix, .matchups.sim){
    .matchups.sim %>% 
      select( starts_with(.prefix)) %>% 
      set_names(gsub(pattern = paste0(.prefix,"\\."),replacement = "",x=names(.))) %>% 
      mutate( points = map(sim.pts, summaryAsTibble) ) %>% 
      select(-roster, -win) %>% 
      unnest(points, .sep=".") %>% 
      return()
  },
  .matchups.sim = simulation) %>% 
  bind_rows() -> sim.results

# visualizing the summary 
sim.results %>% 
  select(nickname, win.prob, points=points.Median) %>% 
  mutate(win.prob = win.prob * 100) %>% 
  mutate_at(2:3, round, digits=1) %>% 
  arrange(desc(points)) %>% 
   kable() %>%
   kable_styling(bootstrap_options = "striped", full_width = F, font_size = 11)
nickname win.prob points
Mules 92.2 106.6
Steelers 92.2 105.9
Robots 77.5 104.8
Giants 68.0 104.0
Bikers 32.0 99.7
Pfeiferians 22.5 98.0
Riders 7.8 94.8
Bugre 7.8 92.4

We can see the points scored and the chance of victory (win.prob). We used the median of the distribution as the best projected score (the one who divides the simulated score by 50% chance). How “safe” is the projected score? We need to visualize the distribution of possible scores to get a better view of the certainty of the projected score.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
# lets plot the points distribution from simulation
library(tidybayes) # stat_intervalh
sim.results %>% 
  select( nickname, med.pts = points.Median, sim.pts ) %>% 
  mutate( 
    nickname = as.factor(nickname),
    sim.pts = map(sim.pts, base::sample, size=40) # just to reduce de number of point to be ploted
  ) %>% 
  unnest(sim.pts) %>% 
  ggplot(aes(y=reorder(nickname, med.pts))) +
  stat_intervalh(aes(x=sim.pts), .width = c(seq(.05,.95,.1))) +
  scale_color_brewer() +
  geom_point(aes(x=sim.pts), alpha=.1) +
  theme_minimal() +
  ylab("teams") + xlab("points") +
  theme(legend.position = "none")

Showing the distribution of points instead of just the most probable score it is possible to see more details about the possibles performances of a team. The same can be visualized with the chances of victory, instead of just counting the number of times that the simulations of the matches point to the victory of a team, we can visualize the distribution of the difference in the score in each game, generating a curve of probability for each possible outcome.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
simulation %>% 
  mutate(game=paste0(away.nickname, " @ ", home.nickname)) %>% 
  arrange(away.nickname) %>% 
  select(game, score.diff) %>%
  unnest() %>% 
  ggplot(aes(fill=game)) +
  geom_density(aes(score.diff), alpha=.6) +
  geom_vline(aes(xintercept=0),
             linetype=2, color="red") +
  facet_grid(rows=vars(game), switch = "x") +
  theme_minimal() +
  theme( legend.position = "bottom" )

1
2
3
4
5
6
simulation %>% 
  arrange(away.nickname) %>% 
  mutate_at(vars(away.win.prob, home.win.prob), function(x) round(100*x,1)) %>% 
  select(away.nickname, away.win.prob, home.win.prob, home.nickname)  %>% 
   kable() %>%
   kable_styling(bootstrap_options = "striped", full_width = F, font_size = 11)
away.nickname away.win.prob home.win.prob home.nickname
Bugre 7.8 92.2 Steelers
Giants 68.0 32.0 Bikers
Mules 92.2 7.8 Riders
Robots 77.5 22.5 Pfeiferians

Conclusion

We have seen that it is possible to use NFL players' performance projections, available on various websites, to calculate fantasy scores and to simulate, using Monte Carlo, the outcame of a league games. More sophisticated simulation models can be used, taking into account the historical distribution of the accuracy of the estimates of these sites to calculate a greater number of results possibilities.

Today, in my league, before the start of the round, after waivers and the lineups, I I send a dashboard (made using RMarkdown and [Flexdashboard](https: / /rmarkdown.rstudio.com/flexdashboard/using.html)) to members with simulation results and the performance of their rosters. You can see an example of it here: http://rpubs.com/gsposito/ffsimulationDudes]. As an evolution, in the future, I may tranform this in to a ShinyApp to members be abble to simulate several different rosters combinations to choose the most promising one.

Prediction Evaluation

Before concluding it is worth comparing the simulation made with the actual scores, and evaluating how much the simulation projected came close to the obtained real result.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
# comparing simulated values with real values
simulation %>% 
  mutate( 
    away.win.real   = away.pts > home.pts,
    home.win.real   = home.pts > away.pts,
    score.diff.real = home.pts - away.pts,
    away.sim.pts = map_dbl(away.sim.pts, median, na.rm=T),
    home.sim.pts = map_dbl(home.sim.pts, median, na.rm=T),
    score.diff   = map_dbl(score.diff, median, na.rm=T )
  ) %>% 
  mutate_at( vars(away.win.prob, home.win.prob), function(x) round(100*x,2) )%>% 
  select( away.nickname, away.win.prob, away.win.real, away.sim.pts, away.pts, score.diff, score.diff.real, 
          home.pts, home.sim.pts, home.win.real, home.win.prob, home.nickname ) %>% 
  mutate_if(is.numeric, round, digits=1) %>% 
  arrange(away.nickname) %>% 
   kable() %>%
   kable_styling(bootstrap_options = "striped", full_width = F, font_size = 11)
away.nickname away.win.prob away.win.real away.sim.pts away.pts score.diff score.diff.real home.pts home.sim.pts home.win.real home.win.prob home.nickname
Bugre 7.8 FALSE 92.4 112.6 13.3 13.7 126.3 105.9 TRUE 92.2 Steelers
Giants 68.0 TRUE 104.0 108.7 -4.2 -11.5 97.2 99.7 FALSE 32.0 Bikers
Mules 92.2 TRUE 106.6 98.4 -12.0 -8.3 90.1 94.8 FALSE 7.8 Riders
Robots 77.5 TRUE 104.8 113.6 -6.7 -30.4 83.1 98.0 FALSE 22.5 Pfeiferians

This was good! The simulated results were satisfactorily close to those obtained in 3 of the 4 games. All victories and defeats were correctly predicted. Only one of the games got a score difference far away from the one projected.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
# comparing score difference
simulation %>% 
  mutate(
    game=paste0(away.nickname, " @ ", home.nickname),
    score.diff.real = home.pts - away.pts
  ) %>% 
  arrange(away.nickname) %>% 
  select(game, score.diff, score.diff.real) %>%
  unnest() %>% 
  ggplot(aes(fill=game)) +
  geom_density(aes(score.diff), alpha=.6) +
  geom_vline(aes(xintercept=score.diff.real),
             linetype=1, size=1, color="black") +
  geom_vline(aes(xintercept=0),
             linetype=2, color="red") +
  facet_grid(rows=vars(game), switch = "x") +
  theme_minimal() +
  theme( legend.position = "bottom" )

Perhaps the reason why the difference in scoring in the game between Robots and Pfeiferians has fallen so far from the most likely is also by such an unlikely event in the Packers' game against the Lions. Here’s the lineup of the house team, the one who lost:

1
2
3
4
5
6
7
# rosted home team
simulation[1,]$home.roster[[1]] %>% 
  filter(rosterSlot != "BN") %>% 
  mutate(points.sim = map_dbl(sim.player,median, na.rm=T)) %>% 
  select(name, position, points) %>% 
   kable() %>%
   kable_styling(bootstrap_options = "striped", full_width = F, font_size = 11)
name position points
Patrick Mahomes QB 15.82
Adrian Peterson RB 4.20
James White RB 13.70
Antonio Brown WR 22.10
Michael Thomas WR 7.40
George Kittle TE 8.30
Alex Collins RB 6.60
Mason Crosby K 3.00
Green Bay Packers DEF 2.00

In this game, Mason Crosby, Packers' Kicker missed 4 fields goals and 1 extra point, with a total of 13 points, an event rare, which has not happened since 1997. If Crosby had hit the shots, which he habitually does, the score difference would be only 10 points away from the predicted score, not 23!

But after all, who wants to predict accurately all possible situations?