Forecasting Fantasy Games Using Monte Carlo Simulations

Giuliano Sposito included in data science

2018-10-28 4368 words 21 minutes

/2018-10-28-forecasting-fantasy-games-using-monte-carlo-simulations/images/fantasy_montecarlo_cover.jpg

Contents

The football season is back, and with it the Fantasy Game! In this post, we will simulate the results and the scoring of my Fantasy League games. To do that, we’ll project the scoring of teams using Monte Carlo simulation with data scraped from sites that predicts players' performances. We will combine the various possible scores of a team’s players to estimate the team’s score distribution, and then compare with the opposing team, and finally compute each team’s chances of winning and losing.

Abstract

The season of American football is back, and with it the Fantasy, the already traditional online game which you bring your friends or coworkers to play together in a virtual league, where each member rosters NFL’s players on virtual teams and hoping that they will score well in their real life games. The real life player’s score goes to your virtual team score.

ffanalytics package

The PhD in clinical psychology and assistant professor Isaac Petersen author of the site Fantasy Football Analytics, who does projections and analysis of Fantasy results, did a great job with the ffanalytics package made available in GitHub.

This package does data scrapping in various sites that make predictions of player’s performances such as ESPN, CBS, Yahoo and the NFL website itself, after, applies the fantasy scoring rules (which can even be cutomized for your League) and calculates the score possible for each of the projections.

Finally, the package analyzes the points obtained by making performance projections of the results, aggregating in one vision the predictions of several sites. Isaac publishes weekly the ranking of projections by position for the games of the round, using some standards scoring rules.

With all the hard work of doing data scrapping and apply the rules of fantasy to calculate the score already made by the package, we can use these informations to project the results of teams scaled in fantasy leagues and to forecast game results, remaining only to obtain the teams and their rosters from Fantasy itself.

Fantasy API - Getting the Team’s Matchups and Rosters

In order to obtain the rounds of a fantasy league, we can use the Web API available by the Fantasy website. Although it has some depreciated methods they still work and serve the purpose of getting the information we want. In particular we need access the methods that tells us which games /league/matchups is schedule for a week. This API receives as input parameters the authentication token, the id of the league and the week of interest, returning the games scheduled for that week. We also will use the API /league/team/matchup that, in addition to the above parameters, also gets the team id to return the team roster.

We can invoke the API using the httr package and process the response json using jsonlite.

1
2
3
4
5
6
7
8


# Storing the Access Token and League ID locally
# I use a yalm file to avoid hard-code them 
# or eventually version them in the GitHub :)
library(yaml)

config <- yaml.load_file("../../config/config.yml")
leagueId <- config$leagueId
authToken <- config$authToken

 1
 2
 3
 4
 5
 6
 7
 8
 9
10


# invoking the API
library(httr)
library(glue) # to easily replace vars in the url

# league/matchups url
url <- "http://api.fantasy.nfl.com/v1/league/matchups?leagueId={leagueId}&week={week}&format=json&authToken={authToken}"
week <- 5

# call the api
resp <- httr::GET(glue(url))

1
2


# Is it ok?
resp$status_code

1

## [1] 200

Once the call response is obtained, we treat the return * json * to organize the data and obtain the team rosters.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18


library(jsonlite)
library(tidyverse)
library(kableExtra)

# to convert the json in a "tabular-tibble form"
resp %>% 
  httr::content(as="text") %>%
  fromJSON(simplifyDataFrame = T) %$%  
  leagues %$%
  matchups %>%
  .[[1]] %>% 
  jsonlite::flatten() %>% 
  as.tibble() -> matchups

 matchups %>% 
   select(awayTeam.id, awayTeam.name, homeTeam.name, homeTeam.id) %>% 
   kable() %>%
   kable_styling(bootstrap_options = "striped", full_width = F, font_size = 11)

awayTeam.id	awayTeam.name	homeTeam.name	homeTeam.id
1	Change Robots	Rio Claro Pfeiferians	6
7	NJ's Bugre	Sorocaba Steelers	5
11	Campinas Giants	Amparo Bikers	4
2	Sorocaba Wild Mules	Indaiatuba Riders	3

We make new calls to the API to get the roster of each team in that week.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15


# for each teamIds in the matchup
c(matchups$awayTeam.id) %>%
  map(
    function(.teamId, .week, .leagueId, .authToken, .url) {
      # make the API call
      httr::GET(glue(.url)) %>%
        httr::content(as = "text") %>%
        fromJSON(simplifyDataFrame = T) %>% # transform response body in json
        return()
    },
    .week      = week,
    .leagueId  = leagueId,
    .authToken = authToken,
    .url       = "http://api.fantasy.nfl.com/v1/league/team/matchup?leagueId={.leagueId}&teamId={.teamId}&week={.week}&authToken={.authToken}&format=json"
  )  -> rosters.json

1
2


# this is a list with the team rosters used in this week
rosters.json[[1]]$leagues$matchup$homeTeam$name

1

## [1] "Rio Claro Pfeiferians"

1
2
3
4
5


rosters.json[[1]]$leagues$matchup$homeTeam$players[[1]] %>%
  select(id, name, position, teamAbbr) %>% 
  as.tibble() %>% 
   kable() %>%
   kable_styling(bootstrap_options = "striped", full_width = F, font_size = 11)

id	name	position	teamAbbr
2558125	Patrick Mahomes	QB	KC
2507164	Adrian Peterson	RB	WAS
2543773	James White	RB	NE
2508061	Antonio Brown	WR	PIT
2556370	Michael Thomas	WR	NO
2558266	George Kittle	TE	SF
2555430	Alex Collins	RB	BAL
1581	DeSean Jackson	WR	TB
2540158	Zach Ertz	TE	PHI
2540160	Jordan Reed	TE	WAS
2558063	Deshaun Watson	QB	HOU
2558865	Chris Carson	RB	SEA
2559169	Austin Ekeler	RB	LAC
2507232	Mason Crosby	K	GB
100011	Green Bay Packers	DEF	GB

With the team’s rosters (json format) we process the data to facilitate the handling.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41


# auxiliar transformation to extract team roster
extractTeam <- . %>% 
  .$players %>% 
  .[[1]] %>% 
  select( src_id=id, name, position, rosterSlot, fantasyPts ) %>%
  jsonlite::flatten() %>% 
  as.tibble() %>% 
  select(-fantasyPts.week.season, -fantasyPts.week.week ) %>% 
  rename(points = fantasyPts.week.pts) %>% 
  mutate(
    src_id = as.integer(src_id), 
    points = as.numeric(points)
  )

# extract each roster
rosters.json %>% 
  map(function(.json){
    matchup <- .json$leagues$matchup
    tibble(
      home.teamId = as.integer(matchup$homeTeam$id),
      home.name   = matchup$homeTeam$name,
      home.logo   = matchup$homeTeam$logoUrl,
      home.pts    = as.numeric(matchup$homeTeam$pts),
      home.roster = list(extractTeam(matchup$homeTeam)),
      away.teamId = as.integer(matchup$awayTeam$id),
      away.name   = matchup$awayTeam$name,
      away.logo   = matchup$awayTeam$logoUrl,
      away.pts    = as.numeric(matchup$awayTeam$pts),
      away.roster = list(extractTeam(matchup$awayTeam))
    ) %>% 
      return()
  }) %>% bind_rows() -> matchups.rosters

# check the matchups QBs for each team 
matchups.rosters %>% 
  mutate( away.qb = map(away.roster, function(roster) roster %>% filter(rosterSlot=="QB")),
          home.qb = map(home.roster, function(roster) roster %>% filter(rosterSlot=="QB")) ) %>%
  unnest(away.qb, home.qb, .sep=".") %>% 
  select(away.team = away.name, away.qb.name, home.qb.name, home.team=home.name ) %>% 
   kable() %>%
   kable_styling(bootstrap_options = "striped", full_width = F, font_size = 11)

away.team	away.qb.name	home.qb.name	home.team
Change Robots	Aaron Rodgers	Patrick Mahomes	Rio Claro Pfeiferians
NJ's Bugre	Russell Wilson	Ben Roethlisberger	Sorocaba Steelers
Campinas Giants	Tom Brady	Drew Brees	Amparo Bikers
Sorocaba Wild Mules	Matt Ryan	Cam Newton	Indaiatuba Riders

Now we have a tibble with the games between the teams and, nested in each registry, the respective rosters. Now you will need to use ffanalytis package to get the prediction performance and score of each player.

Forecast players perform

Firstly, we will use the ffanalytics package to do the data scraping of the forecasts for each player in the league made by the main sites that follow and make this type of prediction.

1
2
3
4


library(ffanalytics)
scrap <- scrape_data(pos = c("QB", "RB", "WR", "TE", "K", "DST"),
                     season = 2018,
                     week = week)

The scrape_data function returns a list by position, with the performance projections of the players in that position. This is because the predictions for each position have different attributes, for example, Kickers are evaluated by the number of field goals and distances of the kicks, and Quaterbacks by the numbers and distances of the passes.

1
2
3
4
5
6


# Quaterback Projection Attributes
scrap$QB %>%  
  filter(player=="Drew Brees") %>% 
  select(4:10) %>% 
   kable() %>%
   kable_styling(bootstrap_options = "striped", full_width = F, font_size = 11)

player	team	pass_att	pass_comp	pass_yds	pass_tds	pass_int
Drew Brees	NO	37.60	26.10	287.00	2.00	0.70
Drew Brees	NO	37.30	26.20	272.70	1.70	0.60
Drew Brees	NO	38.20	26.50	305.00	1.90	0.60
Drew Brees	NO	44.00	29.00	305.00	3.00	1.00
Drew Brees	NO	38.20	26.60	305.00	2.00	0.60
Drew Brees	NO	39.51	25.99	309.47	2.51	0.63
Drew Brees	NO	NA	NA	283.36	1.94	0.73

1
2
3
4
5
6


# Kickers Projection Attributes
scrap$K %>%  
  filter(player=="Justin Tucker") %>% 
  select(4:10) %>% 
   kable() %>%
   kable_styling(bootstrap_options = "striped", full_width = F, font_size = 11)

player	team	fg	fg_att	fglg	xp	xpatt
Justin Tucker	BAL	1.80	1.90	0	2.60	2.6
Justin Tucker	Bal	1.90	2.00	NA	2.20	NA
Justin Tucker	BAL	2.00	2.30	NA	2.30	NA
Justin Tucker	BAL	2.00	NA	NA	3.00	NA
Justin Tucker	BAL	1.86	2.25	NA	2.29	NA
Justin Tucker	Bal	NA	NA	NA	2.50	NA
Justin Tucker	BAL	NA	NA	NA	1.94	NA

Secondly, with projections in hand, we use ffanalytics package again to calculate how many points each player will make according with each prediction scraped from the sites. However, the package does not export the function that does the this individual calculation, but it is a necessary step to calculate the projections table that the site uses in its graphics.

But the package project is in the GitHub, so, it is possible to download the code, load the scripts directly and access the function that calculates the points per player and projection site. The function is called source_points(), and is present in the script calc_projections.R. You can load the script (and its dependencies) to invoke it directly.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16


# function to access 'source_points' directly
playerPointsProjections <- function(.scrap, .score_rules){
  source("../ffanalytics/R/calc_projections.R")
  source("../ffanalytics/R/stats_aggregation.R")
  source("../ffanalytics/R/source_classes.R")
  source("../ffanalytics/R/custom_scoring.R")
  source("../ffanalytics/R/scoring_rules.R")
  source("../ffanalytics/R/make_scoring.R")
  source("../ffanalytics/R/recode_vars.R")
  source("../ffanalytics/R/impute_funcs.R")
  source_points(.scrap, .score_rules)
}

# customized scoring rules
source("./score_settings.R") 
players.points <- playerPointsProjections(scrap, dudes.score.settings)

pos	data_src	id	points
K	CBS	8359	10
K	CBS	12956	9
K	CBS	11936	9
K	CBS	6789	9
K	CBS	8930	9
K	CBS	11384	9

Merging Rosters and Predictions

We now have the teams rosters and the scoring projections of the sites for each player, so we need to join the datasets. But to do that it is necessary to match the players' ids. If you notice the data displayed, each player’s ID is different on each of the sites, ffanalytics package names this id as src_id, but unifies the results to a unified, identificator named id.

The teams' rosters were obtained from the fantasy site, it follows the src_id identification of the NFL, to make the merge between the two dataset it will be necessary to map the src_id of the NFL to id of ffanalytics package. We can extract this ‘ids’ mapping from NFL prediction scraped data:

1
2
3
4
5
6
7


# look the presence of both ids in the projection table
scrap$WR %>% 
  filter( data_src=="NFL" ) %>% 
  select(1:4) %>% 
  head() %>% 
   kable() %>%
   kable_styling(bootstrap_options = "striped", full_width = F, font_size = 11)

data_src	id	src_id	player
NFL	11675	2543495	Davante Adams
NFL	12181	2552600	Nelson Agholor
NFL	10651	2530660	Kamar Aiken
NFL	11222	2540154	Keenan Allen
NFL	9308	2649	Danny Amendola
NFL	12930	2556462	Robby Anderson

 1
 2
 3
 4
 5
 6
 7
 8
 9
10


# extracting id and src_id from all positions
scrap %>%
  map(function(dft){
    dft %>% 
      filter(data_src=="NFL") %>% 
      select(id, src_id, player, team, pos) %>% 
      return()
  }) %>% 
  bind_rows() %>%
  distinct() -> players.ids

1
2
3
4


# ID mapping
head(players.ids) %>% 
   kable() %>%
   kable_styling(bootstrap_options = "striped", full_width = F, font_size = 11)

id	src_id	player	team	pos
13589	2560955	Josh Allen	BUF	QB
13125	2557922	C.J. Beathard	SF	QB
11642	2543477	Blake Bortles	JAX	QB
9817	497095	Sam Bradford	ARI	QB
5848	2504211	Tom Brady	NE	QB
4925	2504775	Drew Brees	NO	QB

Finally we can make the predictions merging of players to the team’s rankings.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24


# nest by "id" and merge with "src_id"
players.points %>% 
  select(-pos) %>% 
  group_by(id) %>% 
  nest(.key="points.range") %>%
  # merge ID with SRC_ID
  inner_join(players.ids, by = c("id")) %>%
  select(id, src_id, player, pos, team, points.range) %>% 
  # keep only "ids" at top level
  select(id, src_id, points.range) -> players.ids.points

# auxiliary function to merge roster with player points
mergePoints <- function(.roster, .points){
  .roster %>% 
    left_join(.points, by="src_id") %>% 
    return()
}

# merge points in rosters
matchups.rosters %>% 
  mutate(
    home.roster = map(home.roster, mergePoints, .points=players.ids.points),
    away.roster = map(away.roster, mergePoints, .points=players.ids.points)
  ) -> matchups.points

Note that we are using a structure of nested data.frames, i.e., we have a matchups data.frame where each line is a match. In each match there are two rosters columns (“home” and “visitor”), these columns hold another data.frame, containing the roster itself. In this data.frame, each line is a player, and for each player there is a column called points.range which also contains another data.frame, with the prediction of each site’s player scores.

1
2
3


# "father" dataframe and the first nested column
matchups.points %>% 
  select( home.name, home.roster ) 

1
2
3
4
5
6
7


## # A tibble: 4 x 2
##   home.name             home.roster      
##   <chr>                 <list>           
## 1 Rio Claro Pfeiferians <tibble [15 x 7]>
## 2 Sorocaba Steelers     <tibble [15 x 7]>
## 3 Amparo Bikers         <tibble [15 x 7]>
## 4 Indaiatuba Riders     <tibble [15 x 7]>

1
2


# seeing the first nested data.frame
matchups.points[1,]$home.roster[[1]]

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18


## # A tibble: 15 x 7
##     src_id name              position rosterSlot points    id points.range     
##      <int> <chr>             <chr>    <chr>       <dbl> <int> <list>           
##  1 2558125 Patrick Mahomes   QB       QB           15.8 13116 <tibble [9 x 2]> 
##  2 2507164 Adrian Peterson   RB       RB            4.2  8658 <tibble [9 x 2]> 
##  3 2543773 James White       RB       RB           13.7 11747 <tibble [9 x 2]> 
##  4 2508061 Antonio Brown     WR       WR           22.1  9988 <tibble [9 x 2]> 
##  5 2556370 Michael Thomas    WR       WR            7.4 12652 <tibble [9 x 2]> 
##  6 2558266 George Kittle     TE       TE            8.3 13299 <tibble [9 x 2]> 
##  7 2555430 Alex Collins      RB       W/R           6.6 12628 <tibble [8 x 2]> 
##  8    1581 DeSean Jackson    WR       BN            0    9075 <tibble [4 x 2]> 
##  9 2540158 Zach Ertz         TE       BN           17   11247 <tibble [10 x 2]>
## 10 2540160 Jordan Reed       TE       BN            2.1 11248 <tibble [9 x 2]> 
## 11 2558063 Deshaun Watson    QB       BN           21   13113 <tibble [9 x 2]> 
## 12 2558865 Chris Carson      RB       BN           12.7 13364 <tibble [8 x 2]> 
## 13 2559169 Austin Ekeler     RB       BN           11.9 13404 <tibble [6 x 2]> 
## 14 2507232 Mason Crosby      K        K             3    8742 <tibble [10 x 2]>
## 15  100011 Green Bay Packers DEF      DEF           2     523 <tibble [8 x 2]>

1
2


# look the second level dataframe
matchups.points[1,]$home.roster[[1]][1,]$points.range[[1]]

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12


## # A tibble: 9 x 2
##   data_src      points
##   <chr>          <dbl>
## 1 CBS             16  
## 2 ESPN            18.7
## 3 FantasyPros     22.3
## 4 FantasySharks   19.5
## 5 FFToday         20.5
## 6 FleaFlicker     24.7
## 7 NFL             19.1
## 8 NumberFire      24.6
## 9 Yahoo           18.8

Nested data.frames is a convenient model because it allows you to keep the data together and manipulate them easily.

Monte Carlo Simulation

To simulate the result of round matches, we need to simulate the score obtained by each teams and for this we will simulate the score of the team members using Monte Carlo simulation.

To simulate the players' scores we will consider that each of the players can make one of the scores projected by the forecast sites. For simplicity, in this post, we can assume that the odds are equal for any of the projected scores.

In this case the simulation of a match using Monte Carlo then consists in:

For each player of the team, draws one of the possible projected numbers
We sum the players' points drawed: this will be the team score
Compare the score of the home team with the away team to determine who won
A win is computed for the team with the highest score

This procedure is repeated N times, simulating several matchs, to determine the chances of winning a team, we sum the total number of times in which the team was a winner and divide by the total number of simulations. Thus we will have the chances of each team winning the match, once the simulations reflect the numerous combinations of scores between players and their teams.

Note that we assume that each player has equal chance of having any of the projected scores as a simulation score, more sophisticated models could consider different ranges with different probabilities between projections, including assessing the performance history of the site, but I’ll leave this considerations to another future post.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64


### Auxiliary functions

# function to generate .n possible pontuations from .points.range
# it's used to generate the .n simulations to each player
simPlayer <- function(.points.range, .n){

  # just check if the points.range isn't empty
  if(is.null(.points.range)) return( vector(mode = "numeric",.n) )
  if(nrow(.points.range)==0) return( vector(mode = "numeric",.n) )

  # generate a .n vector samples from points.range
  .points.range$points %>% 
    sample(size = .n, replace = T) %>%
    return()

}

# function to add the player pontuation to the team roster dataframe
simTeam <- function(.roster, .n){
  .roster %>% 
    mutate( sim.player = map(points.range, simPlayer, .n=.n) )  %>% 
    return()
}

# this function is in charge to sum the pontuations from 
# each player to generate the .n-size vector with team pontuation
simTeamPoints <- function(.roster){
  .roster %>% 
    filter(rosterSlot!="BN") %>%  # exclude player in bench
    pull(sim.player) %>%          # get the player pontuation simulation
    bind_cols() %>%               # binds the pontuation toghether 
    as.matrix() %>%               # now we have an matrix with # players x # .n simulations
    rowSums(na.rm = T) %>%        # sum each row (simuilation) to get a .n-vector 
    return()                      # each position in this vector is a possible team pontuation
}

### Simulation Code

# number of simulations
n <- 2000

# in the matchups dataframe
matchups.points %>% 
  mutate(
    # just team nicknames to shorter legends :)
    away.nickname = gsub("([a-zA-Z\']+ )?", "", away.name),
    home.nickname = gsub("([a-zA-Z\']+ )?", "", home.name)
  ) %>% 
  mutate(
    home.roster  = map(home.roster, simTeam, .n=n), # add players simulation points
    away.roster  = map(away.roster, simTeam, .n=n), # to each roster
    home.sim.pts = map(home.roster, simTeamPoints), # computes the team simulation
    away.sim.pts = map(away.roster, simTeamPoints)  # points
  ) %>% 
  mutate( 
    home.win    = map2(home.sim.pts, away.sim.pts, function(.x,.y) (.x > .y) ), # computes the 
    away.win    = map(home.win, function(.x) (!.x)), # number of victures of each team
    home.win.prob = map_dbl(home.win, mean, na.rm = T),  # the % of victories
    away.win.prob = map_dbl(away.win, mean, na.rm = T)   # the % of victories
  ) %>%
  mutate(
    # this calculate the difference of score points in each simulation
    score.diff    = map2(home.sim.pts, away.sim.pts, function(.x,.y){.x - .y})
  ) -> simulation

Now we have a pontuation curve for each player in the roster and also the pontuation curve of each team, let’s see what are the results.

Simulation Results

Let’s compare the difference of score for each match in league, the difference of score will allow us to calculate the chances of victory that each team has, according to the amount of “winning” simulations.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25


# return a summary as a tibble
summaryAsTibble <- . %>% summary() %>% as.list() %>% as.tibble()

# first, lets build team simulation summary
c("home","away") %>% 
  map(function(.prefix, .matchups.sim){
    .matchups.sim %>% 
      select( starts_with(.prefix)) %>% 
      set_names(gsub(pattern = paste0(.prefix,"\\."),replacement = "",x=names(.))) %>% 
      mutate( points = map(sim.pts, summaryAsTibble) ) %>% 
      select(-roster, -win) %>% 
      unnest(points, .sep=".") %>% 
      return()
  },
  .matchups.sim = simulation) %>% 
  bind_rows() -> sim.results

# visualizing the summary 
sim.results %>% 
  select(nickname, win.prob, points=points.Median) %>% 
  mutate(win.prob = win.prob * 100) %>% 
  mutate_at(2:3, round, digits=1) %>% 
  arrange(desc(points)) %>% 
   kable() %>%
   kable_styling(bootstrap_options = "striped", full_width = F, font_size = 11)

nickname	win.prob	points
Mules	92.2	106.6
Steelers	92.2	105.9
Robots	77.5	104.8
Giants	68.0	104.0
Bikers	32.0	99.7
Pfeiferians	22.5	98.0
Riders	7.8	94.8
Bugre	7.8	92.4

We can see the points scored and the chance of victory (win.prob). We used the median of the distribution as the best projected score (the one who divides the simulated score by 50% chance). How “safe” is the projected score? We need to visualize the distribution of possible scores to get a better view of the certainty of the projected score.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16


# lets plot the points distribution from simulation
library(tidybayes) # stat_intervalh
sim.results %>% 
  select( nickname, med.pts = points.Median, sim.pts ) %>% 
  mutate( 
    nickname = as.factor(nickname),
    sim.pts = map(sim.pts, base::sample, size=40) # just to reduce de number of point to be ploted
  ) %>% 
  unnest(sim.pts) %>% 
  ggplot(aes(y=reorder(nickname, med.pts))) +
  stat_intervalh(aes(x=sim.pts), .width = c(seq(.05,.95,.1))) +
  scale_color_brewer() +
  geom_point(aes(x=sim.pts), alpha=.1) +
  theme_minimal() +
  ylab("teams") + xlab("points") +
  theme(legend.position = "none")

Showing the distribution of points instead of just the most probable score it is possible to see more details about the possibles performances of a team. The same can be visualized with the chances of victory, instead of just counting the number of times that the simulations of the matches point to the victory of a team, we can visualize the distribution of the difference in the score in each game, generating a curve of probability for each possible outcome.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12


simulation %>% 
  mutate(game=paste0(away.nickname, " @ ", home.nickname)) %>% 
  arrange(away.nickname) %>% 
  select(game, score.diff) %>%
  unnest() %>% 
  ggplot(aes(fill=game)) +
  geom_density(aes(score.diff), alpha=.6) +
  geom_vline(aes(xintercept=0),
             linetype=2, color="red") +
  facet_grid(rows=vars(game), switch = "x") +
  theme_minimal() +
  theme( legend.position = "bottom" )

1
2
3
4
5
6


simulation %>% 
  arrange(away.nickname) %>% 
  mutate_at(vars(away.win.prob, home.win.prob), function(x) round(100*x,1)) %>% 
  select(away.nickname, away.win.prob, home.win.prob, home.nickname)  %>% 
   kable() %>%
   kable_styling(bootstrap_options = "striped", full_width = F, font_size = 11)

away.nickname	away.win.prob	home.win.prob	home.nickname
Bugre	7.8	92.2	Steelers
Giants	68.0	32.0	Bikers
Mules	92.2	7.8	Riders
Robots	77.5	22.5	Pfeiferians

Conclusion

We have seen that it is possible to use NFL players' performance projections, available on various websites, to calculate fantasy scores and to simulate, using Monte Carlo, the outcame of a league games. More sophisticated simulation models can be used, taking into account the historical distribution of the accuracy of the estimates of these sites to calculate a greater number of results possibilities.

Today, in my league, before the start of the round, after waivers and the lineups, I I send a dashboard (made using RMarkdown and [Flexdashboard](https: / /rmarkdown.rstudio.com/flexdashboard/using.html)) to members with simulation results and the performance of their rosters. You can see an example of it here: http://rpubs.com/gsposito/ffsimulationDudes]. As an evolution, in the future, I may tranform this in to a ShinyApp to members be abble to simulate several different rosters combinations to choose the most promising one.

Prediction Evaluation

Before concluding it is worth comparing the simulation made with the actual scores, and evaluating how much the simulation projected came close to the obtained real result.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17


# comparing simulated values with real values
simulation %>% 
  mutate( 
    away.win.real   = away.pts > home.pts,
    home.win.real   = home.pts > away.pts,
    score.diff.real = home.pts - away.pts,
    away.sim.pts = map_dbl(away.sim.pts, median, na.rm=T),
    home.sim.pts = map_dbl(home.sim.pts, median, na.rm=T),
    score.diff   = map_dbl(score.diff, median, na.rm=T )
  ) %>% 
  mutate_at( vars(away.win.prob, home.win.prob), function(x) round(100*x,2) )%>% 
  select( away.nickname, away.win.prob, away.win.real, away.sim.pts, away.pts, score.diff, score.diff.real, 
          home.pts, home.sim.pts, home.win.real, home.win.prob, home.nickname ) %>% 
  mutate_if(is.numeric, round, digits=1) %>% 
  arrange(away.nickname) %>% 
   kable() %>%
   kable_styling(bootstrap_options = "striped", full_width = F, font_size = 11)

away.nickname	away.win.prob	away.win.real	away.sim.pts	away.pts	score.diff	score.diff.real	home.pts	home.sim.pts	home.win.real	home.win.prob	home.nickname
Bugre	7.8	FALSE	92.4	112.6	13.3	13.7	126.3	105.9	TRUE	92.2	Steelers
Giants	68.0	TRUE	104.0	108.7	-4.2	-11.5	97.2	99.7	FALSE	32.0	Bikers
Mules	92.2	TRUE	106.6	98.4	-12.0	-8.3	90.1	94.8	FALSE	7.8	Riders
Robots	77.5	TRUE	104.8	113.6	-6.7	-30.4	83.1	98.0	FALSE	22.5	Pfeiferians

This was good! The simulated results were satisfactorily close to those obtained in 3 of the 4 games. All victories and defeats were correctly predicted. Only one of the games got a score difference far away from the one projected.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18


# comparing score difference
simulation %>% 
  mutate(
    game=paste0(away.nickname, " @ ", home.nickname),
    score.diff.real = home.pts - away.pts
  ) %>% 
  arrange(away.nickname) %>% 
  select(game, score.diff, score.diff.real) %>%
  unnest() %>% 
  ggplot(aes(fill=game)) +
  geom_density(aes(score.diff), alpha=.6) +
  geom_vline(aes(xintercept=score.diff.real),
             linetype=1, size=1, color="black") +
  geom_vline(aes(xintercept=0),
             linetype=2, color="red") +
  facet_grid(rows=vars(game), switch = "x") +
  theme_minimal() +
  theme( legend.position = "bottom" )

Perhaps the reason why the difference in scoring in the game between Robots and Pfeiferians has fallen so far from the most likely is also by such an unlikely event in the Packers' game against the Lions. Here’s the lineup of the house team, the one who lost:

1
2
3
4
5
6
7


# rosted home team
simulation[1,]$home.roster[[1]] %>% 
  filter(rosterSlot != "BN") %>% 
  mutate(points.sim = map_dbl(sim.player,median, na.rm=T)) %>% 
  select(name, position, points) %>% 
   kable() %>%
   kable_styling(bootstrap_options = "striped", full_width = F, font_size = 11)

name	position	points
Patrick Mahomes	QB	15.82
Adrian Peterson	RB	4.20
James White	RB	13.70
Antonio Brown	WR	22.10
Michael Thomas	WR	7.40
George Kittle	TE	8.30
Alex Collins	RB	6.60
Mason Crosby	K	3.00
Green Bay Packers	DEF	2.00

In this game, Mason Crosby, Packers' Kicker missed 4 fields goals and 1 extra point, with a total of 13 points, an event rare, which has not happened since 1997. If Crosby had hit the shots, which he habitually does, the score difference would be only 10 points away from the predicted score, not 23!

But after all, who wants to predict accurately all possible situations?