The football season is back, and with it the Fantasy Game! In this post, we will simulate the results and the scoring of my Fantasy League games. To do that, we’ll project the scoring of teams using Monte Carlo simulation with data scraped from sites that predicts players' performances. We will combine the various possible scores of a team’s players to estimate the team’s score distribution, and then compare with the opposing team, and finally compute each team’s chances of winning and losing.
Abstract
The season of American football is back, and with it the Fantasy, the already traditional online game which you bring your friends or coworkers to play together in a virtual league, where each member rosters NFL’s players on virtual teams and hoping that they will score well in their real life games. The real life player’s score goes to your virtual team score.
This package does data scrapping in various sites that make predictions of player’s performances such as ESPN, CBS, Yahoo and the NFL website itself, after, applies the fantasy scoring rules (which can even be cutomized for your League) and calculates the score possible for each of the projections.
Finally, the package analyzes the points obtained by making performance projections of the results, aggregating in one vision the predictions of several sites. Isaac publishes weekly the ranking of projections by position for the games of the round, using some standards scoring rules.
With all the hard work of doing data scrapping and apply the rules of fantasy to calculate the score already made by the package, we can use these informations to project the results of teams scaled in fantasy leagues and to forecast game results, remaining only to obtain the teams and their rosters from Fantasy itself.
Fantasy API - Getting the Team’s Matchups and Rosters
In order to obtain the rounds of a fantasy league, we can use the Web API available by the Fantasy website. Although it has some depreciated methods they still work and serve the purpose of getting the information we want. In particular we need access the methods that tells us which games
/league/matchups is schedule for a week. This API receives as input parameters the authentication token, the id of the league and the week of interest, returning the games scheduled for that week. We also will use the API /league/team/matchup that, in addition to the above parameters, also gets the team id to return the team roster.
We can invoke the API using the httr package and process the response json using jsonlite.
1
2
3
4
5
6
7
8
# Storing the Access Token and League ID locally# I use a yalm file to avoid hard-code them # or eventually version them in the GitHub :)library(yaml)config<-yaml.load_file("../../config/config.yml")leagueId<-config$leagueIdauthToken<-config$authToken
1
2
3
4
5
6
7
8
9
10
# invoking the APIlibrary(httr)library(glue)# to easily replace vars in the url# league/matchups urlurl<-"http://api.fantasy.nfl.com/v1/league/matchups?leagueId={leagueId}&week={week}&format=json&authToken={authToken}"week<-5# call the apiresp<-httr::GET(glue(url))
1
2
# Is it ok?resp$status_code
1
## [1] 200
Once the call response is obtained, we treat the return * json * to organize the data and obtain the team rosters.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
library(jsonlite)library(tidyverse)library(kableExtra)# to convert the json in a "tabular-tibble form"resp%>%httr::content(as="text")%>%fromJSON(simplifyDataFrame=T)%$%leagues%$%matchups%>%.[[1]]%>%jsonlite::flatten()%>%as.tibble()->matchupsmatchups%>%select(awayTeam.id,awayTeam.name,homeTeam.name,homeTeam.id)%>%kable()%>%kable_styling(bootstrap_options="striped",full_width=F,font_size=11)
awayTeam.id
awayTeam.name
homeTeam.name
homeTeam.id
1
Change Robots
Rio Claro Pfeiferians
6
7
NJ's Bugre
Sorocaba Steelers
5
11
Campinas Giants
Amparo Bikers
4
2
Sorocaba Wild Mules
Indaiatuba Riders
3
We make new calls to the API to get the roster of each team in that week.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
# for each teamIds in the matchupc(matchups$awayTeam.id)%>%map(function(.teamId,.week,.leagueId,.authToken,.url){# make the API callhttr::GET(glue(.url))%>%httr::content(as="text")%>%fromJSON(simplifyDataFrame=T)%>%# transform response body in jsonreturn()},.week=week,.leagueId=leagueId,.authToken=authToken,.url="http://api.fantasy.nfl.com/v1/league/team/matchup?leagueId={.leagueId}&teamId={.teamId}&week={.week}&authToken={.authToken}&format=json")->rosters.json
1
2
# this is a list with the team rosters used in this weekrosters.json[[1]]$leagues$matchup$homeTeam$name
# auxiliar transformation to extract team rosterextractTeam<-. %>%.$players%>%.[[1]]%>%select(src_id=id,name,position,rosterSlot,fantasyPts)%>%jsonlite::flatten()%>%as.tibble()%>%select(-fantasyPts.week.season,-fantasyPts.week.week)%>%rename(points=fantasyPts.week.pts)%>%mutate(src_id=as.integer(src_id),points=as.numeric(points))# extract each rosterrosters.json%>%map(function(.json){matchup<-.json$leagues$matchuptibble(home.teamId=as.integer(matchup$homeTeam$id),home.name=matchup$homeTeam$name,home.logo=matchup$homeTeam$logoUrl,home.pts=as.numeric(matchup$homeTeam$pts),home.roster=list(extractTeam(matchup$homeTeam)),away.teamId=as.integer(matchup$awayTeam$id),away.name=matchup$awayTeam$name,away.logo=matchup$awayTeam$logoUrl,away.pts=as.numeric(matchup$awayTeam$pts),away.roster=list(extractTeam(matchup$awayTeam)))%>%return()})%>%bind_rows()->matchups.rosters# check the matchups QBs for each team matchups.rosters%>%mutate(away.qb=map(away.roster,function(roster)roster%>%filter(rosterSlot=="QB")),home.qb=map(home.roster,function(roster)roster%>%filter(rosterSlot=="QB")))%>%unnest(away.qb,home.qb,.sep=".")%>%select(away.team=away.name,away.qb.name,home.qb.name,home.team=home.name)%>%kable()%>%kable_styling(bootstrap_options="striped",full_width=F,font_size=11)
away.team
away.qb.name
home.qb.name
home.team
Change Robots
Aaron Rodgers
Patrick Mahomes
Rio Claro Pfeiferians
NJ's Bugre
Russell Wilson
Ben Roethlisberger
Sorocaba Steelers
Campinas Giants
Tom Brady
Drew Brees
Amparo Bikers
Sorocaba Wild Mules
Matt Ryan
Cam Newton
Indaiatuba Riders
Now we have a tibble with the games between the teams and, nested in each registry, the respective rosters. Now you will need to use ffanalytis package to get the prediction performance and score of each player.
Forecast players perform
Firstly, we will use the ffanalytics package to do the data scraping of the forecasts for each player in the league made by the main sites that follow and make this type of prediction.
The scrape_data function returns a list by position, with the performance projections of the players in that position. This is because the predictions for each position have different attributes, for example, Kickers are evaluated by the number of field goals and distances of the kicks, and Quaterbacks by the numbers and distances of the passes.
Secondly, with projections in hand, we use ffanalytics package again to calculate how many points each player will make according with each prediction scraped from the sites. However, the package does not export the function that does the this individual calculation, but it is a necessary step to calculate the projections table that the site uses in its graphics.
But the package project is in the GitHub, so, it is possible to download the code, load the scripts directly and access the function that calculates the points per player and projection site. The function is called source_points(), and is present in the script calc_projections.R. You can load the script (and its dependencies) to invoke it directly.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
# function to access 'source_points' directlyplayerPointsProjections<-function(.scrap,.score_rules){source("../ffanalytics/R/calc_projections.R")source("../ffanalytics/R/stats_aggregation.R")source("../ffanalytics/R/source_classes.R")source("../ffanalytics/R/custom_scoring.R")source("../ffanalytics/R/scoring_rules.R")source("../ffanalytics/R/make_scoring.R")source("../ffanalytics/R/recode_vars.R")source("../ffanalytics/R/impute_funcs.R")source_points(.scrap,.score_rules)}# customized scoring rulessource("./score_settings.R")players.points<-playerPointsProjections(scrap,dudes.score.settings)
pos
data_src
id
points
K
CBS
8359
10
K
CBS
12956
9
K
CBS
11936
9
K
CBS
6789
9
K
CBS
8930
9
K
CBS
11384
9
Merging Rosters and Predictions
We now have the teams rosters and the scoring projections of the sites for each player, so we need to join the datasets. But to do that it is necessary to match the players' ids. If you notice the data displayed, each player’s ID is different on each of the sites, ffanalytics package names this id as src_id, but unifies the results to a unified, identificator named id.
The teams' rosters were obtained from the fantasy site, it follows the src_id identification of the NFL, to make the merge between the two dataset it will be necessary to map the src_id of the NFL to id of ffanalytics package. We can extract this ‘ids’ mapping from NFL prediction scraped data:
1
2
3
4
5
6
7
# look the presence of both ids in the projection tablescrap$WR%>%filter(data_src=="NFL")%>%select(1:4)%>%head()%>%kable()%>%kable_styling(bootstrap_options="striped",full_width=F,font_size=11)
data_src
id
src_id
player
NFL
11675
2543495
Davante Adams
NFL
12181
2552600
Nelson Agholor
NFL
10651
2530660
Kamar Aiken
NFL
11222
2540154
Keenan Allen
NFL
9308
2649
Danny Amendola
NFL
12930
2556462
Robby Anderson
1
2
3
4
5
6
7
8
9
10
# extracting id and src_id from all positionsscrap%>%map(function(dft){dft%>%filter(data_src=="NFL")%>%select(id,src_id,player,team,pos)%>%return()})%>%bind_rows()%>%distinct()->players.ids
1
2
3
4
# ID mappinghead(players.ids)%>%kable()%>%kable_styling(bootstrap_options="striped",full_width=F,font_size=11)
id
src_id
player
team
pos
13589
2560955
Josh Allen
BUF
QB
13125
2557922
C.J. Beathard
SF
QB
11642
2543477
Blake Bortles
JAX
QB
9817
497095
Sam Bradford
ARI
QB
5848
2504211
Tom Brady
NE
QB
4925
2504775
Drew Brees
NO
QB
Finally we can make the predictions merging of players to the team’s rankings.
# nest by "id" and merge with "src_id"players.points%>%select(-pos)%>%group_by(id)%>%nest(.key="points.range")%>%# merge ID with SRC_IDinner_join(players.ids,by=c("id"))%>%select(id,src_id,player,pos,team,points.range)%>%# keep only "ids" at top levelselect(id,src_id,points.range)->players.ids.points# auxiliary function to merge roster with player pointsmergePoints<-function(.roster,.points){.roster%>%left_join(.points,by="src_id")%>%return()}# merge points in rostersmatchups.rosters%>%mutate(home.roster=map(home.roster,mergePoints,.points=players.ids.points),away.roster=map(away.roster,mergePoints,.points=players.ids.points))->matchups.points
Note that we are using a structure of nested data.frames, i.e., we have a matchups data.frame where each line is a match. In each match there are two rosters columns (“home” and “visitor”), these columns hold another data.frame, containing the roster itself. In this data.frame, each line is a player, and for each player there is a column called points.range which also contains another data.frame, with the prediction of each site’s player scores.
1
2
3
# "father" dataframe and the first nested columnmatchups.points%>%select(home.name,home.roster)
1
2
3
4
5
6
7
## # A tibble: 4 x 2
## home.name home.roster
## <chr> <list>
## 1 Rio Claro Pfeiferians <tibble [15 x 7]>
## 2 Sorocaba Steelers <tibble [15 x 7]>
## 3 Amparo Bikers <tibble [15 x 7]>
## 4 Indaiatuba Riders <tibble [15 x 7]>
1
2
# seeing the first nested data.framematchups.points[1,]$home.roster[[1]]
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
## # A tibble: 15 x 7
## src_id name position rosterSlot points id points.range
## <int> <chr> <chr> <chr> <dbl> <int> <list>
## 1 2558125 Patrick Mahomes QB QB 15.8 13116 <tibble [9 x 2]>
## 2 2507164 Adrian Peterson RB RB 4.2 8658 <tibble [9 x 2]>
## 3 2543773 James White RB RB 13.7 11747 <tibble [9 x 2]>
## 4 2508061 Antonio Brown WR WR 22.1 9988 <tibble [9 x 2]>
## 5 2556370 Michael Thomas WR WR 7.4 12652 <tibble [9 x 2]>
## 6 2558266 George Kittle TE TE 8.3 13299 <tibble [9 x 2]>
## 7 2555430 Alex Collins RB W/R 6.6 12628 <tibble [8 x 2]>
## 8 1581 DeSean Jackson WR BN 0 9075 <tibble [4 x 2]>
## 9 2540158 Zach Ertz TE BN 17 11247 <tibble [10 x 2]>
## 10 2540160 Jordan Reed TE BN 2.1 11248 <tibble [9 x 2]>
## 11 2558063 Deshaun Watson QB BN 21 13113 <tibble [9 x 2]>
## 12 2558865 Chris Carson RB BN 12.7 13364 <tibble [8 x 2]>
## 13 2559169 Austin Ekeler RB BN 11.9 13404 <tibble [6 x 2]>
## 14 2507232 Mason Crosby K K 3 8742 <tibble [10 x 2]>
## 15 100011 Green Bay Packers DEF DEF 2 523 <tibble [8 x 2]>
1
2
# look the second level dataframematchups.points[1,]$home.roster[[1]][1,]$points.range[[1]]
Nested data.frames is a convenient model because it allows you to keep the data together and manipulate them easily.
Monte Carlo Simulation
To simulate the result of round matches, we need to simulate the score obtained by each teams and for this we will simulate the score of the team members using Monte Carlo simulation.
To simulate the players' scores we will consider that each of the players can make one of the scores projected by the forecast sites. For simplicity, in this post, we can assume that the odds are equal for any of the projected scores.
In this case the simulation of a match using Monte Carlo then consists in:
For each player of the team, draws one of the possible projected numbers
We sum the players' points drawed: this will be the team score
Compare the score of the home team with the away team to determine who won
A win is computed for the team with the highest score
This procedure is repeated N times, simulating several matchs, to determine the chances of winning a team, we sum the total number of times in which the team was a winner and divide by the total number of simulations. Thus we will have the chances of each team winning the match, once the simulations reflect the numerous combinations of scores between players and their teams.
Note that we assume that each player has equal chance of having any of the projected scores as a simulation score, more sophisticated models could consider different ranges with different probabilities between projections, including assessing the performance history of the site, but I’ll leave this considerations to another future post.
### Auxiliary functions# function to generate .n possible pontuations from .points.range# it's used to generate the .n simulations to each playersimPlayer<-function(.points.range,.n){# just check if the points.range isn't emptyif(is.null(.points.range))return(vector(mode="numeric",.n))if(nrow(.points.range)==0)return(vector(mode="numeric",.n))# generate a .n vector samples from points.range.points.range$points%>%sample(size=.n,replace=T)%>%return()}# function to add the player pontuation to the team roster dataframesimTeam<-function(.roster,.n){.roster%>%mutate(sim.player=map(points.range,simPlayer,.n=.n))%>%return()}# this function is in charge to sum the pontuations from # each player to generate the .n-size vector with team pontuationsimTeamPoints<-function(.roster){.roster%>%filter(rosterSlot!="BN")%>%# exclude player in benchpull(sim.player)%>%# get the player pontuation simulationbind_cols()%>%# binds the pontuation toghether as.matrix()%>%# now we have an matrix with # players x # .n simulationsrowSums(na.rm=T)%>%# sum each row (simuilation) to get a .n-vector return()# each position in this vector is a possible team pontuation}### Simulation Code# number of simulationsn<-2000# in the matchups dataframematchups.points%>%mutate(# just team nicknames to shorter legends :)away.nickname=gsub("([a-zA-Z\']+ )?","",away.name),home.nickname=gsub("([a-zA-Z\']+ )?","",home.name))%>%mutate(home.roster=map(home.roster,simTeam,.n=n),# add players simulation pointsaway.roster=map(away.roster,simTeam,.n=n),# to each rosterhome.sim.pts=map(home.roster,simTeamPoints),# computes the team simulationaway.sim.pts=map(away.roster,simTeamPoints)# points)%>%mutate(home.win=map2(home.sim.pts,away.sim.pts,function(.x,.y)(.x>.y)),# computes the away.win=map(home.win,function(.x)(!.x)),# number of victures of each teamhome.win.prob=map_dbl(home.win,mean,na.rm=T),# the % of victoriesaway.win.prob=map_dbl(away.win,mean,na.rm=T)# the % of victories)%>%mutate(# this calculate the difference of score points in each simulationscore.diff=map2(home.sim.pts,away.sim.pts,function(.x,.y){.x-.y}))->simulation
Now we have a pontuation curve for each player in the roster and also the pontuation curve of each team, let’s see what are the results.
Simulation Results
Let’s compare the difference of score for each match in league, the difference of score will allow us to calculate the chances of victory that each team has, according to the amount of “winning” simulations.
# return a summary as a tibblesummaryAsTibble<-. %>%summary()%>%as.list()%>%as.tibble()# first, lets build team simulation summaryc("home","away")%>%map(function(.prefix,.matchups.sim){.matchups.sim%>%select(starts_with(.prefix))%>%set_names(gsub(pattern=paste0(.prefix,"\\."),replacement="",x=names(.)))%>%mutate(points=map(sim.pts,summaryAsTibble))%>%select(-roster,-win)%>%unnest(points,.sep=".")%>%return()},.matchups.sim=simulation)%>%bind_rows()->sim.results# visualizing the summary sim.results%>%select(nickname,win.prob,points=points.Median)%>%mutate(win.prob=win.prob*100)%>%mutate_at(2:3,round,digits=1)%>%arrange(desc(points))%>%kable()%>%kable_styling(bootstrap_options="striped",full_width=F,font_size=11)
nickname
win.prob
points
Mules
92.2
106.6
Steelers
92.2
105.9
Robots
77.5
104.8
Giants
68.0
104.0
Bikers
32.0
99.7
Pfeiferians
22.5
98.0
Riders
7.8
94.8
Bugre
7.8
92.4
We can see the points scored and the chance of victory (win.prob). We used the median of the distribution as the best projected score (the one who divides the simulated score by 50% chance). How “safe” is the projected score? We need to visualize the distribution of possible scores to get a better view of the certainty of the projected score.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
# lets plot the points distribution from simulationlibrary(tidybayes)# stat_intervalhsim.results%>%select(nickname,med.pts=points.Median,sim.pts)%>%mutate(nickname=as.factor(nickname),sim.pts=map(sim.pts,base::sample,size=40)# just to reduce de number of point to be ploted)%>%unnest(sim.pts)%>%ggplot(aes(y=reorder(nickname,med.pts)))+stat_intervalh(aes(x=sim.pts),.width=c(seq(.05,.95,.1)))+scale_color_brewer()+geom_point(aes(x=sim.pts),alpha=.1)+theme_minimal()+ylab("teams")+xlab("points")+theme(legend.position="none")
Showing the distribution of points instead of just the most probable score it is possible to see more details about the possibles performances of a team. The same can be visualized with the chances of victory, instead of just counting the number of times that the simulations of the matches point to the victory of a team, we can visualize the distribution of the difference in the score in each game, generating a curve of probability for each possible outcome.
We have seen that it is possible to use NFL players' performance projections, available on various websites, to calculate fantasy scores and to simulate, using Monte Carlo, the outcame of a league games. More sophisticated simulation models can be used, taking into account the historical distribution of the accuracy of the estimates of these sites to calculate a greater number of results possibilities.
Today, in my league, before the start of the round, after waivers and the lineups, I I send a dashboard (made using RMarkdown and [Flexdashboard](https: / /rmarkdown.rstudio.com/flexdashboard/using.html)) to members with simulation results and the performance of their rosters. You can see an example of it here: http://rpubs.com/gsposito/ffsimulationDudes]. As an evolution, in the future, I may tranform this in to a ShinyApp to members be abble to simulate several different rosters combinations to choose the most promising one.
Prediction Evaluation
Before concluding it is worth comparing the simulation made with the actual scores, and evaluating how much the simulation projected came close to the obtained real result.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
# comparing simulated values with real valuessimulation%>%mutate(away.win.real=away.pts>home.pts,home.win.real=home.pts>away.pts,score.diff.real=home.pts-away.pts,away.sim.pts=map_dbl(away.sim.pts,median,na.rm=T),home.sim.pts=map_dbl(home.sim.pts,median,na.rm=T),score.diff=map_dbl(score.diff,median,na.rm=T))%>%mutate_at(vars(away.win.prob,home.win.prob),function(x)round(100*x,2))%>%select(away.nickname,away.win.prob,away.win.real,away.sim.pts,away.pts,score.diff,score.diff.real,home.pts,home.sim.pts,home.win.real,home.win.prob,home.nickname)%>%mutate_if(is.numeric,round,digits=1)%>%arrange(away.nickname)%>%kable()%>%kable_styling(bootstrap_options="striped",full_width=F,font_size=11)
away.nickname
away.win.prob
away.win.real
away.sim.pts
away.pts
score.diff
score.diff.real
home.pts
home.sim.pts
home.win.real
home.win.prob
home.nickname
Bugre
7.8
FALSE
92.4
112.6
13.3
13.7
126.3
105.9
TRUE
92.2
Steelers
Giants
68.0
TRUE
104.0
108.7
-4.2
-11.5
97.2
99.7
FALSE
32.0
Bikers
Mules
92.2
TRUE
106.6
98.4
-12.0
-8.3
90.1
94.8
FALSE
7.8
Riders
Robots
77.5
TRUE
104.8
113.6
-6.7
-30.4
83.1
98.0
FALSE
22.5
Pfeiferians
This was good! The simulated results were satisfactorily close to those obtained in 3 of the 4 games. All victories and defeats were correctly predicted. Only one of the games got a score difference far away from the one projected.
Perhaps the reason why the difference in scoring in the game between Robots and Pfeiferians has fallen so far from the most likely is also by such an unlikely event in the Packers' game against the Lions. Here’s the lineup of the house team, the one who lost:
1
2
3
4
5
6
7
# rosted home teamsimulation[1,]$home.roster[[1]]%>%filter(rosterSlot!="BN")%>%mutate(points.sim=map_dbl(sim.player,median,na.rm=T))%>%select(name,position,points)%>%kable()%>%kable_styling(bootstrap_options="striped",full_width=F,font_size=11)
name
position
points
Patrick Mahomes
QB
15.82
Adrian Peterson
RB
4.20
James White
RB
13.70
Antonio Brown
WR
22.10
Michael Thomas
WR
7.40
George Kittle
TE
8.30
Alex Collins
RB
6.60
Mason Crosby
K
3.00
Green Bay Packers
DEF
2.00
In this game, Mason Crosby, Packers' Kicker missed 4 fields goals and 1 extra point, with a total of 13 points, an event rare, which has not happened since 1997. If Crosby had hit the shots, which he habitually does, the score difference would be only 10 points away from the predicted score, not 23!