Saturday, 4 March 2017

Analyzing Jeopardy in R – The College Championship effect.



How much easier or harder is the Jeopardy! College Championship than regular Jeopardy!? In this tournament, one undergraduate student from each of 15 U.S. post-secondary schools competes in a series of elimination rounds. The intended audience for the categories is different from that of regular Jeopardy! shows. Some of the clues referred to new and popular video games, and to neologisms like 'woke'. To me, the College Championship questions were qualitatively easier, but for the sake of tracking, I want to measure that. The following method will also work if you find the College Championship questions harder than those from the regular show.

Here is a chart of my personal 'Coryat scores' since I started recording them on January 10th. Coryat scores, as found here http://www.pisspoor.com/jep.html , are a means of standardizing Jeopardy scores for home viewers. One rule of Coryat scores is that Daily Doubles are not wagered, but are treated as regular clues that a home viewer may guess on without a penalty for an incorrect answer.
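
As a rough illustration of the Daily Double rule, here is a minimal sketch of tallying a Coryat score from a per-clue record. The function name coryat_score() and the columns value, result and daily_double are made up for this example, and the clue data is invented. It also assumes that a wrong answer on a regular clue is deducted at face value, which is not stated above.

```r
# Hypothetical helper (not part of the attached analysis code).
# value:        face value of each clue
# result:       "right", "wrong" or "pass" for each clue
# daily_double: TRUE if the clue was a Daily Double
coryat_score <- function(value, result, daily_double) {
  # A correct answer always earns the face value of the clue.
  gain <- ifelse(result == "right", value, 0)
  # Assumption: a wrong answer costs the face value, except on Daily Doubles.
  loss <- ifelse(result == "wrong" & !daily_double, value, 0)
  sum(gain - loss)
}

# Invented clue record, for illustration only.
clues <- data.frame(
  value        = c(200, 400, 800, 1000, 600),
  result       = c("right", "wrong", "right", "wrong", "pass"),
  daily_double = c(FALSE, FALSE, TRUE, TRUE, FALSE)
)
example_score <- coryat_score(clues$value, clues$result, clues$daily_double)
```

The missed 1000-point Daily Double costs nothing here, while the missed 400-point regular clue is deducted.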




The dotted lines in the chart are at scores of 24,000 and 28,000, which Karl Coryat considers the scores needed to be 'test ready' and 'show ready' respectively. Filled circles and triangles represent days on which I got the correct Final Jeopardy answer, and open shapes represent days on which I missed it. Circles represent regular show days, including Saturday re-runs. Triangles represent College Championship days.

The red curve represents an estimate of my average regular show Coryat at my current skill level. In order to make this estimate without ignoring 10 of my data points, I had to separate the 'effect' of the College Championship, which I assumed adds a flat amount to my scores. I applied this assumption to three regression models:

1. Score = α + β(CC game) + γ(Days played) + error

2. Score = α + β(CC game) + γ(sqrt(Days played)) + error

3. Score = α + β(CC game) + γ(exp(Days played/(Total Days)*log(0.70))) + error

In Model 1, I assume that improvement in mean scores is constant with each passing day.

In Model 2, I assume that improvement in mean scores diminishes with each passing day, but that it continues indefinitely (e.g. it will take four times as long to make twice the progress).

In Model 3, I assume that I am improving but that there is some limit to my progress, that I started at about 28% of that limit, and that I am already at 70% of it. Under this model, my mean score will rise towards some maximum, with the remaining gap shrinking exponentially, and stay there.
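
A minimal sketch of how these three models could be fit with lm() follows. It uses simulated scores rather than my real data, and the column names days, cc and score are assumptions for this example; the attached analysis code may be organized differently.

```r
# Simulated data for illustration only: 38 days, 10 of them College Championship.
set.seed(1)
n <- 38
sim <- data.frame(days = 1:n, cc = as.integer(1:n %in% 20:29))
sim$score <- 12000 + 120 * sim$days + 4500 * sim$cc + rnorm(n, sd = 3000)

# Model 3 covariate: exp(Days played / Total days * log(0.70)).
potential_reached <- 0.70
total_days <- max(sim$days)
sim$decay <- exp(sim$days / total_days * log(potential_reached))

models <- list(
  linear = lm(score ~ cc + days, data = sim),        # Model 1
  sqrt   = lm(score ~ cc + sqrt(days), data = sim),  # Model 2
  limit  = lm(score ~ cc + decay, data = sim)        # Model 3
)

# beta, the estimated College Championship effect, under each model.
cc_effects <- sapply(models, function(m) coef(m)[["cc"]])
```

The decay covariate falls from just under 1 on the first day to 0.70 on the most recent day, so a negative γ in Model 3 corresponds to improvement.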

Each model applies a fixed additive effect for playing a College Championship game, symbolized by β. I don't know which of these models is the most correct, so instead I take an ensemble estimate using all three. The ensemble estimate is the weighted average of the three estimates of β, using weights inversely proportional to the standard error of each estimate. That is, if a model had more uncertainty about the size of the CC effect, its estimate was used less in the ensemble. Each model gave a standard error of 3000-3500 points on an estimate of 4500-5000, which means that all three models agree closely and that they are considered with near equal weight. Also, those standard errors are so large that I can't claim with confidence that there even is an effect of the College Championship on scores, but that's mostly because I'm merely one at-home player with fewer than 40 days' worth of data.
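
The weighting scheme can be sketched in a few lines. The function name ensemble_estimate() is made up for this example, and the numbers passed to it are placeholders chosen from the ranges quoted above, not the actual output of any of the three models.

```r
# Hypothetical helper: average estimates with weights inversely
# proportional to their standard errors.
ensemble_estimate <- function(estimates, std_errors) {
  weights <- (1 / std_errors) / sum(1 / std_errors)  # weights sum to 1
  list(weights = weights, estimate = sum(weights * estimates))
}

# Placeholder inputs for illustration only.
ens <- ensemble_estimate(
  estimates  = c(4500, 4750, 5000),
  std_errors = c(3000, 3250, 3500)
)
ens$weights   # the estimate with the smallest standard error gets the largest weight
ens$estimate  # lies between the smallest and largest input estimates
```

With standard errors this similar, the weights all come out close to one third, which is why the three models end up being considered almost equally.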

From my personal data, the College Championship adds 4282 points to my score.

Each model can also predict/fit the Coryat score from the data set for any number of 'days played' and for either game type, regular show or College Championship. The red curve is constructed by plugging the appropriate number of days played and a 'regular show' game type into each model, and getting an ensemble estimate. Similar to the ensemble estimate for the CC effect, this is a weighted mean of the fitted values, with weights inversely proportional to the RMSE (root mean square error) of each model. In short, models that fit the observed data better are weighted more heavily. However, every model was weighted almost equally and had a similar estimate for my regular show mean Coryat on the most recent day.
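
Here is a minimal sketch of how such a curve could be built with predict(), again on simulated scores and with assumed column names (days, cc, score, decay). It is meant to show the idea, not to reproduce the attached analysis code.

```r
# Simulated data for illustration only.
set.seed(2)
n <- 38
sim <- data.frame(days = 1:n, cc = as.integer(1:n %in% 20:29))
sim$score <- 12000 + 120 * sim$days + 4500 * sim$cc + rnorm(n, sd = 3000)
sim$decay <- exp(sim$days / n * log(0.70))

models <- list(
  lm(score ~ cc + days, data = sim),
  lm(score ~ cc + sqrt(days), data = sim),
  lm(score ~ cc + decay, data = sim)
)

# Treat every day as a regular show, then get each model's fitted values.
regular <- transform(sim, cc = 0)
fits <- sapply(models, predict, newdata = regular)  # n rows, one column per model

# Weights inversely proportional to each model's RMSE on the observed data.
rmse <- sapply(models, function(m) sqrt(mean(residuals(m)^2)))
w <- (1 / rmse) / sum(1 / rmse)

# Ensemble curve: weighted mean of the fitted values on each day.
ensemble_curve <- as.vector(fits %*% w)
current_mean <- ensemble_curve[n]  # regular-show estimate on the most recent day
```

The last element of ensemble_curve plays the role of the 'current' regular show average reported below.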

From my personal data, I score an average of 16,957 points on a regular show.

You can use the attached code and data to recreate this graph, and you can put your own data in the format of the attached .csv file to apply the analysis to your own Coryat scores.

Download Analysis Code

Download Frontend Code

Download my Jeopardy! data

 
You need only apply your own file names and directories, copy/paste the code from the 'frontend' file into the R console, and execute it.

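# Folder that holds the analysis code and the data
setwd("C:/Users/Jack/Desktop/Projects 2017")
# as.is=TRUE keeps text columns as character strings rather than factors
jep = read.csv("JackJep 2017-03-03.csv", as.is=TRUE)

# Define jeopardy_analysis(), then run it on the data with the player name "Jack"
source("Jeopardy analysis code.txt")
jeopardy_analysis(jep,"Jack")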

In this code, setwd() sets the working directory to the folder that holds the analysis code and the data to be analyzed. The read.csv() command loads the CSV file into R, and 'jep' is the name given to that dataset. The command source() tells R to run the code in the named file, which in this case is just the definition of the function jeopardy_analysis().

The function jeopardy_analysis() estimates the college championship effect, but only if there are any college championship games in the dataset. It estimates the current Coryat score of the player in the dataset 'jep'. It also creates a graph like the one you see above, using the name specified in the second argument.

Other optional arguments that you can use in the jeopardy_analysis() function include:

- potential_reached: Numeric. The proportion of your score potential you have reached. Used for model tuning. Defaults to 70%.
- thresholds: Vector of numeric values. Determines the vertical location of the dotted lines, if any, on the graph. Choose NULL for no thresholds. Defaults to c(24000,28000). More or fewer than 2 lines can be used.
- threshold_names: Vector of strings. Self explanatory. Ideally should be the same number of values as thresholds. Defaults to c("test ready","show ready")
- verbose: Boolean. Toggles the printing of additional information about each model, and their weights in each ensemble. Defaults to FALSE.
- makeplot: Boolean. Toggles the creation of the plot. Defaults to TRUE.


If there is sufficient interest, I will publish updated data and analysis code after the Tournament of Champions, in which the best 15 Jeopardy contestants of the year compete, and which has more difficult clues than the regular show. This update would estimate the effect of both tournaments, as well as employ a wider ensemble of models.

Also, if other people are willing to share their data, I can compute a better estimate of the College Championship effect, as well as test whether the effect is non-additive. For example, does it raise a 10,000 Coryat player's score by more than it raises a 30,000 Coryat player's score? Does it reduce scores for certain demographics?