“Baseball was my goal from a young age, but gambling had significantly fewer barriers to entry.”
– James Holzhauer, gambler and 33-time Jeopardy winner
- The NFL is the major US sports league whose game and season outcomes have the most potential to be influenced by individual officiating actions;
- The game actions which result in penalties in the NFL occur often enough that they could be called on nearly every play, creating significant room for ‘legitimate’ calls that would change betting outcomes;
- Media narratives for the NFL increasingly frame the sport as being about betting and gambling outcomes;
- As this has taken place, NFL media narratives are also increasingly framing outcomes as being about officiating actions; and
- We think this shift in common knowledge could change the league’s control over its own narrative.
The NFL – by far the most valuable sports league in the world – has a problem.
No, the problem isn’t Tony Corrente twerking into Bears linebacker Cassius Marsh before flamboyantly taunting him with a penalty flag for – you guessed it – taunting. OK, it is that, but not just that.
After all, complaining about officials is a pastime for every global pastime. In Germany, fans complain about Italian footballers flopping after a light breeze. Everywhere except wherever James Harden plays, fans complain about that drive-into-the-defender-and-throw-up-some-garbage-jumpshot nonsense that was ruining the game of basketball for years until they finally changed the rules this season. In every city in North America, fans complain about Ron Kulpa being assigned to their games and protest Angel Hernandez being allowed within a 30-mile radius of their stadium. And in every major metropolitan area on the planet other than greater Pittsburgh where people watch American football, they tuned in to Monday Night Football last week and left muttering something about “the fix” being in.
But while the tendency of fans to cope by blaming officiating of games is the same, the NFL is different. How is it different?
- It is different because of the relative importance of each game outcome to the season outcome;
- It is different because of the relative impact of each adjudicated play on game outcome;
- It is different because of the preponderance of subjective vs. objective rules adjudication; and
- It is different because of the frequency of potential rules violations taking place on essentially every snap.
None of this is new. But the narrative environment surrounding these differences in 2021 IS new. Before we explore that, however, let’s do a bit more work to establish what we mean by the four premises above.
The Importance of Game Outcome to Season Outcome
The most important reason the NFL is different is also the most obvious: No major professional team sport plays games with such an impact on season outcomes. MLB plays almost ten times as many games with a best-of-multiple-games playoff format. The NBA plays more than 4 times as many games, also with a best-of playoff format. Ditto the NHL, but at about 3 times as many. The EPL plays twice as many matches. Serie A, La Liga and the Bundesliga are basically the same as the EPL. At 23 matches, Aussie rules is close. But at some risk of offending our friends down under, we observe that a single team, the Dallas Cowboys, generates nearly as much revenue at the stadium alone during 8 home games as the entire AFL does for a full season.
And while trying to create analogs for ‘plays’ and adjudication events in soccer or hockey (or Formula One, for that matter) would be fun, we’re ultimately making an argument about US sports media narratives. I won’t insult your intelligence by calling the NCAA an amateur league, so if you want to apply just about everything said about the NFL here to NCAA College Football, be my guest. Likewise, the NHL has some interesting storylines (e.g. expanded betting in Canada, allegations of players betting on games), but it isn’t really at the scale of MLB, the NFL or the NBA. We’re going to focus on those three leagues.
And among those leagues, with 17 regular season games and a single-elimination tournament, the NFL is different first because the outcome of each game matters more to the outcome of each season.
The Impact of Adjudicated Plays on Game Outcome
The NFL is also different because the outcome of each play subject to officiating action matters more to the outcome of each game, both in the real world and in the betting world.
Now, the importance of each adjudicated play can be looked at in a couple ways. The first is the expected impact of the potential action (or inaction) of an official by generalizing the range of potential subsequent outcomes conditional on that officiating action. The second is the potential extreme impact of that officiating action in certain subsequent scenarios that it makes possible. In other words, we can estimate the effect of a bad call based on the probabilities of various certain and uncertain outcomes that will follow it. In some cases, however, the consequences of a bad call may have variance or a long tail that extends far outside of our mean expectation.
The easiest place to start is by looking at how many plays take place in a given game. An average NBA game includes about 200-210 possessions. An average MLB game includes roughly 300 pitches. An average NFL game ran about 128 snaps in 2020. While we won’t be exploring them explicitly, I’d point out that as continuous-play sports without discrete plays, both soccer and hockey are admittedly difficult comps. If a ‘play’ is a ‘shot’, then most soccer games will average 20-30 depending on the pace of the teams in question, while most hockey games will skew more towards 30. But a shot is not the only kind of play, and the success of passes (of which there are hundreds), clearances (dozens) and tackles/checks (a handful) are all part of the determination of game outcomes. If we were to include soccer in this review, I’d probably argue that the 180 or so discrete possessions in the average EPL match are the closest comparable to an NFL snap from a mean-impact-of-adjudication perspective.
Now, it obviously isn’t as simple as saying that the NFL has fewer plays and that each play thus has a greater expected weight on game (and betting) outcomes. Differences in scoring mechanics mean that certain adjudicated plays will have very different influence on scoring and game outcome, even when we are modeling expected impact. The various kinds of penalties available and their outcomes are also quite different across sports.
Let’s start, then, with the simplest example – an NBA game, where both scoring and the result of officiating actions are heavily standardized. While it will vary significantly by player and situation, in most circumstances a shooting foul called by an official has an expected impact of between 0.70 and 1.50 points. Why? Because after taking into account the probabilities associated with 3-point shots, 2-point shots and offensive rebounds, an NBA possession at the point of a shot without a shooting foul is expected to produce about 1.12 points. Note that when the possibility of a steal is still in play, that value is slightly lower. When a shooting foul (or any foul in the bonus) is called, it removes a range of potential outcomes from the possession and replaces them with a new set.
When a shooting foul is called, we replace a shot’s value (basically a simple function of 3-point shooting percentage or 2-point shooting percentage and offensive rebound probability) with a range of outcomes driven by a function of free throw shooting percentage and shooting percentage while fouled, which is lower than typical shooting percentage. Before the outcome is known and without applying any player-specific details, when the whistle blows on a 3-point shot, we can estimate the value at around 2.6 points (based on 2020-2021 season statistics). When the whistle blows on a 2-point shot, we can estimate the value at a little over 1.8 points. The 0.70 and 1.50 values I mentioned above are the differences between what we expected before the whistle and what we expected after it blew.
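The arithmetic behind the 0.70 and 1.50 figures can be sketched in a few lines of Python. The three inputs are simply the rounded values quoted above; the constant and function names are illustrative, not part of any published model.

```python
# Rounded expected-point values quoted in the text (2020-2021 season).
EP_SHOT_NO_FOUL = 1.12  # possession at the point of a shot, no whistle
EP_FOULED_THREE = 2.6   # whistle blows on a 3-point attempt
EP_FOULED_TWO = 1.8     # "a little over 1.8" on a 2-point attempt


def whistle_impact(ep_after_whistle: float, ep_before: float = EP_SHOT_NO_FOUL) -> float:
    """Expected points the offense gains when the shooting foul is called."""
    return round(ep_after_whistle - ep_before, 2)


print(whistle_impact(EP_FOULED_TWO))    # 0.68, i.e. roughly 0.70
print(whistle_impact(EP_FOULED_THREE))  # 1.48, i.e. roughly 1.50
```

Because the inputs are rounded league-wide averages, the outputs land near, not exactly on, the 0.70 and 1.50 values cited.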
As you might expect, the opportunity to draw 3-point foul calls was a magnet for shenanigans over the past several years. Before that time, fouls on those plays were very rare. But for a roughly 5-year window, James Harden, Dame Lillard and to a lesser extent a select group of other long-range shooters developed a technique for intentionally drawing fouls on 3-point plays. The result was a significant rule change for this season aimed at slowing that technique. While we are still (hopefully) in the process of reducing foul farming from beyond the arc, even now the vast majority of shooting fouls come in at the 0.70 expected impact range. The other major class of penalties – those which result in a change of possession – have a roughly 1.05 expected impact, since their function in an alternating possession game can be almost completely reduced to the elimination of the value of a possession, with the attendant possibility of a steal that was eliminated from our prior example.
So how do we put this in context? Well, the average margin of victory in an NBA game in the 2020-2021 season was 12.1 points. The average margin against the closing spread was 11.0 points. By far the two most common officiating actions would have, if called for the losing team, changed the outcome of about 4% of games. They likewise would have changed the winner on roughly 7% of bets against the spread. Even a now-rarer shooting foul on a 3-point shot would only have slightly more impact – making the game outcome figure about 6.5% or so, and the affected bets number about 9%.
Figure: % of 2021 NBA Games by Point Differential and Differential vs. Spread
Because basketball officiating is so bounded by the game’s alternating possession mechanic, it is hard for the impact of any one foul call to be much more than this. Because it involves both a point-scoring opportunity and a switcheroo on the basic alternating possession mechanic, the exceedingly rare flagrant-2 has an expected points impact of around 2.5 and a potential swing of as many as 4-6 points depending on how things play out in reality. An outright ejection has a potentially large but nearly impossible-to-quantify expected and potential impact. The way in which fouls move a player toward disqualification matters too.
But by and large, alternating possessions dominate. The only real deviation under normal circumstances has to do with differences in how each possession starts. That is, it is better to start a possession with a steal than a defensive rebound, and better to start with a defensive rebound than an inbound pass. A foul call might bleed over into the next play somewhat by changing the nature of the possession’s start. That said, based on the data slicing done by NBA stats enthusiast-cum-actuary Mike Beuoy on Inpredictable, a possession that begins on a defensive rebound is only worth about 0.04 points more than a possession that begins with an inbound pass. A steal is worth 0.19 marginal points. So the nature of possession change matters, but not very much in comparison to the effect on the scoring play itself.
The effects of officiating on the NBA are clearly not nothing. For all of the very real exceptions and rare cases above, and because being a fan of professional sports IS being mad about officiating from time to time, NBA fans are as vocal about officiating as fans of other leagues. And there are a decent number of such calls over the course of a game – about 20 per game that resulted in free throws in 2021. But all considered, relative to MLB and the NFL, the expected impact of a single officiating action (or non-action!) in basketball is pretty muted.
Baseball is a bit more complicated, even before we get into how bad calls can go very bad in a game in which possession doesn’t change after you score.
Whereas the expected impact of a single play in basketball is constrained to a small fraction of the margin of victory or margin against the spread, the wrong adjudication of a single baseball play can result in a much larger initial burst of runs or denial of runs rightfully scored. That’s mostly because there is a wider variety of types of scoring plays in baseball. While the expected impact of a basketball foul will vary mostly by the relative field goal percentage vs. free throw percentage of the aggrieved player, in baseball it will also vary significantly not only based on the players involved but as a result of the (1) count, (2) runners on base and (3) outs. More specifically, there are 288 possible base-out-count states, each with a different number of expected runs. If that sounds like Greek, just know that it represents every combination of the possible number of balls (4), strikes (3), outs (3) and baserunner configurations (8) that could define a particular situation (i.e. 4*3*3*8 = 288).
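For readers who would rather count than multiply, here is a minimal Python sketch that enumerates the base-out-count states. The variable names are illustrative; the only assumption is the standard one that each of the three bases is either empty or occupied.

```python
from itertools import product

balls = range(4)    # 0 to 3 balls
strikes = range(3)  # 0 to 2 strikes
outs = range(3)     # 0 to 2 outs
# Each of the three bases is empty (False) or occupied (True): 2**3 = 8 configurations.
bases = list(product((False, True), repeat=3))

# Every combination of count, outs and baserunners is one state.
states = list(product(balls, strikes, outs, bases))
print(len(states))  # 288
```

The highest-leverage state discussed in the text, a full count with nobody out and the bases loaded, is the single entry `(3, 2, 0, (True, True, True))` in that list.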
In terms of the impact on expected runs, the base-out-count scenario that places the highest leverage on officiating action is a full count (i.e. 3 balls and 2 strikes) with zero outs and the bases loaded (i.e. men on all three bases). The version of this scenario with two outs is only about a 20th of a run lower in expected runs, if the twelve-year-old you playing Wiffle ball in your backyard needs validation for his baseball fantasies. In that scenario, an officiating action which wrongly calls a ball a strike or vice versa would have a 1.78 expected run impact (for more on Umpire Runs Created, see the great work of Ethan Singer at FanGraphs). In baseball, that is a remarkable amount. In the 2021 season, for example, a single wrong call against the team losing or missing the spread in that situation would have changed the outcome of at least 28% of games. It would have a gentleman’s chance at the other 18% of games that were decided by 2 runs, as well.
Figure: % of 2021 MLB Games by Run Differential
Obviously this isn’t remotely an apples-to-apples comparison with an average field goal attempt made in basketball. The latter simply doesn’t have the scenario diversity of baseball. Also, a full count, nobody out, bases loaded situation is really rare, and the odds that it would pop up in a game in which an officiating error would change the outcome would be rarer still. About 14% of counts run full, and a bases loaded, no outs situation pops up only about every nine games or so. Furthermore, a baseball umpire may have no opportunity to influence the game at all. All the batter must do is swing to more or less take the home plate umpire out of the game. Except, you know, when they don’t swing and still get called for a swing. Sorry, Giants fans.
Yet we needn’t rely on the extreme scenarios – even in the lowest leverage situations (e.g. 0-0, no outs, nobody on base), the expected run impact of a wrong ball-strike call is a comparable percentage of the average run margin (~3%) as the expected point impact of a wrong call on a jump shot or offensive charge is of the average NBA point margin (~4%). In almost all other base-out-count scenarios, the home plate umpire’s error should be expected to have a larger game impact. There is an objectivity argument to be made about how much he may realistically do so against the more subjective shooting foul or charge call in the NBA, but let’s save that for a later point in our discussion.
The one special thing about baseball relative to both basketball and football, however, is that there is no alternating possession mechanic, or rather, that the alternating possession mechanic is unrelated or inversely related to the act of scoring! When you score in baseball, it is also (except for sacrifices) a possession-sustaining play. Because a bad call hastens the end to or prolongs a theoretically infinite opportunity to score, when we switch from what we expected to what actually happened, there is no end to the tail of consequences from a bad call. Some of this is not so hard to imagine. A true ball four with the bases loaded and two outs that is called a strike may represent an expected swing of 1.74 runs, but scenarios in which the next guy up after the walk would have hit a grand slam are perfectly reasonable to consider. There are absolutely games in the annals of MLB history in which runs scored after the proper end to the inning are 200% or more of the average margin of victory. I believe there are also games NOT in the MLB history books in which the runs scored after a premature end of the inning would have exceeded 200% or more of the average margin of victory.
The average MLB officiating action has moderately higher expected impact than that of the average NBA officiating action. The potential impact of some MLB officiating actions is in an entirely different zip code from that of the most extreme NBA officiating actions.
Football has even more scenario diversity than baseball; it isn’t infinite, but for our purposes it might as well be, given that it is dimensioned by down-and-distance, current yard line and at certain times the time remaining on the clock. Constraining it to down-and-distance and yard line is the framework for expected points used in most traditional football analytics. Interestingly, this terminology and framework actually pre-dates most of the pioneering and more famous work by Bill James on baseball. It comes from a seminal 1971 paper co-authored by the then-quarterback of the Cincinnati Bengals Virgil Carter, along with Northwestern University systems engineering professor Robert Machol.
The basic idea is that every down-and-distance and yard line has an expected points value for the team with possession. A first-and-goal opportunity from the opponent’s one-yard line, for example, yields expected points for the offense that are a shade less than the value of a touchdown and extra point. A 4th and 4 from your own five yard-line, on the other hand, will have negative expected points, which is to say that it is more likely that the defense will be the next team to score. All of this makes it pretty easy to quantify the effect of a non-scoring play: we simply compare the expected points before the play and after the play to determine the change in probability-adjusted scoring potential, or, expected points added (EPA).
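As a rough illustration of the before-and-after comparison just described, here is a minimal Python sketch of the EPA calculation. The expected-points inputs in the usage lines are made-up numbers chosen for easy arithmetic, not output from any real model, and the function name is hypothetical. The one convention assumed is that expected points are always quoted for the team holding the ball, so the sign flips on a change of possession.

```python
def expected_points_added(ep_before: float, ep_after: float,
                          possession_changed: bool = False) -> float:
    """EPA for the team that had the ball before the play.

    ep_before and ep_after are expected points for whichever team has
    possession at that moment, so ep_after is negated on a turnover.
    """
    if possession_changed:
        ep_after = -ep_after
    return ep_after - ep_before


# Hypothetical values, for illustration only.
print(expected_points_added(1.0, 2.5))                           # 1.5
print(expected_points_added(1.0, 2.0, possession_changed=True))  # -3.0
```

The same subtraction works for the counterfactual comparisons used later in this piece: swap the "after" value for the expected points of the situation that would have existed had a flag been thrown, or not thrown.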
Now, because we are interested in both the expected and potential extreme effects of officiating on game outcomes, we will also borrow the term to refer to certain counterfactuals. That is, for our purposes it is not only useful to know how much expected points changed from the beginning of one play to the next, but also between a play’s outcome and the hypothetical outcome of a play if a penalty were called or not called. Since this is not intended to be an exhaustive analysis, we will be more anecdotal in our review of the points on the line in particular play situations, although if it is a subject of interest, I strongly suggest that you follow Ben Baldwin and Sebastian Carl. They are generous with their contributions to open source scraping and processing of NFL data for the broader community of analytics enthusiasts.
The long and short of it is that the ordinary flow of NFL games creates frequent and significant opportunities for a meaningful change in the expected outcome of the game. And to a far greater extent than in MLB or the NBA, officials must make decisions on every play that fall well outside the scope of ‘just’ penalties. NFL referees, umpires and line judges determine ball placement. They determine whether the ball crossed a hypothetical vertical plane. They determine whether feet were in-bounds or out-of-bounds. They determine possession or non-possession. They determine whether the ground caused a fumble. They determine if one of fifty or so different penalties took place. They can issue judgments on the game and play clock. They determine when and where momentum of a player was stopped. They determine if a “football movement” took place. They determine when the process of “giving one’s self up” took place.
But the penalty judgments do matter. A lot.
For example, the average game yields just under three offensive holding penalties. Depending on the circumstance, such a call may produce EPA of almost nothing (e.g. when the team is already backed up in a bad down-and-distance and half-the-distance doesn’t do much) or nearly a touchdown worth of expected points added when it takes points off the board on a very long play from scrimmage. While that isn’t the norm, it is not at all uncommon for an offensive holding call to cost a team 20, 30 or 40% of the margin of victory in the average NFL game. A 3-point swing in the NFL toward the losing team affects about 22% of game outcomes and bets. At 4 points that number gets closer to 30%. Consider, as well, that uncalled iterations of offensive holding on long plays represent every bit as much of a theoretical swing between reality and the counterfactual scenario in which the penalty was called.
The more egregious penalties, of course, have an even sharper impact. It is not at all extreme for Offensive Pass Interference (0.4 / game) to approach 4 or 5 points in EPA, especially because of its tendency to be called when a receiver has otherwise made a very successful play. Defensive Pass Interference (1 / game) finds its mode around 2-3 points, but is also fully capable of swinging up to nearly a touchdown’s worth of EPA. An impact of 2-4 EPA is the norm for Roughing the Passer (0.5 / game). Various other unsportsmanlike conduct penalties contribute a mean of 2 EPA, with an outside chance at 4. Remember, once again, that these are the penalties that were called. We are not accounting for penalties that ought to have been called and weren’t at all.
But in the NFL, penalties are just the beginning of officiating actions. Consider the simple decision about where to spot the ball. The difference in value between a successful conversion of a 4th and short play and a failure will routinely be as much as 4 points, or 36% of the average margin of victory in an NFL game. The traditional EPA from the play will be a bit less, obviously, since expected points decline in 4th down scenarios when the higher probability of a non-conversion is explicitly accounted for. But if we are instead comparing the difference between known conversion and known non-conversion scenarios, the difference in expected points can come in at a meaningfully higher number. Bear in mind that these 4th and short scenarios occur, on average, about twice in a normal game, depending on your definition of “short.”
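To put other swing sizes on the same scale, here is a small back-of-the-envelope sketch in Python. It assumes nothing beyond the "4 points, or 36%" figure above, which implies an average margin of victory of roughly 11 points; the names are illustrative.

```python
# Average margin of victory implied by "4 points, or 36%" (about 11.1 points).
IMPLIED_AVG_MARGIN = 4 / 0.36


def share_of_margin(point_swing: float) -> float:
    """Fraction of the implied average margin represented by a point swing."""
    return point_swing / IMPLIED_AVG_MARGIN


for swing in (2, 3, 4):
    print(f"{swing}-point swing: {share_of_margin(swing):.0%} of the average margin")
```

Since the 36% figure is itself rounded, the implied margin and the resulting shares are approximations only.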
The decision to rule something a catch, to rule something a fumble (or not a fumble!), to rule something an incomplete forward pass or a fumble, to rule someone out of bounds can, when the play (1) results in a turnover, (2) covers a significant portion of the field or (3) converts a deeply unfavorable down-and-distance, routinely produce leverage of 2-4 expected points on game outcomes. Whereas MLB puts massive expected points in the potential hands of officials on very rare occasions, and whereas the NBA puts a comparably small quantity of expected points in the hands of officials on more numerous occasions, the NFL puts the power over 15-30% of the average margin of victory and 20-40% of the average margin against the spread in the hands of officials on nearly every single play. And they positively exercise that power, for better or worse, more than a dozen times in the average game, even ignoring false negatives.
Figure: % of 2021 NFL Games by Point Differential and Differential vs. Spread
As I will continue to say, this has nothing to do with NFL referees being bad or worse than those of other leagues. This has only to do with the structure of the rules and scoring of the games themselves, and the inherent power that those rules vest in the officials, good OR bad. But that’s not the end of it.
You see, not all rules are created equal.
The Importance of Rules Subjectivity to Outcome Determination
Subjectivity in officiating is a bit of a messy and, well, subjective topic, so it’s helpful to think of it in three dimensions:
- Does the observation by an official leave room for human interpretation of whether it warranted a particular ruling?
- If the official could have perfectly observed the event, would there still be material room for human interpretation?
- Do mechanisms for video review mitigate the effects of #1 or #2?
By and large, each of the three leagues is vulnerable to questions on the first dimension. The rules of basketball, football and baseball are enforced by humans observing what they see, and humans don’t always see things clearly when they happen quickly or at a distance.
Things begin to deviate on the second dimension. Variability in the officiating of baseball is almost entirely driven, I would argue, by problems of the first dimension; that is, MLB’s rules are largely objectively written and subjectively judged. The correct strike zone is not in question. The definition of a home run is not in question. What’s more, because the rules themselves leave little room for subjectivity in intent rather than judgment, for an entire class of potential subjective judgment errors (out and safe), the process for review and redress is very likely to lead to resolution.
The NBA sits somewhere in the middle. Like MLB, a significant share of its rules are objectively written (e.g. traveling, goaltending) and subjectively judged. Yet there are areas in which the rules themselves are necessarily subjective even before the layer of human observation. Terms like “a normal step”, “may not crowd out”, “incidental contact”, “if it does not impede…balance and/or rhythm”, “unnecessary and excessive” and “if repeated acts become a travesty” are best efforts to constrain activities in a game in which bodies come into limited contact. What constitutes incidental contact on a shooting foul is not only a question of subjectivity in observation but in the proper interpretation of the rule itself.
The NFL rulebook is on the other side of the spectrum. After a small number of what I would describe as objectively written rules (e.g. False Start, Offside, Encroachment), it includes a ghastly, if perfectly necessary, quantity of subjectively written and subjectively judged rules. Consider, by way of example, the rules for the interpretation of the aforementioned offensive holding, a penalty which results in a loss of 10 yards and whatever yardage was gained and points that were scored on the play on which it was committed.
It is a foul if an offensive blocker:
(c) Use his hands or arms to materially restrict or alter the defender’s path or angle of pursuit. It is a foul regardless of whether the blocker’s hands are inside or outside the frame of the defender’s body. Material restrictions include but are not limited to:
1. grabbing or tackling an opponent;
2. hooking, jerking, twisting, or turning him; or
3. pulling him to the ground
When a defensive player is held by an offensive player during the following situations, Offensive Holding will not be called:
- if the runner is being tackled simultaneously by another defensive player;
- if the runner simultaneously goes out of bounds;
- if a Fair Catch is made simultaneously;
- if the action clearly occurs after a forward pass has been thrown to a receiver beyond the line of scrimmage;
- if the action occurs away from the point of attack and not within close line play;
- if a free kick results in a touchback;
- if a scrimmage kick simultaneously becomes a touchback;
- if the action is part of a double-team block, unless the defender splits the double team, gets to the outside of either blocker, or is taken to the ground; or
- if, during a defensive charge, a defensive player uses a “rip” technique that puts an offensive player in a position that would normally be holding.
Interpreting Offensive Holding isn’t just a matter of a referee observing a play and evaluating what he sees against a standard. It is also a matter of the subjective interpretation of multiple difficult-to-define standards in themselves. Under one interpretation of the term, “grabbing an opponent to restrict…a defender” is literally the action taken by 100% of blockers on 100% of blocking assignments, for example. What does “away from the point of attack and not within close line play” really mean? What is the objective standard for “unless the defender splits the double team?” How far is “to the outside of either blocker?” Does “turning” a defender include moving forward and establishing a seal block? Does a sufficiently strong first punch from a left tackle constitute “jerking” a defender?
The number of rules incorporating such fundamental subjectivity is staggering. Defensive pass interference will be nullified when the pass is tipped at the line, but only when that tip takes place “noticeably before” the interference. As with Offensive Holding, multiple rules use the expansively arbitrary construction of “including but not limited to” to legitimize practically any officiating action. The rule for securing a catch, the infamous Tuck Rule, Intentional Grounding and Roughing the Passer all include language wide enough for the Fridge to run a fullback dive through without taking a hit.
And the rule for a quarterback being “in the grasp” makes each of those look air-tight by comparison.
Then there’s the practical common knowledge about the real-world implementation of rules. Because stiff arms are cool, running backs don’t get called for facemask penalties. Pick routes are somehow OK within some arbitrary limits. Depending on the current phase of the moon, delay of game is called somewhere between 0.1 and 4 seconds after the play clock hits zero.
For a variety of reasons, the NFL has determined to make nearly all such penalty calls non-reviewable. Beyond penalties, some of the most potentially influential plays on the field (e.g. fumble return, interception return, catch-and-run) can have significant portions made non-reviewable when the official blows the play dead. That is, a fumble ruled an incomplete pass on the field but returned much of the field for a defensive touchdown – not at all an unheard-of play – can have the entire runback (worth as much as 4 or 5 expected points added) negated without possibility of review. Furthermore, the NFL’s large playing field and very large number of players in motion create a much larger chance that reviewable plays will lack sufficient visual evidence to overturn. It’s hard to see where the nose of the ball was when the ball carrier’s knee hit the ground when there are twelve bodies on top of him.
Of course, there is also judgment-based subjectivity to consider, but that exists in all sports. NFL officials are neither more nor less human than their peers in MLB and the NBA.
Yet even on this dimension, the NFL is still different.
The Importance of Rules Violation Prevalence
Unlike MLB and the NBA, the NFL’s rules as written are violated egregiously on every play of the game.
Don’t get me wrong, traveling hasn’t been called in an NBA game since I was a kid, and I’m close enough to being in Gen-X for that to still bother me more than I care to admit. But that’s the thing: traveling happens all the time, it is never called, and so the rule is effectively understood and applied to be something other than how it is written.
That is NOT what happens in the NFL.
Based on the official definition of Offensive Holding, I defy you to find a single NFL game in which you could not justify a flag on at least 75% of snaps.
Based on the official definition of Defensive Holding, I defy you to find a single NFL game in which you could not justify a flag on at least 75% of snaps.
Based on the official definition of Defensive Pass Interference and Offensive Pass Interference, I defy you to find a single NFL game in which you could not justify a flag on at least 50% of snaps.
Based on the official definition of Illegal Use of Hands, I defy you to find a single NFL game in which you could not justify a flag on at least 90% of snaps.
Unlike traveling, each of those penalties is called with some frequency. But they simply cannot be called all the time, and so officiating in the NFL becomes a sort of triage, in which some of the most impactful decisions affecting a game pass through an informal lens of fairness, egregiousness, visibility, circumstance and “impact on the play.”
So what is the result of vaguely defined rules which are violated so frequently that strict enforcement is a literal impossibility? Arbitrariness. Randomness.
I know that of all the sections so far, this is the one that is the most subjective, the one that is more my opinion than the result of some generally accepted conclusion of sports analytics. It is hard to demonstrate objectively that offensive holding is happening on every play, precisely because it is hard to demonstrate from empirical analysis of the penalties that get flagged what actually constitutes offensive holding. Yet it is this section that I think is the most important right now.
Because the NFL is the biggest sports league in the world by a shocking margin. Because each game in the NFL matters more to the outcome of the season than in any other sport. Because each play in the NFL matters more to the outcome of each game than in any other sport. Because the inherent subjectivity of the rules gives more power to officials to determine the outcome of those high leverage plays. Because the looseness of the rules gives those officials nearly infinite cover to modify their adjudication to great effect while remaining within the margins of those excessively loose rules.
And NONE of this is even remotely new. All of this has been true for decades.
But the narrative of the NFL in 2021 is undergoing a change that is rapidly putting those features under a new microscope.
The Changing Narrative at the Intersection of the NFL and Gambling
If you watched the Steelers-Bears matchup on Monday Night Football last week, as I mentioned earlier, you might have come away muttering something about the fix being in. And for the most part, you probably meant it in the usual way that sports fans mean it, which is “I’m pretty upset about the outcome, and I think bad officiating influenced it.” If so, you weren’t alone. Unless you were watching at the Bridgetown Taphouse up in Ambridge, PA or something, in which case you probably were alone in thinking that.
Now, I think the number of sports fans who really, truly believe that officials and NFL executives are out there betting on games or intentionally changing the outcome is pretty low. I mean, give people Monday to vent and take their copium, and after that point I think the number of conspiracy theorists falls back to something in the high single digits. OK, maybe low double digits.
I also think the number who are entertaining the idea, or at least believe that others are increasingly entertaining the idea, is rising. To the latter point, Seton Hall conducted a poll in 2019 which showed that some two-thirds of Americans believe betting and gambling affect how people perceive the officiating in games.
I know that the number of people who are at least talking about it, with tongue firmly planted in cheek or otherwise, is rising.
But this is still in anecdotal land.
So instead of telling stories, let’s examine a large dataset of medium and high reach sports news, blogs and other reports for an identical period between the first game of the season and the end of Week 9, from the 2014-2015 season through the 2021-2022 season. If you’re wondering, that historical period is selected because that’s what we have access to. Across these windows, the dataset includes about 200,000 distinct articles.
Let’s begin by exploring the density and centrality of gambling-related language in the coverage of the NFL over each of the above windows. The below chart shows the frequency of the use of language we relate to gambling and betting over time. Note that we have specifically excised language relating to “Las Vegas” and derivatives of it so that this exhibit doesn’t just become an unintentional reflection of the coverage of the Raiders’ move from Oakland.
What the above graph means is that when journalists, news outlets and large-scale bloggers write about the NFL and its games, players, events and franchises, they are also framing at least part of what they are writing in terms of gambling and betting. It also means that the rate at which they are doing this has more than quadrupled in just the past 7 years.
But even this understates the influence of this language. Suppose we use basic linguistic similarity algorithms to eliminate all potential duplicate / syndicated articles and focus our analysis on just the articles that have the most connected language; that is, the articles which don’t just mention the NFL but are ABOUT the NFL. When we do, both the concentration of articles about gambling and the breathtaking rise in its use as a framing for talking about the NFL become even more exaggerated.
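For the curious, here is a rough sketch of what those two steps might look like in code: dropping near-duplicate / syndicated articles, then measuring what share of the remaining articles use gambling language in each season. To be clear, this is not our actual pipeline. The term list, the word-shingle Jaccard similarity, the 0.8 threshold and the `text` / `season` field names are all assumptions made up for illustration.

```python
import re
from collections import defaultdict

# Hypothetical term list; the lexicon behind the charts is not published here.
GAMBLING_TERMS = {"betting", "bettor", "bettors", "odds", "sportsbook",
                  "parlay", "wager", "gambling", "over-under", "spread"}


def tokens(text):
    """Lowercase word tokens, with 'Las Vegas' and derivatives removed."""
    text = re.sub(r"las\s+vegas\w*", " ", text.lower())
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text)


def shingles(words, k=3):
    """Set of overlapping k-word sequences used to compare two articles."""
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def drop_near_duplicates(articles, threshold=0.8):
    """Keep the first copy of each article; skip later near-identical copies.

    This is a quadratic scan, fine for a toy example but an assumption that
    would not scale to a six-figure article count without indexing tricks.
    """
    kept, kept_shingles = [], []
    for art in articles:
        s = shingles(tokens(art["text"]))
        if all(jaccard(s, t) < threshold for t in kept_shingles):
            kept.append(art)
            kept_shingles.append(s)
    return kept


def gambling_share_by_season(articles):
    """Fraction of articles per season containing at least one gambling term."""
    totals, hits = defaultdict(int), defaultdict(int)
    for art in articles:
        totals[art["season"]] += 1
        if GAMBLING_TERMS & set(tokens(art["text"])):
            hits[art["season"]] += 1
    return {season: hits[season] / totals[season] for season in totals}
```

Feed it a list of dictionaries with a `season` and a `text`, run `drop_near_duplicates` first, and then `gambling_share_by_season` gives you one number per season to chart. Note that this toy version only measures frequency. It does nothing to capture which articles are ABOUT the NFL rather than merely mentioning it.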
My guess is that this represents a confirmation of what you already felt was happening in NFL coverage, but perhaps not from traditional news coverage itself. Maybe you have noticed that ESPN and FS1 shows have become dramatically more focused on the framing of game discussions in terms of lines and odds this season. Maybe you have noticed the “limited” array of sports-gambling related advertisements that are now running during game broadcasts. Maybe you have noticed that pre-game predictions this season are increasingly taking the form of betting-style against-the-spread formats. Maybe you have noticed when most early window games are in half-time that there is a prominent new “second half over-under” bug on the bottom-left of your screen. Maybe you have noticed in-game ad-bugs telling you about a really cool prop bet you can put in RIGHT NOW on Cooper Kupp.
I’m not sure if the NFL was just trying to “look” responsible by limiting the number of sports betting ads per broadcast to six, or if it legitimately thought that it could slow-play the reframing of its sport around the new national pastime of everything-speculation. It’s safe to say at this point, however, that the MGM Lion is out of the bag, and it ain’t goin’ back in any time soon.
For those who would respond that the NFL was always ultimately about gambling, well, no shit. That isn’t the point. The point is that when everybody knows that everybody knows that the NFL is about gambling, some things start to change.
One of the things that is starting to change? The coverage of officiating and penalties.
Now, I think it is less useful to simply chart the density of references to officiating and penalties, since, like players, passes, yards and scores, nearly every article about the sport will include at least some references to them (for what it’s worth, however, our measurement of that density is still up 37% from last year to this year). What we can do, however, is measure the extent to which the articles with language referencing officiating and penalties are more linguistically similar to the overall coverage of the NFL. In other words, we can chart the influence of language about officiating, officials and the impact of penalties.
The below graph charts the influence of officiating-related language on NFL narratives over time. The percentages are normalized based on the number of articles published to reflect the percentage of the linguistic influence we might expect for the average topic. Seven years ago, officiating was a slightly less than average way to frame NFL news. Today, it is nearly 20% higher than average, which in this arcane measure of ours is…a lot. A whole, whole lot.
Unless you are framing the NFL narrative as being “about Tom Brady”, “about Aaron Rodgers’ vaccination comments” or “about your Fantasy Football team”, there really isn’t an overarching story with the same gravity as “Officials are playing too large a role in today’s NFL.”
It doesn’t take a rocket scientist to see what’s happening here.
The soft underbelly of what I believe to be the greatest sport ever invented has always been the necessarily inordinate influence of officiating decisions relative to other sports, along with the near undetectability of intentional official intervention in a game outcome relative to other sports. Again, not because NFL officials are bad or worse than those of other leagues, but because it is how our favorite game is structured. Until 2021, that had been private knowledge. It was something we all independently knew.
But as we all look around and see that everyone else sees an industry that is largely about gambling, our private knowledge about that soft underbelly of the NFL is becoming public knowledge as well. Common knowledge. You don’t have to buy into our linguistic methodology to see with your own eyes that we are talking about officials a hell of a lot more BECAUSE the games are now ABOUT gambling. There is nothing in cultural and social markets that has so powerful an effect as the transformation of private knowledge into common knowledge. It is THE memetic impulse, the impetus for almost every kind of change.
This change works for me. I like betting on sports. Some of my best memories are of going to Vegas with the guys for the Final Four or of spending a Sunday afternoon in the sportsbook back in the days when they still comped drinks. Even now, as a resident of Connecticut, I personally love being able to lose money on a dumb three-way parlay on some ugly dogs that almost ended up being a really, really good one.
This change works for people like Barstool Sports, whose founders have made themselves fabulously wealthy by cultivating a user base that sat at the informal intersection of gambling and sports media. And then they transformed that into a business that sat at the literal intersection of gambling and sports media when they were acquired by Penn National Gaming.
This change works – for now – for the NFL. Studies repeatedly demonstrate that bettors are more avid, more frequent, and more engaged watchers of sporting events. And as we have written in dozens of other places on this site, everything is going to have a speculation layer attached to it. They’re embracing a trend at a powerful time, and they’re absolutely going to make more money from this in the short run.
But in the longer run, I am not so sure this is going to be the kind of change that the NFL wants.
Over the coming months and years, common knowledge will make plain through anecdotal examples just how lucrative and practically undetectable the NFL rule structure (remember how many penalties can justifiably be called on every play?) might make participation in a game-fixing scheme, given the increased acceptability and embrace of gambling on the NFL. I still think that is extremely unlikely, but it doesn’t matter if it ever actually happens. What matters is what happens when members of the media and fans know that everybody else knows that the only barrier between us as fans and a corrupt game is the perfect integrity of individual officials. That’s it. I think we’re close to that point today.
And THAT common knowledge will continue to make game and league coverage about officiating in a way it hasn’t before. It will continue to make game and league coverage about gambling, even if the NFL tries to pull back from its Goodell-on-Draft-Day embrace of Las Vegas. It will bring debates about the role of officials in game outcomes from the bowels of team message boards and local bars to the most significant pages of sports journalism. It will jeopardize and swamp the narratives the NFL and its franchises want to push and promote.
The NFL has a gambling problem. And I don’t know that there’s anything that they can do about it.