Welcome to the Science of Sport, where we bring you the second, third, and fourth level of analysis you will not find anywhere else.

Be it doping in sport, hot topics like Caster Semenya or Oscar Pistorius, or the dehydration myth, we try to translate the science behind sports and sports performance.


Friday, January 30, 2009

Tennis statistics and fandom

The epic semi-final: 5 hours 14 minutes, and a tale of the tape

Our mission here at The Science of Sport is to provide some level of insight and analysis into sports performance - an extra-ordinary view of "ordinary" sports action, if you will. But despite our efforts to remain impartial, distanced and objective when we analyse those sports, the content we cover is still inspired by the fact that we're fans - that's why running and cycling have received the bulk of our attention. I'm sure that motorsport is full of science, for example, but it's not a passion for either of us, so we let it slide!

But every once in a while, I'll allow myself the indulgence of a "fan post", which is to say, a post inspired more by enjoyment and less by science (though I try, if I might be so presumptuous as to suggest it, to write more than an adoring tribute to the athletes).

Last year in July, I almost did such a post, the day after Rafael Nadal beat Roger Federer to win Wimbledon in what was without a doubt my highlight of the sporting year (beating even the Beijing Olympics and Bolt's 100m race), and one of the greatest sports moments I've ever seen. On that occasion, I resisted, mostly because I didn't have anything but an empty fan's opinion. Tonight, I decided to go ahead anyway...! Seriously though, there are some very interesting questions that arise when you dig a little deeper behind the stats and start asking questions about the result, which is partly what this post is about.

Rafael Nadal beats Fernando Verdasco 6-7, 6-4, 7-6, 6-7, 6-4 in 314 minutes

The second men's semi-final in Australia was a classic. At 5 hours 14 minutes, it was the longest match in Australian Open history and the fourth longest Grand Slam match ever, and it saw Rafael Nadal (eventually) defeat a swinging Fernando Verdasco to book a place in the final against Roger Federer. It was epic, full of excitement, with some of the best rallies you'll ever see (two in particular stand out, both won by Nadal with outrageous shots). And it's certainly worth a deeper look, starting with the match statistics.

A statistical overview - what numbers tell us

A total of 385 points were played - Nadal won 193 of them, Verdasco won 192. The amazing stats don't stop there: Verdasco hit an incredible 95 clean winners compared to only 52 for Nadal. But, testament to the pattern of the game, Verdasco also made 76 unforced errors compared to only 25 for Nadal. The differential (winners minus unforced errors) was therefore +19 for Verdasco and +27 for Nadal, which you could argue was the difference between the two.
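For anyone who wants to check that arithmetic, here is a minimal Python sketch. It uses only the winner and unforced error counts quoted in this post; the function name `differential` is my own label for illustration, not an official tennis stat.

```python
# Winners and unforced errors as quoted in this post.
stats = {
    "Nadal": {"winners": 52, "unforced_errors": 25},
    "Verdasco": {"winners": 95, "unforced_errors": 76},
}


def differential(player_stats):
    """Winners minus unforced errors for one player."""
    return player_stats["winners"] - player_stats["unforced_errors"]


differentials = {name: differential(s) for name, s in stats.items()}

for name, diff in differentials.items():
    print(f"{name}: {diff:+d}")
```

Verdasco hit almost twice as many winners, yet Nadal comes out ahead on this measure because he gave away so few cheap points.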

The number of winners and unforced errors is a pretty reliable indicator of who is making the play and how the points are developing. Given nothing more than those two numbers, it is usually possible to guess what transpired. To a certain extent, that was true of the Nadal-Verdasco clash.

For long periods, Nadal was ploughing a trench about 3 m behind the baseline, as he raced from side to side chasing down Verdasco's strokes. The high risk game of Verdasco inevitably resulted in frequent errors, and Nadal was able to survive for long periods thanks to brilliant defense and the occasional slip up from Verdasco.

Would Verdasco have been more successful had he adopted a slightly more cautious approach? That's the question. I suspect not, though the temptation exists to say that a player like Federer will not miss some of the crucial shots that Verdasco did, and will put Nadal away. But then again, Federer might never get into those situations to begin with - has Nadal ever conceded 95 clean winners in a match?

Where do players win their points? What the numbers miss

However, there has to be more to it than winners and unforced errors. If you do the math, you'll work out that for Nadal, 128 of his 193 points came thanks to those winners and unforced errors (52 and 76, respectively). That leaves 65 points unaccounted for. Four of them came from double faults (the aces are included as winners, by the way). That still leaves 61 points not measured in the official statistics.

It's similar for Verdasco. He won 192 points, with 95 of them coming from his winners, 25 from Nadal's errors, and 3 from Nadal double faults. That means that he's "missing" 69 points.

Those points, of course, come from "forced errors", and that's one thing the official match stats don't capture. They don't tell you that Nadal won 61 points by forcing errors from Verdasco through his heavily weighted topspin shots, or that Verdasco won 69 points that came off forced Nadal errors and were often strokes that other players would not have reached, such is the ability of Nadal to cover the court and defend.
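The "missing points" calculation can be written out as a short Python sketch. It assumes, as the reasoning above does, that every point not explained by a winner, an opponent's unforced error or an opponent's double fault was a forced error. The class and field names are hypothetical, chosen only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class PointSources:
    total_points_won: int
    own_winners: int                # aces are counted as winners
    opponent_unforced_errors: int
    opponent_double_faults: int

    def forced_errors_drawn(self) -> int:
        """Points left over once the officially counted sources are removed.

        Assumption: the whole remainder is forced errors by the opponent.
        """
        return (
            self.total_points_won
            - self.own_winners
            - self.opponent_unforced_errors
            - self.opponent_double_faults
        )


# Figures quoted in this post.
nadal = PointSources(193, 52, 76, 4)
verdasco = PointSources(192, 95, 25, 3)

print("Nadal:", nadal.forced_errors_drawn())        # 61
print("Verdasco:", verdasco.forced_errors_drawn())  # 69
```

On these numbers, roughly a third of each player's points fall outside the winner and unforced error columns, which is a large share of the match to leave unmeasured.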

The importance of timing - stats need to prioritize big points

But more than this, what stats don't say is how players respond in pressure situations. The most telling fact of all in the Nadal-Verdasco match, in my opinion, is the enormous difference between the two in terms of break points created. Nadal had 20 break points in the match, Verdasco had only 4.

Nadal managed to convert on only 4 out of 20 (20%), and that was the reason for the epic match. Had Verdasco not performed so well on the major points, the match would have been over far, far sooner. Nadal was, for much of the match, in control, and particularly in the fourth and fifth sets, he never looked in trouble on his own serve, but put all the pressure on Verdasco.

In the match stats, you can tell this because Verdasco served 212 points, Nadal 173. That means that 39 more points were played on Verdasco's serve than on Nadal's, and this is symptomatic of who was under pressure while serving. In the fifth set alone, Nadal created 8 break point chances, while Verdasco did not see a single one. Watching the game was exciting, but I never really got the feeling that Nadal was behind - he couldn't close it out, but he "felt" in control, and this was the reason.
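To put numbers on that, here is a minimal Python sketch using only figures quoted in this post. It assumes the serve counts are points played on each player's serve (they do add up to the 385 points in the match), and the helper name `conversion_rate` is mine, purely for illustration.

```python
def conversion_rate(converted, chances):
    """Fraction of break point chances that were converted."""
    return converted / chances


# Break points quoted in this post: Nadal converted 4 of his 20 chances.
nadal_bp_rate = conversion_rate(4, 20)

# Points played on each player's serve, as quoted in this post.
serve_points = {"Verdasco": 212, "Nadal": 173}
total_points = sum(serve_points.values())
extra_on_verdasco_serve = serve_points["Verdasco"] - serve_points["Nadal"]
verdasco_serve_share = serve_points["Verdasco"] / total_points

print(f"Nadal break point conversion: {nadal_bp_rate:.0%}")
print(f"Extra points on Verdasco's serve: {extra_on_verdasco_serve}")
print(f"Share of all points played on Verdasco's serve: {verdasco_serve_share:.1%}")
```

A server who holds comfortably plays few points per service game, so a lopsided share like this is a rough proxy for who was spending longer under pressure.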

The important, and missing, statistic, then, is decision-making and execution on decisive points. It would be great to see a stat of rally lengths on break points, because then you'd see that Verdasco survived thanks to a host of two-stroke rallies - his serve and an error from Nadal in response.
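If point-by-point data were available, a stat like that would be easy to compute. The Python sketch below shows one possible shape for it. Everything in it is hypothetical: the `Point` record, its fields and the sample points are made up for illustration and are NOT data from the match.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Point:
    server: str
    is_break_point: bool
    rally_length: int  # strokes in the rally, counting the serve
    winner: str


def break_point_rally_lengths(points, server):
    """Rally lengths for the break points faced by the given server."""
    return [
        p.rally_length
        for p in points
        if p.server == server and p.is_break_point
    ]


def short_saves(points, server, max_strokes=2):
    """Break points the server saved in max_strokes strokes or fewer."""
    return sum(
        1
        for p in points
        if p.server == server
        and p.is_break_point
        and p.winner == server
        and p.rally_length <= max_strokes
    )


# Made-up placeholder points, NOT real match data.
sample = [
    Point("Verdasco", True, 2, "Verdasco"),
    Point("Verdasco", True, 9, "Nadal"),
    Point("Verdasco", False, 5, "Nadal"),
    Point("Nadal", True, 4, "Nadal"),
    Point("Verdasco", True, 2, "Verdasco"),
]

lengths = break_point_rally_lengths(sample, "Verdasco")
saved_quickly = short_saves(sample, "Verdasco")
print("Break point rally lengths:", lengths)
print("Mean rally length on break points:", round(mean(lengths), 2))
print("Break points saved within two strokes:", saved_quickly)
```

With real data, comparing the mean rally length on break points against the mean for all other points would show whether a server was escaping on the back of his serve.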

Tennis misses a trick

Unfortunately, tennis doesn't seem to provide those stats. The other stats I'd really like to see are game analysis statistics. In sports like soccer and rugby, performance analysis is now commonplace, and teams analyse opponents' strengths and weaknesses. I'm sure the same happens in tennis, but tennis seems such a great candidate for more in-depth analysis for TV coverage. For example, I'd love to see a graphic of where Nadal is on the court when he hits the ball. My guess is you'd find that 80% of the time, he's about 3 m behind the baseline. Verdasco was probably 1 m INSIDE it.

And that was a telling strategic difference - Nadal was hanging on, defending for his life, and only his ability to defend pulled him through. He needs to be more aggressive, gain the ascendancy earlier in the rally and take the initiative if he is to beat Federer. Nadal looks vulnerable every time he plays a big hitter, because he camps behind the baseline, drops too many balls short, and all it takes is a consistent performance by the opponent and he struggles, as he did last year against Andy Murray and Gilles Simon.

I've actually just emailed Hawkeye (the company that provides a lot of these graphics and stats) and asked them how I might obtain some of this information, because it would make a fascinating analysis. Where do players stand when receiving? What is the average speed of their shots, and where do they tend to hit them? Tennis really does lend itself to this, and I'm very surprised that no one is jumping on it to help explain the game for TV viewers...

Looking ahead to the final

So the final will be a rematch of the Wimbledon epic. I must confess that unless Nadal changes his strategy and steps forward into the court more, I think he's going to lose to Federer - you can't run around 3 m behind the baseline and win all your matches. Also, Nadal has one fewer day to recover from a 5-hour marathon than Federer, who played in a 3-hour semi-final. It's actually ridiculous that they don't play the semi-finals on the same day, like at Wimbledon...

In Nadal's favour is that his forehand tends to pick out Federer's backhand (it's easier to hit cross-court, both mechanically and because it's usually over the lower part of the net), and with the spin, that works in his favour. Federer also does not seem the type to play the kind of match Verdasco did today. So the final should be a different type of match. Nadal will need it to be - he can't keep sitting back and defending his way to victory...



P.S. For our American readers who don't follow tennis (well done for reading this far!), I hope you enjoy the Super Bowl!


Anonymous said...

Dr. Ross, again, as always, a very well thought out analysis and great proverbial questions posed (ie, why AREN'T there more stats talked about, dove into on tv?).

I've been Twittering the same thing: Rafa better be able to recover from one of the greatest tennis matches I've seen in years in order to be strong against RFed who's had less match time and more rest.

BTW, the juniors' matches were some of the best in years and some of the best reported on in a long time, too. Look for the next tennis generation to be exciting.


Anonymous said...

Dr. Ross,

Thank you for the insightful approach to tennis statistics. I do have one question that is bothering me though.

How does one calculate the number of winners if, historically, the statistics on the Grand Slam websites do not track a "clear" tally of winners? It seems to be a stat we only see while watching TV, and one that is lost to us in the archival records of each game.

Any insight, direction or tutelage would be helpful.

Thank you,


Ross Tucker and Jonathan Dugas said...

Hi Andre

Exactly. It's amazing how statistics seem so meaningless to tennis. They are kept, there's no doubt about that - the commentators often alluded to certain stats, like % of points won on serve, over the course of a season. So someone out there, somewhere, is tracking this stuff!

It just seems that we only get fed the bare minimum of information. Interestingly, I actually emailed Hawkeye directly to see if they could provide some of the information, and they said they don't keep it. Rather ask the International Tennis Federation!

Amazing. I can't believe these guys haven't become more stats-minded. Incidentally, I was watching the Super Bowl last night, and Americans have stats on EVERYTHING. They can pull up just about any answer in a second. Tennis probably has the same stats, but I don't know where they're kept either.

I think you'll probably find some of the stats on the ITF homepage. But as I said in the article, there's a real lack of meaningful stats. Winners is the easiest to track (apart from aces, I suppose), so that's common. But there's a world of information out there and no one is digging into it.

Very frustrating.


Anonymous said...

Thanks for a very good post! Am curious where you found all the stats you used; I can't see them on the ITF site.


Anonymous said...

this is crazy. just 13 hours ago i was watching the end of Nadal's match thinking "there is no way they have all the possible statistics for tennis... there seems to be a huge amount of data you could collect on players... things like how often they hit cross court, hit flat balls etc... someone needs to make a database. i should make it". somewhere in this blog you mentioned Hawkeye and that you emailed them. i would love to get in contact with them so i can set up a public database. if you could email me at lotrnerd49@yahoo.com with any places i could get information, that would be helpful.
thank you,