Overview

Table of Contents

I. Summary of Project Idea

II. Research

III. Next Steps

Summary of Project Idea

Our team’s goal is to create software that takes in an audio clip of a tennis match, detects when each point begins and ends, and scores that point. Because the crowd is silent while the ball is in play, we expect to be able to identify the sounds that determine how a point is scored, such as a player hitting the ball across the court, the ball hitting the net, or an umpire calling a ball out-of-bounds, which ends the point.

We deduced the following logic for the scoring of each tennis point:

Given these three audio events, we aim to determine the winner of each point and score a full game of tennis, in which each player’s points are called 0, 15, 30, and 40, with a stretch goal of handling deuces and missed serves. Each of these audio events has distinctive characteristics that should help us identify it through audio processing. Below is a concept demo that links the time-domain plot of a segment of a tennis match to its audio.

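To make the game-level scoring concrete, here is a minimal Python sketch that replays one game from a sequence of point winners, such as the point detector might eventually produce. The function name `score_game` and the player labels "A" and "B" are hypothetical. Deuce and advantage (our stretch goal) are handled; missed serves are not modeled.

```python
# Point names indexed by the number of points a player has won (0-3).
POINT_NAMES = ["0", "15", "30", "40"]


def score_game(point_winners):
    """Replay a single tennis game from a sequence of point winners.

    point_winners: iterable of "A" or "B", one entry per completed point.
    Returns (winner, history): winner is "A", "B", or None if the game is
    unfinished; history lists the called score after each point.
    """
    won = {"A": 0, "B": 0}
    history = []
    for p in point_winners:
        if p not in won:
            raise ValueError(f"unknown player label: {p!r}")
        won[p] += 1
        a, b = won["A"], won["B"]
        # A game is won with at least 4 points and a lead of 2 or more.
        if max(a, b) >= 4 and abs(a - b) >= 2:
            winner = "A" if a > b else "B"
            history.append(f"game {winner}")
            return winner, history
        if a >= 3 and b >= 3:
            # Both players have reached 40: deuce or advantage.
            if a == b:
                history.append("deuce")
            else:
                history.append("advantage " + ("A" if a > b else "B"))
        else:
            # Here both counts are at most 3, so the lookup is safe.
            history.append(f"{POINT_NAMES[a]}-{POINT_NAMES[b]}")
    return None, history
```

For example, `score_game("AAAA")` returns `("A", ["15-0", "30-0", "40-0", "game A"])`.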

Research

After conducting research, our team found only one paper closely related to this project: “Inferring the Structure of a Tennis Game using Audio Information” by Qiang Huang and Stephen Cox.

This paper combines techniques from probability and machine learning and breaks the problem down into several simpler subproblems. Specifically, it defines four levels of “events”, from broadest to most specific: game points, match events, audio events, and acoustic observations. Game points are the instants the system ultimately aims to detect, that is, the boundaries between one point and the next. Match events are major events that occur over the course of a point, such as a serve, a rally, or audience applause. Audio events are finer-grained events such as the ball being hit or the umpire’s signal. Finally, acoustic observations are simply the raw sound data recorded from the match.

The goal of the paper is to determine, from the acoustic observations, the times at which game points occur. This is done by choosing the time that maximizes the probability of a game point, given the acoustic observations up to that time. This conditional probability can be decomposed in terms of match events and audio events using the law of total probability and Bayes’ rule. The advantage of this decomposition is that each factor is more tractable to estimate; for example, the probability that a match event is occurring given the preceding audio events. The paper therefore builds a hierarchical system: acoustic observations are used to estimate the probabilities of audio events, which are used to estimate the probabilities of match events, which in turn determine the probability of a game point.
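As a rough illustration (our own simplified reading, not necessarily the paper’s exact formulation), suppose each level depends only on the level directly beneath it. Writing $g_t$ for a game point at time $t$, $m$ for a match event, $a$ for an audio event, and $O_{1:t}$ for the acoustic observations up to time $t$ (all notation ours), the law of total probability gives the hierarchy below, and Bayes’ rule expresses the audio-event term through an acoustic model $P(O_{1:t} \mid a)$:

```latex
\begin{aligned}
P(g_t \mid O_{1:t}) &= \sum_{m} P(g_t \mid m)\, P(m \mid O_{1:t}) \\
&= \sum_{m} \sum_{a} P(g_t \mid m)\, P(m \mid a)\, P(a \mid O_{1:t}), \\
P(a \mid O_{1:t}) &\propto P(O_{1:t} \mid a)\, P(a), \qquad
\hat{t} = \arg\max_{t} P(g_t \mid O_{1:t}).
\end{aligned}
```

The first line assumes $g_t$ is conditionally independent of $O_{1:t}$ given $m$, and the second assumes $m$ is conditionally independent of $O_{1:t}$ given $a$; these are exactly the simplifications that make each factor separately trainable.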

Each of these probabilities is estimated in a slightly different way, but the general approach is the same. Some games are used as training data for the machine learning algorithms, which determines the parameters of the models. Once a model is trained, other games are used as test data to measure its accuracy. The individual stages achieved accuracy rates ranging from 60% to 90%.

Most of the specific techniques in the paper will not be used in our project, as they are relatively complex and would be difficult to adapt to our purpose. However, breaking the problem down into stages and connecting those stages with a flow chart seems like a reasonable approach for our team.

Next Steps

The following is the sequence of steps in the project, each of which is detailed further on the following pages of our website.

1. Collect audio from tennis matches (three different samples were recommended) and load it into MATLAB or Python.

2. Extract the audio from exactly when a point begins to when it ends, and plot its FFT and spectrogram. Also extract audio from random moments during the match, plot their FFTs and spectrograms, and compare the two sets of results.

3. Apply a basic processing technique to the audio clip that could be useful for extracting a particular piece of information. Try to draw conclusions from the result that can be applied to the final design.

4. Finalize one or more methods for distinguishing the sound of the ball hitting the racquet from player, umpire, and audience sounds. Evaluate the accuracy of each method by applying it to game audio and checking the results manually. In addition, research non-LTI systems and filters to see whether they are better at isolating ball hits, and attempt an implementation.

5. Using the methods from step 4, write an algorithm that, given the audio signal of a game, marks the times at which the ball is hit and at which the players, umpire, or audience make noise.

6. Explore how the times of ball hits and player/umpire/audience noises indicate the game points. More specifically, develop an algorithm that takes the times of these audio cues as input and outputs estimated times of the game points. The accuracy of this algorithm can be evaluated by checking the game points manually.
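For step 2 above, here is a minimal Python sketch of the FFT and spectrogram computation, assuming NumPy and SciPy and a mono clip already loaded as an array (loading and plotting are left out). The helper name `spectrum_and_spectrogram` and the synthetic test signal are illustrative assumptions, not recorded match data.

```python
import numpy as np
from scipy import signal


def spectrum_and_spectrogram(x, fs, nperseg=1024):
    """Return the one-sided magnitude spectrum and a spectrogram of clip x.

    x: 1-D array of mono audio samples; fs: sample rate in Hz.
    """
    x = np.asarray(x, dtype=float)
    # Magnitude of the real FFT over the whole clip (one-sided spectrum).
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    # Short-time Fourier analysis: rows of Sxx are frequencies, columns are frames.
    f, t, Sxx = signal.spectrogram(x, fs=fs, nperseg=nperseg)
    return freqs, spectrum, f, t, Sxx


# Synthetic stand-in for a point clip: 2 s of a 1 kHz tone plus a click
# every 0.5 s (real clips would come from the recorded matches).
fs = 16_000
time_s = np.arange(2 * fs) / fs
x = 0.2 * np.sin(2 * np.pi * 1000 * time_s)
x[:: fs // 2] += 1.0

freqs, spectrum, f, frames, Sxx = spectrum_and_spectrogram(x, fs)
peak_hz = freqs[np.argmax(spectrum)]  # dominant frequency of the clip
```

With real audio, the same call would be made on a point segment and on a random segment, and the two `Sxx` arrays plotted (for example with matplotlib’s `pcolormesh`) for side-by-side comparison.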
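Steps 3 through 5 could start from something as simple as the sketch below: a short-time energy envelope (a basic processing technique), a median filter (a non-LTI operation) to estimate the background level, and peak picking to mark candidate hit times. This is an assumption-laden Python sketch using NumPy and SciPy; the function `detect_hits` and all of its parameter values are hypothetical and untuned. It flags any short, loud transient, so on its own it does not separate racquet hits from umpire or player sounds as step 4 requires.

```python
import numpy as np
from scipy.signal import find_peaks, medfilt


def detect_hits(x, fs, frame_ms=10.0, median_frames=31, ratio=4.0, min_gap_s=0.3):
    """Return estimated times (s) of short, loud transients such as ball hits.

    Hypothetical parameters: frame_ms (energy frame length), median_frames
    (odd median-filter length, in frames), ratio (how far a frame must rise
    above the local background), min_gap_s (minimum spacing between hits).
    """
    x = np.asarray(x, dtype=float)
    hop = max(1, int(fs * frame_ms / 1000))
    n_frames = len(x) // hop
    # Short-time energy: sum of squared samples in each non-overlapping frame.
    energy = np.sum(x[: n_frames * hop].reshape(n_frames, hop) ** 2, axis=1)
    # The median filter is non-linear (non-LTI): it follows the slowly varying
    # background (crowd, hum) while ignoring brief spikes such as hits.
    background = medfilt(energy, kernel_size=median_frames)
    score = energy / (background + 1e-12)
    min_gap_frames = max(1, round(min_gap_s * 1000 / frame_ms))
    peaks, _ = find_peaks(score, height=ratio, distance=min_gap_frames)
    # Report the start time of each peak frame, in seconds.
    return peaks * hop / fs


# Synthetic check: 3 s of low-level noise with three 5 ms bursts added.
rng = np.random.default_rng(0)
fs = 16_000
x = 0.05 * rng.standard_normal(3 * fs)
true_hits = [0.5, 1.4, 2.2]
for th in true_hits:
    i = int(round(th * fs))
    x[i : i + 80] += rng.standard_normal(80)
hits = detect_hits(x, fs)
```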
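For step 6, one simple heuristic (our assumption, not a validated method) is that hits separated by short gaps belong to the same rally, while a long silence marks a boundary between points. The hypothetical `group_into_points` below applies that rule to a list of hit times; the 3-second gap threshold is a placeholder that would need to be tuned against manually labeled game points.

```python
def group_into_points(hit_times, max_gap_s=3.0, min_hits=1):
    """Group hit times into candidate points (rallies).

    Assumption: consecutive hits no more than max_gap_s seconds apart belong
    to the same point; a longer silence starts a new point. Groups with fewer
    than min_hits hits are discarded. Returns (start, end, n_hits) tuples.
    """
    points = []
    times = sorted(hit_times)
    if not times:
        return points
    start = prev = times[0]
    count = 1
    for t in times[1:]:
        if t - prev > max_gap_s:
            # Long silence: close the current point and start a new one.
            if count >= min_hits:
                points.append((start, prev, count))
            start, count = t, 0
        count += 1
        prev = t
    if count >= min_hits:
        points.append((start, prev, count))
    return points
```

For example, `group_into_points([1.0, 2.0, 2.9, 10.0, 11.0])` returns `[(1.0, 2.9, 3), (10.0, 11.0, 2)]`. The default `min_hits=1` keeps single-hit points such as aces; umpire-call times from step 5 could later be used to confirm or split boundaries.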