Results
Our software successfully filters out ambient noise, crowd noise, and any other sound that does not help determine the outcome of the game. Although the algorithm incorporates a number of DSP tools to isolate the desired sounds, we had to find alternative ways to filter the audio, because applying filters directly also removed the ball hits. Plots of the FFT and spectrogram allowed us to identify attributes shared by ball hits and the other sounds of interest, and we then used those attributes in our algorithms. Our ball hit algorithm models the ball striking the racquet as a delta (impulse) in the time domain and uses Matlab's tools to sweep through the audio and locate the points where a hit occurs.
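A short sketch can illustrate the impulse model. It is written in Python rather than Matlab. The function name `detect_hits` and its parameters (`threshold`, `window_s`, `ratio`, `min_gap_s`) are hypothetical illustration values, not our tuned settings. Under these assumptions, a sample counts as a hit when three conditions hold: it is loud, it is the largest value in a short window around it, and it stands well above that window's average level. Because the window extends into future samples, the detector is non-causal.

```python
def detect_hits(x, fs, threshold=0.5, window_s=0.05, ratio=4.0, min_gap_s=0.3):
    """Return sample indices of impulse-like hits in the signal x (sampled at fs Hz).

    Hypothetical sketch: a sample is a hit if its magnitude is at least
    `threshold`, it is the largest magnitude within +/- window_s seconds, and it
    is at least `ratio` times the mean magnitude of that window. The window looks
    ahead in time, so the detector is non-causal.
    """
    w = max(1, int(window_s * fs))      # half-window length in samples
    min_gap = int(min_gap_s * fs)       # refractory period between two hits
    mag = [abs(v) for v in x]
    hits = []
    for n, m in enumerate(mag):
        if m < threshold:
            continue
        lo, hi = max(0, n - w), min(len(mag), n + w + 1)
        neighbors = mag[lo:n] + mag[n + 1:hi]   # past and future samples
        if not neighbors or m < max(neighbors):
            continue                            # not the local peak
        background = sum(neighbors) / len(neighbors)
        if m < ratio * background:
            continue                            # loud, but not surrounded by quiet
        if hits and n - hits[-1] < min_gap:
            continue                            # same strike already counted
        hits.append(n)
    return hits
```

Sustained noise such as applause is loud for a long stretch. Its samples therefore fail the `ratio` test against their equally loud neighbors. The minimum gap keeps one racquet strike from being counted twice.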
Figure I below shows the output of our ball hit detection algorithm on a sample rally clip. The blue plot is the unmodified time-domain data of the rally. The overlaid red plot marks the peaks at which our algorithm detected a ball hit, and the green plot marks the peaks at which it detected a net hit. All of the plots were manually verified for accuracy. This rally has two ball hits followed by a net hit, and it ends with audience and umpire noise, which our algorithm correctly ignores.
Figure I: Audio clip of rally with detected ball hits
Four more examples are included below. Our algorithm correctly detects every ball hit and net hit, except for one missed net hit in Rally 4. That net hit, at around 3.5 seconds in the plot, is too quiet for our algorithm to detect.
Figure J: Rallies 2-5, which are all accurate except a missing net hit in Rally 4
Our algorithm also identifies the line judge's “Out!” call, a sound that is essential to the logic of scoring the points. Figure K below displays the audio waveform of an out call.
Figure K: Audio clip of line umpire making an “Out” call
This clip is then used as a template for an “Out” call. The rally audio is swept with the template, using a correlation function, to find where it matches the shape of the call. The detected out calls are then recorded and paired with the ball strikes to determine the outcome of a point.
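Below is a minimal sketch of this sweep. It assumes normalized cross-correlation, in which each offset is scored between -1 and 1 regardless of loudness. The names `normalized_xcorr` and `find_calls` and the 0.8 match threshold are hypothetical, and the sketch is in Python rather than our Matlab code.

```python
import math


def normalized_xcorr(signal, template):
    """Score every offset of `template` along `signal` with zero-mean,
    unit-energy correlation; 1.0 means an exact (scaled) match."""
    m = len(template)
    t_mean = sum(template) / m
    t = [v - t_mean for v in template]
    t_norm = math.sqrt(sum(v * v for v in t))
    scores = []
    for k in range(len(signal) - m + 1):
        seg = signal[k:k + m]
        s_mean = sum(seg) / m
        s = [v - s_mean for v in seg]
        s_norm = math.sqrt(sum(v * v for v in s))
        if t_norm == 0 or s_norm == 0:
            scores.append(0.0)          # silent window: no meaningful match
        else:
            scores.append(sum(a * b for a, b in zip(s, t)) / (s_norm * t_norm))
    return scores


def find_calls(signal, template, threshold=0.8):
    """Return start indices of template matches without overlapping detections."""
    scores = normalized_xcorr(signal, template)
    m = len(template)
    found, k = [], 0
    while k < len(scores):
        if scores[k] >= threshold:
            # keep the best-scoring offset within one template length
            end = min(len(scores), k + m)
            best = max(range(k, end), key=lambda i: scores[i])
            found.append(best)
            k = best + m                # skip past this call
        else:
            k += 1
    return found
```

Because each window is normalized, the amplitude of a call does not affect its score, and only the shape of its waveform matters.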
The “Out” call in one of our rallies (Rally 5) was buried in audience noise. We therefore used a modified version of the game audio in which that call is amplified, making it more distinct from the crowd. This modified audio file was then passed into our out detection algorithm.
The original and modified audio files are shown below in Figure L. The only difference is a large amplitude spike near the end of the modified file (bottom), marked by a red arrow, where the “Out” call was amplified.
Figure L: Original (top) vs. amplified (bottom) “Out” audio
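One simple way to produce such a file is to multiply only that segment by a gain and clip the result to the normalized audio range. This assumes the start and end times of the call are known. The helper `amplify_segment` and its parameters are hypothetical, and the sketch is not necessarily the exact edit applied to the Rally 5 audio.

```python
def amplify_segment(x, fs, start_s, end_s, gain):
    """Return a copy of x (normalized to [-1, 1]) in which samples between
    start_s and end_s seconds are multiplied by gain and then clipped."""
    start, end = int(start_s * fs), int(end_s * fs)
    y = list(x)                                   # leave the original untouched
    for n in range(max(0, start), min(len(y), end)):
        y[n] = max(-1.0, min(1.0, y[n] * gain))   # clip to avoid overflow
    return y
```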
Our team was unable to fully automate the scoring of a game. Several problems proved harder than we had anticipated and could not be solved before the project deadline: consistently identifying when the ball hits the net, handling missed serves, and automatically iterating the program over each consecutive point (also referred to as a “rally”).
Figure M summarizes the expected and actual results (shown as expected / actual) of the five rallies our program processed.
Figure M: Expected/actual results of five rallies
Figure N below shows an example of our Matlab code output for Rally 1.
Figure N: Code output for Rally 1
Apart from Rally 4, our program correctly detects the sounds of a tennis match and properly scores a game. However, this requires us to manually skip past the missed serves within the game and feed our algorithms only the rallies that follow them. The logic for missed serves is complicated: they are detected as ball hits, must then be processed for net hit and “Out” detection, and require scanning ahead for audience noise. With more development time, we would identify whether a missed serve is the server's first or second attempt by checking for audience noise after the serve. The audience typically makes noise only after the second failed attempt, not after the first.
We made substantial initial progress on an audience detection algorithm for this purpose. Our program detects audience noise by looking for places in the clip where audio above a noise threshold is surrounded by more audio with the same characteristics. This logic resembles the ball hit algorithm, which looks for loud sounds surrounded by quiet ones. The difference is that crowd applause is a sequence of loud noise that slowly dies down. The audience cues this algorithm provides could ideally tell us whether the server served an ace or the returner won the point. Applause would also be very useful for fully automating the processing of a match, since points are separated by periods of applause. Figure O shows the output of this audience detection algorithm.
Figure O: Audience detection (orange) overlaid onto rally plot
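Below is a minimal sketch of both ideas, under two assumptions. First, audience noise is approximated as frames whose RMS level stays above a threshold for a minimum duration, a simplification of the surrounding-audio test described above. Second, a missed serve is labeled a second serve when such noise overlaps a short window after it. The names `detect_applause` and `classify_missed_serve` and every default value are hypothetical.

```python
import math


def detect_applause(x, fs, frame_s=0.05, threshold=0.2, min_duration_s=0.5):
    """Return (start_s, end_s) intervals of sustained loud audio.

    The signal is cut into non-overlapping frames; a frame is "loud" when its
    RMS level reaches `threshold`. Only runs of loud frames lasting at least
    `min_duration_s` are kept, so short impulses such as ball hits are ignored.
    """
    frame = max(1, int(frame_s * fs))
    loud = []
    for start in range(0, len(x) - frame + 1, frame):
        seg = x[start:start + frame]
        rms = math.sqrt(sum(v * v for v in seg) / frame)
        loud.append(rms >= threshold)
    min_frames = max(1, round(min_duration_s * fs / frame))
    intervals, run_start = [], None
    for i, flag in enumerate(loud + [False]):  # sentinel closes a trailing run
        if flag and run_start is None:
            run_start = i
        elif not flag and run_start is not None:
            if i - run_start >= min_frames:
                intervals.append((run_start * frame / fs, i * frame / fs))
            run_start = None
    return intervals


def classify_missed_serve(fault_s, applause, lookahead_s=3.0):
    """Guess whether a fault at fault_s seconds was a first or second serve.

    Assumes the crowd reacts only after a second failed attempt, so any
    applause interval overlapping [fault_s, fault_s + lookahead_s] means "second".
    """
    for start, end in applause:
        if start < fault_s + lookahead_s and end > fault_s:
            return "second"
    return "first"
```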
In conclusion, our project can take the audio file of an individual rally and predict whether the server or the returner won it. Following analysis with FFTs and spectrograms, we created a program that removes noise and then searches the sound vector for useful events: the ball hitting the racquet, audience applause, the ball hitting the net, and the umpire's “Out” call. These events were then processed to determine the number of times the ball was hit during the rally, which allowed us to determine the winner of the rally. The scoring logic for a game (0-15-30-40) is fully implemented from this information, except for deuce, which added a layer of difficulty beyond what we could finish in time. Reaching beyond the scope of the course, we developed a non-causal ball hit detection algorithm, which scans ahead in the audio to make decisions about the present data.
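The rally and game logic can be sketched under two illustrative assumptions. First, hits alternate starting with the server's serve. Second, a rally ending in a net hit or an “Out” call is lost by the last player to hit the ball; otherwise that player wins. The functions `rally_winner` and `score_game` are hypothetical. As in our project, deuce is not handled.

```python
POINT_NAMES = ["0", "15", "30", "40"]


def rally_winner(num_hits, ended_in_error):
    """Return "server" or "returner" from the hit count of one rally.

    Odd-numbered hits belong to the server (hit 1 is the serve). If the rally
    ended in a net hit or an "Out" call, the last hitter loses; otherwise the
    last hitter wins.
    """
    if num_hits < 1:
        raise ValueError("a rally needs at least one hit")
    last_hitter = "server" if num_hits % 2 == 1 else "returner"
    if not ended_in_error:
        return last_hitter
    return "returner" if last_hitter == "server" else "server"


def score_game(rally_winners):
    """Replay rally winners and return the called score after each rally."""
    points = {"server": 0, "returner": 0}
    history = []
    for winner in rally_winners:
        points[winner] += 1
        s, r = points["server"], points["returner"]
        if s >= 3 and r >= 3:
            raise NotImplementedError("deuce is not handled")
        if s == 4 or r == 4:
            history.append(f"game {winner}")
            break
        history.append(f"{POINT_NAMES[s]}-{POINT_NAMES[r]}")
    return history
```

For example, a single-hit rally with no net hit or “Out” call (an ace) goes to the server.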
Possible improvements to the project include refining our algorithms and adding an algorithm that automatically determines rally times from the audio file of an entire game. Such a program could split the game into its individual rallies autonomously. It would use cues such as the elapsed time between ball hits, audience applause, and serve attempt identification. It could then determine the winner of each rally and game automatically, without manual adjustment of rally start and end times. Furthermore, because speech detection is difficult, the “Out” detection algorithm had subpar accuracy: the correlation function proved ineffective at identifying similar speech. Nevertheless, the project was largely successful overall. We were able to break the goal down into manageable parts and devise mostly successful algorithms to solve each part.
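The proposed rally splitting could start from a sketch like the one below. It groups detected hit times into rallies and starts a new rally whenever the gap since the previous hit exceeds a limit or applause begins between two hits. The function `split_into_rallies` and its 4-second default gap are hypothetical, and serve attempt identification is not included.

```python
def split_into_rallies(hit_times, applause=(), max_gap_s=4.0):
    """Group hit times (seconds) into rallies.

    A new rally starts when the gap since the previous hit exceeds max_gap_s,
    or when an applause interval (start_s, end_s) begins between two hits.
    """
    rallies = []
    for t in sorted(hit_times):
        if rallies:
            prev = rallies[-1][-1]
            applause_between = any(prev < start < t for start, _ in applause)
            if t - prev <= max_gap_s and not applause_between:
                rallies[-1].append(t)   # same rally continues
                continue
        rallies.append([t])             # start a new rally
    return rallies
```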









