Transcript – Sports Data Gets a New Voice with AI Voice Featuring Corey Hill
“CH: I think we have a really good opportunity here to drive kind of a synthetic offering that has all the touch points and kind of cues of a human generated broadcast synthetically and do it at the speed of AI.”
[00:00:17] MM: Welcome to Veritone’s Adventures in AI, a worldwide podcast that dives into the many ways artificial intelligence is shaping our future for the better. I'm your host, Magen Mintchev, and I’m here with Corey Hill, who is the Director of Product Management for Veriverse at Veritone. Corey is here to provide some insight into how sports uses AI voice to reach a wider global audience while keeping fans connected to the content and format that's right for them.
[00:00:45] MM: Welcome, Corey. It is great to be speaking with you on this platform.
[00:00:49] CH: Thanks, Magen. It's a pleasure to be here. Looking forward to the conversation.
[00:00:51] MM: We're talking about AI voice. It integrates real-time voice cloning and hyper-realistic text-to-speech directly into any app, product, or project using an AI voice API. So, how exactly does this translate to the sports world?
[00:01:09] CH: I think really, it's about the ingestion aspect of it. So, when you really get down to it, sports is really just a series of events and relationships that are centered around the desired outcomes for the game being played. So, for example, if we take a look at soccer, it's a rich stream of metadata or facts that is really players, their relationship to the ball and other players, and their relative positions on the field. So, if you think about it, a goal, in technical terms, is really just the ball ending up in a very specific position on the field. I don't really like putting it that way as it sounds pretty boring, but that's the kind of level of granularity that's going to enable us to drive a significant amount of magic and experience that can occur in a downstream process.
So, also make no mistake that the ability to accurately observe, record, and distribute this metadata at the speed of play is a feat of AI ingenuity in itself. Thus, we have actually decided to partner with a fantastic company called Stats Perform. And so, what these guys do is they use fantastic computer vision models to actually drive and analyze the events that I described, kind of in that soccer example of knowing which player's in possession of the ball, who they're passing to, and even to a level of granularity to know which foot they're actually using to make that play, or other body parts, really, for that matter. And so, these guys collect billions of events throughout the course of sports history. That data is now being used to drive sports betting applications, fantasy sport engagement with end users and other teams. They also actually use that data to analyze performance of the players and partner with the teams themselves to drive content there.
So, I think, from that perspective, looking at sports and partnering with a company like Stats Perform, it's really a match made in heaven, for us to really be able to drive some interesting customer and user engagement through our voice capabilities.
[00:02:59] MM: Can you provide us either a real-life or a hypothetical case study where AI voice capabilities are used for sports broadcasters, content creators, even teams, leagues, or betting platforms? What does that look like from start to finish?
[00:03:17] CH: Great question, Magen. I think if we were going to leverage the soccer example again, really, our goal is to take those facts in the streams of metadata that are coming in through a pipeline and turn that into engaging commentary or dialogue. So again, we're getting facts like player A passes with his left foot to player B. As exciting as that could sound if we kind of just repeated the facts and used that from a text-to-speech perspective, we actually want to add some additional color, add some analysis of the discrete events that are occurring. So again, it's a pass, but where did that pass come from? Where's it going? Is that kind of an offensive play? Is the team kind of recollecting or regrouping in their defensive half to start a new attack process?
So, how we do that is we kind of start with parsing that metadata and mapping it to known values. So again, X, Y coordinates of the ball relate to specific descriptors of where that's occurring on the field. So, it could be in the 18-yard box. It could be in the middle of the field, at the half, or maybe it's simply a long ball that's being passed into the right wing, and a pass that potentially goes from player one to player two and back again.
And so, our goal is to kind of take all of those relationships, that commentary, those facts and turn it into something that would be engaging. So, it's really like, “Hey, defensive player says, this isn't happening in my house. It's getting kicked back onto their offensive part of the field.” And then we use cues from this data to say, “Oh, there's a yellow card or a foul. Can we apply a specific treatment like a whistle, or even kind of a card sound that may be coming out of the referee’s card holder?” So that we're really kind of adding and engaging through layers of additional sound effects and the actual verbiage that's coming out, and we can also use that natural language that's being generated to target and identify opportunities for very custom, very specific content insertion, or even ad insertion to help drive kind of a customized or tailored experience for an end user.
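The mapping described above, from ball coordinates and event types to field descriptors and sound cues, can be sketched roughly in Python. This is only an illustration: the pitch dimensions, zone names, event fields, and sound-effect file names are assumptions made for the example and are not taken from Veritone's or Stats Perform's actual data formats or APIs.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumed pitch size in yards (hypothetical; a real feed may use other units).
PITCH_LENGTH = 115.0
PITCH_WIDTH = 74.0
BOX_DEPTH = 18.0   # depth of the 18-yard box from the goal line
BOX_WIDTH = 44.0   # width of the 18-yard box

# Hypothetical table mapping event types to sound-effect cues.
SOUND_CUES = {
    "foul": "whistle.wav",
    "yellow_card": "card.wav",
    "goal": "home_chant.wav",
}


@dataclass
class Event:
    kind: str     # e.g. "pass", "foul", "yellow_card"
    player: str
    x: float      # yards along the pitch from the team's own goal line
    y: float      # yards across the pitch from the left touchline


def describe_zone(x: float, y: float) -> str:
    """Map ball coordinates to a human-readable field descriptor."""
    box_edge = (PITCH_WIDTH - BOX_WIDTH) / 2
    if box_edge <= y <= PITCH_WIDTH - box_edge:
        if x <= BOX_DEPTH:
            return "in their own 18-yard box"
        if x >= PITCH_LENGTH - BOX_DEPTH:
            return "in the opponent's 18-yard box"
    if y < PITCH_WIDTH / 3:
        side = "on the left wing"
    elif y > 2 * PITCH_WIDTH / 3:
        side = "on the right wing"
    else:
        side = "in the center"
    half = "defensive half" if x < PITCH_LENGTH / 2 else "attacking half"
    return f"{side} of the {half}"


def render_event(event: Event) -> Tuple[str, Optional[str]]:
    """Return commentary text plus an optional sound cue for one event."""
    zone = describe_zone(event.x, event.y)
    text = f"{event.player}, {event.kind.replace('_', ' ')} {zone}"
    return text, SOUND_CUES.get(event.kind)


if __name__ == "__main__":
    print(render_event(Event("foul", "Player A", 100.0, 37.0)))
    print(render_event(Event("pass", "Player B", 30.0, 70.0)))
```

In a fuller pipeline, the text returned here would presumably be enriched with color commentary before being sent to a text-to-speech voice, and the cue would be mixed in as a sound-effect layer. Those steps are left out because the conversation does not describe them in detail.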
[00:05:18] MM: So, I probably should have prefaced this in the beginning, and told you that I love American football versus soccer. So, my husband would totally be head over heels in love listening to this right now. I'm going to put myself in his shoes and just pretend that I like soccer. Sorry for all the soccer lovers out there. But how is this going to affect say, his experience being the fan?
[00:05:42] CH: Imagine if you're able to select your favorite teams, maybe even identify your fantasy roster for your fantasy league, identify where you are from a location perspective, and potentially even some of your favorite personalities or voices, and we can combine those things to generate a truly custom experience for you in a football space or for you or your husband in the soccer space as well.
So, imagine your team is playing an actual away game, so they're at the opponent’s stadium. But when they score, what we're actually able to do is layer in the team-specific chants that would occur in their home stadium, maybe even add some additional cheering to exceptional plays, or even a collective groan and a supporting crowd noise after a near shot miss in that soccer game. Same thing could be applied to the football example. When they're going after that 60-yard field goal to win the game, there's kind of some treatments that we can apply to that sound and audio to drive some anxiety or kind of really create that engaging fan content in the same right, or maybe a post-match day. So, in soccer, even the NFL, you typically have your game days, and there's multiple games going on. But perhaps you want kind of an audio readout of all of your fantasy teams' plays or kind of key events that have occurred over the course of that day. And we could have it presented to you in a custom voice that you want. Maybe you want your favorite musician or artist to scream out the goals or kind of lead the soccer chants, for example, to support your specific team.
Again, these could be personalities that are your favorites in real life, or they could be synthetic personalities that you've kind of cultivated through our understanding of what your specific preferences are for how you want to enjoy the game, and some of the other content around your history related to how you've engaged in the past as well. And so, I think there's a plethora of different opportunities that are out there. I think our goal is to really start with a great foundation here and drive kind of that endlessly customizable experience that helps you to drive engagement in other areas as well. I think, to your point, this could be integrated into any application, engagement or touch point with your fans.
[00:08:06] MM: Okay. And with us, meaning me and you, being in the AI industry, we of course hear this a lot, AI for good. So, what does that mean, in terms of how we're using voice to reach a wider audience globally in the sports industry?
[00:08:22] CH: We have certain opportunities to help increase fan engagement. I think there's definitely underserved federations that don't receive the same amount of either press or just exposure, and I think we can use this to drive opportunities for color commentary in places where we don't have kind of human resources to drive that content. And so, I think there's just a really good set of opportunities for visual impairment, audio description type of support. So, maybe someone who's not able to visually review or watch a game, they're actually able to listen along and follow kind of what's happening based on the granular details we're getting from those metadata streams.
I think from that perspective, as well, there's always the localization aspect. In content, we typically focus on a few key languages, but there's significant opportunity to drive new markets and new kinds of opportunities for either advertisements or dynamic content, leveraging kind of familiar local voices or personalities that kind of increase that opportunity for engagement, without requiring those familiar faces or voices to kind of spend a lot of time in a studio recording repeatable sound bites and things like that.
[00:09:39] MM: That's really cool. So, sounds like being more inclusive.
[00:09:42] CH: Absolutely.
[00:09:44] MM: Now, earlier you mentioned the Stats Perform partnership. So, what do these offerings and capabilities look like exactly?
[00:09:50] CH: Our goal is to leverage the audio that we're generating through some of our technology and kind of align that with some of the visual assets that Stats Perform is currently providing. They have this great live action widget where essentially it shows you the position of the ball on the field, the player who's currently in possession of the ball, and kind of gives you that play-by-play feel. We're taking that and aligning the voice content to enrich that experience by kind of being able to give a voice to those actions and events. Then I think from the perspective of kind of the match previews, the recaps, player bios, as well as kind of the pre-match content that they're also producing, I think we have a really good opportunity here to drive kind of a synthetic offering that has all the touch points and kind of cues of a human generated broadcast synthetically and do it at the speed of AI.
[00:10:49] MM: Since this is the first time real-time synthetic voice is being deployed in sports reporting at scale, what do you think this is going to be doing to the sports industry?
[00:10:59] CH: That's a great question. It definitely forces you to put your crystal ball and forward-looking hat on, in order to describe some of those possibilities. And I think, just from a storytelling perspective, one of the things that may be underestimated in sports broadcast is just the number of individuals that are in that kind of production in order to make it what everyone experiences through their TV or radio at the end of the day, right? You have a team on the back end that's kind of running statistics and getting you those match facts that come up in the middle of the commentary. Those guys, definitely, I would imagine don't have all those statistics memorized and so, there's a team supporting that.
There's a team that's supporting the audio generation and broadcasts and teams that are kind of driving the actual on field and sending people to various locations all around the globe every week in order to support these types of broadcasts, right? And so, by identifying some of the key inputs that go into that, leveraging AI to understand and ensure that the content is being generated at the right time, and paired with other content, such as, again, ads that resonate with it. Maybe there's a yellow card given and there's an ad that has something to do with the brand, and their kind of preference for yellow, those types of opportunities are available. And again, we can do that at scale without moving humans across the globe, setting up large production facilities, and kind of really recreate that human-like experience at a lower cost at the same speed that everyone would expect, as well as kind of the same kind of touch and quality that goes into those larger formats and required broadcasts.
[00:12:43] MM: Disrupting the sports industry. Is that an accurate statement, would you say?
[00:12:47] CH: That's the goal. I think that's kind of what we're all hoping for. And I think the fact that we're having this conversation shows that there's an opportunity to do so, and if we kind of do this correctly, if that's an outcome, then I think that that's going to be some of the icing on the cake of what we're going to be able to provide from a customization and scale perspective in the sports world.
[00:13:09] MM: Love it. So, how can people find out more information about this AI voice, especially how it now coincides with sports?
[00:13:16] CH: We're working on veritonevoice.com, which will be kind of our flagship website to help drive kind of some demos, some examples, a little bit of how we get these exciting and kind of new concepts done, as well as kind of speak to the future of where we’re looking to go. There's also a great presence, thanks to our social media team, on the various platforms, especially LinkedIn and Twitter, where you'll be able to kind of subscribe and see updates as we kind of produce these new demos and provide some really cool tangible assets to show people what's possible, and show them kind of the vision that we have moving forward. And then also, I think, it's going to be key to get feedback and kind of understand how people are perceiving these new assets and concepts, so that we can always put our best foot forward.
[00:14:05] MM: Awesome. I appreciate you being here and talking with me and the audience about AI voice and us disrupting the sports industry. So, I appreciate you being here, Corey. Thank you.
[00:14:18] CH: Oh, the pleasure is all mine.
[00:14:20] MM: This has been another episode of Veritone’s Adventures in AI, a worldwide podcast that dives into the many ways technology and artificial intelligence is shaping our future for the better. Talk with you next time.