Transcript – Unlocking Video, Audio, and Text Insights with Snowflake
Saurin Shah 00:00
Most organizations, they don't need insights from unstructured data. And what we found in reality, it's the exact opposite, because customers have a ton of unstructured data. And what they are starting to do now is actually crack open those files, derive meaningful insights, run machine learning on that, and extract some information that can give them a competitive advantage.
Magen Mintchev 00:22
Veritone presents Adventures in AI, the podcast that dives into the many ways artificial intelligence is shaping our future for the better. I'm your host, Magen Mintchev, and today we're going to hear from Saurin Shah, who's the Senior Product Manager at Snowflake, and also Trevor Jones, who's the Vice President of Technology Partner Management at Veritone. Snowflake and Veritone have partnered to bring best-of-breed data mining capabilities to audio, video, and even text data. So Snowflake users can now easily extract insights from these difficult-to-mine data sources with Veritone to digitally transform content-centric processes and uncover new data insights. Saurin and Trevor will be discussing how modern business leaders are making better business decisions and scaling their business processes through insights from unstructured data. Let's have a listen.
Saurin Shah 01:20
Hello, everyone. My name is Saurin. I'm a product manager at Snowflake. I've been with Snowflake for close to four years, and I've worked on a number of functionalities. But today we are going to talk about one of my favorite functionalities, which is unstructured data, which we just recently launched. It became generally available a few weeks ago. I'm super excited to talk here with Trevor about unstructured data and machine learning, and how you unlock insights, because there are a lot of insights that can be extracted from unstructured data. And we would love to see how
Trevor Jones 01:59
Great. First we'll define what we mean by unstructured data. And then we'll talk about how unstructured data is vast, how it's important, and how it's easier to mine than you might think. But first, let's do some myth busting. What we have here is a list of myths that we pulled together from talking to customers and prospects, the things that we've heard that we'd like to address. So hopefully we can give you some good sound bites if you happen to run into these yourself. We'll also see these kind of emerge as themes as we go through our talk. Saurin, would you like to take the first one?
Saurin Shah 02:37
Yeah, this is a very interesting one. Sometimes we have heard that most organizations don't need insights from unstructured data. And what we found in reality is that it's the exact opposite, because customers have a ton of unstructured data. We will see more in the presentation later on. But what they are starting to do now is actually crack open those files, derive meaningful insights, run machine learning on that, and extract some information that can give them a competitive advantage. And that has been working out great for some of the organizations which have had a head start in doing that. So that's an amazing thing that we have been seeing in the industry so far.
Trevor Jones 03:23
Great. So the next one we've got is that mining unstructured data isn't ready for mass market adoption. Sure, that may have been true years ago. But when you have companies like Snowflake building unstructured data capabilities natively within their platform, and you have companies like Veritone making the mining of unstructured data across video, audio, and text more declarative, more turnkey, and out of the box, that statement just isn't true anymore. The next one: more data isn't always better. Saurin, do you want to take this one?
Saurin Shah 03:57
Yeah, I mean, for us, what we've been seeing is that the more data you have, it's always better. It's a challenge, because the more data you have, you need to store it, monitor it, secure it, govern it, and you need to make sure that it's made use of. But the more data you have, it's always better, because who knows, in future your data scientists and analysts will extract amazing insights out of it. And that will give you a new direction on where to take your functionalities, your product, or your business.
Trevor Jones 04:35
Right, right. And one thing that I've seen that's interesting is the sort of network compounding effect of adding data to any repository: any new data set that you get adds value to the data you already have, and allows you to explore insights in new ways.
Saurin Shah 04:51
Yeah, just one good example of that is the Snowflake Marketplace, where organizations have been saying: I have some data, let me take some data from the Snowflake Marketplace and join it with my data. And then voila, I have a lot more insights which I didn't have before, or I can tap into a lot more insights after that.
Trevor Jones 05:13
Definitely. So the next one we've got is that businesses need to reach maximum analytical maturity before adopting AI- and ML-enabled processes. And I have to say, that one makes me smile. If you look at the ambitious roadmaps that organizations already have, what's included is MDM, master data management, and it's such an aspirational goal to master that. You may say, hey, this dependency I have is on the Holy Grail, when, as you and I know, they have unstructured data sitting on the shelf that they're not making use of. That's the low-hanging fruit that could not only help them get there, but is right in front of them. So I just think that dependency is often perceived, but it's just not there. You want to take the last one?
Saurin Shah 06:01
Yeah, the last one is also very interesting. We've heard some customers, some organizations, saying that dealing with GDPR is just so hard, especially when it comes to unstructured data. And that's not true. I mean, there are challenges, definitely, but there are a lot of technologies available. If you want to search unstructured data for, say, my name [unclear], there are technologies which can go and say, show me all the images which have Saurin in them, or show me all the documents which have Trevor's name in them. And then you can appropriately do all the GDPR processing that you want to do. So these processes are becoming much, much simpler now. So moving
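The kind of search described in this turn can be sketched in a few lines of Python, assuming an upstream AI step has already extracted the names of people found in each file. The sample records, the field names, and the `files_mentioning` helper are hypothetical illustrations, not a Snowflake or Veritone API.

```python
# Hypothetical metadata, as if produced earlier by face recognition
# (for images) or text extraction (for documents).
extracted = [
    {"file": "team_photo.jpg", "type": "image", "people": ["Saurin", "Trevor"]},
    {"file": "contract.pdf", "type": "document", "people": ["Trevor"]},
    {"file": "office.jpg", "type": "image", "people": []},
]


def files_mentioning(records, name, file_type=None):
    """Return file names whose extracted metadata mentions the given person.

    Matching is case-insensitive; file_type optionally narrows the search.
    """
    wanted = name.casefold()
    return [
        r["file"]
        for r in records
        if any(p.casefold() == wanted for p in r["people"])
        and (file_type is None or r["type"] == file_type)
    ]


# "Show me all the images which have Saurin in them."
images_with_saurin = files_mentioning(extracted, "Saurin", file_type="image")

# "Show me all the documents which have Trevor's name in them."
documents_with_trevor = files_mentioning(extracted, "trevor", file_type="document")
```

Once the matching files are known, whatever GDPR processing applies (export, redaction, deletion) can be run against that list rather than against the whole repository.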
Trevor Jones 06:50
Moving to the next topic here: content is growing exponentially, right? I think everyone understands that, but what we have here is just some examples of how vast this unstructured data landscape is. For example, 500 hours of video are uploaded to YouTube every minute. That's 82 years of content per day. Now, one down here which I like is that 28 billion hours of media were watched on digital media platforms in 2020 alone. And podcasts, which are a major channel for Veritone One, our AI-powered ad agency: there were 2 million podcasts and 48-plus million podcast episodes in 2021.
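The "82 years of content per day" figure follows directly from the 500 hours per minute quoted here. A quick check in Python, assuming a 365-day year of continuous viewing:

```python
HOURS_UPLOADED_PER_MINUTE = 500  # figure quoted in the talk

# 60 minutes per hour, 24 hours per day.
hours_per_day = HOURS_UPLOADED_PER_MINUTE * 60 * 24

# Convert hours of content into years of round-the-clock viewing.
years_per_day = hours_per_day / (24 * 365)

print(f"{hours_per_day:,} hours per day is about {years_per_day:.1f} years of content")
```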
Saurin Shah 07:28
And in terms of documents, we have seen PDFs as one of the most common data exchange document formats, especially when it comes to unstructured text. If you think about any industry, healthcare, finance, retail, or any other, PDFs are so common, especially when customers have documents: they scan them, they create a PDF, then they share it. And that's the most popular data exchange. I've got the number here: 2.5 trillion PDFs in the world, and they're still growing at a very alarming rate right now.
Trevor Jones 08:06
One of the things that we talk about all the time is that the days of throwing bodies at tagging metadata, at watching videos and documenting what things are, are over. AI is required to manage, to mine all of it. You look at linear TV and terrestrial radio, streaming audio and video, high-res HD, and you've got VR content. So with the emergence of the metaverse, there's more content coming from that, not to mention the PDFs. So what the market is really demanding here from us is AI and ML to be able to mine this.
Talking a little bit about the competitive advantage of making use of this data, here's a Deloitte study that was done fairly recently with 2,700 respondents, line-of-business executives as well as technology executives. AI translates to market leadership: 73% believe AI is very or critically important to business today. Not tomorrow, not three years from now, but today. And 64% believe that AI gives them an advantage over their competitors. So the ones that are leveraging it are going to be the ones that win, it's just that simple. And 74% believe that AI will be integrated into all of their enterprise applications within three years. That number surprised me, but it's here, and people are seeing that. And as for some of those myths we were talking about earlier, more data isn't better, or my company doesn't need this, I mean, this data speaks for itself, right?
Saurin Shah 09:58
Yeah, absolutely. I would like to give one example over here. I've been starting to use some newer insurance providers. Previously, I used to file claims the old way. And now, with the new insurance providers, I just take a picture of what happened and file a claim, and within seconds the claim gets approved. It's just mind-boggling how fast these companies are moving. And that gives them a competitive advantage.
Trevor Jones 10:26
Definitely. And so, moving on to this as almost a moment of reflection: let's just think a moment about our own businesses, about our companies. What data assets do we have? You've got video files, images, audio, emails, text, social media, and IoT sensor data. Then you start thinking about what kind of questions you can answer by mining some of this data. When does a face or an object or a logo appear? When is a keyword said? Who's speaking? How does the customer feel? What are they asking for? What's the quality of service we're providing? And then you're looking at, okay, now what do we do with that? Do we integrate that with our applications? Do we monitor results in real time? You can scale that up, you can scale it out. So I think the real question we should be asking ourselves is: what data are we sitting on that we're blind to and not making use of?
Saurin Shah 11:21
That's a good segue into this slide: what data are we sitting on? Deriving insights from data is not new. It has been done for years. Years ago, there were table-based data warehouses, where the data was in very structured columns. You buy a coffee from a coffee shop, you get a record in one database, and then you run analytics on that. But that's no longer true. The evolution of the data happened, and then file-based data lakes and lakehouses, where file-based and table-based data just combined together, came into existence. There the data was not just a simple record, but a JSON document which had some nested fields and all that. And right now, as Trevor mentioned, the data is everywhere. You just have to make sense of all the data you are sitting on. Just to give you an example, let's say you are running marketing analytics on, say, a group of people. Their data may be the rows and columns that I was talking about, their data may be semi-structured in files, or their data may be in PDF documents, or images or videos that you may have about them, or customer call records that you may have from the support organization. So it's really important to understand that the data is now bottomless. You just have to collect everything together and use all of that to run machine learning and derive insights from it. And that brings challenges. It's hard to do that. One of the big challenges we saw with customers is that all of this is siloed. Some data, as I mentioned, may be with customer support, some data may be with the marketing organization, and some data in the retail business unit of your organization. So to collect all of that, to govern it and put security on it, and to make sure that the right people have access to the data is extremely hard. These are some of the challenges that we have seen.
The other thing is, now that you have collected all that data and governed it, how do you create pipelines so that you can run machine learning on it and extract meaningful information out of it? That is also one of the big challenges that we have seen.
Trevor Jones 13:45
Saurin Shah 16:29
Yeah, the topic extraction one, which you brought up, that's an important one. Slightly related to that is the call center logs: transcribing the call center logs and deriving sentiment analysis, like, are my customers happy or not? That's another very important one, which I think some of our common customers are doing.
Trevor Jones 16:49
Definitely. One of my favorites here is advertising analytics and attribution. One thing we're able to do, as we ingest the terrestrial radio and podcast streams, is actually measure and monitor the impact of live and organic mentions. So not just the ads that were being placed, but when my brand is being spoken about, what they're saying, and the quality of those mentions, and actually be able to tie that back to performance. So, as we were saying before, the list goes on. Maybe here we can talk a little bit about how the landscape looks around this unstructured data mining process. At the top left there, you have Veritone applications that are cognitively enabled, which also includes bespoke applications that are built upon the aiWARE stack. We also have what we call third-party apps, which are your line-of-business supporting applications, from CRMs like Salesforce to ERP, robotic process automation, business process automation, and so on. And on the right you have Snowflake. Snowflake in this ecosystem serves both as the backend structured data to support these applications, and also as your analytics platform, which is also going to be connected to your visualization tools, as well as to AI/ML models for applying predictive analysis. And the list goes on and on, like the shareability of the data. And at the bottom, what you have is the Veritone aiWARE platform. What we have here is the ability to orchestrate these AI/ML models for unlocking all this data. Down the left, we have infusing that within applications. And down the right, we have landing that metadata into Snowflake for further analysis.
Saurin Shah 18:44
Yeah, one of the key things over here is the pipelines. If you remember, one of the key challenges that I mentioned was creating these complex pipelines to derive meaningful insights from that unstructured data. And with this integration, the pipelines become super easy. Just to give you an example, let's say you have 10,000 new images that come in every day. Snowflake will automatically pick up those 10,000 new images and send them to Veritone. Veritone will do the extraction process, run machine learning on that, and send the information back to Snowflake. Then it will be stored in Snowflake, and it will be ready for your data analysts or data scientists to play with. So that simplicity of the pipeline is actually very key for this integration. And all of this is possible mainly because of the support for unstructured data that we just recently added. Snowflake has already had structured and semi-structured data support. But recently, as I mentioned before, we launched the support for unstructured data. And what that includes is basically a five-fold thing. Mainly, you can store unstructured data in Snowflake, and you can catalog all of that using directory tables. Snowflake is all about centralizing all of your data and making sure that it's all governed well and stored in a single repository. So bringing in that unstructured data and adding governance capabilities like role-based access control, security, or scoped access is really powerful. The last thing I want to mention here is that Snowflake also has this marketplace, where we focus on mobilizing the data from one organization to another. So secure sharing of unstructured data is a big piece of that functionality.
And then the Snowpark support for unstructured data is how we focus on processing that unstructured data. One part of that is using external functions, and the Veritone integration comes that way. But another part of that is running Java UDFs and Python UDFs for processing that data. Perfect.
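The pipeline described in this turn (pick up new files, send them out for extraction, store the results for analysts) can be illustrated with a small Python toy. This is only a sketch of the control flow under stated assumptions: the `Pipeline` class, the `fake_extract` function, and the file names are hypothetical and do not represent Snowflake's or Veritone's actual interfaces. In the integration discussed, the extraction call is made through Snowflake external functions rather than a local Python function.

```python
from dataclasses import dataclass, field


@dataclass
class Pipeline:
    """Toy stand-in for the flow: detect new files, extract, store results."""

    processed: set = field(default_factory=set)  # files already handled
    results: dict = field(default_factory=dict)  # file name -> extracted metadata

    def run(self, directory_listing, extract):
        """Process only files not seen before; return how many were new."""
        new_files = [f for f in directory_listing if f not in self.processed]
        for name in new_files:
            # Stand-in for the call out to an external AI service.
            self.results[name] = extract(name)
            self.processed.add(name)
        return len(new_files)


def fake_extract(file_name):
    """Hypothetical extractor; a real one would run ML models on the file."""
    return {"file": file_name, "labels": ["placeholder"]}


pipeline = Pipeline()

# Day 1: two images arrive and both are processed.
first_run = pipeline.run(["img_001.jpg", "img_002.jpg"], fake_extract)

# Day 2: the listing now has three images, but only the new one is processed.
second_run = pipeline.run(
    ["img_001.jpg", "img_002.jpg", "img_003.jpg"], fake_extract
)
```

The point of the sketch is the incremental step: each run touches only files that were not processed before, which is what keeps a daily batch of new images cheap to handle.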
Trevor Jones 21:03
And moving on. We talked a little bit earlier about some of the use cases within media and entertainment and advertising, and these capabilities unlock use cases across every industry, essentially. What we have here is a list of just some of our favorites. We talked a little bit about advertising in terms of measurement of live and organic mentions. Also, there's one thing we're working on with our advertising clients, and I hope to do a follow-up webinar to deep dive on that use case. Essentially, what we're able to do is leverage their investment in Snowflake and create a borderless data situation where we can pull that performance data and that AI/ML output together, and optimize campaigns more holistically and in more real time. We're seeing the popularity of that pick up fast. The other item I mentioned before is contextual ad targeting. So with cookies going away, it's about the ability to target ads. If you think about it, you're going to advertise on YouTube. You want to advertise on the right content, but you also want to advertise your brand in the right moments. So maybe it's a show about high school football. If you're Under Armour, for example, you want your ad to show up during a game. But if they're having a heartfelt dinner out with their friends or as a family, maybe that's where you want to place an Applebee's ad. So you can tell that the timing and placement of those will really enhance the efficacy of that ad, and it'll also make sure that your ad is being placed in a context that's safe for your brand. And Saurin, I know you've seen some use cases within healthcare.
Saurin Shah 22:54
Yeah. Within healthcare also, there are very interesting use cases that we have seen. For example, some organizations take screenshots of prescriptions or insurance cards, and now they want to run OCR and extract the text out of them, or run some analytics on them. Doctor notes is another one, where you run machine learning on those doctor notes and then identify, for example, what are the most popular drugs that have been given by doctors, or what are the most popular tests that have been done, and on what kind of demographic. So the use cases are just too many and very vast.
Trevor Jones 23:32
Yeah, and one of my favorites, maybe the last one I'll mention, is around conversational intelligence. One of the reasons I really like this one is not only that the amount of recorded conversations, with remote work and all that, is just exploding, so there's this mountain of content that a lot of companies are making use of, but it also allows us to flex our capabilities in terms of breadth of cognitive engines. If you think about analyzing a sales call or a contact center call, you can use facial recognition to understand who's on the call and to look at what the facial expressions are. You can use transcription and translation to see what's being said and what those words are. You can look at content extraction, where you can say, all right, what are they talking about? What features are they asking for? You can even look at the sentiment: is the customer that's speaking happier at the beginning or at the end of the call? You want to see that your agents are doing a good job. You want to make sure that those interactions with your customers, whether it's sales or post-sales, are reflecting well on your brand, and that you're learning as much as you can from them. In terms of next steps, so now what? Contact Veritone to discuss your unstructured data needs; we have a QR code there. Download the ebook from the Snowflake site on best practices around managing unstructured data. Veritone's offering is now live on the Snowflake Data Marketplace, so for Snowflake customers, be sure to check us out there. And also stay tuned for more webinars from Veritone and Snowflake.
We plan to do many more of these, both use case deep dives with customers and technical deep dives on the how-to. Saurin, always a pleasure. And I really appreciate you joining us today.
Saurin Shah 25:43
I really enjoyed having this conversation, and thank you.
Magen Mintchev 25:47
So there you have it. Now you have more insight into the valuable insights that are trapped within your text, audio, and even video content. Insights that can also help you create, manage, distribute, and even monetize your content more effectively. Thank you to everyone out there for listening to Adventures in AI, the podcast that dives into the many ways artificial intelligence is shaping the future for the better. Talk with you next time.
Snowflake Senior Product Manager
Veritone VP Technology Partner Management