Hi, this is Swapnil Bhartiya, and welcome to another episode of T3M. The topic of this month is data, and today we have with us Justin Coppett, product marketing manager at Akamai. Justin, welcome to the show.

Hey, thanks. It's great to be here.

Yeah. And as I said, the focus is on data, but today especially we are going to talk about big data versus small and wide data. We all know what big data is, but a lot of folks ask: what is small data? Is it just less data? So before we jump into the use cases and where they're going, I would love to understand how you look at data in general. Of course, Linode and Akamai predate a lot of these cloud-related modern technologies, so you have seen the evolution of data over your own career. Reflect on how you have seen data evolve, because now we are not only consuming massive amounts of data, we are also generating it. And data is not just dumb data; we have to extract a lot of value from it. AI is there, machine learning is there, generative AI is there. So talk about how you have seen the evolution of data in general.

It was back in, I believe, 2017 that The Economist had a very poignant quote that said something about data being the new oil. I actually had to go back and look that up, because in my head I had heard that quote around the mid-2000s. I was really surprised to see it was from 2017, because the mid-2000s are when things really started to change. Of course, my background is in the cloud industry, and that started to balloon in the mid-2000s because hypervisors, and really things like object storage, became more commonplace, and we were able to expand infrastructure to accommodate the acquisition of big data. But then we had Web 2.0 kicking into full gear.
People over 30 are going to remember the early days of the internet, when things were a little strange. They were endearing, they were quirky, but they looked nothing like they do today, where you're interacting with everything, commenting on everything, liking everything. The amount of data in the mid-2000s started to absolutely balloon, to absolutely skyrocket, and that's before we even get to smartphones: the number of people with internet access, the number of people downloading apps, some malicious and some not, that are always tracking everything you do. We are not in any shortage of data, which is why I always thought that quote was a little older than it is. By 2017 we had IoT devices, we had wearables; my wife wears a Garmin that knows how much she sleeps. You have companies like Target that can look at your receipts, especially if you're using one of their promotional apps, and start to tell what types of things you purchase and recommend other purchases beyond that. There's the classic story of a father who got mail addressed to his daughter about pregnancy. He said, she's in high school, of course she's not pregnant. But they had predicted it based on her shopping habits, and he later apologized and said, yes, she's due in September. So the amount of data has skyrocketed, and so has the variety, which is one of the V's in big data. We are at no shortage of information; what to do with it is becoming more of a challenge than where to get it.

Thanks. As you're talking about some of these use cases, can you also talk about the whole AI and ML space? Of course, generative AI is there too.
Data is the fuel that powers these things; without data, none of this actually matters. Talk a bit about how organizations are leveraging some of these modern technologies, because, as your examples show, data itself has no value until you extract value from it. So talk about how companies are using different technologies to train models and extract value from the data they're collecting.

The biggest example we all know about at this point is, of course, ChatGPT and its learning models. And that's an interesting one, because the amount of data it needed is measured only in the hundreds of gigabytes; I believe it was something like 570 gigabytes of data from books, Wikipedia, websites, and articles. But that is all text, so feeding that to the model required a significant amount of memory and computational power, more so than storage. So again, it comes back to what you're doing. In the case of anything with text, you're looking at hundreds of gigabytes of data gathered from a variety of sources to train your language models to respond to humans, with billions of parameters, something along the lines of 175 billion. One of the big things going on right now is the kerfuffle at Reddit. People are very upset with some changes to the site, and one of the reasons is that Reddit is going to be selling its information for training large language models; they want a piece of that, and that's one of the reasons their API has changed. So that's one example. Then we look at things like medical imaging: being able to process thousands and thousands of images and charts to help make a diagnosis, to be able to say this is or is not cancerous.
And right now it's about helping doctors, ideally not working entirely outside their view. But those are the types of things we're starting to see so far.

Thanks for explaining that. Now I want to change gears and talk about something totally different, which you mentioned earlier: small and wide data. Gartner predicted that by 2025, 70% of organizations will shift their focus from big data to small and wide data. Of course, we all know what big data is, but small and wide data is not very well understood, and you have also written about it. So I would like to understand from you what small and wide data is and how it differs from big data, and then we'll talk about why and where it's going to see adoption.

Sure. Yeah, it's something that's very interesting. It's becoming more of a philosophy, and there are some models that have to change to accommodate it. But they're also two separate things. We have small data, which is, of course, smaller than big data. As a rule of thumb, we're usually talking about data sets with fewer than a thousand rows or columns, easily described: megabytes rather than exabytes of data. That's where you focus more on analytical techniques that look for useful information, so there's always more of a human element here. And then there's wide data, which is different, though the two terms are often used interchangeably. Wide data is about the variety of sources. A good example is retail, where you may have hundreds of thousands of customers, and maybe some of them have only shopped at your store once or twice. So you have a wide range of sources, but the amount of data you have from each individual one could be very small, and you're able to analyze their shopping patterns based on that. That's one example. The areas can start to blur.
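(As a rough illustration of the "wide" idea Justin describes here, this toy sketch, with entirely invented source and field names that are not from the episode, shows many sparse sources each contributing only a little data per customer:)

```python
# Toy sketch of "wide" data: many sources, each holding only a little
# information about any one customer. All names here are hypothetical.

# Each source is sparse: most customers appear in only one or two of them.
purchases = {"c1": 2, "c2": 1}            # store visits per customer
survey    = {"c2": {"satisfaction": 4}}   # a single survey response
returns   = {"c3": 1}                     # one product return

def widen(customer_ids, sources):
    """Join sparse sources into one wide record per customer.

    Missing values stay None: the point of wide data is breadth of
    sources, not depth of data about each individual.
    """
    table = {}
    for cid in customer_ids:
        table[cid] = {name: src.get(cid) for name, src in sources.items()}
    return table

wide = widen(
    {"c1", "c2", "c3"},
    {"purchases": purchases, "survey": survey, "returns": returns},
)
print(wide["c2"])  # small per-customer data, drawn from several sources
```

The resulting record for each customer is mostly empty, which is exactly the shape a wide-data analysis has to cope with.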
So we can see the point of combining the two: looking at larger groups of data sources with less data in each source, and quite frankly spending a bit more time with what we have. These can even be things like meeting notes, surveys, receipts, or even medical charts. A study run by Harvard Business Review showed that having their coders work to train the AI on a small set actually improved the larger models too, so the two can work in tandem. That's one of the interesting things we're seeing right now as companies find more uses for it.

What kind of adoption are we seeing? What kind of industries and companies are leveraging small and wide data?

There are two ways of looking at it. There are companies that are looking to get into the data game, but they quite frankly just don't have the budget or the capacity. The infrastructure costs are incredibly high, as is the expertise you need, not to say that small and wide data analysis doesn't require expertise as well; it's a different set with some overlap. But that's one of the biggest drivers, because people want to get into this industry without clearing the highest of thresholds, where it has to be the largest of large enterprises, something like IBM with insane amounts of computational power on hand. They want to be able to make their own predictions. The other side is that you can use pre-trained language models that have been compiled with deep learning, and you can use something like ImageNet, which is essentially an open repository of images labeled and structured in a way that's friendly to use, to do what are called zero-shot and one-shot models.
Instead of retraining a whole model to identify something like a new object or a new animal, you can feed it maybe only 50 images, and based on what it has already learned, it's able to make those jumps and identify something entirely new that wasn't in the original model, without rerunning the entire thing. So small and wide data is actually helping big data, in addition to helping enterprises and the small and medium businesses that otherwise could not be in the game at all.

So if you look at small and wide data, it's not going to be a replacement for big data. Is it going to complement it, or is it filling a lot of gaps, as you said, when it comes to big data?

It's going to fill gaps, it's going to complement it, and it's going to allow people who have never been able to even sniff the industry in the first place to get involved. Some really good gap-filling use cases are things like forecasting natural hazards for events that rarely occur, where little to no data exists. That could also mean predicting the risk of disease in a population that does not have health records, or a rare disease where you just don't have a lot of information. These are the areas where better use of small data, and the better analytical processes and philosophies that go into it, are going to start to fill in those gaps and give us a fuller picture and a lot more tools to work with.

Now, if you look at the current situation: of course, there was the COVID pandemic, there's a war, companies are becoming cost-conscious, and teams are getting smaller and smaller. It's already hard to find data scientists to leverage. In the early days, we used to say it doesn't matter what you do, you have to be a software company to run a business in modern times. Similarly, it's going to be the case with data: if you don't have a data strategy in place, you will not survive.
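(A minimal sketch of the few-shot idea described above. Everything here, the three-dimensional embeddings, the class names, the numbers, is invented for illustration; real systems use a frozen pretrained encoder producing high-dimensional embeddings. The technique shown is nearest-prototype classification: a new class is added by averaging a handful of example embeddings, with no retraining:)

```python
import math

# Pretend these vectors came out of a frozen, pretrained image encoder.
# Three dimensions keeps the sketch readable; real embeddings are larger.
prototypes = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.1, 0.9, 0.0],
}

def mean_vector(vectors):
    """Average a handful of example embeddings into one class prototype."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def classify(embedding):
    """Nearest-prototype classification: no retraining required."""
    return max(prototypes, key=lambda c: cosine(prototypes[c], embedding))

# Few-shot step: add a brand-new class from just a few labeled embeddings,
# instead of rerunning the whole training pipeline.
new_examples = [[0.0, 0.1, 0.9], [0.1, 0.0, 0.8], [0.0, 0.2, 1.0]]
prototypes["axolotl"] = mean_vector(new_examples)

print(classify([0.05, 0.1, 0.85]))  # prints "axolotl"
```

The expensive part (learning good embeddings) was done once by the big pretrained model; extending it to a new class costs only a mean over a few vectors, which is the economy Justin is pointing at.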
So talk a bit about, as the explosion of AI and ML goes on, the role small and wide data will play. Companies once again have to be cost-efficient; unlike the early days, they cannot just put everything in the cloud without worrying about the bills. They have to become efficient with everything. So talk about the role of small and wide data in making companies not only efficient, but also helping developers speed things up and bring new services to users. That's what actually matters for businesses.

Yeah, absolutely. The amount of expertise required to derive insights from this is not going to diminish, but it will change a little, and you might see fewer people able to do a lot more with some of the existing tools, because the tools are evolving. There are language models that have already been trained that you can buy or even obtain for free, and there are databases and repositories like ImageNet, free to developers and researchers, with a lot you can use. That's when a smaller group can do something like zero-shot or one-shot modeling. They can essentially say, okay, we only need to add 50 to 60 new images here to get this to work, and it's already been pre-trained. So your computational use and GPU usage go way down, and your storage usage is next to nothing compared to anything for big data. We're starting to see a shift in the way people think about it, away from just bluntly grabbing as much data as absolutely possible. That's a big reason this is so attractive, but we're also again seeing those gaps being filled in areas where we quite frankly just don't have the large data. And we're also looking at it this way: the data is out there, and it is not always cheap, and it's not always good.
It's very often unlabeled. There's more out there than most people will know what to do with, because we're generating exabytes of data, which is an insane amount to think about. There was a study that found that before 2007 we had about 16 exabytes of data, and by 2007 we were up to about 300; and one exabyte is a billion gigabytes. Now we're up to almost 7,000 as of 2022. I think something like most of the data that exists right now was generated within the last few years, which is just scary to think about. So it's really important not to just think about getting as much data as you can. You have to think about getting the best data, having the best plan for how you want to use it, and using it efficiently with approaches like zero-shot and one-shot to really get the most out of it. And there's a human element. Ironically, I think we're going to see more of a human element in this. People say, no, AI is going to do all of it. Well, AI is doing a lot of the grunt work to provide those deep learning models, which are then available for smaller teams to tweak, to adjust, to add a little bit more to. They're getting a lot more bang for the buck out of that, and they're able to make a bigger difference in their projects and what they're doing.

Excellent. Again, thank you. Yeah, I always hear that technology is going to replace people, but look at the tech sector. It's one of the biggest job creators, and the industry is so big it needs more people. So it's not replacing jobs; it will create more. There will always be fear. All you have to do is keep up.
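(To make those units concrete, here is the arithmetic behind the figures quoted above, using the decimal convention from the episode where one exabyte is a billion gigabytes; the year labels simply restate the numbers given in the conversation:)

```python
# Decimal units, as quoted in the episode: 1 EB = 1 billion GB.
GB_PER_EB = 1_000_000_000

# Figures as stated in the conversation (approximate).
figures_eb = {"pre-2007": 16, "2007": 300, "2022": 7_000}

for period, eb in figures_eb.items():
    print(f"{period}: {eb:,} EB = {eb * GB_PER_EB:,} GB")

# Growth factor from 2007 to 2022, per those figures:
growth = figures_eb["2022"] / figures_eb["2007"]
print(round(growth, 1))  # prints 23.3
```

So by these figures, stored data grew more than twentyfold in fifteen years, which is the scale that makes "just collect everything" untenable for most teams.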
One last question before we wrap up, since you talked about the people parallel. Whether we look at big data or small and wide data technologies, I want to understand, when we look at the whole shift-left movement, which is about moving security and other concerns earlier into the developer's pipeline, what kind of trends are you seeing from the point of view of people and culture when it comes to data? Because as companies have embraced the whole DevOps kind of methodology, they should also look at data from a different perspective, versus the silos it used to live in.

That's an interesting conundrum, because when you talk about getting a larger variety of sources, especially for small businesses, that's when you can get into things like meeting notes, spreadsheets, all that kind of stuff, to make these models work. But then again: is that intrusive? At what point is it okay to derive from that wide a group of sources? When does it get a little strange? In some cases, I think transparency is very important, making sure people know what their data is going to do within your company. And of course it's going to vary so much by industry; a big data company, or even an analytical firm that works with smaller data, is a little different. But for example, in the medical coding study I read, the nurses and nurse practitioners were working along with these models to help train them. There's a human element behind even the largest, most sophisticated AI models right now, where we essentially have to show the systems when they are doing well. There is a point where it is reinforced, where we have to say, your data is good, please keep doing that, and it's able to adjust its algorithm accordingly.
So I think data can be siloed more for big data, but for small and wide data, it's going to take a little bit from everyone to contribute and keep moving things forward.

This is a topic I could see us talking about all day, but do you think that, between big data, small data, wide data, a bit of history and evolution, and where things are moving, we have captured the key points today? What are your thoughts?

It's important to understand that, as you mentioned before, a lot of people think AI means fewer jobs. In some areas it does, in some areas it doesn't; that's historically how things have gone. I will not be so brave as to predict the future of employment, but I do know that getting more out of the data you have will require more of a human element. However many jobs that translates to, I certainly don't see it being a negative, that's for sure. And it's becoming more about people who know what to do with the data they're getting, instead of saying, I need to be the Cookie Monster of data and get as much as absolutely possible and just cram it into our systems. That only works for the largest of large companies; for everyone else it doesn't make a lot of sense, and you're collecting a lot of stuff that is not terribly useful to you. So in that sense, I think we are going to see more of a human element. I think we are going to see a boom in people who are at least AI-friendly, if that's the word: people who use these things as a tool and can work with them. I think about Google, and search engines in general; I go back to a time when I used Lycos, once upon a time, and that's a tool to find information. Just like an AI chat tool today, you don't just take the output and say this is factual. I Google and find research articles, I get them from sources, and I validate them.
And people who are able to work with things like ChatGPT are going to be in a better place, and they're going to be, quite frankly, way more efficient than they would have been beforehand. The same goes for deriving information from big data. People who, like those medical coders I mentioned, can help train these models and then use them to improve their jobs are going to be the future.

Justin, thank you so much for taking the time today to talk about big data, small data, and wide data. I would love to have you back on the show. Thank you.

Excellent, thank you. Very happy to be here and to talk about this amazing subject.