Okay, we're back here live at Strata + Hadoop World. This is day two of two days of extensive coverage from SiliconANGLE.com, live in New York City for Strata + Hadoop World. This is the big data show, a transformative show. I'm with Dave Vellante here. Joining me is Michelle Bailey, Chief Data Scientist of stealth startup VDP Finder, which we know a little bit about. Michelle, welcome back on theCUBE.

Thanks, John.

What are you seeing out here? You've had a chance to rub elbows with your fellow data scientists. We had a few data scientists on the show ourselves here at theCUBE, Jeff Hammerbacher. You had a chance to peruse the booths, packed house. Did you get an up-close view of anything?

So there's a lot of, I think, differences here from what we saw at HBaseCon, right? So, you know, if you even think back to April, I look at how different the market has become in just that six-month time period. So, there's definitely a lot of developers here. There were a lot of people in the sessions that had their laptops open and they were programming as they were trying to watch what the presenter was saying, so there was a lot of that going on.

Mashup camp.

But I think there's a lot of suits here, too, right? And I think when I went up to the floor, there's definitely a demarcation between the analytics section and what I would call the mechanics, the plumbing, and the infrastructure section, right? So there's a lot of developers, I think, still tremendously interested in how they can evolve Hadoop, how they can look to a lot of real-time approaches with the data. So there's a lot of those guys still there, but I think what we're starting to see is really the beginning, or the signal, of the new application era for big data. And I think the step before that is analytics. And, you know, I feel the same way, right?
I'm looking for anything I can get my hands on that really gets me to information, which really gets us to the decision and the action. So I've seen a really good mix today. I think there's a couple of companies I'm going to follow up with as a result of the conference. Digital Reasoning is going to be one of them. They had really a tremendous presentation at the end of yesterday. A guy called Rob Metcalf, not to be confused with Bob Metcalfe.

Yeah, Tim Estes, the CEO, he's been on theCUBE before.

Yeah, it was really interesting what they were able to put forward. So they've got some really interesting approaches to machine learning around text mining that I think are probably way out in front of other things that I've seen so far. So that's something I really want to follow up on. And then I also want to follow up with Platfora. I'm really looking forward to seeing what they have coming out around data exploration. And, you know, that's an area that's really difficult, right? Just with the whole process of ETL right now and the length of time that's taking, and just the ability to get in and explore the data really quickly and get a quick and dirty view of what's going on before I go on and do more in-depth analytics there. So they really stood out to me as a couple of companies to follow up with. And then what I also saw, which was really interesting, I thought some of the best presentations were around companies or governments that are doing work with big data. So the CTO of the city of Chicago did a really, really good presentation. He talked a lot about how they're using geospatial technologies to improve the transit organization, how they're looking to leverage that data with their health department.
He gave a really great example of how they were able to follow changes or outages in the transit system, using Twitter data as a more reliable and faster approach to understanding where the outages are in transport. And then in terms of healthcare, or in the health department, being able to use Twitter as a way to reach out to people who've had food poisoning and then go out and talk with the restaurants that were the culprits as part of that, which are really, really interesting ways that they've been using data. And for some of that, it's time sensitive and so real time is important. And then for a lot of other analytics, it's not time sensitive, right? And it's really much more about being able to demonstrate trends and build really good models and algorithms.

What was the feeling on the floor? Obviously, besides being mega crowded, what's your vibe there? What did you extract out of the noise there? What's the signal coming out of the show?

So I think still certainly a lot of activity, right? I think certainly what we're seeing is what I like to call a lot of openness, right? So a lot of transparency about how imperfect everything is, about how people are really looking to share, right? I think that's something really different about this community right now than other tech communities. And how a lot of people are actually learning from the mistakes more than they're learning from some of the successes at the moment. So there was a lot of that going on. Definitely a lot of interest around some of the big companies. Clearly, Cloudera had a lot of the action upstairs on the floor. But also SAP was up there. There was a lot of interest there, and MapR. And then when you went onto the floor, I would say that there was more attention being paid to a lot of the smaller analytics startups that were up there, rather than necessarily the infrastructure players in there.
That was certainly the impression that I got as time went on.

How about fellow data scientists? I don't know if you had a chance to collaborate with them. I'm sure you ran into some. But talk a little bit about what your crowd is dealing with, what the challenges are. What are the things that are most pressing that you're trying to solve?

So our problems, or our challenges, are around text analytics. And pretty much the consensus that I took away from this conference was that there still aren't good solutions out there around text analytics for things like Twitter, for example. So there's been a lot of good work done on text analytics around more classic unstructured data, like email, for example, or documents, or things that we typically see in that realm. But Twitter remains a real challenge. It's a challenge for us as well, solving some of the analytics problems around that. And then I think there's some emphasis around real time, but I think much more around data exploration and being able to get your hands around the data faster. And we saw a lot of attention being paid to tools that sort of circumnavigate the whole ETL process. So you're going native, direct to Hadoop or HBase. So that seems to be the consensus right now, but it really depends on your point of view, right? If you're the data scientist that's more involved on the analytics side and the stats side, you've got a whole different range of problems than what we're seeing with the data scientists that I would consider more on the programming side and the comp sci side of things. And so I think what we're going to see is the market start to bifurcate a little bit, right? You've got data scientists that have very, very different problems. And you see that in a conference like this, where you see such a mix of younger comp sci programmers here with the older crowd, who are trying to answer big analytic questions.

Nice crowd though.
You got the young guns here and you got the older dudes.

Yep, I wasn't quite sure where I fit.

A lot of tall guys, right? You notice, observation, all the big data people, they're all like tall. Big data people are big, like vertically big. Have you noticed that?

That's true.

Any other observations?

Yeah, so something else that's going on that I think is going to be really interesting: there was a speaker, Ken Richardson I think was his name, and he had this whole approach around what he was calling data philanthropy. So using data to solve problems within communities, or perhaps even in third world countries. He was from the UN organization. And he was really here to try and seek out people that can share their data sets. So maybe you go to Citibank and you get some information from Citibank that's been scrubbed, where obviously they maintain all the confidentiality of the data, but they can share that with the UN in ways in which they can use it, perhaps in third world nations. So he talked a lot about being able to use Twitter to monitor what happens after an earthquake, or being able to use Twitter for cholera outbreaks, because as we know, in areas like Africa they're a lot more active on mobile devices and social media than we even are in some of these developed countries. So he has this whole notion of data philanthropy, where either you commit your time as a data scientist to a problem or you actually put together data sets. And then I saw that as well when I went to the presentation for the city of Chicago. They're also looking for people to get together and combine data sets to come up with ways to solve problems that can help at the local level. So I thought that was really pretty fascinating. And then I think some of the emerging things we're starting to hear about are things like regulatory issues, right? What does it mean for people's confidentiality if you're giving out these data sets? Is the confidentiality around data really maintained?
And so you saw some risk management types of tools out there that are really helping to solve those problems around maintaining individuals' confidentiality within large data sets.

Yeah, so the privacy. We were talking to Jeff Jonas of IBM about geospatial. He called geospatial superfood. Jeff Jonas is a mad scientist for IBM, he's a great guy. So he was talking about how the problem he's trying to focus on is to engineer privacy into the system. It's like security, right? Everybody says if you bolt it on, it's not going to be as effective as if you engineer it in. So those are some of the things that you haven't heard a lot about, but this year you're hearing a lot more of. And then of course you have the Senate, Senator Rockefeller going after Equifax and Experian and inquiring about the data sources that they're collecting. It's like such a little tiny piece of the puzzle. That's just going to, like you're hearing Sherman, it's going to explode. You know, how?

I think some of it gets back to quality issues too, right? It's not just about confidentiality, but understanding quality. And if a company is taking action from bad data, I think that at some point we should all be made aware of that, right? So again, what I've seen so far in this community is a lot of transparency, right? Sharing of the good and the bad and the ugly and being very, very open about that. And I think what we want to watch for is, does that conversation get closed off?

What about competition? What are you seeing about competition? Because we were just talking with Sharmila from ClearStory, and she brings up this whole point that technology got us into this problem of data. So this is not like we're creating technology to solve a problem. It's a problem that needs to be solved, and there's not enough engineers to solve the problem. That definitely translates to market opportunity.
So there's basically more fruit on the trees than people that could actually eat it, if you want to use that analogy for the entrepreneur's wealth opportunity, to create value. There's a lot of beachhead for entrepreneurs to pick a position and be very lucrative and create a viable venture. So that changes the dynamics. No one's fighting over the same food. There's food for everyone. There's market for everyone. So what do you see now?

So I think the problem is at the moment everybody's trying to measure value on how can we solve the data management problem, right? So clearly the data explosion, data growth, is something that's a very, very real problem for the enterprise at the moment. And what we've tended to see is people that can come into the IT environment or IT org and solve that problem and get their hands around data. That's how they're selling the value of more of these big data solutions. I think where we need to go, and where we're just starting to see this now, is that the value is really around analytics, right? Taking your data, getting answers out of it, and solving problems with it. And I think that's why you're starting to see different crowds start to come into this community right now. And that's really the right place to be able to assess value. I think what else is interesting, and I've heard a little bit about this right now, and I saw this in the virtualization market after the market had been moving for a couple of years, is that we're starting to hear people talk about the big data environment at their company as being the environment that's actually got the attention right now and is being well managed, right? As opposed to the other environment, where there's not as much attention at the moment. And a lot of that tends to still grow unattended.
So I think that's something really to watch for too as we see the value story play out: is it like virtualization, where suddenly the virtual environment was where all the sysadmins were spending all their time and were actually taking care of the environment and driving things like higher availability? Once we start to see that coming into big data, then you start to see much more co-opetition, right? With some of the more traditional data warehouse solutions. So that's definitely something that's just starting to emerge right now around the whole value notion.

Okay, we are here inside theCUBE with Michelle Bailey. This is SiliconANGLE's coverage of Strata, winding down day two. Great event, everyone's kind of packing their bags and ending the sessions. We'll be right back with our next guest after this short break.