From New York, it's theCUBE, covering Big Data New York City 2016. Brought to you by headline sponsors Cisco, IBM, NVIDIA, and our ecosystem sponsors. Now, here are your hosts, Dave Vellante and Jeff Frick.

Welcome back to New York City, everybody. This is Women in Tech Wednesday here at Big Data NYC. We do Big Data NYC every year, concurrent with O'Reilly's Strata + Hadoop World. It's Big Data Week in New York. We go all week. We started Monday with an evening on deep learning and AI, co-sponsored by NVIDIA. We had a party that night, last night. We had really an awesome afternoon. We did all-day CUBE interviews, and then we did a segment really focusing on machine learning and applying analytics and cognitive. IBM came on theCUBE prior to their big announcement last night; we'll talk about that in a second. And then we hosted a panel of eight data scientists, rockstar data scientists, talking about things like: is there such a thing as a citizen data scientist? How should application development and data science teams work together? What is a data scientist? And then after that, we went over and listened to Bob Picciano and his team, Ritika Gunnar and others at IBM, announce their new platform for cognitive analytics, really trying to simplify and up-level the complexities of doing big data analytics. And then of course, IBM had a big party. Jeff Frick, my co-host, is here with me. We were on the ground interviewing people at the party: data scientists, practitioners, executives. It was good. Good night.

Yeah, it was a great night. And we continue to try to innovate, Dave, on theCUBE, and to have that round table, eight people up on the set. It's fun to see all the faces that match the Twitter handles that we see all the time. A lot of big influencers. You know, the only thing we didn't really cover, and I was trying to get you a question in there, is it's still kind of how to lie with statistics, right?
You know, you can pretty much always find a number to support your hypothesis. And we really didn't get too much into that. We did get into, you know, if you've got a result that just looks too good to be true, you need to go back and check. But a lot of other things came out of it that I thought were good, especially on Women in Tech Wednesday: 25% of the folks on the panel were women. And we talked about the softer skills, which one of the panelists said used to never come up in a data science conversation. You need to have softer skills to have a broader kind of experience base on which to build your algorithms and test your algorithms, which I thought was pretty interesting. The other concept that keeps coming up over and over and over again is this concept of a data scientist. Is it a unicorn? And I think it was Craig Brown who brought up that really the way to think about it is it's a team sport. It's like nothing else in business. You know, you need people that have different skills that they can bring to the table. So I thought that was a pretty interesting play, and I thought the round table worked out pretty well, which we've never done before on theCUBE.

Yeah, it worked out great. So it was interesting: two out of the eight panel members were women, which just happened to be 25%, which was the number I threw out about the percent of chief data officers that happen to be women. I don't know if that's the right number or not, but it was just sort of a coincidence. We were talking to both the women on the panel, Miriam Friedel, who's with Elder Research and does some really interesting high-end stuff, and Jennifer Shin, who works for Nielsen. We were talking to them afterwards at the party, along with Dez Blanchfield, and what they were saying to Peter Burris and myself is that when they go to build models, they assume that the data's not going to be there.
So Jennifer right now is doing some stuff with electronic medical records, trying to predict cause and effect of health incidents and the likelihood of certain diseases, and they go into building these models with the assumption that the data's not going to be there, because they're real skeptics. And then the other thing, somebody had a great line the other day: if you torture data long enough, it'll give you the answer, right? And so one of the aspects of being a data scientist is really not giving up on the data. Keep looking at it from different angles. So it was quite instructive, I thought, that panel. So we've got a big day today, more wall-to-wall coverage. We did 10, 12, 15 interviews yesterday. And one of the other big themes is that this whole Hadoop ecosystem is really shifting. We used to spend all the time talking about Cloudera, MapR, and Hortonworks, and those guys are sort of like infrastructure plumbing. And there was a big discussion yesterday about the cloud guys trying to create an abstraction layer above their sets of services, and in many respects creating threats to guys like Cloudera, because generally Cloudera and Hortonworks, and we'll explore this today with some of the folks from Hortonworks, do a lot of on-prem stuff. They're betting on hybrid cloud, and pure cloud is going to be an interesting dynamic for those guys, having to compete specifically with Amazon and Microsoft. I mean, they will participate in those solutions in the marketplaces, but increasingly customers are going to consume those capabilities as services from Amazon and others.

The other theme that keeps coming up over and over and over is that before, big data kind of equaled Hadoop.
And there were big data shows, and it was Hadoop World and Hadoop Summit, and we're hearing more and more that Hadoop is kind of a generic term now, almost, for the ecosystem around big data. And I thought what was really interesting, part of Picciano's thing last night, is he said, you know, IBM is going to innovate at the speed of open source. And we hear over and over and over, at DockerCon and all these different open source shows, that you just can't compete with an engaged community on speed of innovation. And so for Bob to have called that out specifically and then bring up all the open source projects that they're really actively involved with, and I think it came up in one of the interviews yesterday that they're the number one contributor, I might be mixing it up, for the Spark project. It really shows a bold move, kind of changing the whole narrative from big data as Hadoop to big data and analytics, and again, building off the NVIDIA panels we had the night before, it's a much bigger play than just Hadoop.

Yeah, I don't know what the actual number is. We heard two different quasi-conflicting stats yesterday. One of the IBM folks said they were the number two contributor to Spark, and then twice we heard, I think from Ritika, and I heard it again yesterday from Bob Picciano with the announcement, that they were number one. I think IBM is number one in contributions around machine learning or something like that. Number two, maybe, overall. Who knows, we'll try to unpack that. It really doesn't matter. They're putting a lot of resources into Spark, and it's interesting, Jeff. I mean, Hadoop, we talk about it all the time, and Rob Hof just wrote an excellent article on siliconangle.com on how Hadoop and big data haven't lived up to their promises. We talk all the time about the complexity and the barriers to absorbing it, and you heard some of the guests yesterday say, well, we're not experiencing that. And of course the vendor view versus the practitioner view is sometimes not aligned.
Nonetheless, when Spark came to the fore, what we saw is a lot of people said, okay, I couldn't deal with Hadoop complexity. I'm gonna replace or supplant a lot of the things that I'm doing with, or had intended to do with, Hadoop with Spark, because it's more integrated, it's simpler for us to use, and I'm gonna go pure Spark. Now, if people have an investment in Hadoop, they've obviously gotta get a return on that asset, so they're gonna continue, and likely they have the skill sets. But there's no question that this bespoke sort of Hadoop movement has spawned a lot of innovation, and yet it's been a challenge for practitioners to bring all that together and actually deliver value. So that's something that we've been tracking and watching, and I suspect we haven't heard the end of this. Spark is not the end of the road. When you talk to people who come out of Google and other places that were early adopters of big data technologies, they said years ago, oh, MapReduce, long gone, we're not doing that anymore. Hadoop, yeah, we do use it a little bit, but we've moved on. And you talk to people now, they say, yeah, in-memory, Spark, yeah, we did that, but there's other stuff going on. So you know how it is. When it happens with the hyperscale guys, five years later it hits the enterprise.

Right, and Armand Ruiz from IBM talked about this project, their Data Science Experience, because there's so much open source going on all the time, there's so much development, and he called it Facebook for data scientists, which was pretty funny. It's a place to bring together, as we talk about all the time, content and community and engagement, so they can accelerate their learning. Because if you're a data scientist, how do you keep up with all this stuff, and get your day job done, and be a contributor? So it's a really exciting time, a challenging time, with a lot of dynamic pieces in play.
IBM, again, made an announcement yesterday. They called it DataWorks, and their whole thrust was: we are simplifying the complexity of doing analytics, and we're bringing Watson to bear. And they gave a demo. It was actually a pretty good demo of somebody who went on a website and wanted to go camping. Where do you typically go camping? Acadia. Family? How many people? Blah, blah, blah. They answer a bunch of questions, like a wizard, and then it suggested, okay, these are the tents, this is the equipment you need. They brought up an offer, like 10% off this gas stove, or whatever it was. And I was impressed by a couple of things. First of all, they're doing this with a line of code, doing some basic visualization. I think Peter Burris made the comment that the viz was not totally where it should be, and I would agree with that. I think there's some work to be done in visualization, but I'm assuming that they can integrate with other visualization tools like Tableau and Qlik and others. Nonetheless, the speed with which they were able to configure those answers and present those answers to the consumer was lightning fast. It looked like it was really simple to code, a single line of code to create functions that used to take hundreds of lines of code, and like I said, lightning fast. I think of the wine websites, and they're very rudimentary: okay, you want red or white, cab or that, and you sort of do these pull-downs, and it goes chunk, chunk, chunk, chunk, and then spits out a bunch of wines. It looks like we're now entering the next generation of consumer-friendly assistance in terms of buying. So that's one example. It's a classic retail example, but there potentially are many, many others in healthcare and finance and fraud detection, and the list is endless. So the big question is, can IBM simplify that in software to the extent that you don't have to use expensive IBM services to
pull this stuff off? Now, that's been IBM's business model for years: lead with services to minimize the complexity, pay us, we'll do it for you. But that's not a great scale model, and I think IBM is really trying to drive scale through software.

Right, and then there's a whole other level that came up in the panel last night, which is, why are you doing it all in a drop-down menu? Eventually you'll be asking the computer, like you do now for directions, with natural language. So the whole natural language piece of it still has a long way to go. And then again, to build on all these inferences, I think the example that came up was: what are my sales looking like this month? What are my sales going to look like next year? And to take that type of a verbal trigger into the compute, the future is going to be crazy.

So today we'll be having a lot of guests from the big data ecosystem, both the Hadoop and Spark ecosystems, and some of the management vendors. We've got some other folks coming on, technologists, maybe talking about some of the big-picture trends that we've touched on here: the future of big data, the viability of the business. So we're going to have some interesting discussions throughout the day. We're going to be here all day today, Wednesday. We'll be here as well on Thursday. We're at 37 Pillars on 37th Street, I think it's 514 37th Street. It's a John Furrier drive from Javits Center, just up 37th Street. Stop by, we've got signs out, say hello. Jeff and I and Peter Burris and George Gilbert will be going all day long. So appreciate your attention, appreciate you guys watching. You can tweet us, I'm @dvellante, @theCUBE. Check out theCUBE Gems, check out SiliconANGLE.tv for all the videos. John Furrier is actually down at Splunk .conf this week, he's covering that with John Walls, so we've got that event going on, and we'll share with you where theCUBE is going to be next. So really appreciate you guys watching. Keep right there, we're back with our next guest right after this short break.