by OreillyMedia 3,015 views
Roger Magoulas is the director of market research at O'Reilly Media. Magoulas runs a team that is building an open source analysis infrastructure and provides analysis services, including technology trend analysis, to business decision-makers at O'Reilly and beyond.
In this video Magoulas highlights noteworthy takeaways from his attendance at Strata + Hadoop World NYC 2012.
Now in its second year in New York, the O'Reilly Strata Conference explores the changes brought to technology and business by big data, data science, and pervasive computing. This year, Strata has joined forces with Hadoop World to create the largest gathering of the Apache Hadoop community in the world.
Strata brings together decision makers using the raw power of big data to drive business strategy, and practitioners who collect, analyze, and manipulate that data—particularly in the worlds of finance, media, and government.
by OreillyMedia 3,499 views
Society confronts enormous challenges today: How will we feed nine billion people? How can we diagnose and treat diseases better, and more cheaply? How will we produce more energy, more cleanly, than ever before?
Big questions like these demand new approaches, and "Big Data" is a crucial part of the toolkit we will use over the coming years to answer them. New algorithms, applied to much more raw data than has ever been available before, will help professionals in almost every discipline make better, more informed decisions, and will guide research and policy toward better outcomes, faster.
Born in the consumer internet, the Apache Hadoop platform has, over the last six years, become a critical piece of infrastructure for government, commercial and research organizations that need to answer big questions using Big Data. In his opening keynote, Mike will explore some revolutionary use cases from his own experiences at Cloudera and will show how building applications within a broader community and ecosystem has vast implications for the speed and depth of innovation, helping humanity to ask bigger questions and gain bigger answers.
by OreillyMedia 1,653 views
Data science is a team sport. Collaboration inside and outside your organization is the ultimate Big Data technique. Success depends on having a collaboration platform and solving the number one problem of the Big Data era: the supply and demand for data scientists. Learn how you can take action today to accelerate the success of your data science efforts.
This keynote is sponsored by Greenplum, a division of EMC
by OreillyMedia 1,036 views
The End of the Data Warehouse
Hadoop is scalable, inexpensive and can store near-infinite amounts of data. But driving it requires exotic skills and hours of batch processing to answer straightforward questions. Learn how everything is about to change.
by OreillyMedia 1,294 views
Moneyball for New York City
New York City is a complex, thriving organism. Hear how data science has played a surprising and effective role in helping the city government provide services to over 8 million people, from preventing public safety catastrophes to improving New Yorkers' quality of life.
by OreillyMedia 635 views
Much of the heavy lifting involved with Big Data projects is accessing and preparing the data for analysis, often referred to as data integration. This can easily consume up to 80% of a big data development effort, and yet too many developers resort to reinventing the wheel by hand-coding custom connectors, data parsers, and data integration transformations. Why not leverage a metadata-driven, codeless IDE with pre-built transformations and data quality rules so that custom development time can be spent where it's truly needed? Codeless environments have proven to be up to 10 times more productive than hand coding, and are less error prone and easier to maintain. The skills are already out there and available from in-house IT or system integrators, making it possible to get your projects running and into production quickly.
This keynote is sponsored by Informatica
by OreillyMedia 849 views
The Composite Database
While moving away from single powerful servers, distributed databases still tend to be monolithic solutions. But key-value storage, for example, is rapidly becoming a commodity service on which richer databases might be built. What are the implications?
by OreillyMedia 1,859 views
Big Data Direct -- The Era of Self-driven Big Data Exploration
In recent years, "Big Data" has matured from a vague description of massive corporate data to a household term that refers to not just volume but the diversity of data and velocity of change. Today, there's a wealth of data trapped in corporate data repositories, new platforms like Hadoop, a new generation of data marketplaces and volumes generated hourly on the Web. With the opportunity for key insights that these diverse data sources present, the business user's ability to get to the data when they need it and glean fast insights has become a massive priority. In a nutshell, easing access and analysis of both private and public data is one of the biggest opportunities ahead. New approaches to enable self-driven exploration of private and public data are necessary and will help address the critical 'last mile' problem in big data. Big Data Direct discusses the opportunity ahead for business users to intuitively and easily harness the power of private and public data for deeper customer intelligence and to identify new business opportunities.
by OreillyMedia 847 views
Bringing the 'So What' to Big Data
The onset of the Big Data phenomenon has created a unique opportunity to improve the human condition, but the challenge ahead of us is to move beyond Big Data infrastructure to real, applied, and prioritized comprehension that is morally and practically useful. This requires redirecting our collective energies toward new algorithms, more distributed systems, and purer software architectures that more optimally exploit the infrastructure to answer questions of great social and personal value. Technologies that close the "Understanding Gap" can make great strides to prevent evil, reduce suffering, and create more actualized human potential. This pursuit is more than an opportunity; it is a key responsibility for the technology community today and through at least the next decade.
by OreillyMedia 1,756 views
The Human Face of Big Data
Over the past two decades, Rick Smolan, creator of the best selling "Day in the Life" books, has produced a series of ambitious global projects in collaboration with hundreds of the world's leading photographers, writers, and graphic designers. This year Smolan invited more than 100 journalists around the globe to explore the world of Big Data. The Human Face of Big Data captures, in fascinating photographs and moving essays, an extraordinary revolution sweeping, almost invisibly, through business, academia, government, healthcare, and everyday life. Big Data is already enabling us to provide a healthier life for our children, to provide our seniors with independence while keeping them safe, to help us conserve precious resources like water and energy, to peer into our own individual genetic makeup, to create new forms of life, and soon, many predict, to reengineer our own species... and we've barely scratched the surface.
by OreillyMedia 400 views
Hadoop: Thinking Big
Most organizations have limited their thinking about Hadoop. The use cases they pursue are narrow and have only scratched the surface on how to best improve business results and gain a competitive edge.
The truth is that only a few obstacles and a few changes in perspective stand in the way of realizing the full potential of Hadoop.
This session will provide insights into how the combination of scale, efficiency, and analytic flexibility creates the power to expand the applications for Hadoop to transform companies as well as entire industries.
This keynote is sponsored by MapR Technologies
by OreillyMedia 1,709 views
Hadoop started as an offline, batch-processing system. It made it practical to store and process much larger datasets than before. Subsequently, more interactive, online systems emerged, integrating with Hadoop. First among these was HBase, the key/value store. Now scalable interactive query engines are beginning to join the Hadoop ecosystem. Realtime is gradually becoming a viable peer to batch in big data.
by OreillyMedia 356 views
Cloud, Mobile and Big Data -- How Analytics Provides Value to the Buzzwords
In this rapid-fire keynote, we'll introduce how virtually every new technology trend is inextricably linked -- or should be -- to attain maximum leverage. We'll discuss how you can ride the Big Data wave by leveraging analytics to drive superior and faster decisions -- decisions that can lead to competitive advantage. We'll discuss how you can use technologies such as cloud and mobility to spread the value of analytics pervasively across your virtual organization, and how that positively impacts your employees, customers and partners.
This keynote is sponsored by SAS
by OreillyMedia 741 views
They Don't Teach You That In School
A fireside chat with Cathy O'Neil about why universities can't make data scientists. Lots of companies want to hire data scientists, and there aren't enough to go around. Some universities are adding data science graduate departments, but they're facing an uphill battle, thanks to a lack of good data for academics, political infighting, and scalability issues.
by OreillyMedia 662 views
From Traditional Database to Big Data Platform
You need more than a database 'hammer' for today's Big Data projects. Organizations need a 'data platform' providing integrated tools to capture, store, process and present data. Without it, companies can achieve volume, velocity, or variety, but not all three. Join us to learn the extreme capabilities needed to distill new business signals from big data.
This keynote is sponsored by SAP
by OreillyMedia 560 views
Are We Really Winning the Information Revolution?
Samantha Ravich, former National Security Advisor to Vice President Richard Cheney, will discuss the challenges that face strategic decision makers from the wealth of data now provided by advances in technology.
by OreillyMedia 906 views
Of Rocket Ships and Washing Machines: Data Technology for People
The story of Big Data technology has centered on engines, algorithms, and statistical methods for data analysis. Less has been said, and too little has been done, regarding technology to improve the lives of data analysts. In this talk I'll highlight recent research from Berkeley and Stanford targeted at improving productivity across the data lifecycle, using technology to address the scarcest resource in Big Data: people.
by OreillyMedia 1,123 views
Josh Patterson, Principal Solutions Architect, Cloudera
by OreillyMedia 266 views
Tom Phillips, CEO, m6d
by OreillyMedia 206 views
Steve Mardenfeld, Data Engineer, Etsy
by OreillyMedia 328 views
Bitsy Bentley, Director of Data Visualization, GfK Custom Research
by OreillyMedia 175 views
Jesper Andersen, Founder, Bloom Studios
by OreillyMedia 424 views
David Boyle, SVP Insight, EMI Group
by OreillyMedia 221 views
Datacratic is a software company that applies machine learning and predictive modeling to real-time consumer behavior.
by OreillyMedia 180 views
NGDATA is the consumer intelligence company that empowers enterprises seeking greater customer lifetime value by enabling deep customer insights, personalized product offers and intimate customer experience to drive sales, and increase customer loyalty with a unique combination of interactive Big Data management and machine learning technologies in one integrated solution.
by OreillyMedia 127 views
At Red Carp Studio, we are inspired to have the same quality so we can help people and organizations reach their full potential.
What began as a better way to build a team (honesty, transparency, collaboration) soon shifted our focus to our customers and how to help them succeed. We build simple and beautiful solutions that help people connect, analyze, and decide.
by OreillyMedia 176 views
Privacy Analytics Inc. is a 2007 spin-off from the Electronic Health Information Laboratory (EHIL), one of the leading scientific laboratories working in the area of re-identification risk assessment and de-identification. Our team brings together accomplished scientists and an experienced management team to make de-identification simple.
by OreillyMedia 117 views
How much time have you spent figuring out the size of a market opportunity?
Ever needed facts to better understand climate change?
Ever wanted to compare a company's sales evolution with a country's economic growth?
Why purchase a whole report when all you need is the page?
Video, music, images and news are so easily searchable on the web.
by OreillyMedia 326 views
Ilya Grigorik built a system that lets him efficiently track GitHub projects. He worked with GitHub to archive public GitHub activity, and he then made that data available in raw form and through Google BigQuery. He discusses his project and its surprising results in this interview.
Related story: http://oreil.ly/RzxUVE
by OreillyMedia 130 views
Hjalmar Gislason, CEO, DataMarket
Hjalmar is a serial entrepreneur, founder of four startups in the gaming, mobile and web sectors since 1996. His company, DataMarket, provides information companies with tools to effectively publish their data and reach new audiences. Their data portal, DataMarket.com, may be the largest collection of open statistics and numerical data available online. DataMarket is based largely on Hjalmar's vision of the need for a global exchange for such data.
by OreillyMedia 68 views
Justin is a member of the Entities/Data Science team at Facebook where he helps curate and build from their rich structured object and social graphs, with a focus on location. Before joining Facebook, Justin ran the Data team at foursquare. In addition to building their core data-driven products Explore and Radar, he built a team from the ground up that consisted of Engineers and Data Scientists to solve large scale data problems as foursquare's dataset grew from half a million check-ins to over 1.5 billion. Before that, Justin worked at a hedge fund as a quantitative analyst, building custom portfolios for their asset management division and doing modeling and analysis for their risk team, specializing in high-frequency, derivatives, and commodities trading. Prior to that, he worked for Bear Stearns as a Vice President in their fixed income analyst group, building applications and models to help value agency pass-thru securities and building loan-level pricing applications and models. Justin holds a BS in Computer Science with a minor in Mathematics from the University of Rochester and has studied graduate-level Math and Computer Science at Columbia University. He is constantly chasing the biggest and most interesting datasets and trying to make amazing things happen with them.
by OreillyMedia 57 views
Director of Product Marketing
Hortonworks is a leading commercial vendor of Apache Hadoop, the preeminent open source platform for storing, managing and analyzing big data. Our distribution, Hortonworks Data Platform powered by Apache Hadoop, provides an open and stable foundation for enterprises and a growing ecosystem to build and deploy big data solutions. Hortonworks is the trusted source for information on Hadoop, and together with the Apache community, Hortonworks is making Hadoop more robust and easier to install, manage and use. Hortonworks provides unmatched technical support, training and certification programs for enterprises, systems integrators, and technology vendors. For more information, visit www.hortonworks.com.
by OreillyMedia 153 views
Vice President Technology
Versant Corporation (Nasdaq:VSNT) is an industry leader in building specialized NoSQL data management systems to enable the real-time enterprise. Using the Versant Database Engine, enterprises can handle complex information in environments that demand high performance, concurrency, and availability, significantly cut hardware and administration costs, speed and simplify development, and deliver products with a strong competitive edge. Versant's solutions are deployed in over 150,000 installations across a wide array of industries, including telecommunications, energy, financial services, transportation, manufacturing, and defense. For more than 20 years, Versant has been a trusted partner of Global 2000 companies such as Ericsson, Verizon, Siemens, and Financial Times, as well as the U.S. Government.
by OreillyMedia 81 views
Chief Executive Officer
Karmasphere, the leader in Big Data Intelligence, equips companies to unlock the power of Big Data in Hadoop, opening up a whole new world of possibilities to add value to their businesses. Karmasphere makes unparalleled collaboration possible for Big Data Analytics teams so they can discover and share new patterns, relationships, drivers and insights for any multi-structured data in a variety of Hadoop distributions including: Amazon Elastic Map Reduce, Cloudera, HortonWorks, IBM, and MapR Technologies.
by OreillyMedia 76 views
Vice President, Corporate Marketing
Actian Corporation enables organizations to transform big data into business value with data management solutions to transact, analyze, and take automated action across their business operations. Actian helps 10,000 customers worldwide take action on their big data with Action Apps, Vectorwise and Ingres. Vectorwise has pioneered high-performance Hadoop analytics with leading social media companies such as Badoo, IsCool and NK. The Actian Hadoop Data Connector provides a rapidly implemented, robust and scalable solution that saves time and eliminates the risk of relying on custom integration code to support mission critical big data analytic applications.
by OreillyMedia 105 views
Chief Executive Officer
Digital Reasoning solves the problem of information overload by providing the tools people need to understand relationships between entities in vast amounts of unstructured and structured data. Our flagship product, Synthesys™, delivers data analytic solutions based on our distinctive mathematical approach to understanding natural language. With tight integration of Hadoop and Cassandra, Synthesys™ brings unstructured data analytics to unprecedented scalability. The power of Synthesys™ is not only the ability to leverage an organization's existing knowledge base, but also to reveal critical hidden information and relationships that may not have been apparent during manual or other automated analytic efforts.
by OreillyMedia 99 views
Rackspace Hosting is the world's leading specialist in the hosting and cloud computing industry, and the founder of OpenStack, an open source cloud platform. The San Antonio-based company provides Fanatical Support to its customers, across a portfolio of IT services, including Managed Hosting and Cloud Computing. In 2010, Rackspace was recognized by Bloomberg BusinessWeek as a Top 100 Performing Technology Company and listed on the InformationWeek 500 as one of the nation's most innovative users of business technology. The company was also positioned in the Leaders Quadrant by Gartner Inc. in the '2010 Magic Quadrant for Cloud Infrastructure as a Service and Web Hosting.' For more information, visit www.rackspace.com.
by OreillyMedia 58 views
VP Marketing, Business Development
Kognitio is driving the convergence of Big Data, in-memory analytics and cloud computing. Kognitio delivered the first in-memory analytical platform in 1989, designed from the ground up to provide the highest amount of scalable compute power to allow rapid execution of complex analytical queries without the administrative overhead of manipulating data. Kognitio software runs on industry-standard x86 servers, or as an appliance, or in Kognitio Cloud, a ready-to-use analytical platform. Kognitio Cloud is a secure, private or public cloud Platform-as-a-Service (PaaS), leveraging the cloud computing model to make the Kognitio Analytical Platform available on a subscription basis. www.kognitio.com
by OreillyMedia 108 views
CEO and Founder
Datameer offers the first data analytics application built natively on Hadoop that helps end users access, analyze and visualize data of any type, size, or source. Founded by Hadoop veterans in 2009, Datameer provides unparalleled access to data with minimal IT resources. Datameer scales from a laptop to thousands of nodes and is available for all major Hadoop distributions including Apache, Cloudera, EMC, Hortonworks, IBM, MapR, Yahoo!, Amazon and Microsoft Azure.
by OreillyMedia 76 views
Alteryx provides indispensable analytic solutions for enterprise companies making critical decisions about how to expand and grow. Our product, Alteryx Strategic Analytics, is a desktop-to-cloud Agile BI and analytics solution designed for Data Artisans and business leaders that brings together the market knowledge, location insight, and business intelligence today's organizations require. For more than a decade, Alteryx has enabled strategic planning executives to identify and seize market opportunities, outsmart their competitors, and drive more revenue. Customers like Experian Marketing Services and McDonald's rely on Alteryx daily for their most important decisions. Headquartered in Irvine, California, and with key offices in Boulder and in Silicon Valley, CA, Alteryx empowers 250+ customers and 200,000+ users worldwide. Get inspired today at www.alteryx.com or call 1-888-836-4274.
by OreillyMedia 62 views
Dell Inc. listens to customers and delivers innovative technology and services that give them the power to do more. Our cloud and big data solutions integrate best-in-class software, hardware and services, and enable IT efficiency, organizational effectiveness and innovation. www.Dell.com/Hadoop, www.Dell.com/PowerEdgeC
by OreillyMedia 48 views
Lord of the 1s and 0s
Pentaho is building the future of business analytics. Pentaho's open source heritage drives our continued innovation in a modern, integrated, embeddable platform built for accessing all data sources. With support for all of the leading Hadoop distributions, NoSQL databases and high performance analytic databases, Pentaho provides the broadest support for big data analytics, as well as integration and orchestration of big data and traditional sources. For more information visit pentaho.com (http://www.pentaho.com/big-data/) or call +1 866-660-7555.
by OreillyMedia 73 views
VP of Product Strategy
Cleversafe has created a breakthrough technology that solves petabyte and beyond big data storage problems. This solution drives up to 90% of the storage costs out of the business while enabling secure and reliable global access and collaboration. Cleversafe is combining the power of Cleversafe's Dispersed Storage® System with Hadoop MapReduce on the same platform. This solution replaces the Hadoop Distributed File System (HDFS), which relies on 3 copies to protect data, thereby significantly improving reliability and allowing analytics at a scale previously unattainable. Many of the world's largest data repositories rely on Cleversafe for limitless data storage.
by OreillyMedia 163 views
Joshua Sullivan, VP, Booz Allen Hamilton
Booz Allen Hamilton has been at the forefront of strategy and technology consulting for nearly a century. Today, the firm provides services primarily to the US government in defense, intelligence, and civil markets, and to major corporations, institutions, and not-for-profit organizations. Booz Allen offers clients deep functional knowledge spanning strategy and organization, engineering and operations, technology, and analytics, which it combines with specialized expertise in clients' mission and domain areas to help solve their toughest problems. Booz Allen is headquartered in McLean, Virginia, employs more than 25,000 people, and had revenue of $5.59 billion for the 12 months ended March 31, 2011. To learn more, visit www.boozallen.com. (NYSE: BAH)
by OreillyMedia 199 views
Cloudera, the leader in Apache Hadoop-based software and services, enables data driven enterprises to easily derive business value from all their data. Cloudera's Distribution including Apache Hadoop (CDH), available for free at www.cloudera.com/downloads, is the most comprehensive, tested, stable and widely deployed distribution of Hadoop. For the fastest path to reliably using this completely open source technology in production for Big Data analytics and answering previously un-addressable big questions, organizations can subscribe to Cloudera Enterprise, comprised of Cloudera Support and a portfolio of software including Cloudera Management Suite. Cloudera also offers consulting services, training and certification on Apache technologies. www.cloudera.com
by OreillyMedia 120 views
Director, Web Intelligence, Product Marketing
Splunk® Inc. provides the engine for machine data™. Splunk software collects, indexes and harnesses the machine-generated big data coming from the websites, applications, servers, networks and mobile devices that power business. Splunk software enables organizations to monitor, search, analyze, visualize and act on massive streams of real-time and historical machine data. More than 4,400 enterprises, universities, government agencies and service providers in more than 80 countries use Splunk Enterprise to gain operational intelligence that deepens business and customer understanding, improves service and uptime, reduces cost and mitigates cyber-security risk. To learn more please visit www.splunk.com/company.
by OreillyMedia 812 views
Steve Francia, Chief Evangelist at 10gen, talks about alternatives to Hadoop, and what we can expect to see from MongoDB in the future.
by OreillyMedia 5,354 views
Explore the changes brought to technology and business by big data, data science, and pervasive computing with this complete video compilation of workshops and sessions from Strata Conference New York and Hadoop World 2012. With well over 100 hours of content, this video package includes the latest information on the skills, tools, and technologies you need to make data work—and build a data-driven business.
by OreillyMedia 82 views
New York City is home to data innovation across many different industries and sectors. This panel will bring together a diverse group of experts to provide a cross section of the exciting data work being done in NYC, and discuss the history and future of data innovation in their respective areas. The industries represented include: beauty/fashion, development, digital/creative, e-commerce, entertainment/media, government, and science/bio-tech.