by Kevin Schmidt, Christopher Phillips
Although you don't need a large computing infrastructure to process massive amounts of data with Apache Hadoop, it can still be difficult to get started. This practical guide shows you how to quickly launch data analysis projects in the cloud by using Amazon Elastic MapReduce (EMR), the hosted Hadoop framework in Amazon Web Services (AWS). Authors Kevin Schmidt and Christopher Phillips demonstrate best ...
Price: $28.58 | Publisher: O'Reilly Media | Release: 2013
by Mike Barlow
Five or six years ago, analysts working with big datasets made queries and got the results back overnight. The data world was revolutionized a few years ago when Hadoop and other tools made it possible to get the results from queries in minutes. But the revolution continues. Analysts now demand sub-second, near real-time query results. Fortunately, we have the tools to deliver them. This report examines too...
Publisher: O'Reilly Media | Release: 2013
Storm Real-time Processing Cookbook
by Quinton Anderson
Storm is a free and open source distributed real-time computation system. Storm makes it easy to reliably process unbounded streams of data, doing for real-time processing what Hadoop did for batch processing. Storm is simple, can be used with any programming language, and is a lot of fun to use! Storm Real-time Processing Cookbook will have basic to advanced recipes on Storm for real-time computation. The bo...
Price: $29.99 | Publisher: Packt Publishing | Release: 2013
Fast Data Processing with Spark
by Holden Karau
Spark is a framework for writing fast, distributed programs. Spark solves problems similar to those addressed by Hadoop MapReduce, but with a fast in-memory approach and a clean functional-style API. With its ability to integrate with Hadoop and inbuilt tools for interactive query analysis (Shark), large-scale graph processing and analysis (Bagel), and real-time analysis (Spark Streaming), it can be interactively used to ...
Price: $22.99 | Publisher: Packt Publishing | Release: 2013
Apache Accumulo for Developers
by Guðmundur Jón Halldórsson
Accumulo is a sorted and distributed key/value store designed to handle large amounts of data. Being highly robust and scalable, its performance makes it ideal for real-time data storage. Apache Accumulo is based on Google's BigTable design and is built on top of Apache Hadoop, ZooKeeper, and Thrift. Apache Accumulo for Developers is your guide to building an Accumulo cluster both as a single-node and m...
Price: $20.99 | Publisher: Packt Publishing | Release: 2013
Hadoop: The Definitive Guide, 3rd Edition
by Tom White
With this digital Early Release edition of Hadoop: The Definitive Guide, you get the entire book bundle in its earliest form - the author's raw and unedited content - so you can take advantage of this content long before the book's official release. You'll also receive updates when significant changes are made. Ready to unleash the power of your massive dataset? With the latest edition of thi...
Price: $4.99 | Publisher: O'Reilly Media | Release: 2012
by Edward Capriolo, Dean Wampler, Jason Rutherglen
Need to move a relational database application to Hadoop? This comprehensive guide introduces you to Apache Hive, Hadoop's data warehouse infrastructure. You'll quickly learn how to use Hive's SQL dialect - HiveQL - to summarize, query, and analyze large datasets stored in Hadoop's distributed filesystem. This example-driven guide shows you how to set up and configure Hive in your environ...
Price: $24.98 | Publisher: O'Reilly Media | Release: 2012
by Eric Sammer
If you've been asked to maintain large and complex Hadoop clusters, this book is a must. Demand for operations-specific material has skyrocketed now that Hadoop is becoming the de facto standard for truly large-scale data processing in the data center. Eric Sammer, Principal Solution Architect at Cloudera, shows you the particulars of running Hadoop in production, from planning, installing, and configu...
Price: $4.96 | Publisher: O'Reilly Media | Release: 2012
by Donald Miner, Adam Shook
Until now, design patterns for the MapReduce framework have been scattered among various research papers, blogs, and books. This handy guide brings together a unique collection of valuable MapReduce patterns that will save you time and effort regardless of the domain, language, or development framework you're using. Each pattern is explained in context, with pitfalls and caveats clearly identified to he...
Price: $19.90 | Publisher: O'Reilly Media | Release: 2012
by Alex Holmes
Hadoop in Practice collects 85 battle-tested examples and presents them in a problem/solution format. It balances conceptual foundations with practical recipes for key problem areas like data ingress and egress, serialization, and LZO compression. You'll explore each technique step by step, learning how to build a specific solution along with the thinking that went into it. As a bonus, the book's ...
Price: $12.00 | Publisher: Manning | Release: 2012