Practical Enterprise Data Lake Insights
by Saurabh Gupta, Venkata Giri
Use this practical guide to successfully handle the challenges encountered when designing an enterprise data lake and learn industry best practices to resolve issues. When designing an enterprise data lake you often hit a roadblock when you must leave the comfort of the relational world and learn the nuances of handling non-relational data. Starting from sourcing data into the Hadoop ecosystem, you will go t...
Price: $24.14 | Publisher: Apress | Release: 2018
Beginning Apache Spark 2
by Hien Luu
Develop applications for the big data landscape with Spark and Hadoop. This book also explains the role of Spark in developing scalable machine learning and analytics applications with Cloud technologies. Beginning Apache Spark 2 gives you an introduction to Apache Spark and shows you how to work with it. Along the way, you'll discover resilient distributed datasets (RDDs); use Spark SQL for structured ...
Price: $25.33 | Publisher: Apress | Release: 2018
Block Trace Analysis and Storage System Optimization
by Jun Xu
Understand the fundamental factors of data storage system performance and master an essential analytical skill using block trace via applications such as MATLAB and Python tools. You will increase your productivity and learn the best techniques for doing specific tasks (such as analyzing the IO pattern in a quantitative way, identifying the storage system bottleneck, and designing the cache policy). In the n...
Price: $31.32 | Publisher: Apress | Release: 2018
by Jean-Marc Spaggiari, Mladen Kovacevic, Brock Noland, Ryan Bosshart
Fast data ingestion, serving, and analytics in the Hadoop ecosystem have forced developers and architects to choose solutions using the least common denominator - either fast analytics at the cost of slow data ingestion or fast data ingestion at the cost of slow analytics. There is an answer to this problem. With the Apache Kudu column-oriented data store, you can easily perform fast analytics on fast data....
Price: $40.44 | Publisher: O'Reilly Media | Release: 2018
by Paul Rogers, Charles Givre
Get up to speed with Apache Drill, an extensible distributed SQL query engine that reads massive datasets in many popular file formats such as Parquet, JSON, and CSV. Drill reads data in HDFS or in cloud-native storage such as S3 and works with Hive metastores along with distributed databases such as HBase, MongoDB, and relational databases. Drill works everywhere: on your laptop or in your largest cluster....
Price: $42.26 | Publisher: O'Reilly Media | Release: 2018
Big Data Architect's Handbook
by Syed Muhammad Fahad Akhtar
The big data architects are the “masters” of data, and hold high value in today's market. Handling big data, be it of good or bad quality, is not an easy task. The prime job for any big data architect is to build an end-to-end big data solution that integrates data from different sources and analyzes it to find useful, hidden insights. Big Data Architect's Handbook takes you through developing ...
Price: $54.99 | Publisher: Packt Publishing | Release: 2018
by Vitor Bianchi Lanzetta, Nataraj Dasgupta, Ricardo Anjoleto Farias
R is the most widely used programming language, and when used in association with data science, this powerful combination will solve the complexities involved with unstructured datasets in the real world. This book covers the entire data science ecosystem for aspiring data scientists, right from zero to a level where you are confident enough to get hands-on with real-world data science problems. The book sta...
Price: $39.99 | Publisher: Packt Publishing | Release: 2018
by Bill Havanki
Until recently, Hadoop deployments existed on hardware owned and run by organizations. Now, of course, you can acquire the computing resources and network connectivity to run Hadoop clusters in the cloud. But there's a lot more to deploying Hadoop to the public cloud than simply renting machines. This hands-on guide shows developers and systems administrators familiar with Hadoop how to install, use, an...
Price: $25.32 | Publisher: O'Reilly Media | Release: 2017
Sams Teach Yourself Hadoop in 24 Hours
by Jeffrey Aven
Apache Hadoop is the technology at the heart of the Big Data revolution, and Hadoop skills are in enormous demand. Now, in just 24 lessons of one hour or less, you can learn all the skills and techniques you'll need to deploy each key component of a Hadoop platform in your local environment or in the cloud, building a fully functional Hadoop cluster and using it with real programs and datasets. Each sh...
Price: $31.99 | Publisher: SAMS Publishing | Release: 2017
by Adam Gibson, Josh Patterson
Although interest in machine learning has reached a high point, lofty expectations often scuttle projects before they get very far. How can machine learning - especially deep neural networks - make a real difference in your organization? This hands-on guide not only provides the most practical information available on the subject, but also helps you get started building efficient deep learning networks. Auth...
Price: $15.99 | Publisher: O'Reilly Media | Release: 2017