Learning Apache Mahout
Acquire practical skills in Big Data Analytics and explore data science with Apache Mahout
Price | $44.99 - $65.24 |
Author | Chandramani Tiwary |
Publisher | Packt Publishing |
Published | 2015 |
Pages | 250 |
Language | English |
Format | Paper book / ebook (PDF) |
ISBN-10 | 1783555211 |
ISBN-13 | 9781783555215 |
In the past few years, the generation of data and our capability to store and process it have grown exponentially. There is a need for scalable analytics frameworks and for people with the right skills to extract the information needed from this Big Data. Apache Mahout is one of the first and most prominent Big Data machine learning platforms. It implements machine learning algorithms on top of distributed processing platforms such as Hadoop and Spark.
Starting with the basics of Mahout and machine learning, you will explore prominent algorithms and their implementation in Mahout development. You will learn about Mahout building blocks, addressing feature extraction, dimensionality reduction, and the curse of dimensionality, and delve into classification use cases with the random forest and Naïve Bayes classifiers, as well as item-based and user-based recommendation. You will then work on clustering in Mahout using the K-means algorithm and implement Mahout without MapReduce. Finish with a flourish by exploring end-to-end use cases on customer analytics and text analytics to gain real-life, practical know-how of analytics projects.
Similar Books
Learning Apache Mahout Classification
by Ashish Gupta
This book is a practical guide that explains the classification algorithms provided in Apache Mahout with the help of actual examples. Starting with the introduction of classification and model evaluation techniques, we will explore Apache Mahout and learn why it is a good choice for classification. Next, you will learn about different cla...
Price: $29.99 | Publisher: Packt Publishing | Release: 2015
by Jayani Withanawasam
Apache Mahout is a scalable machine learning library with algorithms for clustering, classification, and recommendations. It empowers users to analyze patterns in large, diverse, and complex datasets faster and more scalably. This book is an all-inclusive guide to analyzing large and complex datasets using Apache Mahout. It explains compli...
Price: $24.99 | Publisher: Packt Publishing | Release: 2015
Learning Apache Kafka, 2nd Edition
by Nishant Garg
Kafka is one of those systems that is very simple to describe at a high level but has an incredible depth of technical detail when you dig deeper. Learning Apache Kafka Second Edition provides you with step-by-step, practical examples that help you take advantage of the real power of Kafka and handle hundreds of megabytes of messages per s...
Price: $13.07 | Publisher: Packt Publishing | Release: 2015
by Atri Sharma
Gain a thorough knowledge of Lucene's capabilities and use it to develop your own search applications. This book explores the Java-based, high-performance text search engine library used to build search capabilities in your applications. Starting with the basics of Lucene and searching, you will learn about the types of queries used ...
Price: $31.61 | Publisher: Apress | Release: 2020
by Paul Rogers, Charles Givre
Get up to speed with Apache Drill, an extensible distributed SQL query engine that reads massive datasets in many popular file formats such as Parquet, JSON, and CSV. Drill reads data in HDFS or in cloud-native storage such as S3 and works with Hive metastores along with distributed databases such as HBase, MongoDB, and relational databas...
Price: $42.26 | Publisher: O'Reilly Media | Release: 2018
by Krzysztof Rakowski
With modern software systems being increasingly complex, providing a scalable communication architecture for applications in different languages is tedious. The Apache Thrift framework is the solution to this problem! It helps build efficient and easy-to-maintain services and offers a plethora of options matching your application type by ...
Price: $13.46 | Publisher: Packt Publishing | Release: 2015
by Matei Zaharia, Holden Karau, Andy Konwinski, Patrick Wendell
Data in all domains is getting bigger. How can you work with it efficiently? Recently updated for Spark 1.3, this book introduces Apache Spark, the open source cluster computing system that makes data analytics fast to write and fast to run. With Spark, you can tackle big datasets quickly through simple APIs in Python, Java, and Scala. Th...
Price: $32.23 | Publisher: O'Reilly Media | Release: 2015
by Nick Pentreath
Apache Spark is a framework for distributed computing that is designed from the ground up to be optimized for low latency tasks and in-memory data storage. It is one of the few frameworks for parallel computing that combines speed, scalability, in-memory processing, and fault tolerance with ease of programming and a flexible, expressive, ...
Price: $34.99 | Publisher: Packt Publishing | Release: 2015