Apache Spark 2: Data Processing and Real-Time Analytics

Master complex big data processing, stream analytics, and machine learning with Apache Spark



Price: $49.99 - $64.68
Authors: Romeo Kienzler, Md. Rezaul Karim, Sridhar Alla, Siamak Amirghodsi, Meenakshi Rajendran, Broderick Hall, Shuen Mei
Publisher: Packt Publishing
Published: 2018
Pages: 616
Language: English
Format: Paper book / ebook (PDF)
ISBN-10: 1789959209
ISBN-13: 9781789959208

Apache Spark is an in-memory, cluster-based data processing system that supports big data processing, analytics, machine learning, and more. With this Learning Path, you can take your knowledge of Apache Spark to the next level by learning how to extend Spark's functionality and build your own data flow and machine learning programs on the platform.

You will work with the different modules in Apache Spark: interactive querying with Spark SQL, using DataFrames and Datasets, implementing streaming analytics with Spark Streaming, and applying machine learning and deep learning techniques on Spark using MLlib and various external tools.

By the end of this Learning Path, you will have the knowledge you need to master Apache Spark and build your own big data processing and analytics pipeline quickly and without hassle.





Similar Books


Real Time Analytics with SAP HANA

by Vinay Singh

SAP HANA is an in-memory database created by SAP. SAP HANA breaks traditional database barriers to simplify IT landscapes, eliminating data preparation, pre-aggregation, and tuning. SAP HANA and in-memory computing allow you to instantly access huge volumes of structured and unstructured data, including text data, from different sources. S...

Price:  $31.99  |  Publisher:  Packt Publishing  |  Release:  2015

Real-time Analytics with Storm and Cassandra

by Shilpi Saxena

This book will teach you how to use Storm for real-time data processing and how to make your applications highly available, with no downtime, using Cassandra. The book starts off with the basics of Storm and its components along with setting up the environment for the execution of a Storm topology in local and distributed mode. Moving on, you wi...

Price:  $35.99  |  Publisher:  Packt Publishing  |  Release:  2015

Fast Data Processing with Spark, 2nd Edition

by Krishna Sankar, Holden Karau

Spark is a framework for writing fast, distributed programs. Spark solves problems similar to those addressed by Hadoop MapReduce, but with a fast in-memory approach and a clean functional-style API. With its ability to integrate with Hadoop and built-in tools for interactive query analysis (Spark SQL), large-scale graph processing and analysis (G...

Price:  $23.08  |  Publisher:  Packt Publishing  |  Release:  2015

Beginning Apache Spark 2

by Hien Luu

Develop applications for the big data landscape with Spark and Hadoop. This book also explains the role of Spark in developing scalable machine learning and analytics applications with Cloud technologies. Beginning Apache Spark 2 gives you an introduction to Apache Spark and shows you how to work with it. Along the way, you'll discover res...

Price:  $25.33  |  Publisher:  Apress  |  Release:  2018

Storm Real-time Processing Cookbook

by Quinton Anderson

Storm is a free and open source distributed real-time computation system. Storm makes it easy to reliably process unbounded streams of data, doing for real-time processing what Hadoop did for batch processing. Storm is simple, can be used with any programming language, and is a lot of fun to use! Storm Real Time Processing Cookbook will ha...

Price:  $29.99  |  Publisher:  Packt Publishing  |  Release:  2013

Storm Blueprints: Patterns for Distributed Real-time Computation

by P. Taylor Goetz, Brian O'Neill

Storm is the most popular framework for real-time stream processing. Storm provides the fundamental primitives and guarantees required for fault-tolerant distributed computing in high-volume, mission-critical applications. It is both an integration technology and a data flow and control mechanism, making it the core of many big dat...

Price:  $27.07  |  Publisher:  Packt Publishing  |  Release:  2014

Apache Kafka Quick Start Guide

by Raul Estrada

Apache Kafka is an open source platform for handling real-time data pipelines, supporting high-speed filtering and pattern matching on the fly. In this book, you will learn how to use Apache Kafka for efficient processing in distributed applications and will get familiar with solving everyday problems in fast data and processing p...

Price:  $29.99  |  Publisher:  Packt Publishing  |  Release:  2018

Real-Time Analytics

by Byron Ellis

Real-time analytics is the hottest topic in data analytics today. In Real-Time Analytics, expert Byron Ellis teaches data analysts the technologies needed to build an effective real-time analytics platform. This platform can then be used to make sense of the constantly changing data that is beginning to outpace traditional batch-based analysis plat...

Price:  $33.90  |  Publisher:  Wiley  |  Release:  2014