Big Data Processing with Apache Spark
Efficiently tackle large datasets and big data analysis with Spark and Python
Price: $29.99 - $38.32
Author: Manuel Ignacio Franco Galeano
Format: Paper book / ebook (PDF)
Processing big data in real time is challenging because of the demands of scalability, information consistency, and fault tolerance. This book teaches you how to use Spark to make your overall analytical workflow faster and more efficient. You'll explore the core concepts and tools of the Spark ecosystem, such as Spark Streaming and its API, the machine learning extensions, and Structured Streaming.
You'll begin by learning data processing fundamentals using the Resilient Distributed Datasets (RDDs), SQL, Dataset, and DataFrame APIs. After grasping these fundamentals, you'll move on to using the Spark Streaming APIs to consume data in real time from TCP sockets, and to integrating Amazon Web Services (AWS) for stream consumption.
By the end of this book, you'll not only understand how to use machine learning extensions and Structured Streaming, but you'll also be able to apply Spark to your own big data projects.
Related titles
by Romeo Kienzler, Md. Rezaul Karim, Sridhar Alla, Siamak Amirghodsi, Meenakshi Rajendran, Broderick Hall, Shuen Mei
Apache Spark is an in-memory, cluster-based data processing system that provides a wide range of functionalities such as big data processing, analytics, machine learning, and more. With this Learning Path, you can take your knowledge of Apache Spark to the next level by learning how to expand Spark's functionality and building your own da...
Price: $49.99 | Publisher: Packt Publishing | Release: 2018
by Naresh Kumar, Prashant Shindgikar
The complex structure of data these days requires sophisticated solutions for data transformation, to make the information more accessible to users. This book empowers you to build such solutions with relative ease with the help of Apache Hadoop, along with a host of other Big Data tools. This book will give you a complete understanding...
Price: $31.99 | Publisher: Packt Publishing | Release: 2018
by Jillur Quddus
Every person and every organization in the world manages data, whether they realize it or not. Data is used to describe the world around us and can be used for almost any purpose, from analyzing consumer habits to fighting disease and serious organized crime. Ultimately, we manage data in order to derive value from it, and many organizati...
Price: $29.99 | Publisher: Packt Publishing | Release: 2018
by Vignesh Prajapati
Big data analytics is the process of examining large amounts of data of a variety of types to uncover hidden patterns, unknown correlations, and other useful information. Such information can provide competitive advantages over rival organizations and result in business benefits, such as more effective marketing and increased revenue. New...
Price: $29.99 | Publisher: Packt Publishing | Release: 2013
by Mike Frampton
See how a Mesos-based big data stack is created and which components it uses. You will use currently available Apache systems, both full and incubating. The components are introduced by example, and you learn how they work together. In the Complete Guide to Open Source Big Data Stack, the author begins by creating a private cloud and then installs and examin...
Price: $37.75 | Publisher: Apress | Release: 2018
by Mike Frampton
Apache Spark is an in-memory, cluster-based parallel processing system that provides a wide range of functionality, such as graph processing, machine learning, stream processing, and SQL. It operates at unprecedented speeds, is easy to use, and offers a rich set of data transformations. This book aims to take your limited knowledge of Spark to th...
Price: $35.25 | Publisher: Packt Publishing | Release: 2015
by Mohammed Guller
This book is a step-by-step guide for learning how to use Spark for different types of big data analytics projects, including batch, interactive, graph, and stream data analysis as well as machine learning. It covers Spark core and its add-on libraries, including Spark SQL, Spark Streaming, GraphX, MLlib, and Spark ML. Big Data Analytics w...
Price: $32.00 | Publisher: Apress | Release: 2016
by Holden Karau
Spark is a framework for writing fast, distributed programs. Spark solves problems similar to those Hadoop MapReduce solves, but with a fast in-memory approach and a clean, functional-style API. With its ability to integrate with Hadoop and its built-in tools for interactive query analysis (Shark), large-scale graph processing and analysis (Bagel), and ...
Price: $22.99 | Publisher: Packt Publishing | Release: 2013