Big Data Processing with Apache Spark

Efficiently tackle large datasets and big data analysis with Spark and Python




Price: $29.99 - $38.32
Author: Manuel Ignacio Franco Galeano
Publisher: Packt Publishing
Published: 2018
Pages: 142
Language: English
Format: Paper book / ebook (PDF)
ISBN-10: 1789808812
ISBN-13: 9781789808810

Processing big data in real time is challenging because of the demands of scalability, information consistency, and fault tolerance. This book teaches you how to use Spark to make your overall analytical workflow faster and more efficient. You'll explore the core concepts and tools within the Spark ecosystem, such as Spark Streaming and its API, the machine learning extension, and Structured Streaming.

You'll begin by learning data processing fundamentals using the Resilient Distributed Dataset (RDD), SQL, Dataset, and DataFrame APIs. After grasping these fundamentals, you'll move on to using the Spark Streaming APIs to consume data in real time from TCP sockets, and to integrating with Amazon Web Services (AWS) for stream consumption.

By the end of this book, you'll not only understand how to use the machine learning extensions and structured streams, but you'll also be able to apply Spark in your own upcoming big data projects.





Similar Books


Apache Spark 2: Data Processing and Real-Time Analytics

by Romeo Kienzler, Md. Rezaul Karim, Sridhar Alla, Siamak Amirghodsi, Meenakshi Rajendran, Broderick Hall, Shuen Mei

Apache Spark is an in-memory, cluster-based data processing system that provides a wide range of functionalities such as big data processing, analytics, machine learning, and more. With this Learning Path, you can take your knowledge of Apache Spark to the next level by learning how to expand Spark's functionality and building your o...

Price:  $49.99  |  Publisher:  Packt Publishing  |  Release:  2018

Modern Big Data Processing with Hadoop

by Naresh Kumar, Prashant Shindgikar

The complex structure of data these days requires sophisticated solutions for data transformation, to make the information more accessible to the users. This book empowers you to build such solutions with relative ease with the help of Apache Hadoop, along with a host of other Big Data tools. This book will give you a complete understanding...

Price:  $50.55  |  Publisher:  Packt Publishing  |  Release:  2018

Machine Learning with Apache Spark Quick Start Guide

by Jillur Quddus

Every person and every organization in the world manages data, whether they realize it or not. Data is used to describe the world around us and can be used for almost any purpose, from analyzing consumer habits to fighting disease and serious organized crime. Ultimately, we manage data in order to derive value from it, and many organizati...

Price:  $29.99  |  Publisher:  Packt Publishing  |  Release:  2018

Modern Data Engineering with Apache Spark

by Scott Haines

Leverage Apache Spark within a modern data engineering ecosystem. This hands-on guide will teach you how to write fully functional applications, follow industry best practices, and learn the rationale behind these decisions. With Apache Spark as the foundation, you will follow a step-by-step journey beginning with the basics of data inges...

Price:  $46.38  |  Publisher:  Apress  |  Release:  2022

Big Data Analytics with R and Hadoop

by Vignesh Prajapati

Big data analytics is the process of examining large amounts of data of a variety of types to uncover hidden patterns, unknown correlations, and other useful information. Such information can provide competitive advantages over rival organizations and result in business benefits, such as more effective marketing and increased revenue. New...

Price:  $5.77  |  Publisher:  Packt Publishing  |  Release:  2013

Complete Guide to Open Source Big Data Stack

by Mike Frampton

See a Mesos-based big data stack created and the components used. You will use currently available Apache full and incubating systems. The components are introduced by example and you learn how they work together. In the Complete Guide to Open Source Big Data Stack, the author begins by creating a private cloud and then installs and examin...

Price:  $30.77  |  Publisher:  Apress  |  Release:  2018

Scaling Big Data with Hadoop and Solr

by Hrishikesh Vijay Karambelkar

As data grows exponentially day-by-day, extracting information becomes a tedious activity in itself. Technologies like Hadoop are trying to address some of the concerns, while Solr provides high-speed faceted search. Bringing these two technologies together is helping organizations resolve the problem of information extraction from Big Da...

Price:  $26.99  |  Publisher:  Packt Publishing  |  Release:  2013

Mastering Apache Spark

by Mike Frampton

Apache Spark is an in-memory, cluster-based parallel processing system that provides a wide range of functionality like graph processing, machine learning, stream processing and SQL. It operates at unprecedented speeds, is easy to use and offers a rich set of data transformations. This book aims to take your limited knowledge of Spark to th...

Price:  $43.99  |  Publisher:  Packt Publishing  |  Release:  2015