Big Data Processing with Apache Spark

Efficiently tackle large datasets and big data analysis with Spark and Python




Price: $29.99 - $38.32
Author: Manuel Ignacio Franco Galeano
Publisher: Packt Publishing
Published: 2018
Pages: 142
Language: English
Format: Paper book / ebook (PDF)
ISBN-10: 1789808812
ISBN-13: 9781789808810

Processing big data in real time is challenging because of the demands of scalability, information consistency, and fault tolerance. This book teaches you how to use Spark to make your overall analytical workflow faster and more efficient. You'll explore all core concepts and tools within the Spark ecosystem, such as Spark Streaming and its API, the machine learning extension, and Structured Streaming.

You'll begin by learning data processing fundamentals using Resilient Distributed Datasets (RDDs) and the SQL, Dataset, and DataFrame APIs. After grasping these fundamentals, you'll move on to using the Spark Streaming APIs to consume data in real time from TCP sockets, and to integrating Amazon Web Services (AWS) for stream consumption.

By the end of this book, you'll understand how to use the machine learning extensions and structured streams, and you'll be able to apply Spark in your own upcoming big data projects.





Similar Books


Modern Big Data Processing with Hadoop


by Naresh Kumar, Prashant Shindgikar

The complex structure of data these days requires sophisticated solutions for data transformation, to make the information more accessible to the users. This book empowers you to build such solutions with relative ease with the help of Apache Hadoop, along with a host of other Big Data tools. This book will give you a complete understanding...

Price:  $31.99  |  Publisher:  Packt Publishing  |  Release:  2018

Big Data Analytics with R and Hadoop


by Vignesh Prajapati

Big data analytics is the process of examining large amounts of data of a variety of types to uncover hidden patterns, unknown correlations, and other useful information. Such information can provide competitive advantages over rival organizations and result in business benefits, such as more effective marketing and increased revenue. New...

Price:  $26.94  |  Publisher:  Packt Publishing  |  Release:  2013

Complete Guide to Open Source Big Data Stack


by Mike Frampton

See a Mesos-based big data stack created and the components used. You will use currently available Apache full and incubating systems. The components are introduced by example and you learn how they work together. In the Complete Guide to Open Source Big Data Stack, the author begins by creating a private cloud and then installs and examin...

Price:  $37.75  |  Publisher:  Apress  |  Release:  2018

Mastering Apache Spark


by Mike Frampton

Apache Spark is an in-memory, cluster-based parallel processing system that provides a wide range of functionality, such as graph processing, machine learning, stream processing, and SQL. It operates at unprecedented speeds, is easy to use, and offers a rich set of data transformations. This book aims to take your limited knowledge of Spark to th...

Price:  $35.25  |  Publisher:  Packt Publishing  |  Release:  2015

Big Data Analytics with Spark


by Mohammed Guller

This book is a step-by-step guide for learning how to use Spark for different types of big-data analytics projects, including batch, interactive, graph, and stream data analysis as well as machine learning. It covers Spark core and its add-on libraries, including Spark SQL, Spark Streaming, GraphX, MLlib, and Spark ML. Big Data Analytics w...

Price:  $32.33  |  Publisher:  Apress  |  Release:  2016

Fast Data Processing with Spark


by Holden Karau

Spark is a framework for writing fast, distributed programs. Spark solves similar problems as Hadoop MapReduce does but with a fast in-memory approach and a clean functional style API. With its ability to integrate with Hadoop and inbuilt tools for interactive query analysis (Shark), large-scale graph processing and analysis (Bagel), and ...

Price:  $22.99  |  Publisher:  Packt Publishing  |  Release:  2013

Fast Data Processing with Spark, 2nd Edition


by Krishna Sankar, Holden Karau

Spark is a framework used for writing fast, distributed programs. Spark solves similar problems as Hadoop MapReduce does, but with a fast in-memory approach and a clean functional style API. With its ability to integrate with Hadoop and built-in tools for interactive query analysis (Spark SQL), large-scale graph processing and analysis (G...

Price:  $23.99  |  Publisher:  Packt Publishing  |  Release:  2015

Apache Hadoop 3 Quick Start Guide


by Hrishikesh Karambelkar

Apache Hadoop is a widely used distributed data platform. It enables large datasets to be efficiently processed instead of using one large computer to store and process the data. This book will get you started with the Hadoop ecosystem, and introduce you to the main technical topics, including MapReduce, YARN, and HDFS. The book begins wit...

Price:  $29.99  |  Publisher:  Packt Publishing  |  Release:  2018