Big Data Processing with Apache Spark
Efficiently tackle large datasets and big data analysis with Spark and Python
Price | $29.99 - $38.32
Author | Manuel Ignacio Franco Galeano |
Publisher | Packt Publishing |
Published | 2018 |
Pages | 142 |
Language | English |
Format | Paper book / ebook (PDF) |
ISBN-10 | 1789808812 |
ISBN-13 | 9781789808810 |
Processing big data in real time is challenging because of the demands of scalability, information consistency, and fault tolerance. This book teaches you how to use Spark to make your overall analytical workflow faster and more efficient. You'll explore the core concepts and tools within the Spark ecosystem, such as Spark Streaming and its API, the machine learning extension, and Structured Streaming.
You'll begin by learning data processing fundamentals using the Resilient Distributed Dataset (RDD), SQL, Dataset, and DataFrame APIs. After grasping these fundamentals, you'll move on to using the Spark Streaming APIs to consume data in real time from TCP sockets, and to integrating Amazon Web Services (AWS) for stream consumption.
By the end of this book, you'll not only understand how to use the machine learning extensions and structured streams, but also be able to apply Spark in your own upcoming big data projects.
Similar Books
Apache Spark 2: Data Processing and Real-Time Analytics
by Romeo Kienzler, Md. Rezaul Karim, Sridhar Alla, Siamak Amirghodsi, Meenakshi Rajendran, Broderick Hall, Shuen Mei
Apache Spark is an in-memory, cluster-based data processing system that provides a wide range of functionalities such as big data processing, analytics, machine learning, and more. With this Learning Path, you can take your knowledge of Apache Spark to the next level by learning how to expand Spark's functionality and building your o...
Price: $49.99 | Publisher: Packt Publishing | Release: 2018
Modern Big Data Processing with Hadoop
by Naresh Kumar, Prashant Shindgikar
The complex structure of data these days requires sophisticated solutions for data transformation, to make the information more accessible to the users. This book empowers you to build such solutions with relative ease with the help of Apache Hadoop, along with a host of other Big Data tools. This book will give you a complete understanding...
Price: $50.55 | Publisher: Packt Publishing | Release: 2018
Machine Learning with Apache Spark Quick Start Guide
by Jillur Quddus
Every person and every organization in the world manages data, whether they realize it or not. Data is used to describe the world around us and can be used for almost any purpose, from analyzing consumer habits to fighting disease and serious organized crime. Ultimately, we manage data in order to derive value from it, and many organizati...
Price: $29.99 | Publisher: Packt Publishing | Release: 2018
Modern Data Engineering with Apache Spark
by Scott Haines
Leverage Apache Spark within a modern data engineering ecosystem. This hands-on guide will teach you how to write fully functional applications, follow industry best practices, and learn the rationale behind these decisions. With Apache Spark as the foundation, you will follow a step-by-step journey beginning with the basics of data inges...
Price: $46.38 | Publisher: Apress | Release: 2022
Big Data Analytics with R and Hadoop
by Vignesh Prajapati
Big data analytics is the process of examining large amounts of data of a variety of types to uncover hidden patterns, unknown correlations, and other useful information. Such information can provide competitive advantages over rival organizations and result in business benefits, such as more effective marketing and increased revenue. New...
Price: $5.77 | Publisher: Packt Publishing | Release: 2013
Complete Guide to Open Source Big Data Stack
by Mike Frampton
See a Mesos-based big data stack created and the components used. You will use currently available Apache full and incubating systems. The components are introduced by example and you learn how they work together. In the Complete Guide to Open Source Big Data Stack, the author begins by creating a private cloud and then installs and examin...
Price: $30.77 | Publisher: Apress | Release: 2018
Scaling Big Data with Hadoop and Solr
by Hrishikesh Vijay Karambelkar
As data grows exponentially day-by-day, extracting information becomes a tedious activity in itself. Technologies like Hadoop are trying to address some of the concerns, while Solr provides high-speed faceted search. Bringing these two technologies together is helping organizations resolve the problem of information extraction from Big Da...
Price: $26.99 | Publisher: Packt Publishing | Release: 2013
by Mike Frampton
Apache Spark is an in-memory, cluster-based parallel processing system that provides a wide range of functionality like graph processing, machine learning, stream processing and SQL. It operates at unprecedented speeds, is easy to use and offers a rich set of data transformations. This book aims to take your limited knowledge of Spark to th...
Price: $43.99 | Publisher: Packt Publishing | Release: 2015