Modern Data Engineering with Apache Spark

A Hands-On Guide for Building Mission-Critical Streaming Applications




Price: $46.38 - $54.99
Author: Scott Haines
Publisher: Apress
Published: 2022
Pages: 585
Language: English
Format: Paper book / ebook (PDF)
ISBN-10: 1484274512
ISBN-13: 9781484274514

Leverage Apache Spark within a modern data engineering ecosystem. This hands-on guide teaches you how to write fully functional applications and follow industry best practices, and explains the rationale behind these decisions. With Apache Spark as the foundation, you will follow a step-by-step journey that begins with the basics of data ingestion, processing, and transformation and ends with an entire local data platform running Apache Spark, Apache Zeppelin, Apache Kafka, Redis, MySQL, MinIO (S3), and Apache Airflow.

Apache Spark applications solve a wide range of data problems, from traditional data loading and processing to rich SQL-based analysis, complex machine learning workloads, and even near real-time processing of streaming data. Spark fits well as a central foundation for any data engineering workload. This book will teach you to write interactive Spark applications using Apache Zeppelin notebooks, write and compile reusable applications and modules, and fully test both batch and streaming applications. You will also learn to containerize your applications using Docker and to run and deploy your Spark applications using a variety of tools such as Apache Airflow, Docker, and Kubernetes.

Reading this book will empower you to take advantage of Apache Spark to optimize your data pipelines and teach you to craft modular and testable Spark applications. You will create and deploy mission-critical streaming Spark applications in a low-stress environment that paves the way for your own path to production.





Similar Books


Big Data Processing with Apache Spark

by Manuel Ignacio Franco Galeano

Processing big data in real time is challenging due to scalability, information consistency, and fault-tolerance. This book teaches you how to use Spark to make your overall analytical workflow faster and more efficient. You'll explore all core concepts and tools within the Spark ecosystem, such as Spark Streaming, the Spark Streamin...

Price:  $29.99  |  Publisher:  Packt Publishing  |  Release:  2018

Data Engineering with Alteryx

by Paul Houghton

Alteryx is a GUI-based development platform for data analytic applications. Data Engineering with Alteryx will help you leverage Alteryx's code-free aspects, which increase development speed while still enabling you to make the most of the code-based skills you have. This book will teach you the principles of DataOps and how they can be...

Price:  $44.99  |  Publisher:  Packt Publishing  |  Release:  2022

Apache Spark 2: Data Processing and Real-Time Analytics

by Romeo Kienzler, Md. Rezaul Karim, Sridhar Alla, Siamak Amirghodsi, Meenakshi Rajendran, Broderick Hall, Shuen Mei

Apache Spark is an in-memory, cluster-based data processing system that provides a wide range of functionalities such as big data processing, analytics, machine learning, and more. With this Learning Path, you can take your knowledge of Apache Spark to the next level by learning how to expand Spark's functionality and building your o...

Price:  $49.99  |  Publisher:  Packt Publishing  |  Release:  2018

Mastering Apache Spark

by Mike Frampton

Apache Spark is an in-memory, cluster-based parallel processing system that provides a wide range of functionality like graph processing, machine learning, stream processing, and SQL. It operates at unprecedented speeds, is easy to use, and offers a rich set of data transformations. This book aims to take your limited knowledge of Spark to th...

Price:  $43.99  |  Publisher:  Packt Publishing  |  Release:  2015

Modern Data Access with Entity Framework Core

by Holger Schwichtenberg

C# developers, here's your opportunity to learn the ins-and-outs of Entity Framework Core, Microsoft's recently redesigned object-relational mapper. Benefit from hands-on learning that will teach you how to tackle frustrating database challenges, such as workarounds to missing features in Entity Framework Core, and learn how to ...

Price:  $34.19  |  Publisher:  Apress  |  Release:  2018

Machine Learning with Apache Spark Quick Start Guide

by Jillur Quddus

Every person and every organization in the world manages data, whether they realize it or not. Data is used to describe the world around us and can be used for almost any purpose, from analyzing consumer habits to fighting disease and serious organized crime. Ultimately, we manage data in order to derive value from it, and many organizati...

Price:  $29.99  |  Publisher:  Packt Publishing  |  Release:  2018

Data Engineering with Google Cloud Platform

by Adi Wijaya

With this book, you'll understand how the highly scalable Google Cloud Platform (GCP) enables data engineers to create end-to-end data pipelines right from storing and processing data and workflow orchestration to presenting data through visualization dashboards. Starting with a quick overview of the fundamental concepts of data engin...

Price:  $49.99  |  Publisher:  Packt Publishing  |  Release:  2022

Data Pipelines with Apache Airflow

by Bas P. Harenslak, Julian Rutger de Ruiter

A successful pipeline moves data efficiently, minimizing pauses and blockages between tasks, keeping every process along the way operational. Apache Airflow provides a single customizable environment for building and managing data pipelines, eliminating the need for a hodgepodge collection of tools, snowflake code, and homegrown processes...

Price:  $36.99  |  Publisher:  Manning  |  Release:  2021