Data Pipelines with Apache Airflow




Price: $39.99 - $53.49
Authors: Bas P. Harenslak, Julian Rutger de Ruiter
Publisher: Manning
Published: 2021
Pages: 480
Language: English
Format: Paper book / ebook (PDF)
ISBN-10: 1617296902
ISBN-13: 9781617296901

A successful pipeline moves data efficiently, minimizing pauses and blockages between tasks, keeping every process along the way operational. Apache Airflow provides a single customizable environment for building and managing data pipelines, eliminating the need for a hodgepodge collection of tools, snowflake code, and homegrown processes. Using real-world scenarios and examples, Data Pipelines with Apache Airflow teaches you how to simplify and automate data pipelines, reduce operational overhead, and smoothly integrate all the technologies in your stack.

Data pipelines manage the flow of data from initial collection through consolidation, cleaning, analysis, visualization, and more. Apache Airflow provides a single platform you can use to design, implement, monitor, and maintain your pipelines. Its easy-to-use UI, plug-and-play options, and flexible Python scripting make Airflow perfect for any data management task.

Data Pipelines with Apache Airflow teaches you how to build and maintain effective data pipelines. You'll explore the most common usage patterns, including aggregating multiple data sources, connecting to and from data lakes, and cloud deployment. Part reference and part tutorial, this practical guide covers every aspect of the directed acyclic graphs (DAGs) that power Airflow, and how to customize them for your pipeline's needs.
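As a rough illustration of the directed acyclic graph idea mentioned above, the sketch below models a pipeline as tasks with dependencies and derives a valid run order. It uses only Python's standard library (`graphlib`, Python 3.9+), not Airflow itself, and the task names and the `execution_order` helper are hypothetical, borrowed from the stages named in the description rather than taken from the book.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
# The stage names are assumptions for illustration only.
dependencies = {
    "collect": set(),
    "consolidate": {"collect"},
    "clean": {"consolidate"},
    "analyze": {"clean"},
    "visualize": {"analyze"},
}


def execution_order(deps):
    """Return one valid run order for the tasks.

    Raises graphlib.CycleError if the dependencies contain a cycle,
    which is what makes the graph "acyclic" a hard requirement.
    """
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

Because every task here depends on exactly one predecessor, only one ordering is possible; with branching dependencies, several orderings can be valid and a scheduler may run independent tasks in parallel.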






Similar Books


Big Data Processing with Apache Spark


by Manuel Ignacio Franco Galeano

Processing big data in real time is challenging due to scalability, information consistency, and fault-tolerance. This book teaches you how to use Spark to make your overall analytical workflow faster and more efficient. You'll explore all core concepts and tools within the Spark ecosystem, such as Spark Streaming, the Spark Streaming API...

Price:  $29.99  |  Publisher:  Packt Publishing  |  Release:  2018

Learning Apache Mahout


by Chandramani Tiwary

In the past few years, the generation of data and our capability to store and process it have grown exponentially. There is a need for scalable analytics frameworks and people with the right skills to get the information needed from this Big Data. Apache Mahout is one of the first and most prominent Big Data machine learning platforms. It i...

Price:  $35.99  |  Publisher:  Packt Publishing  |  Release:  2015

Modern Big Data Processing with Hadoop


by Naresh Kumar, Prashant Shindgikar

The complex structure of data these days requires sophisticated solutions for data transformation, to make the information more accessible to the users. This book empowers you to build such solutions with relative ease with the help of Apache Hadoop, along with a host of other Big Data tools. This book will give you a complete understanding...

Price:  $31.99  |  Publisher:  Packt Publishing  |  Release:  2018

Big Data Analytics with Spark


by Mohammed Guller

This book is a step-by-step guide for learning how to use Spark for different types of big-data analytics projects, including batch, interactive, graph, and stream data analysis as well as machine learning. It covers Spark core and its add-on libraries, including Spark SQL, Spark Streaming, GraphX, MLlib, and Spark ML. Big Data Analytics w...

Price:  $29.99  |  Publisher:  Apress  |  Release:  2016

Modern Data Access with Entity Framework Core


by Holger Schwichtenberg

C# developers, here's your opportunity to learn the ins-and-outs of Entity Framework Core, Microsoft's recently redesigned object-relational mapper. Benefit from hands-on learning that will teach you how to tackle frustrating database challenges, such as workarounds to missing features in Entity Framework Core, and learn how to optimize t...

Price:  $34.19  |  Publisher:  Apress  |  Release:  2018

Data Science on the Google Cloud Platform


by Valliappa Lakshmanan

Learn how easy it is to apply sophisticated statistical and machine learning methods to real-world problems when you build on top of the Google Cloud Platform (GCP). This hands-on guide shows developers entering the data science field how to implement an end-to-end data pipeline, using statistical and machine learning methods and tools on...

Price:  $42.33  |  Publisher:  O'Reilly Media  |  Release:  2018

Data-oriented Development with AngularJS


by Manoj Waikar

AngularJS is one of the most popular JavaScript frameworks used to write single page applications and is suitable for developing large-scale enterprise applications. With Firebase, you can easily store and sync data in real time. It has libraries for all the major web and mobile platforms (including AngularJS) and bindings for the most po...

Price:  $19.99  |  Publisher:  Packt Publishing  |  Release:  2015

Data Analysis with R


by Tony Fischetti

Frequently the tool of choice for academics, R has spread deep into the private sector and can be found in the production pipelines at some of the most advanced and successful enterprises. The power and domain-specificity of R allow the user to express complex analytics easily, quickly, and succinctly. With over 7,000 user-contributed pa...

Price:  $43.99  |  Publisher:  Packt Publishing  |  Release:  2015