Apache Spark Books



Advanced Analytics with PySpark

by Akash Tandon, Sandy Ryza, Uri Laserson, Sean Owen, Josh Wills

The amount of data being generated today is staggering and growing. Apache Spark has emerged as the de facto tool to analyze big data and is now a critical part of the data science toolbox. Updated for Spark 3.0, this practical guide brings together Spark, statistical methods, and real-world datasets to teach you how to approach analytics problems using PySpark, Spark's Python API, and other best pract...

Price:  $41.03  |  Publisher:  O'Reilly Media  |  Release:  2022

Modern Data Engineering with Apache Spark

by Scott Haines

Leverage Apache Spark within a modern data engineering ecosystem. This hands-on guide will teach you how to write fully functional applications, follow industry best practices, and learn the rationale behind these decisions. With Apache Spark as the foundation, you will follow a step-by-step journey beginning with the basics of data ingestion, processing, and transformation, and ending up with an entire loc...

Price:  $46.38  |  Publisher:  Apress  |  Release:  2022

The Azure Data Lakehouse Toolkit

by Ron L'Esteve

Design and implement a modern data lakehouse on the Azure Data Platform using Delta Lake, Apache Spark, Azure Databricks, Azure Synapse Analytics, and Snowflake. This book teaches you the intricate details of the Data Lakehouse Paradigm and how to efficiently design a cloud-based data lakehouse using highly performant, cutting-edge Apache Spark capabilities with Azure Databricks, Azure Synapse Analytics...

Price:  $54.99  |  Publisher:  Apress  |  Release:  2022

Data Algorithms with Spark

by Mahmoud Parsian

Apache Spark's speed, ease of use, sophisticated analytics, and multilanguage support make practical knowledge of this cluster-computing framework a required skill for data engineers and data scientists. With this hands-on guide, anyone looking for an introduction to Spark will learn practical algorithms and examples using PySpark. In each chapter, author Mahmoud Parsian shows you how to solve a data p...

Price:  $42.95  |  Publisher:  O'Reilly Media  |  Release:  2022

Microsoft Excel VBA and Macros

by Bill Jelen, Tracy Syrstad

Use this guide to automate virtually any routine Excel task: save yourself hours, days, maybe even weeks. Make Excel do things you thought were impossible, discover macro techniques you won't find anywhere else, and create automated reports that are amazingly powerful. Bill Jelen and Tracy Syrstad help you instantly visualize information to make it actionable; capture data from anywhere, and use it anywhere;...

Price:  $43.95  |  Publisher:  Microsoft Press  |  Release:  2022

Data Science on the Google Cloud Platform, 2nd Edition

by Valliappa Lakshmanan

Learn how easy it is to apply sophisticated statistical and machine learning methods to real-world problems when you build using Google Cloud Platform (GCP). This hands-on guide shows data engineers and data scientists how to implement an end-to-end data pipeline with cloud native tools on GCP. Throughout this updated second edition, you'll work through a sample business decision by employing a variety ...

Price:  $53.26  |  Publisher:  O'Reilly Media  |  Release:  2022

Machine Learning with PySpark, 2nd Edition

by Pramod Singh

Master the new features in PySpark 3.1 to develop data-driven, intelligent applications. This updated edition covers topics ranging from building scalable machine learning models to natural language processing to recommender systems. Machine Learning with PySpark, Second Edition begins with the fundamentals of Apache Spark, including the latest updates to the framework. Next, you will learn the full spectr...

Price:  $49.05  |  Publisher:  Apress  |  Release:  2022

Data Analysis with Python and PySpark

by Jonathan Rioux

Data Analysis with Python and PySpark is your guide to delivering successful Python-driven data projects. Packed with relevant examples and essential techniques, this practical book teaches you to build pipelines for reporting, machine learning, and other data-centric tasks. Quick exercises in every chapter help you practice what you've learned and rapidly start implementing PySpark into your data sys...

Price:  $57.69  |  Publisher:  Manning  |  Release:  2022

Grokking Streaming Systems

by Josh Fischer, Ning Wang

Grokking Streaming Systems is a simple guide to the complex concepts behind streaming systems. This friendly and framework-agnostic tutorial teaches you how to handle real-time events, and even design and build your own streaming job that's a perfect fit for your needs. Each new idea is carefully explained with diagrams, clear examples, and fun dialogue between perplexed personalities! Streaming systems...

Price:  $59.99  |  Publisher:  Manning  |  Release:  2022

In-Memory Analytics with Apache Arrow

by Matthew Topol

Apache Arrow is designed to accelerate analytics and make it easy to exchange data across big data systems. In-Memory Analytics with Apache Arrow begins with a quick overview of the Apache Arrow format, before moving on to helping you to understand Arrow's versatility and benefits as you walk through a variety of real-world use cases. You'll cover key tasks such as enhancing data science workfl...

Price:  $44.99  |  Publisher:  Packt Publishing  |  Release:  2022
