Advanced Analytics with PySpark
by Akash Tandon, Sandy Ryza, Uri Laserson, Sean Owen, Josh Wills
The amount of data being generated today is staggering and growing. Apache Spark has emerged as the de facto tool to analyze big data and is now a critical part of the data science toolbox. Updated for Spark 3.0, this practical guide brings together Spark, statistical methods, and real-world datasets to teach you how to approach analytics problems using PySpark, Spark's Python API, and other best pract...
Price: $35.42 | Publisher: O'Reilly Media | Release: 2022
Modern Data Engineering with Apache Spark
by Scott Haines
Leverage Apache Spark within a modern data engineering ecosystem. This hands-on guide will teach you how to write fully functional applications, follow industry best practices, and learn the rationale behind these decisions. With Apache Spark as the foundation, you will follow a step-by-step journey beginning with the basics of data ingestion, processing, and transformation, and ending up with an entire loc...
Price: $46.38 | Publisher: Apress | Release: 2022
The Azure Data Lakehouse Toolkit
by Ron L'Esteve
Design and implement a modern data lakehouse on the Azure Data Platform using Delta Lake, Apache Spark, Azure Databricks, Azure Synapse Analytics, and Snowflake. This book teaches you the intricate details of the Data Lakehouse Paradigm and how to efficiently design a cloud-based data lakehouse using highly performant and cutting-edge Apache Spark capabilities using Azure Databricks, Azure Synapse Analytics...
Price: $54.99 | Publisher: Apress | Release: 2022
Data Algorithms with Spark
by Mahmoud Parsian
Apache Spark's speed, ease of use, sophisticated analytics, and multilanguage support make practical knowledge of this cluster-computing framework a required skill for data engineers and data scientists. With this hands-on guide, anyone looking for an introduction to Spark will learn practical algorithms and examples using PySpark. In each chapter, author Mahmoud Parsian shows you how to solve a data p...
Price: $42.95 | Publisher: O'Reilly Media | Release: 2022
Microsoft Excel VBA and Macros
by Bill Jelen, Tracy Syrstad
Use this guide to automate virtually any routine Excel task: save yourself hours, days, maybe even weeks. Make Excel do things you thought were impossible, discover macro techniques you won't find anywhere else, and create automated reports that are amazingly powerful. Bill Jelen and Tracy Syrstad help you instantly visualize information to make it actionable; capture data from anywhere, and use it anywhere;...
Price: $34.39 | Publisher: Microsoft Press | Release: 2022
Data Science on the Google Cloud Platform, 2nd Edition
by Valliappa Lakshmanan
Learn how easy it is to apply sophisticated statistical and machine learning methods to real-world problems when you build using Google Cloud Platform (GCP). This hands-on guide shows data engineers and data scientists how to implement an end-to-end data pipeline with cloud native tools on GCP. Throughout this updated second edition, you'll work through a sample business decision by employing a variety ...
Price: $53.26 | Publisher: O'Reilly Media | Release: 2022
Machine Learning with PySpark, 2nd Edition
by Pramod Singh
Master the new features in PySpark 3.1 to develop data-driven, intelligent applications. This updated edition covers topics ranging from building scalable machine learning models, to natural language processing, to recommender systems. Machine Learning with PySpark, Second Edition begins with the fundamentals of Apache Spark, including the latest updates to the framework. Next, you will learn the full spectr...
Price: $49.05 | Publisher: Apress | Release: 2022
Data Analysis with Python and PySpark
by Jonathan Rioux
Data Analysis with Python and PySpark is your guide to delivering successful Python-driven data projects. Packed with relevant examples and essential techniques, this practical book teaches you to build pipelines for reporting, machine learning, and other data-centric tasks. Quick exercises in every chapter help you practice what you've learned, and rapidly start implementing PySpark into your data sys...
Price: $45.07 | Publisher: Manning | Release: 2022
Grokking Streaming Systems
by Josh Fischer, Ning Wang
Grokking Streaming Systems is a simple guide to the complex concepts behind streaming systems. This friendly and framework-agnostic tutorial teaches you how to handle real-time events, and even design and build your own streaming job that's a perfect fit for your needs. Each new idea is carefully explained with diagrams, clear examples, and fun dialogue between perplexed personalities! Streaming systems...
Price: $59.99 | Publisher: Manning | Release: 2022
In-Memory Analytics with Apache Arrow
by Matthew Topol
Apache Arrow is designed to accelerate analytics and allow the exchange of data across big data systems easily. In-Memory Analytics with Apache Arrow begins with a quick overview of the Apache Arrow format, before moving on to helping you to understand Arrow's versatility and benefits as you walk through a variety of real-world use cases. You'll cover key tasks such as enhancing data science workfl...
Price: $44.99 | Publisher: Packt Publishing | Release: 2022