Data Algorithms with Spark
Recipes and Design Patterns for Scaling Up using PySpark
Price | $42.95 - $57.01
Author | Mahmoud Parsian |
Publisher | O'Reilly Media |
Published | 2022 |
Pages | 435 |
Language | English |
Format | Paper book / ebook (PDF) |
ISBN-10 | 1492082384 |
ISBN-13 | 9781492082385 |
Apache Spark's speed, ease of use, sophisticated analytics, and multilanguage support make practical knowledge of this cluster-computing framework a required skill for data engineers and data scientists. With this hands-on guide, anyone looking for an introduction to Spark will learn practical algorithms and examples using PySpark.
In each chapter, author Mahmoud Parsian shows you how to solve a data problem with a set of Spark transformations and algorithms. You'll learn how to tackle problems involving ETL, design patterns, machine learning algorithms, data partitioning, and genomics analysis. Each detailed recipe includes PySpark algorithms using the PySpark driver and shell script.
With this book, you will:
- Learn how to select Spark transformations for optimized solutions
- Explore powerful transformations and reductions including reduceByKey(), combineByKey(), and mapPartitions()
- Understand data partitioning for optimized queries
- Build and apply a model using PySpark design patterns
- Apply motif-finding algorithms to graph data
- Analyze graph data by using the GraphFrames API
- Apply PySpark algorithms to clinical and genomics data
- Learn how to use and apply feature engineering in ML algorithms
- Understand and use practical and pragmatic data design patterns
Similar Books
Fast Data Processing with Spark
by Holden Karau
Spark is a framework for writing fast, distributed programs. Spark solves similar problems as Hadoop MapReduce does but with a fast in-memory approach and a clean functional style API. With its ability to integrate with Hadoop and inbuilt tools for interactive query analysis (Shark), large-scale graph processing and analysis (Bagel), and ...
Price: $22.99 | Publisher: Packt Publishing | Release: 2013
Big Data Analytics with Spark
by Mohammed Guller
This book is a step-by-step guide for learning how to use Spark for different types of big-data analytics projects, including batch, interactive, graph, and stream data analysis as well as machine learning. It covers Spark core and its add-on libraries, including Spark SQL, Spark Streaming, GraphX, MLlib, and Spark ML. Big Data Analytics w...
Price: $29.99 | Publisher: Apress | Release: 2016
Fast Data Processing with Spark, 2nd Edition
by Krishna Sankar, Holden Karau
Spark is a framework used for writing fast, distributed programs. Spark solves similar problems as Hadoop MapReduce does, but with a fast in-memory approach and a clean functional style API. With its ability to integrate with Hadoop and built-in tools for interactive query analysis (Spark SQL), large-scale graph processing and analysis (G...
Price: $29.99 | Publisher: Packt Publishing | Release: 2015
Big Data Processing with Apache Spark
by Manuel Ignacio Franco Galeano
Processing big data in real time is challenging due to scalability, information consistency, and fault-tolerance. This book teaches you how to use Spark to make your overall analytical workflow faster and more efficient. You'll explore all core concepts and tools within the Spark ecosystem, such as Spark Streaming, the Spark Streamin...
Price: $29.99 | Publisher: Packt Publishing | Release: 2018
Advanced Analytics with Spark
by Sandy Ryza, Uri Laserson, Sean Owen, Josh Wills
In this practical book, four Cloudera data scientists present a set of self-contained patterns for performing large-scale data analysis with Spark. The authors bring Spark, statistical methods, and real-world data sets together to teach you how to approach analytics problems by example. You'll start with an introduction to Spark and i...
Price: $20.00 | Publisher: O'Reilly Media | Release: 2015
Advanced Analytics with Spark, 2nd Edition
by Sandy Ryza, Uri Laserson, Josh Wills, Sean Owen
In the second edition of this practical book, four Cloudera data scientists present a set of self-contained patterns for performing large-scale data analysis with Spark. The authors bring Spark, statistical methods, and real-world data sets together to teach you how to approach analytics problems by example. Updated for Spark 2.1, this ed...
Price: $29.85 | Publisher: O'Reilly Media | Release: 2017
Practical Data Science with R, 2nd Edition
by Nina Zumel, John Mount
Practical Data Science with R, Second Edition takes a practice-oriented approach to explaining basic principles in the ever expanding field of data science. You'll jump right to real-world use cases as you apply the R programming language and statistical analysis techniques to carefully explained examples based in marketing, business...
Price: $39.99 | Publisher: Manning | Release: 2019
Data Engineering with Alteryx
by Paul Houghton
Alteryx is a GUI-based development platform for data analytic applications. Data Engineering with Alteryx will help you leverage Alteryx's code-free aspects which increase development speed while still enabling you to make the most of the code-based skills you have. This book will teach you the principles of DataOps and how they can be...
Price: $44.99 | Publisher: Packt Publishing | Release: 2022