Data Algorithms with Spark

Recipes and Design Patterns for Scaling Up using PySpark




Price: $42.95 - $57.01
Author: Mahmoud Parsian
Publisher: O'Reilly Media
Published: 2022
Pages: 435
Language: English
Format: Paper book / ebook (PDF)
ISBN-10: 1492082384
ISBN-13: 9781492082385

Apache Spark's speed, ease of use, sophisticated analytics, and multilanguage support make practical knowledge of this cluster-computing framework a required skill for data engineers and data scientists. With this hands-on guide, anyone looking for an introduction to Spark will learn practical algorithms and examples using PySpark.

In each chapter, author Mahmoud Parsian shows you how to solve a data problem with a set of Spark transformations and algorithms. You'll learn how to tackle problems involving ETL, design patterns, machine learning algorithms, data partitioning, and genomics analysis. Each detailed recipe includes PySpark algorithms using the PySpark driver and shell script.

With this book, you will: Learn how to select Spark transformations for optimized solutions; Explore powerful transformations and reductions including reduceByKey(), combineByKey(), and mapPartitions(); Understand data partitioning for optimized queries; Build and apply a model using PySpark design patterns; Apply motif-finding algorithms to graph data; Analyze graph data by using the GraphFrames API; Apply PySpark algorithms to clinical and genomics data; Learn how to use and apply feature engineering in ML algorithms; Understand and use practical and pragmatic data design patterns.



Similar Books


Fast Data Processing with Spark


by Holden Karau

Spark is a framework for writing fast, distributed programs. Spark solves problems similar to those Hadoop MapReduce solves, but with a fast in-memory approach and a clean functional-style API. With its ability to integrate with Hadoop and built-in tools for interactive query analysis (Shark), large-scale graph processing and analysis (Bagel), and ...

Price:  $22.99  |  Publisher:  Packt Publishing  |  Release:  2013

Big Data Analytics with Spark


by Mohammed Guller

This book is a step-by-step guide for learning how to use Spark for different types of big-data analytics projects, including batch, interactive, graph, and stream data analysis as well as machine learning. It covers Spark core and its add-on libraries, including Spark SQL, Spark Streaming, GraphX, MLlib, and Spark ML. Big Data Analytics w...

Price:  $29.99  |  Publisher:  Apress  |  Release:  2016

Fast Data Processing with Spark, 2nd Edition


by Krishna Sankar, Holden Karau

Spark is a framework used for writing fast, distributed programs. Spark solves problems similar to those Hadoop MapReduce solves, but with a fast in-memory approach and a clean functional-style API. With its ability to integrate with Hadoop and built-in tools for interactive query analysis (Spark SQL), large-scale graph processing and analysis (G...

Price:  $29.99  |  Publisher:  Packt Publishing  |  Release:  2015

Big Data Processing with Apache Spark


by Manuel Ignacio Franco Galeano

Processing big data in real time is challenging because of the demands of scalability, information consistency, and fault tolerance. This book teaches you how to use Spark to make your overall analytical workflow faster and more efficient. You'll explore all core concepts and tools within the Spark ecosystem, such as Spark Streaming, the Spark Streamin...

Price:  $29.99  |  Publisher:  Packt Publishing  |  Release:  2018

Advanced Analytics with Spark


by Sandy Ryza, Uri Laserson, Sean Owen, Josh Wills

In this practical book, four Cloudera data scientists present a set of self-contained patterns for performing large-scale data analysis with Spark. The authors bring Spark, statistical methods, and real-world data sets together to teach you how to approach analytics problems by example. You'll start with an introduction to Spark and i...

Price:  $20.00  |  Publisher:  O'Reilly Media  |  Release:  2015

Advanced Analytics with Spark, 2nd Edition


by Sandy Ryza, Uri Laserson, Josh Wills, Sean Owen

In the second edition of this practical book, four Cloudera data scientists present a set of self-contained patterns for performing large-scale data analysis with Spark. The authors bring Spark, statistical methods, and real-world data sets together to teach you how to approach analytics problems by example. Updated for Spark 2.1, this ed...

Price:  $29.85  |  Publisher:  O'Reilly Media  |  Release:  2017

Practical Data Science with R, 2nd Edition


by Nina Zumel, John Mount

Practical Data Science with R, Second Edition takes a practice-oriented approach to explaining basic principles in the ever-expanding field of data science. You'll jump right to real-world use cases as you apply the R programming language and statistical analysis techniques to carefully explained examples based in marketing, business...

Price:  $39.99  |  Publisher:  Manning  |  Release:  2019

Data Engineering with Alteryx


by Paul Houghton

Alteryx is a GUI-based development platform for data analytic applications. Data Engineering with Alteryx will help you leverage Alteryx's code-free aspects, which increase development speed while still enabling you to make the most of the code-based skills you have. This book will teach you the principles of DataOps and how they can be...

Price:  $44.99  |  Publisher:  Packt Publishing  |  Release:  2022