Learning Spark

Lightning-Fast Big Data Analysis




Price: $32.23 - $44.26
Authors: Matei Zaharia, Holden Karau, Andy Konwinski, Patrick Wendell
Publisher: O'Reilly Media
Published: 2015
Pages: 276
Language: English
Format: Paper book / ebook (PDF)
ISBN-10: 1449358624
ISBN-13: 9781449358624

Data in all domains is getting bigger. How can you work with it efficiently? Recently updated for Spark 1.3, this book introduces Apache Spark, the open source cluster computing system that makes data analytics fast to write and fast to run. With Spark, you can tackle big datasets quickly through simple APIs in Python, Java, and Scala. This edition includes new information on Spark SQL, Spark Streaming, setup, and Maven coordinates.

Written by the developers of Spark, this book will have data scientists and engineers up and running in no time. You'll learn how to express parallel jobs with just a few lines of code, and cover applications from simple batch jobs to stream processing and machine learning.

Quickly dive into Spark capabilities such as distributed datasets, in-memory caching, and the interactive shell. Leverage Spark's powerful built-in libraries, including Spark SQL, Spark Streaming, and MLlib. Use one programming paradigm instead of mixing and matching tools like Hive, Hadoop, Mahout, and Storm. Learn how to deploy interactive, batch, and streaming applications. Connect to data sources including HDFS, Hive, JSON, and S3. Master advanced topics like data partitioning and shared variables.



Similar Books


Next-Generation Machine Learning with Spark

by Butch Quinto

Access real-world documentation and examples for the Spark platform for building large-scale, enterprise-grade machine learning applications. The past decade has seen an astonishing series of advances in machine learning. These breakthroughs are disrupting our everyday life and making an impact across every industry. Next-Generation Machine...

Price:  $26.41  |  Publisher:  Apress  |  Release:  2020

Spark in Action, 2nd Edition

by Jean-Georges Perrin

The Spark distributed data processing platform provides an easy-to-implement tool for ingesting, streaming, and processing data from any source. In Spark in Action, 2nd Edition, you'll learn to take advantage of Spark's core features and incredible processing speed, with applications including real-time computation, delayed eval...

Price:  $39.99  |  Publisher:  Manning  |  Release:  2020

Sams Teach Yourself Apache Spark in 24 Hours

by Jeffrey Aven

Apache Spark is a fast, scalable, and flexible open source distributed processing engine for big data systems and is one of the most active open source big data projects to date. In just 24 lessons of one hour or less, Sams Teach Yourself Apache Spark in 24 Hours helps you build practical Big Data solutions that leverage Spark's amaz...

Price:  $32.51  |  Publisher:  SAMS Publishing  |  Release:  2016

Machine Learning with Spark

by Nick Pentreath

Apache Spark is a framework for distributed computing that is designed from the ground up to be optimized for low latency tasks and in-memory data storage. It is one of the few frameworks for parallel computing that combines speed, scalability, in-memory processing, and fault tolerance with ease of programming and a flexible, expressive, ...

Price:  $34.99  |  Publisher:  Packt Publishing  |  Release:  2015

Beginning Apache Spark 2

by Hien Luu

Develop applications for the big data landscape with Spark and Hadoop. This book also explains the role of Spark in developing scalable machine learning and analytics applications with Cloud technologies. Beginning Apache Spark 2 gives you an introduction to Apache Spark and shows you how to work with it. Along the way, you'll discove...

Price:  $25.33  |  Publisher:  Apress  |  Release:  2018

Apache Spark 2: Data Processing and Real-Time Analytics

by Romeo Kienzler, Md. Rezaul Karim, Sridhar Alla, Siamak Amirghodsi, Meenakshi Rajendran, Broderick Hall, Shuen Mei

Apache Spark is an in-memory, cluster-based data processing system that provides a wide range of functionalities such as big data processing, analytics, machine learning, and more. With this Learning Path, you can take your knowledge of Apache Spark to the next level by learning how to expand Spark's functionality and building your o...

Price:  $49.99  |  Publisher:  Packt Publishing  |  Release:  2018

Advanced Analytics with PySpark

by Akash Tandon, Sandy Ryza, Uri Laserson, Sean Owen, Josh Wills

The amount of data being generated today is staggering and growing. Apache Spark has emerged as the de facto tool to analyze big data and is now a critical part of the data science toolbox. Updated for Spark 3.0, this practical guide brings together Spark, statistical methods, and real-world datasets to teach you how to approach analytics...

Price:  $35.42  |  Publisher:  O'Reilly Media  |  Release:  2022

Big Data Analytics with Spark

by Mohammed Guller

This book is a step-by-step guide for learning how to use Spark for different types of big-data analytics projects, including batch, interactive, graph, and stream data analysis as well as machine learning. It covers Spark core and its add-on libraries, including Spark SQL, Spark Streaming, GraphX, MLlib, and Spark ML. Big Data Analytics w...

Price:  $29.99  |  Publisher:  Apress  |  Release:  2016