Beginning Apache Spark Using Azure Databricks

Unleashing Large Cluster Analytics in the Cloud



Bookstore > Books > Beginning Apache Spark Using Azure Databricks

Price$37.76 - $53.77
Rating
AuthorRobert Ilijason
PublisherApress
Published2020
Pages274
LanguageEnglish
FormatPaper book / ebook (PDF)
ISBN-101484257804
ISBN-139781484257807
EBook Hardcover Paperback

Analyze vast amounts of data in record time using Apache Spark with Databricks in the Cloud. Learn the fundamentals, and more, of running analytics on large clusters in Azure and AWS, using Apache Spark with Databricks on top. Discover how to squeeze the most value out of your data at a mere fraction of what classical analytics solutions cost, while at the same time getting the results you need, incrementally faster.

This book explains how the confluence of these pivotal technologies gives you enormous power, and cheaply, when it comes to huge datasets. You will begin by learning how cloud infrastructure makes it possible to scale your code to large amounts of processing units, without having to pay for the machinery in advance. From there you will learn how Apache Spark, an open source framework, can enable all those CPUs for data analytics use. Finally, you will see how services such as Databricks provide the power of Apache Spark, without you having to know anything about configuring hardware or software. By removing the need for expensive experts and hardware, your resources can instead be allocated to actually finding business value in the data.

This book guides you through some advanced topics such as analytics in the cloud, data lakes, data ingestion, architecture, machine learning, and tools, including Apache Spark, Apache Hadoop, Apache Hive, Python, and SQL. Valuable exercises help reinforce what you have learned.





Similar Books


Beginning Apache Spark 2

Beginning Apache Spark 2

by Hien Luu

Develop applications for the big data landscape with Spark and Hadoop. This book also explains the role of Spark in developing scalable machine learning and analytics applications with Cloud technologies. Beginning Apache Spark 2 gives you an introduction to Apache Spark and shows you how to work with it.Along the way, you'll discover res...

Price:  $25.33  |  Publisher:  Apress  |  Release:  2018

Mastering Apache Spark

Mastering Apache Spark

by Mike Frampton

Apache Spark is an in-memory cluster based parallel processing system that provides a wide range of functionality like graph processing, machine learning, stream processing and SQL. It operates at unprecedented speeds, is easy to use and offers a rich set of data transformations.This book aims to take your limited knowledge of Spark to th...

Price:  $35.25  |  Publisher:  Packt Publishing  |  Release:  2015

Apache Spark 2: Data Processing and Real-Time Analytics

Apache Spark 2: Data Processing and Real-Time Analytics

by Romeo Kienzler, Md. Rezaul Karim, Sridhar Alla, Siamak Amirghodsi, Meenakshi Rajendran, Broderick Hall, Shuen Mei

Apache Spark is an in-memory, cluster-based data processing system that provides a wide range of functionalities such as big data processing, analytics, machine learning, and more. With this Learning Path, you can take your knowledge of Apache Spark to the next level by learning how to expand Spark's functionality and building your own da...

Price:  $49.99  |  Publisher:  Packt Publishing  |  Release:  2018

Practical Apache Spark

Practical Apache Spark

by Subhashini Chellappan, Dharanitharan Ganesan

Work with Apache Spark using Scala to deploy and set up single-node, multi-node, and high-availability clusters. This book discusses various components of Spark such as Spark Core, DataFrames, Datasets and SQL, Spark Streaming, Spark MLib, and R on Spark with the help of practical code snippets for each topic. Practical Apache Spark also ...

Price:  $31.66  |  Publisher:  Apress  |  Release:  2018

PolyBase Revealed

PolyBase Revealed

by Kevin Feasel

Harness the power of PolyBase data virtualization software to make data from a variety of sources easily accessible through SQL queries while using the T-SQL skills you already know and have mastered.PolyBase Revealed shows you how to use the PolyBase feature of SQL Server 2019 to integrate SQL Server with Azure Blob Storage, Apache Hadoo...

Price:  $21.73  |  Publisher:  Apress  |  Release:  2020

High Performance Spark

High Performance Spark

by Holden Karau, Rachel Warren

Apache Spark is amazing when everything clicks. But if you haven't seen the performance improvements you expected, or still don't feel confident enough to use Spark in production, this practical book is for you. Authors Holden Karau and Rachel Warren demonstrate performance optimizations to help your Spark queries run faster and handle la...

Price:  $27.31  |  Publisher:  O'Reilly Media  |  Release:  2017

Sams Teach Yourself Apache Spark in 24 Hours

Sams Teach Yourself Apache Spark in 24 Hours

by Jeffrey Aven

Apache Spark is a fast, scalable, and flexible open source distributed processing engine for big data systems and is one of the most active open source big data projects to date. In just 24 lessons of one hour or less, Sams Teach Yourself Apache Spark in 24 Hours helps you build practical Big Data solutions that leverage Spark's amazing s...

Price:  $32.51  |  Publisher:  SAMS Publishing  |  Release:  2016

Beginning Apache Cassandra Development

Beginning Apache Cassandra Development

by Vivek Mishra

Beginning Apache Cassandra Development introduces you to one of the most robust and best-performing NoSQL database platforms on the planet. Apache Cassandra is a document database following the JSON document model. It is specifically designed to manage large amounts of data across many commodity servers without there being any single poin...

Price:  $49.99  |  Publisher:  Apress  |  Release:  2014