High Performance Spark

Best Practices for Scaling and Optimizing Apache Spark



Price: $27.31
Authors: Holden Karau, Rachel Warren
Publisher: O'Reilly Media
Published: 2017
Pages: 175
Language: English
Format: Paper book / ebook (PDF)
ISBN-10: 1491943203
ISBN-13: 9781491995662

Apache Spark is amazing when everything clicks. But if you haven't seen the performance improvements you expected, or still don't feel confident enough to use Spark in production, this practical book is for you. Authors Holden Karau and Rachel Warren demonstrate performance optimizations to help your Spark queries run faster and handle larger data sizes, while using fewer resources.

Ideal for software engineers, data engineers, developers, and system administrators working with large-scale data applications, this book describes techniques that can reduce data infrastructure costs and developer hours. Not only will you gain a more comprehensive understanding of Spark, you'll also learn how to make it sing.

Topics covered: how Spark SQL's new interfaces improve performance over Spark's RDD data structure; the choice between data joins in Core Spark and Spark SQL; techniques for getting the most out of standard RDD transformations; how to work around performance issues in Spark's key/value pair paradigm; writing high-performance Spark code without Scala or the JVM; how to test for functionality and performance when applying suggested improvements; using the Spark MLlib and Spark ML machine learning libraries; and Spark's Streaming components and external community packages.





Similar Books


Clojure High Performance Programming

by Shantanu Kumar

Clojure is a young, dynamic, functional programming language that runs on the Java Virtual Machine. It is built with performance, pragmatism, and simplicity in mind. Like most general purpose languages, Clojure's features have different performance characteristics that one should know in order to write high performance code. Clojure H...

Price:  $20.99  |  Publisher:  Packt Publishing  |  Release:  2013

NGINX High Performance

by Rahul Sharma

NGINX is one of the most common free, open source web servers. Its performance-oriented architecture and small footprint make it an ideal choice for high-traffic websites. NGINX offers great performance and optimal resource utilization to its administrators. This practical guide walks you through how to tune one of the leading free open s...

Price:  $29.99  |  Publisher:  Packt Publishing  |  Release:  2015

High-Performance Caching with Nginx and Nginx Plus

by Floyd Smith

You can cache static assets - more than half the payload needed to respond to many web requests - and even application-generated web pages (whether partial or complete). And you can use cache clusters and microcaching to increase the caching capability of your web applications while simplifying implementation and reducing operational comp...

Publisher:  Self-publishing  |  Release:  2017

Apache Solr High Performance

by Surendra Mohan

Apache Solr is one of the most popular open source search servers available on the web. However, simply setting up Apache Solr is not enough to ensure the success of your web product. To maximize efficiency, you need to use techniques to boost Solr performance in order to return relevant results faster. You need to implement robust techni...

Price:  $20.99  |  Publisher:  Packt Publishing  |  Release:  2014

PostgreSQL 10 High Performance

by Ibrar Ahmed, Gregory Smith, Enrico Pirozzi

PostgreSQL database servers have a common set of problems that they encounter as their usage gets heavier and requirements get more demanding. Peek into the future of your PostgreSQL 10 database's problems today. Know the warning signs to look for and how to avoid the most common issues before they even happen. Surprisingly, most Post...

Price:  $44.99  |  Publisher:  Packt Publishing  |  Release:  2018

Mastering High Performance with Kotlin

by Igor Kucherenko

The ease with which we write applications has been increasing, but with it comes the need to address their performance. A balancing act between easily implementing complex applications and keeping their performance optimal is a present-day requirement. In this book, we explore how to achieve this crucial balance, while developing and deplo...

Price:  $44.57  |  Publisher:  Packt Publishing  |  Release:  2018

The NGINX Real-Time API Handbook

by Karthik Krishnaswamy, Alessandro Fael Garcia

Discover how to deliver reliable, high-performance APIs with our NGINX Real-Time API Handbook. Compiled by leading experts on real-time API management, this handbook is a comprehensive guide to reducing latency in your applications and APIs without making any compromises. Learn why now, more than ever, your APIs need to perform in real ti...

Publisher:  Self-publishing  |  Release:  2020

High Performance MySQL, 2nd Edition

by Baron Schwartz, Peter Zaitsev, Vadim Tkachenko, Jeremy D. Zawodny, Arjen Lentz, Derek J. Balling

High Performance MySQL is the definitive guide to building fast, reliable systems with MySQL. Written by noted experts with years of real-world experience building very large systems, this book covers every aspect of MySQL performance in detail, and focuses on robustness, security, and data integrity. Learn advanced techniques in depth so...

Price:  $6.29  |  Publisher:  O'Reilly Media  |  Release:  2008