High Performance Spark
Best Practices for Scaling and Optimizing Apache Spark
Authors: Holden Karau, Rachel Warren
Format: Paper book / ebook (PDF)
Apache Spark is amazing when everything clicks. But if you haven't seen the performance improvements you expected, or still don't feel confident enough to use Spark in production, this practical book is for you. Authors Holden Karau and Rachel Warren demonstrate performance optimizations to help your Spark queries run faster and handle larger data sizes, while using fewer resources.
Ideal for software engineers, data engineers, developers, and system administrators working with large-scale data applications, this book describes techniques that can reduce data infrastructure costs and developer hours. Not only will you gain a more comprehensive understanding of Spark, you'll also learn how to make it sing.
This book covers: how Spark SQL's new interfaces improve performance over Spark's RDD data structure; the choice between data joins in Core Spark and Spark SQL; techniques for getting the most out of standard RDD transformations; how to work around performance issues in Spark's key/value pair paradigm; writing high-performance Spark code without Scala or the JVM; how to test for functionality and performance when applying suggested improvements; using the Spark MLlib and Spark ML machine learning libraries; and Spark's Streaming components and external community packages.
by Shantanu Kumar
Clojure is a young, dynamic, functional programming language that runs on the Java Virtual Machine. It is built with performance, pragmatism, and simplicity in mind. Like most general-purpose languages, Clojure's features have different performance characteristics that one should know in order to write high-performance code. Clojure H...
Price: $20.99 | Publisher: Packt Publishing | Release: 2013
by Rahul Sharma
NGINX is one of the most common free, open source web servers. Its performance-oriented architecture and small footprint make it an ideal choice for high-traffic websites. NGINX offers great performance and optimal resource utilization to its administrators. This practical guide walks you through how to tune one of the leading free open s...
Price: $29.99 | Publisher: Packt Publishing | Release: 2015
by Floyd Smith
You can cache static assets - more than half the payload needed to respond to many web requests - and even application-generated web pages (whether partial or complete). And you can use cache clusters and microcaching to increase the caching capability of your web applications while simplifying implementation and reducing operational comp...
Publisher: Self-publishing | Release: 2017
by Surendra Mohan
Apache Solr is one of the most popular open source search servers available on the web. However, simply setting up Apache Solr is not enough to ensure the success of your web product. To maximize efficiency, you need to use techniques to boost Solr performance in order to return relevant results faster. You need to implement robust techni...
Price: $20.99 | Publisher: Packt Publishing | Release: 2014
by Ibrar Ahmed, Gregory Smith, Enrico Pirozzi
PostgreSQL database servers have a common set of problems that they encounter as their usage gets heavier and requirements get more demanding. Peek into the future of your PostgreSQL 10 database's problems today. Know the warning signs to look for and how to avoid the most common issues before they even happen. Surprisingly, most Post...
Price: $44.99 | Publisher: Packt Publishing | Release: 2018
by Igor Kucherenko
The ease with which we write applications has been increasing, but with it comes the need to address their performance. A balancing act between easily implementing complex applications and keeping their performance optimal is a present-day requirement. In this book, we explore how to achieve this crucial balance, while developing and deplo...
Price: $44.57 | Publisher: Packt Publishing | Release: 2018
by Karthik Krishnaswamy, Alessandro Fael Garcia
Discover how to deliver reliable, high-performance APIs with our NGINX Real-Time API Handbook. Compiled by leading experts on real-time API management, this handbook is a comprehensive guide to reducing latency in your applications and APIs without making any compromises. Learn why now, more than ever, your APIs need to perform in real ti...
Publisher: Self-publishing | Release: 2020
by Baron Schwartz, Peter Zaitsev, Vadim Tkachenko, Jeremy D. Zawodny, Arjen Lentz, Derek J. Balling
High Performance MySQL is the definitive guide to building fast, reliable systems with MySQL. Written by noted experts with years of real-world experience building very large systems, this book covers every aspect of MySQL performance in detail, and focuses on robustness, security, and data integrity. Learn advanced techniques in depth so...
Price: $6.29 | Publisher: O'Reilly Media | Release: 2008