Apache Spark 2: Data Processing and Real-Time Analytics
by Romeo Kienzler, Md. Rezaul Karim, Sridhar Alla, Siamak Amirghodsi, Meenakshi Rajendran, Broderick Hall, Shuen Mei
Apache Spark is an in-memory, cluster-based data processing system that provides a wide range of functionalities such as big data processing, analytics, machine learning, and more. With this Learning Path, you can take your knowledge of Apache Spark to the next level by learning how to expand Spark's functionality and build your own data flow and machine learning programs on this platform. You will w...
Price: $49.99 | Publisher: Packt Publishing | Release: 2018
by Butch Quinto
Utilize this practical and easy-to-follow guide to modernize traditional enterprise data warehouse and business intelligence environments with next-generation big data technologies. Next-Generation Big Data takes a holistic approach, covering the most important aspects of modern enterprise big data. The book covers not only the main technology stack but also the next-generation tools and applications used fo...
Price: $33.51 | Publisher: Apress | Release: 2018
by Hien Luu
Develop applications for the big data landscape with Spark and Hadoop. This book also explains the role of Spark in developing scalable machine learning and analytics applications with Cloud technologies. Beginning Apache Spark 2 gives you an introduction to Apache Spark and shows you how to work with it. Along the way, you'll discover resilient distributed datasets (RDDs); use Spark SQL for structured ...
Price: $25.33 | Publisher: Apress | Release: 2018
by Subhashini Chellappan, Dharanitharan Ganesan
Work with Apache Spark using Scala to deploy and set up single-node, multi-node, and high-availability clusters. This book discusses various components of Spark such as Spark Core, DataFrames, Datasets and SQL, Spark Streaming, Spark MLlib, and R on Spark with the help of practical code snippets for each topic. Practical Apache Spark also covers the integration of Apache Spark with Kafka with examples. You...
Price: $31.66 | Publisher: Apress | Release: 2018
Stream Processing with Apache Flink
by Fabian Hueske, Vasiliki Kalavri
Get started with Apache Flink, the open source framework that enables you to process streaming data - such as user interactions, sensor data, and machine logs - as it arrives. With this practical guide, you'll learn how to use Apache Flink's stream processing APIs to implement, continuously run, and maintain real-world applications. Authors Fabian Hueske, one of Flink's creators, and Vasia Kal...
Price: $32.99 | Publisher: O'Reilly Media | Release: 2018
by Paul Rogers, Charles Givre
Get up to speed with Apache Drill, an extensible distributed SQL query engine that reads massive datasets in many popular file formats such as Parquet, JSON, and CSV. Drill reads data in HDFS or in cloud-native storage such as S3 and works with Hive metastores along with distributed databases such as HBase, MongoDB, and relational databases. Drill works everywhere: on your laptop or in your largest cluster....
Price: $42.26 | Publisher: O'Reilly Media | Release: 2018
FREE EBOOK - Designing Event-Driven Systems
by Ben Stopford
Many forces affect software today: larger datasets, geographical disparities, complex company structures, and the growing need to be fast and nimble in the face of change. Proven approaches such as service-oriented and event-driven architectures are joined by newer techniques such as microservices, reactive architectures, DevOps, and stream processing. Many of these patterns are successful by themselves, bu...
Publisher: O'Reilly Media | Release: 2018
Apache Superset Quick Start Guide
by Shashank Shekhar
Apache Superset is a modern, open source, enterprise-ready business intelligence (BI) web application. With the help of this book, you will see how Superset integrates with popular databases like Postgres, Google BigQuery, Snowflake, and MySQL. You will learn to create real-time data visualizations and dashboards on modern web browsers for your organization using Superset. First, we look at the fundamentals ...
Price: $29.99 | Publisher: Packt Publishing | Release: 2018
Apache Hadoop 3 Quick Start Guide
by Hrishikesh Karambelkar
Apache Hadoop is a widely used distributed data platform. It enables large datasets to be efficiently processed instead of using one large computer to store and process the data. This book will get you started with the Hadoop ecosystem, and introduce you to the main technical topics, including MapReduce, YARN, and HDFS. The book begins with an overview of big data and Apache Hadoop. Then, you will set up a p...
Price: $29.99 | Publisher: Packt Publishing | Release: 2018
Mastering Apache Cassandra 3.x, 3rd Edition
by Aaron Ploetz, Tejaswi Malepati, Nishant Neeraj
With ever-increasing rates of data creation, the demand for storing data fast and reliably becomes a need. Apache Cassandra is the perfect choice for building fault-tolerant and scalable databases. Mastering Apache Cassandra 3.x teaches you how to build and architect your clusters, configure and work with your nodes, and program in a high-throughput environment, helping you understand the power of Cassandra...
Price: $39.99 | Publisher: Packt Publishing | Release: 2018