Learning Apache Drill
Query and Analyze Structured Data
|Price||$45.05 - $63.85|
|Authors||Paul Rogers, Charles Givre|
|Format||Paper book / ebook (PDF)|
Get up to speed with Apache Drill, an extensible distributed SQL query engine that reads massive datasets in many popular file formats such as Parquet, JSON, and CSV. Drill reads data in HDFS or in cloud-native storage such as S3 and works with Hive metastores along with distributed databases such as HBase, MongoDB, and relational databases. Drill works everywhere: on your laptop or in your largest cluster.
In this practical book, Drill committers Charles Givre and Paul Rogers show analysts and data scientists how to query and analyze raw data using this powerful tool. Data scientists today spend about 80% of their time just gathering and cleaning data. With this book, you’ll learn how Drill helps you analyze data more effectively to drive down time to insight.
Use Drill to clean, prepare, and summarize delimited data for further analysis;
Query file types including logfiles, Parquet, JSON, and other complex formats;
Query Hadoop, relational databases, MongoDB, and Kafka with standard SQL;
Connect to Drill programmatically using a variety of languages;
Use Drill even with challenging or ambiguous file formats;
Perform sophisticated analysis by extending Drill’s functionality with user-defined functions;
Facilitate data analysis for network security, image metadata, and machine learning.
by Nishant Garg
Kafka is one of those systems that is very simple to describe at a high level but has an incredible depth of technical detail when you dig deeper. Learning Apache Kafka, Second Edition provides you with step-by-step, practical examples that help you take advantage of the real power of Kafka and handle hundreds of megabytes of messages per s...
Price: $20.99 | Publisher: Packt Publishing | Release: 2015
by Chandramani Tiwary
In the past few years, the generation of data and our capability to store and process it have grown exponentially. There is a need for scalable analytics frameworks and people with the right skills to get the information needed from this Big Data. Apache Mahout is one of the first and most prominent Big Data machine learning platforms. It i...
Price: $35.99 | Publisher: Packt Publishing | Release: 2015
by Ashish Gupta
This book is a practical guide that explains the classification algorithms provided in Apache Mahout with the help of actual examples. Starting with the introduction of classification and model evaluation techniques, we will explore Apache Mahout and learn why it is a good choice for classification. Next, you will learn about different cla...
Price: $17.99 | Publisher: Packt Publishing | Release: 2015
by Krzysztof Rakowski
With modern software systems being increasingly complex, providing a scalable communication architecture for applications in different languages is tedious. The Apache Thrift framework is the solution to this problem! It helps build efficient and easy-to-maintain services and offers a plethora of options matching your application type by ...
Price: $27.99 | Publisher: Packt Publishing | Release: 2015
by Jayani Withanawasam
Apache Mahout is a scalable machine learning library with algorithms for clustering, classification, and recommendations. It empowers users to analyze patterns in large, diverse, and complex datasets faster and more scalably. This book is an all-inclusive guide to analyzing large and complex datasets using Apache Mahout. It explains compli...
Price: $19.99 | Publisher: Packt Publishing | Release: 2015
by Hien Luu
Develop applications for the big data landscape with Spark and Hadoop. This book also explains the role of Spark in developing scalable machine learning and analytics applications with Cloud technologies. Beginning Apache Spark 2 gives you an introduction to Apache Spark and shows you how to work with it. Along the way, you'll discover res...
Price: $25.33 | Publisher: Apress | Release: 2018
by Nick Pentreath
Apache Spark is a framework for distributed computing that is designed from the ground up to be optimized for low-latency tasks and in-memory data storage. It is one of the few frameworks for parallel computing that combines speed, scalability, in-memory processing, and fault tolerance with ease of programming and a flexible, expressive, ...
Price: $29.99 | Publisher: Packt Publishing | Release: 2015
by Patrick R. Nicolas
The discovery of information through data clustering and classification is becoming a key differentiator for competitive organizations. Machine learning applications are everywhere, from self-driving cars, engineering designs, biometrics, and trading strategies, to detection of genetic anomalies. The book begins with an introduction to the...
Price: $35.99 | Publisher: Packt Publishing | Release: 2014