Learning Apache Drill

Query and Analyze Structured Data




Price: $53.50 - $63.85
Authors: Paul Rogers, Charles Givre
Publisher: O'Reilly Media
Published: 2018
Pages: 332
Language: English
Format: Paper book / ebook (PDF)
ISBN-10: 1492032794
ISBN-13: 9781492032793

Get up to speed with Apache Drill, an extensible distributed SQL query engine that reads massive datasets in many popular file formats such as Parquet, JSON, and CSV. Drill reads data in HDFS or in cloud-native storage such as S3, and it works with Hive metastores, distributed databases such as HBase and MongoDB, and relational databases. Drill works everywhere: on your laptop or in your largest cluster.

In this practical book, Drill committers Charles Givre and Paul Rogers show analysts and data scientists how to query and analyze raw data using this powerful tool. Data scientists today spend about 80% of their time just gathering and cleaning data. With this book, you’ll learn how Drill helps you analyze data more effectively to drive down time to insight.

Use Drill to clean, prepare, and summarize delimited data for further analysis.
Query file types including logfiles, Parquet, JSON, and other complex formats.
Query Hadoop, relational databases, MongoDB, and Kafka with standard SQL.
Connect to Drill programmatically using a variety of languages.
Use Drill even with challenging or ambiguous file formats.
Perform sophisticated analysis by extending Drill’s functionality with user-defined functions.
Facilitate data analysis for network security, image metadata, and machine learning.





Similar Books


Learning Apache Kafka, 2nd Edition

by Nishant Garg

Kafka is one of those systems that is very simple to describe at a high level but has an incredible depth of technical detail when you dig deeper. Learning Apache Kafka Second Edition provides you with step-by-step, practical examples that help you take advantage of the real power of Kafka and handle hundreds of megabytes of messages per s...

Price:  $20.99  |  Publisher:  Packt Publishing  |  Release:  2015

Learning Apache Mahout

by Chandramani Tiwary

In the past few years, the generation of data and our capability to store and process it have grown exponentially. There is a need for scalable analytics frameworks and people with the right skills to get the information needed from this Big Data. Apache Mahout is one of the first and most prominent Big Data machine learning platforms. It i...

Price:  $35.99  |  Publisher:  Packt Publishing  |  Release:  2015

Learning Apache Mahout Classification

by Ashish Gupta

This book is a practical guide that explains the classification algorithms provided in Apache Mahout with the help of actual examples. Starting with the introduction of classification and model evaluation techniques, we will explore Apache Mahout and learn why it is a good choice for classification. Next, you will learn about different cla...

Price:  $17.99  |  Publisher:  Packt Publishing  |  Release:  2015

Learning Apache Thrift

by Krzysztof Rakowski

With modern software systems being increasingly complex, providing a scalable communication architecture for applications in different languages is tedious. The Apache Thrift framework is the solution to this problem! It helps build efficient and easy-to-maintain services and offers a plethora of options matching your application type by ...

Price:  $27.99  |  Publisher:  Packt Publishing  |  Release:  2015

Apache Mahout Essentials

by Jayani Withanawasam

Apache Mahout is a scalable machine learning library with algorithms for clustering, classification, and recommendations. It empowers users to analyze patterns in large, diverse, and complex datasets faster and more scalably. This book is an all-inclusive guide to analyzing large and complex datasets using Apache Mahout. It explains compli...

Price:  $19.99  |  Publisher:  Packt Publishing  |  Release:  2015

Beginning Apache Spark 2

by Hien Luu

Develop applications for the big data landscape with Spark and Hadoop. This book also explains the role of Spark in developing scalable machine learning and analytics applications with Cloud technologies. Beginning Apache Spark 2 gives you an introduction to Apache Spark and shows you how to work with it. Along the way, you'll discover res...

Price:  $25.33  |  Publisher:  Apress  |  Release:  2018

Machine Learning with Spark

by Nick Pentreath

Apache Spark is a framework for distributed computing that is designed from the ground up to be optimized for low-latency tasks and in-memory data storage. It is one of the few frameworks for parallel computing that combines speed, scalability, in-memory processing, and fault tolerance with ease of programming and a flexible, expressive, ...

Price:  $29.99  |  Publisher:  Packt Publishing  |  Release:  2015

Scala for Machine Learning

by Patrick R. Nicolas

The discovery of information through data clustering and classification is becoming a key differentiator for competitive organizations. Machine learning applications are everywhere, from self-driving cars, engineering designs, biometrics, and trading strategies, to detection of genetic anomalies. The book begins with an introduction to the...

Price:  $35.99  |  Publisher:  Packt Publishing  |  Release:  2014