Learning Apache Drill

Query and Analyze Structured Data




Price: $42.26 - $63.85
Authors: Paul Rogers, Charles Givre
Publisher: O'Reilly Media
Published: 2018
Pages: 332
Language: English
Format: Paper book / ebook (PDF)
ISBN-10: 1492032794
ISBN-13: 9781492032793

Get up to speed with Apache Drill, an extensible distributed SQL query engine that reads massive datasets in many popular file formats such as Parquet, JSON, and CSV. Drill reads data in HDFS or in cloud-native storage such as S3 and works with Hive metastores along with distributed databases such as HBase, MongoDB, and relational databases. Drill works everywhere: on your laptop or in your largest cluster.

In this practical book, Drill committers Charles Givre and Paul Rogers show analysts and data scientists how to query and analyze raw data using this powerful tool. Data scientists today spend about 80% of their time just gathering and cleaning data. With this book, you’ll learn how Drill helps you analyze data more effectively to drive down time to insight.

Use Drill to clean, prepare, and summarize delimited data for further analysis.
Query file types including logfiles, Parquet, JSON, and other complex formats.
Query Hadoop, relational databases, MongoDB, and Kafka with standard SQL.
Connect to Drill programmatically using a variety of languages.
Use Drill even with challenging or ambiguous file formats.
Perform sophisticated analysis by extending Drill’s functionality with user-defined functions.
Facilitate data analysis for network security, image metadata, and machine learning.





Similar Books


Learning Apache Kafka, 2nd Edition

by Nishant Garg

Kafka is one of those systems that is very simple to describe at a high level but has an incredible depth of technical detail when you dig deeper. Learning Apache Kafka Second Edition provides you with step-by-step, practical examples that help you take advantage of the real power of Kafka and handle hundreds of megabytes of messages per s...

Price:  $13.07  |  Publisher:  Packt Publishing  |  Release:  2015

Learning Apache Mahout

by Chandramani Tiwary

In the past few years, the generation of data and our capability to store and process it have grown exponentially. There is a need for scalable analytics frameworks and people with the right skills to get the information needed from this Big Data. Apache Mahout is one of the first and most prominent Big Data machine learning platforms. It i...

Price:  $44.99  |  Publisher:  Packt Publishing  |  Release:  2015

Learning Apache Mahout Classification

by Ashish Gupta

This book is a practical guide that explains the classification algorithms provided in Apache Mahout with the help of actual examples. Starting with the introduction of classification and model evaluation techniques, we will explore Apache Mahout and learn why it is a good choice for classification. Next, you will learn about different cla...

Price:  $29.99  |  Publisher:  Packt Publishing  |  Release:  2015

Learning Apache Thrift

by Krzysztof Rakowski

With modern software systems being increasingly complex, providing a scalable communication architecture for applications in different languages is tedious. The Apache Thrift framework is the solution to this problem! It helps build efficient and easy-to-maintain services and offers a plethora of options matching your application type by ...

Price:  $13.46  |  Publisher:  Packt Publishing  |  Release:  2015

Machine Learning with Spark

by Nick Pentreath

Apache Spark is a framework for distributed computing that is designed from the ground up to be optimized for low latency tasks and in-memory data storage. It is one of the few frameworks for parallel computing that combines speed, scalability, in-memory processing, and fault tolerance with ease of programming and a flexible, expressive, ...

Price:  $34.99  |  Publisher:  Packt Publishing  |  Release:  2015

Apache Mahout Essentials

by Jayani Withanawasam

Apache Mahout is a scalable machine learning library with algorithms for clustering, classification, and recommendations. It empowers users to analyze patterns in large, diverse, and complex datasets faster and more scalably. This book is an all-inclusive guide to analyzing large and complex datasets using Apache Mahout. It explains compli...

Price:  $24.99  |  Publisher:  Packt Publishing  |  Release:  2015

Apache Spark 2: Data Processing and Real-Time Analytics

by Romeo Kienzler, Md. Rezaul Karim, Sridhar Alla, Siamak Amirghodsi, Meenakshi Rajendran, Broderick Hall, Shuen Mei

Apache Spark is an in-memory, cluster-based data processing system that provides a wide range of functionalities such as big data processing, analytics, machine learning, and more. With this Learning Path, you can take your knowledge of Apache Spark to the next level by learning how to expand Spark's functionality and building your o...

Price:  $49.99  |  Publisher:  Packt Publishing  |  Release:  2018

Learning Spark

by Matei Zaharia, Holden Karau, Andy Konwinski, Patrick Wendell

Data in all domains is getting bigger. How can you work with it efficiently? Recently updated for Spark 1.3, this book introduces Apache Spark, the open source cluster computing system that makes data analytics fast to write and fast to run. With Spark, you can tackle big datasets quickly through simple APIs in Python, Java, and Scala. Th...

Price:  $32.23  |  Publisher:  O'Reilly Media  |  Release:  2015