Data Analysis with Python and PySpark
by Jonathan Rioux
Data Analysis with Python and PySpark is your guide to delivering successful Python-driven data projects. Packed with relevant examples and essential techniques, this practical book teaches you to build pipelines for reporting, machine learning, and other data-centric tasks. Quick exercises in every chapter help you practice what you've learned, and rapidly start implementing PySpark into your data sys...
Price: $45.07 | Publisher: Manning | Release: 2022
PolyBase Revealed
by Kevin Feasel
Harness the power of PolyBase data virtualization software to make data from a variety of sources easily accessible through SQL queries while using the T-SQL skills you already know and have mastered. PolyBase Revealed shows you how to use the PolyBase feature of SQL Server 2019 to integrate SQL Server with Azure Blob Storage, Apache Hadoop, other SQL Server instances, Oracle, Cosmos DB, Apache Spark, and mo...
Price: $21.73 | Publisher: Apress | Release: 2020
Beginning Apache Spark Using Azure Databricks
by Robert Ilijason
Analyze vast amounts of data in record time using Apache Spark with Databricks in the Cloud. Learn the fundamentals, and more, of running analytics on large clusters in Azure and AWS, using Apache Spark with Databricks on top. Discover how to squeeze the most value out of your data at a mere fraction of what classical analytics solutions cost, while at the same time getting the results you need, incremental...
Price: $32.32 | Publisher: Apress | Release: 2020
Spark in Action, 2nd Edition
by Jean-Georges Perrin
The Spark distributed data processing platform provides an easy-to-implement tool for ingesting, streaming, and processing data from any source. In Spark in Action, 2nd Edition, you'll learn to take advantage of Spark's core features and incredible processing speed, with applications including real-time computation, delayed evaluation, and machine learning. Spark skills are a hot commodity in ente...
Price: $35.89 | Publisher: Manning | Release: 2020
Mastering Large Datasets with Python
by John T. Wolohan
Modern data science solutions need to be clean, easy to read, and scalable. In Mastering Large Datasets with Python, author J.T. Wolohan teaches you how to take a small project and scale it up using a functionally influenced approach to Python coding. You'll explore methods and built-in Python tools that lend themselves to clarity and scalability, like the high-performing parallelism method, as well as...
Price: $39.99 | Publisher: Manning | Release: 2020
FREE EBOOK - Hadoop for Windows Succinctly
by Dave Vickers
Author Dave Vickers provides a thorough guide to using Hadoop directly on Windows operating systems. From a conceptual overview to practical examples, Hadoop for Windows Succinctly is a valuable resource for developers....
Publisher: Syncfusion | Release: 2019
Modern Big Data Processing with Hadoop
by Naresh Kumar, Prashant Shindgikar
The complex structure of data these days requires sophisticated solutions for data transformation, to make the information more accessible to the users. This book empowers you to build such solutions with relative ease with the help of Apache Hadoop, along with a host of other Big Data tools. This book will give you a complete understanding of the data lifecycle management with Hadoop, followed by modeling of...
Price: $50.55 | Publisher: Packt Publishing | Release: 2018
Apache Hadoop 3 Quick Start Guide
by Hrishikesh Karambelkar
Apache Hadoop is a widely used distributed data platform. It enables large datasets to be efficiently processed instead of using one large computer to store and process the data. This book will get you started with the Hadoop ecosystem, and introduce you to the main technical topics, including MapReduce, YARN, and HDFS. The book begins with an overview of big data and Apache Hadoop. Then, you will set up a p...
Price: $29.99 | Publisher: Packt Publishing | Release: 2018
PySpark Recipes
by Raju Kumar Mishra
Quickly find solutions to common programming problems encountered while processing big data. Content is presented in the popular problem-solution format. Look up the programming problem that you want to solve. Read the solution. Apply the solution directly in your own code. Problem solved! PySpark Recipes covers Hadoop and its shortcomings. The architecture of Spark, PySpark, and RDD are presented. You will ...
Price: $35.10 | Publisher: Apress | Release: 2018
Advanced Data Analytics Using Python
by Sayan Mukhopadhyay
Gain a broad foundation of advanced data analytics concepts and discover the recent revolution in databases such as Neo4j, Elasticsearch, and MongoDB. This book discusses how to implement ETL techniques including topical crawling, which is applied in domains such as high-frequency algorithmic trading and goal-oriented dialog systems. You'll also see examples of machine learning concepts such as semi-su...
Price: $29.01 | Publisher: Apress | Release: 2018