Practical Synthetic Data Generation
Balancing Privacy and the Broad Availability of Data
Price | $59.99 |
Authors | Khaled El Emam, Lucy Mosquera, Richard Hoptroff |
Publisher | O'Reilly Media |
Published | 2020 |
Pages | 166 |
Language | English |
Format | Paper book / ebook (PDF) |
ISBN-10 | 1492072745 |
ISBN-13 | 9781492072744 |
Building and testing machine learning models requires access to large and diverse data. But where can you find usable datasets without running into privacy issues? This practical book introduces techniques for generating synthetic data - fake data generated from real data - so you can perform secondary analysis to do research, understand customer behaviors, develop new products, or generate new revenue.
Data scientists will learn how synthetic data generation provides a way to make such data broadly available for secondary purposes while addressing many privacy concerns. Analysts will learn the principles and steps for generating synthetic data from real datasets. And business leaders will see how synthetic data can help accelerate time to a product or solution.
This book describes:
- Steps for generating synthetic data using multivariate normal distributions
- Methods for distribution fitting covering different goodness-of-fit metrics
- How to replicate the simple structure of original data
- An approach for modeling data structure to consider complex relationships
- Multiple approaches and metrics you can use to assess data utility
- How analysis performed on real data can be replicated with synthetic data
- Privacy implications of synthetic data and methods to assess identity disclosure
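The first topic in the description, generating synthetic data from a multivariate normal distribution, can be illustrated in a few lines. This is a minimal sketch, not the book's code: it assumes every variable is continuous and roughly normally distributed, estimates only the mean vector and covariance matrix from the real records, and draws each synthetic record as mu + L z, where L is the Cholesky factor of the covariance and z is a vector of independent standard normals. All function names are hypothetical, and only the Python standard library is used so the example stays self-contained (NumPy's `Generator.multivariate_normal` performs the sampling step directly).

```python
import math
import random
from typing import List, Sequence

Matrix = List[List[float]]


def column_means(rows: Sequence[Sequence[float]]) -> List[float]:
    """Mean of each column (variable) in the dataset."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def covariance(rows: Sequence[Sequence[float]], means: Sequence[float]) -> Matrix:
    """Sample covariance matrix, using the n - 1 denominator."""
    n, d = len(rows), len(means)
    return [
        [sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
         for j in range(d)]
        for i in range(d)
    ]


def cholesky(a: Matrix) -> Matrix:
    """Lower-triangular L with L L^T == a; a must be symmetric positive definite."""
    d = len(a)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                diag = a[i][i] - s
                if diag <= 0:
                    raise ValueError("covariance matrix is not positive definite")
                low[i][j] = math.sqrt(diag)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def synthesize(rows: Sequence[Sequence[float]], n_synthetic: int, seed: int = 0) -> Matrix:
    """Fit a multivariate normal to `rows` and draw `n_synthetic` new records."""
    rng = random.Random(seed)
    mu = column_means(rows)
    low = cholesky(covariance(rows, mu))
    d = len(mu)
    out = []
    for _ in range(n_synthetic):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]  # independent standard normals
        # x = mu + L z has covariance L L^T, i.e. the covariance fitted above
        out.append([mu[i] + sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(d)])
    return out


if __name__ == "__main__":
    # Hypothetical "real" data: two correlated columns (e.g. age and income).
    gen = random.Random(42)
    real = []
    for _ in range(1000):
        age = gen.gauss(45, 10)
        real.append([age, 1000 * age + gen.gauss(0, 5000)])
    fake = synthesize(real, 1000, seed=1)
    print("real means:     ", column_means(real))
    print("synthetic means:", column_means(fake))
```

Because only means and covariances are carried over, a model like this preserves pairwise linear correlations but not skewed or bounded marginals (an age column can come out negative, for example) or nonlinear relationships between variables.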
Similar Books
Practical Enterprise Data Lake Insights
by Saurabh Gupta, Venkata Giri
Use this practical guide to successfully handle the challenges encountered when designing an enterprise data lake and learn industry best practices to resolve issues. When designing an enterprise data lake you often hit a roadblock when you must leave the comfort of the relational world and learn the nuances of handling non-relational data...
Price: $24.14 | Publisher: Apress | Release: 2018
Practical Oracle Database Appliance
by Bobby Curtis, Fuad Arshad, Erik Benner, Maris Elsins, Matt Gallagher, Pete Sharman, Yury Velikanov
Practical Oracle Database Appliance is a hands-on book taking you through the components and implementation of the Oracle Database Appliance. Learn about architecture, installation, configuration, and reconfiguration. Install and configure the Oracle Database Appliance with confidence. Make the right choices between the various configurat...
Price: $49.99 | Publisher: Apress | Release: 2014
Fundamentals of Data Engineering
by Joe Reis, Matt Housley
Data engineering has grown rapidly in the past decade, leaving many software engineers, data scientists, and analysts looking for a comprehensive view of this practice. With this practical book, you'll learn how to plan and build systems to serve the needs of your organization and customers by evaluating the best technologies availab...
Price: $34.90 | Publisher: O'Reilly Media | Release: 2022
Practical Simulations for Machine Learning
by Paris Buttfield-Addison, Mars Buttfield-Addison, Tim Nugent, Jon Manning
Simulation and synthesis are core parts of the future of AI and machine learning. Consider: programmers, data scientists, and machine learning engineers can create the brain of a self-driving car without the car. Rather than use information from the real world, you can synthesize artificial data using simulations to train traditional mach...
Price: $59.99 | Publisher: O'Reilly Media | Release: 2022
Practical Python Data Wrangling and Data Quality
by Susan E. McGregor
The world around us is full of data that holds unique insights and valuable stories, and this book will help you uncover them. Whether you already work with data or want to learn more about its possibilities, the examples and techniques in this practical book will help you more easily clean, evaluate, and analyze data so that you can gene...
Price: $49.58 | Publisher: O'Reilly Media | Release: 2021
by Sandeep Uttamchandani
Data-driven insights are a key competitive advantage for any industry today, but deriving insights from raw data can still take days or weeks. Most organizations can't scale data science teams fast enough to keep up with the growing amounts of data to transform. What's the answer? Self-service data. With this practical book, data...
Price: $48.49 | Publisher: O'Reilly Media | Release: 2020
Thinking with Data
by Max Shron
Many analysts are too concerned with tools and techniques for cleansing, modeling, and visualizing datasets and not concerned enough with asking the right questions. In this practical guide, data strategy consultant Max Shron shows you how to put the why before the how, through an often-overlooked set of analytical skills. Thinking with Da...
Price: $25.17 | Publisher: O'Reilly Media | Release: 2014
Practical Neo4j
by Greg Jordan
Why have developers at places like Facebook and Twitter increasingly turned to graph databases to manage their highly connected big data? The short answer is that graphs offer superior speed and flexibility to get the job done. It's time you added skills in graph databases to your toolkit. In Practical Neo4j, database expert Greg Jord...
Price: $24.53 | Publisher: Apress | Release: 2015