Practical Synthetic Data Generation

Balancing Privacy and the Broad Availability of Data




Price: $48.30 - $67.99
Authors: Khaled El Emam, Lucy Mosquera, Richard Hoptroff
Publisher: O'Reilly Media
Published: 2020
Pages: 166
Language: English
Format: Paper book / ebook (PDF)
ISBN-10: 1492072745
ISBN-13: 9781492072744

Building and testing machine learning models requires access to large and diverse data. But where can you find usable datasets without running into privacy issues? This practical book introduces techniques for generating synthetic data (fake data generated from real data) so you can perform secondary analysis to do research, understand customer behaviors, develop new products, or generate new revenue.

Data scientists will learn how synthetic data generation provides a way to make such data broadly available for secondary purposes while addressing many privacy concerns. Analysts will learn the principles and steps for generating synthetic data from real datasets. And business leaders will see how synthetic data can help accelerate time to a product or solution.

This book describes:

- Steps for generating synthetic data using multivariate normal distributions
- Methods for distribution fitting covering different goodness-of-fit metrics
- How to replicate the simple structure of original data
- An approach for modeling data structure to consider complex relationships
- Multiple approaches and metrics you can use to assess data utility
- How analysis performed on real data can be replicated with synthetic data
- Privacy implications of synthetic data and methods to assess identity disclosure
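To give a feel for the first topic listed above, here is a minimal sketch of generating synthetic rows from a multivariate normal distribution fitted to a real dataset. It is not taken from the book: the function names `fit_mvn` and `sample_synthetic`, the toy two-column "real" data, and the choice of NumPy are all assumptions made for illustration, and the approach only preserves means and linear correlations of numeric columns.

```python
import numpy as np


def fit_mvn(real):
    """Estimate the mean vector and covariance matrix of the real data.

    `real` is a 2-D array with one row per record and one column per
    numeric variable (hypothetical layout chosen for this sketch).
    """
    mean = real.mean(axis=0)
    cov = np.cov(real, rowvar=False)  # columns are variables
    return mean, cov


def sample_synthetic(mean, cov, n_rows, seed=0):
    """Draw synthetic rows from the fitted multivariate normal."""
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(mean, cov, size=n_rows)


# Toy stand-in for a real dataset: two correlated numeric columns.
# These parameters are invented purely for the example.
toy_rng = np.random.default_rng(42)
real = toy_rng.multivariate_normal(
    [50.0, 120.0],
    [[25.0, 15.0], [15.0, 100.0]],
    size=5000,
)

mean, cov = fit_mvn(real)
synthetic = sample_synthetic(mean, cov, n_rows=5000)

# A very simple utility check: compare the correlation between the two
# columns in the real data and in the synthetic data.
real_corr = np.corrcoef(real, rowvar=False)[0, 1]
synth_corr = np.corrcoef(synthetic, rowvar=False)[0, 1]
print(f"real correlation:      {real_corr:.3f}")
print(f"synthetic correlation: {synth_corr:.3f}")
```

The synthetic rows are fresh draws from the fitted distribution rather than copies of real records. Whether that is sufficient for a given privacy or utility requirement is a separate question, and one that depends on the data at hand.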





Similar Books


Practical Oracle Database Appliance

by Bobby Curtis, Fuad Arshad, Erik Benner, Maris Elsins, Matt Gallagher, Pete Sharman, Yury Velikanov

Practical Oracle Database Appliance is a hands-on book taking you through the components and implementation of the Oracle Database Appliance. Learn about architecture, installation, configuration, and reconfiguration. Install and configure the Oracle Database Appliance with confidence. Make the right choices between the various configurat...

Price:  $49.99  |  Publisher:  Apress  |  Release:  2014

Practical Enterprise Data Lake Insights

by Saurabh Gupta, Venkata Giri

Use this practical guide to successfully handle the challenges encountered when designing an enterprise data lake and learn industry best practices to resolve issues. When designing an enterprise data lake you often hit a roadblock when you must leave the comfort of the relational world and learn the nuances of handling non-relational data...

Price:  $24.14  |  Publisher:  Apress  |  Release:  2018

Thinking with Data

by Max Shron

Many analysts are too concerned with tools and techniques for cleansing, modeling, and visualizing datasets and not concerned enough with asking the right questions. In this practical guide, data strategy consultant Max Shron shows you how to put the why before the how, through an often-overlooked set of analytical skills. Thinking with Da...

Price:  $10.73  |  Publisher:  O'Reilly Media  |  Release:  2014

Practical Neo4j

by Greg Jordan

Why have developers at places like Facebook and Twitter increasingly turned to graph databases to manage their highly connected big data? The short answer is that graphs offer superior speed and flexibility to get the job done. It's time you added skills in graph databases to your toolkit. In Practical Neo4j, database expert Greg Jordan gu...

Price:  $24.53  |  Publisher:  Apress  |  Release:  2015

Python for Data Mining Quick Syntax Reference

by Valentina Porcu

Learn how to use Python and its structures, how to install Python, and which tools are best suited for data analyst work. This book provides you with a handy reference and tutorial on topics ranging from basic Python concepts through to data mining, manipulating and importing datasets, and data analysis. Python for Data Mining Quick Syntax...

Price:  $23.88  |  Publisher:  Apress  |  Release:  2018

Streaming Systems

by Tyler Akidau, Slava Chernyak, Reuven Lax

Streaming data is a big deal in big data these days. As more and more businesses seek to tame the massive unbounded data sets that pervade our world, streaming systems have finally reached a level of maturity sufficient for mainstream adoption. With this practical guide, data engineers, data scientists, and developers will learn how to wo...

Price:  $52.52  |  Publisher:  O'Reilly Media  |  Release:  2018

Building an Anonymization Pipeline

by Luk Arbuckle, Khaled El Emam

How can you use data in a way that protects individual privacy but still provides useful and meaningful analytics? With this practical book, data architects and engineers will learn how to establish and integrate secure, repeatable anonymization processes into their data flows and analytics in a sustainable manner. Luk Arbuckle and Khaled ...

Price:  $36.65  |  Publisher:  O'Reilly Media  |  Release:  2020

Modern Scala Projects

by Ilango Gurusamy

Scala, together with the Spark Framework, forms a rich and powerful data processing ecosystem. Modern Scala Projects is a journey into the depths of this ecosystem. The machine learning (ML) projects presented in this book enable you to create practical, robust data analytics solutions, with an emphasis on automating data workflows with t...

Price:  $30.83  |  Publisher:  Packt Publishing  |  Release:  2018