sdv module

class sdv.SDV(meta_filename)

Bases: object

find_all_metadata(table_name)
find_metadata(table_name)

Read the metadata file and return the JSON object for a single table.

Parameters: table_name (str) – name of the table of interest.
Returns: JSON object
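
As a rough illustration of the lookup described above, the sketch below reads a metadata file and returns the entry for one table. The file layout (a top-level "tables" list whose entries carry a "name" key) and the helper name find_table_metadata are assumptions made for this example; they are not taken from sdv itself.

```python
import json
import os
import tempfile


def find_table_metadata(meta_filename, table_name):
    """Return the metadata dict for one table, or None if it is absent.

    Assumes a hypothetical layout: {"tables": [{"name": ..., ...}, ...]}.
    """
    with open(meta_filename) as meta_file:
        metadata = json.load(meta_file)
    for table in metadata.get("tables", []):
        if table.get("name") == table_name:
            return table
    return None


# Build a small throwaway metadata file to exercise the helper.
example = {
    "tables": [
        {"name": "users", "primary_key": "user_id"},
        {"name": "orders", "primary_key": "order_id"},
    ]
}
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "meta.json")
    with open(path, "w") as handle:
        json.dump(example, handle)
    users_meta = find_table_metadata(path, "users")
    missing_meta = find_table_metadata(path, "payments")
```

Returning None for an unknown table is a choice made for the sketch; the documentation above does not say how a missing table is reported.
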
get_covariance(name)
get_fk(table_name, other)

Find the column name in the parent table that this table refers to (a foreign key).

Parameters:
  • table_name (str) – name of the table of interest.
  • other (str) – name of parent table.
Returns:

str or None – foreign keys of table
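
The foreign-key lookup can be pictured with the sketch below. It assumes a hypothetical in-memory metadata shape in which a foreign-key field carries a "ref" entry naming the parent table and column; both that shape and the function name get_foreign_key are illustrative rather than sdv's actual format.

```python
def get_foreign_key(metadata, table_name, parent_name):
    """Return the parent column that table_name references, or None.

    Assumes each table entry lists its fields, and that a foreign-key
    field carries a hypothetical "ref" dict with "table" and "field".
    """
    for field in metadata[table_name]["fields"]:
        ref = field.get("ref")
        if ref is not None and ref["table"] == parent_name:
            return ref["field"]
    return None


metadata = {
    "users": {"fields": [{"name": "user_id"}]},
    "orders": {
        "fields": [
            {"name": "order_id"},
            {"name": "buyer", "ref": {"table": "users", "field": "user_id"}},
        ]
    },
}

# "orders" refers to "users", but not the other way around.
fk_to_users = get_foreign_key(metadata, "orders", "users")
fk_to_orders = get_foreign_key(metadata, "users", "orders")
```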

get_model_params(table_name)

Returns the parameters stored in the model summary file.

Parameters: table_name (str) – name of the table of interest.
Returns: (map<str, list>, pandas.DataFrame, list<float>), representing (distribs, covar, values)
get_pk(table_name)

Get the primary key for the given table name. If the name refers to a merged table, the individual table names are separated by _ and each of those tables should have the same primary key.

Parameters: table_name (str) – name of the table whose primary key you are querying for.
Returns: str or None – primary key of the table
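
The merged-name rule above can be sketched as follows. The mapping primary_keys and the function name get_primary_key are hypothetical, and returning None when the parts disagree is an assumption of this example, not documented behavior.

```python
def get_primary_key(primary_keys, table_name):
    """Return the primary key for a plain or merged table name, or None.

    primary_keys maps individual table names to their key column. A name
    that is not found directly is treated as merged and split on "_".
    """
    if table_name in primary_keys:
        return primary_keys[table_name]
    keys = {primary_keys.get(part) for part in table_name.split("_")}
    # The merged parts are expected to agree on a single primary key.
    if len(keys) == 1:
        return keys.pop()
    return None


primary_keys = {"users": "id", "profiles": "id", "orders": "order_id"}

single = get_primary_key(primary_keys, "users")
merged = get_primary_key(primary_keys, "users_profiles")
conflict = get_primary_key(primary_keys, "users_orders")
```

Note that splitting on _ becomes ambiguous when an individual table name itself contains an underscore; the sketch only sidesteps this for names that appear in the mapping directly.
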
get_summary(name)
get_synth_data(table_name)

Returns the synthetic data of a table

Parameters: table_name (str) – name of the table of interest.
Returns: pandas.DataFrame
get_table(name)
get_table_names()

Return all table names in a given Dataset.

Returns: list<str> – list of all table names

learn_model(summary_filename='model_statistics')

If a summary file has already been read in, recovers the tables from the existing model. Otherwise, creates the tables from scratch and stores a summary under the given filename.

Parameters: summary_filename (str) – The filename containing the model summary.
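
The reuse-or-rebuild behavior described above resembles a common caching pattern, sketched below. This is not sdv's implementation: the function learn_or_load, the JSON storage format, and the use of file existence as a stand-in for "has been read in" are all assumptions made for illustration.

```python
import json
import os
import tempfile


def learn_or_load(summary_filename, build_summary):
    """Return (summary, was_loaded).

    Reuses the summary file when it exists; otherwise calls
    build_summary() and stores the result. JSON is used here only
    for illustration.
    """
    if os.path.exists(summary_filename):
        with open(summary_filename) as handle:
            return json.load(handle), True
    summary = build_summary()
    with open(summary_filename, "w") as handle:
        json.dump(summary, handle)
    return summary, False


def build_summary():
    # Stand-in for the expensive model-fitting step.
    return {"users": {"mean_age": 31.5}}


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model_statistics.json")
    first_summary, first_loaded = learn_or_load(path, build_summary)
    second_summary, second_loaded = learn_or_load(path, build_summary)
```

The first call builds and stores the summary; the second call finds the file and loads it instead of rebuilding.
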
load_model_stats(summary_filename='model_statistics.db')

Loads a summary of an existing model and recovers the tables from it.

Parameters: summary_filename (str) – The filename containing the model summary.
Raises: IOError
preview_table_names()
synth_all(write_out=False)

Recursively synthesizes every table in the dataset that has been identified as synthesizable. Samples as many synthetic rows as the existing data contains and optionally writes the result to CSV.

Parameters: write_out (boolean) – whether or not to write synthesized data to CSV
synth_data_to_csv(table_name, out_name=None)

Writes the synthetic data of the desired table to a CSV file.

Parameters:
  • table_name (str) – name of the table of interest.
  • out_name – name of the file to write to.
synth_rows(table_name, num_rows)
synth_table(table_name, write_out=False)

Synthesizes the specified table by first recursively synthesizing any parent tables. If the table has no parents, it is synthesized from the existing data. Samples as many synthetic rows as the existing data contains and optionally writes the result to CSV.

Parameters:
  • table_name (str) – name of the table to be synthesized
  • write_out (boolean) – whether or not to write synthesized data to csv
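
The parent-first recursion described for synth_table can be sketched without any modeling code. The example below only computes the order in which tables would be visited; the parents mapping and the function name synth_order are hypothetical, and the sketch assumes the table relationships contain no cycles.

```python
def synth_order(parents, table_name, done=None):
    """Return the tables in the order they would be synthesized.

    parents maps each table to the list of its parent tables. Parents
    are visited first; tables already handled are skipped.
    """
    if done is None:
        done = []
    if table_name in done:
        return done
    for parent in parents.get(table_name, []):
        synth_order(parents, parent, done)
    done.append(table_name)
    return done


parents = {
    "users": [],
    "orders": ["users"],
    "order_items": ["orders", "users"],
}

# Requesting the leaf table pulls in its ancestors first.
order = synth_order(parents, "order_items")
```

Tracking the tables already handled keeps a shared parent such as "users" from being processed twice when several children depend on it.
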
sdv.convert_dates(df, name, format=None)
sdv.main()