table module

class table.NewTable(name, column_data, PK, raw, children, relations)

write_data(summary_filename)

Write the summary information to the shelve file.

Parameters:	summary_filename (str) – the file to write to
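The shelve file is presumably one created with Python's standard `shelve` module, which persists a dict-like mapping from string keys to picklable values. A minimal sketch, assuming the summary is a plain dictionary stored under the table's name; `write_summary`, `read_summary` and that key layout are hypothetical and not necessarily the format `write_data` actually uses:

```python
import os
import shelve
import tempfile


def write_summary(summary_filename, table_name, summary):
    """Hypothetical helper: store one table's summary in a shelve file."""
    # shelve.open creates the file if needed; keys must be strings.
    with shelve.open(summary_filename) as db:
        db[table_name] = summary


def read_summary(summary_filename, table_name):
    """Hypothetical helper: read a table's summary back from the shelve file."""
    with shelve.open(summary_filename) as db:
        return db[table_name]


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "summary")
    write_summary(path, "users", {"n_rows": 3, "means": {"age": 41.0}})
    print(read_summary(path, "users"))
```

Keying by table name lets several tables share one summary file, which is the main reason a shelve is handier here than a single pickle.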

class table.RecoveredTable(name, column_data, PK, children, relations, summary)

Bases: table.Table

class table.Table(name, column_data, PK)

Bases: object

format_synth_data()

Take self.synth_data and format it to mimic the original data.

Returns:	pandas.DataFrame
Raises:	Custom Exception if synthetic data doesn’t exist
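What "format it to mimic the actual data" involves is not spelled out. A minimal sketch, assuming it means restoring the original column order and dtypes (with integer columns rounded back from continuous samples); `NoSynthDataError` is a hypothetical stand-in for the module's custom exception, and the original data is passed in explicitly rather than read from the table object:

```python
import pandas as pd


class NoSynthDataError(Exception):
    """Hypothetical stand-in for the module's custom exception."""


def format_synth_data(synth_data, original):
    """Reorder columns and cast dtypes so synth_data looks like original."""
    if synth_data is None:
        raise NoSynthDataError("synthesize data before formatting it")
    # Keep only the original columns, in the original order.
    formatted = synth_data[list(original.columns)].copy()
    for col, dtype in original.dtypes.items():
        if pd.api.types.is_integer_dtype(dtype):
            # Continuous samples are rounded before casting back to int.
            formatted[col] = formatted[col].round().astype(dtype)
        else:
            formatted[col] = formatted[col].astype(dtype)
    return formatted
```

For example, synthetic ages of 31.6 and 38.2 in a float column come back as the integers 32 and 38 when the original `age` column is an integer column.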

get_child_summary(pkey, child_col_name, child_abs, df)

If pkey identifies a parent row of the child_abs table, find and return the distribution of that row's child rows.
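A minimal sketch of the idea, assuming `df` is the child table, `child_col_name` is its foreign-key column, and "distribution" means the count plus per-column mean and standard deviation of the numeric child columns. The `child_abs` argument is left out because the child table is passed directly; the summary fields are illustrative, not the ones the module computes:

```python
import pandas as pd


def get_child_summary(pkey, child_col_name, df):
    """Summarize the child rows in df whose foreign key equals pkey.

    Returns None when pkey has no children in df.
    """
    children = df[df[child_col_name] == pkey]
    if children.empty:
        return None
    # Summarize only numeric value columns, not the foreign key itself.
    numeric = children.drop(columns=[child_col_name]).select_dtypes("number")
    return {
        "count": len(children),
        "mean": numeric.mean().to_dict(),
        "std": numeric.std(ddof=0).to_dict(),
    }
```

Returning `None` for a key with no children lets a caller distinguish "not a parent of this table" from "a parent whose children all have the same values".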

preview_synth_data()

Preview 5 rows of the synthetic data.

Returns:	pandas.DataFrame
Raises:	Custom Exception if synthetic data doesn’t exist
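The preview amounts to a guarded call to `DataFrame.head`. A minimal sketch, assuming the synthetic data is `None` until it has been generated; `NoSynthDataError` is a hypothetical name for the module's custom exception:

```python
import pandas as pd


class NoSynthDataError(Exception):
    """Hypothetical name for the module's custom exception."""


def preview_synth_data(synth_data, n=5):
    """Return the first n rows of synth_data, or raise if none exists."""
    if synth_data is None:
        raise NoSynthDataError("no synthetic data; call synth_rows() first")
    return synth_data.head(n)
```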

sample_synth_data()

Sample the synthetic data so that it keeps only as many rows as the original dataset, and reuse the original dataset's primary keys.
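A minimal sketch of both steps, assuming the synthetic data has at least as many rows as the original and that the primary key is a single column named by `pk`; the explicit arguments and the `seed` parameter are illustrative additions, since the method itself takes none:

```python
import pandas as pd


def sample_synth_data(synth_data, original, pk, seed=None):
    """Downsample synth_data to len(original) rows and reuse original keys."""
    n = len(original)
    if len(synth_data) < n:
        raise ValueError("not enough synthetic rows to sample from")
    # Sample without replacement, then renumber the index 0..n-1.
    sampled = synth_data.sample(n=n, random_state=seed).reset_index(drop=True)
    # Overwrite the synthetic keys with the original primary keys.
    sampled[pk] = original[pk].to_numpy()
    return sampled
```

Reusing the original keys keeps any foreign keys in child tables pointing at valid parent rows.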

summarize_rows(rows, ret_argslist=True)

Find the relevant summary fields for a given subset of the data (rows).

synth_children()

Synthesize child rows for each row in this parent table, i.e. synthesize the complete child table for this one.

If synthetic data for the parent exists, generate the children of the synthetic data.
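The module's actual child model is not described here. As a stand-in, a minimal sketch using plain empirical resampling: for each parent row, draw a child count from the observed "children per parent" counts, then draw that many child rows from the original child table and attach the parent's key. Passing the synthetic parent table as `parent` covers the "children of the synthetic data" case. All names and parameters are hypothetical:

```python
import numpy as np
import pandas as pd


def synth_children(parent, parent_pk, child, fk, seed=None):
    """Build a synthetic child table with rows for every row of parent.

    Child counts and child values are resampled from the original child table.
    """
    rng = np.random.default_rng(seed)
    # Observed number of children for each parent that has any children.
    counts = child.groupby(fk).size().to_numpy()
    value_cols = [c for c in child.columns if c != fk]
    pieces = []
    for key in parent[parent_pk]:
        k = int(rng.choice(counts))
        # Resample k child rows with replacement, then attach the parent key.
        idx = rng.integers(0, len(child), size=k)
        rows = child[value_cols].iloc[idx].reset_index(drop=True)
        rows.insert(0, fk, key)
        pieces.append(rows)
    return pd.concat(pieces, ignore_index=True)
```

Parents that had no children in the original data are not represented in `counts`, so every synthetic parent gets at least one child; a fuller model would also account for childless parents.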

synth_row(**kwargs)

Synthesize a row based on some observed values.

If synthetic data for the parent exists, generate the row based on its synthetic data.
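One simple way to synthesize a row conditioned on observed values is empirical conditional resampling: keep only the rows that match every observed value, then draw each remaining column from those rows. This is a swapped-in technique for illustration, not necessarily what `synth_row` does; the `data` and `seed` parameters are hypothetical:

```python
import numpy as np
import pandas as pd


def synth_row(data, seed=None, **observed):
    """Sample one row that agrees with the observed column values.

    Rows of data matching every observed value are the candidates; each
    unobserved column is then drawn independently from those candidates.
    """
    mask = pd.Series(True, index=data.index)
    for col, value in observed.items():
        mask &= data[col] == value
    candidates = data[mask]
    if candidates.empty:
        raise ValueError(f"no rows match {observed}")
    rng = np.random.default_rng(seed)
    row = {}
    for col in data.columns:
        if col in observed:
            row[col] = observed[col]
        else:
            # Independent draw per column, so the combination may be new.
            row[col] = candidates[col].iloc[rng.integers(len(candidates))]
    return pd.Series(row)
```

For example, `synth_row(df, city="A")` returns a row whose `city` is `"A"` and whose other columns come from rows where `city == "A"`.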

synth_rows(num_rows=1)

Synthesize num_rows rows, using the parent's synthetic data if it exists.

Parameters:	num_rows (int) – number of rows to synthesize
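The model behind `synth_rows` is not documented in this section. As an illustrative stand-in only, a minimal sketch that fits a multivariate normal distribution to the numeric columns and draws `num_rows` samples from it; the `data` and `seed` parameters are hypothetical:

```python
import numpy as np
import pandas as pd


def synth_rows(data, num_rows=1, seed=None):
    """Draw num_rows synthetic rows from a multivariate normal fit to data.

    Only numeric columns are modeled; this is a stand-in technique.
    """
    numeric = data.select_dtypes("number")
    mean = numeric.mean().to_numpy()
    cov = numeric.cov().to_numpy()
    rng = np.random.default_rng(seed)
    samples = rng.multivariate_normal(mean, cov, size=num_rows)
    return pd.DataFrame(samples, columns=numeric.columns)
```

Fitting the full covariance matrix, rather than sampling each column independently, preserves linear correlations between columns such as `x` and `y` below.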

to_csv(pathname)

Write self.synth_data to the specified pathname.

Parameters:	pathname (str) – where to write the csv to
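Writing the data is most likely a thin wrapper around pandas' `DataFrame.to_csv`. A minimal sketch, assuming the synthetic data is `None` until generated and that the row index should not be written; the standalone `to_csv` function is hypothetical:

```python
import os
import tempfile

import pandas as pd


def to_csv(synth_data, pathname):
    """Write synth_data to pathname as CSV, without the pandas index."""
    if synth_data is None:
        raise ValueError("no synthetic data to write")
    synth_data.to_csv(pathname, index=False)


if __name__ == "__main__":
    df = pd.DataFrame({"id": [1, 2], "age": [31, 38]})
    path = os.path.join(tempfile.mkdtemp(), "synth.csv")
    to_csv(df, path)
    print(pd.read_csv(path))
```

Passing `index=False` keeps the CSV's columns identical to the table's own columns, so it round-trips cleanly through `pd.read_csv`.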