table module

class table.NewTable(name, column_data, PK, raw, children, relations)

Bases: table.Table

write_data(summary_filename)

Write the summary information to the shelve file

Parameters: summary_filename (str) – the file to write to
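
As a rough illustration of writing summary information to a shelve file, the sketch below uses Python's standard `shelve` module. The helper names `write_summary` and `read_summary`, and the choice to key the shelf by table name, are assumptions made for this example; the layout that `write_data` actually uses is not documented here.

```python
import os
import shelve
import tempfile


def write_summary(summary_filename, table_name, summary):
    """Store one table's summary under its name in a shelve file.

    Hypothetical helper: the key layout (table name -> summary dict)
    is an assumption, not the module's documented format.
    """
    with shelve.open(summary_filename) as db:
        db[table_name] = summary


def read_summary(summary_filename, table_name):
    """Load a previously stored summary back from the shelve file."""
    with shelve.open(summary_filename) as db:
        return db[table_name]


# Usage: round-trip a small summary through a temporary shelve file.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "summary")
write_summary(path, "orders", {"row_count": 3, "columns": ["id", "total"]})
restored = read_summary(path, "orders")
```

Any picklable object can be stored this way, so a summary can presumably be read back later without access to the original data.
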
class table.RecoveredTable(name, column_data, PK, children, relations, summary)

Bases: table.Table

class table.Table(name, column_data, PK)

Bases: object

add_synth_data(data)
clear_synth_data()

Remove all the synthetic data we have saved

format_synth_data()

Take self.synth_data and format it to mimic the actual data

Returns: pandas.DataFrame
Raises: Custom Exception if synthetic data doesn’t exist
get_child_ct(pkey, child_col_name, child_abs, df)
get_child_summary(pkey, child_col_name, child_abs, df)

If pkey identifies a parent row for the child_abs table, find and return the distribution of its child rows.
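
The idea of counting the child rows that belong to one parent key can be sketched with pandas as follows. This is only an illustration: `child_count` and `child_counts` are hypothetical helpers, the `child_abs` argument is not modeled, and the real methods may compute a richer distribution than a plain count.

```python
import pandas as pd


def child_count(pkey, child_col_name, child_df):
    """Count rows of child_df whose foreign-key column equals pkey."""
    return int((child_df[child_col_name] == pkey).sum())


def child_counts(parent_keys, child_col_name, child_df):
    """Map every parent key to its number of child rows (0 if none)."""
    counts = child_df[child_col_name].value_counts()
    return {key: int(counts.get(key, 0)) for key in parent_keys}


# Hypothetical child table: each order references a customer.
orders = pd.DataFrame({"order_id": [10, 11, 12], "customer_id": [1, 1, 3]})
one = child_count(1, "customer_id", orders)
all_counts = child_counts([1, 2, 3], "customer_id", orders)
```

Collecting such counts over all parent rows gives an empirical distribution of children per parent, which is one plausible input for deciding how many child rows to synthesize.
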

preview_synth_data()

Preview 5 rows of the synthetic data

Returns: pandas.DataFrame
Raises: Custom Exception if synthetic data doesn’t exist
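
The documented behavior (return a 5-row preview, or raise when no synthetic data exists) might look roughly like the sketch below. `TableSketch` and `NoSynthDataError` are hypothetical stand-ins; the module's actual exception class is not named in this reference, and it is assumed here that `add_synth_data` simply stores the frame it receives.

```python
import pandas as pd


class NoSynthDataError(Exception):
    """Hypothetical stand-in for the module's custom exception."""


class TableSketch:
    """Minimal stand-in for table.Table; only synth_data is modeled."""

    def __init__(self):
        self.synth_data = None

    def add_synth_data(self, data):
        # Assumption: new data replaces whatever was stored before.
        self.synth_data = data

    def clear_synth_data(self):
        self.synth_data = None

    def preview_synth_data(self):
        if self.synth_data is None:
            raise NoSynthDataError("no synthetic data has been added")
        return self.synth_data.head(5)


sketch = TableSketch()
try:
    sketch.preview_synth_data()
    raised = False
except NoSynthDataError:
    raised = True

sketch.add_synth_data(pd.DataFrame({"id": range(8), "value": range(8)}))
preview = sketch.preview_synth_data()
```

Callers that cannot guarantee synthetic data exists would wrap the preview call in a `try`/`except` for the module's exception, as the first call above does.
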
sample_PK(child_name, child_col_name)
sample_synth_data()

Sample the synthetic data to keep only as many rows as the original dataset had, and reuse the original dataset's primary keys.
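
One way to picture this step is the pandas sketch below: draw as many synthetic rows as the original table has, then overwrite the key column with the original keys. The function `sample_to_original_size`, its parameters, and the fixed seed are assumptions for illustration, not the method's actual implementation.

```python
import pandas as pd


def sample_to_original_size(synth_df, original_df, pk_col, seed=0):
    """Keep len(original_df) synthetic rows and reuse the original keys."""
    n = len(original_df)
    # Sample without replacement; requires at least n synthetic rows.
    sampled = synth_df.sample(n=n, random_state=seed).reset_index(drop=True)
    # Overwrite the key column with the original primary keys, in order.
    sampled[pk_col] = original_df[pk_col].to_numpy()
    return sampled


# Hypothetical data: 3 original rows, 6 synthetic rows.
original = pd.DataFrame({"id": [101, 102, 103], "age": [30, 41, 25]})
synthetic = pd.DataFrame({"id": range(6), "age": [29, 33, 40, 27, 38, 31]})
result = sample_to_original_size(synthetic, original, "id")
```

Reusing the original keys keeps references from child tables valid, since their foreign keys still point at keys that exist in the sampled parent table.
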

set_parents(parents, mappings)
summarize_rows(rows, ret_argslist=True)

Find the relevant summary fields for a given subset of data (rows)

synth_children()

Synthesize child rows for each row in this parent table, i.e. synthesize the complete child table for this one.

If synthetic data for the parent exists, then generate children of the synthetic data.

synth_row(**kwargs)

Synthesize a row based on some observed values

If synthetic data for the parent exists, then generate the row based on its synthetic data.

synth_rows(num_rows=1)

Synthesize num_rows rows, using synthetic data if it exists

Parameters: num_rows (int) – number of rows to synthesize
to_csv(pathname)

Write self.synth_data to the specified pathname

Parameters: pathname (str) – where to write the csv to