table module
-
class table.NewTable(name, column_data, PK, raw, children, relations)
Bases: table.Table
-
write_data(summary_filename)
Write the summary information to the shelve file.
Parameters: summary_filename (str) – the file to write to
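The following is a minimal sketch of persisting summary information with Python's standard shelve module. The helper names (write_summary, read_summary), the per-table key layout, and the summary contents are assumptions made for illustration; the source does not document what write_data actually stores.

```python
import os
import shelve
import tempfile


def write_summary(summary_filename, table_name, summary):
    """Store one table's summary dict in a shelve file, keyed by table name."""
    # shelve.open creates the file if needed; the context manager closes it.
    with shelve.open(summary_filename) as db:
        db[table_name] = summary


def read_summary(summary_filename, table_name):
    """Load a previously stored summary back from the shelve file."""
    with shelve.open(summary_filename) as db:
        return db[table_name]


# Hypothetical summary contents, written to a temporary location.
path = os.path.join(tempfile.mkdtemp(), "summary")
write_summary(path, "orders", {"num_rows": 3, "pk": "order_id"})
restored = read_summary(path, "orders")
```

A shelve file behaves like a persistent dictionary, so a later process can reopen it and look up a summary by key.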
-
-
class table.RecoveredTable(name, column_data, PK, children, relations, summary)
Bases: table.Table
-
class table.Table(name, column_data, PK)
Bases: object
-
add_synth_data(data)
-
clear_synth_data()
Remove all saved synthetic data.
-
format_synth_data()
Format self.synth_data to mimic the actual data.
Returns: pandas.DataFrame
Raises: a custom exception if synthetic data doesn't exist
-
get_child_ct(pkey, child_col_name, child_abs, df)
-
get_child_summary(pkey, child_col_name, child_abs, df)
If pkey is a parent row for the child_abs table, find and return the distribution of its child rows.
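As an illustration of the idea, the sketch below selects the child rows whose foreign-key column equals a given parent key and summarizes them. It is not the library's implementation: the child_summary helper, the choice of count and per-column mean as the "distribution", and the sample data are all assumptions, and the child_abs argument is omitted because the source does not describe it.

```python
import pandas as pd


def child_summary(pkey, child_col_name, child_df):
    """Summarize the child rows whose foreign key equals pkey."""
    children = child_df[child_df[child_col_name] == pkey]
    if children.empty:
        # pkey has no child rows in this table.
        return None
    # Summarize only the numeric, non-key columns.
    numeric = children.drop(columns=[child_col_name]).select_dtypes("number")
    return {"count": len(children), "mean": numeric.mean().to_dict()}


# Hypothetical child table: orders that reference customers.
orders = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "amount": [10.0, 30.0, 5.0],
})
summary = child_summary(1, "customer_id", orders)
```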
-
preview_synth_data()
Preview 5 rows of the synthetic data.
Returns: pandas.DataFrame
Raises: a custom exception if synthetic data doesn't exist
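The guard-then-preview behavior described here can be sketched as follows. The class TableSketch and the exception name NoSyntheticDataError are hypothetical stand-ins; the source says only that a custom exception is raised and does not name it.

```python
import pandas as pd


class NoSyntheticDataError(Exception):
    """Hypothetical exception name; the source only says 'Custom Exception'."""


class TableSketch:
    """Stand-in for a table that may or may not hold synthetic data."""

    def __init__(self):
        self.synth_data = None

    def add_synth_data(self, data):
        self.synth_data = data

    def preview_synth_data(self):
        # Fail loudly rather than return an empty preview.
        if self.synth_data is None:
            raise NoSyntheticDataError("no synthetic data has been added")
        return self.synth_data.head(5)


table = TableSketch()
table.add_synth_data(pd.DataFrame({"id": range(8)}))
preview = table.preview_synth_data()
```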
-
sample_PK(child_name, child_col_name)
-
sample_synth_data()
Sample the synthetic data, keeping only as many rows as the original dataset had, and reuse the original dataset's primary keys.
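One way to picture this step is the sketch below, which assumes both datasets are pandas DataFrames and that the synthetic data has at least as many rows as the original. The function name sample_to_original, its parameters, and the sample data are hypothetical.

```python
import pandas as pd


def sample_to_original(synth_df, original_df, pk_col, seed=0):
    """Keep as many synthetic rows as the original has and reuse its keys."""
    n = len(original_df)
    # Draw n rows without replacement, then renumber the index from 0.
    sampled = synth_df.sample(n=n, random_state=seed).reset_index(drop=True)
    # Overwrite the synthetic keys with the original primary keys, in order.
    sampled[pk_col] = original_df[pk_col].to_numpy()
    return sampled


original = pd.DataFrame({"id": [101, 102, 103], "x": [1.0, 2.0, 3.0]})
synth = pd.DataFrame({"id": range(6), "x": [0.5, 1.5, 2.5, 3.5, 4.5, 5.5]})
sampled = sample_to_original(synth, original, "id")
```

Reusing the original keys keeps references from other tables valid after the sampled data replaces the original rows.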
-
set_parents(parents, mappings)
-
summarize_rows(rows, ret_argslist=True)
Find the relevant summary fields for a given subset of data (rows).
-
synth_children()
Synthesize child rows for each row in this parent table, i.e. synthesize the complete child table for this one.
If synthetic data for the parent exists, generate children of the synthetic data.
-
synth_row(**kwargs)
Synthesize a row based on some observed values.
If synthetic data for the parent exists, generate the row based on that synthetic data.
-
synth_rows(num_rows=1)
Synthesize num_rows rows, using the synthetic data if it exists.
Parameters: num_rows (int) – number of rows to synthesize
-
to_csv(pathname)
Write self.synth_data to the specified pathname.
Parameters: pathname (str) – where to write the CSV file
-