atm.core module
Core ATM module.
This module contains the ATM class, which is the one responsible for executing and orchestrating the main ATM functionalities.
Classes
class atm.core.ATM(dialect='sqlite', database='atm.db', username=None, password=None, host=None, port=None, query=None, access_key=None, secret_key=None, s3_bucket=None, s3_folder=None, models_dir='models', metrics_dir='metrics', verbose_metrics=False)
- Bases: object
- Methods
- add_datarun(dataset_id[, budget, …]) – Register one or more Dataruns to the Database.
- add_dataset(train_path[, test_path, name, …]) – Add a new dataset to the Database.
- load_model(classifier_id) – Load a Model from the Database.
- run(train_path[, test_path, name, …]) – Create a Dataset and a Datarun and then work on it.
- work([datarun_ids, save_files, …]) – Get unfinished Dataruns from the database and work on them.
add_datarun(dataset_id, budget=100, budget_type='classifier', gridding=0, k_window=3, metric='f1', methods=['logreg', 'dt', 'knn'], r_minimum=2, run_per_partition=False, score_target='cv', priority=1, selector='uniform', tuner='uniform', deadline=None)
- Register one or more Dataruns to the Database.
- The hyperparameters of the given methods will be analyzed and Hyperpartitions generated from them. If run_per_partition is True, one Datarun will be created for each Hyperpartition. Otherwise, a single Datarun will be created for all of them.
- Parameters
- dataset_id (int) – Id of the Dataset which this Datarun will belong to. 
- budget (int) – Budget amount. Optional. Defaults to 100.
- budget_type (str) – Budget type. Can be 'classifier' or 'walltime'. Optional. Defaults to 'classifier'.
- gridding (int) – gridding setting for the Tuner. Optional. Defaults to 0.
- k_window (int) – k setting for the Selector. Optional. Defaults to 3.
- metric (str) – Metric to use for the tuning and selection. Optional. Defaults to 'f1'.
- methods (list) – List of methods to try. Optional. Defaults to ['logreg', 'dt', 'knn'].
- r_minimum (int) – r_minimum setting for the Tuner. Optional. Defaults to 2.
- run_per_partition (bool) – Whether to create a separate Datarun for each Hyperpartition. Optional. Defaults to False.
- score_target (str) – Which score to use for the tuning and selection process. It can be 'cv' or 'test'. Optional. Defaults to 'cv'.
- priority (int) – Priority of this Datarun. Higher values mean higher priority. Optional. Defaults to 1.
- selector (str) – Type of selector to use. Optional. Defaults to 'uniform'.
- tuner (str) – Type of tuner to use. Optional. Defaults to 'uniform'.
- deadline (str) – Time deadline. It must be a string representing a datetime in the format '%Y-%m-%d %H:%M'. If given, budget_type will be set to 'walltime'. Optional. Defaults to None.
 
- Returns
- The created Datarun or list of Dataruns. 
- Return type
- Datarun 
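The deadline parameter expects a string in the '%Y-%m-%d %H:%M' format. A minimal sketch of how such a string could be produced with the standard library follows. The helper names format_deadline and deadline_in are hypothetical and are not part of ATM.

```python
from datetime import datetime, timedelta

# Format documented for the `deadline` parameter of add_datarun.
DEADLINE_FORMAT = '%Y-%m-%d %H:%M'


def format_deadline(moment):
    """Render a datetime as a string in the documented deadline format."""
    return moment.strftime(DEADLINE_FORMAT)


def deadline_in(hours, now=None):
    """Hypothetical helper: deadline string `hours` after `now`."""
    now = now or datetime.now()
    return format_deadline(now + timedelta(hours=hours))


# A fixed datetime keeps the example deterministic.
print(format_deadline(datetime(2030, 1, 15, 9, 30)))  # 2030-01-15 09:30
```

The resulting string could then be passed as deadline=... to add_datarun, which, as documented above, sets budget_type to 'walltime'.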
 
 - 
add_dataset(train_path, test_path=None, name=None, description=None, class_column=None)
- Add a new dataset to the Database.
- Parameters
- train_path (str) – Path to the training CSV file. It can be a local filesystem path, absolute or relative, an HTTP or HTTPS URL, or an S3 path in the format s3://{bucket_name}/{key}. Required.
- test_path (str) – Path to the testing CSV file. It can be a local filesystem path, absolute or relative, an HTTP or HTTPS URL, or an S3 path in the format s3://{bucket_name}/{key}. Optional. If not given, the training CSV will be split in two parts, train and test.
- name (str) – Name given to this dataset. Optional. If not given, a hash will be generated from train_path and used as the Dataset name.
- description (str) – Human-friendly description of the Dataset. Optional.
- class_column (str) – Name of the column that will be used as the target variable. Optional. Defaults to 'class'.
 
- Returns
- The created dataset. 
- Return type
- Dataset 
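The three accepted forms of train_path and test_path (local path, HTTP or HTTPS URL, S3 path) can be told apart by their URL scheme. The sketch below illustrates that distinction with urllib.parse. The function classify_data_path is hypothetical, and this is not necessarily how ATM itself resolves paths.

```python
from urllib.parse import urlparse


def classify_data_path(path):
    """Classify a dataset path as 's3', 'http' or 'local'.

    Returns ('s3', bucket, key), ('http', url) or ('local', path).
    """
    parsed = urlparse(path)
    if parsed.scheme == 's3':
        # s3://{bucket_name}/{key}: the bucket is the network location,
        # the key is the remaining path without its leading slash.
        return ('s3', parsed.netloc, parsed.path.lstrip('/'))
    if parsed.scheme in ('http', 'https'):
        return ('http', path)
    # Anything else is treated as a local filesystem path.
    return ('local', path)


print(classify_data_path('s3://my-bucket/data/train.csv'))
print(classify_data_path('https://example.com/train.csv'))
print(classify_data_path('data/train.csv'))
```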
 
 - 
load_model(classifier_id)
- Load a Model from the Database.
- Parameters
- classifier_id (int) – Id of the Model to load. 
- Returns
- The loaded model instance. 
- Return type
 
 - 
run(train_path, test_path=None, name=None, description=None, class_column='class', budget=100, budget_type='classifier', gridding=0, k_window=3, metric='f1', methods=['logreg', 'dt', 'knn'], r_minimum=2, run_per_partition=False, score_target='cv', selector='uniform', tuner='uniform', deadline=None, priority=1, save_files=True, choose_randomly=True, cloud_mode=False, total_time=None, verbose=True)
- Create a Dataset and a Datarun and then work on it.
- Parameters
- train_path (str) – Path to the training CSV file. It can be a local filesystem path, absolute or relative, an HTTP or HTTPS URL, or an S3 path in the format s3://{bucket_name}/{key}. Required.
- test_path (str) – Path to the testing CSV file. It can be a local filesystem path, absolute or relative, an HTTP or HTTPS URL, or an S3 path in the format s3://{bucket_name}/{key}. Optional. If not given, the training CSV will be split in two parts, train and test.
- name (str) – Name given to this dataset. Optional. If not given, a hash will be generated from train_path and used as the Dataset name.
- description (str) – Human-friendly description of the Dataset. Optional.
- class_column (str) – Name of the column that will be used as the target variable. Optional. Defaults to 'class'.
- budget (int) – Budget amount. Optional. Defaults to 100.
- budget_type (str) – Budget type. Can be 'classifier' or 'walltime'. Optional. Defaults to 'classifier'.
- gridding (int) – gridding setting for the Tuner. Optional. Defaults to 0.
- k_window (int) – k setting for the Selector. Optional. Defaults to 3.
- metric (str) – Metric to use for the tuning and selection. Optional. Defaults to 'f1'.
- methods (list) – List of methods to try. Optional. Defaults to ['logreg', 'dt', 'knn'].
- r_minimum (int) – r_minimum setting for the Tuner. Optional. Defaults to 2.
- run_per_partition (bool) – Whether to create a separate Datarun for each Hyperpartition. Optional. Defaults to False.
- score_target (str) – Which score to use for the tuning and selection process. It can be 'cv' or 'test'. Optional. Defaults to 'cv'.
- priority (int) – Priority of this Datarun. Higher values mean higher priority. Optional. Defaults to 1.
- selector (str) – Type of selector to use. Optional. Defaults to 'uniform'.
- tuner (str) – Type of tuner to use. Optional. Defaults to 'uniform'.
- deadline (str) – Time deadline. It must be a string representing a datetime in the format '%Y-%m-%d %H:%M'. If given, budget_type will be set to 'walltime'. Optional. Defaults to None.
- verbose (bool) – Whether to be verbose about the process. Optional. Defaults to True.
 
- Returns
- The created Datarun or list of Dataruns. 
- Return type
- Datarun 
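Since run is described as creating a Dataset and a Datarun and then working on it, its effect can be approximated by chaining the other three methods. The sketch below is only an illustration under stated assumptions: atm is expected to be an atm.core.ATM instance (or any object with the same methods), the returned Dataset and Datarun objects are assumed to expose their database id as an id attribute, and run_steps is a hypothetical name.

```python
def run_steps(atm, train_path, run_per_partition=False, **datarun_options):
    """Sketch of the three calls that `run` is described as combining."""
    dataset = atm.add_dataset(train_path)

    # Assumption: the returned Dataset exposes its database id as `.id`.
    dataruns = atm.add_datarun(
        dataset.id,
        run_per_partition=run_per_partition,
        **datarun_options
    )

    # add_datarun is documented to return a Datarun or a list of Dataruns.
    if not isinstance(dataruns, list):
        dataruns = [dataruns]

    # Assumption: each Datarun also exposes its database id as `.id`.
    datarun_ids = [datarun.id for datarun in dataruns]

    # wait=False makes the worker exit once these Dataruns are processed.
    atm.work(datarun_ids=datarun_ids, wait=False)
    return dataruns
```

Calling the three methods separately like this can be useful when Dataruns are registered on one machine and worked on from another that shares the same database.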
 
 - 
work(datarun_ids=None, save_files=True, choose_randomly=True, cloud_mode=False, total_time=None, wait=True, verbose=False)
- Get unfinished Dataruns from the database and work on them.
- Check the ModelHub Database for unfinished Dataruns, and work on them as they are added. This process will continue to run until it exceeds total_time, there are no more Dataruns to process, or it is killed.
- Parameters
- datarun_ids (list) – List of IDs of Dataruns to work on. If None, this will work on any unfinished Dataruns found in the database. Optional. Defaults to None.
- save_files (bool) – Whether to save the fitted classifiers and their metrics or not. Optional. Defaults to True.
- choose_randomly (bool) – If True, work on all the highest-priority Dataruns in random order. Otherwise, work on them in sequential order (by ID). Optional. Defaults to True.
- cloud_mode (bool) – Save the models and metrics in AWS S3 instead of locally. This option works only if S3 configuration has been provided on initialization. Optional. Defaults to False.
- total_time (int) – Total time to run the work process, in seconds. If None, continue to run until interrupted or there are no more Dataruns to process. Optional. Defaults to None.
- wait (bool) – If True, wait for more Dataruns to be inserted into the Database once all have been processed. Otherwise, exit the worker loop when they run out. Optional. Defaults to True.
- verbose (bool) – Whether to be verbose about the process. Optional. Defaults to False.
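The effect of choose_randomly can be pictured with a small selection routine. This is a simplified sketch of the behaviour described above, not ATM's actual implementation. FakeDatarun and pick_next are hypothetical names, and only the id and priority fields are modelled.

```python
import random
from collections import namedtuple

# Stand-in for a Datarun record; only the fields this sketch needs.
FakeDatarun = namedtuple('FakeDatarun', ['id', 'priority'])


def pick_next(dataruns, choose_randomly=True, rng=random):
    """Pick the next Datarun among those with the highest priority."""
    if not dataruns:
        return None

    top = max(run.priority for run in dataruns)
    candidates = [run for run in dataruns if run.priority == top]

    if choose_randomly:
        # Random order among the highest-priority Dataruns.
        return rng.choice(candidates)

    # Sequential order: lowest ID first.
    return min(candidates, key=lambda run: run.id)


pending = [FakeDatarun(3, 1), FakeDatarun(2, 5), FakeDatarun(1, 5)]
print(pick_next(pending, choose_randomly=False))
```

With choose_randomly=False the sketch returns the Datarun with ID 1, because IDs 1 and 2 share the highest priority and the lower ID comes first.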
 
 
 