EL

class almos.el.el(**kwargs)

Class containing all the functions of the exploratory learning (EL) module of ALMOS.

exploratory_learning_process()

Main function for the exploratory learning process, including:

- Reading the predictions and concatenating them with the raw data.
- Splitting the data into experimental and prediction sets.
- Calculating quartiles and assigning points for exploration and exploitation.
- Updating the dataset and saving the results into organized batch folders.

This process manages both exploration and exploitation of data for an exploratory learning cycle.
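The steps above can be illustrated with a minimal, stdlib-only sketch. It is not the ALMOS implementation. The function names `split_rows` and `select_batch`, the row format (dicts in which unmeasured points have an empty `y` value) and the selection rules are all assumptions. In particular, exploitation here takes the best predicted values, and exploration takes the most uncertain remaining points. The quartile-based point assignment that ALMOS performs is not reproduced.

```python
from typing import Dict, List, Tuple


def split_rows(rows: List[dict], y: str) -> Tuple[List[dict], List[dict]]:
    """Separate measured rows (experimental set) from unmeasured rows (prediction set).

    Assumption: a row with no value in the y column has not been measured yet.
    """
    experimental = [r for r in rows if r.get(y) not in (None, "")]
    to_predict = [r for r in rows if r.get(y) in (None, "")]
    return experimental, to_predict


def select_batch(
    predictions: Dict[str, Tuple[float, float]],
    n_exps: int,
    explore_rt: float,
    reverse: bool = False,
) -> Tuple[List[str], List[str]]:
    """Hypothetical batch selection returning (exploration names, exploitation names).

    predictions maps a molecule name to (predicted y, prediction uncertainty).
    """
    # Share of the batch reserved for exploration (assumed rounding rule).
    n_explore = min(n_exps, round(n_exps * explore_rt))
    n_exploit = n_exps - n_explore

    # Exploitation: highest predicted y first, or lowest first when reverse=True.
    ranked = sorted(predictions, key=lambda k: predictions[k][0], reverse=not reverse)
    exploit = ranked[:n_exploit]

    # Exploration: the most uncertain points among those not already picked.
    picked = set(exploit)
    remaining = [k for k in predictions if k not in picked]
    explore = sorted(remaining, key=lambda k: predictions[k][1], reverse=True)[:n_explore]
    return explore, exploit
```

For example, with `n_exps=4` and `explore_rt=0.5`, two points are taken for exploitation and two for exploration. Setting `reverse=True` flips only the exploitation ranking.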

finalize_process(start_time_overall)

Stops the timer, calculates the total elapsed time and moves the .dat file to the corresponding batch folder.
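A minimal sketch of these three steps follows, assuming `start_time_overall` was recorded with `time.time()`. The extra `dat_file` and `batch_folder` arguments, the `EL_data.dat` name used below and the "Total time" log line are hypothetical illustrations, not part of the documented signature.

```python
import shutil
import time
from pathlib import Path


def finalize_process(start_time_overall: float, dat_file: Path, batch_folder: Path) -> float:
    """Hypothetical sketch: log the total run time and file the .dat log in its batch folder.

    dat_file and batch_folder are illustrative extra arguments, not part of ALMOS.
    """
    # Total wall-clock time since start_time_overall (assumed to come from time.time()).
    elapsed = time.time() - start_time_overall
    with open(dat_file, "a") as log:
        log.write(f"\nTotal time: {elapsed:.2f} seconds\n")  # hypothetical log line
    # Create the batch folder if needed, then move the log file into it.
    batch_folder.mkdir(parents=True, exist_ok=True)
    shutil.move(str(dat_file), str(batch_folder / dat_file.name))
    return elapsed
```

Usage, with hypothetical names: `finalize_process(start, Path("EL_data.dat"), Path("batch_1"))`.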

generate_plots(results_plot_no_pfi_df, results_plot_pfi_df)

Generates and saves subplots for each model type (no_PFI and PFI) and logs a confirmation message upon successful completion.

Parameters

results_plot_no_pfi_df : pd.DataFrame

DataFrame containing the results for the 'no_PFI' model.

results_plot_pfi_df : pd.DataFrame

DataFrame containing the results for the 'PFI' model.

run_robert_process()

Executes the full ROBERT model update and prediction process.

This method performs the following steps:

- Initializes a logger to record process details and parameters.
- Filters the input data to create a CSV file for updating the ROBERT model.
- Creates the necessary directories and moves files as required.
- Runs the ROBERT model update command, logging all output and errors.
- Checks that the model report was generated successfully.
- Runs the prediction command to generate new predictions with the updated model.
- Verifies that the predictions were created successfully and logs the result.

Raises:

SystemExit: Exits the program if any step fails or if required files are not found.
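The sketch below illustrates the command-building, logging and file-check steps. It assembles a ROBERT command line, runs a command while appending its output to a log, and exits when an expected output file is missing. It assumes ROBERT is invoked as `python -m robert` with `--csv_name`, `--y`, `--names` and `--ignore` flags. Those flags, the helper names and any file names are assumptions to check against the ROBERT documentation, and the actual ALMOS method may build the command differently. Extra keywords are split with `shlex.split`, so a string such as `"--model RF --train [70] --seed [0]"` becomes separate arguments.

```python
import shlex
import subprocess
import sys
from pathlib import Path
from typing import List


def build_robert_command(
    csv_name: str, y: str, name: str, ignore: List[str], robert_keywords: str = ""
) -> List[str]:
    """Assemble a ROBERT command line (flag names are assumptions)."""
    cmd = [sys.executable, "-m", "robert", "--csv_name", csv_name, "--y", y, "--names", name]
    if ignore:
        # Same list syntax as the documented --ignore "[name,SMILES]" example.
        cmd += ["--ignore", "[" + ",".join(ignore) + "]"]
    # Forward any extra user keywords, e.g. "--model RF --train [70] --seed [0]".
    cmd += shlex.split(robert_keywords)
    return cmd


def run_and_log(cmd: List[str], log_path: Path) -> int:
    """Run a command, append its stdout and stderr to a log file, return the exit code."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    with open(log_path, "a") as log:
        log.write(result.stdout)
        log.write(result.stderr)
    return result.returncode


def require_file(path: Path, step: str) -> None:
    """Stop the program with SystemExit if an expected output file is missing."""
    if not path.exists():
        sys.exit(f"{step} failed: {path} was not found.")
```

A hypothetical update step would chain these: `run_and_log(build_robert_command(...), log_path)` followed by `require_file(report_path, "ROBERT update")`, where `log_path` and `report_path` are placeholders.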

Parameters

el : bool

Indicates whether the exploratory learning process is enabled and should be performed. Defaults to False. This parameter is activated from the command line (e.g. --el).

csv_name : str

Name of the CSV file containing the database (e.g. 'FILE.csv').

y : str

Name of the column containing the response variable in the input CSV file (e.g. 'solubility').

name : str

Name of the column containing the molecule names in the input CSV file (e.g. 'names').

ignore : list, default=[]

List of columns in the input CSV file to be ignored during the ROBERT process (e.g. --ignore "[name,SMILES]"). The ignored descriptors will still be included in the final CSV file. The y column, the name column and the batch column are ignored automatically by ROBERT.

explore_rt : float, default=1

Specifies the exploration ratio for the exploratory learning process, which determines how many points are explored relative to the total number of experiments (e.g. '--explore_rt 0.5'). If not provided or invalid, the program will prompt for a value in the proper format.

n_exps : int

Number of experiments to select for the new batch in the exploratory learning process (e.g. '--n_exps 10'). If not provided or invalid, the program will prompt for a value in the proper format.

tolerance : str, default='medium'

Indicates the tolerance level for the convergence process, i.e. the percentage-change threshold required for convergence (e.g. '--tolerance tight'). Options:

1. 'tight': strictest level; convergence occurs if the metric improves by ≤1% (threshold = 0.01).
2. 'medium': balanced level; convergence occurs if the metric improves by ≤5% (threshold = 0.05).
3. 'wide': least strict level; convergence occurs if the metric improves by ≤10% (threshold = 0.10).

robert_keywords : str, default=""

Additional keywords passed to ROBERT during model generation (e.g. --robert_keywords "--model RF --train [70] --seed [0]").

reverse : bool, default=False

If set to True, the order of the points in the new batch is reversed so that exploitation prioritizes lower values (e.g. --reverse).

intelex : bool, default=False

If set to True, the program will not require the scikit-learn-intelex module to speed up the model update process.
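The tolerance thresholds above can be read as a relative-improvement test between consecutive batches. The sketch below is one possible reading, not the ALMOS code. The function name, the choice of metric and the `higher_is_better` switch are assumptions. The relative change is computed as (current - previous) / |previous|, and the documented thresholds (0.01, 0.05, 0.10) are used unchanged.

```python
TOLERANCE_THRESHOLDS = {"tight": 0.01, "medium": 0.05, "wide": 0.10}


def has_converged(
    previous: float, current: float, tolerance: str = "medium", higher_is_better: bool = True
) -> bool:
    """Hypothetical convergence check between the metrics of two consecutive batches."""
    if tolerance not in TOLERANCE_THRESHOLDS:
        raise ValueError(f"tolerance must be one of {sorted(TOLERANCE_THRESHOLDS)}")
    if previous == 0:
        raise ValueError("relative change is undefined when the previous metric is 0")
    change = (current - previous) / abs(previous)
    # For error metrics such as RMSE, a decrease counts as an improvement.
    improvement = change if higher_is_better else -change
    return improvement <= TOLERANCE_THRESHOLDS[tolerance]
```

For instance, a metric moving from 0.80 to 0.82 is a 2.5% improvement. Under this reading it has converged with 'medium' but not with 'tight'.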