EL
- class almos.el.el(**kwargs)
Class containing all the functions from the active almos module
- exploratory_learning_process()
Main function for the exploratory learning process, including:
  - Reading and concatenating predictions with the raw data.
  - Splitting data into experimental and prediction sets.
  - Calculating quartiles and assigning points for exploration and exploitation.
  - Updating the dataset and saving results into organized batch folders.
This process manages both exploration and exploitation of data for an exploratory learning cycle.
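The quartile and point-assignment step can be illustrated with a small sketch. It is not the ALMOS implementation: the helper names `split_batch` and `assign_quartiles` are hypothetical, and it assumes that the exploration ratio is a fraction between 0 and 1 of the batch size and that quartiles are computed over predicted values. How ALMOS actually uses the quartiles to pick points is not described here.

```python
import statistics


def split_batch(n_exps, explore_rt):
    """Split a batch of n_exps points into (n_explore, n_exploit).

    Assumption: explore_rt is the fraction of the batch used for exploration.
    """
    if not 0 <= explore_rt <= 1:
        raise ValueError("explore_rt must be between 0 and 1")
    n_explore = round(n_exps * explore_rt)
    return n_explore, n_exps - n_explore


def assign_quartiles(values):
    """Label each value with its quartile (1 = lowest, 4 = highest)."""
    # statistics.quantiles returns the three cut points for n=4
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return [1 + (v > q1) + (v > q2) + (v > q3) for v in values]
```

For example, `split_batch(10, 0.5)` gives five exploration points and five exploitation points under these assumptions.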
- finalize_process(start_time_overall)
Stop the timer, calculate the total time taken and move the .dat file to the proper batch folder.
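A minimal sketch of this kind of finalization step, using only the standard library, is shown below. The function name `finalize` and its arguments `dat_file` and `batch_dir` are hypothetical; the real method takes only `start_time_overall` and resolves the paths internally.

```python
import shutil
import time
from pathlib import Path


def finalize(start_time, dat_file, batch_dir):
    """Return the elapsed seconds and move dat_file into batch_dir."""
    # Elapsed wall-clock time since the process started
    elapsed = round(time.time() - start_time, 2)

    # Create the batch folder if needed, then move the log file into it
    batch_dir = Path(batch_dir)
    batch_dir.mkdir(parents=True, exist_ok=True)
    destination = batch_dir / Path(dat_file).name
    shutil.move(str(dat_file), str(destination))
    return elapsed, destination
```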
- generate_plots(results_plot_no_pfi_df, results_plot_pfi_df)
Generates and saves subplots for each model type (no_PFI and PFI) and logs a confirmation message upon successful completion.
Parameters
- results_plot_no_pfi_df : pd.DataFrame
DataFrame containing the results for the 'no_PFI' model.
- results_plot_pfi_df : pd.DataFrame
DataFrame containing the results for the 'PFI' model.
- run_robert_process()
Executes the full ROBERT model update and prediction process.
This method performs the following steps:
  - Initializes a logger to record process details and parameters.
  - Filters the input data to create a CSV file for updating the ROBERT model.
  - Creates necessary directories and moves files as required.
  - Runs the ROBERT model update command, logging all output and errors.
  - Checks for successful generation of the model report.
  - Runs the prediction command to generate new predictions with the updated model.
  - Verifies that predictions were successfully created and logs the result.
- Raises:
SystemExit: Exits the program if any step fails or if required files are not found.
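The run, log, and verify pattern described for this method can be sketched generically as follows. This is an illustration only: `run_step` and the logger name are hypothetical, and the actual ROBERT commands and output paths are not reproduced here.

```python
import logging
import subprocess
from pathlib import Path

logger = logging.getLogger("el_sketch")


def run_step(command, expected_file):
    """Run a command, log its output, and exit if it fails or produces no file."""
    result = subprocess.run(command, capture_output=True, text=True)

    # Record everything the external command printed
    if result.stdout:
        logger.info(result.stdout)
    if result.stderr:
        logger.error(result.stderr)

    # Stop the program if the command failed or its output file is missing
    if result.returncode != 0:
        raise SystemExit(f"Command failed with exit code {result.returncode}")
    if not Path(expected_file).exists():
        raise SystemExit(f"Expected file not found: {expected_file}")
```

In this sketch, a caller would invoke `run_step` once for the model update (checking for the report) and once for the prediction step (checking for the predictions file).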
Parameters
- el : bool
Indicates whether the exploratory learning process is enabled and should be performed. Defaults to False. This parameter is activated from the command line (e.g. --el).
- csv_name : str
Name of the CSV file containing the database (e.g. 'FILE.csv').
- y : str
Name of the column containing the response variable in the input CSV file (e.g. 'solubility').
- name : str
Name of the column containing the molecule names in the input CSV file (e.g. 'names').
- ignore : list, default=[]
List of the columns of the input CSV file that will be ignored during the ROBERT process (e.g. --ignore "[name,SMILES]"). The descriptors will be included in the final CSV file. The y value, name column and batch column are automatically ignored by ROBERT.
- explore_rt : float, default=1
Specifies the exploration ratio for the exploratory learning process, determining how many points to explore in relation to the total number of experiments (e.g. '--explore_rt 0.5'). If not provided or invalid, the program will request the value in the proper format.
- n_exps : int
Number of experiments to be selected in the exploratory learning process for the new batch (e.g. '--n_exps 10'). If not provided or invalid, the program will request the value in the proper format.
- tolerance : str, default='medium'
Indicates the tolerance level for the convergence process, defining the percentage change threshold required for convergence (e.g. '--tolerance tight'). Options:
  1. 'tight': Strictest level, convergence occurs if the metric improves by ≤1% (threshold = 0.01).
  2. 'medium': Balanced level, convergence occurs if the metric improves by ≤5% (threshold = 0.05).
  3. 'wide': Least strict level, convergence occurs if the metric improves by ≤10% (threshold = 0.10).
- robert_keywords : str, default=""
Additional keywords to be passed to the ROBERT model generation (e.g. --robert_keywords "--model RF --train [70] --seed [0]").
- reverse : bool, default=False
If set to True, the order of the points in the new batch is reversed, so that exploitation prioritizes lower values (e.g. --reverse).
- intelex : bool, default=False
If set to True, the program will not need the scikit-learn-intelex module to speed up the model update process.
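To make the tolerance levels concrete, the sketch below maps each level to its threshold and checks convergence between two consecutive batches. It assumes that "improves by ≤X%" means the relative change of the metric with respect to the previous batch; the helper `has_converged` and the constant `TOLERANCE_THRESHOLDS` are hypothetical names, not part of the ALMOS API.

```python
# Thresholds taken from the tolerance options listed above
TOLERANCE_THRESHOLDS = {"tight": 0.01, "medium": 0.05, "wide": 0.10}


def has_converged(previous, current, tolerance="medium"):
    """Return True when the relative change is at or below the threshold.

    Assumption: the change is measured relative to the previous batch's metric.
    """
    if tolerance not in TOLERANCE_THRESHOLDS:
        raise ValueError(f"Unknown tolerance level: {tolerance}")
    if previous == 0:
        raise ValueError("previous metric must be non-zero")
    change = abs(current - previous) / abs(previous)
    return change <= TOLERANCE_THRESHOLDS[tolerance]
```

With a metric going from 0.80 to 0.82 (a 2.5% change), this check reports convergence for 'medium' and 'wide' but not for 'tight'.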