gpytorchwrapper.src.data.data_splitter

Functions

calculate_kfold_results(kfold_data)

exit_program()

input_output_split(data, data_conf)

Split the data into input and output

k_fold_split(x, y, training_conf, ...[, ...])

Split the data using k-fold cross-validation

split_data(x, y, data_conf, transform_conf, ...)

Split the data into training and test sets.

stratified_shuffle_split(x, y[, n_bins, ...])

Split the data into training and test sets using stratified shuffle split

write_kfold_results(kfold_results, out_dir)

Write out the average kfold results to a text file.

gpytorchwrapper.src.data.data_splitter.calculate_kfold_results(kfold_data)

gpytorchwrapper.src.data.data_splitter.exit_program()

gpytorchwrapper.src.data.data_splitter.input_output_split(data: DataFrame, data_conf: DataConf) → tuple[DataFrame, DataFrame]

Split the data into input and output

Parameters:
  • data (pd.DataFrame) – The data to be split

  • data_conf (DataConf) – dataclass containing the data specifications

Returns:

  • x (pd.DataFrame) – The input data

  • y (pd.DataFrame) – The output data
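
As a rough illustration of what an input/output split looks like with pandas, the sketch below separates a DataFrame by column name. The helper `split_columns` and its `output_columns` argument are hypothetical; the fields that `DataConf` actually uses to select columns are not documented here, so this is not the library's implementation.

```python
import pandas as pd


def split_columns(
    data: pd.DataFrame, output_columns: list[str]
) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Hypothetical helper: return (x, y) given the names of the output columns."""
    y = data[output_columns]
    x = data.drop(columns=output_columns)
    return x, y


# Made-up example data: two input columns and one output column.
data = pd.DataFrame(
    {"r": [1.0, 1.5, 2.0], "theta": [0.0, 0.5, 1.0], "energy": [-1.2, -0.8, -0.3]}
)
x, y = split_columns(data, ["energy"])
```

Both results stay DataFrames, which matches the documented return type even when there is a single output column.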

gpytorchwrapper.src.data.data_splitter.k_fold_split(x: DataFrame, y: DataFrame, training_conf: TrainingConf, transform_conf: TransformConf, directory: Path, split_size: float = 0.2) → None

Split the data using k-fold cross-validation

Parameters:
  • x (pd.DataFrame) – The input data

  • y (pd.DataFrame) – The output data

  • training_conf (TrainingConf) – Configuration object containing the training specifications

  • transform_conf (TransformConf) – Configuration object containing the transformer specifications

  • directory (pathlib.Path) – The output directory

  • split_size (float) – The size of the test set (default: 0.2)

Return type:

None
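
The general k-fold pattern can be sketched with scikit-learn's `KFold`. This is an assumption about the technique only: the page does not state which splitter `k_fold_split` uses internally, and the data, number of folds, and random seed below are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

# Made-up data: 10 samples, two inputs, one output.
x = pd.DataFrame({"x1": np.arange(10.0), "x2": np.arange(10.0) ** 2})
y = pd.DataFrame({"y": np.arange(10.0) * 2.0})

kfold = KFold(n_splits=5, shuffle=True, random_state=0)

fold_sizes = []
seen_test_rows = []
for train_idx, test_idx in kfold.split(x):
    # Select the rows of each fold by position.
    train_x, test_x = x.iloc[train_idx], x.iloc[test_idx]
    train_y, test_y = y.iloc[train_idx], y.iloc[test_idx]
    # A model would be trained on (train_x, train_y) and scored on (test_x, test_y) here.
    fold_sizes.append((len(train_x), len(test_x)))
    seen_test_rows.extend(test_idx.tolist())
```

Every sample appears in exactly one test fold, so the per-fold scores can be averaged into a single estimate.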

gpytorchwrapper.src.data.data_splitter.split_data(x: DataFrame, y: DataFrame, data_conf: DataConf, transform_conf: TransformConf, training_conf: TrainingConf, testing_conf: TestingConf, directory: Path) → None | tuple[DataFrame, DataFrame, DataFrame, DataFrame] | tuple[DataFrame, None, DataFrame, None]

Split the data into training and test sets. If neither kFold nor stratified shuffle split is selected, perform a random split.

Parameters:
  • x (pd.DataFrame) – The input data

  • y (pd.DataFrame) – The output data

  • data_conf (DataConf) – Dataclass containing the data specifications

  • transform_conf (TransformConf) – Configuration object containing the transformer specifications

  • training_conf (TrainingConf) – Configuration object containing the training specifications

  • testing_conf (TestingConf) – Configuration object containing the testing specifications

  • directory (pathlib.Path) – The output directory

Returns:

None if k-fold cross-validation is performed; otherwise a tuple of training and test DataFrames, in which the test entries are None when no test set is produced

Return type:

None or tuple of optional DataFrames
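
For the fallback case, a plain random split can be sketched with scikit-learn's `train_test_split`. Whether `split_data` calls this function is an assumption; the data, test fraction, and seed are illustrative only.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up data: 10 samples, one input, one output.
x = pd.DataFrame({"x1": np.arange(10.0)})
y = pd.DataFrame({"y": np.arange(10.0) ** 2})

# train_test_split returns the pieces in the order x-train, x-test, y-train, y-test.
train_x, test_x, train_y, test_y = train_test_split(
    x, y, test_size=0.2, random_state=0
)
```

The rows of `x` and `y` are shuffled together, so each input row keeps its matching output row.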

gpytorchwrapper.src.data.data_splitter.stratified_shuffle_split(x: DataFrame, y: DataFrame, n_bins: int | None = 5, test_size: float = 0.2) → tuple[DataFrame, DataFrame, DataFrame, DataFrame]

Split the data into training and test sets using stratified shuffle split

Parameters:
  • x (pd.DataFrame) – The input data

  • y (pd.DataFrame) – The output data

  • n_bins (int | None) – The number of bins to split the data into (default: 5)

  • test_size (float) – The size of the test set (default: 0.2)

Returns:

  • train_x (pd.DataFrame) – The input training set

  • train_y (pd.DataFrame) – The output training set

  • test_x (pd.DataFrame) – The input test set

  • test_y (pd.DataFrame) – The output test set
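
Stratifying on a continuous output usually means binning it first and then stratifying on the bin labels. The sketch below does this with `pd.cut` and scikit-learn's `StratifiedShuffleSplit`. It illustrates the technique under those assumptions and is not taken from the library's source; the data and seed are made up.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedShuffleSplit

# Made-up data: 50 evenly spaced samples.
x = pd.DataFrame({"x1": np.linspace(0.0, 1.0, 50)})
y = pd.DataFrame({"y": np.linspace(-1.0, 1.0, 50)})

# Bin the continuous output into 5 equal-width bins; labels=False gives integer bin ids.
bins = pd.cut(y["y"], bins=5, labels=False)

# Draw one split whose test set preserves the bin proportions.
splitter = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(x, bins))

train_x, test_x = x.iloc[train_idx], x.iloc[test_idx]
train_y, test_y = y.iloc[train_idx], y.iloc[test_idx]

# Number of test samples that fall in each bin.
test_bin_counts = bins.iloc[test_idx].value_counts().sort_index().tolist()
```

With 10 samples per bin and a 20% test fraction, each bin contributes the same number of test samples, so the test set covers the full output range.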

gpytorchwrapper.src.data.data_splitter.write_kfold_results(kfold_results: dict, out_dir: Path) → None

Write out the average kfold results to a text file.

Parameters:
  • kfold_results (dict) – Dictionary containing the average k-fold training and test errors and the r2 correlation for the test set

  • out_dir (Path) – Path to the directory where the results are saved

Return type:

None
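
A minimal sketch of averaging per-fold metrics and writing them to a text file follows. The metric names, the file name `kfold_results.txt`, the output format, and the helper `write_results` are all hypothetical; the actual keys and layout used by `calculate_kfold_results` and `write_kfold_results` are not documented on this page. The numbers are invented for illustration.

```python
import tempfile
from pathlib import Path
from statistics import mean

# Invented per-fold metrics; the key names are hypothetical.
fold_metrics = [
    {"train_rmse": 0.10, "test_rmse": 0.20, "test_r2": 0.90},
    {"train_rmse": 0.20, "test_rmse": 0.30, "test_r2": 0.80},
]

# Average each metric over the folds.
averaged = {key: mean(fold[key] for fold in fold_metrics) for key in fold_metrics[0]}


def write_results(results: dict, out_dir: Path) -> Path:
    """Hypothetical writer: one 'name: value' line per metric."""
    out_dir.mkdir(parents=True, exist_ok=True)
    out_file = out_dir / "kfold_results.txt"
    lines = [f"{name}: {value:.4f}" for name, value in results.items()]
    out_file.write_text("\n".join(lines) + "\n")
    return out_file


# Write into a temporary directory and read the file back.
with tempfile.TemporaryDirectory() as tmp:
    written = write_results(averaged, Path(tmp) / "results")
    contents = written.read_text()
```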