gpytorchwrapper.src.data.data_splitter

Functions

`calculate_kfold_results`(kfold_data)
`exit_program`()
`input_output_split`(data, data_conf)	Split the data into input and output.
`k_fold_split`(x, y, training_conf, ...[, ...])	Split the data using k-fold cross-validation.
`split_data`(x, y, data_conf, transform_conf, ...)	Split the data into training and test sets.
`stratified_shuffle_split`(x, y[, n_bins, ...])	Split the data into training and test sets using stratified shuffle split.
`write_kfold_results`(kfold_results, out_dir)	Write out the average kfold results to a text file.

gpytorchwrapper.src.data.data_splitter.calculate_kfold_results(kfold_data)

gpytorchwrapper.src.data.data_splitter.exit_program()

gpytorchwrapper.src.data.data_splitter.input_output_split(data: DataFrame, data_conf: DataConf) → tuple[DataFrame, DataFrame]

Split the data into input (x) and output (y) DataFrames.

Parameters:

data (pd.DataFrame) – The data to be split
data_conf (DataConf) – Dataclass containing the data specifications

Returns:

x (pd.DataFrame) – The input data
y (pd.DataFrame) – The output data
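A minimal sketch of the input/output split, assuming the output columns are identified by name. `DataConfSketch`, its `output_columns` field, and `input_output_split_sketch` are hypothetical stand-ins; the real `DataConf` fields are not documented here.

```python
from dataclasses import dataclass, field

import pandas as pd


@dataclass
class DataConfSketch:
    """Hypothetical stand-in for DataConf: only lists the output column names."""
    output_columns: list[str] = field(default_factory=list)


def input_output_split_sketch(
    data: pd.DataFrame, data_conf: DataConfSketch
) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Split columns into inputs (x) and outputs (y), both kept as DataFrames."""
    # Selecting with a list keeps y a DataFrame even with a single output column
    y = data[data_conf.output_columns]
    # Every remaining column is treated as an input
    x = data.drop(columns=data_conf.output_columns)
    return x, y


data = pd.DataFrame({"r1": [1.0, 1.1], "r2": [2.0, 2.1], "energy": [-0.5, -0.4]})
x, y = input_output_split_sketch(data, DataConfSketch(output_columns=["energy"]))
print(list(x.columns), list(y.columns))  # ['r1', 'r2'] ['energy']
```

Returning `y` as a DataFrame rather than a Series matches the documented return type and keeps multi-output data in the same shape as single-output data.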

gpytorchwrapper.src.data.data_splitter.k_fold_split(x: DataFrame, y: DataFrame, training_conf: TrainingConf, transform_conf: TransformConf, directory: Path, split_size: float = 0.2) → None

Split the data using k-fold cross-validation.

Parameters:

x (pd.DataFrame) – The input data
y (pd.DataFrame) – The output data
training_conf (TrainingConf) – Dataclass containing the training specifications
transform_conf (TransformConf) – Dataclass containing the transformer specifications
directory (pathlib.Path) – The output directory
split_size (float) – The size of the test set, as a fraction of the data

Return type:

None
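The fold construction behind k-fold cross-validation can be sketched with NumPy: shuffle the row positions once, cut them into k nearly equal folds, and let each fold serve once as the test set. `k_fold_indices`, `n_splits`, and `seed` are hypothetical; how `k_fold_split` derives the number of folds (and whether `split_size` plays a role) is not documented here, and model training is only indicated by a comment.

```python
import numpy as np
import pandas as pd


def k_fold_indices(n_samples: int, n_splits: int = 5, seed: int = 0):
    """Yield (train_positions, test_positions) for shuffled k-fold CV."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_samples)
    # array_split tolerates n_samples that are not divisible by n_splits
    folds = np.array_split(shuffled, n_splits)
    for i, test_pos in enumerate(folds):
        # All other folds together form the training set for this round
        train_pos = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train_pos, test_pos


x = pd.DataFrame({"r": np.linspace(0.5, 3.0, 10)})
y = pd.DataFrame({"energy": np.cos(x["r"])})

for fold, (train_pos, test_pos) in enumerate(k_fold_indices(len(x), n_splits=5)):
    train_x, test_x = x.iloc[train_pos], x.iloc[test_pos]
    train_y, test_y = y.iloc[train_pos], y.iloc[test_pos]
    # A model would be trained on (train_x, train_y) and scored on (test_x, test_y) here
    print(fold, len(train_x), len(test_x))  # each fold: 8 training rows, 2 test rows
```

Every row lands in exactly one test fold, so averaging the per-fold test errors uses each sample for evaluation once.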

gpytorchwrapper.src.data.data_splitter.split_data(x: DataFrame, y: DataFrame, data_conf: DataConf, transform_conf: TransformConf, training_conf: TrainingConf, testing_conf: TestingConf, directory: Path) → None | tuple[DataFrame, DataFrame, DataFrame, DataFrame] | tuple[DataFrame, None, DataFrame, None]

Split the data into training and test sets. If neither k-fold cross-validation nor stratified shuffle split is selected, a random split is performed.

Parameters:

x (pd.DataFrame) – The input data
y (pd.DataFrame) – The output data
data_conf (DataConf) – Dataclass containing the data specifications
transform_conf (TransformConf) – Dataclass containing the transformer specifications
training_conf (TrainingConf) – Dataclass containing the training specifications
testing_conf (TestingConf) – Dataclass containing the testing specifications
directory (pathlib.Path) – The output directory

Returns:

None if k-fold cross-validation is performed; otherwise a tuple of four entries holding the training and test inputs and outputs, in which the two test entries may be None

Return type:

None or tuple of optional DataFrames
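A sketch of the branching described for `split_data`, assuming a plain boolean flag selects k-fold mode and that a `test_size` of 0 means no test set is created. `split_data_sketch`, `use_kfold`, `test_size`, and `seed` are hypothetical, the stratified branch is omitted, and the (train_x, test_x, train_y, test_y) ordering is inferred from the `None` positions in the return annotation rather than confirmed by the source.

```python
import numpy as np
import pandas as pd


def split_data_sketch(
    x: pd.DataFrame,
    y: pd.DataFrame,
    use_kfold: bool = False,
    test_size: float = 0.2,
    seed: int = 0,
):
    """Hypothetical stand-in for the k-fold and random-split branches.

    Returns None in k-fold mode, otherwise (train_x, test_x, train_y, test_y),
    with the test entries set to None when test_size is 0.
    """
    if use_kfold:
        # In k-fold mode results are written to disk, so nothing is returned
        return None
    if test_size == 0:
        # No test set requested: all rows are used for training
        return x, None, y, None
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(x))
    n_test = int(round(test_size * len(x)))
    test_pos, train_pos = perm[:n_test], perm[n_test:]
    # The same positions index x and y, so inputs and outputs stay aligned
    return x.iloc[train_pos], x.iloc[test_pos], y.iloc[train_pos], y.iloc[test_pos]


x = pd.DataFrame({"r": np.arange(10, dtype=float)})
y = pd.DataFrame({"energy": np.arange(10, dtype=float) ** 2})
train_x, test_x, train_y, test_y = split_data_sketch(x, y, test_size=0.2)
print(len(train_x), len(test_x))  # 8 2
```

Callers of a function with this kind of union return type need to check for `None` (and for `None` test entries) before unpacking.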

gpytorchwrapper.src.data.data_splitter.stratified_shuffle_split(x: DataFrame, y: DataFrame, n_bins: int | None = 5, test_size: float = 0.2) → tuple[DataFrame, DataFrame, DataFrame, DataFrame]

Split the data into training and test sets using stratified shuffle split.

Parameters:

x (pd.DataFrame) – The input data
y (pd.DataFrame) – The output data
n_bins (int or None) – The number of bins into which the data are grouped for stratification
test_size (float) – The size of the test set, as a fraction of the data

Returns:

train_x (pd.DataFrame) – The input training set
train_y (pd.DataFrame) – The output training set
test_x (pd.DataFrame) – The input test set
test_y (pd.DataFrame) – The output test set
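One common way to stratify a regression dataset is to bin the continuous target and draw the test set from each bin in proportion to its size. The sketch below assumes quantile bins on the first output column (via `pandas.qcut`) and treats `n_bins=None` as no stratification; both choices, and the name `stratified_shuffle_split_sketch`, are assumptions rather than the module's documented behavior. It returns the four frames in the documented order.

```python
import numpy as np
import pandas as pd


def stratified_shuffle_split_sketch(x, y, n_bins=5, test_size=0.2, seed=0):
    """Return (train_x, train_y, test_x, test_y), stratified on binned y."""
    rng = np.random.default_rng(seed)
    target = y.iloc[:, 0]
    if n_bins is None:
        # Assumption: no binning means a single stratum, i.e. a plain shuffle split
        labels = np.zeros(len(y), dtype=int)
    else:
        # Quantile bins give roughly equally populated strata
        labels = pd.qcut(target, q=n_bins, labels=False, duplicates="drop").to_numpy()
    test_pos = []
    for label in np.unique(labels):
        members = np.flatnonzero(labels == label)
        # Draw the same fraction of rows from every stratum
        n_test = int(round(test_size * len(members)))
        test_pos.extend(rng.choice(members, size=n_test, replace=False))
    test_pos = np.sort(np.array(test_pos, dtype=int))
    train_pos = np.setdiff1d(np.arange(len(y)), test_pos)
    return x.iloc[train_pos], y.iloc[train_pos], x.iloc[test_pos], y.iloc[test_pos]


x = pd.DataFrame({"r": np.arange(20, dtype=float)})
y = pd.DataFrame({"energy": np.arange(20, dtype=float)})
train_x, train_y, test_x, test_y = stratified_shuffle_split_sketch(x, y, n_bins=5)
print(len(train_x), len(test_x))  # 15 5
```

With five bins of four rows each and `test_size=0.2`, one row is drawn from every bin, so the test set spans the full range of output values instead of clustering by chance.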

gpytorchwrapper.src.data.data_splitter.write_kfold_results(kfold_results: dict, out_dir: Path) → None

Write out the average kfold results to a text file.

Parameters:

kfold_results (dict) – Dictionary containing the average k-fold training and test errors and the R² value for the test set
out_dir (Path) – Path to the directory where the results are saved

Return type:

None
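A minimal sketch of writing averaged results as plain text, one `name: value` line per metric. The function name `write_kfold_results_sketch`, the file name `kfold_results.txt`, and the metric keys in the example are hypothetical; the module's actual file name and layout are not documented here.

```python
import tempfile
from pathlib import Path


def write_kfold_results_sketch(kfold_results: dict, out_dir: Path) -> None:
    """Write one 'name: value' line per averaged k-fold metric."""
    # Create the output directory if it does not exist yet
    out_dir.mkdir(parents=True, exist_ok=True)
    lines = [f"{name}: {value:.6f}" for name, value in kfold_results.items()]
    (out_dir / "kfold_results.txt").write_text("\n".join(lines) + "\n")


results = {"train_rmse": 0.0123, "test_rmse": 0.0456, "test_r2": 0.998}
with tempfile.TemporaryDirectory() as tmp:
    write_kfold_results_sketch(results, Path(tmp))
    print((Path(tmp) / "kfold_results.txt").read_text())
```

A fixed number of decimals keeps results from different runs easy to compare line by line.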