gpytorchwrapper.src.data.data_splitter
Functions
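input_output_split(data, data_conf) | Split the data into input and output.
k_fold_split(x, y, training_conf, transform_conf, directory[, split_size]) | Split the data using k-fold cross-validation.
split_data(x, y, data_conf, transform_conf, training_conf, testing_conf, directory) | Split the data into training and test sets.
stratified_shuffle_split(x, y[, n_bins, test_size]) | Split the data into training and test sets using stratified shuffle split.
write_kfold_results(kfold_results, out_dir) | Write the average k-fold results to a text file.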
- gpytorchwrapper.src.data.data_splitter.input_output_split(data: DataFrame, data_conf: DataConf) → tuple[DataFrame, DataFrame]
Split the data into input and output
- Parameters:
data (pd.DataFrame) – The data to be split
data_conf (DataConf) – Dataclass containing the data specifications
- Returns:
x (pd.DataFrame) – The input data
y (pd.DataFrame) – The output data
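input_output_split reads the column specification from a DataConf whose fields are not documented on this page. The sketch below illustrates the general idea with a hypothetical ToyDataConf holding lists of input and output column names; the class, its fields, the helper name, and the column names are all assumptions, not the library's API.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class ToyDataConf:
    # Hypothetical stand-in for DataConf; the real field names may differ.
    input_columns: list[str]
    output_columns: list[str]


def toy_input_output_split(data: pd.DataFrame, conf: ToyDataConf) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Select the input (x) and output (y) columns from one DataFrame."""
    x = data[conf.input_columns]
    y = data[conf.output_columns]
    return x, y


df = pd.DataFrame({"r1": [1.0, 1.1, 1.2], "r2": [2.0, 2.1, 2.2], "energy": [-0.5, -0.4, -0.3]})
x, y = toy_input_output_split(df, ToyDataConf(input_columns=["r1", "r2"], output_columns=["energy"]))
print(x.shape, y.shape)  # (3, 2) (3, 1)
```

Selecting with a list of labels always returns a DataFrame, even for a single output column, which matches the tuple[DataFrame, DataFrame] return type documented above.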
- gpytorchwrapper.src.data.data_splitter.k_fold_split(x: DataFrame, y: DataFrame, training_conf: TrainingConf, transform_conf: TransformConf, directory: Path, split_size: float = 0.2) → None
Split the data using k-fold cross-validation
- Parameters:
x (pd.DataFrame) – The input data
y (pd.DataFrame) – The output data
training_conf (TrainingConf) – Configuration object containing the training specifications
transform_conf (TransformConf) – Configuration object containing the transformer specifications
directory (pathlib.Path) – The output directory
split_size (float) – The size of the test set as a fraction of the data (default 0.2)
- Return type:
None
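The fold construction behind k-fold cross-validation can be sketched with NumPy alone. This is a generic shuffled k-fold, not the library's implementation: the helper toy_k_fold_indices and its n_splits and seed parameters are hypothetical, and how k_fold_split derives the number of folds from split_size (and what it writes to directory) is not documented here.

```python
import numpy as np
import pandas as pd


def toy_k_fold_indices(n_samples: int, n_splits: int = 5, seed: int = 0):
    """Yield (train_idx, test_idx) positional index arrays for shuffled k-fold CV."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)
    # array_split tolerates n_samples that are not divisible by n_splits.
    folds = np.array_split(indices, n_splits)
    for i, test_idx in enumerate(folds):
        # Every other fold forms the training set for this round.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train_idx, test_idx


x = pd.DataFrame({"r": np.linspace(0.5, 3.0, 10)})
y = pd.DataFrame({"energy": np.cos(x["r"])})

for fold, (train_idx, test_idx) in enumerate(toy_k_fold_indices(len(x), n_splits=5)):
    train_x, test_x = x.iloc[train_idx], x.iloc[test_idx]
    train_y, test_y = y.iloc[train_idx], y.iloc[test_idx]
    # A model would be fitted on (train_x, train_y) and scored on (test_x, test_y) here.
    print(fold, len(train_x), len(test_x))
```

Each row lands in exactly one test fold, so averaging the per-fold errors uses every sample for evaluation once.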
- gpytorchwrapper.src.data.data_splitter.split_data(x: DataFrame, y: DataFrame, data_conf: DataConf, transform_conf: TransformConf, training_conf: TrainingConf, testing_conf: TestingConf, directory: Path) → None | tuple[DataFrame, DataFrame, DataFrame, DataFrame] | tuple[DataFrame, None, DataFrame, None]
Split the data into training and test sets. If neither k-fold cross-validation nor stratified shuffle split is selected, perform a random split.
- Parameters:
x (pd.DataFrame) – The input data
y (pd.DataFrame) – The output data
data_conf (DataConf) – Dataclass containing the data specifications
transform_conf (TransformConf) – Configuration object containing the transformer specifications
training_conf (TrainingConf) – Configuration object containing the training specifications
testing_conf (TestingConf) – Configuration object containing the testing specifications
directory (pathlib.Path) – The output directory
- Returns:
None if k-fold cross-validation is performed; otherwise a four-element tuple of the training DataFrames and the test DataFrames, where the test entries may be None
- Return type:
None or tuple of DataFrames (test entries may be None)
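When neither k-fold nor stratified splitting is selected, split_data falls back to a random split. A minimal pandas sketch of such a split follows; toy_random_split and its seed parameter are hypothetical, the (train_x, train_y, test_x, test_y) return order is an assumption rather than the documented order, and the code assumes x and y share a unique row index.

```python
import numpy as np
import pandas as pd


def toy_random_split(x: pd.DataFrame, y: pd.DataFrame, test_size: float = 0.2, seed: int = 0):
    """Randomly hold out a fraction of the rows as a test set.

    Returns (train_x, train_y, test_x, test_y); this ordering is an assumption
    of the sketch, not necessarily the order used by split_data.
    """
    test_x = x.sample(frac=test_size, random_state=seed)
    train_x = x.drop(index=test_x.index)
    # Reuse the sampled row labels for y so inputs and outputs stay aligned.
    test_y = y.loc[test_x.index]
    train_y = y.drop(index=test_x.index)
    return train_x, train_y, test_x, test_y


x = pd.DataFrame({"r": np.arange(10.0)})
y = pd.DataFrame({"energy": np.arange(10.0) ** 2})
train_x, train_y, test_x, test_y = toy_random_split(x, y, test_size=0.2)
print(len(train_x), len(test_x))  # 8 2
```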
- gpytorchwrapper.src.data.data_splitter.stratified_shuffle_split(x: DataFrame, y: DataFrame, n_bins: int | None = 5, test_size: float = 0.2) → tuple[DataFrame, DataFrame, DataFrame, DataFrame]
Split the data into training and test sets using stratified shuffle split
- Parameters:
x (pd.DataFrame) – The input data
y (pd.DataFrame) – The output data
n_bins (int or None) – The number of bins into which the data are divided for stratification
test_size (float) – The size of the test set as a fraction of the data (default 0.2)
- Returns:
train_x (pd.DataFrame) – The input training set
train_y (pd.DataFrame) – The output training set
test_x (pd.DataFrame) – The input test set
test_y (pd.DataFrame) – The output test set
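Stratifying on a continuous regression target first requires discretizing it. The sketch below shows one common approach, assuming stratification on quantile bins of the first output column; it is not necessarily how stratified_shuffle_split bins the data, the helper name, the seed parameter, and the use of pd.qcut are assumptions, and the n_bins=None case is not covered.

```python
import numpy as np
import pandas as pd


def toy_stratified_shuffle_split(x, y, n_bins=5, test_size=0.2, seed=0):
    """Stratified hold-out split for a continuous target (hypothetical helper).

    The first output column is cut into n_bins quantile bins, and test_size of
    each bin is sampled into the test set, so the test set spans the whole
    range of y instead of only its most populated region.
    """
    target = y.iloc[:, 0]
    # Quantile bins give roughly equally populated strata; duplicates="drop"
    # merges bins whose edges coincide (e.g. for heavily repeated values).
    strata = pd.qcut(target, q=n_bins, labels=False, duplicates="drop")
    # Sample the same fraction from every stratum.
    test_index = target.groupby(strata).sample(frac=test_size, random_state=seed).index
    train_index = x.index.difference(test_index)
    return x.loc[train_index], y.loc[train_index], x.loc[test_index], y.loc[test_index]


x = pd.DataFrame({"r": np.linspace(0.5, 3.0, 50)})
y = pd.DataFrame({"energy": np.exp(-x["r"])})
train_x, train_y, test_x, test_y = toy_stratified_shuffle_split(x, y, n_bins=5, test_size=0.2)
print(len(train_x), len(test_x))  # 40 10
```

With 50 distinct target values and 5 quantile bins, each bin holds 10 rows and contributes 2 to the test set. If scikit-learn is available, its StratifiedShuffleSplit can take the bin labels as the class labels in place of the groupby sampling.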
- gpytorchwrapper.src.data.data_splitter.write_kfold_results(kfold_results: dict, out_dir: Path) → None
Write the average k-fold results to a text file.
- Parameters:
kfold_results (dict) – Dictionary containing the average k-fold training and test errors and the R² score for the test set
out_dir (Path) – Path to the directory where the results are saved
- Return type:
None
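A minimal sketch of writing averaged k-fold metrics to a plain-text file, one "key: value" line per entry. The helper name, the default file name kfold_results.txt, and the metric keys in the usage example are hypothetical; the actual file name and layout used by write_kfold_results are not documented here.

```python
import tempfile
from pathlib import Path


def toy_write_kfold_results(kfold_results: dict, out_dir: Path, filename: str = "kfold_results.txt") -> Path:
    """Write one 'key: value' line per averaged metric (hypothetical file name and layout)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    out_file = out_dir / filename
    with out_file.open("w") as fh:
        for key, value in kfold_results.items():
            fh.write(f"{key}: {value}\n")
    return out_file


# Placeholder values for illustration only, not real results.
results = {"train_rmse": 0.1, "test_rmse": 0.2, "test_r2": 0.9}
with tempfile.TemporaryDirectory() as tmp:
    path = toy_write_kfold_results(results, Path(tmp))
    text = path.read_text()
print(text)
```

Creating out_dir with parents=True and exist_ok=True lets the function be called repeatedly on the same output directory without failing.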