Thank you for your time. I need an answer to this for my thesis. I am using the PaySim dataset from Kaggle (https://www.kaggle.com/datasets/ealaxi/paysim1).

First of all, I split the data into training, validation, and test sets.

- Is there an established rule for the split ratio? And is it acceptable to check the model performance under each split ratio, for example 70/15/15 and 80/10/10?

- After fitting the model on the training set, I get the default (fitted) model. Which set, training or validation, should I use to examine the model's performance?

My reason for having three sets (training, validation, and test) is to use the validation set to evaluate hyperparameter tuning.
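
To make the setup concrete, here is a minimal sketch of a stratified 70/15/15 split in plain Python. It is only an illustration under assumptions: the labels are synthetic stand-ins for a binary fraud flag, not actual PaySim data, and the helper name `three_way_split`, the seed, and the choice to stratify by class are hypothetical choices for this example.

```python
import random
from collections import defaultdict


def three_way_split(labels, ratios=(0.70, 0.15, 0.15), seed=42):
    """Return (train_idx, val_idx, test_idx), stratified by label.

    Hypothetical helper for illustration only.
    """
    assert abs(sum(ratios) - 1.0) < 1e-9, "ratios must sum to 1"
    rng = random.Random(seed)

    # Group row indices by class so each split keeps the class balance.
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    train, val, test = [], [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_train = round(ratios[0] * len(idx))
        n_val = round(ratios[1] * len(idx))
        train += idx[:n_train]
        val += idx[n_train:n_train + n_val]
        test += idx[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Synthetic, heavily imbalanced labels (1 = fraud); an assumption, not real data.
labels = [1] * 20 + [0] * 1980
train_idx, val_idx, test_idx = three_way_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

In this sketch the model would be fitted on `train_idx`, hyperparameter settings would be compared on `val_idx`, and `test_idx` would be used only once at the very end. Changing `ratios` to `(0.80, 0.10, 0.10)` gives the other split I mentioned.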

Thanks so much.