
Thank you for your time. I need an answer to this for my thesis. I am using the PaySim dataset from Kaggle (https://www.kaggle.com/datasets/ealaxi/paysim1).

First of all, I use training, validation, and testing sets.

- Is there an established rule for the data split ratio, and is it acceptable to check the model performance under each split ratio, for example 70/15/15 and 80/10/10?

- After fitting the model on the training set, we get the default (fitted) model. Which set, training or validation, should I use to examine the model's performance?

My intention in having three sets (training, validation, and testing) is to use the validation set for evaluating hyperparameter tuning.
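To make the setup concrete, here is a minimal sketch of the kind of 70/15/15 split I mean. It is illustrative only: the function name `stratified_split`, the seed, and the toy labels are assumptions, not my actual thesis code. Splitting each class separately is also an assumption on my part, made because fraud labels are typically rare and a purely random split could leave very few positives in the smaller sets.

```python
import random
from collections import defaultdict


def stratified_split(labels, ratios=(0.70, 0.15, 0.15), seed=42):
    """Return (train, val, test) index lists with similar class proportions.

    Hypothetical helper for illustration; not part of any library.
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    rng = random.Random(seed)

    # Group row indices by class label so each class is split separately.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(len(indices) * ratios[0])
        n_val = round(len(indices) * ratios[1])
        train.extend(indices[:n_train])
        val.extend(indices[n_train:n_train + n_val])
        test.extend(indices[n_train + n_val:])  # remainder goes to the test set
    return train, val, test


# Toy labels standing in for a rare positive class (not real PaySim data).
labels = [0] * 980 + [1] * 20
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

With a split like this, I would compare hyperparameter settings on the validation indices and touch the test indices only once at the end. Changing `ratios` to `(0.80, 0.10, 0.10)` gives the other split I mentioned.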

Thanks so much.

Proper way to split a dataset (split ratio) and evaluate the baseline or fitted model in the first place