I am currently working with PyCaret 3.4.0, since 4.0 lacks some configuration parameters that are useful for my case.
I tried to replicate PyCaret results using scikit-learn.

This is my script, run after PyCaret's setup and after obtaining the transformed data:

compare_models(include=['rf'],cross_validation=True)

scoring=['accuracy','precision','recall','f1','f1_macro','f1_weighted','f1_micro', 'roc_auc']
rfc = RandomForestClassifier(random_state=42, n_jobs=-1)
scores = cross_validate(rfc, xtrain_trans, ytrain_trans, scoring=scoring, cv=cv)

compare_models returns these results:

    Model                     Accuracy  AUC     Recall  Prec.   F1      Kappa   MCC     TT (Sec)
rf  Random Forest Classifier  0.7164    0.7617  0.7164  0.7254  0.7137  0.4329  0.4416  0.25

but I get these results from sklearn:

  • Accuracy=0.6948717948717947

  • AUC=0.7411665257819103

  • Recall=0.6908791208791208

  • Precision=0.7005056185644422

  • F1=0.6867142420587947

  • F1_macro=0.6910345427878428

  • F1_weighted=0.6911863570749788

  • F1_micro=0.6948717948717947
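One detail worth checking in the two result sets: in the PyCaret table, Accuracy and Recall are identical (0.7164), which is what a support-weighted recall produces, because weighted recall reduces to accuracy. The sklearn `recall`, `precision` and `f1` scorers use the binary (positive-class) average by default, so the two sets of numbers may not follow the same metric definitions. That PyCaret averages this way is an assumption here, not something confirmed. A minimal sketch on synthetic data, using sklearn's built-in `*_weighted` scorer names, with fewer repeats and trees than in the setup above to keep it fast:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_validate

# Synthetic stand-in data (not the real dataset).
X, y = make_classification(n_samples=100, n_features=5, n_informative=2,
                           n_classes=2, flip_y=0.2, random_state=42)

# Assumption: 2 repeats and 50 trees, only to keep the sketch quick.
cv_demo = RepeatedStratifiedKFold(n_splits=5, n_repeats=2, random_state=42)
rfc = RandomForestClassifier(n_estimators=50, random_state=42)

# Binary recall next to the weighted variants of the same metrics.
scoring = ['accuracy', 'recall', 'recall_weighted',
           'precision_weighted', 'f1_weighted']
scores = cross_validate(rfc, X, y, scoring=scoring, cv=cv_demo)

# Weighted recall equals accuracy fold by fold.
weighted_recall_equals_accuracy = np.allclose(
    scores['test_accuracy'], scores['test_recall_weighted'])
print(weighted_recall_equals_accuracy)

# Mean of every test metric across folds.
means = {name: round(float(vals.mean()), 4)
         for name, vals in scores.items() if name.startswith('test_')}
print(means)
```

If the weighted means on the real data land closer to the PyCaret table than the binary ones, the averaging mode would explain at least part of the gap.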

Just to clarify, I am using exactly the same CV splitter in both PyCaret and sklearn:

cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=42)
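The "same splitter" claim can be checked directly. This is a minimal sketch on placeholder arrays (`X_demo` and `y_demo` are hypothetical, not the real data): with an integer `random_state`, every call to `split` should yield the same folds, so both libraries see identical splits as long as they receive the same rows in the same order.

```python
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold

# Placeholder data, only used to exercise the splitter.
X_demo = np.zeros((100, 5))
y_demo = np.array([0, 1] * 50)

cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=42)

# Two independent passes over the same splitter object.
first_pass = [test_idx for _, test_idx in cv.split(X_demo, y_demo)]
second_pass = [test_idx for _, test_idx in cv.split(X_demo, y_demo)]

# True when every fold has identical test indices in both passes.
same_splits = all(np.array_equal(a, b)
                  for a, b in zip(first_pass, second_pass))
print(len(first_pass), same_splits)
```

Note that identical folds only guarantee identical results if both libraries are also fed the same rows; any step that drops rows before splitting would change the folds.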

The setup used was:

setup(xtrain, 
            target = 'Group',
            session_id=42,
            test_data=xtest, #None default
            imputation_type=None, 
            remove_multicollinearity=True,
            multicollinearity_threshold=0.70,
            remove_outliers=True,
            transformation=True,
            transformation_method='quantile',
            normalize=True,
            feature_selection=False,
            fold_strategy=cv,
            use_gpu=True)

I know the differences are small, but I would like to use RFECV to reduce the number of features. Since RFECV is not implemented in PyCaret, I have to run it in sklearn, and then the results are not comparable. After RFECV, the best average score I get (F1=0.70, reduced number of features) is lower than the average score from PyCaret (F1=0.7137, all features) but higher than the average score from sklearn (F1=0.6867). So now I'm not sure how to move forward, or whether I should use the reduced feature set.

REPRODUCIBLE EXAMPLE:

import pandas as pd
from pycaret.classification import setup, compare_models
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.model_selection import cross_validate
from sklearn.ensemble import RandomForestClassifier

cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=42)

from sklearn.datasets import make_classification
X, y = make_classification(n_samples=100, n_features=5, n_informative=2, n_classes=2, flip_y=0.2, random_state=42)
dataset = pd.DataFrame(X)
dataset.columns = ['X1', 'X2', 'X3', 'X4', 'X5']
dataset['y'] = y
setup0 = setup(dataset,
               target='y',
               session_id=42,
               train_size=0.7,
               imputation_type=None,
               remove_multicollinearity=True,
               multicollinearity_threshold=0.70,
               remove_outliers=True,
               transformation=True,
               transformation_method='quantile',
               normalize=True,
               feature_selection=False,
               fold_strategy=cv,
               use_gpu=True)

# CV average results from PyCaret
compare_models(include=['rf'],sort='F1',errors='raise',cross_validation=True)

xtrain_trans = setup0.get_config('X_train_transformed')
ytrain_trans = setup0.get_config('y_train_transformed')
scoring=['accuracy', 'roc_auc', 'recall', 'precision', 'f1','f1_macro','f1_weighted','f1_micro']
rfc = RandomForestClassifier(random_state=42, n_jobs=-1)
scores = cross_validate(rfc, xtrain_trans, ytrain_trans, scoring=scoring, cv=cv)
scores=pd.DataFrame(scores)
scores=scores.mean().round(4)

# CV average results from sklearn
pd.DataFrame(scores).transpose()
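Another difference that could be tested: the sklearn run above cross-validates data that was transformed once, before splitting, so the preprocessing is never refit on each training fold. If PyCaret refits its preprocessing inside every fold (an assumption, not verified here), the scores would not match. A minimal sketch of per-fold preprocessing with an sklearn `Pipeline`, where `QuantileTransformer` and `StandardScaler` are stand-ins for the PyCaret steps and not a reproduction of its pipeline:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import QuantileTransformer, StandardScaler

# Synthetic stand-in data (not the real dataset), left untransformed.
X, y = make_classification(n_samples=100, n_features=5, n_informative=2,
                           n_classes=2, flip_y=0.2, random_state=42)

# Assumption: 2 repeats and 50 trees, only to keep the sketch quick.
cv_demo = RepeatedStratifiedKFold(n_splits=5, n_repeats=2, random_state=42)

# Every step is refit on the training part of each fold.
pipe = Pipeline([
    ('quantile', QuantileTransformer(n_quantiles=50,
                                     output_distribution='normal',
                                     random_state=42)),
    ('scale', StandardScaler()),
    ('rf', RandomForestClassifier(n_estimators=50, random_state=42)),
])

pipe_scores = cross_validate(pipe, X, y, scoring=['accuracy', 'f1'],
                             cv=cv_demo)
mean_f1 = float(pipe_scores['test_f1'].mean())
print(round(mean_f1, 4))
```

Running the same pipeline on the untransformed training data and comparing it with the pre-transformed run would show how much of the gap comes from where the preprocessing is fit.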