Stack Overflow Machine Learning Tag
2026-05-26 19:15 UTC
Score 21.0
I am currently working with PyCaret 3.4.0, since 4.0 lacks some configuration parameters that are useful for my case. I tried to replicate PyCaret's results using scikit-learn. This is my script, run after PyCaret's setup and after obtaining the transformed data:

    from pycaret.classification import setup, compare_models
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import RepeatedStratifiedKFold, cross_validate

    # PyCaret side
    compare_models(include=['rf'], cross_validation=True)

    # scikit-learn side, on the data already transformed by PyCaret
    scoring = ['accuracy', 'precision', 'recall', 'f1', 'f1_macro',
               'f1_weighted', 'f1_micro', 'roc_auc']
    rfc = RandomForestClassifier(random_state=42, n_jobs=-1)
    scores = cross_validate(rfc, xtrain_trans, ytrain_trans, scoring=scoring, cv=cv)

compare_models returns these results:

        Model                     Accuracy  AUC     Recall  Prec.   F1      Kappa   MCC     TT (Sec)
    rf  Random Forest Classifier  0.7164    0.7617  0.7164  0.7254  0.7137  0.4329  0.4416  0.25

but I get these results from sklearn:

    Accuracy=0.6948717948717947
    AUC=0.7411665257819103
    Recall=0.6908791208791208
    Precision=0.7005056185644422
    F1=0.6867142420587947
    F1_macro=0.6910345427878428
    F1_weighted=0.6911863570749788
    F1_micro=0.6948717948717947

Just to clarify, I am using exactly the same CV splitter in both PyCaret and sklearn:

    cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=42)

The setup used was:

    setup(xtrain, target='Group', session_id=42,
          test_data=xtest,  # default is None
          imputation_type=None,
          remove_multicollinearity=True,
          multicollinearity_threshold=0.70,
          remove_outliers=True,
          transformation=True,
          transformation_method='quantile',
          normalize=True,
          feature_selection=False,
          fold_strategy=cv,
          use_gpu=True)

I know the differences are small.…
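One detail in the numbers above: PyCaret's Recall (0.7164) is identical to its Accuracy (0.7164). Support-weighted recall always equals accuracy, so PyCaret appears to be reporting weighted averages in that column, whereas sklearn's plain 'recall', 'precision' and 'f1' scorers score only the positive class of a binary target. The sketch below illustrates that identity with sklearn's 'recall_weighted' scorer. It uses synthetic data as a stand-in for the real dataset, and the sample size, `n_estimators` and `n_repeats` are arbitrary assumptions chosen to keep it fast. It does not explain the gap in Accuracy or AUC, which do not depend on the averaging mode.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_validate

# Synthetic binary data: a stand-in, NOT the dataset from the question.
X, y = make_classification(
    n_samples=300, n_features=10, weights=[0.6, 0.4], random_state=42
)

# Same splitter type as in the question, with fewer repeats for speed.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=2, random_state=42)

# 'recall' scores the positive class only; the '*_weighted' scorers
# average per-class scores weighted by class support.
scoring = ["accuracy", "recall", "recall_weighted",
           "precision_weighted", "f1_weighted"]

rfc = RandomForestClassifier(n_estimators=50, random_state=42)
scores = cross_validate(rfc, X, y, scoring=scoring, cv=cv)

# Average each metric over all folds and repeats.
mean_scores = {
    name: float(np.mean(scores[f"test_{name}"])) for name in scoring
}
for name, value in mean_scores.items():
    print(f"{name}: {value:.4f}")
```

Here 'accuracy' and 'recall_weighted' print the same value, while plain 'recall' generally differs. If the weighted reading of PyCaret's table is right, 'precision_weighted', 'recall_weighted' and 'f1_weighted' would be the sklearn scorers to compare against its Prec., Recall and F1 columns.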