I think what you are looking for is something like crepes, a Python package for conformal prediction, which seems to do exactly what you are asking for (providing p-values). I stumbled upon it while looking for a way to calibrate models (i.e., fitting a spline to the outputs).
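To make "providing p-values" a bit more concrete, here is a minimal NumPy sketch of how inductive conformal p-values can be computed from predicted probabilities. It is not the crepes implementation: `conformal_p_values` is a hypothetical helper, the nonconformity score (1 minus the probability of the true class) is just one common choice, and the p-values are unsmoothed (ties are not randomized).

```python
import numpy as np


def conformal_p_values(cal_proba, cal_labels, test_proba):
    """Unsmoothed inductive conformal p-values for every (test example, class) pair.

    cal_proba:  (n_cal, n_classes) predicted probabilities on the calibration set
    cal_labels: (n_cal,) integer class indices (column positions in cal_proba)
    test_proba: (n_test, n_classes) predicted probabilities on the test set
    """
    cal_proba = np.asarray(cal_proba, dtype=float)
    cal_labels = np.asarray(cal_labels)
    test_proba = np.asarray(test_proba, dtype=float)

    # Nonconformity of each calibration example: 1 - probability of its true class.
    cal_scores = 1.0 - cal_proba[np.arange(len(cal_labels)), cal_labels]
    # Nonconformity of each test example under each candidate label.
    test_scores = 1.0 - test_proba

    n_cal = len(cal_scores)
    # For every (test example, label), count calibration scores >= the test score.
    counts = (cal_scores[None, None, :] >= test_scores[:, :, None]).sum(axis=2)
    # p-value: fraction of (calibration + test) examples at least as nonconforming.
    return (counts + 1) / (n_cal + 1)
```

The calibration scores must come from data the model was not trained on, otherwise they are too optimistic and the p-values lose their validity guarantee.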

The following code does this for a scikit-learn RandomForestClassifier:

    from sklearn.datasets import fetch_openml
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from crepes import WrapClassifier

    dataset = fetch_openml(name="qsar-biodeg", parser="auto")

    X = dataset.data.values.astype(float)
    y = dataset.target.values

    # Hold out half the data for testing, then split the rest into a proper
    # training set (to fit the model) and a calibration set.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5)
    X_prop_train, X_cal, y_prop_train, y_cal = train_test_split(X_train, y_train, test_size=0.25)

    rf = WrapClassifier(RandomForestClassifier(n_jobs=-1))
    rf.fit(X_prop_train, y_prop_train)  # fit on the proper training set only
    rf.calibrate(X_cal, y_cal)          # calibrate on the held-out calibration set
    p_values = rf.predict_p(X_test)     # one p-value per test example and class
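The p-value matrix can then be turned into prediction sets and point predictions. Below is a small sketch, assuming the matrix has one column per class in the order of the wrapped model's `classes_`; `summarize_p_values` and the significance level `epsilon` are hypothetical names for illustration, not part of crepes.

```python
import numpy as np


def summarize_p_values(p_values, classes, epsilon=0.05):
    """Turn an (n_test, n_classes) matrix of conformal p-values (n_classes >= 2)
    into prediction sets, point predictions, confidence and credibility."""
    p_values = np.asarray(p_values, dtype=float)
    # A label stays in the prediction set unless its p-value is <= epsilon.
    sets = [[c for c, p in zip(classes, row) if p > epsilon] for row in p_values]
    # Point prediction: the label with the largest p-value.
    predictions = np.asarray(classes)[p_values.argmax(axis=1)]
    sorted_p = np.sort(p_values, axis=1)
    credibility = sorted_p[:, -1]       # largest p-value
    confidence = 1.0 - sorted_p[:, -2]  # 1 - second-largest p-value
    return sets, predictions, confidence, credibility
```

With this setup, a set containing both labels means the model cannot rule either one out at level `epsilon`, and a low credibility flags test examples that look unlike the calibration data.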

Note that the data has to be split into three parts (a proper training set, a calibration set and a test set) so that calibration is performed on data the model was not trained on. As for other frameworks (TensorFlow/Keras, PyTorch), I don't know whether crepes works with them; I suspect not. I have also found a Venn-Abers implementation that does not seem to need access to the model itself.
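To illustrate why a Venn-Abers calibrator can work without access to the model, here is a minimal sketch of an inductive Venn-Abers predictor for binary classification. It only needs the model's scores on a held-out calibration set and on the test examples, so those scores could in principle come from any framework. This is a sketch, not the Venn-Abers implementation mentioned above: `venn_abers` is a hypothetical helper built on scikit-learn's `IsotonicRegression`, and the rule for merging `p0` and `p1` into a single probability is one common choice among several.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression


def venn_abers(cal_scores, cal_labels, test_scores):
    """Inductive Venn-Abers calibration for binary problems.

    cal_scores:  (n_cal,) model scores on the calibration set (higher = more likely class 1)
    cal_labels:  (n_cal,) labels in {0, 1}
    test_scores: (n_test,) model scores to calibrate
    Returns (p0, p1, p): lower and upper probability bounds and a merged estimate.
    """
    cal_scores = np.asarray(cal_scores, dtype=float)
    cal_labels = np.asarray(cal_labels, dtype=float)
    p0 = np.empty(len(test_scores))
    p1 = np.empty(len(test_scores))
    for i, s in enumerate(test_scores):
        for label, out in ((0.0, p0), (1.0, p1)):
            # Append the test score with a hypothetical label and refit
            # isotonic regression on calibration data plus this one point.
            x = np.append(cal_scores, s)
            y = np.append(cal_labels, label)
            iso = IsotonicRegression(out_of_bounds="clip").fit(x, y)
            out[i] = iso.predict([s])[0]
    # Merge the interval [p0, p1] into a single probability.
    p = p1 / (1.0 - p0 + p1)
    return p0, p1, p
```

The width of `[p0, p1]` shrinks as the calibration set grows. Refitting twice per test example keeps the sketch simple but is slow for large test sets.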