A key element of a trustworthy model is that it can give an estimate of its confidence in a given prediction. We've already talked about one way to do this for linear models, and today we'll talk about a technique for getting uncertainty estimates for any model.
Let's continue using the fish dataset from last time:
import os
import pandas as pd
fish = pd.read_csv(os.path.expanduser("~/Downloads/Fish.csv"))
We build a ColumnTransformer for convenience:
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
ct = ColumnTransformer(
[
("scale", StandardScaler(), ["Length1", "Length2", "Length3", "Height", "Width"]),
("ohe", OneHotEncoder(), ["Species"]),
]
)
Next we construct a pipeline which uses the ColumnTransformer from above as
well as scikit-learn's implementation of bagging. Specifically, our
BaggingRegressor will consist of 100 ElasticNet models, each one trained on a
random 25% of the dataset (with replacement).
from sklearn.ensemble import BaggingRegressor
import sklearn.linear_model as lm
pipe = make_pipeline(
ct, BaggingRegressor(lm.ElasticNetCV(), n_estimators=100, max_samples=0.25, random_state=42, n_jobs=-1,)
)
pipe.fit(fish, fish["Weight"])
Finally, we can snag those 100 models and make a prediction for a new fish:
from sklearn.ensemble import BaggingRegressor
import sklearn.linear_model as lm
pipe = make_pipeline(
ct, BaggingRegressor(lm.ElasticNetCV(), n_estimators=100, max_samples=0.25, random_state=42, n_jobs=-1,)
)
pipe.fit(fish, fish["Weight"])
new_fish = pd.DataFrame(
[
{
"Species": "Bream",
"Weight": -1,
"Length1": 31.3,
"Length2": 34,
"Length3": 39.5,
"Height": 15.1285,
"Width": 5.5695,
}
]
)
predictions = [e.predict(new_fish)[0] for e in estimators]
plt.hist(predictions, bins=15)
plt.savefig("twm1_hist.png", bbox_inches="tight")
Which gives us a nifty histogram of expected weight:

The cool thing about this approach, though, is that we can swap in any model
within the BaggingRegressor, and the rest of the code is unaffected. For
instance, here's the distribution of predictions when using decision trees:

Interesting idea, right? There's still a few more approaches I want to highlight in coming posts, but after that I'll be comparing them all to see which uncertainty estimation technique is best.
Comments? Questions? Concerns? Please tweet me @SamuelDataT or email me. Thanks!