To effectively aggregate outputs from multiple classifiers that each handle different subsets of classes, you can explore the following strategies:
1. Confidence-Based Voting (Weighted Voting):
Each classifier provides a confidence score for its predicted class. You can combine these scores across classifiers and select the class with the highest overall confidence.
If classifiers are highly specialized for their own subsets, you could assign higher weights to the confidence of the classifier responsible for the class being predicted.
Example:
Classifier A outputs probabilities for classes {0, 1}, and Classifier B for classes {2, 3}.
For an input, you obtain:
=> Classifier A: 0.7 for class 0, 0.3 for class 1.
=> Classifier B: 0.6 for class 2, 0.4 for class 3.
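Aggregate by concatenating the two probability vectors and renormalizing so they sum to 1 (here, dividing by 2 gives 0.35, 0.15, 0.30, 0.20), then pick the class with the highest combined score (class 0 in this case). Keep in mind that each classifier's probabilities are forced to sum to 1 over its own classes, so the raw scores are only comparable across classifiers if they are calibrated or weighted.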
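The example above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function name `aggregate` and the `weights` parameter are assumptions made for this sketch, and each classifier's output is assumed to be a dict mapping class label to probability.

```python
def aggregate(outputs, weights=None):
    """Combine per-classifier probabilities over disjoint class subsets.

    outputs: list of dicts {class_label: probability}, one per classifier.
    weights: optional per-classifier trust weights (hypothetical; defaults to 1.0).
    """
    if weights is None:
        weights = [1.0] * len(outputs)
    combined = {}
    for probs, w in zip(outputs, weights):
        for label, p in probs.items():
            combined[label] = combined.get(label, 0.0) + w * p
    # Renormalize so the combined scores sum to 1 across all classes.
    total = sum(combined.values())
    normalized = {label: score / total for label, score in combined.items()}
    best = max(normalized, key=normalized.get)
    return best, normalized


clf_a = {0: 0.7, 1: 0.3}  # Classifier A, classes {0, 1}
clf_b = {2: 0.6, 3: 0.4}  # Classifier B, classes {2, 3}
best, scores = aggregate([clf_a, clf_b])
print(best, scores)
```

With equal weights, class 0 wins with a normalized score of 0.35. Passing `weights=[1.0, 2.0]` would favor Classifier B and select class 2 instead, which shows how sensitive the result is to the weighting.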
2. Out-of-Distribution Detection:
Train an out-of-distribution (OOD) detector for each classifier that determines if an input belongs to that classifier’s class set.
If a classifier recognizes an input as belonging to its subset, use its prediction. If not, discard its prediction and defer to another classifier.
A simple baseline is a softmax confidence threshold: if a classifier's maximum softmax probability falls below a chosen threshold, treat the input as out-of-distribution for that classifier. Softmax scores can still be high on unfamiliar inputs, so tune the threshold on held-out data.
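A minimal sketch of the threshold idea, assuming each classifier exposes raw logits. The function name `predict_or_reject` and the default threshold of 0.9 are illustrative choices, not values from any particular library.

```python
import math


def softmax(logits):
    # Subtract the max for numerical stability.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_or_reject(logits, classes, threshold=0.9):
    """Return (label, confidence), or (None, confidence) if the input
    looks out-of-distribution for this classifier."""
    probs = softmax(logits)
    top = max(range(len(probs)), key=probs.__getitem__)
    if probs[top] < threshold:
        return None, probs[top]
    return classes[top], probs[top]


# Confident prediction: accepted.
print(predict_or_reject([4.0, 0.0], classes=[0, 1]))
# Nearly uniform output: rejected, so another classifier should decide.
print(predict_or_reject([0.2, 0.0], classes=[2, 3]))
```

Note that a two-class softmax can never go below 0.5, so the threshold has to be chosen relative to the number of classes each classifier handles.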
3. Meta-Classifier (Ensemble):
Train a meta-classifier that learns to combine the predictions of your subset classifiers.
Each classifier provides a prediction or a set of features (like confidence scores), and the meta-classifier learns how to combine these inputs into a final prediction.
You can use techniques like logistic regression, random forests, or even a neural network as the meta-classifier.
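To make the stacking idea concrete, here is a small pure-Python sketch of a softmax-regression meta-classifier trained on the concatenated confidence scores of two subset classifiers. The training rows are made-up held-out outputs for illustration only, and `train_meta` / `predict_meta` are hypothetical names; in practice you would more likely use an existing implementation such as scikit-learn's `LogisticRegression`.

```python
import math


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def scores(W, b, x):
    return [sum(w * v for w, v in zip(W[k], x)) + b[k] for k in range(len(W))]


def train_meta(features, labels, n_classes, lr=0.5, epochs=300):
    """Fit multinomial logistic regression with plain SGD."""
    n_feat = len(features[0])
    W = [[0.0] * n_feat for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        for x, y in zip(features, labels):
            p = softmax(scores(W, b, x))
            for k in range(n_classes):
                grad = p[k] - (1.0 if k == y else 0.0)
                for j in range(n_feat):
                    W[k][j] -= lr * grad * x[j]
                b[k] -= lr * grad
    return W, b


def predict_meta(W, b, x):
    s = scores(W, b, x)
    return max(range(len(s)), key=s.__getitem__)


# Illustrative held-out outputs: [A:p(0), A:p(1), B:p(2), B:p(3)] -> true class.
features = [
    [0.95, 0.05, 0.50, 0.50],
    [0.05, 0.95, 0.50, 0.50],
    [0.50, 0.50, 0.95, 0.05],
    [0.50, 0.50, 0.05, 0.95],
]
labels = [0, 1, 2, 3]
W, b = train_meta(features, labels, n_classes=4)
print(predict_meta(W, b, [0.9, 0.1, 0.5, 0.5]))
```

The meta-classifier should be trained on outputs the base classifiers produce for data they were not trained on; otherwise it learns from overconfident scores.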
4. Class Hierarchy (Hierarchical Classifiers):
If the classes have a hierarchical relationship, use a hierarchical approach: a top-level model first assigns the input to a broader category (e.g., "Classifier A or B"), and the specialized classifier for that category then makes the final prediction.
This routes each input to the appropriate classifier, though a routing mistake at the top level cannot be corrected further down.
5. Mixture of Experts:
Use a mixture of experts approach where a gating network decides which classifier to trust for a particular input. This gating network is trained to learn which classifier specializes in which subset of inputs.
The gating mechanism outputs a probability distribution over the classifiers (experts), and the final prediction is a weighted combination of the classifier predictions.
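The weighted combination described above might look like the following sketch. The gate here is a single linear layer followed by a softmax, and its parameters (`gate_weights`, `gate_bias`) are hand-picked placeholders standing in for values that would normally be learned; the experts are stubbed as functions returning fixed probabilities.

```python
import math


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def gate(x, gate_weights, gate_bias):
    """Probability distribution over experts for input x (linear gate)."""
    logits = [sum(w * v for w, v in zip(row, x)) + b
              for row, b in zip(gate_weights, gate_bias)]
    return softmax(logits)


def mixture_predict(x, experts, gate_weights, gate_bias):
    """Weight each expert's class probabilities by its gate probability."""
    g = gate(x, gate_weights, gate_bias)
    combined = {}
    for weight, expert in zip(g, experts):
        for label, p in expert(x).items():
            combined[label] = combined.get(label, 0.0) + weight * p
    return max(combined, key=combined.get), combined


# Stub experts over disjoint class subsets (fixed outputs for illustration).
expert_a = lambda x: {0: 0.7, 1: 0.3}
expert_b = lambda x: {2: 0.6, 3: 0.4}

# Hypothetical gate: positive feature -> trust A, negative -> trust B.
gate_weights = [[2.0], [-2.0]]
gate_bias = [0.0, 0.0]

print(mixture_predict([1.0], [expert_a, expert_b], gate_weights, gate_bias))
print(mixture_predict([-1.0], [expert_a, expert_b], gate_weights, gate_bias))
```

Because the gate probabilities sum to 1 and each expert's probabilities sum to 1, the combined scores form a valid distribution over all classes without extra normalization.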
6. Calibrating Classifier Outputs:
Calibrate the classifiers so that their confidence scores are more reliable across the board. Techniques like Platt scaling or temperature scaling can help adjust the confidence outputs, allowing for more effective combination when aggregating.
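As a rough sketch of temperature scaling: divide the logits by a scalar T before the softmax, and choose T to minimize the negative log-likelihood on a validation set. The grid search and the toy validation data below are assumptions made to keep the example short; real implementations usually optimize T with a gradient-based method.

```python
import math


def softmax(logits, temperature=1.0):
    scaled = [v / temperature for v in logits]
    m = max(scaled)
    e = [math.exp(v - m) for v in scaled]
    s = sum(e)
    return [v / s for v in e]


def nll(logit_rows, labels, temperature):
    """Mean negative log-likelihood of the true labels at a given temperature."""
    total = 0.0
    for logits, y in zip(logit_rows, labels):
        total -= math.log(softmax(logits, temperature)[y])
    return total / len(labels)


def fit_temperature(logit_rows, labels, candidates=None):
    """Pick the temperature with the lowest validation NLL (simple grid search)."""
    if candidates is None:
        candidates = [0.25 * k for k in range(1, 41)]  # 0.25, 0.50, ..., 10.0
    return min(candidates, key=lambda t: nll(logit_rows, labels, t))


# Toy validation set: an overconfident classifier that is wrong 1 time in 4.
val_logits = [[8.0, 0.0], [8.0, 0.0], [0.0, 8.0], [8.0, 0.0]]
val_labels = [0, 1, 1, 0]

T = fit_temperature(val_logits, val_labels)
print(T, softmax([8.0, 0.0], temperature=T))
```

A fitted T greater than 1 softens overconfident outputs. Fit a separate temperature for each subset classifier, then aggregate the rescaled probabilities.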
7. Use “Other” Class with Proper Balancing:
You mentioned using an "other" class; if it didn't work, the cause may be class imbalance or an "other" set that doesn't represent the inputs the classifier will actually see.
You could explore rebalancing the training set or using methods like label smoothing to ensure that the "other" class doesn’t dominate or get ignored.
8. Error-Correcting Output Codes (ECOC):
Assign each class a binary codeword (an error-correcting code) and train one binary classifier per bit position. At prediction time, collect the bits output by the classifiers and pick the class whose codeword is closest in Hamming distance.
This method can also help with cases where some classifiers might be unsure, as the redundancy in the encoding allows for correction.
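The decoding step can be sketched as follows. The 7-bit codebook is an illustrative choice for four classes (every pair of codewords differs in 4 positions, so any single wrong bit can be corrected), and the bit vector stands in for the outputs of seven hypothetical binary classifiers.

```python
# Illustrative codebook: one 7-bit codeword per class.
CODEBOOK = {
    0: [1, 1, 1, 1, 1, 1, 1],
    1: [0, 0, 0, 0, 1, 1, 1],
    2: [0, 0, 1, 1, 0, 0, 1],
    3: [0, 1, 0, 1, 0, 1, 0],
}


def hamming(a, b):
    """Number of positions where two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


def decode(bits, codebook=CODEBOOK):
    """Return the class whose codeword is nearest to the predicted bits."""
    return min(codebook, key=lambda label: hamming(bits, codebook[label]))


# Outputs of the 7 binary classifiers for an input of true class 2,
# with the last bit predicted incorrectly.
predicted_bits = [0, 0, 1, 1, 0, 0, 0]
print(decode(predicted_bits))
```

Here the corrupted bit vector is at distance 1 from the class 2 codeword and at distance 3 or more from the others, so the error is corrected.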
Note that combining classifiers trained on different class subsets requires careful calibration of confidence levels so that no subset is favored over another. A meta-classifier or mixture-of-experts approach may offer the most streamlined solution.