Machine learning models deployed in the real world are often trained on biased data. For instance, the celebrity image dataset CelebA contains disproportionately many blond female celebrities, so a hair-color classifier trained on it may learn to associate gender with hair color and incorrectly predict “blond” for most female faces. Such spurious correlations can have severe consequences in high-stakes applications, such as medical diagnosis, where precision is crucial.
Recent research has shown that deep networks can unintentionally amplify such statistical biases. This tendency is known as the simplicity bias of deep learning: networks latch onto simple, weakly predictive features early in training and fail to identify more complex but potentially more accurate features.
Considering these challenges, we propose two potential solutions for mitigating simplicity bias and reliance on spurious features: early readouts and feature forgetting. In a paper titled “Using Early Readouts to Mediate Featural Bias in Distillation”, we demonstrate how early readouts (predictions made from a network’s intermediate layers rather than its final layer) can signal problems with the quality of the learned representations. In particular, these early predictions tend to be more wrong, and more confidently wrong, when the model is relying on spurious features.
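To make the idea concrete, the sketch below attaches a small auxiliary classifier to an intermediate feature map and scores how confidently wrong its predictions are. This is a minimal illustration, not the paper’s exact formulation: the names `EarlyReadout` and `confidently_wrong_score`, and the wrong-times-confidence scoring rule, are our own illustrative choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EarlyReadout(nn.Module):
    """Auxiliary linear probe that predicts class logits from an
    intermediate feature map of a convolutional backbone."""

    def __init__(self, feature_dim: int, num_classes: int):
        super().__init__()
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),  # pool spatial dims to 1x1
            nn.Flatten(),
            nn.Linear(feature_dim, num_classes),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.head(feats)

def confidently_wrong_score(logits: torch.Tensor,
                            labels: torch.Tensor) -> torch.Tensor:
    """Per-example score in [0, 1] that is large when the early readout
    is both wrong and confident, a signal of spurious-feature reliance."""
    probs = F.softmax(logits, dim=1)
    confidence, prediction = probs.max(dim=1)
    wrong = (prediction != labels).float()
    return wrong * confidence
```

In practice, such a readout would be trained alongside the main network, and its per-example score would then be monitored as a diagnostic during training.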
We also found that deliberately “forgetting” these problematic features helps the model identify more predictive and useful ones, which in turn improves its ability to generalize to unseen domains. Our research in devising these methods is guided by our AI Principles and Responsible AI practices.
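As a rough illustration of how such a signal could be used during distillation, the sketch below upweights the per-example distillation loss wherever the early readout is confidently wrong, reusing `confidently_wrong_score` from the previous snippet. The weighting scheme (`1 + score`) is an illustrative assumption on our part, not the scheme published in the paper.

```python
import torch.nn.functional as F

def reweighted_distillation_loss(student_logits, teacher_logits,
                                 early_logits, labels,
                                 temperature: float = 4.0):
    """Soft-label distillation loss with per-example weights derived
    from an early-readout signal (hypothetical weighting: 1 + score)."""
    t = temperature
    # Per-example KL divergence between teacher and student soft labels.
    kd = F.kl_div(
        F.log_softmax(student_logits / t, dim=1),
        F.softmax(teacher_logits / t, dim=1),
        reduction="none",
    ).sum(dim=1) * (t * t)
    # Upweight examples flagged as confidently wrong by the early readout,
    # nudging the student away from spurious shortcuts on those inputs.
    weights = 1.0 + confidently_wrong_score(early_logits, labels)
    return (weights * kd).mean()
```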
In a closely related paper, “Overcoming Simplicity Bias in Deep Networks using a Feature Sieve”, we aim to overcome simplicity bias directly during training. Here, we intervene on the information exposed by early readouts to improve both feature learning and model generalization: we alternate between identifying problematic, easily computed features in the network’s lower layers and erasing them from those layers, as sketched below.
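The following sketch shows one plausible form of this alternating loop, assuming a lower backbone block, an auxiliary head, and a forgetting loss that pushes the auxiliary head toward a uniform prediction. The function and variable names are illustrative, and the exact losses in the paper may differ.

```python
import torch
import torch.nn.functional as F

def feature_sieve_step(backbone_lower, aux_head, x, y,
                       opt_aux, opt_lower, num_classes: int):
    """One identify-then-forget iteration of a feature sieve (sketch)."""
    # Phase 1 ("identify"): train the auxiliary head to decode the label
    # from lower-layer features, exposing simple/spurious features there.
    feats = backbone_lower(x).detach()  # do not update the backbone here
    aux_loss = F.cross_entropy(aux_head(feats), y)
    opt_aux.zero_grad()
    aux_loss.backward()
    opt_aux.step()

    # Phase 2 ("forget"): update only the lower layers so the auxiliary
    # head becomes maximally uncertain, erasing those simple features.
    logits = aux_head(backbone_lower(x))
    uniform = torch.full_like(logits, 1.0 / num_classes)
    forget_loss = F.kl_div(F.log_softmax(logits, dim=1), uniform,
                           reduction="batchmean")
    opt_lower.zero_grad()
    forget_loss.backward()
    opt_lower.step()  # opt_lower holds only the lower-layer parameters
    return aux_loss.item(), forget_loss.item()
```

In a full training loop, a sieve step like this would be interleaved with ordinary supervised updates of the whole network on the main classification loss.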
Our feature sieve yields substantial gains in generalization over a range of baseline models on real-world spurious-feature benchmarks such as BAR, CelebA Hair, NICO, and ImageNet-A, with improvements as high as 11% in some cases.
In conclusion, we hope our work on early readouts and feature sieving will spur further development of adversarial feature learning methods and improve the resilience and generalization capabilities of deep learning systems.
Disclaimer: The above article was written with the assistance of AI. The original sources can be found on Google AI.