Fairness: Types of bias
- Machine learning models can be susceptible to bias due to human involvement in data selection and curation.
- Understanding common human biases is crucial for mitigating their impact on model predictions.
- This webpage explores various types of biases, including reporting bias, historical bias, and automation bias, among others, providing definitions and examples for each.
- Selection bias, group attribution bias, and implicit bias are also discussed, with subtypes and illustrations.
- While not exhaustive, the presented biases highlight potential areas of concern when developing and evaluating machine learning models.
Machine learning (ML) models are not inherently objective. ML practitioners train models by feeding them a dataset of training examples, and human involvement in the provision and curation of this data can make a model's predictions susceptible to bias.
When building models, it's important to be aware of common human biases that can manifest in your data, so you can take proactive steps to mitigate their effects.
Note: The following inventory of biases provides just a small selection of biases that are often uncovered in machine learning datasets; this list is not intended to be exhaustive. Wikipedia's catalog of cognitive biases enumerates over 100 different types of human bias that can affect our judgment. When auditing your data, beware of any and all potential sources of bias that might skew your model's predictions.
Reporting bias
-
Definition
Reporting bias occurs when the frequency of events, properties, and/or outcomes captured in a dataset does not accurately reflect their real-world frequency. This bias can arise because people tend to focus on documenting circumstances that are unusual or especially memorable, assuming that the ordinary does not need to be recorded.
-
Example
A sentiment-analysis model is trained to predict whether book reviews are positive or negative based on a corpus of user submissions to a popular website. The majority of reviews in the training dataset reflect extreme opinions (reviewers who either loved or hated a book), because people were less likely to submit a review of a book if they did not respond to it strongly. As a result, the model is less able to correctly predict sentiment of reviews that use more subtle language to describe a book.
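One way to look for this kind of skew is to check how much of the corpus sits at the extremes. The following is a minimal sketch, assuming each review carries a 1-to-5 star rating (an assumption; the example above doesn't say so). The `ratings` list and both helper functions are hypothetical.

```python
from collections import Counter


def rating_shares(ratings):
    """Return the share of each star rating (1-5) in a list of ratings."""
    counts = Counter(ratings)
    total = len(ratings)
    return {star: counts.get(star, 0) / total for star in range(1, 6)}


def extreme_share(ratings):
    """Return the share of ratings at either end of the scale (1 or 5 stars)."""
    shares = rating_shares(ratings)
    return shares[1] + shares[5]


# Hypothetical star ratings attached to the reviews in a training corpus.
ratings = [5, 5, 1, 5, 1, 1, 5, 3, 5, 1, 4, 5, 1, 5, 2, 1]

print(rating_shares(ratings))
print(f"Share of 1-star and 5-star reviews: {extreme_share(ratings):.2f}")
```

A high extreme share doesn't prove reporting bias on its own, but it is a prompt to ask whether moderate opinions are missing from the data.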
Historical bias
-
Definition
Historical bias occurs when historical data reflects inequities that existed in the world at that time.
-
Example
A city housing dataset from the 1960s contains home-price data that reflects discriminatory lending practices in effect during that decade.
Automation bias
-
Definition
Automation bias is a tendency to favor results generated by automated systems over those generated by non-automated systems, irrespective of the error rates of each.
-
Example
ML practitioners working for a sprocket manufacturer were eager to deploy the new "groundbreaking" model they trained to identify tooth defects, until the factory supervisor pointed out that the model's precision and recall rates were both 15% lower than those of human inspectors.
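A guard against automation bias is to score the automated system and the human baseline on the same labeled examples before deciding which to trust. The sketch below computes precision and recall by hand; the labels are made up for illustration and are not taken from the sprocket example.

```python
def precision_recall(y_true, y_pred):
    """Compute precision and recall for binary labels (1 = defect, 0 = no defect)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical ground truth and predictions for the same ten sprockets.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
model = [1, 1, 0, 0, 1, 1, 0, 0, 0, 0]
human = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

for name, preds in [("model", model), ("human", human)]:
    precision, recall = precision_recall(truth, preds)
    print(f"{name}: precision={precision:.2f}, recall={recall:.2f}")
```

With these made-up labels, the human inspectors score higher on both metrics, which is the kind of evidence that should outweigh a preference for the automated result.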
Selection bias
Selection bias occurs if a dataset's examples are chosen in a way that is not reflective of their real-world distribution. Selection bias can take many different forms, including coverage bias, non-response bias, and sampling bias.
Coverage bias
-
Definition
Coverage bias occurs if data is not selected in a representative fashion.
-
Example
A model is trained to predict future sales of a new product based on phone surveys conducted with a sample of consumers who bought the product. Consumers who instead opted to buy a competing product were not surveyed, and as a result, this group of people was not represented in the training data.
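A coverage check like the one below can surface this problem before training. This is a minimal sketch, assuming you have an outside estimate of each group's share of the target population; the `population_shares` values, the group names, and the 0.05 tolerance are all hypothetical.

```python
from collections import Counter


def group_shares(records, key):
    """Return each group's share of the records, grouped by the given key."""
    counts = Counter(record[key] for record in records)
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}


def coverage_gaps(sample_shares, population_shares, tolerance=0.05):
    """Return groups whose sample share is off from the population share
    by more than the tolerance, as {group: (observed, expected)}."""
    gaps = {}
    for group, expected in population_shares.items():
        observed = sample_shares.get(group, 0.0)  # 0.0 if the group is absent
        if abs(observed - expected) > tolerance:
            gaps[group] = (observed, expected)
    return gaps


# Hypothetical market shares for the population the model should cover.
population_shares = {"bought_product": 0.4, "bought_competitor": 0.6}

# Hypothetical survey that only reached buyers of the product.
survey = [{"purchase": "bought_product"} for _ in range(50)]

sample_shares = group_shares(survey, "purchase")
print(coverage_gaps(sample_shares, population_shares))
```

Here the check flags both groups: buyers of the product are overrepresented, and buyers of the competing product are missing entirely.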
Non-response bias
-
Definition
Non-response bias (also known as participation bias) occurs if data ends up being unrepresentative due to participation gaps in the data-collection process.
-
Example
A model is trained to predict future sales of a new product based on phone surveys conducted with a sample of consumers who bought the product and with a sample of consumers who bought a competing product. Consumers who bought the competing product were 80% more likely to refuse to complete the survey, and their data was underrepresented in the sample.
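Participation gaps are easier to spot if you track how many people in each group were contacted and how many completed the survey. The following sketch computes per-group response rates and inverse-response-rate weights, which is one common adjustment and not something this page prescribes. All counts are hypothetical, and the weighting only helps if respondents resemble non-respondents within each group.

```python
def response_rates(contacted, completed):
    """Return the fraction of contacted consumers in each group who completed the survey."""
    return {group: completed[group] / contacted[group] for group in contacted}


def inverse_response_weights(rates):
    """Weight each respondent by 1 / (their group's response rate)."""
    return {group: 1.0 / rate for group, rate in rates.items()}


# Hypothetical counts of consumers contacted and surveys completed.
contacted = {"bought_product": 500, "bought_competitor": 500}
completed = {"bought_product": 250, "bought_competitor": 100}

rates = response_rates(contacted, completed)
weights = inverse_response_weights(rates)

for group in contacted:
    weighted_count = completed[group] * weights[group]
    print(f"{group}: rate={rates[group]:.2f}, weight={weights[group]:.1f}, "
          f"weighted count={weighted_count:.0f}")
```

After weighting, each group's weighted count matches the number of consumers contacted, so the underrepresented group counts for as much as it did in the original sampling plan.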
Sampling bias
-
Definition
Sampling bias occurs if proper randomization is not used during data collection.
-
Example
A model is trained to predict future sales of a new product based on phone surveys conducted with a sample of consumers who bought the product and with a sample of consumers who bought a competing product. Instead of randomly targeting consumers, the surveyor chose the first 200 consumers that responded to an email, who might have been more enthusiastic about the product than average purchasers.
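The difference between the two selection strategies can be sketched in a few lines. This example assumes a hypothetical list of 1,000 consumer IDs ordered by how quickly each person responded to the email; the function names and the seed are illustrative.

```python
import random


def first_responders(consumer_ids, k):
    """Biased selection: take the first k consumers in response order."""
    return consumer_ids[:k]


def simple_random_sample(consumer_ids, k, seed=0):
    """Randomized selection: draw k consumers uniformly without replacement."""
    rng = random.Random(seed)  # seeded so the draw is reproducible
    return rng.sample(consumer_ids, k)


# Hypothetical consumer IDs, ordered by email response time (fastest first).
consumer_ids = list(range(1000))

biased = first_responders(consumer_ids, 200)
randomized = simple_random_sample(consumer_ids, 200)

# Every biased pick comes from the fastest 200 responders.
print(f"Latest responder in biased sample: position {max(biased)}")
# The random draw can reach anywhere in the list.
print(f"Latest responder in random sample: position {max(randomized)}")
```

Random selection gives every consumer on the list the same chance of being surveyed, so response speed (and the enthusiasm it might signal) no longer decides who ends up in the training data.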
Group attribution bias
Group attribution bias is a tendency to generalize what is true of individuals to the entire group to which they belong. Group attribution bias often manifests in the following two forms.
In-group bias
-
Definition
In-group bias is a preference for members of a group to which you also belong, or for characteristics that you also share.
-
Example
Two ML practitioners training a résumé-screening model for software developers are predisposed to believe that applicants who attended the same computer-science academy as they both did are more qualified for the role.
Out-group homogeneity bias
-
Definition
Out-group homogeneity bias is a tendency to stereotype individual members of a group to which you do not belong, or to see their characteristics as more uniform.
-
Example
Two ML practitioners training a résumé-screening model for software developers are predisposed to believe that all applicants who did not attend a computer-science academy don't have sufficient expertise for the role.
Implicit bias
-
Definition
Implicit bias occurs when assumptions are made based on one's own model of thinking and personal experiences that don't necessarily apply more generally.
-
Example
An ML practitioner training a gesture-recognition model uses a head shake as a feature to indicate a person is communicating the word "no." However, in some regions of the world, a head shake actually signifies "yes."
Confirmation bias
-
Definition
Confirmation bias occurs when model builders unconsciously process data in ways that affirm pre-existing beliefs and hypotheses.
-
Example
An ML practitioner is building a model that predicts aggressiveness in dogs based on a variety of features (height, weight, breed, environment). The practitioner had an unpleasant encounter with a hyperactive toy poodle as a child, and ever since has associated the breed with aggression. When curating the model's training data, the practitioner unconsciously discarded features that provided evidence of docility in smaller dogs.
Experimenter's bias
-
Definition
Experimenter's bias occurs when a model builder keeps training a model until it produces a result that aligns with their original hypothesis.
-
Example
An ML practitioner is building a model that predicts aggressiveness in dogs based on a variety of features (height, weight, breed, environment). The practitioner had an unpleasant encounter with a hyperactive toy poodle as a child, and ever since has associated the breed with aggression. When the trained model predicted most toy poodles to be relatively docile, the practitioner retrained the model several more times until it produced a result showing smaller poodles to be more violent.
Exercise: Check your understanding
Which of the following types of bias could have contributed to the skewed predictions in the college admissions model described in the introduction?
Historical bias
The admissions model was trained on student records from the past 20 years. If minority students were underrepresented in this data, the model could have reproduced the same historical inequities when making predictions on new student data.
In-group bias
The admissions model was trained by current university students, who could have had an unconscious preference for admitting students that came from backgrounds similar to their own, which could have affected how they curated or feature-engineered the data on which the model was trained.
Confirmation bias
The admissions model was trained by current university students, who likely had preexisting beliefs about what types of qualifications correlate with success in the computer science program. They could have inadvertently curated or feature-engineered the data so that the model affirmed these existing beliefs.
Automation bias
Automation bias might explain why the admissions committee chose to use an ML model to make admissions decisions; they might have believed an automated system would produce better results than decisions made by humans. However, automation bias doesn't provide any insight into why the model's predictions ended up being skewed.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-12-03 UTC.