Definitions are hard. Just ask Arvind Narayanan, associate professor of computer science at Princeton, whose aptly titled tutorial 21 fairness definitions and their politics 1 was a highlight at the Conference on Fairness, Accountability, and Transparency (FAT*). In his tutorial, Narayanan discussed the various fairness definitions in the context of machine learning and algorithmic decision-making, as well as the political and ethical implications of these definitions. By exploring 21 different fairness definitions, Narayanan aimed to demonstrate that fairness is a context-dependent, multifaceted concept that often requires careful consideration of ethical and societal values. The tutorial emphasized the importance of understanding the assumptions, trade-offs, and limitations associated with each definition, and he urged designers of algorithms to make informed decisions about which fairness definitions are most appropriate for a particular context.
As we attempt to define ethical AI, it is crucial to identify several core and contextual components. Ethical AI should be explainable, trustworthy, safe, reliable, robust, auditable, and fair, among numerous other aspects. Formal methods and definitions involve the use of accurate mathematical modeling and reasoning to draw rigorous conclusions. The challenge of formally defining explainability will soon become apparent – while there is a formal definition to verify a model’s adherence to differential privacy, quantifying explainability, trust, and ethics proves more nuanced. Consequently, the definitions presented here are imperfect representations of our current understanding of the subject. As taxonomies evolve and underlying semantics shift, we will strive to clarify some of the key terms to provide a clearer picture.
Explainability
Explainability refers to the ability of a machine learning algorithm to provide clear and understandable explanations for its decision-making process. While deep learning has made significant strides in areas such as computer vision and natural language processing, these models are often viewed as “black boxes” because their decision-making process is not always transparent. This lack of transparency can be a significant barrier to the adoption of deep learning models in certain areas, such as healthcare and finance, where the consequences of algorithmic decisions can be significant. As a result, developing methods to explain the reasoning of these models is critical for their wider adoption and success.
Explainability is one of those “-ilities” or non-functional requirements3 – the quality of being explainable, 4 such as being capable of giving the reason for our cause. Explainability, therefore, can be the ability to provide a reason or justification for an action or belief.
In simple terms, we can infer that if an event is explainable, it provides sufficient information to draw a conclusion as to why a particular decision was made. Explainable to whom? To a human. Although it’s preferable if it’s possible, this doesn’t have to be a layperson. Explainable to a subject-matter expert (SME) is fine. The SME themselves can both assure non-expert users and explain to them why a machine made such a decision in a less technical manner. Human understanding is critical. Explainability is mandatory and required by law in certain protected domains, such as finance and housing.
Interpretability
Interpretability is another very closely related concept that is typically used interchangeably with explainability, but there are some subtle differences, which we will discuss shortly. Lipton did a detailed analysis to address model properties and techniques thought to confer interpretability and decided that, at present, interpretability has no formal technical meaning – well, that’s not very helpful. Informally, interpretability directly correlates with understandability or intelligibility (of a model) so that we as humans can understand how it works. Understandable models are transparent or white-box/glass-box models, whereas incomprehensible models are considered black boxes.
For the purpose of this discourse, interpretability is generally seen as a subset of explainability. Interpretability refers to the ability to understand the specific features or inputs that a model uses to make its predictions.
A system can be interpretable if we can find and illustrate cause and effect. An example would be the weather temperature on crop yields. The crop will have an optimum temperature for its highest yields, so we can use temperature as a predictor (feature) in the crop yield (target variable). However, the relationship between the temperature and the crop yield will not be explainable until an understanding of the bigger picture is in place. In the same vein, a model can be transparent without being explainable. For instance, we can clearly see the following prediction function:
Predict(x1, x2) > y′ (1.1)
However, if we don’t know much about hyperparameters x1 and x2:
x1 and x2 (1.2)
which might be a combination of several real-world features, the model is not explainable.
Also, a model can be explainable, transparent, and still biased. Explainability is not a guarantee of fairness, safety, trust, or bias. It just ensures that you, as a human SME, can understand the model.
Explicability
The two terms explainability and explicability may appear the same, but in this context, they do differ. Explicability is the broader term, referring to the concept of transparency, communication, and understanding in machine learning, while explainability refers to the ability to provide clear and understandable reasons for how a given machine learning model makes its decisions.
Explicability is a term typically used in regulations and related governance documents. It literally means “capable of being explained” and it is deemed crucial to build and maintain users’ trust in AI systems by EU Ethical guidelines 5.
Does a safe system have to be explainable? In our opinion, yes, absolutely. While there is an ongoing discussion among researchers on this topic, the first-ever “great AI debate” at the Neural Information Processing Systems (NeurIPS) conference was about how interpretability is necessary for machine learning.
Note
At the time of writing, this debate has moved on. Since the launch of ChatGPT in late 2022 by OpenAI, there has been increasing awareness at governmental levels regarding the importance of AI assurance and regulatory guardrails. It seems likely that an international body overseeing AI regulation will be established. If this does not happen, individual countries and trading groups will establish and govern AI at these levels.
Safe and trustworthy
AI safety is an area that deals with nonfunctional requirements, such as reliability, robustness, and assurance. An AI system is deemed safe and trustworthy if it exhibits reliability, meaning that it acts within the desired ranges of outputs, even when the inputs are new, in and around edge conditions. It also has to be robust, be able to handle adversarial inputs (as shown in Figure 1.1), and not be gullible and easily fooled, providing high confidence predictions for unrecognizable images7.
This debate highlights an ongoing discussion in the machine learning community about the trade-off between performance and interpretability. The participants, Rich Caruana and Patrice Simard, argued that interpretability is essential to understand the reasoning behind machine learning models and ensure their responsible use, while Kilian Weinberger and Yann LeCun argued that performance should be the main focus of machine learning research. Interpretability can sometimes compromise performance and may not be possible in highly complex deep learning models. The participants argued that explainable and interpretable machine learning models are essential to build trust and ensure the responsible use of AI in society (The Great AI Debate – NIPS2017 8).
A safe system should also be auditable, meaning it must be transparent to verify the internal state when the decision was made. This auditability is particularly important within regulated industries, such as health and finance, where those seeking to use AI for given applications will need to always be able to prove to a regulator that the machine learning models underpinning the AI meet the required regulatory standards for AI.
The system and processes used within an enterprise to monitor the internal state of machine learning models and their underlying data must also be auditable. This ensures that tracing back to the AI components is possible, enabling a retrospective review such as root-cause analysis in a reliable manner. Such audit processes are increasingly being codified and built into enterprise MLOps platforms.
Privacy and security are also key components of a safe and trustworthy AI system. User data has specific contexts, needs, and expectations and should be protected accordingly during its entire life cycle.
Stanford Center for AI Safety (http://aisafety.stanford.edu/) focuses on developing rigorous techniques to build safe and trustworthy AI systems and establish confidence in their behavior and robustness. This Stanford Center for AI Safety white paper (https://aisafety.stanford.edu/whitepaper.pdf) by Kochenderfer, et al provides a great overview of AI safety and its related aspects, and it makes for good reading.
Fairness
Fairness in machine learning systems refers to the principle that decisions made by these systems should not discriminate or be biased against individuals or groups based on their race, gender, ethnicity, religion, or other personal characteristics. Fairness is about not showing implicit bias or unintended preference toward specific subgroups, features, or inputs. We mentioned previously a detailed tutorial on 21 fairness definitions and their politics9 at the Conference on Fairness, Accountability, and Transparency 10, but we will adhere to the EU’s draft guidelines, which correlate fairness with ensuring an equal and just distribution of both benefits and costs, ensuring that individuals and groups are free from unfair bias, discrimination, and stigmatization.
Microsoft’s Melissa Holland, in her post about our shared responsibility for AI, 11 defines fairness as follows:
“AI Models should treat everyone in a fair and balanced manner and not affect similarly situated groups of people in different ways.”
Machines may learn to discriminate for of a variety of reasons, including skewed samples, tainted examples, limited features, sample size, disparity, and proxies. This can lead to disparate treatment of the users. As the implicit bias seeps into the data, this can lead to serious legal ramifications, especially in regulated domains such as credit (Equal Credit Opportunity Act), education (Civil Rights Act of 1964 and Education Amendments of 1972), employment (Civil Rights Act of 1964), housing (Fair Housing Act), and public accommodation (Civil Rights Act of 1964). The protected classes that cannot be discriminated against include race (Civil Rights Act of 1964), color (Civil Rights Act of 1964), sex (Equal Pay Act of 1963 and Civil Rights Act of 1964), religion (Civil Rights Act of 1964), national origin (Civil Rights Act of 1964), citizenship (Immigration Reform and Control Act), age (Age Discrimination in Employment Act of 1967), pregnancy (Pregnancy Discrimination Act), familial status (Civil Rights Act of 1968), disability status (Rehabilitation Act of 1973 and Americans with Disabilities Act of 1990), veteran status (Vietnam Era Veterans’ Readjustment Assistance Act of 1974 and Uniformed Services Employment and Reemployment Rights Act), and genetic information (Genetic Information Nondiscrimination Act). In addition to the laws in the United States, there are also international laws aimed at ensuring fairness, such as the European Union’s General Data Protection Regulation (GDPR), which mandates that automated decision-making systems do not lead to discriminatory or unjust outcomes. The Equality Act of 2010 in the United Kingdom prohibits discrimination based on protected characteristics, which encompass age, disability, gender reassignment, marriage and civil partnership, pregnancy and maternity, race, religion or belief, sex, and sexual orientation. These international laws are designed to prevent discrimination and promote fairness in machine learning systems.
In the context of Arvind Narayanan’s tutorial, an example of the incompatibility of different fairness definitions is illustrated using two fairness metrics – statistical parity (P(Yˆ = 1|A = a) = P(Yˆ = 1) for all a ∈ {0, 1}) and equalized odds (P(Yˆ = 1|Y = y, A = a) = P(Yˆ = 1|Y = y) for all a ∈ {0, 1} and y ∈ {0, 1}). These definitions can be incompatible when the base rates of positive outcomes in the two demographic groups are different. In such a scenario, it is not possible to satisfy both definitions simultaneously, as adjusting the algorithm to achieve statistical parity might result in unequal true positive rates and false positive rates across groups, violating equalized odds. Conversely, ensuring equalized odds can lead to a different proportion of positive outcomes between the groups, violating statistical parity. This example demonstrates that satisfying multiple fairness definitions at the same time may not always be possible, highlighting the need for careful consideration of trade-offs and context when selecting appropriate fairness definitions.
In practice, the fairness of an AI system also has a lot to do with accountability – “the ability to contest and seek effective redress against decisions made by AI systems and by the humans operating them.” The EU’s ethics guidelines for trustworthy AI 12 recommend holding the unfair entity identifiable and accountable. The entity accountable for the decision must be identifiable, and the decision-making processes should be explicable.
Ethics
Ethics are at the core of responsible AI development. Ethics in machine learning fairness refers to the set of principles and values that guide the development and use of machine learning systems to ensure that they are just, equitable, and unbiased. This includes ensuring that machine learning models are developed using representative and unbiased datasets, that the features used in a model are relevant and fair, and that algorithms are evaluated for any unintended consequences or biases.
Ethics are defined as “moral principles that govern a person’s behavior or the conducting of an activity” (Oxford English Dictionary 13). The goal of ethics in machine learning fairness is to ensure that these systems are designed and deployed in a way that is consistent with our values, and that they promote the well-being of society as a whole. This includes considering the potential impacts of these systems on different groups of people and ensuring that they do not perpetuate or exacerbate existing inequalities and biases. Morals often describe your particular values concerning what is right and what is wrong. While ethics can refer broadly to moral principles, you often see it applied to questions of correct behavior within a relatively narrow area of activity.
Even though used interchangeably, morals are the individual beliefs about what is right or wrong, while ethics are a set of principles and values that are shared by a group or profession and are intended to guide behavior in a particular context – hence, instead of “moral-AI,” it makes sense to strive and build ethical AI practices to ensure that machine learning systems are designed and deployed in a way that is both technically sound and socially responsible.
In the following sections, you will see several definitions of what constitutes an ethical AI. Despite the growing attention to ethical considerations in AI, there is still no clear consensus on what constitutes “ethical AI.” This lack of agreement is due to a number of factors – the rapidly evolving nature of AI technologies, the complexity of the ethical issues involved, and the diverse range of stakeholders with differing interests and values.
This raises an important question, as posed by Gray Scott, an expert in the philosophy of technology, digital consciousness, and humanity’s technological advancements:
“The real question is, when will we draft an AI bill of rights? What will that consist of? And who will get to decide that?”
Eileen Chamberlain Donahoe, the executive director of the Global Digital Policy Incubator at Stanford University’s Center for Democracy and the first US ambassador to the United Nations Human Rights Council, offers a potential answer to the question of AI ethics and safety standards that are both enforceable and accountable. According to Donahoe, the answer may already be found in the Universal Declaration of Human Rights (UDHR) and a series of international treaties that outline the civil, political, economic, social, and cultural rights envisioned by the UDHR. This perspective has a wide global consensus and could be suitable for the purpose of regulating AI in the short term.
Transparency
Model transparency refers to the ability to understand and explain how a machine learning model works and how it arrived at its predictions or decisions.
Model transparency, explainability, and interpretability are related but distinct concepts in responsible AI. Model transparency refers to the degree of visibility and understandability of a model’s inner workings, including input, output, and processing steps. Model explainability aims to provide human-understandable reasons for a model’s output, while model interpretability goes deeper to allow humans to understand a model’s internal processes. Achieving model transparency can involve methods such as model interpretation, data and process transparency, and clear documentation. While all three concepts are important in responsible AI, not all transparent or explainable models are necessarily interpretable.
Keeping humans in the loop for decision support systems
Imagine the following conversation:
Physician: “We believe the best course of action for you requires surgery, and this may lead to amputation of your leg.”
Patient: “Really? That’s quite bleak, but why?”
Physician: “Because, well, mainly because our treatment algorithm said so!”
As you can imagine, this conversation is unlikely to go smoothly. Without specific details about why surgery is necessary, along with case studies, assurance of potentially high success rates (with caveats, of course), and empathetic human reinforcement, the patient will likely remain unconvinced.
That’s why keywords such as augmentation and support play crucial roles, as they emphasize the importance of human involvement in heavily regulated and human-centric systems. While a model providing recommendations may be acceptable in many situations, it cannot wholly replace human decision-making. The complete autonomy of AI models may be challenging to accept due to potential regulatory, compliance, or legal consequences. It is essential to keep humans in the loop for oversight and reinforcement of correct behavior, at least for now, to ensure that AI is used responsibly and ethically.
Model governance
Model governance refers to the process of managing and overseeing the development, deployment, and maintenance of machine learning models in an organization. It involves setting policies, standards, and procedures to ensure that models are developed and used in a responsible, ethical, and legally compliant way.
Model governance is necessary because machine learning models can have significant impacts on individuals, businesses, and society as a whole. Models can be used to make decisions about credit, employment, healthcare, and other critical areas, so it is important to ensure that they are reliable, accurate, and fair.
The key components of model governance include the following:
- Model inventory and documentation: Keeping an up-to-date inventory of all models in use and their relevant documentation, including details about their data sources, training methodologies, performance metrics, and other relevant information
- Model monitoring and performance management: Monitoring models in production to ensure that they continue to perform as expected, and implementing systems to manage model performance, such as early warning systems and automated retraining
- Model life cycle management: Establishing clear processes and workflows for the entire life cycle of a model, from development to decommissioning, including procedures for model updates, versioning, and retirement
- Model security and data privacy: Ensuring that models and their associated data are secure and protected against cyber threats and that they comply with relevant data privacy regulations, such as GDPR and CCPA
- Model interpretability and explainability: Implementing methods to ensure that models are interpretable and explainable, enabling users to understand how a model works and how it arrived at its output
- Model bias and fairness management: Implementing measures to identify and mitigate bias in models and ensure that models are fair and unbiased in their decision-making
- Model governance infrastructure and support: Establishing an organizational infrastructure and providing the necessary support, resources, and training to ensure effective model governance, including dedicated teams, governance policies, and training programs