Addressing AI's Hidden Agenda
What are the sources and significant examples of implicit bias in artificial intelligence? And what can be done to minimize their consequences when establishing acceptable data practices?
When we invoke the implicit bias of artificial intelligence, we are recognizing that machine learning reflects the unconscious attitudes and stereotypes that influence managerial understanding, actions, and decisions in both repetitive and exceptional situations. Although this bias can potentially be either favorable or unfavorable, these implicit evaluations or beliefs can severely limit an organization's ability to leverage Data Science objectively. There are four potential sources of implicit bias in Artificial Intelligence: in the data, in the algorithms, in our own logic, and in our definitions of ethical behavior. Let's explore each in turn.
Machine learning involves specifying a data set that is used to train and test the algorithm that will subsequently be used in real-world conditions. Although, as a rule, the more data an organization collects to train the model, the better the results, several sources of implicit bias can significantly compromise the model's pertinence in practice. Sample bias occurs when the data used to train the model does not accurately represent the context, or problem space, in which the model will operate. Prejudicial bias occurs when the content of the training data is influenced by stereotypes or prejudice present in the population. Finally, measurement bias results from faulty measurement that systematically distorts the data.
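To make sample bias concrete, the following sketch compares the demographic mix of a training set against known population shares using a chi-square goodness-of-fit test. The group labels, counts, and population shares are hypothetical, chosen purely for illustration.

```python
# Sample-bias check: does the training set's demographic mix match the
# population the model will serve? All figures here are hypothetical.
from scipy.stats import chisquare

# Observed counts per group in the training data.
observed = {"group_a": 6200, "group_b": 2100, "group_c": 1700}

# Reference share of each group in the target population
# (in practice taken from census or customer records).
population_share = {"group_a": 0.45, "group_b": 0.35, "group_c": 0.20}

total = sum(observed.values())
f_obs = [observed[g] for g in observed]
f_exp = [population_share[g] * total for g in observed]

stat, p_value = chisquare(f_obs, f_exp)
if p_value < 0.01:
    print(f"Training mix deviates from the population (p={p_value:.2e}): "
          "possible sample bias.")
```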
Twitter streams have provided both a common resource for sentiment analysis of consumer opinions and a telling example of sample bias. Twitter is a logical choice for text mining when running algorithms like SVM and Naive Bayes, for the resulting data set is easily accessible, cost-effective, and includes a wide variety of consumer profiles. Yet this choice captures the opinions of a particular set of technology users, rather than a representative sample of the general population. It will miss people who do not have access to a smartphone (nor ready access to a computer), not to mention those who shun this social media channel. The test data will also likely include messages from the multitude of automated user accounts that have flooded the medium, as well as retweets of bots programmed to take polarizing positions on each issue.
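A minimal version of such a sentiment pipeline, using scikit-learn's multinomial Naive Bayes, might look as follows. The tweets and labels are invented placeholders; a production pipeline would also need to clean the text, handle retweets, and filter suspected bot accounts.

```python
# Bag-of-words sentiment classification with Naive Bayes (toy example).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

tweets = [
    "love the new update, works great",
    "worst release ever, app keeps crashing",
    "really happy with the battery life",
    "terrible support, still waiting for a reply",
]
labels = ["positive", "negative", "positive", "negative"]

# Word counts feed a multinomial Naive Bayes classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(tweets, labels)

print(model.predict(["the update is great"]))  # likely ['positive']
```

Whatever the classifier, the sample-bias problem remains: the model learns the opinions of the people (and bots) present in the stream, not those of the market as a whole.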
The algorithms at the heart of machine learning will never be better than the data used to train and test them.
A second source of implicit bias is found in the way algorithms are constructed to explain or predict real-world phenomena. For the Data Scientist, bias, along with variance, describes an algorithmic property that influences prediction performance. Since bias and variance are interdependent, data scientists are forced to seek a balance between the two. Models with high bias oversimplify the underlying relationship and underfit the data; models with high variance tend to fit the training data better but may not generalize well to data outside the training set. Finding the appropriate balance between the two for a given use case is by definition arbitrary, and often opaque to the organizational decision-makers relying on these algorithms.
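The trade-off is easy to demonstrate: fitting polynomials of increasing degree to noisy data, an underfit (high-bias) model and an overfit (high-variance) model both perform worse on held-out data than a balanced one. The data-generating function, noise level, and degrees below are arbitrary choices for illustration.

```python
# Bias-variance trade-off: compare training and held-out error as model
# complexity (polynomial degree) increases.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, 200).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.3, 200)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for degree in (1, 4, 15):  # underfit .. balanced .. overfit
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_err = mean_squared_error(y_train, model.predict(X_train))
    test_err = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree {degree:2d}: train MSE {train_err:.3f}, "
          f"test MSE {test_err:.3f}")
```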
Equivant's COMPAS software is widely used for bail and sentencing decisions in many criminal justice systems. This proprietary software relies on an algorithm that predicts the likelihood of repeat criminality. It is also a well-documented example of algorithmic bias. In early 2016, the news organization ProPublica published a study demonstrating the racial bias of the algorithm: it systematically overestimates the criminal intentions of black defendants, while underestimating the recidivism of their white counterparts. Although COMPAS' algorithms are not open to public scrutiny due to the software's proprietary status, the algorithm is apparently twice as likely to misclassify black defendants, and white recidivists are misclassified 63.2 percent of the time.
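The kind of audit ProPublica performed can be sketched as a group-wise comparison of error rates: a model that is "accurate" overall can still distribute its mistakes unevenly across groups. The data below is fabricated for illustration only; it is not the COMPAS data set.

```python
# Group-wise error audit: false positive and false negative rates per group.
# 'high_risk' is the model's prediction, 'reoffended' the observed outcome.
import pandas as pd

df = pd.DataFrame({
    "group":      ["a", "a", "a", "a", "b", "b", "b", "b"],
    "reoffended": [0,   0,   1,   1,   0,   0,   1,   1],
    "high_risk":  [1,   0,   1,   0,   0,   0,   1,   1],
})

for group, g in df.groupby("group"):
    fpr = ((g.high_risk == 1) & (g.reoffended == 0)).sum() / (g.reoffended == 0).sum()
    fnr = ((g.high_risk == 0) & (g.reoffended == 1)).sum() / (g.reoffended == 1).sum()
    print(f"group {group}: false positive rate {fpr:.0%}, "
          f"false negative rate {fnr:.0%}")
```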
A third source of bias can be found in the way in which human beings induce and deduce logical conclusions from the data they see. A cognitive bias is a systematic pattern of deviation from norm or rationality in judgment. Some cognitive biases are presumably adaptive, and may influence perceptions of similarity, judgment, memory, and action. Although cognitive biases can lead to faster, more effective action in a given context, they often produce misleading representations of personal and business challenges. Certain cognitive biases are a "by-product" of human processing limitations, resulting either from a lack of appropriate mental mechanisms or from a manager's limited bandwidth for information processing.
Confirmation bias, for example, leads managers whose teams have just had a profitable quarter to take the available data as proof that the market is growing, while managers who are struggling tend to interpret the same data as evidence to the contrary. Availability bias describes the cognitive mechanism by which managers privilege data that comes easily to mind when qualifying new information. When making a risk assessment, they tend to weigh new data against similar events that have either occurred in the recent past or carry strong emotional weight. Because the accuracy of managerial judgment is tainted by experience, managers are much more likely to under- or overestimate the probability that an event will occur. This raises the question of whether human rationality is inevitably burdened with cognitive biases, or whether the intrinsic value of managerial decision-making lies in a manager's ability to selectively interpret the outcomes of AI.
A final source of bias comes from human perceptions of ethics. Ethics can be understood as a set of general principles that guide individual and organizational behavior. Ethics reflect our perceptions of what is right and wrong, our beliefs about the nature of our company and its market, as well as our convictions of what is acceptable in terms of business practice. According to the proponents of cultural relativism, there is no singular truth on which to base ethical behavior: our interpretations of the limits of what is acceptable are conditioned by existing social norms, cultural practices, and religious influences. Ethical considerations are often in conflict with our perceptions of self-interest, and as a result, many managers may cross the line without being aware that they are doing anything unethical.
Take, for example, Stanford University's Human-Centered Artificial Intelligence Institute. The University launched the Institute last year based on a vision that "designers of AI must be broadly representative of humanity". Yet when the Institute revealed the 120 faculty and tech leaders partnering on the initiative, over 80 percent were white and almost as many were male. What exactly does this group represent? Are they representative of the different racial, cultural, and intellectual currents of the industry today, or of the population at large? Educated, for the most part, in exclusive business and engineering schools, are they able to relate to local ethical challenges? And if they are indeed representative of today's challenges, should the group's composition be modified to reflect a more desirable diversity?
What can be done to minimize the consequences of the implicit biases of artificial intelligence? One strategy is to raise managerial awareness of the sources of implicit bias in the data, algorithms, and applications used throughout the organization. A complementary strategy involves routinely evaluating use cases and data-driven decisions for evidence of undesirable bias; a minimal sketch of such a check follows below. Identifying the sources of risk, uncertainty, and ambiguity that often haunt managerial decision-making can help ensure that machine intelligence is used where it can be most effective. Senior management can go one step further in identifying and consciously acknowledging acceptable data practices and desired ethical positions. Last, but not least, organizations and professional associations can institute feedback mechanisms to encourage discussion and debate around digital ethics.
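As a concrete starting point for such routine evaluations, the sketch below applies the "four-fifths rule" used in US employment guidelines: if any group's selection rate falls below 80 percent of the highest group's rate, the decision process is flagged for human review. The groups and decisions are hypothetical.

```python
# Routine disparate-impact check on a deployed decision model.
def disparate_impact(decisions: dict) -> float:
    """Ratio of the lowest to the highest group selection rate."""
    rates = {g: sum(d) / len(d) for g, d in decisions.items()}
    return min(rates.values()) / max(rates.values())

# 1 = favorable decision (e.g., loan approved), 0 = unfavorable.
decisions = {
    "group_a": [1, 1, 0, 1, 0, 1, 1, 0],  # 62.5% approved
    "group_b": [1, 0, 0, 0, 1, 0, 0, 0],  # 25.0% approved
}

ratio = disparate_impact(decisions)
if ratio < 0.8:  # below the four-fifths threshold
    print(f"Selection-rate ratio {ratio:.2f}: flag this model for review.")
```

Such a check is deliberately crude; its value lies less in the statistic itself than in forcing an explicit conversation about which disparities the organization is willing to accept.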
This text is part of our contribution to Jay Liebowitz's upcoming book on "Data Analytics and AI". Further thoughts on digital ethics can be found in our ebook on Digital Ethics as well as in our management conferences, courses and summer school at The Business Analytics Institute.