In an era where machine learning (ML) technologies are becoming more prevalent, the ethical and operational issues surrounding them cannot be ignored. Here’s how we tackled this challenge:
💡 The Problem: ML models often don’t perform equally well for underrepresented groups, placing vulnerable populations at a disadvantage.
🌐 Our Solution: We leveraged web search and generative AI to improve the robustness of discriminative ML models and reduce their bias.
🔍 Methodology:
1. We identified weak decision boundaries for classes representing vulnerable populations (e.g., female doctor of color).
2. We constructed search queries for Google and generated text prompts for creating images with DALL-E 2 and Stable Diffusion.
3. We used these new training samples to reduce population bias.
📈 Results:
1. Achieved a significant reduction (77.30%) in the model’s gender accuracy disparity.
2. Enhanced the classifier’s decision boundary, resulting in fewer weak spots and better class separation.
🌍 Applicability: Although demonstrated on vulnerable populations, this approach is extendable to a wide range of problems and domains.
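The post above reports its headline result as a percentage reduction in gender accuracy disparity. The sketch below illustrates how such a metric can be computed; the group names, accuracy numbers, and function names are all hypothetical placeholders, not the study's actual data or code.

```python
# Hypothetical illustration of an accuracy-disparity metric and its
# reduction after data augmentation. All numbers here are made up.

def accuracy(preds, labels):
    """Fraction of correct predictions."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def accuracy_disparity(acc_by_group):
    """Gap between the best- and worst-served demographic groups."""
    return max(acc_by_group.values()) - min(acc_by_group.values())

# Before augmentation: the model is weaker on the underrepresented group.
before = {"group_a": 0.92, "group_b": 0.70}
# After adding searched/generated training samples for the weak class.
after = {"group_a": 0.91, "group_b": 0.86}

d_before = accuracy_disparity(before)
d_after = accuracy_disparity(after)
reduction = (d_before - d_after) / d_before * 100
print(f"Disparity reduced by {reduction:.2f}%")
```

With these illustrative numbers the disparity shrinks from 0.22 to 0.05, roughly a 77% relative reduction; the real study's 77.30% figure comes from its own models and data.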
Author: Chirag Shah, Professor of Information Science, University of Washington
The prominent model of information access before search engines became the norm – librarians and subject or search experts providing relevant information – was interactive, personalized, transparent and authoritative. Search engines are the primary way most people access information today, but entering a few keywords and getting a list of results ranked by some unknown function is not ideal.
A new generation of artificial intelligence-based information access systems, which includes Microsoft’s Bing/ChatGPT, Google/Bard and Meta/LLaMA, is upending the traditional search engine mode of search input and output. These systems are able to take full sentences and even paragraphs as input and generate personalized natural language responses.
At first glance, this might seem like the best of both worlds: personable and custom answers combined with the breadth and depth of knowledge on the internet. But as a researcher who studies search and recommendation systems, I believe the picture is mixed at best.
AI systems like ChatGPT and Bard are built on large language models. A language model is a machine-learning technique that uses a large body of available texts, such as Wikipedia and PubMed articles, to learn patterns. In simple terms, these models figure out what word is likely to come next, given a set of words or a phrase. In doing so, they are able to generate sentences, paragraphs and even pages that correspond to a query from a user. On March 14, 2023, OpenAI announced the next generation of the technology, GPT-4, which works with both text and image input, and Microsoft announced that its conversational Bing is based on GPT-4.
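The core idea described above, predicting which word is likely to come next given the words so far, can be illustrated with a toy counting model. Real large language models use neural networks trained on billions of tokens; this sketch just tallies bigram frequencies in a tiny made-up corpus.

```python
# Toy "next word" predictor: count which word follows which in a corpus.
# This is a deliberately simplified stand-in for what LLMs learn at scale.
from collections import Counter, defaultdict

corpus = (
    "the cat sat on the mat . "
    "the cat chased the mouse . "
    "the dog sat on the rug ."
).split()

# For each word, count how often each other word immediately follows it.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the most frequent next word observed after `word`."""
    counts = following[word]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))  # "cat" follows "the" most often in this corpus
```

Generating a sentence is then just repeating this prediction step, feeding each predicted word back in as the new context; scaling that loop up, with probabilities learned by a neural network instead of raw counts, is what lets these systems produce whole paragraphs.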
Thanks to training on large bodies of text, fine-tuning and other machine learning-based methods, this type of information access technique works quite effectively. The large language model-based systems generate personalized responses to fulfill information queries. People have found the results so impressive that ChatGPT reached 100 million users in one-third of the time it took TikTok to get to that milestone. People have used it not only to find answers but also to generate diagnoses, create dieting plans and make investment recommendations.
“Even then, people are still finding things that are biased, and to some people, they are unable to distinguish this information because the source cannot be validated.” – Chirag Shah
Discusses the impact of AI on academic integrity, with a focus on ChatGPT.
Our guest is Chirag Shah, Ph.D. Chirag is a Professor of Information and Computer Science at the University of Washington. He is the Founding Director of InfoSeeking Lab and Founding Co-Director of the Center for Responsibility in AI Systems & Experiences (RAISE). He works on intelligent information access systems focusing on fairness and transparency.
By Professor Chirag Shah, UW Information School Monday, February 13, 2023
Artificial intelligence platforms such as ChatGPT have caught the attention of researchers, students and the public in recent weeks. For this dean’s message, I have invited Chirag Shah, an Information School professor and expert in AI ethics, to share his thoughts on the future of generative AI.
— Anind K. Dey, Information School Dean and Professor
ChatGPT has caused quite an uproar. It’s an exciting AI chat system that leverages huge amounts of text processing to provide short, natural-sounding responses as well as complete complex tasks. It can write long essays, generate reports, develop insights, create tweets, and provide customized plans for various goals from dieting to retirement planning.
Amid the excitement about what ChatGPT can do, many have quickly started pointing out issues with its usage. Plagiarism and bias are the most immediate concerns, and there are many long-term challenges about the implications of such technology on educational processes, jobs, and even human knowledge and its dissemination at a global scale.
We have entered a new era in which systems can not only retrieve the information we want, but generate conversations, code, images, music and even simple videos on their own. This is powerful technology that has the potential to change how the world works with information, and as with any revolutionary technology, its benefits are paired with risk and uncertainty.
Traditionally, we have had two types of systems to access information: direct and algorithmically mediated. When we read newspapers, we are accessing information directly. When we use search engines or browse through recommendations on Netflix’s interface, we are accessing algorithmically mediated information. In both categories, the information already existed. But now we are able to access a third type: algorithmically generated information that didn’t previously exist.
There could be great benefits to having AI create information. For example, what if an author working on a children’s book needed an illustration where astronauts are playing basketball with cats in space? Chances are, no system could retrieve it. But if the author makes a query to DALL-E, Imagen, or Stable Diffusion, for example, they will get a pretty good response that is generated rather than retrieved.
Generated information can be customized to our given need and context without our having to sift through sources. However, we have little understanding of how and why the information was provided. We can be excited about an all-knowing AI system that is ready to chat with us 24/7, but we should also be wary about being unable to verify what the system tells us.
What if I asked you which U.S. president’s face is on the $100 bill? If you said “Benjamin Franklin,” then you fell for a trick question. Benjamin Franklin was a lot of things — a Founding Father, scientist, inventor, the first Postmaster General of the United States — but he was never a president. So, you’ve generated an answer that doesn’t or shouldn’t exist. Various pieces of otherwise credible information you know, such as presidents on dollar bills and Benjamin Franklin as a historical figure, gave you a sense of correctness when you were asked a leading question.
Similarly, algorithmically generated information systems combine sources and context to deliver an answer, but that answer isn’t always valid. Researchers are also concerned that these systems invariably can’t or won’t provide transparency about their sources, reveal their processes, or account for biases that have long plagued data and models in AI.
Big tech companies and startups are quickly integrating such technologies, and that raises many pressing questions. Will this be the new generation of information access for all? Will we or should we eliminate many of the cognitive tasks and jobs that humans currently do, given that AI systems could do them? How will this impact education and workforce training for the next generation? Who will oversee the development of these technologies? As researchers, it’s our job to help the public and policymakers understand technology’s implications so that companies are held to a higher standard. We need to help ensure that these technologies benefit everyone and support the values we want to promote as a society.
Oh, and if that Benjamin Franklin question tripped you up, don’t feel bad. ChatGPT gets it wrong too!
The ACM is the world’s largest computing society. It recognizes up to 10 percent of its worldwide membership as distinguished members based on their professional experience, groundbreaking achievements and longstanding participation in computing. The ACM has three tiers of recognition: fellows, distinguished members and senior members. The Distinguished Member Recognition Program, which honors members with at least 15 years of professional experience, recognized Shah for his work at the intersection of information, access and responsible AI.
“I’m incredibly grateful for all the support I’ve received from everyone. It’s a very humbling experience,” said Shah.
Shah has contributed a great deal of research on people-centered information access and on counteracting the biases and discrimination present in information systems. One of Shah’s significant contributions to the iSchool has been co-founding the Center for Responsibility in AI Systems and Experiences (RAISE).
“I am both astounded and unsurprised by this new effort,” says Chirag Shah at the University of Washington, who studies search technologies. “When it comes to demoing these things, they look so fantastic, magical, and intelligent. But people still don’t seem to grasp that in principle such things can’t work the way we hype them up to.”
An artificial intelligence program that has impressed the internet with its ability to generate original images from user prompts has also sparked concerns and criticism for what is now a familiar issue with AI: racial and gender bias.
And while OpenAI, the company behind the program, called DALL·E 2, has sought to address the issues, the efforts have also come under scrutiny for what some technologists have claimed is a superficial way to fix systemic underlying problems with AI systems.
“The common thread is that all of these systems are trying to learn from existing data,” Shah said. “They are superficially and, on the surface, fixing the problem without fixing the underlying issue.”
Marketing and analytics experts said marketers can choose from a number of off-the-shelf predictive analytics tools with machine learning and AI built in. However, Shah explained that more advanced marketing operations often build their own algorithms and custom tools, seeing it as a way to differentiate their efforts and maximize success for their own organizations. “It almost also becomes a proprietary thing. For many companies, the way they derive their insights is the ‘secret sauce,’” he said.