Artificial Intelligence (AI) Resource Guide

Evaluating AI Output

EVALUATING OUTPUT (from Boston College Libraries, Generative AI Guide)

Here are some questions to consider when evaluating output from generative AI tools:

Date: When was the information created? Has it been updated?
Authority: Who created the information? What is their authority and what are their credentials? What is their point of view? What possible biases might they have?
Purpose: What is the information source’s purpose? Why was it created? Who is the intended audience?
Documentation: What sources are cited in this information? If none, is there another way to verify the information?

You might notice that these questions are difficult (or sometimes even impossible) to answer when using generative AI tools. You will have to decide how this affects whether and how you use the information these tools provide.

UNDERSTANDING BIAS IN LARGE LANGUAGE MODELS (LLMs)
Beyond bias on individual points of fact, large language models reflect the language, attitudes, and perspectives of the people who created their training data. The style of language, the types of thoughts expressed, and even the conclusions an LLM reaches reflect those creators, not a generic "universal" human.

 “White supremacist and misogynistic, ageist, etc., views are overrepresented in the training data, not only exceeding their prevalence in the general population but also setting up models trained on these datasets to further amplify biases and harms.”

(from Emily M. Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell, “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?”, p. 613.)

This is true of gender and other demographics, and also of geography: the vast majority of training data comes from the Global North.

The image below, from a recent paper, shows the locations of place names found in the Llama-2-70b LLM. Europe, North America, and Asia are fairly well represented, while Africa and South America are nearly absent. As a result, the language model may reflect the attitudes and cultural assumptions of people in those areas far more consistently.

[Figure: Estimated real-world locations of millions of place names represented in the Llama-2-70b LLM]

Gurnee, W., & Tegmark, M. (2023). Language Models Represent Space and Time (arXiv:2310.02207). arXiv. https://doi.org/10.48550/arXiv.2310.02207