Information Foraging with Generative AI: A Study of 3 Chatbots

November 12, 2023

Summary: In a study of ChatGPT, Bard, and Bing Chat, users found these tools helpful and trustworthy. They expected these AI chatbots to aggregate information in a concise and specific manner, while fully considering contextual cues.

Generative-AI bots like ChatGPT, Bing, and Bard have reduced the amount of work required for users to find and gather information. Instead of doing all the work of information foraging (which involves looking through the many web pages available to them and picking up bits of information as they go), people can now use a bot to provide them with a precise, often concise answer.

However, these chatbots are currently far from perfect: people have a difficult time articulating their prompts and getting the output they desire. In addition, the bot can sometimes “forget” the context and provide broad or irrelevant answers, making task completion tedious.

To understand how users take advantage of these tools, their attitudes towards them, and their usage patterns, we ran a diary study following AI-chatbot users over a period of 2 weeks. In this article, we report on some general findings from this study.


Our Research

We ran a diary study with 18 participants:

8 used the newest version of ChatGPT (4.0 at the time of the study, but a few conversations were accidentally logged with GPT 3.5)

5 used Google’s Bard

5 used Bing Chat

The participants had various levels of experience with the chatbots: some had used them before, some had used one bot but tested another in the study, and others had heard about them but had not used them. They logged all their conversations with the bots over a period of 2 weeks. At the end of the diary study, 14 participants were invited for in-depth interviews. The study was conducted in May and June 2023.

Participants logged a total of 425 conversations with the bots. On average, each participant recorded 23.61 conversations. Most conversations (83.29%) were carried out in participants’ homes. The device of choice was a desktop computer — but that was likely due to the setup that we imposed for the diary study (participants had to install a browser extension to record their conversations).

Types of Conversations

Participants engaged in a wide range of conversations with the bots. Some of the conversations were meant to test the bots and explore their limits. But most conversations were directed toward an informational goal. (Below we provide links only to those conversations that do not contain information that could be traced back to the participants.)

Even though the range of the conversations was quite large, most of the conversations (75%) involved seeking information. Some of these activities aimed to find a specific fact, others were open-ended and researched a broader topic, and others looked for recommendations for a product or resource.

The next biggest category of conversations (14%) was creation activities. These involved asking the bot to create something new — like a recipe, a meal plan, a color, an image, or an email. Many of these creation activities were about writing text (e.g., emails, resumes, professional letters).

Another fairly common conversation type (8%) was entertainment. In this category, we saw users test the limits of the bots by asking them to do basic arithmetic, create a joke, play games like twenty questions, and argue with them on a topic (e.g., North Carolina is boring).

A small percentage (3.6%) of the activities logged by our participants were task-focused activities: they involved getting the bots to help with a specific problem that they were facing — for example, figuring out how to code something in a specific programming language, getting instructions for setting up a virtual desktop for multiple users, or translating a note from English to Spanish.

Note that the boundaries between these types of activities are quite fluid — for example, an open-ended brainstorming session could be seen as an information-seeking activity or as a creation activity. We chose to categorize as creation only those activities that generated a very specific artifact (such as a poem or a workout plan). Task-specific activities can also be seen as a special type of creation activity that involves creating precise instructions.

Note also that we expect to find different categories in different audiences (and that is why we do not attempt to generalize the relative proportions of these activities to the population at large). For example, professionals may primarily focus on creation activities. We also expect that, as people continue to build familiarity with AI bots, entertainment activities will lose appeal or evolve to have a social-companionship role.

Helpfulness and Trustworthiness Scores

For each conversation, participants had to rate whether the chatbot was able to help them with the task and how trustworthy the bot’s answer was.

In general, people were quite pleased with the chatbots and rated their interactions as helpful and trustworthy. On a scale from 1 to 7 (1 = low, 7 = high), the helpfulness score averaged 5.77 across all bots and the trustworthiness rating was 6.00. Not surprisingly, the helpfulness and the trustworthiness scores were correlated (r = 0.68, p < 0.0001).
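For readers who want to run the same kind of analysis on their own chat logs, here is a minimal sketch of how such a correlation can be computed, assuming per-conversation ratings stored in two parallel lists; the numbers below are hypothetical placeholders, not the study’s data.

```python
# Minimal sketch: correlating per-conversation helpfulness and
# trustworthiness ratings. The lists below are hypothetical placeholders,
# not the study's data.
from scipy.stats import pearsonr

helpfulness = [6, 7, 5, 4, 7, 6, 3, 7, 5, 6]      # 1-7 rating per conversation
trustworthiness = [6, 7, 6, 4, 7, 7, 4, 6, 5, 6]  # 1-7 rating per conversation

r, p_value = pearsonr(helpfulness, trustworthiness)
print(f"r = {r:.2f}, p = {p_value:.4f}")
```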

About half of all conversations (53%; 95% confidence interval: 0.48–0.58) were followed up by an action. Sometimes the action involved double-checking the information provided by the bot. In other instances, users acted upon that information (for example, purchasing the product or using a service recommended by the bot, sending an email created with the bot, or even initiating a new conversation based on the previous one); this type of behavior was a direct confirmation of the value provided by the bot.
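The article does not state how these confidence intervals were computed. As an illustration, a standard binomial proportion interval reproduces the reported range; the sketch below uses the Wilson score interval from statsmodels, with the count back-computed from the reported percentage (and therefore approximate).

```python
# Minimal sketch: a 95% confidence interval for the proportion of
# conversations followed by an action (53% of 425). The interval method
# used in the article is not stated; Wilson is one common choice.
from statsmodels.stats.proportion import proportion_confint

n_conversations = 425
followed_up = round(0.53 * n_conversations)  # ~225 conversations

low, high = proportion_confint(followed_up, n_conversations,
                               alpha=0.05, method="wilson")
print(f"95% CI: {low:.2f}-{high:.2f}")  # ~0.48-0.58
```

Applied to the 7.76% wrong-answer rate reported later in the article, the same computation yields an interval close to the quoted 5.56–10.75% range.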

The next sections dissect the reasons behind the helpfulness and trustworthiness ratings.

Bot Helpfulness: Hits and Misses

There were several reasons why people were so satisfied with the bots, but they all boil down to one big theme: the bots shortcut the task of information foraging.

Shortcutting Information Foraging

Information foraging on the web is a process that traditionally has 3 components (with the last component being present in only some tasks):

1. Find potential sources of information (usually as a set of links on a search-engine results page).
2. Evaluate the most promising sources and extract relevant information from them.
3. Aggregate all the found information into a coherent answer that responds to the user’s need.

All these operations normally have a high interaction cost.
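To make these components concrete, here is a minimal conceptual sketch of the pipeline; every function, type, and URL in it is a hypothetical placeholder named to mirror the three steps above, not an actual implementation.

```python
# Conceptual sketch of web information foraging; all names are hypothetical.
from typing import List

def find_sources(query: str) -> List[str]:
    """Step 1: gather candidate sources (e.g., links on a results page)."""
    return [f"https://example.com/result{i}?q={query}" for i in range(10)]

def extract_relevant(source: str, query: str) -> str:
    """Step 2: evaluate a promising source and extract the relevant bits."""
    return f"fact about '{query}' from {source}"

def aggregate(snippets: List[str]) -> str:
    """Step 3 (present in only some tasks): combine findings into one answer."""
    return " ".join(snippets)

# The user pays an interaction cost at each step; a generative-AI bot
# collapses all three into a single query-to-answer black box.
query = "what to check when buying a used car"
answer = aggregate([extract_relevant(s, query) for s in find_sources(query)[:3]])
print(answer)
```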

Search engines help users with the first two parts of information foraging: they rank the different pages and, more recently, may present a short excerpt from a page in the form of a featured snippet. For many queries, the featured snippet will include exactly the answer sought by the user, and the user will leave the search-results page without clicking on any link (a phenomenon known as good abandonment).

Generative AI bots, however, are able to do all that, and much more. With them, the steps of information foraging become a black box: the user presents a query and gets a coherent answer based on multiple sources of information, often without knowing exactly where that information comes from. This eliminates the need to scour many sites, scan pages and pages of information, and extract the relevant bits. All this work is done in the background by the AI bot and translates into a huge efficiency bonus for users.

As a result, several participants appreciated that the bot had saved them time:

[Bard] wrote a very concise and simple email that gets the job done and saved me time.

My general thought of the whole program was that [ChatGPT] was quick and it was accurate.

Bard has […] Google search engine to go and play with and can research multiple databases simultaneously within seconds to go and pull information. For me, it is an incredible time saver, much like an assistant.

I have been completely amazed with the AI [Bing] over the last two weeks and it will become part of the daily work effort now. There are so many things that I can use it to assist me during work to make me more efficient and productive.

However, people’s emotional reactions to this improved efficiency were mixed. Some participants were concerned that the bots may present a threat to their jobs:

I was genuinely surprised and all at once threatened by how quickly the AI [ChatGPT] was able to digest the information on GIF Stickers and their usage.

Other participants recognized that this efficiency would change the nature of their jobs, but in a positive and exciting way:

I’m a graphic designer by trade, you know? […] I definitely can see some things that I do for my job that would be time-consuming, are now just […] done in seconds [with ChatGPT]. So, I don’t see that as necessarily challenging to my job. Like, it’s gonna take away a job. It just means that you’re gonna have to pivot where you’re using your time and you’re gonna say, ’well, I’m not gonna need to waste my time on this stuff necessarily in the future,’ and I’ll just be able to focus my time on something different.

The component of information foraging that most benefits from AI bots is aggregation, which is usually present in research activities. Such activities typically involve combining information from multiple sources and extracting the essential.

Several participants noticed and commented on this aspect:

It’s wild to me how much information [ChatGPT] has — like, it’s able to combine two things.

So, ChatGPT is able to […] look at a really wide range of information and then narrow it down to exactly what I need, which is useful.

[ChatGPT] was able to get me up to speed on a company quickly and consolidate information I have or could be available to me in one place with one streamline of thoughts. This is much more effective than me searching the internet and finding multiple perspectives on the same topic.

It’s a good way to make information concise… [Bard] gave me a table or a concise way to view all that information.

And […] instead of having to go through all of […] the webpages to get the information, [ChatGPT] would just give it back to me.

Instrument for Learning

In many situations, bots helped users know what they didn’t know and taught them about the structure of the information space they were exploring. Bots’ responses often contained facts or terminology that were completely new to users. These answers inspired participants to ask clarification questions (or query refinements – see below) and explore different avenues, as illustrated by the following quotes:

[ChatGPT] gave me an overview and then subsequent questioning was saying here are some other things that are related to that and I didn’t consider it… I would go and utilize this new line of questioning and ask the AI what it could give me about that. So as far as searching through, or developing threads in research, that’s incredibly useful.

I feel confident that ChatGPT will be able to hold my hand throughout this project to verify data, provide ideas for additional data sources to make more accurate predictions, and strengthen my data model […] [it] could help me ‘analyze the data and identify trends or patterns that might be useful in making an informed prediction.’

The bots thus acted as an instrument for discovery and learning in unfamiliar domains, enriching people’s understanding of a subject and helping them acquire nuance.

In some cases, the bots were able to correct human errors or misconceptions and steer users on the right path. For example, when one participant asked ChatGPT to write a 70-page story for children ages 2–7, the bot replied that that book would be too long for a child that age and gave a shorter one instead. The participant said:

[I was] blown away at the advice given for my prompt. [ChatGPT] AI recognizing my two requirements were in conflict and resolving it. The book is good too!

Some thought the bots allowed them to explore unfamiliar topics without time-consuming training:

I find the coding aspect even more [empowering] because […] I can translate things in languages I don’t understand or just […] get a good head start on things I don’t really understand. And technology moves fast.

Pain Points

Despite the high scores for helpfulness, there were several situations that generated low ratings. Sometimes these were caused by occasional bugs (e.g., when the bot did not provide a response or took a long time to respond); but most issues arose when the bots were not able to execute information foraging to the level expected by the user.

Too Broad, Vague, or Obvious Information

People were unhappy when the bot provided information that felt generic or too obvious. For example, one user was annoyed when ChatGPT gave high-level, general information in response to the question of how to find entry-level work.

A different participant was disappointed that Bing provided very broad, easy-to-guess answers to the question of how to get better at chess. He said:

I feel like this chat got a lot of bullet points with little detail. I started off asking ‘How can I get better at chess?’ The answer was: learn the rules, play a lot, practice, and study. All good things to do, but no help in how to do them […] [Bing] provided [suggested] questions that I thought were good and would get a little deeper: ‘How do I study the end game?’ and ‘How do I study the opening?’ They both returned the same answer just swapping out “endgame” for “opening,” again just telling me to study and practice with no further detail how. These results did not even include links or videos pointing me in the right direction.

Because crafting prompts for bots takes more time than typing a query in a search engine, users have higher expectations for these tools than for search engines. They want answers specific to their particular situations. When AI chatbots fail to outperform search engines, users are disappointed.

For example, one participant asked ChatGPT about things to look for when buying a 1991 Ford F150 from a secondhand buyer. The bot offered general tips that were valid for buying any old car, instead of providing the user with the specific issues that may have been typical for that car. The participant rated the vague response provided by the bot as the most frustrating conversation:

I was really hoping that it was going to provide me with specifics for the car, and it just gave me […] a general overview of what you should do when looking at any used car. […] I was not really impressed with this because I could Google what to look for in a used car and it would give me probably this and more. I was hoping it would give me […] model-specific things to look for.

Participants also assigned low ratings to conversations where the bot seemed to avoid a controversial topic:

I wanted to know more about the Week in Virology podcast with people who volunteered to let themselves be infected with COVID. […] Bing was a wee bit elusive in answering.

Bing seems to be guarded against anything to do with COVID or the economy.

I asked Bing to write me a song in the tone of Sir Mix-a-Lot […] but Bing does not, and says, ‘well, this is kind of offensive material’ […] I wanted to take a risqué format, like Two Live Crew and Sir Mix-a-Lot, and see if it would soften it […] And in fact, it could not, was guarded, […] it didn’t facilitate that at all.

Participants expected the bot to ask additional questions to refine their queries and be able to generate more specific or nonobvious answers. However, bots did not do so often enough: only 16 out of 425 conversations (3.76%) contained a bot-initiated query refinement. (Of course, many queries did not require a query refinement.)

I started with a very bland statement “I need a book recommendation” and [Bing] returned the top 8 books for 2023 according to Vogue. That seems like an okay place to start, but I feel like it should provide followup questions to me to help narrow down or refine my search instead of just the [suggested] followup questions to ask it.

I didn’t really know how to start this chat, so I tried ‘Should I buy life insurance?’ [Bing] answered with a very basic ‘if you have dependents yes, if not no’ and an ad for AARP, which I believe is only for seniors. I am interested in life insurance so I clarified, ‘should I buy term or whole-life insurance?’ It explained the basic difference between the two but provided no help in deciding or really comparing. I did like the provided followup question asking what is the cost of the two. This was a more detailed answer that gave an actual scenario and estimate. It also finally provided some links of places to read more about what I was looking for. It then provided the followup question ‘how do I choose between term and whole life insurance?’ which is really what I was looking for out of this. Again, it basically said ‘it depends’ and to consult someone.

A complaint that was specific to Bing was that it completed only the first part of the information-foraging process. Instead of providing an answer, it showed links to sites that contained the answer, thus forcing the user to do the work of extracting the information from those sources:

Bing provides resources, not answers. The burden of being trustworthy is lifted off of Bing the moment they present a resource instead of an answer.

While it did pull links to Nashville event pages, I don’t feel that Bing presented anything more than just typing ‘things to do in Nashville this weekend’ into a search bar. Additionally, when I asked about free events, it did pull up a concert in Centennial Park as an option; however, when I followed the link, that event was not easily visible on the page.

Forgetting or Ignoring Context

Occasionally, the bots forgot the larger context of a conversation, which the user had included in an early prompt. For example, a person who was looking for easy hamburger side dishes complained:

When I clarified I was looking for a vegetable and a starch I received the same response, and then a few additional options that […] I thought were too complicated. I am not looking to make a casserole tonight. I then tried to clarify I was looking for something easy, and [Bing] forgot I was looking for hamburger side dishes. It searched for and responded with easy recipes like protein pancakes and ice cream.

Another user was trying to gather information about starting a laundry service. When, later in the conversation, she asked, “What is the going price per pound?” Bard responded with the price for scrap metal.

Sometimes the bot completely ignored the user’s context. For example, even though Bing had access to the user’s current location, it was not able to find movies playing in that area:

I asked it to find movies playing near me. We are visiting so I figured, with [current] location on, it would pick up the local theater; it did not.

Irrelevant Information

Participants were annoyed when the chatbot provided more information than necessary or information they felt did not match their interests:

I did use the Thumbs Down [feature], because the [ChatGPT] chatbot continued to give me suggestions of shows that I said I was not interested in.

Sometimes the source of annoyance was that the bot was not able to pick up on implicit cues and stop the conversation.

For example, a participant had started a conversation about why Taylor Swift wants her fans to be obsessed with her. After a few exchanges in which ChatGPT explained the unhealthy obsessions with celebrities and related them to social media, the participant typed, “Social media has overtaken all aspects of most things.” She meant this statement to confirm what the bot had said (the equivalent of “yes, you’re right” in a conversation with a human) and end the conversation. But, instead, the bot launched into an explanation of the social-media impact on society. The participant said:

I don’t need that information after I just wrapped it up and said, yes, social media has overtaken all aspects of most, most things. Then I gave a thumbs down, the response was unnecessary. It was very unnecessary. You don’t need to explain what and how social media has impacted many aspects of society. I didn’t need a followup for that. I was just agreeing with what they, you know, they had said.

Notice the use of they to refer to the bot. The bot had set the expectation that it’d behave like a human, and the participant was disappointed when it did not display a human level of subtlety.

Trusting Without Double-Checking

In general, as reflected by the high trustworthiness ratings, people believed in the answers provided by generative AI. They based their judgments of trustworthiness mostly on common sense and on their prior knowledge of the topic:

I have never been to these cities before and honestly do not know anything about them so this itinerary was very helpful. [ChatGPT] gave a description of each restaurant plus popular dishes. […] I believe that these restaurants are all rated highly on various websites. I am visiting friends who live out there, so I am curious to get their take on how popular these recommendations are.

Being close to this topic, I can validate that this information [provided by ChatGPT] seems valid, and nothing seems out of line or incorrect.

Only 22.43% of the conversations were followed up by a verification of the information provided by the bots. People tended to double-check the bots’ answers in the following situations:

The stakes were high. For example, users would check the answer if they needed to include it in a presentation at work.

They were a new AI-bot user. Many users tended to become more trusting as they used the bot more.

They suspected the bot might struggle with the prompt. Some users would double-check the answers for queries that they thought the bot was likely to mess up. This was especially true for experienced users. For instance, one participant mentioned that she had learned not to trust location information provided by the bot and tended to double-check only those answers.

The low verification rate was likely because participants considered most answers to be correct. According to participants’ judgment, the bot came up with the wrong answer in only 7.76% of their conversations (95% confidence interval: 5.56–10.75%). (It is hard to know, however, if this estimate reflects the actual number of inaccuracies, since we did not attempt to verify the bots’ answers.) Sometimes the bot displayed products or items that were not available on the web.

When I asked for suggestions for 3-bedroom houses, I was given 3 suggestions [by Bard]. None of them led me to the actual rental that was pictured.

[Bard] gave me product suggestions that I could buy online but all the stores that it said you could find the perfumes at did not exist or were closed.

I didn’t get the desired results. [Bing] gave me old concert information and where I could go to watch old concert videos. It wasn’t what I asked for and it didn’t deliver the desired results.

One participant wanted to find information about the Great Wolf Lodge waterpark, but Bing directed her to the corporate website instead of the park’s site. The participant spent more than one minute there before realizing it was the wrong site.

[Bing] did give me the information I was looking for, BUT it did not give me the right link in the beginning. I had to use other provided links after I wrote out the wrong links. It took me to their corporate page and that’s not what I wanted. However, I guess I could have changed my wording but I feel like how many people are really looking for the corporate information of Great Wolf Lodge, the waterpark.

Trusting without verifying can be dangerous, since the bot has the tendency to provide incorrect information that sounds plausible. It is, however, unsurprising when we consider what we already know about humans’ information-seeking behaviors. For example, we know that, in many cases, users have limited research skills and tend to be happy with the suboptimal results they get from search engines. In our research of search behaviors, we found that users clicked through to the second page of the search results in only 2% of the cases (an example of Google gullibility).

AI Tools Should Provide Easy Verification

Checking answers has a high interaction cost, which, in many cases, negates the advantage provided by the bot. To verify an answer, users will have to go through at least part of the traditional, web-based information foraging: locate some good sources and look up the bot’s answer through them. It’s natural for people to do so only if it’s really important for them to get the answer right.

Thus, generative-AI tools should help users double-check the information source quickly and easily.

At the time of the study, Bing was the only tool that facilitated answer verification by listing where it had found the information provided in the answer. (Just a few days before this article was published, Bard introduced a new feature to support answer verification.) However, that process was still fairly tedious: users would have to visit the provided links, scan through the text, and then come back. Some were worried that they would lose their answer while doing so.

Participants used the credibility of the referenced sources as a supplemental way to assess how trustworthy the answer was. For example, when one participant who had searched for job recommendations did not see LinkedIn among the sources, she immediately doubted the trustworthiness of that specific answer.

How AI Bots Can Be Improved

One of the main advantages of generative-AI bots like ChatGPT, Bing, and Bard is that they substantially diminish the interaction cost of information foraging by aggregating knowledge from various sources and presenting the essential or, sometimes, even a solution for a specific problem. They can also help users learn a new domain and open new avenues for exploration. Participants in our study found these tools very helpful and tended to rely on the answers provided by them. They used these tools primarily for a range of information-seeking activities, from fact-finding to research and recommendation.

However, the current bots can further perfect the task of information foraging by:

Always providing information that takes into account the full context of the conversation and of the user

Intelligently aggregating the information present in the sources as opposed to just pointing to sources

Asking questions to narrow down the query when the answer is too broad

Warning users of unsupported or poorly supported information

Helping users easily check on the accuracy of the presented information

They could even go one step further to help users in the task of creating prompts, so that they get the best possible answers.

Including such tools in domain-specific software (such as UX or graphic-design tools) and making it easy for people to access them will empower users. But, to be useful, these tools should be more powerful than a traditional search engine: whenever answers are generic, do not take into account the user’s context, or require the user to do the work of aggregating information from multiple sources, users are disappointed.
