Assessing the efficiency of AI Generation against traditional image search

In March 2023, Getty Images announced their collaboration with NVIDIA. I worked with the AI Generation team, which was beginning to explore how creative workflows were changing and how stock content might be at risk. The goal of this study was to understand how efficient AI generation is relative to traditional image search on a stock photography website. In particular, we wanted to understand perceived time and effort, which the business considered key metrics when assessing the risk to stock content.

Completed at Getty Images (9-month contract; remote)

Problem

The product team was grappling with the new risks and opportunities of AI-generated content, and needed to move quickly. They were planning to integrate AI-generated images into an existing stock photography library, but there were many unknowns and challenges to balance. We needed to encourage use of AI generation while protecting sales of stock content. Content creation would also be the first time Getty Images provided a service. The team needed to understand the relative efficiencies of both options, so they could lean into their strengths.

Objectives

1

Compare the processes: Understand how users’ perceptions and efficiency differ between generating an asset with AI and running a traditional search on a stock site.

2

Identify the biggest challenges with AI generation and traditional search, focusing on any adjustments needed for a successful launch.

Method

Participants were given a target asset and asked to search for, and to generate, an image as similar to it as possible. Traditional search was done on iStock; image generation on OpenAI or an AI generation tool of the participant’s choice.

Before the session: A scenario was emailed to help participants understand task context and get into the right frame of mind.

Task 1:  I let the participant choose which method to use first, AI generation or traditional search. This was a good opportunity to capture the hopes and fears they had for each method.

Task 2: The participant completed the same exercise using the alternative option.

Interview: I asked participants about their experience with the platforms. They were asked which method they preferred and how long they thought it would take to edit the image using editing tools such as Photoshop. 

Survey: Participants answered questions on:

  • Satisfaction with the image
  • Satisfaction with the options (traditional search vs AI Generation)
  • Use on a work project with/without editing 
  • Confidence with the platform

Participants completed the surveys while on the video call but were asked not to share their screen.

The target asset that was given to participants: A cropped shot of a wooden coffee table with laptop, smartphone and a cup of tea in the living room at home by the window against beautiful sunlight.

Who we recruited

The initial strategy was to get feedback from people with varied levels of AI experience. This is a quickly changing space, where many users have experimented with AI and some still haven’t.

In a recent survey, we identified early adopters who were already using AI generation for work projects. We used these users to define our ‘experienced’ group – they were mostly from large businesses or agencies and had used it fewer than 20 times. We recruited 15 participants and split them into 3 groups.

I recruited experienced participants via User Interviews, and some novices via the internal participant panel.

Why recruit novices if they weren’t the target audience?

From the survey, we found that 76% of users expected to be using AI generation for work projects within 0-4 years. Based on this rate of growth, it made sense to include novices in the recruitment pool. Even if they weren’t the initial target audience, we needed to understand their needs.

How did we define experienced and novice?

This is a fast-changing landscape, so the language around ‘novice’ and ‘experienced’ had to be clear.

Novices have no experience using AI generation but expect to use AI generation in the next 0-4 years.

Experienced users are already using AI generation for work projects. They’re demonstrating high usage (a few times a week or every day) on more than 20 work projects.

Why add a third group asking participants to compare iStock to their own AI generation tool?

We added the third group once the study had begun, after noticing that even the experienced participants were very unfamiliar with OpenAI. The intention was to see how results would differ when the participant was more familiar with the AI tool. It was also interesting to see which features they’d take advantage of.

Process

From the outset, I knew we needed standardized tasks to make a fair comparison between traditional search and AI image generation.

  • I knew I’d want to follow those tasks with a short interview and survey, to unpack participant perceptions and capture comparable data.
  • We started with what we knew about why users prefer AI generation over stock content.

Triangulation

  • I collaborated on this project with another researcher, Maryam. Maryam ran a diary study on real life uses of AI generation (e.g. use cases and situational factors).
  • I focused on differences from traditional search, in particular the interaction challenges with AI generation.
  • The plan was to triangulate the research. We wanted to build an impression of how AI generation is changing the creative workflow.

Stakeholder Alignment

I conducted a number of workshops and sessions to reach alignment with stakeholders. Several considerations had to be accounted for to refine the testing.

Which stock site should we test with?

I knew we would test on a Getty Images product (iStock, Getty Images or Unsplash) but it surprised me to learn that we wouldn’t be testing on Getty Images.

I raised concerns about the platform differences and how this might introduce variables. This led to discussions with internal Subject Matter Experts.

In the end, we decided to test on iStock. This was a data-led decision; there’s a big overlap between iStock and Getty Images customers, and the content on both sites is similar in many ways.

How much flexibility could we allow in the search process?

There are many ways to search for stock photography, including search-by-image and Google. Allowing for search flexibility would have increased validity, but it would also have introduced variables.

We spoke to stakeholders and clarified their concerns. In the end, we focused on text-based searching on the stock site only. This made the tasks more comparable across participants, and kept us focused on the Getty Images ecosystem.


Which AI generation platform should we test with?

Senior leadership made this decision. My only concern was consistent and easy user access, so we went with OpenAI as it didn’t require user registration.

Should we use a target asset?

The advantage of using a target asset is that it removes variables like the time it takes to come up with an idea. It takes the cognitive load off the participant and allows them to focus on using the generative AI.

It also helped us manage time during the 1-hour session. Participants were given a maximum of 15 minutes to complete each task – many didn’t need that.

What to use for the target asset?

I facilitated multi-disciplinary discussions to identify our image requirements. We wanted the asset to feel equally challenging and doable to find on both iStock and OpenAI; that way, the study wouldn’t end too quickly with one obvious winner, and we’d have more opportunity to learn about participant challenges. After a few pilot sessions we made our decision: we went with the image of the laptop, as the experience on OpenAI and iStock felt similar.

How close should the participant’s copy of the target asset be?

The main purpose of this study was to measure efficiency (e.g. time on task). We asked participants to produce an almost exact match of the image, if possible. In the sessions, some participants spoke of recreating ‘the gist’ of the image; I prompted them to create an exact match where possible.

Read the full research plan.

Considerations for the target asset that I took into pilot testing. The goal was to find an asset that was equally challenging and doable to find on iStock and OpenAI.

Metrics

Above all, we wanted to understand perceived time and effort – the business considered these key metrics when assessing the risk to stock content. I also selected metrics from the System Usability Scale (SUS) that we believed would be associated with adoption.

Performance (Behavioural; quantitative)
  • Task success rate
  • Time on task

Satisfaction (Attitudinal; quantitative)
  • Estimated time to edit (for use on a work project)
  • Use on a work project without further editing (iStock and OpenAI)
  • Perceived similarity to target asset
  • Satisfaction with image (iStock and OpenAI)
  • Satisfaction with platform (iStock and OpenAI)
  • Overall preference

User Pain Points and Needs (Attitudinal; qualitative)
  • Severe, moderate and minor issues captured from recordings
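As an illustration of how the behavioural metrics could be rolled up, here’s a minimal sketch in Python that summarises time on task and success rate per method. The file name and columns (“sessions.csv”; method, success, seconds) are assumptions for illustration, not the study’s actual data.

    # Minimal sketch: summarise time on task and task success per method.
    # "sessions.csv" and its columns are hypothetical, not the study's data.
    import csv
    from collections import defaultdict

    times = defaultdict(list)      # method -> task times in seconds
    successes = defaultdict(list)  # method -> 1 (success) or 0 (failure)

    with open("sessions.csv", newline="") as f:
        for row in csv.DictReader(f):
            method = row["method"]  # e.g. "iStock" or "AI generation"
            times[method].append(float(row["seconds"]))
            successes[method].append(int(row["success"]))

    for method, seconds in times.items():
        mean_time = sum(seconds) / len(seconds)
        success_rate = sum(successes[method]) / len(successes[method])
        print(f"{method}: mean time on task {mean_time:.0f}s, "
              f"success rate {success_rate:.0%}")

With only 15 participants, it’s also worth reporting medians alongside means, since time-on-task data is easily skewed by a single slow session.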

Analysis

Interviews were tagged using the global tagging scheme for AI, which helped us keep track of patterns across projects.
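As a sketch of what that pattern-tracking can look like, the snippet below counts excerpts by tag and severity. The tag names and data structure are invented for illustration; they are not the real tagging scheme.

    # Minimal sketch: count tagged interview excerpts by tag and severity.
    # The tags and severities below are invented examples.
    from collections import Counter

    excerpts = [
        ("prompt-iteration", "severe"),
        ("search-refinement", "moderate"),
        ("prompt-iteration", "minor"),
    ]

    by_tag = Counter(tag for tag, _ in excerpts)
    by_severity = Counter(severity for _, severity in excerpts)

    print(by_tag.most_common())       # most frequent themes across projects
    print(by_severity.most_common())  # issue counts by severity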

Results

This work was presented in two sessions, as an executive summary and a full report, to the CEO of Getty Images.

How the results were shared:

  • Initial executive summary focusing on key metrics and top 3 themes.
  • Full report with the full product team. I presented this using a framework for cognitive strategy prompts. This made it easier to compare the two methods.
  • Full report with the initial key stakeholders, including some additional data cuts.

These findings informed the direction of advanced features on the AI generation roadmap. They were also shared with the iStock search team.

Final outputs for Groups 1-3 using their respective AI generation platforms.

Things I'd do differently

How often do users attempt to generate an exact copy of an image?

  • There were still many variables with this approach. Everyone took different things from the image, e.g. the angle of the laptop, or the phone resting on the laptop.
  • Next time, we’d need to look at the impact of having a different target image or a more flexible creative brief.

Some metrics, like estimated time to edit, were difficult to self report.

  • We didn’t ask participants to make the modifications themselves, which may be why some found the estimate difficult to self-report.
  • Next time it might be worth asking participants to attempt the modifications, or at least bring the images into their editing tool to consider the question more closely.