Microsoft is using Word and Excel user data for AI training by default (or not?... Update)

A seven-step process to disable the feature

26 Nov 2024, 14:43 by Rob Thubron · TechSpot

A hot potato: The controversial issue of companies training their AI models on user-generated content is once again under the spotlight. This time, the programs in question are the widely used Microsoft Word and Excel. The data gathering is enabled by default, and opting out is a laborious, multi-step process.

Update / Correction (Nov 26, 4pm): Microsoft has reached out to confirm that it does not use customer data to train its large language models. The company also posted a statement on the Microsoft 365 account on X.

The original story follows below:

Microsoft's connected experiences in Office analyze user content to provide the likes of design recommendations, editing suggestions, data insights, and similar features.

On X, nixCraft pointed out that the Redmond firm has recently enabled a feature that scrapes users' Word and Excel documents to train its internal AI systems. It's turned on by default, too.

As is so often the case when a company wants its customers to keep something enabled, opting out of the data collection is far from quick and simple. On Windows, it requires going to File > Options > Trust Center > Trust Center Settings > Privacy Options > Privacy Settings > Optional Connected Experiences and unchecking the box. Furthermore, once you untick the box, a prompt appears warning that disabling the option means some experiences won't be available.

For those few who think this isn't a big deal, Tom's Hardware notes that a clause in Microsoft's Services Agreement grants the company a worldwide and royalty-free intellectual property license to use your content.

"To the extent necessary to provide the Services to you and others, to protect you and the Services, and to improve Microsoft products and services, you grant to Microsoft a worldwide and royalty-free intellectual property license to use Your Content, for example, to make copies of, retain, transmit, reformat, display, and distribute via communication tools Your Content on the Services," according to the clause.

Microsoft isn't the only one guilty of this sort of underhand behavior, of course. Meta also uses public posts, comments, photos, and interactions with chatbots from Facebook, Instagram, Threads, and WhatsApp to train its AI models. Unlike those in the EU and UK, users in the US do not have a straightforward way to opt out. Setting your account to private helps, but it still isn't a guarantee that your content won't be used.

In August, it was revealed that Nvidia, the company whose hardware powers the generative AI revolution, had been downloading 80 years' worth of video every day from YouTube, Netflix, and other platforms to train its AI models.

Microsoft hasn't commented on the story. It's possible that the outcry could lead to a clarification of its terms of use, much as Adobe did after a popup suggested that the company could access and claim ownership of content made with its creative suite to train AI models.