Meta’s AI Training Activities: Legal Basis and GDPR Implications

What Information is Meta Using?

Meta announced in a recent blog post that it intends to expand the use of AI in its services. To train its AI, Meta intends to use “publicly available online and licensed information to train AI at Meta” but also “the information that people have shared publicly on Meta’s products and services”. The information Meta says it intends to use includes:

  • public posts
  • public photos and their captions
  • (in the future) information shared when interacting with Meta’s generative AI features

While it would seem that “public posts” would only include posts shared with the “Public” audience as opposed to “Friends”, it is not clear what is actually included. Meta declares that it does not use “the content of your private messages with friends and family”. It is not clear from the post whether the word “messages” refers to the Messenger feature (as in “we will not use your posts in Messenger”) or to posts in general (as in posts to Friends). The text is ambiguous. It could be argued that it is deliberately so, but it should be assumed that Meta collects all information until proven otherwise. This is because the post refers to Meta’s (long and incomprehensible) Privacy page, which does not exclude posts to Friends or other non-public audiences from its scope.

The Legal Basis

Meta’s declared GDPR legal basis for AI training is legitimate interests (Article 6(1)(f)). It is worth quoting the provision here in full:

processing is necessary for the purposes of the legitimate interests pursued by the controller or by a third party, except where such interests are overridden by the interests or fundamental rights and freedoms of the data subject which require protection of personal data, in particular where the data subject is a child.

In the 2023 Meta case (see my blog posts here and here), the Court of Justice of the European Union (CJEU) emphasised that processing of personal data for marketing purposes may be regarded as carried out for a legitimate interest as per GDPR Recital 47, but that such interests may be overridden by the interests or fundamental rights and freedoms of the data subject. In particular, it said that this is the case “where personal data are processed in circumstances where data subjects do not reasonably expect such processing” (paragraph 112). In other words, where there is ambiguity about what is happening, where data subjects do not expect or cannot predict what is collected and how, their privacy interests trump those of the controller. The processing of such data must be “strictly necessary for the purposes of that legitimate interest” (paragraph 126) in order to be lawful. The Court hastens to add that Meta’s economic interests and its ad-based business model are not legitimate interests in that sense.

At the bottom of its aforementioned post, Meta claims that legitimate interests cover first and third-party data used “to build these services”. The conclusion here is simple: Meta is developing the AI to further its economic interests and stay competitive. This is reasonable, lawful and unproblematic. Meta is relying on the legitimate interests basis for this. This, on the other hand, is problematic for numerous reasons:

First, the CJEU excludes the use of legitimate interests for generic, ambiguous purposes and places stringent control mechanisms on them. The controller needs to have informed the users of the interest pursued, the processing needs to be carried out “only in so far as is strictly necessary” and it must be “apparent from a balancing of the opposing interests (…) that the interests or fundamental freedoms and rights of those users do not override that legitimate interest of the controller.”

Second, Meta seems to be in the process of informing the users that the deadline until which they can object is June 26. The users cannot do so easily but must fill in and submit a form where they explain why they wish to do so. Meta will then review the information and react to the request, positively or otherwise. The imposed deadline is in violation of GDPR Article 21, which allows the user to object “at any time to processing of personal data concerning him or her” if the data is processed on the basis of legitimate interests.

This author received a notification, objected using the form and received a (positive) answer within 10 seconds of submitting the form, suggesting that Meta is algorithmically managing the requests. It is not clear at present whether negative answers are given and the extent to which this may be the case.

Third, putting aside the complicated form and the unlawful deadline, it is unlikely that the CJEU, the EDPB or any of the national authorities would take the position that Meta’s ultimately commercial interests take precedence over the interests of a data subject. That would mean that the development of a new and potentially useful technology (Meta claims legitimate interest “to build these services”) trumps private interests. This is manifestly not the case. While we may, under an optimistic scenario, assume that Meta’s AI will have positive and societally beneficial uses, this still does not amount to a legitimate interest under GDPR’s narrow test.

Fourth, AI can be trained on publicly available or commercially acquired datasets but it can also be trained on private information lawfully obtained from the users. Meta’s AI seems to be relying on a combination of the above, but it is the latter that is most problematic. In order for Meta to process personal information for AI training purposes, it is unlikely that any basis other than clear and specific consent would be lawful. This consent, we now know after the 2023 judgment, needs to be granular (see EDPB’s Opinion here). Even if Meta were to rely on consent in the future (having discovered that legitimate interests would not work), it is unlikely that anything other than a specific AI-training related consent would be acceptable. Such consent can be withdrawn at any time.

Finally, in terms of information obtained by scraping the web (“public” information in terms of Meta’s post), the situation is equally clear: data which is made public does not stop being personal data, and its processing still requires a valid legal basis (this too has been emphasised in the Meta case, in terms of sensitive data). Here too, the most likely legal basis would be legitimate interests, and here too Meta will fall short for the very same reasons discussed above.

A Note on The AI Act

Let us assume the unlikely scenario where Meta is able to lawfully train its AI on users’ data. While GDPR controls how and when personal data may lawfully be processed, the AI Act controls risky AI technologies themselves, their deployment and use, and imposes ex ante risk identification and mitigation measures on high-risk AI. It also prohibits certain AI altogether. The AI Act has not been published in the EU Official Journal yet but its final form is known.

Does Meta’s AI fall under the prohibited category? Article 5 clarifies that this includes AI systems that deploy subliminal techniques, systems that exploit vulnerabilities, systems that classify natural persons based on their social behaviour, untargeted scraping of facial images from the Internet, etc. Is Meta’s AI at risk of being prohibited? This is a distinct possibility but we do not at present have clear information that would indicate so. Is Meta’s AI a high-risk AI service? This is unlikely in terms of Annex III, at least at present. The uses mentioned in that Annex are defined relatively specifically and narrowly.

The AI Act also covers other systems and it is here that some of the obligations it imposes, including transparency obligations, may impact Meta’s training model.

Whatever the conclusion, Meta will at least be under the transparency obligations for the systems it develops.

Concluding Remarks

Recent developments concerning the company do not leave much space for optimism. Meta has been accused of violating child safety rules and rules on deceptive advertising under the DSA. Most recently, Meta has been in focus for refusing to remove fraudulent ads (see here). This author has attempted to report fraudulent ads on Facebook over 50 times in the past few months. In each case, it was blatantly obvious that the ads were based on fake news and that they led to illegal activities (a click on a fake news item would lead to a bitcoin scam or similar). Meta refused to remove the ads in every single case. It is unlikely that the company (whose revenue is $134 billion) is going to be careful about users’ data when it trains its AI if it cannot or will not solve the illegalities mentioned above. The point is very simple: Meta is not a trustworthy company, nor one that has its users’ interests at heart. Its interests, whatever they may be, are not legitimate under Article 6(1) of the GDPR. Its AI training activities are unlawful and should swiftly be banned by the EU authorities.
