Earlier today, Google released Gemini, rebranding its “Bard” chatbot and adding a new paid tier. Gemini Advanced, based on Google’s Ultra 1.0 model, is Google’s most convincing attempt to take the battle to OpenAI’s ChatGPT.
While the Ultra 1.0 model provides some exciting additions, such as video uploads and raw audio processing, how does it stack up on data privacy? While the transparency around data retention is good, there are still outstanding questions about how Google uses prompt data to train its models.
Transparent and responsible from the start
As soon as you log in, you're presented with the option to “learn how your data is handled”, which directs you to the Gemini Apps Privacy Hub documentation. First impressions? Pretty good, and easier to access than in ChatGPT.
On safety and responsibility
A more detailed overview of Gemini is provided on the DeepMind page, which extends to information on the “safety” of the model. Gemini purports to be more responsible and transparent in order to combat bias and toxicity.
It is evidently trained to give more “responsible” responses than ChatGPT. For a real-world example of this, check out my post on how Gemini refused to judge a gingerbread house contest in the style of Ricky Gervais.
Data retention and deletion
The Privacy Hub states that Google “stores your Gemini Apps activity with your Google Account for up to 18 months, which you can change to 3 or 36 months in your Gemini Apps Activity setting”. Being able to set these retention periods, and even delete activity from custom time ranges, puts the user in control.
Although you can delete all chats in ChatGPT, you do not have the granularity of controls that Gemini offers.
Training on data
While data deletion in Gemini is transparent and easy, training on your data is where things get a little…murky.
Getting information on whether Gemini uses your data for training is problematic. To get any sort of answer, we had to ask Gemini itself. The answer from Gemini is that it “does not directly train on your individual prompts in the same way that some fine-tuning approaches do with language models”. It goes on to explain that “this is done in an aggregated way, not prompt by prompt.” This implies that Google may anonymize the data before performing analysis on it, but it is unclear.
This was previously a concern for developers using Gemini for free, as it was stated that data “may be accessible to trained reviewers”.
Compared to ChatGPT’s clear overview of when your data is and is not used for training, Gemini’s explanation raises more questions than it answers.
Future developments - Duet to be Gemini for Workspace
It is important to note that Google has stated Google Duet for Workspace will become Gemini for Workspace. Google Duet is currently the AI assistant for Workspace aimed at enterprises, and it does provide more clarity around training on data. Google states that the “content that you put into Google Workspace services (emails, documents, etc.) is yours. We never sell your data, and you can delete your content or export it.” It’s reasonable to assume this won’t change when it rebrands to Gemini for Workspace.