Industry Insights

Top AI Apps Training on Your Data - May 2024

June 5, 2024

Over the past week, the tech world has been buzzing about major players like Slack and Meta training their AI models on user data. But this issue goes far beyond just a couple of big names. The rapid integration of generative AI (GenAI) into countless SaaS products brings immense potential—and significant challenges.

One of the most pressing concerns is data privacy. Many AI-enabled apps might be training on your data without your knowledge. In this blog, we’ll explore the top 10 GenAI-enabled apps that could be training on your data, based on our monitoring and risk assessments at Harmonic Security.

Rapid GenAI Integration: Boon or Bane?

Every SaaS company seems to be racing to integrate AI into their products, promising enhanced capabilities and smarter solutions. While GenAI offers tremendous potential to transform how we work, it also introduces complex security challenges. One major concern is whether these AI-driven apps are using your data responsibly—or at all.

Quick Overview: Top 10 Apps

We analyzed the most-used GenAI-enabled apps across our client base whose terms or privacy policies include potentially risky declarations about training on customer content. Here they are, in alphabetical order:

  1. BambooHR
    • Uses performance data and feedback for AI development, specifically for providing AI features to the same customer. More details.
  2. Calendly
    • Performs research and analysis on user interactions to improve products, with vague specifics on data use. More details.
  3. Drift
    • Uses personal information for service improvement and development, often anonymized. More details.
  4. DocuSign
    • Implements role-based access controls and security measures to minimize privacy impacts, but does allow for the training of AI models on personal information if customer consent is given. More details.
  5. Grammarly
    • Shares data with vetted partners solely for providing services, prohibiting third-party model training. More details.
  6. LinkedIn
    • Minimizes personal data in training datasets, using privacy-enhancing technologies. More details.
  7. Pinterest
    • Uses user information to improve services and train machine learning models. More details.
  8. Smartsheet
    • Uses data for analytics and service improvement, employing machine learning for predictive features. More details.
  9. X (Formerly Twitter)
    • May use collected information for training machine learning models. More details.
  10. Yelp
    • AI content is trained on platform content and third-party services, aiming to improve user experience. More details.

Detailed Analysis - Top Apps Training on Your Data, In Their Own Words

BambooHR

Policy: “BambooHR may use Performance Data and Feedback collected from customer’s use of the BambooHR AI to further develop and improve the BambooHR AI or as otherwise permitted under the BambooHR Terms of Service. To the extent that BambooHR uses customer Data, Input and Output to fine-tune and train the machine learning methods and data models used to provide the BambooHR AI features, it will do so just to provide the BambooHR AI features to you (‘Customer Trained Model’). BambooHR will not otherwise use the Customer Trained Model to provide services to other customers” source.

Calendly

Policy: Calendly’s privacy policy is vague: “We will perform research and analysis about your use of, or interest in, our products, services, or content, or products, services or content offered by others. We do this to help make our products better and to develop new products. For EU/UK purposes, our legal basis for processing is legitimate interests” source.

Drift

Policy: Drift’s privacy policy states: “We use your Personal Information for providing and improving the Services. We also retain your information as necessary…to develop and improve our Services. Where we retain information for Service improvement and development, it will be anonymized and used to uncover collective insights about the use of our Services, not to specifically analyze personal characteristics about you. Information We Collect: (v) to improve our website or interactions with you; (vi) for Drift product or service development” source.

DocuSign

Policy: “Through training, an AI model learns to recognize patterns and make predictions. DocuSign has implemented role-based access controls and technical and organizational security measures to help minimize the privacy impact to individuals when we train our AI models. We intentionally design our systems with functionality to avoid training models using personal information that customers may enter into our Services (except when we have consent from a customer to do so)” source.

Grammarly

Policy: “Any information used to power Grammarly’s generative AI features, such as prompt type, prompt text, and the context in which it’s used, will be shared with our small number of thoroughly vetted partners for the sole purpose of providing you with the Grammarly experience. We do not allow any partners or third parties to use your data for training their models or improving their products” source.

LinkedIn

Policy: “The artificial intelligence models that LinkedIn uses to power generative AI features may be trained by LinkedIn or another provider. For example, some of our models are provided by Microsoft’s Azure OpenAI service. Where LinkedIn trains generative AI models, we seek to minimize personal data in the data sets used to train the models, including by using privacy enhancing technologies to redact or remove personal data from the training dataset” source.

Pinterest

Policy: “We use your information in our efforts to improve our Services, keep our users and the public safe, and to protect legal interests. To do so, we: Improve the products and services of our family of companies and offer new features. For example, using information to train, develop, and improve our technology such as our machine learning models” source.

Smartsheet

Policy: “To better understand how our users access and use the Offerings, to tailor our content and Offerings to users’ needs and interests, and for other research and analytical purposes (e.g., to evaluate and improve the Offerings and develop additional products, services, and features). We use machine or deep learning technologies for these purposes which allow us to provide users with predictive tips and other features (e.g., suggestions for column types or text)” source.

X (Formerly Twitter)

Policy: “We may use the information we collect and publicly available information to help train our machine learning or artificial intelligence models for the purposes outlined in this policy” source.

Yelp

Policy: “AI content at Yelp may be trained on Yelp platform content, like reviews, or on training sets provided through third party services, like GPT-4. We are constantly working to improve the Yelp experience, and our use of AI, to provide you with helpful information. That said, AI is a rapidly evolving technology and may sometimes include inaccurate information” source.

The Dilemma for Security Teams

Faced with these concerns, many organizations have taken the cautious route of blocking GenAI apps altogether. However, blocking entire categories of apps isn’t a sustainable long-term solution, especially as AI integration becomes ubiquitous across SaaS tools.

Shifting to Smarter, Data-Centric Strategies

Organizations need smarter strategies for navigating this complex landscape, ones that focus on securing data rather than blocking tools outright. At Harmonic, we’re at the forefront of developing innovative solutions that provide visibility and control over GenAI apps, ensuring your data remains secure without compromising productivity.

Best Practices for Organizations

When it comes to data privacy and AI, simply being aware of the risks isn’t enough. Organizations need to take concrete steps to protect their valuable data assets. Here are some key best practices to implement:

  1. Regular Audits: Conduct regular audits of the apps used within your organization to understand their data practices.
  2. Clear Policies: Develop and enforce clear data usage and AI policies. You can use our free policy generator to create one! <Link>
  3. User Training: Educate employees about the risks and best practices for using AI tools safely.
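To make the audit step concrete, here is a minimal sketch of how a security team might triage its app inventory against each vendor's published training declarations. Everything here is hypothetical: the app names, the `AppPolicy` fields, and the three-bucket triage rule are illustrative assumptions, not a real registry or a Harmonic product feature.

```python
from dataclasses import dataclass

@dataclass
class AppPolicy:
    """One entry per SaaS app, filled in from the vendor's published policy."""
    name: str
    trains_on_customer_data: bool  # vendor declares it may train models on your data
    opt_out_available: bool        # training can be disabled or opted out of

# Hypothetical inventory built during a policy audit.
REGISTRY = [
    AppPolicy("ExampleCRM", trains_on_customer_data=True, opt_out_available=True),
    AppPolicy("ExampleChat", trains_on_customer_data=True, opt_out_available=False),
    AppPolicy("ExampleDocs", trains_on_customer_data=False, opt_out_available=True),
]

def triage(apps):
    """Bucket apps: block (trains, no opt-out), review (trains, opt-out), allow."""
    buckets = {"block": [], "review": [], "allow": []}
    for app in apps:
        if app.trains_on_customer_data and not app.opt_out_available:
            buckets["block"].append(app.name)
        elif app.trains_on_customer_data:
            buckets["review"].append(app.name)
        else:
            buckets["allow"].append(app.name)
    return buckets

if __name__ == "__main__":
    for bucket, names in triage(REGISTRY).items():
        print(f"{bucket}: {', '.join(names) or '-'}")
```

In practice the registry would be rebuilt on each audit cycle as vendors update their terms, which is exactly why the audits need to be recurring rather than one-off.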

Conclusion

As AI continues to evolve, so do the challenges it brings. By staying informed and adopting smart, data-centric security strategies, organizations can harness the power of GenAI while safeguarding their data.

If you’re concerned about how GenAI-enabled apps are handling your data, reach out to Harmonic Security. We’re here to help you navigate this new frontier with confidence.

Request a demo

Team Harmonic