OpenAI is Accelerating our Development of Automated Data Security Insights

Ariel Weil
Jun 6, 2023
July 30, 2023

Today we announced the integration of Azure OpenAI into the Cyera Data Security Platform. Cyera’s data research team has been implementing machine learning models since we got started, using public training data and our own expertise to create our AI-powered data insights. This integration lets us add large-scale generative AI models that bring new reasoning and comprehension capabilities over data. Customers will get faster results, even more precise classification, deeper business context on the intended use of data, and anomaly detection.

Cyera automatically detects, classifies, and understands data across our customers’ data landscapes. We do this using fine-tuned, state-of-the-art (SOTA) models trained to perform Named Entity Recognition (NER) tasks. NER is a Natural Language Processing (NLP) task that identifies named entities in text and classifies them into predefined categories such as person names, organizations, and locations. This enables Cyera to automatically identify and learn a customer’s unique data with extremely high precision, at speed and scale.

Let’s put speed and scale into context with a simple example. Say you have data in AWS S3 object storage. To understand any data security exposures that exist with that data, you might use Amazon’s cloud-native data loss prevention (DLP) tool, Macie. Macie represents the legacy, pattern-matching approach to data classification. This means you have to know that the buckets exist, point Macie at the buckets, then rely on basic pattern matching and signature-based rules to identify security exposures. Using a sample data set of 2,057 structured and unstructured files in an S3 bucket, Macie produced very generalized output.
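To make the pattern-matching approach concrete, here is a minimal sketch of what signature-based classification looks like in general. The rule names and regular expressions are our own illustrative assumptions, not Macie's actual detectors; the point is only that a rule finds what it was written to find and nothing else.

```python
import re

# Hypothetical signature rules, in the spirit of pattern-based DLP tools.
# Illustrative only: these are NOT the detectors any real product ships.
RULES = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "us_phone": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}


def classify_by_pattern(text):
    """Return the sorted names of every rule that matches somewhere in text."""
    return sorted(name for name, rx in RULES.items() if rx.search(text))


if __name__ == "__main__":
    print(classify_by_pattern("Contact jane@example.com or 555-867-5309"))
    # A phone number written in a format the rule did not anticipate
    # goes undetected, which is a false negative.
    print(classify_by_pattern("call (555) 867 5309"))
```

The second call matches no rule because the phone number uses parentheses and spaces rather than hyphens. Covering every such variant by hand is what makes purely rule-based classification hard to scale.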

AWS Macie Classification Results

When we address this same challenge using Cyera, our agentless platform discovers all of the buckets and other AWS datastores automatically. Unsupervised machine learning then identifies patterns of data, and presents clusters of like files to supervised machine learning for classification. Topic extraction and NER are performed, providing not only classes of data, but additional context on the role, region, identifiability, and protection of each class. Cyera presents a far richer picture, discovering:

  • IP data and secrets that Macie missed
  • 80 additional data classes
  • Object-level classes that Macie could not interpret
  • Millions of instances of sensitive data (including phone numbers, credit card numbers, and passwords) that Macie missed (representing false negatives)
  • Context on the data’s role, region, identifiability, and protection
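The cluster-then-classify idea described above can be sketched in a few lines. This is a toy illustration under our own assumptions, not Cyera's implementation: it groups files with a greedy token-overlap (Jaccard) clustering, and `classify` is a hypothetical stand-in for a trained supervised model. All function names and the `threshold` parameter are invented for the example.

```python
def tokens(text):
    """Lowercased set of whitespace-separated tokens."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_files(files, threshold=0.5):
    """Greedy single-pass clustering: add each file to the first cluster whose
    representative (its first member) is similar enough, else start a new one."""
    clusters = []  # list of lists of file names
    for name, text in files.items():
        for members in clusters:
            if jaccard(tokens(text), tokens(files[members[0]])) >= threshold:
                members.append(name)
                break
        else:
            clusters.append([name])
    return clusters


def label_clusters(files, clusters, classify):
    """Classify one representative per cluster and propagate its label."""
    labels = {}
    for members in clusters:
        label = classify(files[members[0]])
        for name in members:
            labels[name] = label
    return labels


if __name__ == "__main__":
    files = {
        "a.csv": "name email phone alice",
        "b.csv": "name email phone bob",
        "k.pem": "begin private key",
    }
    # Hypothetical stand-in for a supervised classifier.
    classify = lambda text: "secret" if "key" in text else "contact_data"
    clusters = cluster_files(files)
    print(clusters)
    print(label_clusters(files, clusters, classify))
```

The design point is that the classifier runs once per cluster rather than once per file, which is one reason grouping like files first can help with speed at scale.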

The platform then applies the correct security, privacy and compliance policies to uncover and remediate exposures wherever data is being managed. This level of detail and accuracy is currently available across IaaS, PaaS, and SaaS cloud environments. 

Combining the vast amounts of training data behind these generative models with our existing machine learning models will accelerate the platform’s ability to identify patterns and provide personalized recommendations. Cyera will use this capability to further tailor data security policies to help security teams make more informed data security, privacy, and governance decisions.

OpenAI’s Large Language Models (LLMs) will also accelerate how Cyera governs access to sensitive data. Applying topic extraction and clustering models to identities will allow Cyera to group users together to improve access management. Anomaly detection highlights unusual access patterns, for example when a user who typically accesses data through an application starts to extract large volumes of data in bulk. The LLMs will also allow the platform to predict actions based on similar objects and to understand causation. Cyera’s unified policy engine will take advantage of the LLMs to identify misconfigurations, recommend specific access controls, and generate new policies for data access governance.
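As a rough illustration of the bulk-extraction example, the sketch below flags a user whose data volume today sits far above their own history. It uses a simple z-score test, which is our assumption for the example and not a description of how the platform detects anomalies; the function name, the threshold of 3.0, and the sample numbers are all hypothetical.

```python
from statistics import mean, stdev


def is_anomalous(history, today, z_threshold=3.0):
    """Flag today's volume if it is more than z_threshold sample standard
    deviations above the mean of the user's historical volumes."""
    if len(history) < 2:
        return False  # not enough history to estimate spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Perfectly flat history: any increase at all stands out.
        return today > mu
    return (today - mu) / sigma > z_threshold


if __name__ == "__main__":
    # Hypothetical records read per day by one user through an application.
    history = [100, 120, 90, 110, 105]
    print(is_anomalous(history, 115))   # within the usual range
    print(is_anomalous(history, 5000))  # bulk extraction
```

A per-user baseline like this only catches volume spikes. Grouping similar users, as described above, would let a new user be compared against peers before they have any history of their own.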

Cyera’s vision is for every business to realize the full potential of data (collaboration, connection with customers, insight that fuels innovation) to power a new era of development, growth and productivity. Realizing this potential requires that businesses be able to leverage transformational technology responsibly while protecting their most valuable asset: data.