Knowing what you have is the first step towards protecting it. Knowing where it is, the second. As you progress in your data security journey, you’ll discover numerous avenues for identifying, labeling, and securing data. For example, do you have an inventory of what you think you own? Even if you do, it is potentially only a fraction of what really exists in your environment.
In another scenario, a project ends and that team leaves your company; there’s a OneDrive with all data intact but you have no idea about the nature of its contents or where it should go. This problem might be felt even more acutely by the company if information resides in a cloud storage volume with no active owner.
An established framework for labeling sensitive information can help you understand your data landscape and create meaningful intelligence from it, such as discovering information loss or exploits to enable more effective policies and security controls.
What is the State of Data Labeling?
Your company has vast troves of data, much of which is likely MS Office files, PDFs, and other easily transportable formats. These file types, being unstructured relative to databases and transaction processors, can be problematic to track and protect. Many organizations have either not labeled their data at scale or have implemented so many labeling systems over the years that any consistency has been lost to the mists of time. Such processes are difficult to initiate across teams and must be steadily maintained.
Assessing risk then becomes a never-ending process of locating, identifying, and labeling data, just so you can scope out acceptable conditions for access, exposure, and proper use.
You need to know where data resides and in what state, determine what it is, and apply usage rules based on established definitions and protocols (e.g., if a file contains the word “confidential”, mark it with a sensitivity level and enforce rules that prevent copying or printing).
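The keyword rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how MIP itself evaluates policies: the rule list, the label names, and the restriction table are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical keyword rules, checked in order (most sensitive first).
RULES = [
    ("highly confidential", "Highly Confidential"),
    ("confidential", "Confidential"),
    ("internal", "Internal"),
]

# Hypothetical handling restrictions for each label.
RESTRICTIONS = {
    "Highly Confidential": {"copy": False, "print": False},
    "Confidential": {"copy": False, "print": False},
    "Internal": {"copy": True, "print": True},
    "Public": {"copy": True, "print": True},
}


@dataclass(frozen=True)
class Classification:
    label: str
    allow_copy: bool
    allow_print: bool


def classify_text(text: str) -> Classification:
    """Assign the first matching label, falling back to "Public"."""
    lowered = text.lower()
    label = "Public"
    for keyword, candidate in RULES:
        if keyword in lowered:
            label = candidate
            break
    rules = RESTRICTIONS[label]
    return Classification(label, rules["copy"], rules["print"])
```

Calling `classify_text("Q3 forecast (CONFIDENTIAL)")` yields the "Confidential" label with copy and print disabled. Note that a sentence such as "nothing here is confidential" matches the same rule, which is why purely pattern-based labeling tends to need review.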
Next, you need a strategy to document, review, scan, and process all such data, which forms your data governance strategy. After that and a few gallons of coffee, you’re ready to tackle the actual process of tagging files.
What is Microsoft Information Protection (MIP)?
Now if your company is primarily a Microsoft shop (heavily invested in Teams, SharePoint, and OneDrive), then you’ve likely come across Microsoft Information Protection (MIP).
Note that as of 2021, MIP capabilities were expanded to classify data. Depending on when they adopted it, some people know MIP as Azure Information Protection or Purview Information Protection. As of summer 2023, MIP became known as Microsoft Purview Information Protection. For our purposes here, we focus on the most common name and use case for MIP: sensitivity labeling.
MIP is a Microsoft-created framework and solution for labeling data by sensitivity levels. The labels can be applied with rule-based policies set by administrators, though many users manually apply the MIP labels directly to files.
Windows files and objects, as well as those stored in Azure, support mechanisms for attaching labels (such as sensitivity) to the file metadata itself, as properties that move with the file (although it is possible to remove them). You can augment this metadata by using MIP to embed identifying labels, then use those details to make informed decisions (automated or otherwise) about content restrictions, data residency, and so on.
Large enterprises are exposed to legal and regulatory risks due to inconsistencies across data storage and handling systems: the age, volume, and nature of data may be largely unknown. The data may be duplicative, unmanaged, out of date, untrustworthy, or even unusable. Is the metadata necessary and known, complete and accurate, and mapped correctly?
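To make the idea of labels traveling in file metadata concrete, here is a rough Python sketch that reads label-like custom properties from an Office Open XML file (.docx, .xlsx, .pptx). It assumes the label metadata is stored as custom document properties in `docProps/custom.xml` with names beginning with `MSIP_Label_`; storage details vary across Office versions, so treat both the path and the prefix as assumptions to verify in your own environment. The function name is hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CUSTOM_PROPS_PATH = "docProps/custom.xml"
LABEL_PREFIX = "MSIP_Label_"  # assumption: prefix used for label properties


def read_label_properties(source) -> dict:
    """Return {property_name: value} for label-like custom properties.

    `source` is a path or a binary file-like object for an OOXML package.
    """
    with zipfile.ZipFile(source) as archive:
        if CUSTOM_PROPS_PATH not in archive.namelist():
            return {}  # no custom properties part, so nothing to report
        root = ET.fromstring(archive.read(CUSTOM_PROPS_PATH))

    found = {}
    for prop in root:
        name = prop.get("name", "")
        if name.startswith(LABEL_PREFIX):
            # The value sits in the property's first child (e.g. vt:lpwstr).
            found[name] = prop[0].text if len(prop) else None
    return found
```

Because these properties are ordinary metadata, the same few lines that read them could also rewrite or strip them, which is why a label alone should not be treated as proof of a file's sensitivity.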
How is MIP used?
For many Microsoft shops, MIP can serve as the foundation for enabling data security and compliance programs:
- Data loss prevention (DLP) policies trigger blocking or quarantining of sensitive information. The policies are applied to labeled data and trigger specific actions, depending on the level of sensitivity.
- Data access governance (DAG) policies gate access and encrypt labeled data, based on the data’s sensitivity.
- Data compliance policies specify how data should be collected, stored, and processed. Data labeled as highly sensitive, for example, may have a policy specifying that it should not be stored in a SharePoint site where it is accessible to third parties. This enforces the principle of least privilege and restricts access to only those who need the data to fulfill contractual obligations.
Proper labeling with MIP tags allows data security and compliance programs to function as intended, protecting the data based on the level of risk it poses if leaked, exposed, or improperly stored.
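As a simplified picture of how a label can drive a DLP decision, the sketch below maps sensitivity labels to actions for an outbound share. The label names, the action set, and the policy table are hypothetical; real DLP products expose far richer conditions than this.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    ALLOW = "allow"
    WARN = "warn"    # e.g. show a policy tip and let the user proceed
    BLOCK = "block"


# Hypothetical policy: what happens when labeled data is shared externally.
EXTERNAL_SHARE_POLICY = {
    "Public": Action.ALLOW,
    "Internal": Action.WARN,
    "Confidential": Action.BLOCK,
    "Highly Confidential": Action.BLOCK,
}


def evaluate_share(label: Optional[str], external_recipient: bool) -> Action:
    """Decide what to do when a file is shared, based only on its label."""
    if not external_recipient:
        return Action.ALLOW  # this sketch only governs external sharing
    if label is None:
        return Action.WARN   # assumption: unlabeled files get a warning
    # Unknown labels fail closed rather than slipping through.
    return EXTERNAL_SHARE_POLICY.get(label, Action.BLOCK)
```

The decision depends entirely on the label it is given, so a wrong or missing label produces a wrong decision, however carefully the policy table is written.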
What are the Challenges of Implementing MIP?
While MIP can help you get started on labeling your data, it’s not known to be a scalable system for the following reasons:
- Inconsistent Labeling Schema – Whether you know it or not, every company has a different data labeling system. Often, there is more than one system within a single company, with varying degrees of depth and validation (if any) between divisions. Perhaps every file has 3 labels, or 10, or certain kinds have 5, while others have none.
- Prone to False Positives – Pattern-based rules and an open-to-interpretation human process to apply MIP tags translate to low confidence that the MIP label is broadly accepted as truth across divisions. It’s important then to designate a central authority to review and validate that the labels are applied correctly. But who has time for that?
- Requires Active Enforcement – Typically, these schemas rely on users to manually tag their files, which is nearly impossible to enforce. It is hard to check for accuracy, track implementation, or fully know the breadth of the content and processes.
- Human-Error Prone – Even if a clear labeling schema is in place, users do not always understand or agree on what should be considered internal vs. public.
- Easy to Circumvent – A user could even circumvent policies by changing the MIP tags of a file later. By changing the MIP tag, say from “highly sensitive” to “not sensitive,” users can circumvent DLP policies that utilize the MIP label to trigger protective actions.
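One way to catch the last problem is to review label-change history for downgrades. The sketch below assumes you can export label-change events with the file, the user, and the old and new labels; the event shape, the label names, and their ranking are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical ordering of labels from least to most sensitive.
SENSITIVITY_RANK = {
    "Not Sensitive": 0,
    "Internal": 1,
    "Sensitive": 2,
    "Highly Sensitive": 3,
}


@dataclass(frozen=True)
class LabelChange:
    file: str
    user: str
    old_label: Optional[str]  # None means the file had no label
    new_label: Optional[str]  # None means the label was removed


def _rank(label: Optional[str]) -> int:
    # A missing label ranks below every real label.
    # Labels absent from SENSITIVITY_RANK raise KeyError on purpose.
    return -1 if label is None else SENSITIVITY_RANK[label]


def find_downgrades(events: List[LabelChange]) -> List[LabelChange]:
    """Return the events where a file ended up less sensitive than before."""
    return [e for e in events if _rank(e.new_label) < _rank(e.old_label)]
```

A downgrade is not always malicious, since documents do get declassified, so the output is better treated as a review queue than as a list of violations.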
How Cyera Can Help You Operationalize Your MIP Implementation
Cyera helps companies design and implement their data protection programs, resulting in greater value out of their Microsoft enterprise licenses.
Cyera accurately classifies and contextualizes data without the need for custom regular expressions or manual training. Our hundreds of out-of-the-box classifiers and ability to learn new classifications that are specific to your environment drastically reduce the manual effort required to set up Purview’s data classification.
You can use Cyera to extract first- and third-party MIP labels from files in your environment. Cyera generates reports to help you understand what types of MIP-protected data you are ingesting from partners or other third parties. This helps you attest to contractual data protection obligations when processing data on behalf of third parties.
With Cyera, you can visualize exactly where your sensitive data classes are stored, how many sensitive records you have, and the context around them. This helps you build a DLP program that protects your crown jewel data classes without noisy alerts or negatively impacting business productivity.
Cyera helps you answer DLP design questions with confidence:
- How many data stores have files that contain X data classification?
- Can we protect this data with policy tips and education, or does it require blocking?
- Where do we need to implement DLP to protect this data (e.g., endpoint DLP, M365 native DLP, cloud DLP)?
- How many users are impacted if we implement this rule?
If you have mismatched MIP labels, Cyera can detect the issue, generate an alert, and take corrective action. Cyera monitors your environment to detect when MIP labels do not accurately match the data contained within a file. For example, did an end user apply an “Internal-All Employees” label to a file containing sensitive non-public financial information? Detecting mismatched MIP labels also represents an opportunity to conduct targeted education for end users who repeatedly apply incorrect MIP labels.
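The general idea of a label mismatch check can be illustrated with a short sketch. It is not a description of Cyera’s implementation: it assumes some classifier has already produced a set of data classes for a file, and the class names, label ranking, and minimum-label table are hypothetical.

```python
from typing import Iterable

# Hypothetical ordering of labels from least to most sensitive.
LABEL_RANK = {
    "Public": 0,
    "Internal-All Employees": 1,
    "Confidential": 2,
    "Highly Confidential": 3,
}

# Hypothetical minimum label required for each detected data class.
MIN_LABEL_FOR_CLASS = {
    "marketing_copy": "Public",
    "employee_pii": "Confidential",
    "non_public_financial": "Highly Confidential",
}


def required_label(detected_classes: Iterable[str]) -> str:
    """Return the strictest minimum label across the detected classes."""
    required = "Public"
    for data_class in detected_classes:
        # Classes without a rule default to "Public" in this sketch.
        candidate = MIN_LABEL_FOR_CLASS.get(data_class, "Public")
        if LABEL_RANK[candidate] > LABEL_RANK[required]:
            required = candidate
    return required


def is_under_labeled(applied_label: str, detected_classes: Iterable[str]) -> bool:
    """True when the applied label is weaker than the content requires."""
    return LABEL_RANK[applied_label] < LABEL_RANK[required_label(detected_classes)]
```

With these tables, a file labeled “Internal-All Employees” whose content includes `non_public_financial` data is flagged as under-labeled, matching the example above.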
When MIP is not implemented or when companies decide to move away from MIP, they can opt for a more automated approach with Cyera. Cyera’s advanced PII classification automatically applies sensitivity labels and classes to data, allowing you to circumvent many of the challenges of a manual MIP implementation. The benefit of more accurate classifications is time saved on manual tuning efforts; the benefit of classifying more records is expanded coverage for your data security programs.
Given the challenges of implementing MIP, it is essential that you have a solution that can automate data labeling and close the gap on manual efforts. Your organization has likely spent years trying to design an effective DLP program and struggled to effectively use MIP labels across your environments. Cyera will audit and improve the state of data labeling to provide a more accurate assessment of your organization’s understanding of data and its sensitivity.
Cyera’s data security platform provides deep context on your data, applying correct, continuous controls to assure cyber-resilience and compliance.
Cyera takes a data-centric approach to security, assessing the exposure to your data at rest and in use and applying multiple layers of defense. Because Cyera applies deep data context holistically across your data landscape, we are the only solution that can empower security teams to know where their data is and what exposes it to risk, and to take immediate action to remediate exposures and assure compliance without disrupting the business.
To learn more about how Cyera can help you audit the effectiveness of your MIP implementation, schedule a demo today.