Expand your cybersecurity education with an in-depth glossary of data security terminology and concepts.
The process of restricting access to resources, such as computers, files, or services, to authorized users only.
Active data collection refers to data that is collected knowingly and transparently from the user, such as through a web form, check box, or survey.
Under the GDPR, "Adequate Level of Protection" refers to the level of data protection that the European Commission requires from a third country or international organization before approving cross-border data transfers to that third country or international organization.In making their judgement, the European Commission considers not only the data protection rules, and security measures of the third country or international org., but also the rule of law, respect for human rights, and the enforcement of compliance and data protection rules.
A type of behavior or action that seems abnormal when observed in the context of an organization and a user's historical activity. It is typically detected by a machine-learning algorithm that builds a profile from historical event information, including login locations and times, data-transfer behavior, and email message patterns. Anomalies are often a sign that an account is compromised.
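As a simplified illustration of this kind of profiling, the sketch below flags a login whose hour of day falls far outside a user's historical baseline. The sample history and the 3-sigma threshold are illustrative assumptions, not a description of any particular product's detection model.

```python
# Minimal sketch: flag logins whose hour-of-day deviates sharply from a
# user's historical baseline. The event data and threshold are illustrative.
from statistics import mean, stdev

def is_anomalous_login(history_hours, new_hour, threshold=3.0):
    """Return True if the new login hour is far outside the user's baseline."""
    if len(history_hours) < 5:          # not enough history to build a profile
        return False
    mu, sigma = mean(history_hours), stdev(history_hours)
    if sigma == 0:                       # perfectly regular history
        return new_hour != mu
    return abs(new_hour - mu) / sigma > threshold

# A user who normally logs in during business hours, then suddenly at 3 a.m.:
baseline = [9, 9, 10, 8, 9, 10, 9, 11, 9, 10]
print(is_anomalous_login(baseline, 3))   # True
```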
Data Anonymization is a process that alters personally identifiable information (PII) in such a manner that it can no longer be used to identify an individual. This can be done by removing certain identifying values from data sets, or by generalizing identifying values.
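A minimal sketch of anonymization by suppression (removing direct identifiers) and generalization (coarsening quasi-identifiers) follows; the field names and record structure are hypothetical.

```python
# Minimal sketch of anonymization by suppression and generalization.
# Field names and the record structure are illustrative assumptions.
def anonymize(record):
    anonymized = dict(record)
    # Suppress direct identifiers entirely.
    for field in ("name", "email", "ssn"):
        anonymized.pop(field, None)
    # Generalize quasi-identifiers so they no longer single out one person.
    if "age" in anonymized:
        decade = (anonymized["age"] // 10) * 10
        anonymized["age"] = f"{decade}-{decade + 9}"
    if "zip_code" in anonymized:
        anonymized["zip_code"] = anonymized["zip_code"][:3] + "XX"
    return anonymized

print(anonymize({"name": "Ada", "email": "ada@example.com", "age": 37, "zip_code": "94107"}))
# {'age': '30-39', 'zip_code': '941XX'}
```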
Anonymous data is data that is not related to an identifiable individual and cannot be used in combination with other data to identify individuals. Anonymous data is not protected by the GDPR.
In the context of the GDPR, "Appropriate Safeguards" refers to the application of the GDPR's data protection principles to data processing. The GDPR's data protection principles include transparency, data minimization, storage limitation, data quality, legal basis for processing, and purpose limitation.
A trail of files, logs, or paperwork used to record an activity for auditing purposes.
The act of systematically examining, evaluating, and analyzing an organization's assets to ensure compliance and security standards are met.
The process of verifying a claimed identity and proving that someone is who they claim to be when attempting to access a resource.
Brazil passed a new legal framework in mid-August 2018 aimed at governing the use and processing of personal data in Brazil: the General Data Protection Law (LGPD). The law replaces approximately 40 laws that previously dealt with the protection of privacy and personal data, and is aimed at guaranteeing individual rights and encouraging economic growth by creating clear and transparent rules for data collection.
An acronym for Cloud Access Security Broker. This is a type of security solution that monitors and controls the cloud applications that an organization's employees use. Typically, the control is enforced by routing web traffic through a forward or reverse proxy. CASBs are good for managing shadow IT and limiting employees' use of certain SaaS applications, or their activity within those applications, but they do not monitor third-party activity in the cloud (e.g., shared documents or email).
An acronym for the California Consumer Privacy Act. This is a state-level privacy law for California, which came into effect in 2020. The law, the first comprehensive state-level privacy law passed in the US, applies to businesses that collect Californians' personal data and meet certain size or data-volume thresholds. The CCPA mirrors the requirements of the GDPR in many ways, such as establishing the right of users to access personal data and request deletion.
An acronym for Chief Data Officer. This is the executive within an organization responsible for enterprise-wide data governance and for managing data as a strategic asset.
An acronym for Chief Information Security Officer. This is the executive within an organization responsible for establishing and maintaining the organization's information security strategy and program.
An acronym for Cybersecurity Maturity Model Certification.
It is a security framework for Defense Industrial Base contractors to follow. CMMC 2.0 was announced by the Department of Defense in November 2021 and sets forth requirements for safeguarding Controlled Unclassified Information and other regulated data.
An acronym for Chief Privacy Officer. This is an executive within an organization responsible for managing compliance with privacy laws and policies.
An acronym for Cloud Service Provider. This is any company that sells a cloud computing service, be it PaaS, IaaS, or SaaS.
An acronym for Controlled Unclassified Information.
It is information created or owned by the US government that requires safeguarding. While CUI is not classified information, it is considered sensitive. CUI is governed under a number of government policies and frameworks, including Department of Defense Instruction (DoDI) 5200.48 and the Cybersecurity Maturity Model Certification. According to DoDI 5200.48, safeguarding CUI is a shared responsibility between Defense Industrial Base contractors and the Department of Defense.
A certification is a declaration by a certifying body that an organization or product meets certain security or compliance requirements.
A database service which is deployed and delivered through a cloud service provider (CSP) platform.
The guarantee that information is only available to those who are authorized to use it.
In the context of privacy, consent is the ability of a data subject to decline or consent to the collection and processing of their personal data. Consent can be explicit, such as opting-in via a form, or implied, such as agreeing to an End-User License Agreement, or not opting out. Under many data protection laws, consent must always be explicit.
The transfer of personal data from one legal jurisdiction, such as the EU, to another, such as the US. Many data protection laws place major restrictions on cross-border data transfers.
The protection of information and communications against damage, exploitation, or unauthorized use.
An acronym for Data Leak Prevention or Data Loss Prevention. A type of security solution that prevents sensitive data, usually files, from being shared outside the organization or to unauthorized individuals within the organization. This is usually done through policies that encrypt data or control sharing settings.
An acronym for Data Protection Authority. This is an independent public authority set up to supervise and enforce data protection laws in the EU. Each EU member state has its own DPA.
An acronym for Data Protection Officer. This is an individual within an organization who is tasked with advising the organization on GDPR compliance and communicating with its Data Protection Authority. Organizations whose core activities involve large-scale processing of personal data are required to appoint a DPO.
Digital Rights Management: a set of access control technologies for restricting the use of confidential information, proprietary hardware and copyrighted works, typically using encryption and key management.
A data breach is a security incident during which sensitive, protected, or confidential data has been accessed or exposed to unauthorized entities. Data breaches occur in organizations of all sizes, from schools to small businesses to enterprise organizations. These incidents may expose protected health information (PHI), personally identifiable information (PII), intellectual property, classified information, or other confidential data.
Some types of protected personal information include:
For businesses, sensitive data may also include customer lists, source code, credit and debit card information, user data, and other sensitive information.
Data breaches may be caused by different types of cyberattacks, such as malware, viruses, phishing attacks, ransomware, or theft of physical devices. Data breaches may also be due to misconfigurations, unpatched security vulnerabilities, malicious insiders, or other types of insider errors. Allowing unauthorized individuals into a building or floor, attaching or sharing the wrong document, or even copying the wrong person on an email all have the potential to expose data and result in a significant data breach.
Many industries, particularly the financial and healthcare industries, mandate controls of sensitive data. Industry guidelines and government regulations increasingly require strict controls, disclosure rules if a breach occurs, and penalties or fines for organizations that fail to safeguard the data in their care.
The Payment Card Industry Data Security Standard (PCI DSS) applies to financial institutions and businesses that handle financial information. The Health Insurance Portability and Accountability Act (HIPAA) regulates who has access to view and use PHI in the healthcare industry.
The General Data Protection Regulation (GDPR) in the European Union increases individuals’ control and rights over their personal data and includes the potential for significant fines for organizations found not to be in compliance with the regulation. Other countries also have significant regulations regarding data protection. The United States has several laws at the federal and state levels intended to protect the personal data of U.S. residents.
Negative impacts to a business due to a data breach include fines; costs related to investigating, mitigating, and recovering from the incident; reputation loss; litigation; and possibly even the inability to operate the business.
The act of notifying regulators as well as victims of data breaches that an incident has occurred. Under Article 33 of the GDPR, an organization must notify its supervisory authority within 72 hours of becoming aware of a breach, and under Article 34 it must notify affected individuals without undue delay when the breach is likely to result in a high risk to them.
A Data Broker is an entity that collects individuals’ personal data and sells or licenses it to third parties.
An organized inventory of data assets in the organization. Data catalogs use metadata to help organizations manage their data. They also help data professionals collect, organize, access, and enrich metadata to support data discovery and governance.
The process of dividing the data into groups of entities whose members are in some way similar to each other. Data privacy and security professionals can then categorize that data as high, medium, and low sensitivity data.
A definition that allows each type of data in a data store to be programmatically detected, typically using a test or algorithm. Data privacy and security professionals associate data classes with rules that define actions to take when a given data class is detected. For example, sensitive information or PII should be tagged with a business term or classification, and for some sensitive data classes a specific data quality constraint should also be applied.
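The sketch below shows one simple way a data class can be detected programmatically and mapped to an action. The regular expressions, class names, and associated actions are simplified assumptions for illustration, not a real classification engine.

```python
# Minimal sketch of programmatic data-class detection using regular
# expressions. Patterns, class names, and actions are illustrative only.
import re

DATA_CLASSES = {
    "ssn":         re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email":       re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

ACTIONS = {"ssn": "mask", "email": "tag:PII", "credit_card": "encrypt"}

def detect_classes(value):
    """Return (data_class, action) pairs for every class detected in a value."""
    return [(name, ACTIONS[name]) for name, pattern in DATA_CLASSES.items()
            if pattern.search(value)]

print(detect_classes("Contact jane.doe@example.com, SSN 123-45-6789"))
# [('ssn', 'mask'), ('email', 'tag:PII')]
```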
Data classification is the process of organizing data into relevant categories to make it simpler to retrieve, sort, use, store, and protect.
A data classification policy, properly executed, makes the process of finding and retrieving critical data easier. This is important for risk management, legal discovery, and regulatory compliance. When creating written procedures and guidelines to govern data classification policies, it is critical to define the criteria and categories the organization will use to classify data.
Data classification can help make data more easily searchable and trackable. This is achieved by tagging the data. Data tagging allows organizations to label data clearly so that it is easy to find and identify. Tags also help organizations manage data better and identify risks more readily. Tagging also enables data to be processed automatically and ensures timely and reliable access to data, as required by some state and federal regulations.
Most data classification projects help to eliminate duplication of data. By discovering and eliminating duplicate data, organizations can reduce storage and backup costs as well as reduce the risk of confidential data or sensitive data being exposed in case of a data breach.
Specifying data stewardship roles and responsibilities for employees inside the organization is part of data classification systems. Data stewardship is the tactical coordination and implementation of an organization's data assets, while data governance focuses on more high-level data policies and procedures.
Data classification increases data accessibility, enables organizations to meet regulatory compliance requirements more easily, and helps them achieve business objectives. Often, organizations must ensure that data is searchable and retrievable within a specified timeframe. This requirement cannot be met without robust processes for classifying data quickly and accurately.
Data classification is essential to meeting data security objectives. It facilitates appropriate security responses based on the types of data being retrieved, copied, or transmitted. Without a data classification process, it is challenging to identify and appropriately protect sensitive data.
Data classification provides visibility into all data within an organization and enables it to use, analyze, and protect the vast quantities of data available through data collection. Effective data classification facilitates better protection for such data and promotes compliance with security policies.
Data classification tools are intended to provide data discovery capabilities; however, they often analyze data stores only for metadata or well-known identifiers. In complex environments, data discovery is ineffective if it can discover only dates but cannot identify whether they are a date of birth, a transaction date, or the dateline of an article. Without this additional information, these discovery tools cannot identify whether data is sensitive and therefore needs protection.
“The best DSPs will have semantic and contextual capabilities for data classification — judging what something really is, rather than relying on preconfigured identifiers.” – Gartner, 2023 Strategic Roadmap for Data Security Platform Adoption
Modern data security platforms must include semantic and contextual capabilities for data classification, to identify what a piece of data is rather than using preconfigured identifiers, which are less accurate and reliable. Because organizations are increasing the use of cloud computing services, more sensitive data is now in the cloud. However, a lot of the sensitive data is unstructured, which makes it harder to secure.
A data classification scheme enables you to identify security standards that specify appropriate handling practices for each data category. Storage standards that define the data's lifecycle requirements must be addressed as well. A data classification policy can help an organization achieve its data protection goals by applying data categories to external and internal data consistently.
Data discovery and inventory tools help organizations identify resources that contain high-risk data and sensitive data on endpoints and corporate network assets. These tools help organizations identify the locations of both sensitive structured data and unstructured data by analyzing hosts, database columns and rows, web applications, file shares, and storage networks.
Tagging or applying labels to data helps to classify data. This is an essential part of the data classification process. These tags and labels define the type of data, the degree of confidentiality, and the data integrity. The level of sensitivity is typically based on levels of importance or confidentiality, which aligns with the security measures applied to protect each classification level. Industry standards for data classification include three types:
While each approach has a place in data classification, user-based classification is a manual and time-consuming process, and extremely likely to be error-prone. It will not be effective at categorizing data at scale and may put protected data and restricted data at risk.
It is important for data classification efforts to include the determination of the relative risk associated with diverse types of data, how to manage that data, and where and how to store and send that data. There are three broad levels of risk for data and systems:
Automated tools can perform classification that defines personal data and highly sensitive data based on defined data classification levels. A platform that includes a classification engine can identify data stores that contain sensitive data in any file, table, or column in an environment. It can also provide ongoing protection by continuously scanning the environment to detect changes in the data landscape. New solutions can identify sensitive data and where it resides, as well as apply the context-based classification needed to decide how to protect it.
Classifying data as restricted, private, or public is an example of data classification. As with risk levels, public data is the least sensitive and has the lowest security requirements, while restricted data receives the highest security classification and includes the most sensitive data, such as health data. A successful data classification process extends to include additional identification and tagging procedures to ensure data protection based on data sensitivity.
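As a rough sketch of how detected data classes might roll up into restricted, private, or public levels, the example below assigns a record the highest sensitivity of any class it contains. The class-to-level mapping is a hypothetical policy; in practice each organization defines its own.

```python
# Minimal sketch of assigning a classification level from detected data
# classes. The class-to-level mapping is an illustrative assumption.
LEVEL_RANK = {"public": 0, "private": 1, "restricted": 2}

CLASS_LEVELS = {
    "marketing_copy": "public",
    "email":          "private",
    "ssn":            "restricted",
    "health_record":  "restricted",
}

def classify(detected_classes):
    """A record inherits the highest sensitivity level of any class it contains."""
    levels = [CLASS_LEVELS.get(c, "private") for c in detected_classes]
    return max(levels, key=LEVEL_RANK.__getitem__, default="public")

print(classify(["email", "ssn"]))   # restricted
print(classify([]))                 # public
```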
Security and risk leaders can only protect sensitive data and intellectual property if they know the data exists, where it is, why it is valuable, and who has access to use it. Data classification helps them to identify and protect corporate data, customer data, and personal data. Labeling data appropriately helps organizations to protect data and prevent unauthorized disclosure.
The General Data Protection Regulation (GDPR), among other data privacy and protection regulations, increases the importance of data classification for any organization that stores, transfers, or processes data. Classifying data helps ensure that anything covered by the GDPR is quickly identified so that appropriate security measures are in place. GDPR also increases protection for personal data related to racial or ethnic origin, political opinions, and religious or philosophical beliefs, and classifying these types of data can help to reduce the risk of compliance-related issues.
Organizations must meet the requirements of established frameworks, such as the GDPR, California Consumer Privacy Act (CCPA), Health Insurance Portability and Accountability Act (HIPAA), Payment Card Industry Data Security Standard (PCI DSS), Gramm-Leach-Bliley Act (GLBA), Health Information Technology for Economic and Clinical Health (HITECH), among others. To do so, they must evaluate sensitive structured and unstructured data posture across Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS) environments and contextualize risk as it relates to security, privacy, and other regulatory frameworks.
According to the GDPR, a Data Controller is an organization, agency, public authority, or individual that determines the how and why of data processing. The data controller may also be a data processor, or they may employ a third-party data processor.
In communications, data flow is the path taken by a message from origination to destination that includes all nodes through which the data travels.
An illustration that shows the way information flows through a process or system. Data flow diagrams include data inputs and outputs, data stores, and the various subprocesses the data moves through.
Also known as records of authority, data inventories identify personal data within systems and help in the mapping of how data is stored and shared. Data inventories are defined under privacy regulations including the GDPR, CCPA, and CPRA.
The requirement that data be physically stored in the same country or group of countries from which it originated. Localization and related transfer restrictions are common in modern privacy and data protection laws, such as the GDPR, China’s CSL, and Brazil’s LGPD. For example, under the GDPR, a company collecting the data of EU residents may only transfer that data outside the EU where an adequacy decision or other appropriate safeguards apply.
The accidental loss of data, whether via accidental deletion, destruction, or theft.
A privacy concept that states data collectors should only collect and retain the bare minimum of personal data that is necessary for the data processor to perform their duties, and should delete that data when it is no longer necessary.
Any action that is performed on personal data or sets of personal data, such as collecting, structuring, storing, or disseminating that data.
The GDPR defines a data processor as any organization that collects, processes, stores, or transmits the personal data of EU residents on behalf of a data controller.
A legal term referring to laws and regulations aimed at protecting the personal data of individuals and determining that data’s fair use.
These are the principles set forth in Article 5 of the GDPR: lawfulness, fairness, and transparency; purpose limitation; data minimization; accuracy; storage limitation; and integrity and confidentiality.
A concept that refers to the physical or geographic location of an organization's data. Privacy and security professionals focus on the legal and regulatory requirements imposed on data by the country or region in which it resides. When a business uses cloud services (IaaS, PaaS, or SaaS), it may not be aware of its data's physical location, which can create data residency concerns when, for example, data for a citizen of the European Union is stored in a US-based cloud datacenter.
Data is every business’s most crucial asset – the foundation of any security program. Data Security Posture Management (DSPM) is an emerging security trend named by Gartner in its 2022 Hype Cycle for Data Security. The aim of DSPM solutions is to enable security and compliance teams to answer three fundamental questions:
The cloud has fundamentally changed how businesses function. Moving workloads and data assets is now simpler than ever, and is a boon for productivity, enabling businesses to quickly respond to customer demands and create new revenue opportunities. However, the pace and permissive nature of the cloud also dramatically expands a company’s threat surface and raises the likelihood of a data breach. Put simply, the distributed nature of the cloud seriously complicates data security.
Historically, a number of technologies have attempted to address challenges related to data security, including:
DSPM solutions combine capabilities from all three of these areas and represent the next-generation approach in cloud data security.
DSPM represents a next-generation approach to data security
DSPM vendors are taking a cloud-first approach to make it easier to discover, classify, assess, prioritize, and remediate data security issues. They are solving cloud security concerns by automating data detection and protection activities in a dynamic environment and at a massive scale.
Gartner Research summarizes the DSPM space, saying, “Data security posture management provides visibility as to where sensitive data is, who has access to that data, how it has been used, and what the security posture of the data store or application is. In simple terms, DSPM vendors and products provide “data discovery+” — that is, in-depth data discovery plus varying combinations of data observability features. Such features may include real-time visibility into data flows, risk, and compliance with data security controls. The objective is to identify security gaps and undue exposure. DSPM accelerates assessments of how data security posture can be enforced through complementary data security controls.”
The foundation of a DSPM offering is data discovery and classification. Reports like Forrester’s Now Tech: Data Discovery And Classification, Q4 2020 dive deep into data discovery and classification technologies, which in Forrester’s case align with five segments: data management, information governance, privacy, security, and specialist concerns. These segments align with three major buying centers: global risk and compliance, security, and business units/product owners.
DSPM focuses on delivering automated, continuous, and highly accurate data discovery and classification for security teams. The following list provides clarity on how these approaches align with buying centers, all of which have data discovery and classification needs, but as you will see below, want to leverage it for different purposes:
Posture management solutions abound
Today there are three prevailing types of security tools that offer posture management solutions: cloud security posture management (CSPM), SaaS security posture management (SSPM), and data security posture management (DSPM). The solutions can be differentiated as follows:
While DSPM solutions have focused on a cloud-first approach, data security is not limited only to cloud environments. Therefore more mature DSPM solutions will also include on-prem use cases since most businesses maintain some form of on-prem data, and will for years to come. In addition, as the DSPM space evolves, and solutions gain maturity, some will become more robust data security platforms, which will include the ability to:
DSPM solutions address key security use cases
Businesses thrive on collaboration. The current reality of highly distributed environments, many of which leverage cloud technologies, means that any file or data element can be easily shared at the click of a button. DSPM provides the missing piece to complete most security programs’ puzzles: a means of identifying, contextualizing, and protecting sensitive data.
DSPM solutions empower security teams to:
Data sprawl refers to the significant and ever-growing quantities of data, or digital information, that organizations create daily. Data is a valuable resource because it enables business leaders to make data-driven decisions about how to best serve their client base, grow their business, and improve their processes. However, managing vast amounts of data and so many data sources can be a serious challenge.
Large businesses, particularly enterprises, are generating a staggering amount of data due to the wide variety of software products in use, as well as newly introduced data formats, multiple storage systems in the cloud and in on-premises environments, and huge quantities of log data generated by applications. There is an overwhelming amount of data being generated and stored in the modern world.
As organizations scale and increasingly use data for analysis and investigation, that data is being stored in operating systems, servers, applications, networks, and other technologies. Many organizations generate massive quantities of new data all day, every day, including:
These files and records are dispersed across multiple locations, which makes inventorying, securing, and analyzing all that data extremely difficult.
Data sprawl is described as the ever-expanding amount of data produced by organizations every day. Amplified by the shift to the cloud, organizations can scale more rapidly, producing more and more data. New uses for big data continue to develop, requiring an increase in how much data is stored in operating systems, servers, networks, applications, and other technologies.
Further complicating matters, databases, analytics pipelines, and business workflows have been migrating rapidly to the cloud, moving across multiple cloud service providers (CSPs) and across structured and unstructured formats. This shift to the cloud is ongoing, and new data stores are created all the time. Security and risk management (SRM) leaders are struggling to identify and deploy data security controls consistently in this environment.
"...unstructured data sprawl (both on-premises and hybrid/multi-cloud) is difficult to detect and control when compared to structured data."
Gartner, Hype Cycle for Data Security, 2022
Organizations generate new data every hour of every day. The customer data in customer relationship management (CRM) systems may also include financial data, which is also in an accounting database or enterprise resource planning (ERP) system. Sales data and transactional data may be in those systems as well, siloed by different departments, branches, and devices. To realize the benefits promised by data analytics, analysts must cross-reference multiple sources, which makes it harder to reach accurate, informed decisions.
Ultimately, organizations need data to facilitate day-to-day workflows and generate analytical insights for smarter decision-making. The problem is that the amount of data organizations generate is spiraling out of control. According to a recent IDC study, the Global DataSphere is expected to more than double from 2022 to 2026. The worldwide DataSphere is a measure of how much new data is created, captured, replicated, and consumed each year, growing twice as fast in the Enterprise DataSphere compared to the Consumer DataSphere.
As organizations generate data at a faster pace, it is becoming harder to manage this information. Organizations might have data stored in various locations, making it hard to access business-critical information and generate accurate insights. Team members must cross-reference data in multiple formats from multiple sources, making analytics difficult. Managing dispersed information across different silos wastes time and money. Data may become corrupted during transmission, storage, and processing. Data corruption compromises the value of data, and the likelihood of corruption may increase alongside increasing data sprawl.
In addition, effort is wasted when employees cannot find the data they need where they expect it and create duplicates, which can also result in ghost data. This duplicate data is considered redundant. Other data may be obsolete (out of date) or trivial (not valuable for business insights). This excess data results in excessive resource utilization and increases cloud storage costs.
Employees may be handling data carelessly, not understanding how the way they share and handle data can introduce risk. Unauthorized users may also have access to sensitive information, particularly when the data produced and stored is not appropriately managed. Manually classifying data is time-consuming and error-prone and may increase the risk of sensitive data exposure, so finding automated solutions is essential for managing large stores of data.
Data sprawl compromises data value and presents significant security risks, because large volumes of dispersed data are difficult to control. This increases the chances of data breaches and other security incidents. Furthermore, organizations that do not manage data sprawl may jeopardize the trust of customers and face strict penalties for non-compliance with the General Data Protection Regulation (GDPR), California Consumer Privacy Act (CCPA), or other data protection legislation.
Getting data sprawl under control requires a structured approach to data management. It is essential to have a solution in place to discover and classify data. Because data is spread across on-premises and cloud environments, it is critical to identify the environments where data is stored to ensure that all data is identified and managed. Tools that can discover and classify data in SaaS, IaaS, and PaaS environments are important, as are those that can find and classify structured and unstructured data. The goal of these tools is to create a unified view across the environment.
Identifying a central place to store data is one way to manage data sprawl. Cloud security standards continue to improve, making a centralized cloud repository an appealing option for many organizations. Cloud storage platforms are an excellent method of storing data in a way that creates a single source of truth that is more accessible to employees in many locations. At the same time, companies must establish data access governance (DAG) policies that outline how data should be collected, processed, and stored. These policies must also govern the data itself, including access controls, retention, risk management, compliance, and data disposition (how data is disposed of at the end of its lifecycle). DAG policies complement data loss prevention (DLP) programs. Data security posture management (DSPM) combines data discovery and classification, data loss prevention, and data access governance to create a next-generation approach to cloud data security.
For organizations that want to manage data sprawl, it is imperative to know what data exists in the environment, where it is located, and who has access to it. Different tools exist to manage all the data that organizations store, but few can prevent data sprawl.
Automated data discovery and data classification solutions must be able to identify and classify sensitive data. Artificial intelligence (AI) and machine learning (ML) can more accurately classify difficult-to-identify data, such as intellectual property and sensitive corporate data.
Data sprawl solutions can also increase overall data security by helping to locate and identify duplicate and redundant data. Once sprawling data has been identified and classified, it becomes easier to dispose of stale data or extraneous data. This can save on storage costs as well as eliminate duplicate and irrelevant data.
Enterprises collect data daily, and it is easy to create multiple copies. The first step for companies that wish to manage access to data and prevent data loss is to fully understand their data: where it is now, whether or not IT or security teams are aware of those data stores, and any data stores that are created in the future. Identifying sensitive data and who has access to it can help prevent data breaches by ensuring that appropriate security controls are enforced.
A repository for storing, managing and distributing data sets on an enterprise level.
Defense Industrial Base (DIB) contractors are companies that conduct business with the US military and are part of the military-industrial complex responsible for research, production, delivery, and service.
DIB contractors are responsible for meeting compliance requirements set by government policies and frameworks, including the Department of Defense Instruction (DoDI) 5200.48 and the Cybersecurity Maturity Model Certification.
According to DoDI 5200.48, safeguarding Controlled Unclassified Information is a shared responsibility between DIB contractors and the Department of Defense.
An Electronic Lab Notebook (Electronic Laboratory Notebook, or ELN) is the digital form of a paper lab notebook. In the pharmaceutical industry, it is used by researchers, scientists, and technicians to document observations, progress, and results from their experiments performed in a laboratory.
While an ELN enables information to be documented and shared electronically, it also exposes proprietary information to malicious insiders or external hackers. As a result, ELNs should be subject to appropriate security controls to prevent misuse or loss.
An adequacy agreement created in 2016 to replace the EU-U.S. Safe Harbor Agreement. The EU-U.S. Privacy Shield allowed participating organizations under the jurisdiction of the US Federal Trade Commission to transfer personal data from the EU to the United States. The framework was invalidated by the Court of Justice of the European Union in its 2020 Schrems II decision and has since been succeeded by the EU-U.S. Data Privacy Framework.
Encryption is the method of converting plaintext into ciphertext so that only authorized parties can decrypt the information and no third parties can tamper with the data. Unencrypted usually refers to data or information that is stored unprotected, without any encryption. Encryption is an important way for individuals and companies to protect sensitive information from hacking. For example, websites that transmit credit card and bank account numbers encrypt this information to prevent identity theft and fraud.
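A minimal sketch of symmetric encryption is shown below, using the third-party Python cryptography package as one of many possible libraries: only a holder of the key can recover the plaintext, and tampering with the ciphertext causes decryption to fail.

```python
# Minimal sketch using the third-party "cryptography" package to show
# symmetric, authenticated encryption of a sensitive value.
from cryptography.fernet import Fernet

key = Fernet.generate_key()          # keep this secret; losing it means losing the data
f = Fernet(key)

ciphertext = f.encrypt(b"4111 1111 1111 1111")   # unreadable without the key
plaintext = f.decrypt(ciphertext)                 # authorized party holding the key
assert plaintext == b"4111 1111 1111 1111"
```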
The primary supervisory authority established by the GDPR. The board consists of the heads of EU member states’ supervisory authorities as well as the European Data Protection Supervisor. The goal of the EDPB is to ensure consistent application of the GDPR by member states.
An independent authority that aims to ensure that European organizations and member states comply with the privacy rules of the GDPR.
Where the result of a query, algorithm, or search only registers a match if there is a 100% match.
A false positive is an alert that incorrectly indicates a vulnerability exists or malicious activity is occurring. These false positives add a substantial number of alerts that need to be evaluated, increasing the noise level for security teams.
False positives may be triggered by a variety of incidents, such as:
The growing number of security testing and monitoring tools increases the overall number of alerts security teams receive, which in turn increases the number of false positives coming in to be triaged. These events add noise for overburdened security teams, making them more likely to ignore valid security events because they assume they are false positives.
Realistically, security teams cannot and do not need to resolve every single issue exposed by alerts, nor can software development and testing teams analyze each alert. These teams receive a high volume of alerts, and each one takes time to investigate. When time-constrained teams continuously receive a high number of alerts, they are more likely to experience alert fatigue and focus only on instances where there is a clear issue that needs to be resolved.
False positives increase the likelihood that internal security teams will miss important security events because they believe them to be invalid or simply see too many alerts to investigate each one. False negatives are similarly problematic, because they show that no vulnerability or security issue is present when there actually is a problem that needs to be addressed.
While some number of false positives will be investigated to verify that they do not, in fact, pose a threat to the organization, false negatives are less likely to be investigated as test results appear to indicate that the software is functioning as intended. Both false positives and false negatives can pose a threat to security teams and the organizations they protect.
An unsupervised learning method whereby a series of files is divided into multiple groups, so that the grouped files are more similar to the files in their own group and less similar to those in the other groups.
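As an illustrative sketch, the example below groups items with scikit-learn's KMeans; the two-dimensional feature vectors are a stand-in for whatever features a real system would extract from files.

```python
# Minimal sketch of unsupervised clustering with scikit-learn's KMeans.
# The feature vectors (e.g., normalized size and share count) are assumed.
from sklearn.cluster import KMeans
import numpy as np

features = np.array([
    [0.10, 0.20], [0.20, 0.10], [0.15, 0.15],   # one natural group of files
    [0.90, 0.80], [0.85, 0.90], [0.95, 0.85],   # another group
])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
print(labels)   # files in the same group share a label, e.g. [0 0 0 1 1 1]
```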
Where the score of a result can fall from 0 to 100, based on the degree to which the search data and file data values match.
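A minimal sketch using Python's standard difflib, scaling the similarity ratio to the 0-100 range described above:

```python
# Minimal sketch of fuzzy matching: scale difflib's similarity ratio to 0-100.
from difflib import SequenceMatcher

def fuzzy_score(search_value, file_value):
    return round(SequenceMatcher(None, search_value.lower(), file_value.lower()).ratio() * 100)

print(fuzzy_score("Jonathan Smith", "Jonathon Smyth"))  # high score, not an exact match
print(fuzzy_score("Jonathan Smith", "Acme Invoice"))    # low score
```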
An acronym for the General Data Protection Regulation. This is a data protection law that applies to all member states of the European Union. The aim of the GDPR is to set a high standard for data protection and to provide one set of data protection rules for the entire EU. The 99 articles of the GDPR set forth several fundamental rights of data protection, including the right to be informed, right of access, right to rectification, right to erasure/to be forgotten, right to restrict processing, right to data portability, right to object, and rights in relation to automated decision making and profiling. The rules set by the GDPR apply to any organization that processes the personal data of EU residents, whether that organization itself is based in the EU or not. The GDPR modernizes the principles from the EU's 1995 Data Protection Directive and applies to personal data of EU residents that is processed by what the regulation calls data controllers and data processors. Financial penalties for non-compliance reach up to €20 million or 4% of worldwide annual turnover, whichever is higher.
Ghost data is backups or snapshots of data stores where the original has been deleted. Ghost data is a type of shadow data, which includes unmanaged data store copies, snapshots, or log data that are not included in an organization’s backup and recovery plans. Ghost data refers to data that still exists within a database or storage system but is no longer actively used or known to be accessible. For example, if a data store is created for a product and the product has been discontinued, the production data is usually removed because there is no longer a business justification to maintain it. However, if copies of the data remain in staging or development environments, they would be considered ghost data.
Ghost data is created for a few reasons, such as when a user or program deletes a file or database entry but the data is not permanently removed from the system, or when data is migrated to a new system but the old data is not completely erased from the original system.
Cloud adoption led to a proliferation of data. Much of that data is structured, secured, and monitored, but a considerable proportion of that data is unstructured, unsecured, and unmonitored. This presents real risks to organizations today. And while data collection and analysis can yield important business benefits, it can also increase the risk to the organization if not effectively managed. Ghost data presents significant risks to organizations because it cannot be effectively managed.
Ghost data can cause problems for organizations because it:
Ghost data may include sensitive data, including customer and employee personally identifiable information (PII).
“Over 30% of scanned cloud data stores are ghost data, and more than 58% of the ghost data stores contain sensitive or very sensitive data.” – Cyera Research
The problem with ghost data begins with how data is stored today. In the past, organizations had storage capability limited by hardware capacity. If an organization or team needed more storage space, the IT team purchased additional hardware, reviewed the data to determine what could be purged, or both.
The wide adoption of cloud storage and services changed that equation. Because the cloud is extensible, organizations can continually expand storage to accommodate the accumulation of data. Data is also being generated and stored at an unprecedented rate, creating an expansive data landscape. Further increasing the quantity of data, most organizations store multiple copies of data in different formats and different cloud providers. This makes it more difficult to identify duplicate and redundant data and much easier to have data stores that have become lost or are no longer tracked or protected.
Few companies delete older versions of data as it becomes obsolete. It is easy to store more data, but most organizations have no limits in place to trigger a review of the data across all environments, including multiple cloud environments. This results in data sprawl and creates challenges in data classification efforts.
“35% [of respondents] utilize at least two public cloud providers from a list that included Amazon Web Services, Google Cloud, Microsoft Azure, Alibaba, IBM, and Oracle; 17% of respondents rely on three or more.” – Cyera Research
Many organizations choose to keep older copies of data in case it is needed at some point. If there is no review or verification of the data — where it is, how much there is, what sensitive information exists in the data, or whether the data is securely stored — this ghost data both increases storage costs and poses a significant business risk.
Frequently, teams copy data to non-production environments. This not only creates an additional copy of the data but places it in a less secure environment. Non-production environments are not secured with the same rigor as production environments, so the sensitive data they contain is more susceptible to inadvertent disclosure or exfiltration. Frequently, ghost data is accessible to users who have no business justification for accessing that data, increasing data security risks.
These data copies also represent a potential EU General Data Protection Regulation (GDPR) violation. The GDPR specifies that personal data be kept only as long as it is required to achieve the business purpose for which it was collected (except for scientific or historical research). After this period, the data must be disposed of appropriately, but when personal data exists in ghost data, it is likely to remain in the environment, increasing organizational risk. It can be difficult for IT teams to delete ghost data because they are often unaware of it.
“60% of the data security posture issues present in cloud accounts stem from unsecured sensitive data.” – Cyera Research
Sometimes, the database may be gone but snapshots are still there. Sometimes those snapshots are unencrypted, while other times the data stores exist in the wrong region. That exposes organizations to both increased costs and security risks. The additional data, unencrypted and in unknown locations, increases the attack surface for the organization.
Ghost data can increase the risk of ransomware because attackers do not care whether the data is up-to-date or accurate. They only care about what is easy to access and what is not being monitored. While the organization may not be aware of its ghost data, that lack of visibility does not protect it from attackers.
Stolen ghost data can be exfiltrated and used for malicious purposes. Cyber attackers can prove that they have access to the data and thereby execute a successful ransomware attack. Double extortion attacks are as successful with ghost data as with any other data because attackers have the same increased leverage: they rely not only on encryption (which is of little concern to an organization when it comes to ghost data), but can also publicly release the stolen data to encourage payment of the ransom. Robust backups cannot help with the issue of ghost data because the leverage to release data publicly remains the same.
Unfortunately, cloud providers offer limited visibility into what data customers have. Cloud service providers (CSPs) do not identify how sensitive a customer's data is, nor do they provide specific advice on how to improve the security and risk posture of data across the customer's cloud estate. This results in increased risks to cyber resilience and compliance. An organization’s lack of visibility into its data across all cloud providers increases the risk of exposing sensitive data. Similarly, ghost data or any other data store that is not adequately identified and classified is likely to have overly permissive access.
Significant changes in how data is managed and stored in cloud and hybrid environments have also led to new questions, including:
In modern corporate environments, it is important for all teams involved to understand their responsibilities when it comes to managing, securing, and protecting data. It is a joint effort between builders and the security team. However, managing data on an ongoing basis remains a challenge without the technology to discover and classify sensitive data automatically.
Modern software solutions and products have had a significant impact in terms of creating data, increasing the data available for analytics, and growing complexity in corporate environments. AI/ML can help address the challenges created by these technological advances. In particular, AI/ML can help identify ghost data and increase data security by using continuous learning and automation to:
Robust AI/ML data classification solutions can accurately classify data that previously was challenging to identify, including intellectual property and other sensitive corporate data. AI/ML can also help enable organizations to make smarter decisions and create better policies about what to do with data and how to protect sensitive data.
To begin with, it is important to treat data as a security layer in its own right. In the past, data was not considered a layer of security because there was no straightforward way to work with data at that level. Today, with AI/ML, it is far easier to access, understand, and know the data within an organization and across all its disparate environments.
As technology has changed, the focus of security has moved from infrastructure to data-related security. While CISOs remain in charge of the technical aspects of security, new challenges in business and cybersecurity require more collaboration across the business team, IT, security, and privacy office to move forward and meet data security and data privacy requirements.
Regulations and requirements are becoming more stringent globally, requiring organizations to take more responsibility for the data they are collecting. This includes all the data spread across environments, including ghost data. Managing that data requires robust data discovery and data classification.
An acronym for the Health Insurance Portability and Accountability Act. This is an American law that sets national standards and regulations for the transfer of electronic healthcare records. Under HIPAA, patients generally must give authorization before their healthcare information can be shared with other organizations for purposes beyond treatment, payment, and healthcare operations.
An acronym for the Health Information Technology for Economic and Clinical Health Act. This is an American law enacted as part of the American Recovery and Reinvestment Act of 2009. HITECH aims to build on the healthcare security and privacy requirements set forth by HIPAA. HITECH does so by adding tiered monetary penalties for noncompliance, as well as the requirement for breach notifications.
A Federal Trade Commission rule requiring vendors of personal health records to notify consumers following a breach involving unsecured information. And if a service provider to such a vendor is breached, they must notify the vendor. The rule also stipulates an exact timeline and method by which these public notifications must be made.
Information Rights Management is a subset of Digital Rights Management that protects corporate information from being viewed or edited by unwanted parties typically using encryption and permission management.
International standard for how to manage information security, first published by the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC) in 2005, then revised in 2013 and again in 2022. It outlines standards for creating, executing, maintaining, and optimizing an information security management system, in order to help organizations make their information assets more secure.
The directives, rules, regulations, and best practices that an organization follows to manage and secure information.
Any individual with insider access to an organization's networks or resources that would allow them to exploit the vulnerabilities of that organization's security or steal data.
The assurance that information has not been changed and that it is accurate and complete. The GDPR mandates that data controllers and processors implement measures that guarantee data integrity.
A security principle which mandates that users should be granted the least amount of permissions necessary to perform their job.
The GDPR mandates that data controllers must demonstrate a legal basis for data processing. The six legal bases for processing listed in the law are: consent, performance of a contract, compliance with a legal obligation, protection of the vital interests of the data subject, performance of a task in the public interest, and the legitimate interests of the controller.
An acronym for Multifactor Authentication. This represents an authentication process that requires more than one factor of verification. An example would be a login that requires a username and password combination, as well as an SMS-code verification, or the use of a physical security key.
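As a hedged sketch of a second factor, the example below generates and verifies a time-based one-time password (TOTP) using the third-party pyotp package; in practice the shared secret is provisioned to the user's authenticator app, typically via a QR code.

```python
# Minimal sketch of a TOTP second factor using the third-party "pyotp" package.
import pyotp

secret = pyotp.random_base32()       # stored server-side and in the user's authenticator app
totp = pyotp.TOTP(secret)

code = totp.now()                    # what the user's authenticator currently displays
assert totp.verify(code)             # second factor checks out alongside the password
```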
A deliberate configuration change within a system by a malicious actor, typically to create back-door access or exfiltrate information. While the original change in configuration might involve a compromised account or other vulnerability, a malconfiguration has the benefit of offering long term access using legitimate tools, without further need of a password or after a vulnerability is closed.
A term that covers a number of different types of malicious software intended to infiltrate computers or computer networks.
A database with storage, data, and compute services that is managed and maintained by a third-party provider instead of by an organization's IT staff.
Sensitive information swapped with arbitrary data intended to resemble true production data, rendering it useless to bad actors. It's most frequently used in test or development environments, where realistic data is needed to build and test software, but where there is no need for developers to see the real data.
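A minimal sketch of format-preserving masking follows; the replacement strategy (random characters of the same kind, separators kept) is an illustrative assumption rather than a specific product's method.

```python
# Minimal sketch of format-preserving masking: letters and digits are replaced
# with arbitrary values so test data looks realistic but is useless to bad actors.
import random
import string

def mask_value(value):
    """Replace every letter and digit with a random one of the same kind."""
    masked = []
    for ch in value:
        if ch.isdigit():
            masked.append(random.choice(string.digits))
        elif ch.isalpha():
            masked.append(random.choice(string.ascii_letters))
        else:
            masked.append(ch)           # keep separators so the format is preserved
    return "".join(masked)

print(mask_value("4111-1111-1111-1111"))  # e.g. 7302-9941-0568-2277
```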
Data that describes other data. For databases, metadata describes properties of the data store itself, as well as the definition of the schema.
A dangerous or unapproved configuration of an account that could potentially lead to a compromise, typically created by a well-intentioned user attempting to solve an immediate business problem. While there is no malicious intent, misconfiguration is the leading cause of data loss or compromise.
An acronym for the National Institute of Standards and Technology. NIST is a unit of the US Commerce Department tasked with promoting and maintaining measurement standards. NIST leads the development and issuance of security standards and guidelines for the federal government.
In data security or privacy terms, this is the breach of a legal duty to protect personal information.
Sensitive information swapped with arbitrary data intended to resemble true production data, rendering it useless to bad actors. It's most frequently used in test or development environments, where realistic data is needed to build and test software, but where there is no need for developers to see the real data.
When an individual makes an active indication of choice, such as checking a box indicating willingness to share information with third parties.
Either an explicit request by a user to no longer share information with or receive updates from an organization, or a lack of action that implies the choice has been made, such as when a person does not uncheck a box indicating willingness to share information with third parties.
An acronym for the Payment Card Industry Data Security Standard. This is a widely accepted set of policies and procedures intended to optimize the security of credit, debit and cash card transactions and protect cardholders against misuse of their personal information.
An acronym for Protected Health Information. The HIPAA Privacy Rule provides federal protections for personal health information held by covered entities and gives patients an array of rights with respect to that information.
An acronym for Personally Identifiable Information. This is any representation of information that permits the identity of an individual to whom the information applies to be reasonably inferred by either direct or indirect means. Examples include social security number (SSN), passport number, driver's license number, taxpayer identification number, patient identification number, financial account or credit card number, personal address information including street address or email address, and personal telephone numbers.
Any data collection technique that gathers information automatically, with or without the end user’s knowledge.
A type of malware that encrypts the files on an endpoint device using a mechanism for which only the attacker has the keys. While the attacker will offer the key in exchange for payment, fewer than half of victims that do pay actually recover their files.
The idea that organizations should only retain information as long as it is pertinent.
An individual’s right to request and receive their personal data from a business or other organization.
The right for individuals to correct or amend information about themselves that is inaccurate.
An individual’s right to have their personal data deleted by a business or other organization possessing or controlling that data.
An individual’s right to have their personal data deleted by a business or other organization possessing or controlling that data.
In cybersecurity, a risk assessment is a comprehensive analysis of an organization to identify vulnerabilities and threats. The goal of a risk assessment is to identify an organization’s risks and make recommendations for mitigating those risks. Risk assessments may be requested after a specific trigger, to complete an assessment before moving forward as part of larger governance and risk processes, or to assess a portfolio periodically as part of meeting an enterprise risk management or compliance objective.
Two popular risk assessment frameworks are the National Institute of Standards and Technology (NIST) Cybersecurity Framework and the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) 27001:2022 standard.
Risk assessments may be based on different methodologies: qualitative, quantitative, or a hybrid of the two. A quantitative assessment provides concrete data, including the probability and potential impact of a threat, based on data collection and statistical analysis. A qualitative assessment provides a more subjective, generalized view of what would happen to operations and productivity for different internal teams if one of the risks occurred.
A risk assessment should include an up-to-date inventory of the systems, vendors, and applications in scope for the assessment. This information helps security risk management leaders understand the risk associated with:
A single risk assessment provides a point-in-time snapshot of the current risks present and how to mitigate them. Ongoing or continuous risk assessments provide a more holistic view into the shifting risk landscape that exists in most organizations.
Risk assessments also help organizations assess and prioritize the risks to their information, including their data and their information systems. An assessment also helps security and technology leaders communicate risks in business terms to internal stakeholders, specifically the executive team and the board of directors. This information helps them make educated decisions about which areas of the cybersecurity program need to be prioritized and how to allocate resources in alignment with business goals.
The growth of digital business and related assets requires better management of complex technology environments, which today include:
It also creates a growing volume of data, types of data, and technology assets. A comprehensive risk assessment should include these assets, allowing an organization to gain visibility into all its data, provide insight into whether any of that data is exposed, and identify any serious security issues. A data risk assessment can help an organization secure and minimize exposure of sensitive data by providing a holistic view of sensitive data, identifying overly permissive access, and discovering stale and ghost data.