An acronym for Cloud Service Provider. This is any company that sells a cloud computing service, be it PaaS, IaaS, or SaaS.
An acronym for Controlled Unclassified Information.
It is information created or owned by the US government that requires safeguarding. While CUI is not classified information, it is considered sensitive. CUI is governed under a number of government policies and frameworks, including Department of Defense Instruction (DoDI) 5200.48 and the Cybersecurity Maturity Model Certification (CMMC). According to DoDI 5200.48, safeguarding CUI is a shared responsibility between Defense Industrial Base contractors and the Department of Defense.
A database service which is deployed and delivered through a cloud service provider (CSP) platform.
An organized inventory of data assets in the organization. Data catalogs use metadata to help organizations manage their data. They also help data professionals collect, organize, access, and enrich metadata to support data discovery and governance.
The process of dividing data into groups of entities whose members are similar to each other in some way. Data privacy and security professionals can then categorize that data as high, medium, or low sensitivity.
A definition that allows each type of data in a data store to be detected programmatically, typically using a test or algorithm. Data privacy and security professionals associate data classes with rules that define the actions to take when a given data class is detected. For example, sensitive information or PII should be tagged with a business term or classification, and some sensitive data classes should also have a specific data quality constraint applied.
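As a hedged illustration of the concept, the sketch below pairs a few hypothetical data classes with regex-based tests and the action to take when each class is detected. The class names, patterns, and actions are illustrative assumptions, not any product's built-in definitions.

```python
import re

# Each data class pairs a programmatic test (here, a regular expression)
# with a rule describing the action to take when the class is detected.
DATA_CLASSES = {
    "us_ssn": {
        "pattern": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
        "action": "tag:PII; apply-masking",
    },
    "email_address": {
        "pattern": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
        "action": "tag:PII",
    },
    "credit_card": {
        "pattern": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
        "action": "tag:PCI; require-encryption",
    },
}

def detect_classes(value: str) -> dict:
    """Return the data classes (and their rules) that match a value."""
    return {
        name: spec["action"]
        for name, spec in DATA_CLASSES.items()
        if spec["pattern"].search(value)
    }

print(detect_classes("Contact jane@example.com, SSN 123-45-6789"))
# {'us_ssn': 'tag:PII; apply-masking', 'email_address': 'tag:PII'}
```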
Data classification is the process of organizing data into relevant categories to make it simpler to retrieve, sort, use, store, and protect.
A data classification policy, properly executed, makes the process of finding and retrieving critical data easier. This is important for risk management, legal discovery, and regulatory compliance. When creating written procedures and guidelines to govern data classification policies, it is critical to define the criteria and categories the organization will use to classify data.
Data classification can help make data more easily searchable and trackable. This is achieved by tagging the data. Data tagging allows organizations to label data clearly so that it is easy to find and identify. Tags also help you manage data better and identify risks more readily. Tagging also enables data to be processed automatically and ensures timely and reliable access, as required by some state and federal regulations.
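As a small, hedged example of tagging in practice, the snippet below uses the AWS SDK for Python (boto3) to attach a classification label to an object in S3. The bucket name, key, and tag values are hypothetical; any data store with labeling support could be tagged in a similar way.

```python
import boto3

# Apply classification tags to a cloud object so downstream tooling can
# find and handle it automatically. Bucket, key, and tags are hypothetical.
s3 = boto3.client("s3")

s3.put_object_tagging(
    Bucket="example-bucket",
    Key="reports/customers-2024.csv",
    Tagging={
        "TagSet": [
            {"Key": "classification", "Value": "confidential"},
            {"Key": "contains-pii", "Value": "true"},
        ]
    },
)
```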
Most data classification projects help to eliminate duplication of data. By discovering and eliminating duplicate data, organizations can reduce storage and backup costs as well as reduce the risk of confidential data or sensitive data being exposed in case of a data breach.
Specifying data stewardship roles and responsibilities for employees inside the organization is part of data classification systems. Data stewardship is the tactical coordination and implementation of an organization's data assets, while data governance focuses on more high-level data policies and procedures.
Data classification increases data accessibility, enables organizations to meet regulatory compliance requirements more easily, and helps them achieve business objectives. Often, organizations must ensure that data is searchable and retrievable within a specified timeframe. This requirement cannot be met without robust processes for classifying data quickly and accurately.
Data classification is also essential for meeting data security objectives. It enables appropriate security responses based on the types of data being retrieved, copied, or transmitted. Without a data classification process, it is challenging to identify and appropriately protect sensitive data.
Data classification provides visibility into all data within an organization and enables it to use, analyze, and protect the vast quantities of data available through data collection. Effective data classification facilitates better protection for such data and promotes compliance with security policies.
Data classification tools are intended to provide data discovery capabilities; however, they often analyze data stores only for metadata or well-known identifiers. In complex environments, data discovery is ineffective if it can discover only dates but cannot identify whether they are a date of birth, a transaction date, or the dateline of an article. Without this additional information, these discovery tools cannot identify whether data is sensitive and therefore needs protection.
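The sketch below illustrates the gap: a pattern alone can only say that a value is a date, while a simple, hypothetical context check on the column name suggests whether it is a date of birth, a transaction date, or an article dateline. The hint lists and labels are illustrative assumptions rather than any standard taxonomy.

```python
import re

DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")

# Hypothetical context hints: column names that suggest what a date means.
CONTEXT_HINTS = {
    "date_of_birth": ("dob", "birth", "birthdate"),
    "transaction_date": ("txn", "transaction", "purchase"),
    "publication_date": ("published", "dateline", "posted"),
}

def classify_date(column_name: str, value: str) -> str:
    """Combine a pattern match with column-name context to judge sensitivity."""
    if not DATE_PATTERN.search(value):
        return "not_a_date"
    name = column_name.lower()
    for label, hints in CONTEXT_HINTS.items():
        if any(hint in name for hint in hints):
            return label
    return "date_with_unknown_context"

print(classify_date("customer_dob", "1990-04-12"))    # date_of_birth
print(classify_date("order_txn_date", "2024-01-31"))  # transaction_date
```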
“The best DSPs will have semantic and contextual capabilities for data classification — judging what something really is, rather than relying on preconfigured identifiers.” – Gartner, 2023 Strategic Roadmap for Data Security Platform Adoption
Modern data security platforms must include semantic and contextual capabilities for data classification, to identify what a piece of data is rather than using preconfigured identifiers, which are less accurate and reliable. Because organizations are increasing the use of cloud computing services, more sensitive data is now in the cloud. However, a lot of the sensitive data is unstructured, which makes it harder to secure.
A data classification scheme enables you to identify security standards that specify appropriate handling practices for each data category. Storage standards that define the data's lifecycle requirements must be addressed as well. A data classification policy can help an organization achieve its data protection goals by applying data categories to external and internal data consistently.
Data discovery and inventory tools help organizations identify resources that contain high-risk data and sensitive data on endpoints and corporate network assets. These tools help organizations identify the locations of both sensitive structured data and unstructured data by analyzing hosts, database columns and rows, web applications, file shares, and storage networks.
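A minimal sketch of column-level discovery is shown below, assuming a local SQLite database and a couple of simple regex detectors; production discovery tools cover far more store types, identifiers, and sampling strategies.

```python
import re
import sqlite3

SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")

def scan_sqlite(path: str, sample_rows: int = 100) -> list:
    """Sample each column of each table and report columns with likely PII."""
    findings = []
    conn = sqlite3.connect(path)
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table'")]
    for table in tables:
        columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
        for column in columns:
            rows = conn.execute(
                f"SELECT {column} FROM {table} LIMIT {sample_rows}").fetchall()
            sample = " ".join(str(r[0]) for r in rows if r[0] is not None)
            if SSN.search(sample):
                findings.append((table, column, "us_ssn"))
            if EMAIL.search(sample):
                findings.append((table, column, "email_address"))
    conn.close()
    return findings

# Example (hypothetical database): scan_sqlite("crm.db") might yield
# [("customers", "email", "email_address"), ("employees", "ssn", "us_ssn")]
```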
Tagging or applying labels to data helps to classify it and is an essential part of the data classification process. These tags and labels define the type of data, the degree of confidentiality, and the data integrity. The level of sensitivity is typically based on levels of importance or confidentiality, which aligns with the security measures applied to protect each classification level. Industry standards for data classification include three types: content-based, context-based, and user-based classification.
While each approach has a place in data classification, user-based classification is a manual and time-consuming process, and extremely likely to be error-prone. It will not be effective at categorizing data at scale and may put protected data and restricted data at risk.
It is important for data classification efforts to include determining the relative risk associated with diverse types of data, how to manage that data, and where and how to store and send that data. There are three broad levels of risk for data and systems: low, moderate, and high.
Automated tools can perform classification that defines personal data and highly sensitive data based on defined data classification levels. A platform that includes a classification engine can identify data stores that contain sensitive data in any file, table, or column in an environment. It can also provide ongoing protection by continuously scanning the environment to detect changes in the data landscape. New solutions can identify sensitive data and where it resides, as well as apply the context-based classification needed to decide how to protect it.
Classifying data as restricted, private, or public is an example of data classification. Like identifying risk levels, public data is the least-sensitive data and has the lowest security requirements. Restricted data receives the highest security classification, and it includes the most sensitive data, such as health data. A successful data classification process extends to include additional identification and tagging procedures to ensure data protection based on data sensitivity.
Security and risk leaders can only protect sensitive data and intellectual property if they know the data exists, where it is, why it is valuable, and who has access to use it. Data classification helps them to identify and protect corporate data, customer data, and personal data. Labeling data appropriately helps organizations to protect data and prevent unauthorized disclosure.
The General Data Protection Regulation (GDPR), among other data privacy and protection regulations, increases the importance of data classification for any organization that stores, transfers, or processes data. Classifying data helps ensure that anything covered by the GDPR is quickly identified so that appropriate security measures are in place. GDPR also increases protection for personal data related to racial or ethnic origin, political opinions, and religious or philosophical beliefs, and classifying these types of data can help to reduce the risk of compliance-related issues.
Organizations must meet the requirements of established frameworks, such as the GDPR, California Consumer Privacy Act (CCPA), Health Insurance Portability and Accountability Act (HIPAA), Payment Card Industry Data Security Standard (PCI DSS), Gramm-Leach-Bliley Act (GLBA), Health Information Technology for Economic and Clinical Health (HITECH), among others. To do so, they must evaluate sensitive structured and unstructured data posture across Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS) environments and contextualize risk as it relates to security, privacy, and other regulatory frameworks.
Data is every business’s most crucial asset – the foundation of any security program. Data Security Posture Management (DSPM) is an emerging security trend named by Gartner in its 2022 Hype Cycle for Data Security. The aim of DSPM solutions is to enable security and compliance teams to answer three fundamental questions:
The cloud has fundamentally changed how businesses function. Moving workloads and data assets is now simpler than ever, and is a boon for productivity, enabling businesses to quickly respond to customer demands and create new revenue opportunities. However, the pace and permissive nature of the cloud also dramatically expands a company’s threat surface and raises the likelihood of a data breach. Put simply, the distributed nature of the cloud seriously complicates data security.
Historically, a number of technologies have attempted to address challenges related to data security, including data discovery and classification, data loss prevention (DLP), and data access governance (DAG).
DSPM solutions combine capabilities from all three of these areas and represent the next-generation approach in cloud data security.
DSPM represents a next-generation approach to data security
DSPM vendors are taking a cloud-first approach to make it easier to discover, classify, assess, prioritize, and remediate data security issues. They are solving cloud security concerns by automating data detection and protection activities in a dynamic environment and at a massive scale.
Gartner Research summarizes the DSPM space this way: “Data security posture management provides visibility as to where sensitive data is, who has access to that data, how it has been used, and what the security posture of the data store or application is. In simple terms, DSPM vendors and products provide ‘data discovery+’ — that is, in-depth data discovery plus varying combinations of data observability features. Such features may include real-time visibility into data flows, risk, and compliance with data security controls. The objective is to identify security gaps and undue exposure. DSPM accelerates assessments of how data security posture can be enforced through complementary data security controls.”
The foundation of a DSPM offering is data discovery and classification. Reports like Forrester’s Now Tech: Data Discovery And Classification, Q4 2020 dive deep into data discovery and classification technologies, which in Forrester’s case aligns to five segments: data management, information governance, privacy, security, and specialist concerns. These segments align with three major buying centers: global risk and compliance, security, and business units/product owners.
DSPM focuses on delivering automated, continuous, and highly accurate data discovery and classification for security teams. The following list provides clarity on how these approaches align with buying centers, all of which have data discovery and classification needs, but as you will see below, want to leverage it for different purposes:
Posture management solutions abound
Today there are three prevailing types of security tools that offer posture management solutions: cloud security posture management (CSPM), SaaS security posture management (SSPM), and data security posture management (DSPM). The solutions can be differentiated as follows:
While DSPM solutions have focused on a cloud-first approach, data security is not limited only to cloud environments. Therefore more mature DSPM solutions will also include on-prem use cases since most businesses maintain some form of on-prem data, and will for years to come. In addition, as the DSPM space evolves, and solutions gain maturity, some will become more robust data security platforms, which will include the ability to:
DSPM solutions address key security use cases
Businesses thrive on collaboration. The current reality of highly distributed environments - many of which leverage cloud technologies - means that any file or data element can be easily shared at the click of a button. DSPM provides the missing piece to complete most security programs’ puzzles – a means of identifying, contextualizing, and protecting sensitive data.
DSPM solutions empower security teams to:
Data sprawl refers to the ever-growing quantities of data, or digital information, that organizations create every day. Data is a valuable resource because it enables business leaders to make data-driven decisions about how to best serve their client base, grow their business, and improve their processes. However, managing vast amounts of data and so many data sources can be a serious challenge.
Large businesses, particularly enterprises, are generating a staggering amount of data due to the wide variety of software products in use, as well as newly introduced data formats, multiple storage systems in the cloud and in on-premises environments, and huge quantities of log data generated by applications. There is an overwhelming amount of data being generated and stored in the modern world.
As organizations scale and increasingly use data for analysis and investigation, that data is being stored in operating systems, servers, applications, networks, and other technologies. Many organizations generate massive quantities of new data all day, every day, including:
These files and records are dispersed across multiple locations, which makes inventorying, securing, and analyzing all that data extremely difficult.
Data sprawl is the ever-expanding amount of data produced by organizations every day. The shift to the cloud amplifies the problem: organizations can scale more rapidly, producing more and more data. New uses for big data continue to develop, requiring an increase in how much data is stored in operating systems, servers, networks, applications, and other technologies.
Further complicating matters, databases, analytics pipelines, and business workflows have been migrating rapidly to the cloud, moving across multiple cloud service providers (CSPs) and across structured and unstructured formats. This shift to the cloud is ongoing, and new data stores are created all the time. Security and risk management (SRM) leaders are struggling to identify and deploy data security controls consistently in this environment.
"...unstructured data sprawl (both on-premises and hybrid/multi-cloud) is difficult to detect and control when compared to structured data."
Gartner, Hype Cycle for Data Security, 2022
Organizations generate new data every hour of every day. The customer data in customer relationship management (CRM) systems may also include financial data, which is also in an accounting database or enterprise resource planning (ERP) system. Sales data and transactional data may be in those systems as well, and siloed by different departments, branches, and devices. To get the benefits promised by data analytics, analysts must cross-reference multiple sources, which makes it harder to reach accurate, informed decisions.
Ultimately, organizations need data to facilitate day-to-day workflows and generate analytical insights for smarter decision-making. The problem is that the amount of data organizations generate is spiraling out of control. According to a recent IDC study, the Global DataSphere is expected to more than double from 2022 to 2026. The Global DataSphere measures how much new data is created, captured, replicated, and consumed each year, with the Enterprise DataSphere growing twice as fast as the Consumer DataSphere.
As organizations generate data at a faster pace, it is becoming harder to manage this information. Organizations might have data stored in various locations, making it hard to access business-critical information and generate accurate insights. Team members must cross-reference data in multiple formats from multiple sources, making analytics difficult. Managing dispersed information across different silos wastes time and money. Data may become corrupted during transmission, storage, and processing. Data corruption compromises the value of data, and the likelihood of corruption may increase alongside increasing data sprawl.
In addition, effort is wasted when employees duplicate data because they could not find it where they expected, which can also result in ghost data. This duplicate data is considered redundant. Other data may be obsolete (out of date) or trivial (not valuable for business insights). This excess data results in excessive resource utilization and increases cloud storage costs.
Employees may be handling data carelessly, not understanding how the way they share and handle data can introduce risk. Unauthorized users may also have access to sensitive information, particularly when the data produced and stored is not appropriately managed. Manually classifying data is time-consuming and error-prone and may increase the risk of sensitive data exposure, so finding automated solutions is essential for managing large stores of data.
Data sprawl compromises data value and presents significant security risks: sprawling data is difficult to control, which increases the chances of data breaches and other security incidents. Furthermore, organizations that do not manage data sprawl may jeopardize customer trust and face strict penalties for non-compliance with the General Data Protection Regulation (GDPR), the California Consumer Privacy Act (CCPA), or other data protection legislation.
Getting data sprawl under control requires a structured approach to data management. It is essential to have a solution in place to discover and classify data. Because data is spread across on-premises and cloud environments, it is critical to identify the environments where data is stored to ensure that all data is identified and managed. Tools that can discover and classify data in SaaS, IaaS, and PaaS environments are important, as are those that can find and classify structured and unstructured data. The goal of these tools is to create a unified view across the environment.
Identifying a central place to store data is one way to manage data sprawl. Cloud security standards continue to improve, making a centralized cloud repository an appealing option for many organizations. Cloud storage platforms are an excellent way to store data so that it forms a single source of truth that is more accessible to employees in many locations. At the same time, companies must establish data access governance (DAG) policies that outline how data should be collected, processed, and stored. These policies must also govern the data itself, including access controls, retention, risk management, compliance, and data disposition (how data is disposed of at the end of its lifecycle). DAG policies complement data loss prevention (DLP) programs. Data security posture management (DSPM) combines data discovery and classification, data loss prevention, and data access governance to create a next-generation approach to cloud data security.
For organizations that want to manage data sprawl, it is imperative to know what data exists in the environment, where it is located, and who has access to it. Different tools exist to manage all the data that organizations store, but few can prevent data sprawl.
Automated data discovery and data classification solutions must be able to identify and classify sensitive data. Artificial intelligence (AI) and machine learning (ML) can more accurately classify difficult-to-identify data, such as intellectual property and sensitive corporate data.
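As a rough, hedged illustration, the sketch below trains a tiny scikit-learn text classifier on a handful of hand-labeled snippets; a production ML classifier would rely on far more training data and richer features than plain TF-IDF.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny, hypothetical training set: text snippets labeled by sensitivity.
snippets = [
    "Patient diagnosis and treatment plan attached",
    "Q3 marketing newsletter draft",
    "Source code for the pricing engine, internal only",
    "Public press release about the new office",
]
labels = ["sensitive", "internal", "sensitive", "public"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(snippets, labels)

# Predict the sensitivity of a new, unseen snippet.
print(model.predict(["Attached: patient lab results and diagnosis"]))
```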
Data sprawl solutions can also increase overall data security by helping to locate and identify duplicate and redundant data. Once sprawling data has been identified and classified, it becomes easier to dispose of stale data or extraneous data. This can save on storage costs as well as eliminate duplicate and irrelevant data.
Enterprises collect data daily, and it is easy to create multiple copies. The first step for companies that wish to manage access to data and prevent data loss is to fully understand their data: where it is now (whether or not IT or security teams are aware of the data stores) and any data stores that are created in the future. Identifying sensitive data and who has access to it can help prevent data breaches by ensuring that appropriate security controls are enforced.
A repository for storing, managing and distributing data sets on an enterprise level.
Defense Industrial Base (DIB) contractors are companies that conduct business with the US military and are part of the military-industrial complex responsible for research, production, delivery, and service.
DIB contractors are responsible for meeting compliance requirements set by government policies and frameworks, including Department of Defense Instruction (DoDI) 5200.48 and the Cybersecurity Maturity Model Certification (CMMC).
According to DoDI 5200.48, safeguarding Controlled Unclassified Information is a shared responsibility between DIB contractors and the Department of Defense.
An Electronic Lab Notebook (Electronic Laboratory Notebook, or ELN) is the digital form of a paper lab notebook. In the pharmaceutical industry, it is used by researchers, scientists, and technicians to document observations, progress, and results from their experiments performed in a laboratory.
While an ELN enables information to be documented and shared electronically, it also exposes proprietary information to malicious insiders or external hackers. As a result, ELNs should be subject to appropriate security controls to prevent misuse or loss.
Encryption is the process of converting plaintext into ciphertext so that only authorized parties can decrypt the information and no third party can tamper with the data. Unencrypted usually refers to data or information that is stored unprotected, without any encryption. Encryption is an important way for individuals and companies to protect sensitive information from hacking. For example, websites that transmit credit card and bank account numbers encrypt this information to prevent identity theft and fraud.
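For example, a minimal sketch of symmetric encryption using the Python cryptography package's Fernet recipe might look like the following; key management, which is the hard part in practice, is omitted here.

```python
from cryptography.fernet import Fernet

key = Fernet.generate_key()  # keep this key secret and store it safely
f = Fernet(key)

token = f.encrypt(b"4111 1111 1111 1111")  # ciphertext, safe to store
print(token)
print(f.decrypt(token))  # b'4111 1111 1111 1111'
```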
Where the result of a query, algorithm, or search registers a match only if there is a 100% match.
An unsupervised learning method whereby a series of files is divided into multiple groups, so that the grouped files are more similar to the files in their own group and less similar to those in the other groups.
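A minimal sketch of this kind of clustering, assuming short in-memory documents and scikit-learn's TF-IDF and k-means implementations, might look like this:

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Illustrative file contents; a real pipeline would read files from disk
# or object storage before vectorizing them.
docs = [
    "invoice payment due net 30 remit to accounts payable",
    "invoice overdue balance payment terms",
    "employee handbook vacation policy benefits",
    "benefits enrollment employee policy HR",
]

vectors = TfidfVectorizer().fit_transform(docs)
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)
print(clusters)  # e.g. [0 0 1 1]: invoices grouped apart from HR documents
```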
Where a result's score can fall anywhere from 0 to 100, based on the degree to which the search data and file data values match.
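The difference between exact and fuzzy matching can be sketched with Python's standard-library difflib; the 0-100 scaling here is an illustrative convention rather than any particular product's scoring method.

```python
from difflib import SequenceMatcher

def exact_match(a: str, b: str) -> bool:
    """Registers a match only when the two values are identical."""
    return a == b

def fuzzy_score(a: str, b: str) -> int:
    """Similarity score from 0 to 100 based on how closely the values match."""
    return round(SequenceMatcher(None, a, b).ratio() * 100)

print(exact_match("Jonathan Smith", "Jonathon Smith"))  # False
print(fuzzy_score("Jonathan Smith", "Jonathon Smith"))  # roughly 93
```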
Ghost data refers to data that still exists within a database or storage system but is no longer actively used, tracked, or known to be accessible, such as backups or snapshots of data stores whose originals have been deleted. Ghost data is a type of shadow data, which includes unmanaged data store copies, snapshots, and log data that are not included in an organization’s backup and recovery plans. For example, if a data store is created for a product and the product is discontinued, the production data is usually removed because there is no longer a business justification to maintain it. However, if copies of the data remain in staging or development environments, they are considered ghost data.
Ghost data is created in a few common ways: a user or program deletes a file or database entry, but the data is not permanently removed from the system; or data is migrated to a new system, but the old data is not completely erased from the original system.
Cloud adoption led to a proliferation of data. Much of that data is structured, secured, and monitored, but a considerable proportion of that data is unstructured, unsecured, and unmonitored. This presents real risks to organizations today. And while data collection and analysis can yield important business benefits, it can also increase the risk to the organization if not effectively managed. Ghost data presents significant risks to organizations because it cannot be effectively managed.
Ghost data can cause problems for organizations because it:
Ghost data may include sensitive data, including customer and employee personally identifiable information (PII).
“Over 30% of scanned cloud data stores are ghost data, and more than 58% of the ghost data stores contain sensitive or very sensitive data.” – Cyera Research
The problem with ghost data begins with how data is stored today. In the past, organizations had storage capability limited by hardware capacity. If an organization or team needed more storage space, the IT team purchased additional hardware, reviewed the data to determine what could be purged, or both.
The wide adoption of cloud storage and services changed that equation. Because the cloud is extensible, organizations can continually expand storage to accommodate the accumulation of data. Data is also being generated and stored at an unprecedented rate, creating an expansive data landscape. Further increasing the quantity of data, most organizations store multiple copies of data in different formats and different cloud providers. This makes it more difficult to identify duplicate and redundant data and much easier to have data stores that have become lost or are no longer tracked or protected.
Few companies delete older versions of data as it becomes obsolete. It is easy to store more data, but most organizations have no limits in place to trigger a review of the data across all environments, including multiple cloud environments. This results in data sprawl and creates challenges in data classification efforts.
“35% [of respondents] utilize at least two public cloud providers from a list that included Amazon Web Services, Google Cloud, Microsoft Azure, Alibaba, IBM, and Oracle; 17% of respondents rely on three or more.” – Cyera Research
Many organizations choose to keep older copies of data in case it is needed at some point. If there is no review or verification of the data — where it is, how much there is, what sensitive information exists in the data, or whether the data is securely stored — this ghost data both increases storage costs and poses a significant business risk.
Frequently, teams copy data to non-production environments. This not only creates an additional copy of the data but places it in a less secure environment. Non-production environments are not secured with the same rigor as production environments, therefore the sensitive data they contain is more susceptible to inadvertent disclosure or exfiltration. Frequently, ghost data is accessible to users that have no business justification for accessing that data, increasing data security risks.
These data copies also represent a potential EU General Data Protection Regulation (GDPR) violation. GDPR specifies that personal data be kept only as long as it is required to achieve the business purpose for which it was collected (with exceptions for scientific or historical research). After this period, the data must be disposed of appropriately, but when personal data exists in ghost data, it is likely to remain in the environment, increasing organizational risk. It can sometimes be difficult for IT teams to delete ghost data because they are unaware of it.
“60% of the data security posture issues present in cloud accounts stem from unsecured sensitive data.” – Cyera Research
Sometimes, the database may be gone but snapshots are still there. Sometimes those snapshots are unencrypted, while other times the data stores exist in the wrong region. That exposes organizations to both increased costs and security risks. The additional data, unencrypted and in unknown locations, increases the attack surface for the organization.
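As an illustrative sketch (assuming AWS and boto3, with credentials, region selection, and pagination omitted for brevity), the snippet below flags RDS snapshots whose source database instance no longer exists and reports whether each one is encrypted.

```python
import boto3

rds = boto3.client("rds")

# Database instances that still exist in this account/region.
live_dbs = {
    db["DBInstanceIdentifier"]
    for db in rds.describe_db_instances()["DBInstances"]
}

# Snapshots whose source database is gone are candidate ghost data.
for snap in rds.describe_db_snapshots()["DBSnapshots"]:
    if snap["DBInstanceIdentifier"] not in live_dbs:
        print(
            f"ghost snapshot: {snap['DBSnapshotIdentifier']} "
            f"(encrypted={snap['Encrypted']})"
        )
```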
Ghost data can increase the risk of ransomware because attackers do not care whether the data is up-to-date or accurate. They only care about what is easy to access and what is not being monitored. While the organization may not be aware of its ghost data, that lack of visibility does not protect it from attackers.
Stolen ghost data can be exfiltrated and used for malicious purposes. Cyber attackers can prove that they have access to the data and thereby execute a successful ransomware attack. Double extortion attacks are as successful with ghost data as with any other data because attackers have the same increased leverage. Attackers do not rely only on encrypting data (which would not concern an organization when it comes to ghost data); they can also threaten to publicly release the stolen data to encourage payment of the ransom. Robust backups cannot help with ghost data because the leverage to release data publicly remains the same.
Unfortunately, cloud providers offer limited visibility into what data customers have. Cloud service providers (CSPs) do not identify how sensitive a customer's data is, nor do they provide specific advice on how to improve the security and risk posture of data across the cloud estate. This results in increased risks to cyber resilience and compliance. An organization’s lack of visibility into its data across all cloud providers increases the risk of exposing sensitive data. Similarly, ghost data, or any other data store that is not adequately identified and classified, is likely to have overly permissive access.
Significant changes in how data is managed and stored in cloud and hybrid environments have also led to new questions, including:
In modern corporate environments, it is important for all teams involved to understand their responsibilities when it comes to managing, securing, and protecting data. It is a joint effort between builders and the security team. However, managing data on an ongoing basis remains a challenge without technology to discover and classify sensitive data automatically.
Modern software solutions and products have had a significant impact in terms of creating data, increasing the data available for analytics, and growing complexity in corporate environments. AI/ML can help address the challenges created by these technological advances. In particular, AI/ML can help identify ghost data and increase data security by using continuous learning and automation to:
Robust AI/ML data classification solutions can accurately classify data that previously was challenging to identify, including intellectual property and other sensitive corporate data. AI/ML can also help enable organizations to make smarter decisions and create better policies about what to do with data and how to protect sensitive data.
To begin with, it is important to think of data itself as a layer of security. In the past, data was not considered a security layer because there was no straightforward way to work with it. Today, with AI/ML, it is far easier to access, understand, and know the data within an organization and across all its disparate environments.
As technology has changed, the focus of security has moved from infrastructure to data-related security. While CISOs remain in charge of the technical aspects of security, new challenges in business and cybersecurity require more collaboration across the business team, IT, security, and privacy office to move forward and meet data security and data privacy requirements.
Regulations and requirements are becoming more stringent globally, requiring organizations to take more responsibility for the data they are collecting. This includes all the data spread across environments, including ghost data. Managing that data requires robust data discovery and data classification.
A database with storage, data, and compute services that is managed and maintained by a third-party provider instead of by an organization's IT staff.
Data that describes other data. For databases, metadata describes properties of the data store itself, as well as the definition of the schema.
The idea that organizations should only retain information as long as it is pertinent.
In cybersecurity, a risk assessment is a comprehensive analysis of an organization to identify vulnerabilities and threats. The goal of a risk assessment is to identify an organization’s risks and make recommendations for mitigating those risks. A risk assessment may be requested after a specific trigger, conducted before moving forward as part of larger governance and risk processes, or performed periodically across a portfolio to meet an enterprise risk management or compliance objective.
Two popular risk assessment frameworks are the National Institute of Standards and Technology (NIST) Cybersecurity Framework and the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) 27001:2022 standard.
Risk assessments may be based on different methodologies: qualitative, quantitative, or a hybrid of the two. A quantitative assessment provides concrete figures for the probability and potential impact of a threat, based on data collection and statistical analysis. A qualitative assessment provides a more subjective, generalized view of what would happen to operations and productivity for different internal teams if one of the risks occurred.
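One common quantitative formulation, shown below with purely illustrative numbers, multiplies single loss expectancy by the expected annual rate of occurrence to estimate an annualized loss.

```python
# Annualized loss expectancy (ALE) = single loss expectancy (SLE) x annual
# rate of occurrence (ARO), where SLE = asset value x exposure factor.
# All figures are hypothetical.
asset_value = 500_000     # value of the data store at risk (USD)
exposure_factor = 0.4     # fraction of value lost in a single incident
annual_rate = 0.25        # expected incidents per year

sle = asset_value * exposure_factor
ale = sle * annual_rate
print(f"SLE = ${sle:,.0f}, ALE = ${ale:,.0f} per year")
# SLE = $200,000, ALE = $50,000 per year
```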
A risk assessment should include an up-to-date inventory of the systems, vendors, and applications in scope for the assessment. This information helps security risk management leaders understand the risk associated with:
A single risk assessment provides a point in time snapshot of the current risks present and how to mitigate them. Ongoing or continuous risk assessments provide a more holistic view into the shifting risk landscape that exists in most organizations.
Risk assessments also help organizations assess and prioritize the risks to their information, including their data and their information systems. An assessment also helps security and technology leaders communicate risks in business terms to internal stakeholders, specifically the executive team and the board of directors. This information helps them make educated decisions about which areas of the cybersecurity program need to be prioritized and how to allocate resources in alignment with business goals.
The growth of digital business and related assets requires better management of complex technology environments, which today include:
It also creates a growing volume of data, types of data, and technology assets. A comprehensive risk assessment should include these assets, allowing an organization to gain visibility into all its data, provide insight into whether any of that data is exposed, and identify any serious security issues. A data risk assessment can help an organization secure and minimize exposure of sensitive data by providing a holistic view of sensitive data, identifying overly permissive access, and discovering stale and ghost data.
Stale data is data collected that is no longer needed by an organization for daily operations. Sometimes the data collected was never needed at all. Most organizations store a significant amount of stale data, which may include:
Simply creating an updated version of a file and sharing it but not deleting the obsolete versions increases the quantity of stale or inactive data. This type of activity happens many times a day in the typical organization.
Increasingly, petabytes of data are stored in different public and private cloud platforms and are dispersed around the world. These file shares and document management systems, often poorly secured, present an appealing target for cyber attackers. If organizations store a significant amount of unstructured data, they are unlikely to have visibility into their data surface footprint, and even less likely to be protecting it adequately. Stale and unstructured data may be:
Stale data is also not relevant to daily operations and therefore can impede a business’s ability to make good business decisions based on current market conditions. A study by Dimensional Research showed that “82 percent of companies are making decisions based on stale information” and “85 percent state this stale data is leading to incorrect decisions and lost revenue.”
The shift to the cloud creates several challenges. Many organizations do not know what data they have, where it is located (on premises, in public or private cloud environments, or a mix of these), why it is being stored, or how it is protected.
Although big data and data analysis can provide actionable insights and improve automation capabilities, much of the data organizations collect, process, and store is unorganized and unstructured. Unfortunately, stale or inactive data can increase storage costs and security risks alike, without providing any business benefit at all. To reduce risks, organizations must identify stale data and then decide whether to move the data (storing it more securely), archive the data, or delete it. Organizations must also establish a consistent policy to identify and manage stale data on an ongoing basis.
Data in a standardized format, with a well-defined structure that is easily readable by humans and programs. Most structured data is typically stored in a database. Though structured data only comprises 20 percent of data stored worldwide, its ease of accessibility and accuracy of outcomes makes it the foundation of current big data research and applications.
Tokenization entails the substitution of sensitive data with a non-sensitive equivalent, known as a token. The token maps back to the original sensitive data through a tokenization system; without that system, tokens are practically impossible to reverse. Many such systems use random numbers to produce secure tokens. Tokenization is often used to secure financial records, bank accounts, medical records, and many other forms of personally identifiable information (PII).
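A minimal sketch of the idea, using an in-memory dictionary as a stand-in for a real, hardened token vault, might look like this:

```python
import secrets

class TokenVault:
    """Toy tokenization system: maps random tokens back to original values.
    In production the vault would be a hardened, access-controlled service."""

    def __init__(self):
        self._vault = {}

    def tokenize(self, value: str) -> str:
        token = "tok_" + secrets.token_hex(8)  # random, not derived from value
        self._vault[token] = value
        return token

    def detokenize(self, token: str) -> str:
        return self._vault[token]

vault = TokenVault()
token = vault.tokenize("4111 1111 1111 1111")
print(token)                    # e.g. tok_9f86d081884c7d65
print(vault.detokenize(token))  # 4111 1111 1111 1111
```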
Unmanaged data stores are deployments that must be completely supported by development or infrastructure teams, without the assistance of the cloud service provider. Teams may take on this additional logistical burden to comply with data sovereignty requirements, to abide by private network or firewall requirements for security purposes, or to meet resource requirements beyond the provider's database-as-a-service (DBaaS) size or IOPS limits.
Data lacking a pre-defined model of organization, or that does not follow one. Such data is often text-heavy, but can also include facts, figures, and date and time information. The resulting irregularities and ambiguities make unstructured data much harder for programs to understand than data stored in databases with fields or documents with annotations. Many estimates claim unstructured data comprises the vast majority of global data, and that this category of data is growing rapidly.