Key Takeaways
- Effective data privacy and compliance start with understanding and remediating legacy data across unstructured data repositories.
- Building a strong project plan, assembling a team, and preparing resources and access credentials is critical before starting a data cleanup project.
- Identifying and defensibly deleting redundant, obsolete, and trivial (ROT) data, along with duplicates, streamlines sensitive data analysis.
- Following a cleanup project with ongoing sensitive data monitoring and reporting prevents future accumulation and mitigates compliance risks.
Robinson + Cole’s Kathryn Rattigan and Jim Merrifield recently wrote a blog post about the risks of storing sensitive data on network drives. One of the practical tips they shared is cleaning up legacy file shares to remediate sensitive data. But when those file shares contain terabytes or even petabytes of legacy data, where do you start? And once the cleanup is complete, how do you avoid a vicious cycle of repeating cleanup projects every few years as data reaccumulates?
In this post, we’ll share an approach for conducting a file share cleanup with the goal of supporting data privacy policies across the information governance and information security domains. In future posts, we’ll share thoughts on why labeling sensitive data alone is not sufficient and how to shift from project-based data cleanups to an ongoing sensitive data monitoring and minimization program.
Privacy-Driven Data Compliance Requires a Cleanup Before the Cleanup
Before you can move to a new state of managing sensitive data through its lifecycle, you need to first understand and remediate legacy data accumulated across your unstructured data repositories. Data discovery, conducted with file analysis tools, allows you to narrow down the scope of data that needs to be searched for sensitive content. ActiveNav Cloud is specifically designed to index and analyze unstructured digital information. It empowers our customers to quickly identify and address non-records, duplicates and redundant, obsolete, and trivial (ROT) information, then do a deep-dive analysis of the remaining data to find and manage sensitive information. Breaking a data cleanup into phases with defined outcomes is essential for achieving both goals.
Phase I: Project and Data Preparation
The first phase of data cleanup is to create a project plan, organize resources, and define the outcomes of your project. In this phase, you will complete the following before moving forward with discovery:
- Form a team: Identify a project sponsor, IT team members, and a project/program manager. The General Counsel or CIO is an ideal sponsor, and Data Governance, Information Security, Information Governance, and Legal Privacy and Compliance employees bring important perspectives to the table.
- Define the content to discover: Work with stakeholders to identify the types of sensitive data your organization needs to remediate from the unstructured data sources.
- Secure technology: Select and onboard a file analysis technology to conduct discovery on your data repositories.
- Specify data sources: Prepare a list of file shares and other unstructured data repositories to be addressed, the locations, and their estimated volume.
- Prepare credentials: If needed, obtain and test access credentials for the data repositories and their underlying file systems.
- Engage data owners: Define the data repository content owners by business unit and ensure they are prepared to review and address the results of your data discovery project.
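The preparation steps above can be tracked in a simple inventory. The following is a minimal sketch, not a feature of any particular file analysis tool: the `DataSource` fields, the `readiness_gaps` helper, and the share names are hypothetical and only illustrate recording each repository's location, estimated volume, owner, and credential status before discovery begins.

```python
from dataclasses import dataclass


@dataclass
class DataSource:
    """One repository in the cleanup scope (all field names are illustrative)."""
    name: str
    location: str
    estimated_tb: float
    owner: str = ""                   # business-unit content owner, if assigned
    credentials_tested: bool = False  # True once access has been verified


def readiness_gaps(sources):
    """Return {source name: [missing items]} for sources not ready for discovery."""
    gaps = {}
    for source in sources:
        missing = []
        if not source.owner:
            missing.append("owner")
        if not source.credentials_tested:
            missing.append("credentials")
        if missing:
            gaps[source.name] = missing
    return gaps


# Hypothetical inventory entries.
inventory = [
    DataSource("hr-share", "//fileserver01/hr", 4.5, owner="HR", credentials_tested=True),
    DataSource("legal-share", "//fileserver02/legal", 12.0),
]
print(readiness_gaps(inventory))
```

A list like this also gives the project manager one place to confirm that every repository has an engaged owner before Phase II starts.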
Phase II: Initial Discovery and Data Reduction
The goals for the second phase of the cleanup are to understand the characteristics of your data and take early action. In this phase, you will identify the immediate steps your data owners can take to defensibly delete redundant, obsolete, and trivial (ROT) and duplicate data before a more detailed file analysis for sensitive data. The following steps reduce your dataset and make sensitive data analysis and remediation faster and easier in Phase III of the cleanup process:
- Prioritize Repositories for First Discovery: Determine the order in which you will conduct a “first discovery” review of your repositories. Keep in mind the larger the estimated volume of data, the longer first discovery will take.
- Identify Criteria for Defensible Deletion: Create and secure approval for a deletion standard and rules for identifying redundant, obsolete, and trivial (ROT) content, as well as duplicates, based on your organization’s policies. Example categories of ROT include non-records, transitory data, drafts, backups, and outdated content.
- Run Initial Discovery: Sequence and complete an initial discovery across the target repositories and locations to uncover the shape and nature of the content (e.g., age, location, file type, file size).
- Organize Data by Business Area: Group the data results by business function or department to make review and action easier.
- Analyze Results: Engage the data owners and subject matter experts to review results by business area, either in aggregate across scanned repositories or one repository at a time. The review should identify ROT files and duplicates and generate manifests cataloging data eligible for deletion under the defensible deletion standard.
- Delete and Document: Confirm business owners remediate eligible ROT data and duplicates, by either quarantining data and setting an expiration or defensibly deleting it. Whichever course is chosen, the results should be documented and archived per the guidance of the legal department and information governance policy.
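To make the data reduction steps concrete, here is a minimal sketch of how a manifest of deletion candidates might be assembled. It is not how ActiveNav Cloud or any other product works. The extension list, the age threshold, and the `build_manifest` function are assumptions for illustration; a real deletion standard would come from your approved policies. The sketch only reports candidates and never deletes anything.

```python
import hashlib
import time
from collections import defaultdict
from pathlib import Path

# Hypothetical rule values; replace with your approved deletion standard.
TRIVIAL_EXTENSIONS = {".tmp", ".bak", ".log"}
MAX_AGE_DAYS = 7 * 365


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root, now=None):
    """Return a list of files flagged as trivial, obsolete, or duplicate."""
    now = time.time() if now is None else now
    entries = {}
    by_hash = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        stat = path.stat()
        reasons = []
        if path.suffix.lower() in TRIVIAL_EXTENSIONS:
            reasons.append("trivial")
        if (now - stat.st_mtime) / 86400 > MAX_AGE_DAYS:
            reasons.append("obsolete")
        entries[path] = {"path": str(path), "size": stat.st_size, "reasons": reasons}
        by_hash[sha256_of(path)].append(path)
    # Keep the first copy (in sorted path order); flag the rest as duplicates.
    for paths in by_hash.values():
        for extra in paths[1:]:
            entries[extra]["reasons"].append("duplicate")
    return [entry for entry in entries.values() if entry["reasons"]]
```

Hashing every file is slow at terabyte scale, so a practical implementation would likely compare file sizes first and hash only files whose sizes match. The resulting manifest is what data owners would review before anything is quarantined or deleted.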
Conducted in sequence, the phases described above make the entire project go faster and generate insights for strengthening compliance practices going forward. This lays a solid foundation for your organization to act on the sensitive data uncovered in Phase III. Skipping the first two phases adds unnecessary time and complexity to your cleanup project.
Linking Sensitive Data Discovery to Business Areas Drives Compliant Remediation
Taking the actions outlined in Phases I and II to organize your project and defensibly delete ROT and duplicates makes identifying sensitive data in the remaining content faster, easier to review, and more actionable. Following the steps below in Phase III helps your organization find and remediate sensitive data and link it to the business processes that are driving its creation and retention.
Phase III: Sensitive Data Discovery, Process Identification, and Remediation
- Configure Sensitive Data Discovery Rules: Select and, if necessary, develop customized rules for identifying and classifying sensitive data in the repositories.
- Prioritize and Sequence Data Repositories for Sensitive Data Analysis: Using the insights gained in Phases I and II, determine the order for repository analysis based on risk potential, the size and scope of the remaining data, or the business areas using the repository.
- Complete Sensitive Data Analysis: Perform comprehensive sensitive data analysis on the selected repositories to locate and report occurrences of sensitive data.
- Report Results by Business Area: Group and score the sensitive data results by business areas and generate manifests cataloging sensitive data for review and action.
- Link Sensitive Data to Business Processes: Before remediating the data discovered, where appropriate have business areas connect any sensitive data found to the business processes linked to its collection. Take care to distinguish which processes are obsolete and which are still active.
- Remediate and Document: Each business area coordinates and documents actions to label or review sensitive data, apply access permissions, move it, or defensibly delete it based on business process requirements.
- Address Gaps: Business areas should review the process requirements for sensitive data lifecycle management including storage, labeling, access, retention, and disposition, and take appropriate action to address gaps that resulted in misfiled or mismanaged sensitive data (e.g., review and revise procedures, counsel data owners, conduct training).
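The rule configuration and reporting steps above can be sketched in a few lines. This is an illustrative example under stated assumptions, not a production classifier: the two regular expressions (a US Social Security number format and a simple email pattern), the `RULES` table, and the `report_by_business_area` function are hypothetical, and real discovery rules typically need validation logic to limit false positives.

```python
import re

# Hypothetical discovery rules; real rules would be tuned and validated.
RULES = {
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def find_sensitive(text):
    """Return {rule name: match count} for every rule that matches the text."""
    hits = {}
    for name, pattern in RULES.items():
        count = len(pattern.findall(text))
        if count:
            hits[name] = count
    return hits


def report_by_business_area(documents):
    """Group findings by business area.

    `documents` is an iterable of (business_area, path, text) tuples.
    Files with no matches are left out of the report.
    """
    report = {}
    for area, path, text in documents:
        hits = find_sensitive(text)
        if hits:
            report.setdefault(area, []).append({"path": path, "hits": hits})
    return report


# Hypothetical sample content.
docs = [
    ("HR", "hr/onboarding.txt", "SSN 123-45-6789, contact jane@example.com"),
    ("Finance", "fin/notes.txt", "Quarterly planning notes"),
]
print(report_by_business_area(docs))
```

Grouping results by business area, rather than by repository alone, is what lets each area connect the findings to the processes that created them.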
A privacy-driven unstructured data cleanup is complete once the repositories are defensibly purged of ROT and duplicates and the sensitive data discovered is remediated. When the sensitive data discovery project ends, the program for ongoing sensitive data monitoring and response should begin. Regular discovery of sensitive data keeps it from accumulating unchecked again, and it reduces the risk of exposure in a data breach or of running afoul of outside counsel agreement terms.
However, simply monitoring and correcting misplaced and mismanaged sensitive data is not enough to ensure the adoption of data minimization practices going forward. The goal of every sensitive data cleanup project should be to ensure you never need to undergo another one. In our next post, we’ll share an approach for incorporating the findings of a sensitive data cleanup into new workflows, not only to remediate sensitive data but to minimize it from the start.