Unlocking Sensitive Data Security with Snowflake's New UI
At Snowflake, our mission is to empower customers to fully leverage their data while adhering to strict compliance requirements and ensuring the protection of sensitive information. Recognizing the critical need for quick and effective identification of sensitive data, we consistently innovate with features such as classification, tag-based policies, and an intuitive Data Governance UI.
We are thrilled to announce the launch of the new Snowflake Data Classification UI in Snowsight, now available to all users. This feature simplifies the identification and tagging of sensitive information, streamlining data management and safeguarding processes for organizations. We will start with a brief overview, followed by an examination of the classification process using SQL and API calls, before delving into the specifics of this new user interface.
Simplifying Data Identification and Protection
Snowflake’s Data Classification feature equips organizations to effectively locate and label sensitive data. By automating the detection and tagging of personal information, users can bolster their data security while ensuring compliance with privacy laws.
Main Benefits
- Control Over Data Access: Enables informed decisions regarding who has access to sensitive information.
- Data Sharing Insights: Assists in recognizing Personally Identifiable Information (PII), facilitating better choices about third-party data sharing.
- Policy Enforcement: Aids in the implementation of masking or row access policies to protect sensitive information.
Functionality of the Feature
The Data Classification feature in Snowflake scans table columns to identify personal and sensitive information, applying predefined system tags to the detected data. These tags are divided into two primary categories:
- Semantic Category (SNOWFLAKE.CORE.SEMANTIC_CATEGORY): Marks personal attributes such as names, ages, or phone numbers.
- Privacy Category (SNOWFLAKE.CORE.PRIVACY_CATEGORY): Classifies data types into categories like Identifier, Quasi-Identifier, or Sensitive data.
These tags play an essential role for data engineers in monitoring and safeguarding information, ensuring compliance with privacy regulations.
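As a rough illustration of how these tags feed into policy enforcement, the sketch below first lists columns tagged as identifiers and then attaches a masking policy to one of them. The policy name mask_name and the role HR_ADMIN are hypothetical, and the table and column names are borrowed from this article's own examples. Account Usage views can lag behind recent changes, so freshly applied tags may not appear immediately.

```sql
-- Find columns that classification has tagged as direct identifiers.
SELECT object_database, object_schema, object_name, column_name, tag_value
FROM SNOWFLAKE.ACCOUNT_USAGE.TAG_REFERENCES
WHERE tag_name = 'PRIVACY_CATEGORY'
  AND tag_value = 'IDENTIFIER'
  AND domain = 'COLUMN';

-- Hypothetical policy: only HR_ADMIN sees the raw value.
CREATE MASKING POLICY IF NOT EXISTS my_db.my_schema.mask_name AS (val STRING)
  RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() IN ('HR_ADMIN') THEN val
    ELSE '***MASKED***'
  END;

-- Attach the policy to a column that was tagged as sensitive.
ALTER TABLE my_db.my_schema.hr_data
  MODIFY COLUMN fname
  SET MASKING POLICY my_db.my_schema.mask_name;
```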
Classification Management via SQL
To run classification, a role needs the appropriate privileges, including SELECT on the tables being classified and APPLY TAG to set the resulting tags. Snowflake provides built-in views and functions to help track classification activities and tag assignments.
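A minimal sketch of what such a privilege setup might look like follows. The role data_engineer and the warehouse classify_wh are hypothetical names, and the exact set of grants depends on how your account is organized, so treat this as a starting point rather than a complete list.

```sql
-- Hypothetical role and warehouse names; adjust to your account.
GRANT USAGE ON WAREHOUSE classify_wh TO ROLE data_engineer;
GRANT USAGE ON DATABASE my_db TO ROLE data_engineer;
GRANT USAGE ON SCHEMA my_db.my_schema TO ROLE data_engineer;

-- Read access to the table being classified.
GRANT SELECT ON TABLE my_db.my_schema.hr_data TO ROLE data_engineer;

-- Permission to set tags on objects in the account.
GRANT APPLY TAG ON ACCOUNT TO ROLE data_engineer;

-- Access to governance views in the shared SNOWFLAKE database.
GRANT DATABASE ROLE SNOWFLAKE.GOVERNANCE_VIEWER TO ROLE data_engineer;
```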
Classification can be performed through SQL commands or via the Snowsight interface, offering flexibility in data management. Users can opt to classify individual tables or conduct asynchronous classifications for all tables within a schema.
- To classify a specific table, utilize the system call SYSTEM$CLASSIFY.
CALL SYSTEM$CLASSIFY('<table_name>', {'auto_tag': true});
- To view tag assignments, query TAG_REFERENCES_ALL_COLUMNS.
SELECT * FROM TABLE(<db_name>.INFORMATION_SCHEMA.TAG_REFERENCES_ALL_COLUMNS('<table_name>', 'table'));
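With concrete names filled in, the two calls above and the schema-level variant might look like the sketch below. The object names are illustrative. SYSTEM$CLASSIFY_SCHEMA is, to the best of our knowledge, the procedure behind the asynchronous schema-wide option; confirm its arguments against the documentation before relying on it.

```sql
-- Classify one table and apply the recommended tags automatically.
CALL SYSTEM$CLASSIFY('my_db.my_schema.hr_data', {'auto_tag': true});

-- Classify every table in a schema asynchronously.
CALL SYSTEM$CLASSIFY_SCHEMA('my_db.my_schema', {'auto_tag': true});

-- Inspect the classification tags now set on the table's columns.
SELECT column_name, tag_name, tag_value
FROM TABLE(
  my_db.INFORMATION_SCHEMA.TAG_REFERENCES_ALL_COLUMNS('my_db.my_schema.hr_data', 'table')
)
WHERE tag_name IN ('SEMANTIC_CATEGORY', 'PRIVACY_CATEGORY')
ORDER BY column_name, tag_name;
```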
Data Classification via API
While the Data Classification UI in Snowsight provides an easy method for managing and applying tags to sensitive information, advanced users may prefer Snowflake’s traditional APIs for more detailed control. These APIs, including EXTRACT_SEMANTIC_CATEGORIES and ASSOCIATE_SEMANTIC_CATEGORY_TAGS, facilitate comprehensive data classification and tag management.
Overview of Traditional APIs
The classic APIs for data classification cater to users seeking additional control or wishing to automate their classification workflows. These APIs remain functional but no longer receive new features. For more complex requirements, the following APIs are available:
- EXTRACT_SEMANTIC_CATEGORIES: Analyzes table columns to identify and extract semantic categories, such as names or ages.
- ASSOCIATE_SEMANTIC_CATEGORY_TAGS: Assigns classification tags to columns based on the results from EXTRACT_SEMANTIC_CATEGORIES.
Classifying Data with Traditional APIs
1. Classify a Single Table
To classify a specific table, follow these steps:
- Analyze: Execute the EXTRACT_SEMANTIC_CATEGORIES function to identify the semantic categories of the columns.
SELECT EXTRACT_SEMANTIC_CATEGORIES('my_db.my_schema.hr_data');
- Review: Check the output to ensure the categories are accurate.
- Apply: Use the ASSOCIATE_SEMANTIC_CATEGORY_TAGS stored procedure to automatically apply tags.
CALL ASSOCIATE_SEMANTIC_CATEGORY_TAGS('my_db.my_schema.hr_data', EXTRACT_SEMANTIC_CATEGORIES('my_db.my_schema.hr_data'));
Alternatively, tags can be applied manually using an ALTER TABLE statement.
ALTER TABLE my_db.my_schema.hr_data
MODIFY COLUMN fname
SET TAG SNOWFLAKE.CORE.SEMANTIC_CATEGORY='NAME';
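For the Review step, the JSON returned by EXTRACT_SEMANTIC_CATEGORIES is easier to read once flattened into one row per column. The sketch below assumes the output keeps a "recommendation" object per column with semantic_category, privacy_category, confidence, and coverage keys; check the key names against your own output before relying on them.

```sql
-- One row per column, showing the recommended categories.
SELECT
  f.key::VARCHAR AS column_name,
  f.value:"recommendation":"semantic_category"::VARCHAR AS semantic_category,
  f.value:"recommendation":"privacy_category"::VARCHAR AS privacy_category,
  f.value:"recommendation":"confidence"::VARCHAR AS confidence,
  f.value:"recommendation":"coverage"::NUMBER(10,2) AS coverage
FROM TABLE(
  FLATTEN(EXTRACT_SEMANTIC_CATEGORIES('my_db.my_schema.hr_data')::VARIANT)
) AS f
ORDER BY column_name;
```

Columns with low confidence or coverage are natural candidates for manual tagging with ALTER TABLE instead of automatic application.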
Utilizing the Classification API unlocks powerful capabilities. For further details, refer to the official documentation linked below.
Data Classification via Snowsight
Within Snowsight, users can initiate a data classification job across an entire schema or select specific tables. This method allows for larger-scale classifications without needing to analyze results on a per-table basis. An auto-tagging feature is also available, enabling high-confidence classifiers to automatically tag objects, simplifying the classification process.
Users can choose to apply these tags automatically or review them manually prior to application. The user-friendly interface provides a clear overview of classification outcomes, facilitating the management and protection of sensitive information.
Steps to Classify and Tag Tables in a Schema:
- Start Classification and Tagging:
  - Open Snowsight and navigate to the desired schema via the object explorer.
  - Access options by selecting the More menu (...).
  - Choose Classify and Tag Sensitive Data.
- Select Warehouse and Tables:
  - If not already in use, select a warehouse.
  - Choose the tables you wish to classify. By default, no tables are preselected.
- Configure Advanced Options:
  - Auto-tagging Data: Automatically applies tags to columns after classification. This is enabled by default but can be disabled if needed.
  - Include Custom Classifiers: Utilize custom classifiers available to you. Check access by selecting View custom classifiers and running the provided command in a worksheet.
  - After reviewing and adjusting these settings, select Classify and Tag Sensitive Data. Note that Snowsight can classify up to 1,000 tables.
- Monitor and Review Classification:
  - Allow the classification process to complete. A green checkmark will indicate completion in the CLASSIFICATION column.
  - Click View Results to inspect and, if necessary, modify tag values. Ensure you have the necessary privileges on the SNOWFLAKE database to make changes.
- Finalize Classification:
  - Follow prompts to review and approve classification results. Adjust tag values as necessary.
  - Select Complete Classification to apply the reviewed tags.
- Verify Tag Assignments:
  - To confirm tag assignments, select the table, navigate to the Columns tab, and review the TAGS column.
  - Alternatively, use a worksheet to invoke the TAG_REFERENCES_ALL_COLUMNS function to view tag assignments for specific columns.
- Review Classification Records:
  - Consult the Account Usage DATA_CLASSIFICATION_LATEST view for records by navigating through the object explorer or querying in a worksheet.
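For the last step, a worksheet query against the Account Usage view might look like the sketch below. The column names (database_name, schema_name, table_name, status, last_classified_on, result) reflect our understanding of the view and should be checked against the documentation. The database and schema filters are illustrative and assume unquoted identifiers, which Snowflake stores in uppercase.

```sql
-- Most recent classification record for each table in one schema.
SELECT database_name,
       schema_name,
       table_name,
       status,
       last_classified_on,
       result
FROM SNOWFLAKE.ACCOUNT_USAGE.DATA_CLASSIFICATION_LATEST
WHERE database_name = 'MY_DB'
  AND schema_name = 'MY_SCHEMA'
ORDER BY last_classified_on DESC;
```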
Best Practices for Classification
- Prioritization: Classify frequently accessed data objects first.
- Consistency: Employ clear, consistent column names and appropriate data types to improve classification accuracy.
Conclusion
Snowflake’s Data Classification feature represents a significant advancement in assisting organizations to identify, label, and protect sensitive data. By streamlining these processes, Snowflake enables users to accelerate data analysis while ensuring compliance with data privacy standards. Whether utilizing the Snowsight UI or SQL commands, Snowflake simplifies data protection, enhancing both data governance and security.
For further information, visit: https://docs.snowflake.com/en/user-guide/classify-intro