Hash Text
This section provides a comprehensive description of the Hash Text Rule.
For a summary of the rule and its compatibility with platform jobs and execution environments, see Masking Rule Types.
Data Types
The supported data types for this rule are:
Text
Description
The original value is completely replaced by a generated SHA256 salted hash of the value and a pepper (secret salt). By default, the hash will be a base 64 string; however, you can provide a regular expression for the hash output.
The rule does not utilize a Token Vault, so there is no support for Unmasking of any output value. As the input value is hashed, there is no way to return to the input value; the rule output is always irreversible.
Consistency in tokenization is achieved by the hashing function. Effectively, the function will always return the same output value for the same input value within a given PDD for a specific rule.
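The consistency property described above can be illustrated with a minimal sketch. It assumes a simple construction (SHA-256 over the pepper followed by the value, then Base64 encoding); the platform's actual construction is not specified here, and the function name `hash_text` and the pepper value are hypothetical.

```python
import base64
import hashlib


def hash_text(value: str, pepper: bytes) -> str:
    """Return a Base64-encoded SHA-256 digest of the pepper followed by the value.

    Hypothetical sketch only: not the platform's actual implementation.
    """
    digest = hashlib.sha256(pepper + value.encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")


# Assumption: this constant stands in for the secret held in the KMS.
pepper = b"example-pepper"

first = hash_text("alice@example.com", pepper)
second = hash_text("alice@example.com", pepper)
other = hash_text("bob@example.com", pepper)

print(first == second)  # same input and pepper give the same output
print(first == other)   # a different input gives a different output
print(len(first))       # a 32-byte digest encodes to 44 Base64 characters
```

Because the output depends only on the input value and the secret, no lookup table is needed to reproduce a token, which is why the rule works without a Token Vault.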
Note
This rule can only be used in a Job that has a KMS configured in the platform Environment. For more information, see Key Management Environment Configuration.
Masking Behavior
The options are described in the following table and assume that the original value is not null:
Option | Description |
---|---|
default | If you do not specify a regular expression, the default output is a (hashed) base 64 string. |
Regular expression | The pattern that the generated text should match. Using a regular expression with the rule ensures that it is possible to add a Watermark to the dataset. It is not possible to add a watermark if the rule is used without a regular expression. For more information, see Watermarking a Dataset. For more information about the regular expression syntax supported in the platform, see Regular Expression Syntax. (Click on the RegExp class in the Class Summary table.) |
Examples
The following diagram illustrates the output behavior of the rule. The first two examples show how the same input value produces the same output value. The final example shows how the output value can be changed using a regular expression:
Here are some other examples of regular expressions that could be used to match some example fields and formats:
Field | Format | Expression |
---|---|---|
Email address | xxxxxxx@xxxxx.com | [a-z]{7}\@[a-z]{5}\.com |
Surname | xxxxxxxx | [a-z]{8} |
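As a quick sanity check of the expressions in the table, the sketch below tests made-up sample values against them using Python's `re` module. This is only an approximation: the platform's regular expression syntax may differ from Python's, and the sample values are invented for illustration.

```python
import re

# Expressions copied from the table; the sample values are made up.
EXAMPLES = {
    "Email address": (r"[a-z]{7}\@[a-z]{5}\.com", "qwertyu@asdfg.com"),
    "Surname": (r"[a-z]{8}", "zxcvbnmq"),
}


def matches(pattern: str, text: str) -> bool:
    """Return True only if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


results = {
    field: matches(pattern, sample)
    for field, (pattern, sample) in EXAMPLES.items()
}

for field, ok in results.items():
    print(f"{field}: {'matches' if ok else 'does not match'}")
```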
Tokenization Behavior
Tokenization Behavior contains various settings that determine how tokenization is performed when the rule is applied to a dataset.
For the Hash text rule, the Behavior setting is fixed as:
Consistency enforced by hashing function but duplicate tokens are possible
This means that the hashing function ensures that the same input value always returns the same output value. There is, though, an unlikely possibility that the same token is generated for two different input values. (The collision resistance of the SHA256 hashing algorithm is discussed in many external publications.)
Collisions are much more likely if a regular expression is specified with the rule, for example when the regular expression defines an output space that is smaller than that of the default hash output.
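To give a feel for the difference, the following sketch applies the standard birthday-bound approximation, p ≈ 1 - exp(-n(n-1)/(2N)), where n is the number of distinct input values and N is the number of possible outputs. It assumes outputs are spread uniformly over the output space, which is an idealization; the dataset size of one million values is an arbitrary example.

```python
import math


def collision_probability(num_values: int, output_space: int) -> float:
    """Approximate probability of at least one collision (birthday bound)."""
    exponent = num_values * (num_values - 1) / (2 * output_space)
    # expm1 keeps precision when the exponent is extremely small.
    return -math.expm1(-exponent)


n = 1_000_000                # assumption: one million distinct input values
sha256_space = 2 ** 256      # possible outputs of a full SHA-256 digest
surname_space = 26 ** 8      # possible outputs of the expression [a-z]{8}

p_default = collision_probability(n, sha256_space)
p_regex = collision_probability(n, surname_space)

print(f"default hash output: {p_default:.3e}")
print(f"[a-z]{{8}} output:     {p_regex:.3f}")
```

Under these assumptions, a collision is practically impossible with the full hash output but very likely (roughly 0.9) once the output is restricted to eight lowercase letters.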
If Retain NULL values is checked, NULL values in the input will not be replaced or tokenized and will be retained as NULL in the output.
Hash Text Environment Requirements
The hash text rule can only be used in environments that have a KMS configured.
The rule does not require the user to specify any key details, because the key is created automatically on the first execution of any hash text rule in a given environment. The Privitar Platform does not support rotation or deletion of the key (even if the environment is deleted or the KMS type is changed). If the user deletes or rotates the key directly in the KMS, any newly processed data will be inconsistent with previously tokenized data.
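The effect of a deleted or rotated key can be sketched as follows. The construction shown (SHA-256 over the key followed by the value) is an illustrative assumption, not the platform's implementation, and the key values and the function name `hash_with_key` are hypothetical.

```python
import base64
import hashlib


def hash_with_key(value: str, key: bytes) -> str:
    """Illustrative keyed hash: Base64 of SHA-256(key + value)."""
    digest = hashlib.sha256(key + value.encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")


# Hypothetical keys standing in for the secret before and after a rotation.
old_key = b"key-before-rotation"
new_key = b"key-after-rotation"

before = hash_with_key("Smith", old_key)
after = hash_with_key("Smith", new_key)

# The same input no longer maps to the same token, so data processed after
# the rotation cannot be joined with data processed before it.
print(before == after)
```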
Required Permissions for using the Hash Text Rule with AWS Secrets Manager
For the Hash Text Rule, all communication with the KMS (in this case AWS Secrets Manager) is performed on the execution engine (POD, Hadoop Batch Processor, SDK, or other data flow processor) using the AWS SDK, which uses the configured Region and Endpoint to connect to AWS Secrets Manager.
The Hash Text rule requires both read and write access to AWS Secrets Manager, because the secret is created on first use if it is missing. AWS has a managed policy for Secrets Manager that grants the necessary permissions. However, a customer might choose another policy or create their own, in which case that policy must grant the necessary permissions for the following AWS Secrets Manager actions:
CreateSecret - see Minimum permissions required (we always require the secretsmanager:TagResource permission as we tag the secret with the key algorithm)
If a customer-managed CMK is used, access to AWS KMS must also be granted for the following actions:
GenerateDataKey - needed only if you use a customer-managed AWS KMS key to encrypt the secret. You do not need this permission to use the account default AWS managed CMK for Secrets Manager.
Decrypt - needed only if you use a customer-managed AWS KMS key to encrypt the secret. You do not need this permission to use the account default AWS managed CMK for Secrets Manager.
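The permissions listed above could be collected into an identity policy along the following lines. This is a sketch under stated assumptions rather than a definitive policy: `secretsmanager:GetSecretValue` is included here as an assumption to cover the read access mentioned above (it is not named in the list), the `"Resource": "*"` scope and the key ARN are placeholders, and the function name `build_policy` is hypothetical.

```python
import json
from typing import Optional


def build_policy(kms_key_arn: Optional[str] = None) -> dict:
    """Build an example IAM identity policy for the execution engine's role."""
    statements = [
        {
            "Effect": "Allow",
            "Action": [
                "secretsmanager:CreateSecret",
                "secretsmanager:TagResource",
                # Assumption: read access is granted via GetSecretValue.
                "secretsmanager:GetSecretValue",
            ],
            "Resource": "*",  # placeholder; narrow this in a real policy
        }
    ]
    if kms_key_arn is not None:
        # Needed only when a customer-managed key encrypts the secret.
        statements.append(
            {
                "Effect": "Allow",
                "Action": ["kms:GenerateDataKey", "kms:Decrypt"],
                "Resource": kms_key_arn,
            }
        )
    return {"Version": "2012-10-17", "Statement": statements}


# Placeholder ARN for illustration only.
policy = build_policy("arn:aws:kms:eu-west-1:111122223333:key/EXAMPLE-KEY-ID")
print(json.dumps(policy, indent=2))
```

Calling `build_policy()` with no argument omits the AWS KMS statement, matching the case where the account default AWS managed CMK is used.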
Required Permissions for using the Hash Text Rule with AWS Secrets Manager
By default, the secret created by the Hash Text rule is visible to whichever principals the existing IAM policies allow. We recommend that you restrict access to a specific role, specifically the role that you create while following these steps. You can also do this manually by attaching an AWS resource policy to the secret. To learn more: AWS Documentation: Attaching a resource-based policy to a secret
To have the platform create the secret, run a job using the environment that you created when following the steps in Configure an AWS Secret Manager as a KMS in Policy Manager.
In the AWS Management Console, navigate to AWS Secrets Manager > Secrets.
Search for your environment ID.
Open the secret created by the platform, which will have a name with a format similar to “privitar-hash-rule-[YOUR_ENVIRONMENT_ID].”
Edit resource permissions and add a restriction like that in this example, replacing the Amazon Resource Name (ARN) with the ARN of the role that you just created.
To restrict access to just the role Secrets_manager_limited in account 1234555555, the resource policy would have the form:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Deny",
      "Principal": "*",
      "Action": "secretsmanager:*",
      "Resource": "*",
      "Condition": {
        "ArnNotLike": {
          "aws:PrincipalArn": "arn:aws:iam::1234555555:role/Secrets_manager_limited"
        }
      }
    }
  ]
}
The user updating the policy must assume the role to which access will be restricted; otherwise, the user will be locked out. If the user cannot assume this role, then either they should receive temporary access to be able to update the resource policy, or the resource policy should include the user, in a form similar to the following:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "*",
      "Resource": "*",
      "Condition": {
        "StringNotLike": {
          "aws:PrincipalARN": [
            "<ARN-OF-ALLOWED-ROLE>",
            "<ARN-OF-ALLOWED-USER>"
          ]
        }
      }
    }
  ]
}
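Writing such a policy by hand is error-prone, and a mistake can lock everyone out. As a minimal sketch, the policy document can be generated from a list of allowed ARNs. The function name `build_deny_policy` and the example ARNs are hypothetical, and the refusal to accept an empty list is a design choice of this sketch, not platform behavior.

```python
import json
from typing import Iterable


def build_deny_policy(allowed_arns: Iterable[str]) -> dict:
    """Build a resource policy that denies every principal not in allowed_arns."""
    arns = list(allowed_arns)
    if not arns:
        # An empty allow list would deny access to every principal.
        raise ValueError("at least one allowed ARN is required")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Deny",
                "Principal": "*",
                "Action": "*",
                "Resource": "*",
                "Condition": {"StringNotLike": {"aws:PrincipalARN": arns}},
            }
        ],
    }


# Placeholder ARNs for illustration only.
resource_policy = build_deny_policy(
    [
        "arn:aws:iam::111122223333:role/ExampleAllowedRole",
        "arn:aws:iam::111122223333:user/example-admin",
    ]
)
print(json.dumps(resource_policy, indent=2))
```

The printed JSON could then be saved to a file and attached to the secret, for example with `aws secretsmanager put-resource-policy --secret-id <secret name> --resource-policy file://policy.json`.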
Glue Environments
Glue deployments have permission to create secrets and the first time a hash text rule is used, it will create one for that rule.
The following resource policy is automatically attached to the secret created.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Deny",
      "Principal": "*",
      "NotAction": [
        "secretsmanager:DeleteSecret",
        "secretsmanager:RestoreSecret",
        "secretsmanager:GetResourcePolicy"
      ],
      "Resource": "*",
      "Condition": {
        "ArnNotLike": {
          "aws:PrincipalArn": "<Glue job IAM role arn>"
        }
      }
    }
  ]
}
This denies access to all principals except the IAM role that the Glue job uses. To allow the secret to be deleted in the future, it doesn't deny DeleteSecret permissions. Note that the secret would have to be deleted via the AWS CLI command below, because deletion from the AWS console UI requires DescribeSecret permissions.
aws secretsmanager delete-secret --secret-id <secret name> --force-delete-without-recovery
Configure an AWS Secret Manager as a KMS in Policy Manager
Open Privitar Policy Manager.
Choose Environments from the navigation menu.
Create a new environment.
Select AWS Secrets Manager on the KMS tab.
Configuring the region, endpoint, and AWS key is optional but highly recommended for production environments.
Region: AWS region with which the AWS SDK should communicate. Best practice is to specify the same region as the Data Processor Instance.
Endpoint: URL representing the endpoint with which the AWS SDK should communicate. Best practice is to configure a virtual private cloud (VPC) endpoint to connect to AWS Secrets Manager. To learn more: AWS Documentation: Using Secrets Manager with VPC endpoints
KMS Key: Enter the key ID of the key that you created in Create a KMS Key to Be Used by the AWS Secrets Manager.
Save the environment and make note of the environment ID for future use.