Amazon RDS Multi-Cloud Scanning Connector for Azure Purview (Public preview)
The Multi-Cloud Scanning Connector for Azure Purview allows you to explore your organizational data across cloud providers, including Amazon Web Services, in addition to Azure storage services.
This article describes how to use Azure Purview to scan your structured data currently stored in Amazon RDS, including both Microsoft SQL and PostgreSQL databases, and discover what types of sensitive information exist in your data. You'll also learn how to identify the Amazon RDS databases where the data is currently stored for easy information protection and data compliance.
For this service, use Purview to provide a Microsoft account with secure access to AWS, where the Multi-Cloud Scanning Connectors for Azure Purview will run. The Multi-Cloud Scanning Connectors for Azure Purview use this access to your Amazon RDS databases to read your data, and then report the scanning results, including only the metadata and classification, back to Azure. Use the Purview classification and labeling reports to analyze and review your data scan results.
Important
The Multi-Cloud Scanning Connectors for Azure Purview are separate add-ons to Azure Purview. The terms and conditions for the Multi-Cloud Scanning Connectors for Azure Purview are contained in the agreement under which you obtained Microsoft Azure Services. For more information, see Microsoft Azure Legal Information at https://azure.microsoft.com/support/legal/.
Important
Purview support for Amazon RDS is currently in PREVIEW. The Azure Preview Supplemental Terms include additional legal terms that apply to Azure features that are in beta, preview, or otherwise not yet released into general availability.
Purview scope for Amazon RDS
Supported database engines: Amazon RDS structured data storage supports multiple database engines. Azure Purview supports Amazon RDS databases based on Microsoft SQL and PostgreSQL.
Maximum columns supported: Scanning RDS tables with more than 300 columns is not supported.
Public access support: Purview supports scanning only with VPC Private Link in AWS, and does not include public access scanning.
Supported regions: Purview only supports Amazon RDS databases that are located in the following AWS regions:
- US East (Ohio)
- US East (N. Virginia)
- US West (N. California)
- US West (Oregon)
- Europe (Frankfurt)
- Asia Pacific (Tokyo)
- Asia Pacific (Singapore)
- Asia Pacific (Sydney)
- Europe (Ireland)
- Europe (London)
- Europe (Paris)
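A quick pre-check can confirm that a database endpoint falls in one of the regions listed above. The following is a minimal sketch, not part of Purview or the AWS CLI: both function names are hypothetical, the region codes are the standard AWS codes for the region names in the list, and the endpoint is assumed to follow the <instance identifier>.<xxxxxxxxxxxx>.<region>.rds.amazonaws.com syntax used when registering the data source.

```shell
# Hypothetical helpers; not part of the AWS CLI or Azure Purview.

# Print the region segment of an RDS endpoint such as
# mydb.abc123xyz.us-east-1.rds.amazonaws.com (fourth field from the end).
rds_region_from_endpoint() {
  echo "$1" | awk -F. '{ print $(NF-3) }'
}

# Succeed (exit status 0) only for the AWS regions listed as supported.
is_supported_region() {
  case "$1" in
    us-east-1|us-east-2|us-west-1|us-west-2) return 0 ;;
    eu-central-1|eu-west-1|eu-west-2|eu-west-3) return 0 ;;
    ap-northeast-1|ap-southeast-1|ap-southeast-2) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, is_supported_region "$(rds_region_from_endpoint <DB-Endpoint>)" succeeds only when the endpoint's region appears in the list.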
IP address requirements: Your RDS database must have a static IP address. The static IP address is used to configure AWS PrivateLink, as described in this article.
Known issues: The following functionality is not currently supported:
- The Test connection button. The scan status messages will indicate any errors related to connection setup.
- Selecting specific tables in your database to scan.
- Data lineage.
For more information, see:
- Manage and increase quotas for resources with Azure Purview
- Supported data sources and file types in Azure Purview
- Use private endpoints for your Purview account
Prerequisites
Ensure that you've met the following prerequisites before adding your Amazon RDS database as a Purview data source and scanning your RDS data.
- You need to be an Azure Purview Data Source Admin.
- You need a Purview account. Create an Azure Purview account instance, if you don't yet have one.
- You need an Amazon RDS PostgreSQL or Microsoft SQL database, with data.
Configure AWS to allow Purview to connect to your RDS VPC
Azure Purview supports scanning only when your database is hosted in a virtual private cloud (VPC), where your RDS database can only be accessed from within the same VPC.
The Multi-Cloud Scanning Connectors for Azure Purview service runs in a separate Microsoft account in AWS. To scan your RDS databases, the Microsoft AWS account needs to be able to access your RDS databases in your VPC. To allow this access, you’ll need to configure AWS PrivateLink between the RDS VPC (in the customer account) and the VPC where the Multi-Cloud Scanning Connectors for Azure Purview run (in the Microsoft account).
The following diagram shows the components in both your customer account and the Microsoft account. Highlighted in yellow are the components you’ll need to create to enable connectivity from the RDS VPC in your account to the VPC where the Multi-Cloud Scanning Connectors for Azure Purview run in the Microsoft account.
Important
Any AWS resources created for a customer's private network will incur extra costs on the customer's AWS bill.
Configure AWS PrivateLink using a CloudFormation template
The following procedure describes how to use an AWS CloudFormation template to configure AWS PrivateLink, allowing Purview to connect to your RDS VPC. This procedure is performed in AWS and is intended for an AWS admin.
This CloudFormation template is available for download from the Azure GitHub repository, and will help you create a target group, load balancer, and endpoint service.
If you have multiple RDS servers in the same VPC, perform this procedure once, specifying all RDS server IP addresses and ports. In this case, the CloudFormation output will include different ports for each RDS server.
When registering these RDS servers as data sources in Purview, use the ports included in the output instead of the real RDS server ports.
If you have RDS servers in multiple VPCs, perform this procedure for each of the VPCs.
Tip
You can also perform this procedure manually. For more information, see Configure AWS PrivateLink manually (advanced).
To prepare your RDS database with a CloudFormation template:
Download the CloudFormation RDSPrivateLink_CloudFormation.yaml template required for this procedure from the Azure GitHub repository:
At the right of the linked GitHub page, select Download to download the zip file.
Extract the .zip file to a local location so that you can access the RDSPrivateLink_CloudFormation.yaml file.
In the AWS portal, navigate to the CloudFormation service. At the top-right of the page, select Create stack > With new resources (standard).
On the Prerequisite - Prepare Template page, select Template is ready.
In the Specify template section, select Upload a template file. Select Choose file, navigate to the RDSPrivateLink_CloudFormation.yaml file you downloaded earlier, and then select Next to continue.
In the Stack name section, enter a name for your stack. This name will be used, together with an automatically added suffix, for the resource names created later in the process. Therefore:
- Make sure to use a meaningful name for your stack.
- Make sure that the stack name is no longer than 19 characters.
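The length limit above is easy to check before you open the CloudFormation console. This is a minimal sketch with a hypothetical function name; it tests only the 19-character limit stated here and does not check any other naming rules that CloudFormation itself enforces.

```shell
# Hypothetical helper: succeeds when the stack name is non-empty
# and no longer than 19 characters.
is_valid_stack_name() {
  [ -n "$1" ] && [ "${#1}" -le 19 ]
}
```

For example, is_valid_stack_name purview-rds-link succeeds, while a 20-character name fails.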
In the Parameters area, enter the following values, using data available from your RDS database page in AWS:
- Endpoint & port: Enter the resolved IP address of the RDS endpoint URL and port. For example: 192.168.1.1:5432
  - If an RDS proxy is configured, use the IP address of the read/write endpoint of the proxy for the relevant database. We recommend using an RDS proxy when working with Purview, as the IP address is static.
  - If you have multiple endpoints behind the same VPC, enter up to 10 comma-separated endpoints. In this case, a single load balancer is created for the VPC, allowing a connection from the Amazon RDS Multi-Cloud Scanning Connector for Azure Purview in AWS to all RDS endpoints in the VPC.
- Networking: Enter your VPC ID.
- VPC IPv4 CIDR: Enter your VPC's CIDR value. You can find this value by selecting the VPC link on your RDS database page. For example: 192.168.0.0/16
- Subnets: Select all the subnets that are associated with your VPC.
- Security: Select the VPC security group associated with the RDS database.
When you're done, select Next to continue.
The settings on the Configure stack options page are optional for this procedure.
Define your settings as needed for your environment. For more information, select the Learn more links to access the AWS documentation. When you're done, select Next to continue.
On the Review page, check to make sure that the values you selected are correct for your environment. Make any changes needed, and then select Create stack when you're done.
Watch for the resources to be created. When complete, relevant data for this procedure is shown on the following tabs:
- Events: Shows the events and activities performed by the CloudFormation template.
- Resources: Shows the newly created target group, load balancer, and endpoint service.
- Outputs: Displays the ServiceName value, and the IP address and port of the RDS servers.
If you have multiple RDS servers configured, a different port is displayed for each server. In this case, use the port shown here instead of the actual RDS server port when registering your RDS database as a Purview data source.
In the Outputs tab, copy the ServiceName key value to the clipboard.
You'll use the value of the ServiceName key in the Azure Purview portal when registering your RDS database as a Purview data source. There, enter the ServiceName value in the Connect to private network via endpoint service field.
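If you prefer the command line to the Outputs tab, the same value can probably be read with the AWS CLI. The sketch below assumes that the AWS CLI is installed and configured for the account that owns the stack, and that the template's output key is named ServiceName as described above. The function name is hypothetical.

```shell
# Print the ServiceName output of a CloudFormation stack.
# $1 = the stack name you entered when creating the stack
get_service_name() {
  aws cloudformation describe-stacks \
    --stack-name "$1" \
    --query "Stacks[0].Outputs[?OutputKey=='ServiceName'].OutputValue" \
    --output text
}
```

Call it as get_service_name <stack-name> and paste the printed value into the Connect to private network via endpoint service field.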
Register an Amazon RDS data source
To add your Amazon RDS server as an Azure Purview data source:
In Azure Purview, navigate to the Data Map page, and select Register.
On the Sources page, select Register. On the Register sources page that appears on the right, select the Database tab, and then select Amazon RDS (PostgreSQL) or Amazon RDS (SQL).
Enter the details for your source:
- Name: Enter a meaningful name for your source, such as AmazonPostgreSql-Ups
- Server name: Enter the name of your RDS database in the following syntax: <instance identifier>.<xxxxxxxxxxxx>.<region>.rds.amazonaws.com
  We recommend that you copy this URL from the Amazon RDS portal, and make sure that the URL includes the AWS region.
- Port: Enter the port used to connect to the RDS database:
  - PostgreSQL: 5432
  - Microsoft SQL: 1433
  If you've configured AWS PrivateLink using a CloudFormation template and have multiple RDS servers in the same VPC, use the ports listed in the CloudFormation Outputs tab instead of the real RDS server ports.
- Connect to private network via endpoint service: Enter the ServiceName key value obtained at the end of the previous procedure.
  If you've prepared your RDS database manually, use the Service Name value obtained at the end of Step 5: Create an endpoint service.
- Collection (optional): Select a collection to add your data source to. For more information, see Manage data sources in Azure Purview (Preview).
Select Register when you’re ready to continue.
Your RDS data source appears in the Sources map or list.
Create Purview credentials for your RDS scan
Amazon RDS data sources support username/password authentication only, with the password stored in an Azure Key Vault secret.
Create a secret for your RDS credentials to use in Purview
Add your password to an Azure Key Vault as a secret. For more information, see Set and retrieve a secret from Key Vault using Azure portal.
Add an access policy to your Key Vault with Get and List permissions. When defining the principal for the policy, select your Purview account.
Select Save to save your access policy update. For more information, see Assign an Azure Key Vault access policy.
In Azure Purview, add a Key Vault connection to connect the Key Vault that holds your RDS secret to Purview. For more information, see Credentials for source authentication in Azure Purview.
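The first two steps can also be sketched with the Azure CLI. This is a hedged example, not the article's own procedure: it assumes that the Azure CLI is installed and signed in, that the vault uses access policies rather than Azure role-based access control, and that you already know the object ID of your Purview account's managed identity. The function names and all argument values are hypothetical placeholders.

```shell
# Store the RDS password as a Key Vault secret.
# $1 = Key Vault name, $2 = secret name, $3 = RDS password
store_rds_secret() {
  az keyvault secret set --vault-name "$1" --name "$2" --value "$3"
}

# Grant Get and List permissions on secrets to the Purview identity.
# $1 = Key Vault name, $2 = object ID of the Purview account's managed identity
grant_purview_access() {
  az keyvault set-policy --name "$1" --object-id "$2" --secret-permissions get list
}
```

Passing a password as a command-line argument can leave it in your shell history, so consider reading it from a prompt or a protected file instead.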
Create your Purview credential object for RDS
In Azure Purview, create a credentials object to use when scanning your Amazon RDS account.
In the Purview Management area, select Security and access > Credentials > New.
Select SQL authentication as the authentication method. Then, enter details for the Key Vault where your RDS credentials are stored, including the names of your Key Vault and secret.
For more information, see Credentials for source authentication in Azure Purview.
Scan an Amazon RDS database
To configure an Azure Purview scan for your RDS database:
From the Purview Sources page, select the Amazon RDS data source to scan.
Select New scan to start defining your scan. In the pane that opens on the right, enter the following details, and then select Continue.
- Name: Enter a meaningful name for your scan.
- Database name: Enter the name of the database you want to scan. You’ll need to find the names available from outside Purview, and create a separate scan for each database in the registered RDS server.
- Credential: Select the credential you created earlier for the Multi-Cloud Scanning Connectors for Azure Purview to access the RDS database.
On the Select a scan rule set pane, select the scan rule set you want to use, or create a new one. For more information, see Create a scan rule set.
On the Set a scan trigger pane, select whether you want to run the scan once, or at a recurring time, and then select Continue.
On the Review your scan pane, review the details and then select Save and Run, or Save to run it later.
While you run your scan, select Refresh to monitor the scan progress.
Note
When working with Amazon RDS PostgreSQL databases, only full scans are supported. Incremental scans are not supported as PostgreSQL does not have a Last Modified Time value.
Explore scanning results
After a Purview scan is complete on your Amazon RDS databases, drill down in the Purview Data Map area to view the scan history. Select a data source to view its details, and then select the Scans tab to view any currently running or completed scans.
Use the other areas of Purview to find out details about the content in your data estate, including your Amazon RDS databases:
Explore RDS data in the catalog. The Purview catalog shows a unified view across all source types, and RDS scanning results are displayed in a similar way to Azure SQL. You can browse the catalog using filters or browse the assets and navigate through the hierarchy. For more information, see:
View Insight reports to view statistics for the classification, sensitivity labels, file types, and more details about your content.
All Purview Insight reports include the Amazon RDS scanning results, along with the rest of the results from your Azure data sources. When relevant, an Amazon RDS asset type is added to the report filtering options.
For more information, see Understand Insights in Azure Purview.
View RDS data in other Purview features, such as the Scans and Glossary areas. For more information, see:
Configure AWS PrivateLink manually (advanced)
This procedure describes the manual steps required for preparing your RDS database in a VPC to connect to Azure Purview.
By default, we recommend that you use a CloudFormation template instead, as described earlier in this article. For more information, see Configure AWS PrivateLink using a CloudFormation template.
Step 1: Retrieve your Amazon RDS endpoint IP address
Locate the IP address of your Amazon RDS endpoint, hosted inside an Amazon VPC. You’ll use this IP address later in the process when you create your target group.
To retrieve your RDS endpoint IP address:
In Amazon RDS, navigate to your RDS database, and identify your endpoint URL. This is located under Connectivity & security, as your Endpoint value.
Tip
Use the following command to get a list of the databases in your endpoint:
aws rds describe-db-instances
Use the endpoint URL to find the IP address of your Amazon RDS database. For example, use one of the following methods:
- Ping:
  ping <DB-Endpoint>
- nslookup:
  nslookup <DB-Endpoint>
- Online nslookup. Enter your database Endpoint value in the search box and select Find DNS records. NSLookup.io shows your IP address on the next screen.
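The lookup in this step can be scripted. The following is a minimal sketch under stated assumptions: the AWS CLI is configured for your account, the dig utility is available (it is not mentioned in this article and stands in for ping or nslookup here), and all function names are hypothetical.

```shell
# Print the endpoint address and port of one RDS instance, tab-separated.
# $1 = DB instance identifier
get_rds_endpoint() {
  aws rds describe-db-instances \
    --db-instance-identifier "$1" \
    --query 'DBInstances[0].[Endpoint.Address,Endpoint.Port]' \
    --output text
}

# Keep only the lines of resolver output that look like IPv4 addresses.
filter_ipv4() {
  grep -E '^[0-9]{1,3}(\.[0-9]{1,3}){3}$'
}

# Resolve an endpoint host name to its IPv4 address or addresses.
# $1 = the Endpoint value of your RDS database
resolve_ipv4() {
  dig +short "$1" | filter_ipv4
}
```

The filter drops any intermediate alias records that the resolver prints before the address itself.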
Step 2: Enable your RDS connection from a load balancer
To ensure that your RDS connection will be allowed from the load balancer you create later in the process:
Find the VPC IP range.
In Amazon RDS, navigate to your RDS database. In the Connectivity & security area, select the VPC link to find its IP range (IPv4 CIDR).
In the Your VPCs area, your IP range is shown in the IPv4 CIDR column.
Tip
To perform this step via CLI, use the following command:
aws ec2 describe-vpcs
For more information, see ec2 - AWS CLI 1.19.105 Command Reference (amazon.com).
Create a Security Group for this IP range.
Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/ and navigate to Security Groups.
Select Create security group and then create your security group, making sure to include the following details:
- Security group name: Enter a meaningful name
- Description: Enter a description for your security group
- VPC: Select your RDS database VPC
Under Inbound rules, select Add rule and enter the following details:
- Type: Select Custom TCP
- Port range: Enter your RDS database port
- Source: Select Custom and enter the VPC IP range from the previous step.
Scroll to the bottom of the page and select Create security group.
Associate the new security group to RDS.
In Amazon RDS, navigate to your RDS database, and select Modify.
Scroll down to the Connectivity section, and in the Security group field, add the new security group that you created in the previous step. Then scroll down to the bottom of the page and select Continue.
In the Scheduling of modifications section, select Apply immediately to update the security group immediately.
Select Modify DB instance.
Tip
To perform this step via CLI, use the following commands:
aws ec2 create-security-group --description <value> --group-name <value> [--vpc-id <value>]
For more information, see create-security-group - AWS CLI 1.19.105 Command Reference (amazon.com).
aws rds modify-db-instance --db-instance-identifier <value> --vpc-security-group-ids <value>
For more information, see modify-db-instance - AWS CLI 1.19.105 Command Reference (amazon.com).
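The console steps above also add an inbound rule to the new security group. A CLI equivalent might look like the following sketch, which assumes that the AWS CLI is configured and that you have the group ID returned when the security group was created. The function name is hypothetical.

```shell
# Allow inbound TCP traffic from the VPC IP range to the RDS database port.
# $1 = security group ID, $2 = RDS database port, $3 = VPC IPv4 CIDR
allow_vpc_to_rds() {
  aws ec2 authorize-security-group-ingress \
    --group-id "$1" --protocol tcp --port "$2" --cidr "$3"
}
```

For example, allow_vpc_to_rds <security-group-id> 5432 192.168.0.0/16 mirrors the Custom TCP rule for a PostgreSQL database in a VPC with that IP range.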
Step 3: Create a target group
To create your target group in AWS:
Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/ and navigate to Load Balancing > Target Groups.
Select Create target group, and create your target group, making sure to include the following details:
- Target type: Select IP addresses (optional)
- Protocol: Select TCP
- Port: Enter your RDS database port
- VPC: Enter your RDS database VPC
Note
You can find the RDS database port and VPC values on your RDS database page, under Connectivity & security
When you’re done, select Next to continue.
In the Register targets page, enter your RDS database IP address, and then select Include as pending below.
After you see the new target listed in the Targets table, select Create target group at the bottom of the page.
Tip
To perform this step via CLI, use the following commands:
aws elbv2 create-target-group --name <tg-name> --protocol <db-protocol> --port <db-port> --target-type ip --vpc-id <db-vpc-id>
For more information, see create-target-group - AWS CLI 2.2.7 Command Reference (amazonaws.com).
aws elbv2 register-targets --target-group-arn <tg-arn> --targets Id=<db-ip>,Port=<db-port>
For more information, see register-targets - AWS CLI 2.2.7 Command Reference (amazonaws.com).
Step 4: Create a load balancer
You can either create a new network load balancer to forward traffic to the RDS IP address, or add a new listener to an existing load balancer.
To create a network load balancer to forward traffic to the RDS IP address:
Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/ and navigate to Load Balancing > Load Balancers.
Select Create Load Balancer > Network Load Balancer and then select or enter the following values:
Scheme: Select Internal
VPC: Select your RDS database VPC
Mapping: Make sure to select every Availability Zone in which your RDS database is defined. You can find this information in the Availability zone value on the RDS database page, on the Connectivity & security tab.
Listeners and Routing:
- Protocol: Select TCP
- Port: Select RDS DB port
- Default action: Select the target group created in the previous step
At the bottom of the page, select Create Load Balancer > View Load Balancers.
Wait a few minutes and refresh the screen, until the State column of the new load balancer shows Active.
Tip
To perform this step via CLI, use the following commands:
aws elbv2 create-load-balancer --name <lb-name> --type network --scheme internal --subnet-mappings SubnetId=<value>
For more information, see create-load-balancer - AWS CLI 2.2.7 Command Reference (amazonaws.com).
aws elbv2 create-listener --load-balancer-arn <lb-arn> --protocol TCP --port <db-port> --default-actions Type=forward,TargetGroupArn=<tg-arn>
For more information, see create-listener - AWS CLI 2.2.7 Command Reference (amazonaws.com).
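Instead of refreshing the console until the load balancer becomes active, you can poll from the CLI. This sketch assumes that the AWS CLI is configured and that you have the load balancer ARN returned at creation time. The function names are hypothetical.

```shell
# Block until the load balancer reports an active state.
# $1 = load balancer ARN
wait_for_load_balancer() {
  aws elbv2 wait load-balancer-available --load-balancer-arns "$1"
}

# Print the current state code of the load balancer, for example "active".
# $1 = load balancer ARN
get_load_balancer_state() {
  aws elbv2 describe-load-balancers \
    --load-balancer-arns "$1" \
    --query 'LoadBalancers[0].State.Code' \
    --output text
}
```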
To add a listener to an existing load balancer:
Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/ and navigate to Load Balancing > Load Balancers.
Select your load balancer > Listeners > Add listener.
On the Listeners tab, in the Protocol : port area, select TCP and enter a new port for your listener.
Tip
To perform this step via CLI, use the following command:
aws elbv2 create-listener --load-balancer-arn <value> --protocol <value> --port <value> --default-actions Type=forward,TargetGroupArn=<target_group_arn>
For more information, see the AWS documentation.
Step 5: Create an endpoint service
After the Load Balancer is created and its State is Active you can create the endpoint service.
To create the endpoint service:
Open the Amazon VPC console at https://console.aws.amazon.com/vpc/ and navigate to Virtual Private Cloud > Endpoint Services.
Select Create Endpoint Service, and in the Available Load Balancers dropdown list, select the new load balancer created in the previous step, or the load balancer where you added a new listener.
In the Create endpoint service page, clear the selection for the Require acceptance for endpoint option.
At the bottom of the page, select Create Service > Close.
Back in the Endpoint services page:
- Select the new endpoint service you created.
- In the Allow principals tab, select Add principals.
- In the Principals to add > ARN field, enter arn:aws:iam::181328463391:root.
- Select Add principals.
Note
When adding an identity, use an asterisk (*) to add permissions for all principals. This enables all principals, in all AWS accounts, to create an endpoint to your endpoint service. For more information, see the AWS documentation.
Tip
To perform this step via CLI, use the following commands:
aws ec2 create-vpc-endpoint-service-configuration --network-load-balancer-arns <lb-arn> --no-acceptance-required
For more information, see create-vpc-endpoint-service-configuration - AWS CLI 2.2.7 Command Reference (amazonaws.com).
aws ec2 modify-vpc-endpoint-service-permissions --service-id <endpoint-service-id> --add-allowed-principals <purview-scanner-arn>
For more information, see modify-vpc-endpoint-service-permissions - AWS CLI 2.2.7 Command Reference (amazonaws.com).
To copy the service name for use in Azure Purview:
After you’ve created your endpoint service, copy the Service name value to use in the Azure Purview portal when registering your RDS database as a Purview data source.
Locate the Service name on the Details tab for your selected endpoint service.
Tip
To perform this step via CLI, use the following command:
aws ec2 describe-vpc-endpoint-services
For more information, see describe-vpc-endpoint-services - AWS CLI 2.2.7 Command Reference (amazonaws.com).
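To print only the service name of the endpoint service you created, a narrower query may be more convenient. The sketch below is an alternative to the command above, not a replacement prescribed by this article: it uses the describe-vpc-endpoint-service-configurations command, which lists the endpoint services owned by your account, and assumes that you know the service ID. The function name is hypothetical.

```shell
# Print the service name of one endpoint service that you own.
# $1 = endpoint service ID
get_endpoint_service_name() {
  aws ec2 describe-vpc-endpoint-service-configurations \
    --service-ids "$1" \
    --query 'ServiceConfigurations[0].ServiceName' \
    --output text
}
```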
Troubleshoot your VPC connection
This section describes common errors that may occur when configuring your VPC connection with Azure Purview, and how to troubleshoot and resolve them.
Invalid VPC service name
If an error of Invalid VPC service name or Invalid endpoint service appears in Azure Purview, use the following steps to troubleshoot:
Make sure that your VPC service name is correct.
Make sure that the Microsoft ARN is listed in the allowed principals: arn:aws:iam::181328463391:root
For more information, see Step 5: Create an endpoint service.
Make sure that your RDS database is located in one of the supported regions. For more information, see Purview scope for Amazon RDS.
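The allowed-principals check can be scripted as well. This is a minimal sketch that assumes the AWS CLI is configured and that you know your endpoint service ID. The function and variable names are hypothetical, and the ARN is the one given in this article.

```shell
# The Microsoft ARN that must appear in the allowed principals.
PURVIEW_PRINCIPAL="arn:aws:iam::181328463391:root"

# Succeed (exit status 0) only if the Microsoft ARN is an allowed principal.
# $1 = endpoint service ID
has_purview_principal() {
  aws ec2 describe-vpc-endpoint-service-permissions \
    --service-id "$1" \
    --query 'AllowedPrincipals[].Principal' \
    --output text |
    tr '\t' '\n' | grep -Fxq "$PURVIEW_PRINCIPAL"
}
```

A non-zero exit status suggests that the principal still needs to be added as described in Step 5.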
Invalid availability zone
If an error of Invalid Availability Zone appears in Azure Purview, make sure that your RDS is defined for at least one of the following three availability zones:
- us-east-1a
- us-east-1b
- us-east-1c
For more information, see the AWS documentation.
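To see which availability zone your database reports, you can query it from the CLI. This sketch assumes that the AWS CLI is configured and checks only against the three zones listed above. The function names are hypothetical.

```shell
# Print the availability zone of one RDS instance.
# $1 = DB instance identifier
get_rds_availability_zone() {
  aws rds describe-db-instances \
    --db-instance-identifier "$1" \
    --query 'DBInstances[0].AvailabilityZone' \
    --output text
}

# Succeed (exit status 0) only for the availability zones listed above.
is_listed_az() {
  case "$1" in
    us-east-1a|us-east-1b|us-east-1c) return 0 ;;
    *) return 1 ;;
  esac
}
```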
RDS errors
The following errors may appear in Azure Purview:
- Unknown database. In this case, the database defined doesn't exist. Check that the configured database name is correct.
- Failed to login to the Sql data source. The given auth credential does not have permission on the target database. In this case, your username and password are incorrect. Check your credentials and update them as needed.
Next steps
Learn more about Azure Purview Insight reports: