Google Data Catalog and Cloud Dataprep Tags

Create or update Google Cloud Data Catalog tags on BigQuery tables with Cloud Dataprep job metadata and column profiles via a Cloud Function.

The Cloud Function creates or updates 2 Data Catalog tags, one for the Dataprep Job Metadata and one for the Dataprep Job Column Profile.

To activate, learn about, and use Cloud Data Catalog, go to https://cloud.google.com/data-catalog and https://console.cloud.google.com/datacatalog.

This repository contains the Python code for a Cloud Function that is triggered from a Dataprep Webhook and creates or updates the 2 Data Catalog tags.

This Cloud Function uses:

In your Cloud Function, you need the 5 files:

Before running the Cloud Function (and creating or updating tags), you need to create the 2 Data Catalog Tag Templates for Dataprep (Job Metadata and Job Column Profile). You can use:

To use the Cloud Function, pass the Dataprep Job ID in a JSON body such as {"job_id":"7827359"}.
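As a minimal sketch of such a call from Python, the snippet below builds and sends the JSON body using only the standard library. The function names `build_job_request` and `trigger_tagging` and the endpoint URL are hypothetical, and the sketch assumes the Cloud Function accepts unauthenticated HTTP POST requests; add an identity token header if yours does not.

```python
import json
import urllib.request


def build_job_request(function_url: str, job_id: str) -> urllib.request.Request:
    """Build a POST request whose JSON body is {"job_id": "<job_id>"}."""
    body = json.dumps({"job_id": str(job_id)}).encode("utf-8")
    return urllib.request.Request(
        function_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def trigger_tagging(function_url: str, job_id: str, timeout: float = 60.0) -> str:
    """Send the request and return the response body as text."""
    request = build_job_request(function_url, job_id)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `trigger_tagging("https://REGION-PROJECT.cloudfunctions.net/YOUR_FUNCTION", "7827359")` would then post the example payload above; replace the placeholder URL with your own Cloud Function endpoint.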

To trigger it from a Cloud Dataprep flow, add a Webhook that targets the Cloud Function endpoint with {"job_id":"$jobId"} as the POST body.
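On the receiving side, an HTTP-triggered Python Cloud Function gets a Flask request object. The sketch below shows one way the job ID could be read and validated before any tagging work starts; it is not the repository's actual code. The names `extract_job_id` and `handle_webhook` are hypothetical, and the check that the ID is numeric is an assumption based on the example value 7827359.

```python
from typing import Any, Mapping, Optional, Tuple


def extract_job_id(payload: Optional[Mapping[str, Any]]) -> str:
    """Return the job ID from a parsed webhook body, or raise ValueError."""
    if not payload or "job_id" not in payload:
        raise ValueError('expected a JSON body like {"job_id": "7827359"}')
    job_id = str(payload["job_id"]).strip()
    # Assumption: Dataprep job IDs are numeric. This also rejects a literal
    # "$jobId" that was sent without being substituted.
    if not job_id.isdigit():
        raise ValueError(f"job_id is not numeric: {job_id!r}")
    return job_id


def handle_webhook(request) -> Tuple[str, int]:
    """Hypothetical HTTP entry point: validate the body, then report the job."""
    payload = request.get_json(silent=True)  # None if the body is not JSON
    try:
        job_id = extract_job_id(payload)
    except ValueError as error:
        return (str(error), 400)
    # The tag creation or update for this job would start here.
    return (f"Received Dataprep job {job_id}", 200)
```

Returning a 400 status for a malformed body makes a misconfigured Webhook visible in the Dataprep flow rather than failing later during tagging.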

Once the Data Catalog tag templates are created and the tags are created or updated on BigQuery tables, you can find all the results at https://console.cloud.google.com/datacatalog.

Finally, you can also search Cloud Data Catalog from your own application for BigQuery tables that carry a Dataprep tag, as in https://github.com/victorcouste/dataprep-datacatalog-explorer
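A rough sketch of such a search with the `google-cloud-datacatalog` client library follows. The function names and the tag template ID passed in are hypothetical (use the ID you chose when creating your templates), and the sketch assumes the library is installed and application default credentials are configured. The query relies on the Data Catalog search qualifiers `system=`, `type=` and `tag:`.

```python
from typing import List


def build_tag_search_query(project_id: str, tag_template_id: str) -> str:
    """Build a search query for BigQuery tables tagged with a given template."""
    return f"system=bigquery type=table tag:{project_id}.{tag_template_id}"


def search_tagged_tables(project_id: str, tag_template_id: str) -> List[str]:
    """Return the linked resource names of the matching BigQuery tables."""
    # Imported here so the query builder works without the client library.
    from google.cloud import datacatalog_v1

    client = datacatalog_v1.DataCatalogClient()
    scope = datacatalog_v1.SearchCatalogRequest.Scope(
        include_project_ids=[project_id]
    )
    query = build_tag_search_query(project_id, tag_template_id)
    results = client.search_catalog(request={"scope": scope, "query": query})
    return [result.linked_resource for result in results]
```

For example, `search_tagged_tables("my-project", "dataprep_metadata")` would list the tables tagged with a template whose ID is `dataprep_metadata`, assuming that is the ID used in your project.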


Happy wrangling and tagging!

