Cross Project Integration
  • 18 Jul 2024

Overview

Dataloop lets users with a GCP project integrate a GCS bucket with the Dataloop platform and create datasets from it. Cross Project integration is the recommended integration type for GCP.

Important

Organizations can create up to 3 Cross Project integrations.
To increase this limit for your organization, contact the Customer Support team.

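The GCP Cross Project integration process involves the following steps:

  1. Create a Cloud Storage Bucket
  2. Create an IAM Role
  3. Start the GCP Cross Project Integration on the Dataloop platform
  4. Grant Dataloop Service account permissions to access the Cloud Storage Bucket
  5. Complete the GCP Cross Project Integration on the Dataloop platform
  6. Create a GCS Storage Driver on the Dataloop Platform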

Create a Cloud Storage Bucket

  1. Log in to Google Cloud Console.
  2. From the left portal menu, go to Cloud Storage > Buckets.
  3. Click Create bucket.
  4. Enter a Name for the storage bucket.
  5. Click Continue.
  6. In the Choose where to store your data section, select a location for the bucket.
  7. Click Continue.
  8. Click Create bucket.
Note: For all other optional settings, use the default values.
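
For readers who prefer scripting to the console, the bucket creation above can be sketched as follows. This is a minimal sketch, not part of the Dataloop procedure: it assumes the gcloud CLI is installed and authenticated, and the helper name, bucket name, and location are hypothetical. It only builds and prints the command. You run it yourself.

```python
import shlex
from typing import List


def build_create_bucket_command(bucket_name: str, location: str) -> List[str]:
    """Build gcloud CLI arguments that create a bucket with default settings."""
    if not bucket_name or not location:
        raise ValueError("bucket_name and location are both required")
    return [
        "gcloud", "storage", "buckets", "create",
        f"gs://{bucket_name}",
        f"--location={location}",
    ]


# Hypothetical bucket name and location, used only for illustration.
create_bucket_command = build_create_bucket_command("my-dataloop-bucket", "us-central1")

# Print the command so it can be reviewed before it is run in a shell.
print(shlex.join(create_bucket_command))
```

All other bucket settings are left unset so that the defaults apply, matching the note above.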

Create an IAM Role

  1. Log in to the Google Cloud Console.
  2. From the left portal menu, go to IAM & Admin > Roles.
  3. Click +Create Role.
  4. Enter a role Title.
  5. (Optional) Enter a Description.
  6. (Optional) Enter a role ID. By default, the role ID is generated automatically.
  7. Click +Add Permissions, then search for and add the following permissions:
    1. storage.objects.create
    2. storage.objects.delete
    3. storage.objects.get
    4. storage.objects.list
    5. storage.buckets.get
    6. storage.buckets.getIamPolicy
Note:
  • The storage.objects.delete permission allows the Dataloop platform to delete dataset items. For more information, see the Downstream section.
  • The storage.buckets.getIamPolicy permission allows the Dataloop platform to validate that the integration was created successfully.
  8. Click Create.
Tip
  • See Create and manage custom roles for more information on creating an IAM role in GCP.
  • To display role information, you must first select a project in the Google Cloud Console. If no project is available, create one.
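
The role described above can also be expressed as a script. The following is a minimal sketch under stated assumptions: the gcloud CLI is available, and the helper names, role ID, project ID, and title are hypothetical placeholders. The permission list is copied from the steps above. The `missing_permissions` helper is an illustrative check, not a Dataloop API.

```python
from typing import Iterable, List

# The six permissions listed in the steps above.
REQUIRED_PERMISSIONS = [
    "storage.objects.create",
    "storage.objects.delete",
    "storage.objects.get",
    "storage.objects.list",
    "storage.buckets.get",
    "storage.buckets.getIamPolicy",
]


def missing_permissions(granted: Iterable[str]) -> List[str]:
    """Return the required permissions absent from `granted`, in list order."""
    granted_set = set(granted)
    return [p for p in REQUIRED_PERMISSIONS if p not in granted_set]


def build_create_role_command(role_id: str, project_id: str, title: str) -> List[str]:
    """Build gcloud CLI arguments that create a custom role with the required permissions."""
    return [
        "gcloud", "iam", "roles", "create", role_id,
        f"--project={project_id}",
        f"--title={title}",
        "--permissions=" + ",".join(REQUIRED_PERMISSIONS),
    ]


# Hypothetical role ID, project ID, and title, used only for illustration.
role_command = build_create_role_command(
    "dataloopCrossProject", "my-gcp-project", "Dataloop Cross Project"
)
print(role_command)

# Example check: a role holding only read permissions lacks four of the six.
print(missing_permissions(["storage.objects.get", "storage.objects.list"]))
```

Keeping the permission list in one constant makes it easy to compare an existing role against the requirements before you assign it to the bucket.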

Start the GCP Cross Project Integration on the Dataloop platform

Maximum GCP Cross Project Integrations

Each organization can create at most 3 GCP Cross Project integrations.

  1. Log in to the Dataloop platform.
  2. From the left-side panel, select Data Governance.
  3. Click Create Integration. A pop-up window is displayed on the right side.
  4. Integration Name: Enter a Name for the integration.
  5. Provider: Choose GCP from the list.
  6. Integration Type: Select Cross Project from the list.
  7. Click Get Service Account. If you have already created a service account, you can instead choose from the list of service accounts that are not yet assigned to an integration.
  8. Service Account ID: Copy the service account ID (email).
  9. Resource Name: Follow the steps in the Grant Dataloop Service account permissions to access the Cloud Storage Bucket section, and then enter the resource name (the name of the bucket).
  10. Click Create Integration.

Grant Dataloop Service account permissions to access the Cloud Storage Bucket

  1. Log in to Google Cloud Console.
  2. From the left portal menu, go to Cloud Storage > Buckets.
  3. Click the storage bucket to which you want to add permissions.
  4. Select the Permissions tab.
  5. Click Grant access. The Add Principals dialog box appears.
  6. Under Add principals, enter the service account ID provided by the Dataloop platform.
  7. Under Assign roles, choose Custom, and then select the role you created earlier.
  8. Click Save. A confirmation message is displayed.
Tip: See Use IAM with buckets for more information on adding permissions to a storage bucket in GCP.
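
The grant described above can be sketched as a script as well. This is a minimal sketch that assumes the gcloud CLI is installed and authenticated and that the custom role was created at the project level. The helper name and all values are hypothetical, including the service account email: use the ID copied from the Dataloop platform instead.

```python
import shlex
from typing import List


def build_grant_access_command(
    bucket_name: str,
    service_account_email: str,
    project_id: str,
    role_id: str,
) -> List[str]:
    """Build gcloud CLI arguments that bind a project-level custom role to a bucket."""
    return [
        "gcloud", "storage", "buckets", "add-iam-policy-binding",
        f"gs://{bucket_name}",
        # The principal is the service account ID provided by Dataloop.
        f"--member=serviceAccount:{service_account_email}",
        # Project-level custom roles are referenced as projects/<project>/roles/<role>.
        f"--role=projects/{project_id}/roles/{role_id}",
    ]


# Hypothetical values, used only for illustration.
grant_command = build_grant_access_command(
    bucket_name="my-dataloop-bucket",
    service_account_email="dataloop-sa@example-project.iam.gserviceaccount.com",
    project_id="my-gcp-project",
    role_id="dataloopCrossProject",
)

# Print the command so it can be reviewed before it is run in a shell.
print(shlex.join(grant_command))
```

The binding is scoped to the single bucket rather than the whole project, which mirrors the console steps above.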

Complete the GCP Cross Project Integration on the Dataloop platform

  1. Log in to the Dataloop platform.
  2. Under Resource name, enter the name of the bucket whose data you want to integrate with Dataloop.
  3. Click Create. A confirmation message is displayed.

Create a GCS Storage Driver on the Dataloop Platform

For more information, see the Create a GCS Storage Driver on the Dataloop Platform topic.