Integrations
  • 21 Jan 2025

Overview

Integrations are secure connections between Dataloop and external data providers, including cloud providers such as AWS, GCP, and Azure. Each provider can be connected in more than one way, so you can choose the method that fits your setup without compromising security. Integrations handle authentication, authorization, and the secure storage of secrets, which keeps your data accessible while protecting it in line with data privacy and compliance standards.

The Dataloop platform offers two types of integrations: Storage Integration and Registry Integration.


Access Integrations

To open the Integrations page, click Integrations in the left-side panel. The page displays the integrations available in your organization.

The main sections of the Integrations page are explained below.

Section 1: Total Numbers of Integrations, Secrets, and Alerts

This section displays the number of integrations, the number of secrets, and the number of alerts for untrusted integrations.

Section 2: Integrations tab

The Integrations tab lists the integrations available in your organization. The tab provides the following features:

  • Create Storage and Registry Integrations.
  • Create Secrets
  • SDK: Displays SDK code snippets for creating integrations and secrets. When you click SDK, the following sub-menus are displayed:
    • Integration SDK: Displays SDK code snippets for creating integrations.
      • AWS
      • GCP
      • Azure
    • Secret SDK: Displays SDK code snippets for creating secrets.
  • Refresh tabs

Section 3: Search and Filter

The following list provides the specific criteria of search and filters for Integrations:

  • To search: Search integrations by integration name.
  • To Filter: Filter the listed integrations by the following criteria:
    • Provider: The available storage providers for the datasets.
      • AWS
      • Azure
      • GCP
      • Dataloop
    • Type: The type of the Integrations.
      • AWS
        • Cross Account
        • Access Key
        • STS
        • Container Registry
      • GCP
        • Cross Project
        • Private Key
        • Container Registry
      • Azure
        • Client Secret
    • Untrusted Integrations: Show only integrations that are untrusted.
    • Creator: Filter integrations by the email ID of the creator.
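
The search and filter criteria above combine as a logical AND: a row is shown only if it matches every criterion that is set. The following minimal sketch illustrates that behavior on in-memory data. The `IntegrationRow` class, the `filter_integrations` function, and the sample rows are hypothetical and are not part of the Dataloop SDK.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class IntegrationRow:
    """Hypothetical stand-in for one row of the Integrations list."""
    name: str
    provider: str     # e.g. "AWS", "GCP", "Azure", "Dataloop"
    type: str         # e.g. "Cross Account", "Access Key", "Private Key"
    trusted: bool     # False means the row is an untrusted integration
    created_by: str   # email ID of the creator


def filter_integrations(
    rows: Iterable[IntegrationRow],
    name_contains: Optional[str] = None,
    provider: Optional[str] = None,
    type_: Optional[str] = None,
    untrusted_only: bool = False,
    created_by: Optional[str] = None,
) -> List[IntegrationRow]:
    """Return the rows that match every criterion that was supplied."""
    result = []
    for row in rows:
        # Name search is assumed to be a case-insensitive substring match.
        if name_contains and name_contains.lower() not in row.name.lower():
            continue
        if provider and row.provider != provider:
            continue
        if type_ and row.type != type_:
            continue
        if untrusted_only and row.trusted:
            continue
        if created_by and row.created_by != created_by:
            continue
        result.append(row)
    return result


rows = [
    IntegrationRow("prod-s3", "AWS", "Cross Account", True, "ana@example.com"),
    IntegrationRow("legacy-s3", "AWS", "Access Key", False, "ben@example.com"),
    IntegrationRow("gcs-main", "GCP", "Private Key", True, "ana@example.com"),
]

untrusted_aws = filter_integrations(rows, provider="AWS", untrusted_only=True)
print([row.name for row in untrusted_aws])  # prints ['legacy-s3']
```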

Section 4: List of Integrations

The Integrations page lists the integrations available in your organization. Column values are populated from the available integrations.

| Column Name | Description |
| --- | --- |
| Provider icon | The icon of the cloud storage provider. |
| Integration Name | The name of the integration. |
| Integration Type | The type of the integration, which depends on the cloud storage provider. |
| Completed | Shows whether the steps required to establish trust on the integration are completed. A green check mark is displayed if the integration is trusted. If not, a warning icon is displayed with a tooltip explaining the current status. |
| Created At | The creation date of the integration. |
| Created By | The avatar of the user who created the integration. Hover over the avatar to see the user's email ID. |

Click on the More actions (three dots) to view and perform the following actions:

  • Rename Integration
  • Edit Integration: Modify the Key and Secret values of an Access Key integration.
    • Container Registry integrations can be edited only from the Dataloop SDK.
    • Cross integrations (Cross Account and Cross Project) cannot be edited.
  • Copy Integration ID
  • Delete Integration

Storage Integration

This integration allows Dataloop to connect to external storage services, such as cloud storage (AWS S3, Google Cloud Storage, Azure Blob Storage) or on-premises systems, for seamless access to datasets and files.

  • You can manage and import/export datasets between Dataloop and external storage systems.
  • This is especially important for handling large volumes of data used in machine learning and AI projects.

Refer to the following articles to use various types of cloud storage providers:

  • AWS
  • GCP
  • Azure


Docker Registry Integration

This integration allows Dataloop to interact with Docker registries, enabling the platform to pull and push Docker images. These images can contain models, training scripts, or any other components required for running tasks or deployments.

It simplifies the deployment of machine learning models or services that are containerized, making it easier to deploy workflows or production environments directly from Docker containers within Dataloop.


Use Docker Registry Integrations

Dataloop enables you to use your private Docker images stored in cloud-based Docker registries such as AWS Elastic Container Registry (ECR), Google Container Registry (GCR), and Google Artifact Registry (GAR).

Getting Started with Docker Registry Integration

To begin, integrate your Docker registry (AWS ECR, Google Artifact Registry, or Google Container Registry) with Dataloop. For detailed instructions, refer to the guide for your registry provider.

Linking Your Private Docker Registry

Once the integration is complete, follow these steps to create an application and connect your private Docker registry:

  1. Navigate to the Marketplace and click Create an application.
  2. In the App Config section, update the Docker Image with the URL of your private Docker image registry.

Your application will now be connected to your private Docker registry and ready for use.
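
For reference, the sketch below builds image references in the standard formats used by AWS ECR, Google Artifact Registry, and Google Container Registry. The helper functions and all sample values (account ID, region, project, repository, image, and tag names) are hypothetical, and whether the Docker Image field expects exactly this form is an assumption; confirm against the registry integration guides.

```python
def ecr_image_url(account_id: str, region: str, repository: str, tag: str) -> str:
    """AWS ECR: <account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>"""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


def gar_image_url(location: str, project_id: str, repository: str,
                  image: str, tag: str) -> str:
    """Google Artifact Registry: <location>-docker.pkg.dev/<project>/<repository>/<image>:<tag>"""
    return f"{location}-docker.pkg.dev/{project_id}/{repository}/{image}:{tag}"


def gcr_image_url(project_id: str, image: str, tag: str, host: str = "gcr.io") -> str:
    """Google Container Registry: <host>/<project>/<image>:<tag>"""
    return f"{host}/{project_id}/{image}:{tag}"


# Sample values below are placeholders, not real resources.
print(ecr_image_url("123456789012", "us-east-1", "my-models", "v1"))
print(gar_image_url("us-central1", "my-project", "my-repo", "trainer", "v1"))
print(gcr_image_url("my-project", "trainer", "v1"))
```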

Good to Know

AWS: Why Cross Account Integration is better than other integrations

  • Least privilege principle: AWS Cross-Account integration lets you grant fine-grained IAM roles and permissions to users or service accounts across different AWS accounts. You can restrict access to specific actions and resources, which minimizes the attack surface if the third-party service is compromised.

  • Rotational capability: With AWS Cross-Account, you have the flexibility to effortlessly rotate IAM roles. This facilitates periodic refreshment of the credentials used by a third-party service, a significantly more secure approach compared to the use of long-lived access keys that persist until you revoke them.

  • Better auditability: AWS Cross-Account integration produces comprehensive logs of all actions taken by the third-party service, facilitating more effective monitoring and detection of any suspicious activity. This gives you better visibility and control over third-party access to your AWS resources.

  • Separation of responsibilities: The Cross-Account feature in AWS allows you to maintain control over your AWS accounts and delegate specific tasks to third-party services without exposing your credentials. This ensures that third-party services only have access to the resources they require for their tasks, thus minimizing the risk of credential theft or misuse.

In conclusion, the use of AWS Cross-Account integration instead of Access Key or STS integrations offers a more secure and flexible mechanism for granting access to third-party services in AWS. It allows you to implement the principle of least privilege, easily rotate credentials, and provide better auditability and separation of responsibilities.

To learn how to do this, see the AWS Cross Account Integration article.
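
To make the least privilege point concrete, here is a minimal sketch of the two IAM policy documents that a cross-account role typically involves: a trust policy that states who may assume the role, and a permissions policy scoped to a single bucket. The account ID, external ID, bucket name, and the list of S3 actions are illustrative assumptions; the exact values and permissions Dataloop requires are described in the AWS Cross Account Integration article.

```python
import json

# Hypothetical placeholders: replace with the values from your own setup.
TRUSTED_ACCOUNT_ID = "111111111111"   # account that is allowed to assume the role
EXTERNAL_ID = "example-external-id"   # shared value checked when the role is assumed
BUCKET_NAME = "example-bucket"


def build_trust_policy(account_id: str, external_id: str) -> dict:
    """Trust policy: which principal may assume the role, and on what condition."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
            "Action": "sts:AssumeRole",
            "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
        }],
    }


def build_bucket_policy(bucket: str) -> dict:
    """Permissions policy limited to one bucket and a short list of actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Illustrative action list only; not Dataloop's required set.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


trust_policy = build_trust_policy(TRUSTED_ACCOUNT_ID, EXTERNAL_ID)
bucket_policy = build_bucket_policy(BUCKET_NAME)
print(json.dumps(trust_policy, indent=2))
print(json.dumps(bucket_policy, indent=2))
```

Because access is granted through a role rather than a stored access key, it can be revoked or narrowed by editing these two documents, without issuing new long-lived credentials.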


GCP: Why Cross Project Integration is better than other integrations

  • Least privilege principle: Cross-Project access in GCP allows you to grant fine-grained IAM roles and permissions to users or service accounts across different GCP projects. This means you can restrict access to specific actions and resources, reducing the attack surface in case the third-party service is compromised.

  • Rotational capability: Cross-Project access in GCP enables you to easily rotate IAM roles, allowing you to periodically refresh the credentials used by a third-party service. This is a more secure approach than using long-lived access keys that persist until you revoke them.

  • Better auditability: Cross-Project access in GCP generates detailed logs of all actions taken by the third-party service, making it easier to monitor and detect suspicious activity. This provides better visibility and control over third-party access to your GCP resources.

  • Separation of responsibilities: The Cross-Project access in GCP enables you to maintain control over your GCP projects and delegate specific tasks to third-party services without exposing your credentials. This ensures that third-party services only have access to the resources they need to perform their tasks and reduces the risk of credential theft or misuse.

In conclusion, the use of GCP Cross-Project integration instead of Private Key integration offers a more secure and flexible mechanism for granting access to third-party services in GCP. It allows you to implement the principle of least privilege, easily rotate credentials, and provide better auditability and separation of responsibilities.

To learn how to do this, see the GCP Cross Project Integration article.


Important considerations when setting up external cloud storage

  • Consider storing your files in a region close to your annotators, for faster file serving. In annotation work, files are streamed from your storage directly to the end user, without having to go through Dataloop servers first, so faster serving can be key for efficient work.
  • Write access is required so that thumbnails, modalities, and converted files can be saved to a hidden 'dataloop' folder on your storage. A permission "test-file" is written to your storage when the platform validates permissions.
  • Annotations and metadata are stored in the Dataloop platform. If you delete a file from your external storage, you need to trigger a file delete process in Dataloop, or set up Upstream-sync in advance to ensure these events are covered.
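
The write-permission check mentioned above can be pictured as a simple probe: try to write a small file and report whether it succeeded. The sketch below illustrates the idea with a local directory standing in for the external storage. The `can_write` function and the probe file name are hypothetical, removing the probe afterwards is an assumption of this sketch, and none of it describes how the platform actually implements the check.

```python
import tempfile
import uuid
from pathlib import Path


def can_write(storage_root: Path) -> bool:
    """Try to write, then remove, a small probe file under storage_root."""
    probe = storage_root / f"test-file-{uuid.uuid4().hex}"
    try:
        probe.write_text("permission check")
        probe.unlink()  # cleanup is an assumption of this sketch
        return True
    except OSError:
        # Covers missing locations and permission errors alike.
        return False


with tempfile.TemporaryDirectory() as tmp:
    writable = can_write(Path(tmp))
    missing = can_write(Path(tmp) / "does-not-exist")

print(writable, missing)  # prints True False
```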

Integration Validation for all Supported Cloud Integrations

  • All supported integrations are now validated for actual resource creation on the customer's side.
  • This applies only to new integrations. Existing storage integrations are not affected.
  • Supported on:
    • GCP: Cross Project, Private Key
    • Azure: Client Secret
    • AWS: Cross Account, STS

Supports Duplicate Naming Enforcement for New Integrations & Secrets

  • Newly created integrations and secrets must have unique names.
  • This applies only to new integrations and secrets. Existing integrations and secrets are not affected.
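
The rule above can be sketched as follows: names that already exist are loaded as they are, even if they contain duplicates, while any new name is rejected if it is already taken. The `NameRegistry` class and `DuplicateNameError` exception are hypothetical, and the exact-match, case-sensitive comparison is an assumption; the platform's matching rules and the scope of uniqueness are not specified here.

```python
from typing import Iterable, List


class DuplicateNameError(ValueError):
    """Raised when a new integration or secret reuses an existing name."""


class NameRegistry:
    def __init__(self, existing_names: Iterable[str] = ()) -> None:
        # Existing names are accepted as-is, duplicates included,
        # because the rule applies only to newly created entries.
        self._names: List[str] = list(existing_names)

    def create(self, name: str) -> str:
        # Assumption: comparison is exact and case-sensitive.
        if name in self._names:
            raise DuplicateNameError(f"name already in use: {name!r}")
        self._names.append(name)
        return name

    def names(self) -> List[str]:
        return list(self._names)


# Two legacy entries share a name; they stay untouched.
registry = NameRegistry(existing_names=["prod-s3", "prod-s3", "gcs-main"])
registry.create("azure-archive")

try:
    registry.create("gcs-main")
    rejected = False
except DuplicateNameError:
    rejected = True

print(rejected, registry.names())
```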