Overview
  • 28 Apr 2025


Dataloop brings enterprise-level performance to unstructured data management and versioning. It enables sub-second queries on millions of files by item attributes, item metadata, or user metadata.
The Data Management page lets you manage your datasets, storage drivers, and feature sets (embeddings), and create integrations.


Key Features

Dataloop's Data Management offers a powerful and user-friendly platform to organize, query, and manipulate data efficiently. Key capabilities include:

  • Browser & File Management: Browse datasets with filters, manage folders, move/clone/delete files.
  • Task Integration: Quickly create models or annotation tasks, or trigger pipelines, directly from selected data.
  • Metadata & Item Management: Access item metadata and logs, and import or export data.
  • Data Analysis & Cleanup: Use clustering tools (e.g., UMAP) for insights, and maintain data quality by detecting duplicates, missing annotations, unlabeled data, and metadata issues.
  • Embeddings: Use the Embeddings tab to visualize and interact with numerical feature sets derived from datasets. These embeddings help models understand and learn from the data effectively.
  • Cloud Support: Sync data from cloud providers (AWS, GCP, Azure) or upload directly; also supports linked (URL-only) items.
  • Advanced Querying (DQL): Execute rapid queries across millions of files based on attributes and metadata.
  • Version Control & Privacy: Manage dataset versions and ensure compliance with privacy standards.
  • Developer Access: All features are accessible via API and SDK, with provided code samples for quick integration.
  • Data Management Page Layout: Manage Datasets, Storage Drivers, and Embeddings through dedicated tabs for a more organized, provider-focused view.
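
To make the querying capability above more concrete, here is a minimal sketch of filtering items by attributes and metadata. It does not call Dataloop: it assumes a Mongo-style JSON filter (with `$and`, `$or`, and `$in`) and evaluates it locally against in-memory dictionaries. The item records, the field names such as `metadata.user.camera`, and the helper functions `get_path` and `matches` are all hypothetical and exist only for illustration.

```python
from typing import Any


def get_path(item: dict, dotted: str) -> Any:
    """Resolve a dotted field such as 'metadata.user.camera' in a nested dict."""
    current: Any = item
    for key in dotted.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def matches(item: dict, query_filter: dict) -> bool:
    """Evaluate a small subset of Mongo-style operators against one item."""
    for field, condition in query_filter.items():
        if field == "$and":
            if not all(matches(item, sub) for sub in condition):
                return False
        elif field == "$or":
            if not any(matches(item, sub) for sub in condition):
                return False
        elif isinstance(condition, dict) and "$in" in condition:
            if get_path(item, field) not in condition["$in"]:
                return False
        elif get_path(item, field) != condition:
            return False
    return True


# Hypothetical item records: one item attribute plus system and user metadata.
items = [
    {"name": "a.jpg", "annotated": True,
     "metadata": {"system": {"mimetype": "image/jpeg"}, "user": {"camera": "front"}}},
    {"name": "b.png", "annotated": False,
     "metadata": {"system": {"mimetype": "image/png"}, "user": {"camera": "front"}}},
    {"name": "c.mp4", "annotated": False,
     "metadata": {"system": {"mimetype": "video/mp4"}, "user": {"camera": "front"}}},
    {"name": "d.jpg", "annotated": False,
     "metadata": {"system": {"mimetype": "image/jpeg"}, "user": {"camera": "rear"}}},
]

# Assumed query shape: unannotated images taken with the front camera.
query = {
    "filter": {
        "$and": [
            {"annotated": False},
            {"metadata.system.mimetype": {"$in": ["image/jpeg", "image/png"]}},
            {"metadata.user.camera": "front"},
        ]
    }
}

selected = [item["name"] for item in items if matches(item, query["filter"])]
print(selected)
```

In the platform, the filtering runs server-side over indexed metadata; the local loop here only illustrates how attribute and metadata conditions combine.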

Resource Creation

The Data Management Resource Creation feature lets you create Integrations, Storage Drivers, and Datasets (on both internal and cloud storage) in one place, so you do not need to navigate between multiple locations.

To access the Data Management Resource Creation feature:

  1. Open the Data page.
  2. Click Create Dataset. The Data Management Resource Creation window opens on the right side, showing the Integrations, Storage Drivers, and Datasets sections.
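
The same kind of resource can also be created from code. The sketch below is a minimal, hedged example using the Dataloop Python SDK (`dtlpy`) to create a dataset on internal storage. It assumes the SDK is installed and that you can log in interactively; the function name `create_internal_dataset` and its arguments are placeholders for illustration, not part of the SDK.

```python
def create_internal_dataset(project_name: str, dataset_name: str):
    """Create a dataset on Dataloop's internal storage and return it.

    Assumes the `dtlpy` package is installed and the project already exists.
    """
    # Imported lazily so this module can be loaded without the SDK installed.
    import dtlpy as dl

    # Log in only when the cached token is no longer valid.
    if dl.token_expired():
        dl.login()

    project = dl.projects.get(project_name=project_name)
    return project.datasets.create(dataset_name=dataset_name)
```

A call such as `create_internal_dataset("my-project", "my-dataset")` would use placeholder names; substitute a project you have access to. Datasets backed by cloud storage additionally need an integration and a storage driver, which this sketch does not cover.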

Cloud Providers & Features

Cloud Provider    Resource Type    Integration Type
AWS               S3 Bucket        Cross Account
AWS               S3 Bucket        Access Key
AWS               S3 Bucket        STS
GCP               GCS Bucket       Private Key
GCP               GCS Bucket       Cross Project
Azure             Blob             Client Secret
Azure             Datalake Gen2    Client Secret
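
As an illustration, the combinations in the table above can be encoded as a small lookup that checks a planned setup before you create it. This is a sketch only: the `SUPPORTED` mapping and the `is_supported` helper are hypothetical names, and the strings simply mirror the table rather than any identifiers used by the platform.

```python
# (cloud provider, resource type) -> integration types listed in the table
SUPPORTED = {
    ("AWS", "S3 Bucket"): {"Cross Account", "Access Key", "STS"},
    ("GCP", "GCS Bucket"): {"Private Key", "Cross Project"},
    ("Azure", "Blob"): {"Client Secret"},
    ("Azure", "Datalake Gen2"): {"Client Secret"},
}


def is_supported(provider: str, resource: str, integration: str) -> bool:
    """Return True if the table lists this provider/resource/integration combination."""
    return integration in SUPPORTED.get((provider, resource), set())


print(is_supported("AWS", "S3 Bucket", "STS"))
print(is_supported("GCP", "GCS Bucket", "STS"))
```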

Dataloop supports sub-folder-specific access in buckets, which adds security and flexibility in managing your data. Learn more about the Specifications.