Overview
  • 24 Dec 2024

Dataloop brings enterprise-level performance to unstructured data management and versioning. It enables sub-second queries on millions of files by item attributes, item metadata, or user metadata.

The Data Management page lets you manage your datasets, storage drivers, and feature sets (embeddings), and create integrations.


Data Management Features

The important features of data management are listed below:

Browser

A user-friendly interface to browse and manage data with the following capabilities:

View Options:

  • Thumbnails view with adjustable thumbnail size.
  • List view with file details.

Filters: Based on item data and annotation data.

DQL Queries:

  • Direct DQL queries.
  • Save and reuse DQL queries.

Folders Management

Create, rename, or delete folders.

File Management

  • Move files between folders.
  • Clone files.
  • Delete files.

Task and Function Integration

  • Create models from selected data.
  • Create annotation or QA tasks from selected data.
  • Trigger selected data to an application or pipeline.

Item Management

  • View item metadata.
  • Access item function execution logs.
  • Export data (Item JSON file).
  • Upload data (when using file system storage).

Data Clustering

Integrate clustering and visualization tools (e.g., UMAP) for efficient analysis and insights extraction via a user-friendly interface.

Data Insights

Access the Insights Tab for detailed analysis:

  • Annotation location heat map.
  • Histogram of annotation labels.
  • Detailed attributes per label.

Data Cleanup

Access the Data Cleanup Tab for efficient data quality management:

  • Duplicate Item Detection: Identify and manage duplicate items in your dataset.
  • Missing Annotations: Highlight and address items without annotations.
  • Unlabeled Data: Detect and clean up items without any assigned labels.
  • Annotation Overlap: Identify and resolve overlapping or conflicting annotations.
  • Metadata Consistency: Check for and correct inconsistent metadata across items.
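
To make duplicate item detection concrete, here is a minimal sketch that groups files by a hash of their content. This page does not say how Dataloop detects duplicates, so content hashing is an assumption used only for illustration, and `find_duplicates` and the in-memory `files` mapping are hypothetical names.

```python
import hashlib
from collections import defaultdict


def find_duplicates(files):
    """Group file names by the SHA-256 digest of their bytes.

    `files` maps a file name to its raw content (hypothetical in-memory input).
    Only groups with more than one name are returned.
    """
    groups = defaultdict(list)
    for name, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        groups[digest].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Illustrative data: two byte-identical files and one distinct file.
files = {
    "a.jpg": b"\x89same-bytes",
    "copy_of_a.jpg": b"\x89same-bytes",
    "b.jpg": b"different-bytes",
}
duplicates = find_duplicates(files)
print(duplicates)
```

Exact hashing only catches byte-identical files; near-duplicates (for example, resized images) would need a similarity-based approach instead.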

Cloud-Native Support

  • Ingest and sync from popular cloud storage providers, such as AWS, GCP, and Azure.

  • Dataloop and Cloud Storage Options:

    • Upload file binaries to Dataloop (optional).
    • Sync cloud storage directly to Dataloop.

Linked Items

Create URL items without storing them on the Dataloop platform or connecting to cloud storage.

Metadata Layer

  • Metadata is automatically populated with item attributes when an item is added to a dataset.
  • Add custom user metadata anytime.
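
The two metadata layers can be pictured with a small sketch: attributes derived from the file itself are filled in when the record is created, and user metadata is attached later. The record layout (`metadata.system`, `metadata.user`) and both helper functions are assumptions made for illustration; they are not taken from this page or from the Dataloop SDK.

```python
import mimetypes
from datetime import datetime, timezone


def build_item_record(filename, content):
    """Create a hypothetical item record with auto-populated attributes."""
    mimetype, _ = mimetypes.guess_type(filename)
    return {
        "name": filename,
        "createdAt": datetime.now(timezone.utc).isoformat(),
        "metadata": {
            # Derived from the file itself at ingestion time.
            "system": {"mimetype": mimetype, "size": len(content)},
            # Empty until the user adds contextual fields.
            "user": {},
        },
    }


def add_user_metadata(record, **fields):
    """Attach custom user metadata to an existing record at any time."""
    record["metadata"]["user"].update(fields)
    return record


record = build_item_record("frame_0001.jpg", b"\xff\xd8 fake jpeg bytes")
add_user_metadata(record, orderNumber="A-17", cameraNumber=3)
print(record["metadata"])
```

Keeping user fields in their own namespace means they cannot collide with the automatically populated attributes.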

DQL (Dataloop Query Language)

Query by:

  • Item Attributes: Mime type, file name, creation/update time, size, etc.
  • Item Metadata: Annotations, labels & attributes, users working on items, etc.
  • User Metadata: Contextual data like order number, GEO location, camera number, etc.

Performance: Sub-second queries on millions of files by attributes or metadata.
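
As a rough illustration of querying by item attributes and user metadata together, the sketch below evaluates a JSON-style filter against a few in-memory records. The field paths, the `$and`/`$or`/`$in`/`$gt`/`$lt` operators, and the wildcard matching are assumptions modeled on common document-query syntax; consult the DQL reference for the actual grammar. The local matcher is for illustration only and says nothing about how Dataloop executes queries.

```python
from fnmatch import fnmatchcase


def get_path(record, dotted_key):
    """Resolve a dotted key such as 'metadata.user.cameraNumber'."""
    current = record
    for part in dotted_key.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def field_matches(value, condition):
    """Compare one field value against a literal, wildcard, or operator dict."""
    if isinstance(condition, dict):
        checks = {
            "$in": lambda v, arg: v in arg,
            "$gt": lambda v, arg: v is not None and v > arg,
            "$lt": lambda v, arg: v is not None and v < arg,
        }
        return all(checks[op](value, arg) for op, arg in condition.items())
    if isinstance(condition, str) and "*" in condition:
        return isinstance(value, str) and fnmatchcase(value, condition)
    return value == condition


def matches(record, query):
    """Return True when the record satisfies every clause of the query."""
    for key, condition in query.items():
        if key == "$and":
            ok = all(matches(record, sub) for sub in condition)
        elif key == "$or":
            ok = any(matches(record, sub) for sub in condition)
        else:
            ok = field_matches(get_path(record, key), condition)
        if not ok:
            return False
    return True


# Hypothetical records: system attributes plus user metadata.
items = [
    {"name": "cam3_0001.jpg",
     "metadata": {"system": {"mimetype": "image/jpeg", "size": 204800},
                  "user": {"cameraNumber": 3}}},
    {"name": "cam5_0002.jpg",
     "metadata": {"system": {"mimetype": "image/jpeg", "size": 198000},
                  "user": {"cameraNumber": 5}}},
    {"name": "notes.txt",
     "metadata": {"system": {"mimetype": "text/plain", "size": 120},
                  "user": {"cameraNumber": 3}}},
]

# Images taken by camera 3 or 4.
query = {"$and": [
    {"metadata.system.mimetype": "image/*"},
    {"metadata.user.cameraNumber": {"$in": [3, 4]}},
]}
result = [item["name"] for item in items if matches(item, query)]
print(result)
```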

Version Control

Clone and merge actions to version data in alignment with model versions.
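
One way to think about clone and merge is as operations on a snapshot that maps item names to content hashes. The sketch below is a conceptual model only: the `clone` and `merge` functions, the snapshot layout, and the rule that the incoming version wins on conflict are all assumptions, not a description of Dataloop's behavior.

```python
import copy


def clone(dataset):
    """Return an independent copy of a dataset snapshot."""
    return copy.deepcopy(dataset)


def merge(target, source):
    """Merge `source` into a copy of `target`.

    Items in `source` overwrite items of the same name (assumed policy).
    Returns the merged snapshot and the names whose content differed.
    """
    merged = dict(target)
    conflicts = []
    for name, digest in source.items():
        if name in merged and merged[name] != digest:
            conflicts.append(name)
        merged[name] = digest
    return merged, sorted(conflicts)


# Version 1 is frozen alongside a model version; version 2 evolves separately.
v1 = {"a.jpg": "hash-a1", "b.jpg": "hash-b1"}
v2 = clone(v1)
v2["b.jpg"] = "hash-b2"   # item re-annotated or replaced
v2["c.jpg"] = "hash-c1"   # new item added

merged, conflicts = merge(v1, v2)
print(merged, conflicts)
```

Because `clone` copies the snapshot, changes made to `v2` leave `v1` intact, which is what allows a dataset version to stay aligned with the model trained on it.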

Privacy

Comply with data privacy standards.

Developer Tools

Access all data management actions through API and SDK interfaces, including:

  • DQL filters.
  • Version control.
  • Data import/export.

Data Management Page

The Data Management page organizes the Datasets, Storage Drivers, and Embeddings available in your project into tabs, enabling a more provider-focused view.

The common features available in the Data Management page are:


The Main Sections of the Data Management Page

The main sections of the Data Management page are explained in the following pages:


Data Management Resource Creation

The Data Management Resource Creation feature of Dataloop enables you to create Integrations, Storage Drivers, and Datasets (both internal and cloud storage) all in one place, streamlining the process and eliminating the need to navigate multiple locations.

To access the Data Management Resource Creation feature:

  1. Open the Data page.
  2. Click Create Dataset. The Data Management Resource Creation window is displayed on the right side, where you can view the Integrations, Storage Drivers, and Datasets sections.

Data Management Specifications

Cloud Providers & Features

Cloud Provider   Resource Type    Integration Type
AWS              S3 Bucket        Cross Account
AWS              S3 Bucket        Access Key
AWS              S3 Bucket        STS
GCP              GCS Bucket       Private Key
GCP              GCS Bucket       Cross Project
Azure            Blob             Client Secret
Azure            Datalake Gen2    Client Secret

  • Dataloop supports sub-folder specific access in buckets, which offers security and versatility in managing your data.

Platform Specifications

Find the Dataloop specifications on the specifications page.


