Overview
  • 26 Nov 2024

Dataloop brings enterprise-level performance to unstructured data management and versioning. It enables sub-second queries on millions of files by item attributes, item metadata, or user metadata.

The Data Management page lets you manage your datasets, storage drivers, and feature sets (embeddings), and create integrations.


Data Management Features

The key data management features are listed below.

  • Browser: Browse data from a user-friendly interface that supports different view options.
    • Thumbnails view with adjustable thumbnail size
    • List view with file details
    • Filters based on item data and annotation data.
    • Direct DQL queries
    • Save and reuse DQL queries
    • Folders management: Create, rename, or delete folders
    • File management: Move between folders, clone, delete
    • Create models from selected data
    • Create annotation or QA tasks from selected data
    • Trigger a function (FaaS) or pipeline on the selected data
    • View item metadata
    • Item function executions log
    • Export data (Item JSON file)
    • Upload data (when using File system storage)
  • Data Insights: The Insights Tab provides deep visibility into your annotations, offering features like an annotation location heat map, a histogram of annotation labels, and detailed attributes per label, among others.
  • Data Clustering: Integrating clustering and visualization tools like UMAP, t-SNE, and PCA into the Dataloop platform enhances data analysis, enabling users to efficiently extract insights from complex datasets through a user-friendly interface.
  • Cloud native: Ingest and sync from popular cloud storage providers, such as AWS, GCP, and Azure.
  • Dataloop and cloud storage: Optionally, upload file binaries to Dataloop, or sync cloud storage to Dataloop.
  • Linked items: Create URL items without storing them on the Dataloop platform or even connecting to cloud storage.
  • Metadata layer: Every item has metadata that is populated automatically with item attributes when the item is added to a dataset. User metadata can be added at any time.
  • DQL: Dataloop Query Language allows querying by:
    • Item attributes: Mime type, file name, creation/update time, size, etc.
    • Item metadata: Annotations, labels & attributes added to items, users working on items, etc.
    • User metadata: Any context added to the item metadata, such as order number, geolocation, camera number, etc.
  • Performance: Sub-second queries on millions of files by item attributes, item metadata, or user metadata.
  • Version control: Clone and Merge actions let you version the data in line with the model version.
  • Privacy: Meet data privacy standards
  • Developer tools: All data management actions, such as DQL filters, version control, import, and export, are available through the API and SDK.
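To make the three DQL query categories above more concrete, the following is a minimal sketch that builds a query as a plain Python dictionary. The overall shape (a `filter` object holding a MongoDB-style `$and` list, plus a `resource` key) and the field paths used here are assumptions for illustration and are not confirmed by this page; `build_dql_query` is a hypothetical helper, not part of the SDK. Check the DQL reference for the exact schema.

```python
import json


def build_dql_query(conditions, resource="items"):
    """Combine field conditions into one DQL-style query dict (assumed shape)."""
    return {
        "resource": resource,
        # Every condition must match, so they are joined with "$and".
        "filter": {"$and": [{field: value} for field, value in conditions.items()]},
    }


query = build_dql_query({
    "metadata.system.mimetype": "image/jpeg",  # item attribute (assumed path)
    "annotated": True,                         # item metadata (assumed field)
    "metadata.user.cameraNumber": 3,           # user metadata (hypothetical key)
})
print(json.dumps(query, indent=2))
```

A dictionary like this could be saved and reused in the same way as the saved DQL queries mentioned in the Browser feature list.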

Data Management Resource Creation

The Data Management Resource Creation feature of Dataloop enables you to create Integrations, Storage Drivers, and Datasets (both internal and cloud storage) all in one place, streamlining the process and eliminating the need to navigate multiple locations.

To access the Data Management Resource Creation feature:

  1. Open the Data page.
  2. Click Create Dataset. The Data Management Resource Creation window is displayed on the right side, where you can view the Integrations, Storage Drivers, and Datasets sections.

Data Management Page

The Data Management page displays the Datasets, Storage Drivers, and Feature Sets (Embeddings) available in your project in separate tabs and enables a more provider-focused view.

The features common to both the Datasets and Storage Drivers tabs of the Data Management page are:

  • Create Dataset
  • Create Storage Driver
  • Create Integration
  • SDK: Displays SDK code for creating datasets and storage drivers, based on your tab selection.
    • For the Datasets tab: The system displays code based on the selected internal or external storage provider.
      • Internal Storage Based Dataset
      • External Storage Based Dataset
    • For the Storage Drivers tab: The system displays code based on the external storage driver.
      • AWS
      • GCP
      • Azure
  • Refresh tabs
  • Pagination
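As a rough idea of what SDK code for the two dataset types might look like, here is a hedged sketch. It assumes the `dtlpy` Python SDK, a `project` object exposing `datasets.create(dataset_name=..., driver=...)` and `drivers.get(driver_name=...)`, and a storage driver that already exists in the project. `create_dataset` is a hypothetical wrapper written for this example; the code shown in the platform may differ, so verify the calls against the SDK reference.

```python
def create_dataset(project, dataset_name, driver_name=None):
    """Create a dataset, optionally backed by an existing external storage driver.

    `project` is assumed to be a dtlpy Project entity.
    """
    if driver_name is None:
        # Internal storage: file binaries are stored in Dataloop.
        return project.datasets.create(dataset_name=dataset_name)
    # External storage: look up the existing driver, then bind the dataset to it.
    driver = project.drivers.get(driver_name=driver_name)
    return project.datasets.create(dataset_name=dataset_name, driver=driver)


# Usage (requires the dtlpy package and a logged-in session; names are placeholders):
# import dtlpy as dl
# project = dl.projects.get(project_name="my-project")
# dataset = create_dataset(project, "my-dataset", driver_name="my-s3-driver")
```

Passing the project in as an argument keeps the function independent of how the session is authenticated.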

The Main Sections of the Data Management Page

The main sections of the Data Management page are explained in the following pages:


Data Management Specifications

Cloud Providers & Features

Cloud Provider   Resource Type   Integration Type
AWS              S3 Bucket       Cross Account
AWS              S3 Bucket       Access Key
AWS              S3 Bucket       STS
GCP              GCS Bucket      Private Key
GCP              GCS Bucket      Cross Project
Azure            Blob            Client Secret
Azure            Datalake Gen2   Client Secret
  • Dataloop supports access scoped to specific sub-folders within a bucket, which adds security and flexibility in managing your data.
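The provider table above can also be expressed as a small lookup, for example to validate a configuration before creating an integration. This is only a sketch: the values are copied from the table, while the names `SUPPORTED_INTEGRATIONS` and `is_supported` are hypothetical and not part of any Dataloop API.

```python
# (cloud provider, resource type) -> integration types listed in the table
SUPPORTED_INTEGRATIONS = {
    ("AWS", "S3 Bucket"): ["Cross Account", "Access Key", "STS"],
    ("GCP", "GCS Bucket"): ["Private Key", "Cross Project"],
    ("Azure", "Blob"): ["Client Secret"],
    ("Azure", "Datalake Gen2"): ["Client Secret"],
}


def is_supported(provider, resource_type, integration_type):
    """Return True if the combination appears in the table."""
    allowed = SUPPORTED_INTEGRATIONS.get((provider, resource_type), [])
    return integration_type in allowed


print(is_supported("AWS", "S3 Bucket", "STS"))
print(is_supported("Azure", "Blob", "Access Key"))
```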

Platform Specifications

Find the Dataloop specifications on the specifications page.


