Overview
- Updated On 24 Dec 2024
Dataloop brings enterprise-level performance to unstructured data management and versioning. It enables sub-second queries on millions of files by item attributes, item metadata, or user metadata.
The Data Management page now lets you manage your datasets, storage drivers, and feature sets (embeddings), and create integrations.
Data Management Features
The important features of data management are listed below:
Browser
A user-friendly interface to browse and manage data with the following capabilities:
View Options:
- Thumbnails view with adjustable thumbnail size.
- List view with file details.
Filters: Based on item data and annotation data.
DQL Queries:
- Direct DQL queries.
- Save and reuse DQL queries.
Folders Management
Create, rename, or delete folders.
File Management
- Move files between folders.
- Clone files.
- Delete files.
Task and Function Integration
- Create models from selected data.
- Create annotation or QA tasks from selected data.
- Trigger selected data to an application or pipeline.
Item Management
- View item metadata.
- Access item function execution logs.
- Export data (Item JSON file).
- Upload data (when using file system storage).
Data Clustering
Integrate clustering and visualization tools (e.g., UMAP) for efficient analysis and insights extraction via a user-friendly interface.
Data Insights
Access the Insights Tab for detailed analysis:
- Annotation location heat map.
- Histogram of annotation labels.
- Detailed attributes per label.
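A label histogram like the one on the Insights Tab can be approximated offline from exported annotations. The following is a minimal sketch, assuming each exported item is a dictionary with an `annotations` list whose entries carry a `label` key. The record layout and the `label_histogram` helper are illustrative assumptions, not the Dataloop export schema.

```python
from collections import Counter


def label_histogram(items):
    """Count how often each annotation label appears across all items."""
    counts = Counter()
    for item in items:
        for annotation in item.get("annotations", []):
            counts[annotation["label"]] += 1
    return counts


# Hypothetical exported items; the field names are assumptions.
items = [
    {"name": "a.jpg", "annotations": [{"label": "car"}, {"label": "person"}]},
    {"name": "b.jpg", "annotations": [{"label": "car"}]},
    {"name": "c.jpg", "annotations": []},
]

histogram = label_histogram(items)
for label, count in histogram.most_common():
    print(f"{label}: {count}")
```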
Data Cleanup
Access the Data Cleanup Tab for efficient data quality management:
- Duplicate Item Detection: Identify and manage duplicate items in your dataset.
- Missing Annotations: Highlight and address items without annotations.
- Unlabeled Data: Detect and clean up items without any assigned labels.
- Annotation Overlap: Identify and resolve overlapping or conflicting annotations.
- Metadata Consistency: Check for and correct inconsistent metadata across items.
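Two of the checks above, duplicate detection and missing annotations, can be illustrated with a small local sketch. It assumes duplicates are exact byte-for-byte copies (detected by hashing file content) and that annotations are available as a list per item. How the platform detects duplicates is not stated here, so the hashing approach and the helper names are assumptions.

```python
import hashlib
from collections import defaultdict


def find_duplicates(files):
    """Group file names by SHA-256 digest; return groups with more than one name."""
    by_digest = defaultdict(list)
    for name, content in files.items():
        by_digest[hashlib.sha256(content).hexdigest()].append(name)
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


def find_unannotated(annotations_by_item):
    """Return the names of items that have no annotations."""
    return sorted(name for name, anns in annotations_by_item.items() if not anns)


# Hypothetical in-memory data standing in for a dataset.
files = {"a.jpg": b"\x01\x02", "b.jpg": b"\x03", "copy_of_a.jpg": b"\x01\x02"}
annotations = {"a.jpg": [{"label": "car"}], "b.jpg": [], "copy_of_a.jpg": []}

duplicates = find_duplicates(files)
unannotated = find_unannotated(annotations)
print(duplicates)
print(unannotated)
```

Exact hashing only catches identical binaries. Near-duplicates (resized or re-encoded files) would need a similarity measure instead.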
Cloud-Native Support
Ingest and sync from popular cloud storage providers, such as AWS, GCP, and Azure.
Dataloop and Cloud Storage Options:
- Upload file binaries to Dataloop (optional).
- Sync cloud storage directly to Dataloop.
Linked Items
Create URL items without storing them on the Dataloop platform or connecting to cloud storage.
Metadata Layer
- Item metadata is populated automatically with item attributes when an item is added to a dataset.
- Add custom user metadata anytime.
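The split between automatically populated metadata and custom user metadata can be pictured as two namespaces on the same item record. The sketch below is an illustration only: the `metadata.system` and `metadata.user` layout, the field names, and both helper functions are assumptions made for the example.

```python
import mimetypes


def build_item_record(filename, content):
    """Create an item record with attributes derived from the file itself."""
    mimetype, _ = mimetypes.guess_type(filename)
    return {
        "name": filename,
        "metadata": {
            # Populated automatically when the item is added.
            "system": {"mimetype": mimetype, "size": len(content)},
            # Empty until the user adds custom fields.
            "user": {},
        },
    }


def add_user_metadata(record, **fields):
    """Attach custom user metadata to an existing record at any time."""
    record["metadata"]["user"].update(fields)
    return record


record = build_item_record("frame_001.jpg", b"\xff\xd8\xff")
add_user_metadata(record, camera_number=7, order_number="A-1042")
print(record["metadata"])
```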
DQL (Dataloop Query Language)
Query by:
- Item Attributes: MIME type, file name, creation/update time, size, etc.
- Item Metadata: Annotations, labels and attributes, users working on items, etc.
- User Metadata: Contextual data such as order number, geolocation, camera number, etc.
Performance: Sub-second queries on millions of files by attributes or metadata.
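To show how a single query can combine item attributes with user metadata, here is a simplified local evaluator. It is a sketch under stated assumptions: the dotted field paths, the `$gt` and `$in` operators, and the `matches` helper are chosen for illustration and do not reproduce the DQL syntax or its execution engine.

```python
def get_path(record, dotted_path):
    """Follow a dotted path such as 'metadata.user.camera_number'; None if missing."""
    value = record
    for key in dotted_path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def matches(record, query):
    """Return True when every condition in the query holds for the record."""
    for path, condition in query.items():
        actual = get_path(record, path)
        if isinstance(condition, dict):
            for op, expected in condition.items():
                if op == "$gt":
                    ok = actual is not None and actual > expected
                elif op == "$in":
                    ok = actual in expected
                else:
                    raise ValueError(f"unsupported operator: {op}")
                if not ok:
                    return False
        elif actual != condition:
            return False
    return True


# Hypothetical items and query; all field names are assumptions.
items = [
    {"name": "a.jpg", "metadata": {"system": {"mimetype": "image/jpeg", "size": 2048},
                                   "user": {"camera_number": 7}}},
    {"name": "b.png", "metadata": {"system": {"mimetype": "image/png", "size": 512},
                                   "user": {"camera_number": 7}}},
    {"name": "c.jpg", "metadata": {"system": {"mimetype": "image/jpeg", "size": 4096},
                                   "user": {"camera_number": 3}}},
]
query = {
    "metadata.system.mimetype": "image/jpeg",        # item attribute
    "metadata.system.size": {"$gt": 1024},           # item attribute
    "metadata.user.camera_number": {"$in": [7, 9]},  # user metadata
}

selected = [item["name"] for item in items if matches(item, query)]
print(selected)
```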
Version Control
Clone and merge actions to version data in alignment with model versions.
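The clone-then-merge workflow can be sketched with plain dictionaries standing in for datasets. This is a conceptual illustration: the helper names are hypothetical, and the merge policy (the source version wins on a name clash) is an assumption rather than documented platform behavior.

```python
import copy


def clone_dataset(dataset):
    """Return an independent copy so edits do not affect the original version."""
    return copy.deepcopy(dataset)


def merge_datasets(target, source):
    """Return a new dataset with target's items plus source's items.

    On a name clash the source's version wins (an assumed policy).
    """
    merged = copy.deepcopy(target)
    merged.update(copy.deepcopy(source))
    return merged


# Version 1 of a hypothetical dataset, keyed by item name.
v1 = {"a.jpg": {"labels": ["car"]}, "b.jpg": {"labels": []}}

# Clone it, then change the clone while v1 stays frozen for the current model.
v2 = clone_dataset(v1)
v2["b.jpg"]["labels"].append("person")
v2["c.jpg"] = {"labels": ["bike"]}

merged = merge_datasets(v1, v2)
print(sorted(merged))
```

Keeping `v1` untouched is what lets a dataset version stay paired with the model version trained on it.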
Privacy
Comply with data privacy standards.
Developer Tools
Access all data management actions through API and SDK interfaces, including:
- DQL filters.
- Version control.
- Data import/export.
Data Management Page
The Data Management page displays the Datasets, Storage Drivers, and Embeddings available in your project in separate tabs, enabling a more provider-focused view.
The common features available in the Data Management page are:
- Create Dataset
- Create Storage Driver
- Create Integration
- SDK: Displays SDK code for creating Datasets and Storage Drivers, based on the selected tab.
- For the Datasets tab: The system displays code based on the selected internal or remote storage provider.
- For the Storage Drivers tab: The system displays code based on the external storage driver:
- AWS
- GCP
- Azure
- Refresh tabs
- Pagination
The Main Sections of the Data Management Page
The main sections of the Data Management page are explained on the following pages:
Data Management Resource Creation
The Data Management Resource Creation feature of Dataloop enables you to create Integrations, Storage Drivers, and Datasets (both internal and cloud storage) all in one place, streamlining the process and eliminating the need to navigate multiple locations.
To access the Data Management Resource Creation feature:
- Open the Data page.
- Click Create Dataset. The Data Management Resource Creation window is displayed on the right side, where you can view the Integrations, Storage Drivers, and Datasets sections.
Data Management Specifications
Cloud Providers & Features
Cloud Provider | Resource Type | Integration Type |
---|---|---|
AWS | S3 Bucket | Cross Account |
AWS | S3 Bucket | Access Key |
AWS | S3 Bucket | STS |
GCP | GCS Bucket | Private Key |
GCP | GCS Bucket | Cross Project |
Azure | Blob | Client Secret |
Azure | Datalake Gen2 | Client Secret |
- Dataloop supports access scoped to specific sub-folders within buckets, which offers security and versatility in managing your data.
Find the Dataloop specifications on the specifications page.
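The provider table above can be encoded as a lookup, for example to validate a configuration before creating an integration. This is a minimal sketch: the `SUPPORTED_INTEGRATIONS` mapping copies the table rows, while the `is_supported` helper is a hypothetical name and not part of any Dataloop SDK.

```python
# Rows of the Cloud Providers & Features table, keyed by (provider, resource type).
SUPPORTED_INTEGRATIONS = {
    ("AWS", "S3 Bucket"): ["Cross Account", "Access Key", "STS"],
    ("GCP", "GCS Bucket"): ["Private Key", "Cross Project"],
    ("Azure", "Blob"): ["Client Secret"],
    ("Azure", "Datalake Gen2"): ["Client Secret"],
}


def is_supported(provider, resource_type, integration_type):
    """Return True if the table lists this provider/resource/integration combination."""
    allowed = SUPPORTED_INTEGRATIONS.get((provider, resource_type), [])
    return integration_type in allowed


print(is_supported("AWS", "S3 Bucket", "STS"))
print(is_supported("GCP", "GCS Bucket", "Access Key"))
```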