- 26 Nov 2024
Overview
Dataloop brings enterprise-level performance to unstructured data management and versioning, enabling sub-second queries on millions of files by item attributes, item metadata, or user metadata.
The Data Management page now lets you manage your datasets, storage drivers, and feature sets (embeddings), and create integrations.
Data Management Features
The key features of Dataloop data management are listed below.
- Browser: Browse data in a user-friendly interface that supports multiple views and actions:
  - Thumbnail view with adjustable thumbnail size
  - List view with file details
  - Filters based on item data and annotation data
  - Direct DQL queries
  - Save and reuse DQL queries
  - Folder management: Create, rename, or delete folders
  - File management: Move files between folders, clone, or delete them
  - Create models from selected data
  - Create annotation or QA tasks from selected data
  - Run a function (FaaS) or pipeline on selected data
  - View item metadata
  - View an item's function execution log
  - Export data (item JSON file)
  - Upload data (when using File system storage)
- Data Insights: The Insights tab provides deep visibility into your annotations, with features such as an annotation location heat map, a histogram of annotation labels, and detailed attributes per label.
- Data Clustering: Built-in clustering and visualization tools such as UMAP, t-SNE, and PCA help you extract insights from complex datasets efficiently through a user-friendly interface.
- Cloud native: Ingest and sync data from popular cloud storage providers, such as AWS, GCP, and Azure.
- Dataloop and cloud storage: Either upload file binaries to Dataloop, or sync your cloud storage with Dataloop.
- Linked items: Create items that reference a URL, without storing the file on the Dataloop platform or even connecting to cloud storage.
- Metadata layer: Every item has metadata that is populated automatically with item attributes when the item is added to a dataset. User metadata can be added at any time.
- DQL: Dataloop Query Language (DQL) supports querying by:
  - Item attributes: MIME type, file name, creation/update time, size, etc.
  - Item metadata: Annotations, labels, and attributes added to items, users working on items, etc.
  - User metadata: Any context added to the item metadata, such as order number, geolocation, camera number, etc.
- Performance: Sub-second queries on millions of files by item attributes, item metadata, or user metadata.
- Version control: Use Clone and Merge actions to version data alongside the corresponding model version.
- Privacy: Meets data privacy standards.
- Developer tools: All data management actions, such as DQL filtering, version control, import, and export, are available through the API and SDK.
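The DQL bullet distinguishes item attributes, item metadata, and user metadata. The following minimal sketch shows how a DQL-style query can address these through dotted field paths and evaluates it locally against a few invented items. It does not call Dataloop. `build_item_query`, `get_field`, `matches`, and `run_query` are hypothetical helpers. The JSON shape (`resource`, `filter`, `$and`, `$gt`) and the field paths (`metadata.system.mimetype`, `metadata.system.size`, `metadata.user.camera_number`) are assumptions that you should check against the DQL reference. In practice, the dtlpy SDK's `dl.Filters` class builds these queries, and you pass them to calls such as `dataset.items.list(filters=...)`.

```python
import json
from typing import Any


def build_item_query(conditions: list[dict[str, Any]]) -> dict[str, Any]:
    """Hypothetical helper: wrap conditions in a DQL-style item query (AND of all conditions)."""
    return {"resource": "items", "filter": {"$and": conditions}}


def get_field(item: dict[str, Any], path: str) -> Any:
    """Resolve a dotted field path such as 'metadata.user.camera_number'; None if absent."""
    value: Any = item
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def matches(item: dict[str, Any], condition: dict[str, Any]) -> bool:
    """Evaluate one {field: value} or {field: {operator: operand}} condition locally."""
    ((field, expected),) = condition.items()
    actual = get_field(item, field)
    if not isinstance(expected, dict):
        return actual == expected  # a bare value means equality
    ((op, operand),) = expected.items()
    if actual is None:
        return False
    if op == "$eq":
        return actual == operand
    if op == "$gt":
        return actual > operand
    if op == "$lt":
        return actual < operand
    if op == "$in":
        return actual in operand
    raise ValueError(f"operator not supported in this sketch: {op}")


def run_query(items: list[dict[str, Any]], query: dict[str, Any]) -> list[str]:
    """Return filenames of items that satisfy every condition in the query's $and list."""
    conditions = query["filter"]["$and"]
    return [it["filename"] for it in items if all(matches(it, c) for c in conditions)]


# Invented sample items: system metadata holds item attributes, user metadata holds custom context.
items = [
    {"filename": "/cam1/a.jpg",
     "metadata": {"system": {"mimetype": "image/jpeg", "size": 120_000}, "user": {"camera_number": 1}}},
    {"filename": "/cam2/b.jpg",
     "metadata": {"system": {"mimetype": "image/jpeg", "size": 80_000}, "user": {"camera_number": 2}}},
    {"filename": "/cam1/c.mp4",
     "metadata": {"system": {"mimetype": "video/mp4", "size": 5_000_000}, "user": {"camera_number": 1}}},
]

query = build_item_query([
    {"metadata.system.mimetype": "image/jpeg"},   # item attribute
    {"metadata.user.camera_number": {"$eq": 1}},  # user metadata
])
print(json.dumps(query, indent=2))
print(run_query(items, query))  # ['/cam1/a.jpg']
```

Because every condition is a plain dictionary, queries can be saved and reused as JSON, much like saved DQL queries in the browser. Annotation-based conditions and the full set of DQL operators are not modeled here.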
Data Management Resource Creation
The Data Management Resource Creation feature lets you create Integrations, Storage Drivers, and Datasets (backed by either internal or cloud storage) in one place, so you don't have to navigate between multiple pages.
To access the Data Management Resource Creation feature:
- Open the Data page.
- Click Create Dataset. The Data Management Resource Creation window opens on the right side, showing the Integrations, Storage Drivers, and Datasets sections.
Data Management Page
The Data Management page displays the Datasets, Storage Drivers, and Feature Sets (Embeddings) in your project in separate tabs, providing a more provider-focused view.
The following features are common to both the Datasets and Storage Drivers tabs:
- Create Dataset
- Create Storage Driver
- Create Integration
- SDK: Displays SDK code for creating datasets or storage drivers, depending on the selected tab:
  - Datasets tab: Code for the selected internal or external storage provider:
    - Internal Storage Based Dataset
    - External Storage Based Dataset
  - Storage Drivers tab: Code for the selected external storage provider:
    - AWS
    - GCP
    - Azure
- Refresh tabs
- Pagination
The Main Sections of the Data Management Page
The main sections of the Data Management page are explained in the following pages:
Data Management Specifications
Cloud Providers & Features
Cloud Provider | Resource Type | Integration Type |
---|---|---|
AWS | S3 Bucket | Cross Account |
AWS | S3 Bucket | Access Key |
AWS | S3 Bucket | STS |
GCP | GCS Bucket | Private Key |
GCP | GCS Bucket | Cross Project |
Azure | Blob | Client Secret |
Azure | Datalake Gen2 | Client Secret |
- Dataloop supports access to specific sub-folders within a bucket, which improves security and gives you more flexibility in managing your data.
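Sub-folder access limits what Dataloop sees to one path inside a bucket, rather than the whole bucket. The minimal sketch below models only the prefix-matching idea behind that scoping. It does not describe how Dataloop or any cloud provider enforces access. `normalize_prefix` and `scope_keys` are hypothetical helpers, not Dataloop or cloud SDK APIs, and the object keys are invented examples.

```python
def normalize_prefix(prefix: str) -> str:
    """Hypothetical helper: turn '/raw', 'raw/' or 'raw' into 'raw/'; '' or '/' means the whole bucket."""
    cleaned = prefix.strip("/")
    return f"{cleaned}/" if cleaned else ""


def scope_keys(keys: list[str], prefix: str) -> list[str]:
    """Keep only object keys that live inside the configured sub-folder."""
    scoped = normalize_prefix(prefix)
    return [key for key in keys if key.startswith(scoped)]


# Invented object keys in a single bucket.
keys = ["raw/cam1/a.jpg", "raw/cam2/b.jpg", "raw-backup/a.jpg", "curated/a.jpg"]

print(scope_keys(keys, "/raw"))      # ['raw/cam1/a.jpg', 'raw/cam2/b.jpg']
print(scope_keys(keys, "raw/cam1"))  # ['raw/cam1/a.jpg']
print(len(scope_keys(keys, "")))     # 4
```

The trailing slash matters. Matching on the bare string `raw` would also include `raw-backup/`, which is outside the intended sub-folder.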
For the full Dataloop specifications, see the Specifications page.