- 12 Sep 2024
Overview
Dataloop brings enterprise-level performance to unstructured data management and versioning, enabling sub-second queries on millions of files by item attributes, item metadata, or user metadata.
The Data Management page now lets you manage your datasets and storage drivers and create integrations.
Data Management Features
The key features of Dataloop data management are listed below.
- Browser: Browse data in a user-friendly interface that supports the following views and actions:
  - Thumbnail view with adjustable thumbnail size
  - List view with file details
  - Filters based on item data and annotation data
  - Direct DQL queries
  - Save and reuse DQL queries
  - Folder management: Create, rename, or delete folders
  - File management: Move files between folders, clone, or delete them
  - Create models from selected data
  - Create annotation or QA tasks from selected data
  - Trigger a function (FaaS) or pipeline on selected data
  - View item metadata
  - View item function execution logs
  - Export data (item JSON files)
  - Upload data (when using File System storage)
- Data Insights: The Insights Tab provides deep visibility into your annotations, offering features like an annotation location heat map, a histogram of annotation labels, and detailed attributes per label, among others.
- Data Clustering: Visualization and dimensionality-reduction tools such as UMAP, t-SNE, and PCA are integrated into the platform, helping you explore complex datasets and extract insights through a user-friendly interface.
- Cloud native: Ingest and sync data from popular cloud storage providers such as AWS, GCP, and Azure.
- Dataloop and cloud storage: Upload file binaries to Dataloop storage, or sync your cloud storage to Dataloop.
- Linked items: Create items that point to URLs, without storing the files on the Dataloop platform or connecting cloud storage.
- Metadata layer: Every item has metadata that is populated automatically with item attributes when the item is added to a dataset. User metadata can be added at any time.
- DQL: Dataloop Query Language (DQL) lets you query by:
  - Item attributes: MIME type, file name, creation/update time, size, etc.
  - Item metadata: Annotations, labels, and attributes added to items, users working on items, etc.
  - User metadata: Any context added to the item metadata, such as order number, geolocation, or camera number.
- Performance: Sub-second queries on millions of files by item attributes, item metadata, or user metadata.
- Version control: Clone and merge datasets to version your data in step with your model versions.
- Privacy: Meet data privacy standards
- Developer tools: All data management actions, such as DQL filtering, version control, import, and export, are available through the API and SDK.
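DQL combines conditions on item attributes, system metadata, and user metadata in a single query. As an illustration only, the Python sketch below evaluates a small filter loosely modeled on MongoDB-style operators (`$and`, `$or`, `$in`, `$gt`, `$lt`) against hypothetical in-memory items. The field names (`mimetype`, `metadata.system.annotated`, `metadata.user.camera`), the `matches` and `get_field` helpers, and the query shape are assumptions made for this example, not the exact DQL schema; real DQL queries run server-side through the Dataloop API or SDK.

```python
from typing import Any

# Hypothetical items. The split into item attributes, system metadata, and
# user metadata mirrors the DQL categories above; field names are illustrative.
ITEMS = [
    {"name": "cam1_0001.jpg", "mimetype": "image/jpeg", "size": 120_000,
     "metadata": {"system": {"annotated": True}, "user": {"camera": 1, "order_number": "A-17"}}},
    {"name": "cam2_0002.png", "mimetype": "image/png", "size": 340_000,
     "metadata": {"system": {"annotated": False}, "user": {"camera": 2}}},
    {"name": "clip_0003.mp4", "mimetype": "video/mp4", "size": 9_800_000,
     "metadata": {"system": {"annotated": True}, "user": {"camera": 1}}},
]


def get_field(item: dict, path: str) -> Any:
    """Resolve a dotted path such as 'metadata.user.camera'; missing keys give None."""
    value: Any = item
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def matches(item: dict, query: dict) -> bool:
    """Evaluate $and/$or plus field equality or $gt/$lt/$in conditions."""
    for key, cond in query.items():
        if key == "$and":
            if not all(matches(item, q) for q in cond):
                return False
        elif key == "$or":
            if not any(matches(item, q) for q in cond):
                return False
        else:
            value = get_field(item, key)
            if isinstance(cond, dict):
                for op, arg in cond.items():
                    if op == "$gt":
                        ok = value is not None and value > arg
                    elif op == "$lt":
                        ok = value is not None and value < arg
                    elif op == "$in":
                        ok = value in arg
                    else:
                        raise ValueError(f"unsupported operator: {op}")
                    if not ok:
                        return False
            elif value != cond:
                return False
    return True


# Annotated images from camera 1 that are larger than 100 KB.
query = {"$and": [
    {"mimetype": {"$in": ["image/jpeg", "image/png"]}},  # item attribute
    {"metadata.system.annotated": True},                 # system metadata
    {"metadata.user.camera": 1},                         # user metadata
    {"size": {"$gt": 100_000}},                          # item attribute
]}
print([item["name"] for item in ITEMS if matches(item, query)])  # ['cam1_0001.jpg']
```

Keeping the three metadata categories under separate dotted paths is what lets one query mix them freely.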
Data Management Resource Creation
The Data Management Resource Creation feature of Dataloop enables you to create Integrations, Storage Drivers, and Datasets (both internal and cloud storage) all in one place, streamlining the process and eliminating the need to navigate multiple locations.
To access the Data Management Resource Creation feature:
- Open the Data page.
- Click Create Dataset. The Data Management Resource Creation window opens on the right side, showing the Integrations, Storage Drivers, and Datasets sections.
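The three sections in this window depend on each other: a dataset on external storage uses a storage driver, and each storage driver references an integration (see the Integration Name column of the Storage Drivers list). The Python sketch below models that dependency chain with hypothetical classes (`Integration`, `StorageDriver`, `Dataset`) and a hypothetical `creation_order` helper. It is not the Dataloop SDK; it only illustrates the order in which the resources must exist.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Integration:
    """Hypothetical: credentials for one cloud provider account."""
    name: str
    provider: str  # "AWS", "GCP", or "Azure"


@dataclass(frozen=True)
class StorageDriver:
    """Hypothetical: a bucket or container reached through an integration."""
    name: str
    integration: Integration
    resource_name: str  # e.g. the bucket name


@dataclass(frozen=True)
class Dataset:
    """Hypothetical: driver=None means the dataset uses Dataloop's internal storage."""
    name: str
    driver: Optional[StorageDriver] = None

    @property
    def provider(self) -> str:
        return self.driver.integration.provider if self.driver else "Dataloop"


def creation_order(dataset: Dataset) -> List[str]:
    """List the resources a dataset depends on, in the order they must be created."""
    if dataset.driver is None:
        return [f"dataset:{dataset.name}"]  # internal storage: nothing else is needed
    driver = dataset.driver
    return [
        f"integration:{driver.integration.name}",
        f"driver:{driver.name}",
        f"dataset:{dataset.name}",
    ]


aws = Integration(name="my-aws-keys", provider="AWS")
driver = StorageDriver(name="raw-images-driver", integration=aws, resource_name="my-bucket")
cloud_ds = Dataset(name="raw-images", driver=driver)
internal_ds = Dataset(name="scratch")

print(creation_order(cloud_ds))
# ['integration:my-aws-keys', 'driver:raw-images-driver', 'dataset:raw-images']
print(internal_ds.provider)  # Dataloop
```

Having all three sections in one window means this whole chain can be created without leaving the page.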
Data Management Page
The Data Management page displays the datasets and storage drivers available in your project in separate tabs, providing a more provider-focused view.
The following features are common to both the Datasets and Storage Drivers tabs:
- Create Dataset
- Create Storage Driver
- Create Integration
- SDK: Displays SDK code snippets for creating datasets or storage drivers, depending on the selected tab.
  - Datasets tab: Displays code for the selected storage provider, internal or external.
    - Internal Storage Based Dataset
    - External Storage Based Dataset
  - Storage Drivers tab: Displays code for the selected external storage provider.
    - AWS
    - GCP
    - Azure
- Refresh tabs
- Pagination
The main sections of the Data Management page are explained below.
Section 1: Datasets and Storage Drivers tabs
The Data Management page displays the datasets and storage drivers available in your project in a list view. By default, the Datasets tab is displayed, and the search and filter criteria apply to datasets.
Section 2: Total Number of Datasets, Drivers, and Items
The Data Management page displays the number of datasets and storage drivers, and the total number of items across all datasets.
Section 3: Search and Filter
The search and filter criteria depend on the selected tab (Datasets by default). The criteria for each tab are listed below:
Datasets
- To search: You can search datasets by Dataset Name.
- To filter: You can filter the listed datasets by the following criteria:
  - Type: Whether the dataset is the original (master), a clone, or a merge.
    - Master
    - Clone
    - Merge
  - Provider: The storage provider of the dataset.
    - Dataloop
    - AWS
    - GCP
    - Azure
  - Driver Type: The type of storage driver used with the provider.
    - File System
    - S3 Bucket
    - GCS Bucket
    - Blob Storage
    - Data Lake Storage Gen2
Storage Drivers
- To search: You can search storage drivers by Driver Name.
- To filter: You can filter the listed drivers by the following criteria:
  - Provider:
    - AWS
    - GCP
    - Azure
  - Driver Type:
    - S3 Bucket
    - GCS Bucket
    - Blob Storage
    - Data Lake Storage Gen2
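The search box and the filters narrow the list together. The Python sketch below shows one plausible way they combine for the Datasets tab: a case-insensitive name search plus the Type, Provider, and Driver Type filters. The `DatasetRow` class and `search_and_filter` function are hypothetical, and the rule that values within one filter are OR-ed while separate filters are AND-ed is an assumption about typical list filtering, not documented Dataloop behavior.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass
class DatasetRow:
    """Hypothetical row in the Datasets list."""
    name: str
    type: str         # "Master", "Clone", or "Merge"
    provider: str     # "Dataloop", "AWS", "GCP", or "Azure"
    driver_type: str  # "File System", "S3 Bucket", "GCS Bucket", ...


def search_and_filter(rows: Iterable[DatasetRow], text: str = "",
                      types: Optional[Set[str]] = None,
                      providers: Optional[Set[str]] = None,
                      driver_types: Optional[Set[str]] = None) -> List[DatasetRow]:
    """Keep rows whose name contains `text` (case-insensitive) and that match every
    non-empty filter. Values inside one filter are OR-ed; separate filters are AND-ed."""
    needle = text.lower()
    result = []
    for row in rows:
        if needle and needle not in row.name.lower():
            continue
        if types and row.type not in types:
            continue
        if providers and row.provider not in providers:
            continue
        if driver_types and row.driver_type not in driver_types:
            continue
        result.append(row)
    return result


rows = [
    DatasetRow("traffic-cams", "Master", "AWS", "S3 Bucket"),
    DatasetRow("traffic-cams-v2", "Clone", "AWS", "S3 Bucket"),
    DatasetRow("retail-shelves", "Master", "Dataloop", "File System"),
    DatasetRow("traffic-merged", "Merge", "GCP", "GCS Bucket"),
]
hits = search_and_filter(rows, text="traffic", providers={"AWS"})
print([r.name for r in hits])  # ['traffic-cams', 'traffic-cams-v2']
```

The Storage Drivers tab would follow the same pattern with only the Provider and Driver Type filters.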
Section 4: List of Datasets and Storage Drivers
The Data Management page displays the datasets and storage drivers available in your project in a list view. The following tables describe the columns shown for datasets and for storage drivers.
Datasets
Column Name | Description |
---|---|
Provider | The name of the storage provider. |
Dataset Name | The name of the dataset. Click it to open the Data Browser page. |
Items | The number of items in the dataset. |
Feature Sets | The number of feature sets in the dataset. |
Annotated | The percentage of items in the dataset that are annotated. |
Type | The type of the dataset: master (original), clone, or merge. |
Driver Type | The type of storage driver. |
Open Tasks | The number of open tasks. |
Created at | The creation date of the dataset. |
Created by | The avatar of the user who created the dataset. Hover over it to see the user's email address. |
Storage Drivers
Column Name | Description |
---|---|
Provider | The name of the storage provider. |
Driver Name | The name of the storage driver. Click the Copy Driver ID icon to copy the driver's ID. |
Driver Type | The type of storage driver. |
Resource Name | The name of the cloud storage resource (for example, the bucket) that the driver connects to. |
Integration Name | The name of the integration, created on the Dataloop platform, that the driver uses. |
Created at | The creation date of the storage driver. |
Created by | The avatar of the user who created the storage driver. Hover over it to see the user's email address. |
Section 5: Dataset and Storage Driver Actions
The Data Management page lets you perform various actions on your datasets and storage drivers without opening their detail pages. The available actions are listed below.
Datasets
- Merge Datasets: Merge two or more datasets after entering the necessary details in the Merge Datasets window.
- Upload Items: Click the Upload Items icon to upload files and folders to the selected dataset.
- Dataset Recipe: Click the Dataset Recipe icon to open the Recipe page and make changes to the dataset's recipe.
- Dataset Analytics: Click the Dataset Analytics icon to open the Analytics page of the selected dataset.
Clicking the Ellipsis (three dots) icon displays the following options:
- Rename Dataset: Rename the dataset.
- Copy Dataset ID: Copy the ID of the dataset.
- Download Data: Download the dataset after entering the necessary details in the Export window.
- Clone Dataset: Clone the selected dataset after entering the necessary details in the Clone Datasets/Items window.
- Open Annotation Studio: Open the annotation studio that matches the item type, such as audio, video, or image.
- Switch Recipe: Select a new recipe for the dataset.
- Rescan Cloud Storage: Resync the data in your cloud storage driver with Dataloop storage.
- Delete Dataset: Delete the selected dataset.
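To make the Master, Clone, and Merge dataset types concrete, here is a toy Python model. It is a sketch under stated assumptions, not Dataloop's implementation: `ToyDataset`, `clone`, and `merge` are hypothetical; a clone is modeled as an independent deep copy of the source, and a merge keeps every item from every source and prefixes each item key with its source dataset name to avoid collisions. The options Dataloop actually offers are the ones shown in the Clone Datasets/Items and Merge Datasets windows.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyDataset:
    """Hypothetical dataset: item path -> list of annotation labels."""
    name: str
    type: str = "Master"  # same values as the Type filter: Master, Clone, Merge
    items: Dict[str, List[str]] = field(default_factory=dict)


def clone(src: ToyDataset, name: str) -> ToyDataset:
    """Independent copy: later edits to the clone do not affect the source."""
    return ToyDataset(name=name, type="Clone", items=copy.deepcopy(src.items))


def merge(sources: List[ToyDataset], name: str) -> ToyDataset:
    """Combine two or more datasets; keys are prefixed with the source name (assumed rule)."""
    if len(sources) < 2:
        raise ValueError("merge needs at least two datasets")
    items: Dict[str, List[str]] = {}
    for ds in sources:
        for path, labels in ds.items.items():
            items[f"{ds.name}/{path}"] = list(labels)
    return ToyDataset(name=name, type="Merge", items=items)


master = ToyDataset("cars", items={"img1.jpg": ["car"], "img2.jpg": []})
v2 = clone(master, "cars-v2")
v2.items["img2.jpg"].append("truck")  # annotate only the clone
print(master.items["img2.jpg"])       # []

combined = merge([master, v2], "cars-all")
print(sorted(combined.items))
# ['cars-v2/img1.jpg', 'cars-v2/img2.jpg', 'cars/img1.jpg', 'cars/img2.jpg']
```

Because the clone is independent, a model can be trained against a frozen master while annotation continues on a clone.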
Storage Drivers
- Edit Storage Driver: Click the Edit icon to edit storage driver details, such as the driver name and whether deleting items is allowed.
- Delete Storage Driver: Click the Delete icon to delete the selected storage driver.
Data Management Specifications
Cloud Providers & Features
Cloud Provider | Resource Type | Integration Type |
---|---|---|
AWS | S3 Bucket | Cross Account |
AWS | S3 Bucket | Access Key |
AWS | S3 Bucket | STS |
GCP | GCS Bucket | Private Key |
GCP | GCS Bucket | Cross Project |
Azure | Blob | Client Secret |
Azure | Datalake Gen2 | Client Secret |
- Dataloop supports access scoped to a specific sub-folder within a bucket, which adds security and flexibility to how you manage your data.
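The Cloud Providers & Features table and the sub-folder note can be expressed as two small checks. In this Python sketch, `SUPPORTED` is transcribed directly from the table, while `is_supported` and `in_scope` are hypothetical helpers. `in_scope` illustrates one way sub-folder scoping could be enforced (normalizing `..` and matching whole path segments); that matching rule is an assumption, not Dataloop's implementation.

```python
import posixpath
from typing import Dict, Optional, Set, Tuple

# (cloud provider, resource type) -> supported integration types, from the table.
SUPPORTED: Dict[Tuple[str, str], Set[str]] = {
    ("AWS", "S3 Bucket"): {"Cross Account", "Access Key", "STS"},
    ("GCP", "GCS Bucket"): {"Private Key", "Cross Project"},
    ("Azure", "Blob"): {"Client Secret"},
    ("Azure", "Datalake Gen2"): {"Client Secret"},
}


def is_supported(provider: str, resource: str, integration: str) -> bool:
    """True if the table lists this provider/resource/integration combination."""
    return integration in SUPPORTED.get((provider, resource), set())


def in_scope(object_key: str, driver_prefix: Optional[str]) -> bool:
    """True if a bucket object falls under a driver scoped to `driver_prefix`.
    Whole path segments are compared, so 'images' does not match 'images-old/'."""
    cleaned = (driver_prefix or "").strip("/")
    if not cleaned:
        return True  # no sub-folder configured: the whole bucket is in scope
    prefix = posixpath.normpath(cleaned) + "/"
    # normpath collapses '..' so a key cannot escape the configured sub-folder.
    return posixpath.normpath(object_key.lstrip("/")).startswith(prefix)


print(is_supported("AWS", "S3 Bucket", "STS"))            # True
print(in_scope("images/cat.jpg", "images/"))              # True
print(in_scope("images/../secret/x.jpg", "images"))       # False
```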
For the full Dataloop specifications, see the Specifications page.