Dataset Browser
  • 21 Jan 2025

Overview

The dataset browser lets you explore and navigate dataset items. It provides an interface for searching, visualizing, and accessing data within a dataset. You can filter, sort, and view dataset items of many types, including images, text, audio, video, and LiDAR, which makes it easier to analyze and work with large volumes of data.


Upload Items

Dataloop enables you to seamlessly manage datasets by either directly uploading items into the Dataloop file system or integrating external storage solutions. This provides flexibility in organizing and accessing your data within the platform.


The Dataset Browser page features are explained in the following sections (marked on the screenshot):


Section 1: Access the Dataset Browser

Access the dataset browser using one of the following options:

  • In the Dataloop left-side menu, select Data.
    • For existing datasets, double-click the dataset name to open the dataset browser.
    • For a new dataset, see the Create a dataset article.
  • In the Project Overview > Data Management widget, click on your Dataset name.

User Roles and Permissions

In the Dataset browser, you can carry out a range of functions. Make sure you refer to the Roles and Permissions article to understand the user roles and permissions.


Thumbnail or Icons View

  • The browser opens in Thumbnail view by default, showing smaller versions of the dataset's items, including images, documents, and video files, arranged in a preview grid.
  • Audio, LiDAR, and RLHF files are represented by a thumbnail indicating the data type, without an actual preview.
  • Items are organized by their latest upload date.
  • The view allows quick browsing and selection of specific content without needing to open each item.

In the Thumbnail view, you can:

  • Adjust the thumbnail size using the slider control at the bottom left of the page.
  • Use smaller thumbnails to view more items on a page.
  • By default, the Show File Name option in the Settings is enabled to show the file names. You can toggle it to hide the file names.
  • Use the Sort By option to sort the items in the dataset. By default, the items are displayed according to the Creation Date.

Items color-coded as Green

Dataloop displays items with distinct colors corresponding to their annotation status:

  • Green: The Green color indicates that the item is annotated.
  • No Color: The absence of color indicates that the item is not annotated.

Types of icons for Items

Each item in the Data Browser is represented by a specific thumbnail icon, determined by its type. The available types of icons are shown below:


Section 2: Search and Filter Items

The dataset browser filter enables you to refine datasets by filtering them based on:

  • The item features, such as filename, media type, etc.
  • Annotation features, such as filter by label, type, etc.
  • Task ID and Task Names
  • CLIP Based - Free Text Search

For more information about the Search Filter, see the Schema Based Search article.


Section 3: Items View Options

Folders View

You can use the Items-based or Folders-based view context to display items in the Dataset browser. A filter is applied within the scope you choose, whether that is the entire dataset or a specific folder. By default, Folders view is displayed.

The dataset browser organizes items in a file-system-like structure, with two view contexts:

  • Items based: By default, this view is displayed. It shows all items regardless of their folder structure, so filters can be applied to, and all items displayed from, the Root Folder (Dataloop). When you select a folder, it shows items only and does not show any sub-folders, even if they exist.
  • Folders based: It shows items based on the folders or subfolders you selected. When you select the Root Folder (Dataloop), it shows the items and folders, if any, in the Root folder.

You can perform the following actions:

  • Click Folder based to view items and sub-folders in a folder.
  • Create folders: Select the root folder and click Add Folder, or click the new folder icon that appears when you hover over Root Folder (Dataloop).
  • Create Sub-Folders: Select the folder and click Add Folder, or click the new folder icon that appears when you hover over the selected folder.

  • Move items between folders:

    1. Select one or more items from the current page.
    2. Right-click and select File Actions > Move to Folder.
    3. Select the folder from the list.
    4. Click Move.
  • Edit the folder name: Hover over the folder and click on the Pencil icon.

  • Select a folder and right-click to:
    • Rename: Rename the selected folder.
    • Move: Move the items in the selected folder to another folder.
    • Copy item path: It copies the complete item path.
    • Create Trigger: It allows creating a trigger function for the selected folder items.
    • Delete: It allows you to delete the selected folder and items in the folder.

The Dataset browser offers two viewing options, Thumbnail and Details. By default, the thumbnail view is shown, and you can use the respective View icons to switch between these options.

Info

In case there are no items in the Dataset:

Details View

  • In this view, users can see a list of items with their associated details including File Name, File Created Date, Media Type, Item's annotation status, etc.
  • This makes it easier to access and review specific information about each item in a structured and organized manner.

In the Details view, you can:

  • Click on the checkbox next to the file name to select all the items in the current page.
  • Click Manage Columns to hide columns.
  • Sort items based on the Columns.

Common Features in the Thumbnail and Details Views

  • The number of selected items is highlighted.
  • Click on the Select All option to select all the items in the current page.
  • The total number of dataset items is displayed, along with breadcrumb navigation that gives a clear path back to higher levels, such as the sub-folder and folder.
  • Settings > Show Hidden Files: By default, the Show Hidden Files option is disabled. You can toggle it to show the hidden files.
  • Items per page: By default, 100 items are displayed per page. You can select 2, 25, 50, 100, 250, 500, and 1000 items per page.
  • Use the page navigation options to view next and previous pages, or enter a specific page number and click Go to view the page.
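The relationship between the items-per-page setting and the page numbers you can enter can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the platform's implementation: the function names `page_count` and `page_bounds` are hypothetical, and the page-size options are copied from the list above.

```python
import math

# Page-size options and default as listed in this article.
PAGE_SIZES = (2, 25, 50, 100, 250, 500, 1000)
DEFAULT_PAGE_SIZE = 100


def page_count(total_items: int, page_size: int = DEFAULT_PAGE_SIZE) -> int:
    """Return how many pages are needed to show `total_items` items."""
    if page_size not in PAGE_SIZES:
        raise ValueError(f"page_size must be one of {PAGE_SIZES}")
    if total_items < 0:
        raise ValueError("total_items must not be negative")
    return math.ceil(total_items / page_size)


def page_bounds(page: int, total_items: int, page_size: int = DEFAULT_PAGE_SIZE):
    """Return the 1-based (first, last) item positions shown on `page`."""
    pages = page_count(total_items, page_size)
    if not 1 <= page <= pages:
        raise ValueError(f"page must be between 1 and {pages}")
    first = (page - 1) * page_size + 1
    last = min(page * page_size, total_items)
    return first, last


if __name__ == "__main__":
    # A dataset of 1,234 items at the default 100 items per page.
    print(page_count(1234))
    print(page_bounds(13, 1234))
```

With 1,234 items and the default page size, the last page holds only the remaining 34 items, which is why the final page in the browser can appear shorter than the others.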

Collections View

The Collections feature in Dataloop's Data Browser helps streamline data management by allowing you to tag or group specific sets of items based on task needs (e.g., annotation, review, training).

Key Features of Collections in Data Browser

  • Selective Grouping: Choose specific items from a dataset to move into a collection based on criteria like image type, labeling status, or annotation requirements.
  • Easy Access: Collections provide a convenient way to quickly access and manage items that are relevant to a particular task, without sifting through the entire dataset.
  • Enhanced Collaboration: Collections can be shared with team members, allowing specific data subsets to be easily shared and collaboratively worked on without impacting the primary dataset.
  • Task-Specific Organization: Create collections based on different stages of your workflow, like "Pending Annotation," "Quality Review," or "Model Training Set," which helps keep the data organized according to your project’s progress.
  • Filter Collection Items Using Smart Search: Use the Items field in the smart search to apply a collection filter query that finds specific items within a collection. For instance, the query metadata.system.collections.c0 = true filters items that are part of the first collection (the .c0 ID refers to the first collection folder created and the .c9 ID to the last collection folder created).
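If you build these collection queries often, a small helper can generate the text to paste into the smart search. The sketch below is plain Python and does not call any Dataloop API; the helper name `collection_query` is hypothetical, and it assumes, based on the .c0 and .c9 IDs mentioned above, that collection IDs run from c0 through c9.

```python
def collection_query(index: int) -> str:
    """Return the smart-search query for the collection with the given index.

    Assumption: collection IDs are c0 (first created) through c9 (last created),
    as suggested by the example in this article.
    """
    if not 0 <= index <= 9:
        raise ValueError("collection index must be between 0 and 9")
    return f"metadata.system.collections.c{index} = true"


if __name__ == "__main__":
    # Query for items in the first collection that was created.
    print(collection_query(0))
```

The printed string for index 0 is the same query shown in the example above.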

How to Access Collections?

To access the Collections in the Data Browser:

  1. Open the Data Browser: Log in to your Dataloop account and navigate to the Data Browser.
  2. Click the Collections Icon: In the left-side panel, click the Collections icon, situated below the Folder icon. This will display all your existing collections, allowing you to create new ones or manage current collections as needed.

For more collection actions, refer to the following links:


Tasks View

The Task-based view in Dataloop's Data Browser allows you to filter and search for items based on task-related attributes, such as task status or lack of assignments. This feature offers advanced filtering options, enabling better data management.

Task Status Filter Component

The Data Browser includes a dedicated filter component for tasks and task statuses on the left-side panel. This component lets you search and filter items by task-related statuses.

How to Use Task View Option in the Data Browser?

  1. Go to the Data Browser.
  2. Select the Tasks option from the left-side panel.
  3. Use the following options to view items:
    • All Tasks: Displays items that are part of any task.
    • No Task Assignment: Displays items that are not part of any task.

Search Tasks: Search for a task by name to view the items in that task.

Note

When using the match operator in smart search, the behavior of the task filter changes. The task-related filters in the smart search will be cleared. The "Task" filter section in the smart search will become inactive, indicating that it is not being used.

Filtering Items by Task Status: Use the filter to view items based on the following task status options. Use the Timestamp to select a time period to view the items.

  • Items With Status: Filter items that have an assigned task status.
  • Items Without Status: Filter items that do not have an assigned task status. When this option is selected, the checkboxes for including or excluding specific statuses will be disabled. If certain statuses were selected before applying this filter, they will remain selected but will appear disabled in the UI. The DQL (Data Query Language) query will update accordingly.

Task Status Filter Option:

  • Completed: View items with a Completed status (Include) or exclude them (Not Completed).
  • Discard: View items marked as Discard (Include) or exclude them (Not Discarded).
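The way these options combine can be modeled locally as follows. This is a simplified sketch of the filter logic described above, not the DQL query the platform generates: the `Item` class and `filter_items` function are hypothetical, and it assumes that the "Not Completed" and "Not Discarded" options are applied only among items that have a status.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class Item:
    name: str
    # Task statuses assigned to the item, e.g. {"completed"} or {"discard"}.
    statuses: Set[str] = field(default_factory=set)


def filter_items(
    items: Iterable[Item],
    with_status: bool = True,
    include: Iterable[str] = (),
    exclude: Iterable[str] = (),
) -> List[Item]:
    """Filter items the way the task status options are described above."""
    if not with_status:
        # "Items Without Status": the include/exclude checkboxes are disabled,
        # so any previously selected statuses are ignored here.
        return [item for item in items if not item.statuses]

    include_set, exclude_set = set(include), set(exclude)
    return [
        item
        for item in items
        if item.statuses                              # "Items With Status"
        and include_set <= item.statuses              # e.g. Completed (Include)
        and not (exclude_set & item.statuses)         # e.g. Not Completed
    ]


if __name__ == "__main__":
    items = [
        Item("a.jpg", {"completed"}),
        Item("b.jpg", {"discard"}),
        Item("c.jpg"),
    ]
    print([i.name for i in filter_items(items, include={"completed"})])
    print([i.name for i in filter_items(items, exclude={"completed"})])
    print([i.name for i in filter_items(items, with_status=False, include={"completed"})])
```

In the last call, the `include` selection is kept but has no effect, mirroring how selected statuses remain checked but disabled when Items Without Status is chosen.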

If no tasks are available for the items, create a task by following the instructions in the Create Task article.

Item Status

Difference Between Annotated, Completed, Approved, and Not Annotated Items

An item earns the status annotated if it receives any form of annotation, such as a classification, a note, or any other tool-generated annotation, regardless of whether this occurs in the dataset browser or during a task. However, being annotated doesn't necessarily mean an item is completed.

  • This situation may arise if the item is annotated via the dataset browser, where annotations can be saved without assigning a status, or if it is annotated during a task without the complete button being clicked, perhaps because the annotator plans to return to it later.
  • Completed status is assigned to an item when an annotator finalizes their work on it by clicking the complete button during a task.
  • An approved status is granted after an item undergoes a QA task and the QA tester decides to approve it by clicking the respective button.
  • An item is deemed not annotated if it lacks any annotations. Interestingly, an item can be marked as completed and yet be considered not annotated if the complete button was clicked without any actual annotation work being done on it.
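The distinction above comes down to two independent facts about an item: whether it has any annotations, and which status buttons were clicked. The following sketch makes that explicit. It is an illustrative model only; the `ItemState` class, its field names, and the `describe` function are hypothetical and do not correspond to platform objects.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ItemState:
    annotation_count: int           # annotations of any kind on the item
    complete_clicked: bool = False  # annotator clicked the complete button in a task
    approve_clicked: bool = False   # QA tester clicked the approve button in a QA task


def describe(state: ItemState) -> List[str]:
    """Return the labels that apply to an item, following the rules above."""
    labels = ["annotated" if state.annotation_count > 0 else "not annotated"]
    if state.complete_clicked:
        labels.append("completed")
    if state.approve_clicked:
        labels.append("approved")
    return labels


if __name__ == "__main__":
    # Annotated in the dataset browser, never completed in a task.
    print(describe(ItemState(annotation_count=3)))
    # Complete button clicked without any annotation work.
    print(describe(ItemState(annotation_count=0, complete_clicked=True)))
```

The second example shows the case described in the last bullet: an item can be both completed and not annotated, because the two labels are derived from different inputs.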

ML Subsets View

The ML Subsets View in Dataloop's Data Browser is a dedicated feature designed to enhance machine learning workflows by organizing and managing your dataset effectively. It allows you to classify and filter dataset items based on their ML Subset assignments, such as train, validation, and test, which are commonly used in the ML lifecycle for model development and evaluation.

Filtering by ML Subset Assignment:

Easily locate items in your dataset based on their subset classification:

  • Unassigned items: Items that have not been allocated to any specific subset.
  • Train: Items designated for training the ML model.
  • Validation: Items used to tune hyperparameters and evaluate the model during training.
  • Test: Items reserved for final evaluation of the trained model's performance.

Use Cases:

  • Training Pipeline: Quickly access the train subset to prepare data for model training.
  • Model Validation: Focus on the validation subset to monitor the model’s performance during tuning.
  • Final Testing: Access the test subset to evaluate the final accuracy, precision, or other metrics.

Select an item in any of the folders and right-click it to perform the available actions.

Why Use the ML Subsets View?

  • It simplifies dataset management by clearly segregating data for training, validation, and testing, reducing errors in ML workflows.
  • Enhances collaboration within teams by providing a consistent structure for dataset organization.
  • Saves time by offering intuitive filtering and search options for specific subsets.

How to View Items Based on the ML Subsets Assignment?

  1. Go to the Data Browser.
  2. Select the Model icon from the left-side panel. The available items are displayed by folder.

If there are no items added to ML Subsets, click Split Into Subsets.


Section 4: Settings, Actions, and Details

Data Browser allows users to perform various actions, manage settings, and access detailed information about specific items or features.

Dataset Details

The Dataset Details provides the following information related to the dataset. To view the Dataset Details, no items should be selected.

General Details

The following details are part of the general details of the Dataset.

  • Dataset ID: The ID of the dataset. Click on the Copy icon to copy the dataset ID.
  • Recipe: The name and the link of the recipe that is configured to the dataset. Click on the link to open the recipe page.
  • Analytics: Dataset analytics refers to the process of collecting, analyzing, and deriving insights from a dataset. Click on the link to view the Progress tab of the Analytics page.
  • Project: The project name of the dataset. Click on the Copy icon to copy the project ID.
  • Owning Organization: The name of the organization to which the dataset is affiliated. Click on the Copy icon to copy the Organization ID.

ML Data Split

ML Data Split Chart displays the distribution of items across machine learning subsets, including Test, Train, and Validation, showing both the total numbers and percentages.

  • This chart is generated when an item is assigned to one of the ML subsets.
  • When hovering over a subset, the chart specifically highlights the subset being hovered over.
  • Clicking on a subset grays it out, removing it from the chart.
  • Clicking on it again restores its visibility on the chart.
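The totals and percentages shown in the chart can be reproduced from a list of subset assignments. The sketch below is a hedged illustration rather than the platform's calculation: the function name `ml_split_summary` and the input format are hypothetical, and it assumes that percentages are computed over assigned items only, with unassigned items left out.

```python
from collections import Counter
from typing import Dict, Optional, Tuple


def ml_split_summary(
    assignments: Dict[str, Optional[str]],
) -> Dict[str, Tuple[int, float]]:
    """Map each subset to (item count, percentage of assigned items).

    `assignments` maps an item name to "train", "validation", "test",
    or None for an unassigned item.
    """
    # Assumption: unassigned items are excluded from the chart.
    counts = Counter(subset for subset in assignments.values() if subset is not None)
    total = sum(counts.values())
    return {
        subset: (count, round(100 * count / total, 1))
        for subset, count in counts.items()
    }


if __name__ == "__main__":
    assignments = {f"train_{i}.jpg": "train" for i in range(8)}
    assignments["val_0.jpg"] = "validation"
    assignments["test_0.jpg"] = "test"
    assignments["new.jpg"] = None  # not yet assigned to a subset
    for subset, (count, percent) in ml_split_summary(assignments).items():
        print(f"{subset}: {count} items ({percent}%)")
```

If no item has been assigned to a subset, the function returns an empty mapping, which matches the chart being generated only once an item is assigned.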

Embeddings

The Embeddings section in the bottom right corner of the Datasets page provides crucial information about the feature vectors associated with the dataset. Here's a detailed breakdown:

  • Feature set: The name of the feature set. For example, clip, nnlm, text-embeddings, etc.
  • Feature vectors: The value refers to the number of feature vectors generated, corresponding to the number of items in the dataset.
  • Status: The status (for example, Success) signifies that the embedding process has been completed successfully, meaning all the items have been processed to generate their respective feature vectors without any issues. Available Statuses are Success, Running, Created, and Failed.
  • Updated At: This field shows the date that the embeddings were last updated. It indicates the most recent time the feature vectors were recalculated or updated based on changes in the dataset.
  • Model Application Name: If available, it displays the name of the model used to extract the feature vectors.

Add Embeddings

  • Click Add to select a model to extract embeddings for the selected items. The Extract Embeddings side-panel opens, where you can select the deployed model and click Embed to start the extraction process.
  • Also, select the checkbox to enable the Automatically run on new dataset items feature.

If you have extracted embeddings using a model, the following actions are available when you click the three dots:

  • Open Model: Click this option to open the Model's details page.
  • Run on New Items: Click this option to initiate the extraction process for new items added to the dataset.
  • View Logs: Click this option to open the Service Logs page.
  • View Executions: Click this option to open the Service Execution page.
  • Select Model: If the Dataset has no embeddings, click Select Model to browse the project model registry and choose an embedding model. The Extract Embeddings side-panel opens, where you can select the deployed model and click Embed to start the extraction process.

Item Tab

Choose an item to view the following details in the right-side panel:

  • ML Subset: Displays the ML Subset tab if the item has been assigned to a subset, such as Validation, Test, or Train.
  • Collections: Displays the collection folder name of the selected item.
  • File Name: The name of the selected item. Click on the copy icon to copy the file name.
  • Created at: The creation date of the selected item.
  • Description: The text description of the item. Click on the pencil icon to add or edit descriptions. Also, item descriptions can be added during file uploads, serving as an additional way to search for items containing specific text or descriptions.
  • File path: The folder path where the file is located. Click on the copy icon to copy the file path.
  • Item ID: Unique identification for the item. Click on the copy icon to copy the item ID.
  • Item path: A URL link to the item on the Dataloop platform. Click on the copy icon to copy the item path.
  • Parent Item Link: If the item is a clone, it shows the parent item as a link. The link opens the dataset where the original item is located, filtered to that item.
  • Labeling Tasks: This section provides the number of annotations and classifications associated with the selected item.

Automation Tab

The Automation tab allows your selected item to run with FaaS, Pipeline, or Model Predictions. Click on the following options to create a function, pipeline, or model prediction execution:

  • Run with FaaS: It lists the functions of all the activated FaaS services. Select a function to execute with the selected items.
  • Run with Pipeline: It allows you to select a pipeline to execute with the selected items.
  • Run Model Predictions: It allows you to select a model to generate predictions for the selected items. Only trained and deployed models are available for selection.

When executions are available, you can search executions by function, application, or pipeline. Also, the following details are displayed:

  • Pipeline: The name and link of the pipeline. Click on the link to view the pipeline.
  • Application name: Name of the application.
  • Function name: Name of the function.
  • Execution Status: Success, Running, Created, and Failed.
  • Updated At: Date and time of the execution update.
  • Rerun: If needed, click the Play icon to rerun the execution.
  • Filter icon: Filter executions based on the status, such as Success, Failed, Running, and Pending.

Logs and Executions

Click the link to access a comprehensive overview on the Executions or Logs page.

Metadata Tab

Item metadata refers to the descriptive information and attributes associated with individual items within a dataset.

You can perform the following actions:

  • Click on the copy icon to copy the metadata.
  • To edit the metadata:
    1. Click the Edit icon to open the editor.
    2. Make changes as required.
    3. Click on the Save icon to save the changes.

Dataset Actions

The Dataset Browser allows you to perform the following actions at the dataset and item level. These actions are available when you click Dataset Actions. A few actions are not applicable if you select more than one item.

Right-Click: You can also use the right-click to perform the following actions. The menu is opened only for the individual item on which the right-click is performed.

Once you click on either Dataset Actions or Right-Click on an Item, the following options are listed:

File Actions

You can perform the following action on a file within the Dataset. Please note that actions from the right-click menu cannot apply to multiple selected items simultaneously.

Labeling Tasks

Models

Deployment Slot

Run with FaaS

Run with Pipeline

Download Annotations

To view more actions available on the dataset actions, refer to the Manage Datasets article.


Section 5: View Number of Items and Annotations

The Data Browser shows the number of items and annotations available in the dataset. You can view the following information on the top-right side of the data browser page:

  • All Dataset Items: It displays the number of items available in the dataset.
  • Annotated Items: It displays the number of items that are annotated.
  • Annotations: It displays the total number of annotations across all annotated items.