Quick References

This guide gives a structured overview of key features, with step-by-step instructions to help you navigate and use the Dataloop platform efficiently.

Quick Start References

Sign In – Learn how to create an account, log in, and reset your password.
Create or Join an Organization – Set up your workspace by creating a new organization or joining an existing one.
Add Members to an Organization – Invite team members for seamless collaboration.
Create a Project – Manage datasets and tasks within a dedicated project.
Upload or Sync Your External Data – Bring in your data via uploads or cloud storage integration.
Manage Your Datasets – Organize datasets, set up cloud storage drivers, and integrate cloud services.
Set Up Ontology (Labels & Attributes) – Define labels and attributes for annotation tasks.
Create a Labeling Task – Assign annotation tasks using Dataloop’s Annotation Studios.
Use Labeling Studios – Perform annotations on images, videos, text, and PDFs.
Create a QA/QC Task – Ensure high-quality labeled data with Quality Assurance and Control tasks.
Install & Use AI Models – Integrate ML models and automate workflows.
Create a Pipeline – Automate processes by connecting annotation, QA, and ML services.
Create an Active Learning Pipeline – Use model predictions and confidence scores to prioritize uncertain data for annotation, improving ML model performance iteratively.
Deploy & Install Application Services – Enhance your project with custom or marketplace services.
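
The prioritization idea behind the active learning pipeline can be sketched in a few lines. This is a minimal, platform-independent illustration and not Dataloop's implementation: the function name `prioritize_uncertain`, the item IDs, and the confidence values are all hypothetical, and it assumes each item carries a single model confidence score between 0 and 1.

```python
from typing import Dict, List


def prioritize_uncertain(confidences: Dict[str, float], budget: int) -> List[str]:
    """Return up to `budget` item IDs, least confident prediction first.

    Hypothetical helper for illustration only; not part of any Dataloop SDK.
    """
    # Sort by confidence ascending; ties are broken by item ID so the
    # ordering is deterministic.
    ranked = sorted(confidences.items(), key=lambda pair: (pair[1], pair[0]))
    return [item_id for item_id, _ in ranked[:budget]]


if __name__ == "__main__":
    # Made-up confidence scores, one per item (assumed inputs).
    scores = {"img_001": 0.97, "img_002": 0.41, "img_003": 0.55, "img_004": 0.88}
    print(prioritize_uncertain(scores, budget=2))  # ['img_002', 'img_003']
```

In an iterative loop, the selected items would be sent for annotation, the model retrained, and the scores recomputed before the next selection round.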

User Management & Permissions

Access Control - Project & Organization Level – Manage roles and permissions for teams.

Managing Datasets

Creating & Importing Datasets – Set up and structure data efficiently.
Dataset Organization – Use collections, metadata, and ML subsets for better management.
Filtering & Searching – Locate relevant data quickly with advanced search tools.

Annotation & Labeling

Labeling Studio – Annotate images, audio, videos, LiDAR, GIS, text, RLHF, and PDFs.
Annotation Automation – Leverage AI-assisted labeling and automation tools.
Quality Assurance – Validate annotations to maintain high-quality labeled data.

ML & AI Integration

ML Subsets & Data Splitting – Organize datasets into train, validation, and test subsets.
Model Deployment – Manage and deploy AI models within Dataloop.
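
As a rough illustration of what a train, validation, and test split involves, the sketch below shuffles item IDs with a fixed seed and slices them by ratio. It is a generic example under stated assumptions, not the platform's own splitting logic: `split_items`, the default ratios, and the item IDs are hypothetical.

```python
import random
from typing import Dict, List, Sequence


def split_items(
    item_ids: Sequence[str],
    train: float = 0.8,
    validation: float = 0.1,
    seed: int = 0,
) -> Dict[str, List[str]]:
    """Split item IDs into train/validation/test subsets.

    Hypothetical helper for illustration; the test subset receives
    whatever remains after the train and validation slices.
    """
    if train < 0 or validation < 0 or train + validation > 1:
        raise ValueError("ratios must be non-negative and sum to at most 1")
    shuffled = list(item_ids)
    # A seeded generator keeps the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * validation)
    return {
        "train": shuffled[:n_train],
        "validation": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],
    }


if __name__ == "__main__":
    subsets = split_items([f"item_{i}" for i in range(10)])
    print({name: len(ids) for name, ids in subsets.items()})
```

Fixing the seed means the same items land in the same subset each time, which keeps evaluation results comparable between training runs.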

API & SDK References

SDK Documentation – Programmatically interact with Dataloop.
API Guide – Integrate Dataloop with external applications.

Terms and Definitions

This table gives short definitions of key terms used across the Dataloop platform. It is designed to help both new and experienced users quickly understand the main concepts, roles, and components that power data management, annotation workflows, machine learning integration, and automation in Dataloop.

Term	Definition
Admin *(organization role)*	A user who can manage projects and members, similar to an owner.
Annotation	A label applied to an item (e.g., a bounding box, tag, or text) that can be queried.
Annotation Manager *(project role)*	Manages annotation and QA tasks, redistributes assignments, and reviews annotation quality.
Annotator *(project role)*	Works on assigned annotation or QA tasks only, without permissions to view or manage others.
Application	Software built using the Function-as-a-Service (FaaS) model, where functionality is implemented as independent functions.
Assignment	A subset of items from a labeling task that is assigned to a specific annotator.
Attribute	Metadata about annotations (e.g., Yes/No answers, sliders, free text), providing more context.
Bot	A system-generated user used to run services. Bots log in automatically and make API calls with tokens.
Cycle	The full sequence of node executions during a single pipeline run, typically tied to a specific item.
Dataset	A collection of items (files) with metadata and annotations. Acts as the main unit of data storage and management.
Developer *(project role)*	Can manage datasets, create tasks, set recipes, and export data. Typically handles technical aspects of a project.
DQL (Dataloop Query Language)	A query language used to filter, sort, and search data items and annotations across datasets.
Execution	A single run of a function. Functions are the basic units of an application; defined in a class, they can be executed once the service is deployed.
Filter	A tool to narrow down datasets, items, or tasks by specific criteria. Alongside basic filtering, use Smart Search for intuitive queries, or DQL for advanced exploration.
Function	A single operation inside a module that can be executed when a service is deployed.
Group	Created within an organization, groups are reusable teams of users that can be assigned roles collectively in workflows and projects.
Integration/Secrets	Secure connections that store encrypted credentials (tokens, keys, passwords) for accessing cloud resources like GCS, S3, STS, and registries.
Item	A single file (image, video, text, audio, etc.) in a dataset. Each item can be annotated, labeled, or processed.
Item Status	A label showing progress or outcome of an item (e.g., Completed, Approved, Discarded).
Labels	Categories or classes used to create annotations (e.g., “cat,” “car,” “tree”).
Master Dataset	The original dataset that stores the actual files.
Member *(organization role)*	A user with limited permissions: can create projects and view members but cannot manage others or delete the organization.
Modality	A link between items, used to show overlays (e.g., multi-sensor images), replacements, or previews.
Model	A trained machine learning algorithm used for predictions or data labeling.
Node	A single step in a pipeline, such as dataset management, function execution, or ML training.
Ontology	The structure of labels and attributes in a project, forming the building blocks for training models.
Organization	A workspace in Dataloop, made up of users who collaborate on projects, share data, and manage resources together.
Owner *(organization role)*	The creator of an organization with full control (create projects, manage members, delete organization). Cannot be removed.
Package	A collection of static code (modules, functions, schema) used to build models or deploy services, processed using application technology.
Pipeline	An automated workflow of connected nodes that process data using humans, code, and ML models.
Project	A high-level workspace within an organization, focused on a specific goal. It manages datasets, tasks, and annotations.
Project Owner *(project role)*	Has full control of a project: can manage datasets, assign contributors, export data, set workflows, and delete the project.
Recipe	A set of labeling rules and instructions linked with an ontology, defining how data should be processed or annotated.
Service	A serverless computing service for running code without managing infrastructure. Conceptually, a deployed package serving the code.
Storage Driver	A bucket-specific connector that attaches external storage (e.g., GCS, S3, databases) to Dataloop, created from a storage integration.
Task	A unit of work (annotation, QA, review) assigned by project managers and containing assignments.
Trigger	A rule that starts an action automatically when conditions are met, either cron-based or event-based.
UI Slot	A button in the Dataloop UI that allows users to run functions directly within Annotation Studio, Dataset Browser, and other pages.
Worker *(organization role)*	A user role for annotators or workforce teams. They can work on tasks but have no access to secrets, projects, or organization settings. This role overrides project-level access.
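
The relationship between the Task and Assignment entries above can be pictured as splitting a task's items among annotators. The sketch below uses a plain round-robin scheme purely for illustration; how Dataloop actually distributes items is not specified here, and `distribute_items`, the item IDs, and the annotator names are hypothetical.

```python
from typing import Dict, List, Sequence


def distribute_items(
    item_ids: Sequence[str], annotators: Sequence[str]
) -> Dict[str, List[str]]:
    """Assign each item to one annotator in round-robin order.

    Hypothetical helper for illustration: each returned list plays the
    role of one assignment within a task.
    """
    if not annotators:
        raise ValueError("at least one annotator is required")
    assignments: Dict[str, List[str]] = {name: [] for name in annotators}
    for index, item_id in enumerate(item_ids):
        # Cycle through annotators so assignment sizes differ by at most one.
        assignments[annotators[index % len(annotators)]].append(item_id)
    return assignments


if __name__ == "__main__":
    task_items = ["a.jpg", "b.jpg", "c.jpg", "d.jpg", "e.jpg"]
    print(distribute_items(task_items, ["annotator_1", "annotator_2"]))
```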