Glossary
A
Annotation
An Annotation entity refers to a Label, tag, or Metadata associated with an Item in a Dataset. Annotations are used to provide additional context or information about an Item, and to enable machine learning models to understand and interpret the data. Annotations are typically created by human annotators, who use the UI to draw bounding boxes, select Labels, or enter text.
Applications
Applications are functions and services that can be deployed to execute automated processes on any entity within the Dataloop platform, including data items, annotations, tasks, datasets, and projects.
Artifact
Artifacts are large files (binaries) used to deploy a package. They are uploaded separately and downloaded when the service is initiated. The main types of artifacts are Item, Local, and Link.
Assignment
An Assignment is a specific part of a larger Task, containing an Item or a collection of Items allocated to an Annotator for manual annotation and/or review, and includes all the information necessary for Annotators to complete the work. As Annotators work on the Assignment, the Dataloop system automatically collects analytics data and provides project managers with real-time insights into the progress of each Assignment, the quality of the Annotations, and the overall status of the Project.
Attribute
A specific property or characteristic associated with an Annotation. An attribute is a piece of additional information that provides context or metadata about an item, beyond what is captured through Annotation Labels alone. Attribute types include single-choice, multiple-choice, and free-text. An attribute has a ‘Section ID’ that identifies it in the JSON.
Autoscaler
A component of the Dataloop FaaS system that automatically adjusts the number of serverless computing Resources allocated to the FaaS Service based on current demand. The purpose of the Autoscaler is to ensure that the system can handle peaks in demand without overprovisioning Resources and incurring unnecessary costs.
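The scaling decision described above can be sketched as a small function. This is only an illustration of demand-based scaling in general, not Dataloop's actual algorithm; the function name and all parameters (`pending_executions`, `executions_per_replica`, `min_replicas`, `max_replicas`) are hypothetical.

```python
import math


def desired_replicas(pending_executions, executions_per_replica,
                     min_replicas, max_replicas):
    """Return how many replicas a hypothetical autoscaler would request.

    All names here are assumptions made for illustration only.
    """
    if executions_per_replica <= 0:
        raise ValueError("executions_per_replica must be positive")
    # Enough replicas to cover current demand...
    needed = math.ceil(pending_executions / executions_per_replica)
    # ...but never below the floor or above the ceiling, which is what
    # prevents both starvation and overprovisioning.
    return max(min_replicas, min(max_replicas, needed))


print(desired_replicas(25, 10, min_replicas=1, max_replicas=5))
```

The upper bound is what keeps a spike in demand from incurring unbounded cost, while the lower bound keeps a minimum capacity available.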
B
Binaries
The contents of any file type, such as an image, a video, a PDF file, etc. Binaries are managed by the Master Dataset entity in Dataloop.
Bot
A dummy Project user with developer role permissions. Bots are used to run FaaS services and execute API/SDK requests. When deploying a service, a random bot is selected from the project to which the service was deployed. If a bot doesn't exist, the SDK automatically creates a new one or prompts you to create one from the UI.
Bounding Box
The bounding box is an annotation type. It is represented by two points: the top-left and bottom-right corners. Bounding boxes can also be rotated, in which case rotation angle information is added.
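As a rough sketch of how two points plus an angle describe a box, the function below computes the four corners. It assumes the angle is given in degrees and that rotation is applied around the box center; neither detail is stated in this glossary, and the function name `box_corners` is hypothetical.

```python
import math


def box_corners(left, top, right, bottom, angle_degrees=0.0):
    """Return the four corners of a box given by top-left and bottom-right.

    Assumptions (not confirmed by the glossary): the angle is in degrees
    and the box rotates around its own center.
    """
    cx = (left + right) / 2
    cy = (top + bottom) / 2
    theta = math.radians(angle_degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    corners = [(left, top), (right, top), (right, bottom), (left, bottom)]
    rotated = []
    for x, y in corners:
        dx, dy = x - cx, y - cy
        # Standard 2D rotation of the offset from the center.
        rotated.append((cx + dx * cos_t - dy * sin_t,
                        cy + dx * sin_t + dy * cos_t))
    return rotated


print(box_corners(0, 0, 4, 2))
print(box_corners(0, 0, 2, 2, angle_degrees=90))
```

With image coordinates (y grows downward), a positive angle in this sketch appears as a clockwise rotation on screen; the direction convention is also an assumption.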
C
Cache
A ComputerCache, or Cache, in Dataloop is a component that stores data temporarily in memory, making the data faster and easier to access. When data is requested by a user or an Application from within the FaaS (see entry for Package) namespace, it is retrieved from the cache instead of being fetched from the Services, which reduces API calls and improves performance. By default, Cache is not active in projects. For cache activation, contact the Dataloop team.
Classification Label
A type of Annotation that is used to categorize Items based on the sum of their characteristics. Usually, the text is attributed to a classification Label that describes the category in which the Item was classified. For example, shirts, pants, and coats would all fall under the Clothing classification.
Clone
Dataset clones contain pointers to the original file binaries, so a clone manages virtual Items without replicating the binaries in the underlying storage. When cloning a Dataset, users can decide whether the new virtual copy will contain the Metadata and Annotations created on the original.
CodeBase
A CodeBase package is the code you import into the platform, containing all the modules and functions. When you upload the code to the platform, either from your computer or from GitHub, it is saved on the platform as an item in a .zip file. There are four types of codebases, all limited to 100MB: Item Codebase, git Codebase, local Codebase, and filesystem Codebase (likely used when working in a remote container).
Command
A command is a program or utility that runs from the command line. A command line is an interface that accepts lines of text and converts them into instructions for your computer. A graphical user interface (GUI) can be thought of as an abstraction over command-line programs.
Contributor
A project contributor is a user with access to a specific project and is authorized to contribute to its data, tasks, and annotations. Project contributors can have different levels of access and permissions, derived from their granted role. For example, an Annotator can view and annotate data, while a ‘Developer’ can also train machine learning models or manage the entire workflow.
Cron
A Cron trigger executes functions on a time pattern, with constant input, using Cron syntax. In the Cron trigger specification, you specify when the trigger should start and end, how often it should run, and what input it should send to the action.
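For readers unfamiliar with Cron syntax, the sketch below splits an expression into its named fields. It assumes the standard five-field Cron layout (minute, hour, day of month, month, day of week); whether Dataloop accepts additional fields is not stated here, and the helper name `describe_cron` is hypothetical.

```python
# Field order of a standard five-field Cron expression.
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def describe_cron(expression):
    """Map each part of a five-field Cron expression to its field name."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(
            f"expected {len(CRON_FIELDS)} fields, got {len(parts)}")
    return dict(zip(CRON_FIELDS, parts))


# "0 5 * * 1" reads as: minute 0, hour 5, any day of month,
# any month, day of week 1 (Monday).
schedule = describe_cron("0 5 * * 1")
print(schedule)
```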
Cycle
A pipeline cycle refers to all node executions performed in a single pipeline run (usually over a specific Item). The executions are listed in the order in which they occurred. Each Cycle may have a different number of executions, because some Items in the pipeline may be routed differently due to filters in the pipeline and user actions.
D
Dataset
A dataset is a collection (bucket) of Items (files), together with their metadata and annotations. It can have a file-system-like structure with folders and subfolders at any level. A dataset is mapped to a Driver (which derives from an Integration) so that it can contain items synced from external cloud storage. Dataset versioning actions include cloning and merging.
DQL
Dataloop Query Language: a query syntax used in the Dataloop platform to query for entities, mainly data items. DQL queries can define additional fields, such as sort order and page size. Every DQL query has the following components: Resource, the target of the query, which can be Items or Annotations; and Filter, which combines Attributes and logical operators to filter Items.
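The two components can be pictured as a small JSON-like structure. The sketch below only illustrates the shape described above; the key names (`resource`, `filter`, `$and`, `pageSize`, `sort`), the example field names, and the helper `build_query` are assumptions made for illustration and should be checked against the official DQL reference.

```python
import json

# The glossary names Items and Annotations as possible resources.
ALLOWED_RESOURCES = {"items", "annotations"}


def build_query(resource, conditions, page_size=1000, sort=None):
    """Assemble a query dictionary with a resource and a filter.

    Key names are illustrative assumptions, not a confirmed schema.
    """
    if resource not in ALLOWED_RESOURCES:
        raise ValueError(f"unsupported resource: {resource!r}")
    query = {
        "resource": resource,
        # All conditions must hold (logical AND).
        "filter": {"$and": list(conditions)},
        "pageSize": page_size,
    }
    if sort is not None:
        query["sort"] = sort
    return query


query = build_query(
    "items",
    [{"metadata.user.reviewed": True}],  # hypothetical field name
    page_size=100,
    sort={"createdAt": "descending"},    # hypothetical field name
)
print(json.dumps(query, indent=2))
```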
Driver
A storage driver uses an Integration to connect to cloud-storage systems (AWS S3, GCP, Azure, and more) and handles the specific connectivity protocols for buckets and folders, so that the stored binary files can be indexed into datasets.
E
Entity
An entity is a Dataloop data model object represented by a JSON. It contains the entity's data along with its related functions and operations. For example, an “Item” entity supports Download, Update (e.g. updating its Metadata), and updating its status in a Task.
Event
An Event trigger is defined by the project whose events it monitors; the resource type, such as Item, Annotation, or Task; and the action that happens to the resource, such as created, updated, deleted, or status changed. A DQL filter then checks, based on the resource JSON, whether or not to invoke the operation.
Execution
Execution refers to the process of executing a function within the FaaS service. When a user submits a function for execution, the FaaS service creates a container to execute the function and provides all required input. The function is executed within the container and the results are returned to the user.
ExecutionIO
The execution input is the input that the function needs. See Class Functions Input types for reference. The input is made available to the method invoked by the execution. An input of a Dataloop type, such as Item, Dataset, or Annotation, must be passed as the ID of the corresponding entity; a JSON-type input can have any JSON-serializable value and is passed to the method unmodified.
F
Feature Vector
A Feature Vector is a numerical representation of an object or entity, typically used in machine learning and data analysis. It consists of a list of Features or Attributes that quantitatively describe the object. A Feature Set, on the other hand, is a collection of Feature Vectors that are used to train a machine-learning algorithm. The Feature Set contains all the Features that are relevant to the problem being solved.
Filter
Filters are part of the Dataset and Task Browsers, and allow you to Filter Items by any aspect of your files. When multiple Filters are used, the relationship between them is established by the logical operator AND. However, the relationship between multiple values in each filter is established by the logical operator OR. For example, if you enter dog and cat in the Labels filter, all Items with the label dog OR cat are displayed.
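The AND/OR relationship can be sketched in a few lines. This is a standalone illustration of the logic, not the platform's implementation; the item dictionaries and the `matches` helper are hypothetical.

```python
def matches(item, filters):
    """Return True if the item satisfies every filter.

    `filters` maps a field name to the set of accepted values.
    Different filters combine with AND; the values inside one
    filter combine with OR.
    """
    return all(
        any(value in item.get(field, ()) for value in accepted)
        for field, accepted in filters.items()
    )


# Hypothetical items, each field holding a set of values.
items = [
    {"name": "a.jpg", "labels": {"dog"}, "status": {"completed"}},
    {"name": "b.jpg", "labels": {"cat"}, "status": {"approved"}},
    {"name": "c.jpg", "labels": {"bird"}, "status": {"completed"}},
]

# Label is dog OR cat, AND status is completed.
filters = {"labels": {"dog", "cat"}, "status": {"completed"}}
selected = [item["name"] for item in items if matches(item, filters)]
print(selected)
```

Only `a.jpg` passes here: `b.jpg` has an accepted label but fails the status filter, and `c.jpg` has an accepted status but fails the label filter.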
Function
Functions are the basic functional units of FaaS. You define the functions in a class, and once the service is deployed, you can run any of them. Functions are defined within a module, and multiple functions can serve as access points to the FaaS.
G
Group
User groups allow you to create teams as a resource that you can reuse together in the Dataloop system, especially for workflow tasks. You can grant Project roles to Groups.
I
Integration
The Integration allows Dataloop organizations to define secrets for accessing cloud resources, including cloud storage (GCS/S3), Secure Token Service (STS), Container Registry Services (ECR/GCR), and others. Once the integration is defined for cloud storage, a storage driver must be created with storage details, such as bucket, folder, etc.
Item
An Item in Dataloop is a unit of data that represents a ‘single instance’ or ‘file’ of a larger Dataset. It can be an image, a video, a sound recording, a text document, or any other type of digital asset that needs to be labeled, annotated, or analyzed. Each Item in the Dataloop system is typically associated with one or more Tasks, which define the specific operations that need to be performed on the Item.
Item Status
When a worker finishes working on an Item in their Assignment, they set a status for the Item. For example, completing an action on an Item gives it the "Completed" status. Users can customize statuses and use them later in trigger events, analysis reports, or search and filter activity.
L
Label
A piece of text that contains information about an annotation instance. It also includes a color and a display name. Labels can be structured hierarchically.
Labeling Tasks
Labeling is formalized within a project by creating a labeling task for an entire dataset or specific items. Items completed by annotators (with a current status of "Completed") can then be assigned to a QA task for approval (status of "Approved"). If errors are found, an issue is assigned to the annotation, sending it back to the original annotator for correction. Once corrected, the annotation is reassigned for review. If the correction is satisfactory, it is marked as "Approved."
Linked Item
Linked items are a way to connect files hosted in your external storage to the platform, using URL links. The Dataloop platform supports displaying JSON files through Annotation Studios as Items that can be annotated, downloaded, and treated as images. Links enable displaying an Image in the Dataloop platform without storing it on Dataloop servers. The JSON file serves as a pointer to the binary file stored on the client storage.
M
Member
A Member of an Organization can open new Projects and view the Organization's Members. Members cannot add/remove other Members or delete the Organization.
Metadata
A dictionary object that contains the metadata of a Dataloop object. It usually contains a “system” area, reserved for use by the Dataloop platform. Any fields outside “system” are considered custom “user” metadata.
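The split between the reserved area and custom fields can be sketched as follows. The example field names (`mimetype`, `size`, `reviewer`, `batch`) and the helper `split_metadata` are hypothetical; only the reserved "system" key comes from the definition above.

```python
def split_metadata(metadata):
    """Separate the reserved "system" area from custom user fields."""
    system = metadata.get("system", {})
    user = {key: value for key, value in metadata.items()
            if key != "system"}
    return system, user


# Hypothetical metadata dictionary for illustration.
metadata = {
    "system": {"mimetype": "image/jpeg", "size": 20480},
    "reviewer": "alice",
    "batch": 7,
}

system_fields, user_fields = split_metadata(metadata)
print(system_fields)
print(user_fields)
```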
Modality
Modalities are a feature in the Dataloop platform that allows defining relationships between the main Item (where the relation is created) and other Items. Overlay modalities are overlaid on top of each other in the image Annotation studio, Replace modalities are opened in the studio instead of the main Item (for example, following a format change), and Preview modalities present reference Items related to the main Item.
Model
A Model (architecture) entity in Dataloop refers to a machine learning algorithm that has been trained on labeled data to make predictions or perform other Tasks. Models are a key component of the Dataloop system, as they enable users to apply machine-learning techniques to a wide range of data types, including images, text, and audio. A Dataloop Model is a combination of data configurations and code to represent a learnable instance of the data.
Model Adapter
A Python class that wraps generic ML code (train, predict, etc.) to standardize models (and frameworks) so that they match the requirements of the Dataloop API. It can contain a Log Sample to measure and compare different models and metrics, e.g. saving the training/validation loss and accuracy of a training session.
Model Version
A baseline version created from a model architecture in the AI Library (usually untrained, and then starts training on selected data), or from an existing Model Version.
Model Weights or Artifacts
Model Weights (or Model Artifacts) are files saved or created by training a machine learning algorithm with a dataset. They are tuned by optimization algorithms and are unique to each training session.
Module
A Module is a reference to a Python file that contains the Python class (ServiceRunner by default) with functions inside it. Modules can contain functions, classes, or other components of code, and can be used to perform specific data-related Tasks.
N
Neural Network
A machine learning model composed of layers of simple connected units (neurons) followed by nonlinearities.
Node
The Pipeline consists of different Nodes, each of which has a different role in the Pipeline, such as storing data, executing functions, training models, or sending data to Annotation or QA Tasks. The main types are Dataset, Workflow, FaaS, Code, and Utilities.
O
Ontology
A set of definitions that defines the structure and relationships of your labels. The Ontology of a Dataset is a building block of your model and defines the object detection provided by your trained model. It contains two important components, labels and attributes, that are used in your Project.
Organization
Consists of one or more users collaborating on data-related projects and sharing resources and data. An Organization is composed of multiple elements like Integration/Secrets, Members, Bots, and Computer-Cache.
Owner
An Owner in Dataloop represents the user who created the Organization. This user cannot be removed. An owner can delete/rename an Organization, create Projects, and add/remove Organization Members.
P
Package
A Package is a collection of static code with a schema that holds all the Modules, functions, and the code base from which they can be taken. Packages are used to build models or deploy services and refer to an entity that is processed using the "Function-as-a-Service" (FaaS) technology. Packages can be public, global, or specific to a particular Project and are limited to a maximum size of 100MB.
Pagination
Dataloop uses pages instead of a list when an object contains many entries. The page object divides a large list into pages (with a default of 1000 items) to save time when going over the entries. You can redefine the number of entries per page with the ‘page_size’ attribute. To go over all entries across multiple pages, use nested loops: the outer loop goes over the pages and the inner loop goes over the entities in each page.
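The nested-loop pattern can be illustrated without the SDK. The generator below is a generic stand-in for a paged result, not the Dataloop page object; `paginate` and the sample data are hypothetical, and only the default page size of 1000 comes from the text above.

```python
def paginate(entries, page_size=1000):
    """Yield consecutive slices of `entries`, each at most `page_size` long."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    for start in range(0, len(entries), page_size):
        yield entries[start:start + page_size]


entries = list(range(2500))  # stand-in for a large result set

visited = []
for page in paginate(entries):   # outer loop: pages
    for entity in page:          # inner loop: entities in the page
        visited.append(entity)

page_lengths = [len(page) for page in paginate(entries)]
print(page_lengths)
print(len(visited))
```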
Pipeline
A collection of functions (machine processing via FaaS, models and code snippets) and tasks (human processing in labeling and QA tasks) that creates a processing flow. Structured from nodes (processing units) and connections (to move data types between nodes).
Polygon or Polyline
Polygon and Polyline are Annotation types, each represented by a list of (x,y) points.
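Because the representation is just a list of points, simple geometry can be computed directly from it. The sketch below applies the shoelace formula to a closed polygon; it is a generic illustration that does not apply to open polylines, and the function name `polygon_area` is hypothetical.

```python
def polygon_area(points):
    """Return the area of a simple closed polygon (shoelace formula).

    `points` is a list of (x, y) tuples in drawing order; the last
    point is implicitly connected back to the first.
    """
    if len(points) < 3:
        raise ValueError("a polygon needs at least three points")
    twice_area = 0.0
    for index, (x1, y1) in enumerate(points):
        x2, y2 = points[(index + 1) % len(points)]
        twice_area += x1 * y2 - x2 * y1
    # abs() makes the result independent of clockwise vs.
    # counter-clockwise point order.
    return abs(twice_area) / 2


rectangle = [(0, 0), (4, 0), (4, 3), (0, 3)]
print(polygon_area(rectangle))
```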
Project
A project is a high-level organizational entity that defines and contains a scope of work and includes entities, such as Datasets, Recipes, Workflows, Contributors, FaaS, etc. It provides a centralized location for managing data, tasks, and annotations related to a specific work context of ML/AI development. It improves collaboration between team members, consistency of data and annotations, and tracking of progress and results.
R
Recipe
A Recipe is a set of instructions or rules that define how data must be processed, labeled, or analyzed within a Project. Recipes can be templates or workflows that provide a standardized way of working with data and can help to streamline the process of generating labeled Datasets for machine learning and other applications. Linked with an Ontology, the Recipe adds labeling instructions and settings, such as labeling tools, mapping of tools to specific labels/Attributes, a PDF instructions file, etc.
Repository
A collection of entities, which are usually queried (e.g. using a filter), or referred to (for example all Items in a Dataset entity). It allows performing bulk operations (for example, Deleting all items), or addressing each entity within the repository (for example every Item in an Items collection).
S
Secrets
Secrets enable Dataloop Organizations to define tokens for accessing cloud Resources, including cloud storage such as GCS/S3, Secure Token Service (STS), container registry Services (ECR/GCR), and others. Once the integration is defined for cloud storage, a storage driver must be created with storage details such as bucket, folder, etc.
Semantic Segmentation Masks
A binary mask Annotation type, represented by a mask of the same size as the image.
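The "same size as the image" requirement can be made concrete with a small check. This sketch stores the mask as nested Python lists for simplicity; how masks are actually stored is not described here, and `foreground_pixels` is a hypothetical helper.

```python
def foreground_pixels(mask, image_width, image_height):
    """Count mask pixels that are set, after checking the mask size.

    `mask` is a list of rows; each row is a list of 0/1 values.
    """
    if len(mask) != image_height or any(
            len(row) != image_width for row in mask):
        raise ValueError("mask must have the same size as the image")
    return sum(1 for row in mask for value in row if value)


# A 3x2 image (width 3, height 2) with three annotated pixels.
mask = [
    [0, 1, 1],
    [0, 0, 1],
]
print(foreground_pixels(mask, image_width=3, image_height=2))
```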
Service
A serverless computing Service that allows users to run code without the need to manage servers or infrastructure. It can also be thought of as a deployed Package that serves the code. Given the matching input to a function, it will run and return the output.
T
Task
A unit of work that is completed by an individual or a team. It can be annotating data, reviewing Annotations, labeling images, performing quality assurance checks, or any other data-related Task that requires human input. Tasks are created by managers, who define the requirements for each Task, such as the data type to be labeled, Annotation instructions, due date, priority, and the number of annotators required. Tasks are then assigned to individual annotators or teams of annotators.
Transcription
A type of Annotation used to convert spoken words or written text into machine-readable formats.
Trigger
A rule-based mechanism that initiates an action when a specific event occurs. Triggers are used to automate workflows and streamline data processing. They are created by defining a set of conditions that must be met for the Trigger to be activated. These conditions can be based on a variety of factors, such as the content of data, the time of day, or the occurrence of specific events. See ‘Event’ and ‘Cron’ trigger types.
U
UI Slot
UI slots create a button in the Dataloop platform and allow managers to associate FaaS functions with that location, so users can invoke them when needed. Once a Slot is activated, users can execute the function through the UI Slot in the Dataset browser, Task browser, or Annotation studio.
Users
Users are added to the project either from the organization that owns the project (typically for higher-level roles) or from other organizations (or possibly no organization at all). A project can have one or more Project Owners and Developers. The Project Owner can add Annotation Managers, who oversee the day-to-day work of all Annotators.
W
Worker
The Worker role in an Organization is designated for users with no permissions to the Organization itself. It is typically used for adding an Annotation workforce and arranging it into Task Groups, which can be added to Projects managed by the Organization. Accordingly, workers cannot view the list of Organization Members, access any Secrets, or open new Projects.