Data nodes
Dataset Nodes
The Dataset node offers flexibility in data pipelines, allowing you to generate a new dataset or use an existing one for filtering or storage. It can be placed at the beginning, middle, or end of the pipeline to streamline data processing.

- At the beginning: The Dataset node filters triggered items, ensuring only selected dataset items (and folders, if specified) are processed. However, these items won’t automatically trigger the pipeline; you must use an event trigger, execute it manually via the Dataset Browser, or use the Dataloop SDK (see the sketch after this list).
- In the middle or end: The Dataset node clones processed items to the specified dataset or folder. If an item already exists at the destination, it is skipped.
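For reference, here is a minimal Python sketch of starting a run through the Dataloop SDK (dtlpy). All names ('my-project', 'my-pipeline', 'my-dataset', '/image.jpg') are placeholders, and the single-item execution input assumes the pipeline's start node expects an item.

```python
import dtlpy as dl

# Authenticate if the cached token has expired.
if dl.token_expired():
    dl.login()

# Placeholder names; replace with your own project and pipeline.
project = dl.projects.get(project_name='my-project')
pipeline = project.pipelines.get(pipeline_name='my-pipeline')

# Pick an item that matches the Dataset node's dataset/folder selection.
dataset = project.datasets.get(dataset_name='my-dataset')
item = dataset.items.get(filepath='/image.jpg')

# Execute the pipeline with the item as input.
pipeline.execute(execution_input={'item': item.id})
```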
When you click a Dataset node, its details, such as Configuration, Executions, Logs, and available Actions, are shown on the right-side panel.
Config Tab
- Node Name: Provide a name for the Dataset node. By default, the node type is displayed as the name.
- Dataset: Select an existing dataset, or click Create Dataset to create a new one (a dataset can also be created through the SDK; see the sketch after this list). A Dataset node can have only one input channel and one output channel.
- Set Fixed Dataset or Set Variable: Choose whether the selected dataset is used as a fixed dataset or is provided through a pipeline variable.
- Folder (Optional): Select a folder within the selected dataset. This option will not be accessible if no dataset is selected.
- Trigger Existing Dataset and Folder Data to the Pipeline:
- Enable this option to automatically load the existing data into the pipeline through this Dataset node upon activation, based on the chosen dataset, folder, and any DQL filter in the trigger.
- This option is only available when this node is the start node.
- Note: This is a one-time action and does not re-trigger after changes to the dataset, folder, or filters, or if the pipeline is paused and resumed.
- Node Input: Input channels are set to type item by default. Click Set Parameter to set an input parameter for the Dataset node. For more information, see the Node Inputs article.
- Node Output: Output channels are set to type item by default.
- Trigger (Optional): An Event or Cron trigger can be set on this node, enabling you to initiate the pipeline run from this specific point.
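As a side note, a dataset (and a folder inside it) can also be created ahead of time with the SDK. The following is a minimal sketch; the names are placeholders, and make_dir is used on the assumption that the target folder does not exist yet.

```python
import dtlpy as dl

project = dl.projects.get(project_name='my-project')

# Create the dataset the node will read from or write to (placeholder name).
dataset = project.datasets.create(dataset_name='pipeline-output')

# Optionally create the folder the node's Folder field will point at.
dataset.items.make_dir(directory='/processed')
```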
For information on the Executions and Logs tabs, see the Node Details article.
Dataset node actions
The Dataset node allows you to perform the following actions:

Update variable nodes
The Update Variable node allows you to manage pipeline variables and update their values dynamically during pipeline execution.
- You can select the required variables from the dropdown list.
- The node input/output will be updated automatically according to your selection.
- When the Update Variable node executes, the input delivered to the node is set as the new value of the variable.
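Pipeline variables can also be read and updated from outside the pipeline through the SDK. The following is a minimal sketch; it assumes a pipeline that already defines a variable named confidence_threshold (a placeholder name) and uses the pipeline's variables attribute.

```python
import dtlpy as dl

project = dl.projects.get(project_name='my-project')
pipeline = project.pipelines.get(pipeline_name='my-pipeline')

# Find the variable by name and assign a new value.
for variable in pipeline.variables:
    if variable.name == 'confidence_threshold':
        variable.value = 0.75

# Persist the change back to the platform.
pipeline.update()
```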

When you click an Update Variable node, its details, such as Configuration, Executions, Logs, Instances, and available Actions, are shown on the right-side panel.
Config Tab
- Node Name: By default, Update Variable is displayed as the name. Change it as needed.
- Variables: Select required variables from the dropdown list, or create a new one.
- Node Input: It is set automatically after selecting a variable. Click Set Parameter to set an input parameter for the Update Variable node. For more information, see the Node Inputs article.
- Node Output: It will be set automatically after selecting a variable.
- Trigger (Optional): An Event or Cron trigger can be set on this node, enabling you to initiate the pipeline run from this specific point.
For information on the Executions, Logs, and Instances tabs, see the Node Details article.
Update variable node actions
The Update Variable node allows you to perform the following actions:

Data split nodes
The Data Split node is a powerful data processing tool that allows you to randomly split your data into multiple groups at runtime. Whether you need to sample items for QA tasks or allocate your ground truth into training, test, and validation sets, the Data Split node simplifies the process.

Simply define the groups, set their distribution, and optionally tag each item with its assigned group. The tag is appended to the item's metadata under metadata.system.tags (a list). Use the Data Split node at any point in the pipeline to tailor the data processing.
- Minimum groups: 2
- Maximum groups: 5
- The distribution must sum to 100%
For instance, to sample 20% of the annotated data for review (QA Task), create two groups ("Sampled"/"Not_Sampled") and set the required distribution (20-80). Afterward, add a node connection from the "Sampled" group to the QA task, ensuring that only 20% of the data is directed for QA during runtime.
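Once items carry the group tag, they can be queried with a DQL filter. The following is a minimal sketch, assuming a group named "Sampled" and using the IN operator for list membership on metadata.system.tags (verify the operator semantics against your data); all other names are placeholders.

```python
import dtlpy as dl

project = dl.projects.get(project_name='my-project')
dataset = project.datasets.get(dataset_name='my-dataset')

# Match items whose metadata.system.tags list contains the group name.
filters = dl.Filters()
filters.add(field='metadata.system.tags',
            values=['Sampled'],
            operator=dl.FiltersOperations.IN)

pages = dataset.items.list(filters=filters)
print(f'{pages.items_count} items were assigned to the "Sampled" group')
```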

The Data Split node details are presented in four tabs as follows:
Config
- Node Name: Display name on the canvas.
- Groups and Distribution: Create groups and manage the data distribution (%). At least 2 and at most 5 groups must be specified.
- Distribute equally: Select this option to force an equal distribution across the groups.
- Group Name and Distribution fields: Enter a name for each group and add its distribution percentage.
- Item Tags:
- Tag items based on their assigned group name: Selected by default, this option tags items once they are assigned to a group. The tag is the group name and is added to the item's metadata field metadata.system.tags (a list).
- Override existing item tags: When you select this option, tags already present on the item are replaced with the newly assigned tag. This option is disabled if you unselect the option above.
- Node Input: The item that will be automatically assigned to a group (randomly, based on the required distribution). Click Set Parameter to set an input parameter for the Data Split node. For more information, see the Node Inputs article.
- Node Output: The output is set automatically according to the defined groups.
- Trigger (Optional): An Event or Cron trigger can be set on this node, enabling you to initiate the pipeline run from this specific point.
For information on the Executions, Logs, and Instances tabs, see the Node Details article.
Data split node actions
The Data Split node allows you to perform the following actions:
