- 12 Mar 2025
Model nodes
Overview
Pipeline automation means building an end-to-end workflow that covers data ingestion, preprocessing, model training, evaluation, and prediction. Model nodes are the pipeline components that perform the model-related steps of this workflow. The sections below describe how each model node works within an automated pipeline.
If an integration and secret have not been configured yet, a warning notification prompts you to set them.
Predict model node
The Predict Model node in Dataloop Pipelines automates annotation by using pre-trained or custom models (e.g., SAM, YOLO, ResNet). It helps identify samples that need human review, speeds up annotation, and optimizes resource use. It also improves model performance through continuous learning and can prepare data for honeypot or qualification tasks. Unlike FaaS, it manages compute settings automatically.
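The Predict Model node's own criteria for flagging samples are not described here. As an illustration only, one common approach is to send low-confidence predictions to human review. The Python sketch below assumes each item's predictions arrive as (label, confidence) pairs; the route_for_review function and its 0.6 threshold are hypothetical and not part of the Dataloop SDK.

```python
from typing import Dict, List, Tuple

# Hypothetical prediction output: item ID -> list of (label, confidence) pairs.
Predictions = Dict[str, List[Tuple[str, float]]]


def route_for_review(
    predictions: Predictions, threshold: float = 0.6
) -> Tuple[List[str], List[str]]:
    """Split items into auto-accepted and needs-human-review lists.

    An item goes to review if it has no predictions, or if any of its
    predictions has a confidence below the threshold (an arbitrary example value).
    """
    accepted: List[str] = []
    review: List[str] = []
    for item_id, preds in predictions.items():
        if not preds or min(conf for _, conf in preds) < threshold:
            review.append(item_id)
        else:
            accepted.append(item_id)
    return accepted, review


preds = {
    "item-1": [("car", 0.94), ("person", 0.88)],
    "item-2": [("car", 0.41)],
    "item-3": [],
}
print(route_for_review(preds))
```

Lowering the threshold sends fewer items to review at the cost of accepting less certain predictions.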

If the node has no integration or secret configured, a notification appears with a link to set one.
When you click a Predict Model node, its details (Configuration, Executions, Logs, Instances, and available actions) appear in the right-side panel.
Config Tab
- Node Name: By default, the node is named Predict Model. You can rename it as needed.
- Model: Select a fixed model or a variable for prediction. You can choose a specific trained, pre-trained, or deployed model version from the list, or use a variable that can be updated during pipeline execution.
- Set Fixed Model: Sets the selected model as the fixed model for prediction.
- Set Variable: Sets a pipeline variable as the model (see Step 3).
- Node Input: Input channels are of type Item by default. Click Set Parameter to set an input parameter for the Predict Model node. For more information, see the Node Inputs article.
- To use the Predict node, connect it to a downstream node that has one of the following inputs:
- Item & annotation[]: Each Predict node execution triggers one execution of the connected node.
- Item & annotation: Each Predict node execution triggers N executions of the connected node, where N is the number of annotations in the annotation list. Each execution runs with one item-annotation pair.
- Item/Annotation: Each Predict node execution triggers one execution of the connected node.
- Annotation[]: Each Predict node execution triggers one execution of the connected node.
- Node Output: Output channels are of type Annotation by default.
- Trigger (Optional): An Event/Cron trigger can be set on this node, enabling you to initiate the pipeline run from this specific point.
For more information, see the Executions, Logs, and Instances articles.
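The input rules above determine how many times a connected node runs for each Predict execution. The Python sketch below models that fan-out, assuming a prediction result is simply an item ID plus a list of annotation IDs; the function downstream_executions and the input-type strings are hypothetical labels, not part of the Dataloop SDK.

```python
from typing import List, Tuple

# Hypothetical shape of one Predict execution's result: an item ID and the
# annotations the model produced for it.
PredictResult = Tuple[str, List[str]]


def downstream_executions(result: PredictResult, input_type: str) -> list:
    """Return one payload per execution the connected node would run.

    input_type is a hypothetical label for the connected node's input signature.
    """
    item_id, annotations = result
    if input_type == "item+annotation[]":
        # One execution receiving the item and the whole annotation list.
        return [(item_id, annotations)]
    if input_type == "item+annotation":
        # Fan-out: N executions, one per (item, annotation) pair.
        return [(item_id, annotation) for annotation in annotations]
    if input_type == "annotation[]":
        # One execution receiving only the annotation list.
        return [annotations]
    if input_type == "item":
        # One execution receiving only the item.
        return [item_id]
    raise ValueError(f"unsupported input type: {input_type}")


result = ("item-1", ["box-a", "box-b", "box-c"])
print(len(downstream_executions(result, "item+annotation[]")))  # 1
print(len(downstream_executions(result, "item+annotation")))    # 3
```

The list input keeps the downstream node to a single run per item, while the single-annotation input lets each annotation be processed (or reviewed) independently.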
Predict node actions
The Predict node supports the following actions:

- Open Service Page
- Edit Service Settings
- Edit Access Credentials
- Open Model
- Open GitHub Code
- Copy Node ID
- View Analytics
- View Executions
- View Logs
Generate model node
The Generate Model node integrates a generative AI model into your pipeline. It processes a prompt item to generate a response and returns the result as an annotation. You can customize settings such as the system prompt, max tokens, and temperature directly from the node configuration.

When you click the Generate node, its details (Configuration, Executions, Logs, Instances, and available actions) appear in the right-side panel.
If the node has no integration or secret configured, a notification appears with a link to set one.
The Generate node details are presented in four tabs:
Config Tab:
- Node Name: By default, the node is named Generate. You can rename it if required.
- Model: Select a generative model from the list. Either Set a fixed model, or Set variable (Type: Model) to use a model that can be updated dynamically during pipeline execution. If no model is available, click the Install Foundation Model link to install one from the Marketplace. You can also click the model and application names to view more information.
- Model Configuration: Adjust the model's configuration directly from the node settings, including:
- System prompt: Provides context or sets the tone for the model's responses. It acts as an initial instruction that influences how the model generates its output. Click Edit to change the system prompt.
- Max tokens: The maximum number of tokens (words or word pieces) the model can generate in response to a prompt. The default is 1024.
- Temperature: Controls the randomness of the model's output, that is, how creative or conservative the generated text is. The default is 0.5, which balances deterministic and random output. A lower temperature (closer to 0) makes the model more focused and deterministic, producing more predictable responses. A higher temperature (closer to 1) increases randomness, producing more diverse and creative responses.
- Additional Parameters: Click Edit to change other model configuration parameters.
- Input: By default, Type: Item is selected.
- Output: By default, Type: Item and Annotations are available.
- Trigger (Optional): An Event/Cron trigger can be set on this node, enabling you to initiate the pipeline run from this specific point.
For more information, see the Executions, Logs, and Instances articles.
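To make the Temperature setting concrete, the Python sketch below applies the standard temperature-scaled softmax that many language models use to turn raw token scores (logits) into sampling probabilities. Whether a particular Marketplace model applies temperature exactly this way depends on that model's implementation, and the logits are made-up values. Many APIs treat a temperature of 0 as greedy decoding; this sketch simply requires a positive value.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Convert raw scores into probabilities, scaled by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # illustrative scores for three candidate tokens
for t in (0.2, 0.5, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Dividing by a small temperature widens the gaps between scores, so the top token absorbs most of the probability (more deterministic output); a larger temperature flattens the distribution, so less likely tokens are sampled more often.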
Generate node actions
The Generate node supports the following actions:

- Open Service Page
- Edit Service Settings
- Edit Access Credentials
- Open Model
- Open GitHub Code
- Copy Node ID
- View Analytics
- View Executions
- View Logs
Train model node
The Train Model node trains a new model version on labeled data, optimizing its parameters. It takes an untrained model as input, and once training completes, the trained model continues to the next step of the pipeline. The node triggers the train() function for accessible models with the Created status.

When you click a Train Model node, its details (Configuration, Executions, Logs, and available actions) appear in the right-side panel.
If the node has no integration or secret configured, a notification appears with a link to set one.
Config Tab:
- Node Name: By default, the node is named Train Model. You can rename it as needed.
- Model Application: Select an installed trainable model for training. If no model is available, click the Install Foundation Model link to install a trainable model from the Marketplace, then select it from the list.
- Input: By default, Type: Model is selected. The input is the model you want to train.
- Output: By default, Type: Model is selected. The output is the trained model, whose status is updated to Trained.
- Trigger (Optional): An Event/Cron trigger can be set on this node, enabling you to initiate the pipeline run from this specific point.
For more information, see the Executions, Logs, and Instances articles.
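The status rule for the Train Model node (train only models in the Created status, output a model in the Trained status) can be illustrated with a small Python sketch. The Model class, run_train_node, and the lowercase status strings are hypothetical simplifications rather than the Dataloop SDK, and raising an error for other statuses is an assumption; in a real pipeline, the model's adapter implements train().

```python
from dataclasses import dataclass


@dataclass
class Model:
    """Hypothetical stand-in for a model entity; only the fields this sketch needs."""

    name: str
    status: str  # e.g. "created" or "trained"


def train(model: Model) -> None:
    """Placeholder for the model adapter's train() logic (load data, fit, save artifacts)."""


def run_train_node(model: Model) -> Model:
    """Train a model only if its status is 'created', then mark it 'trained'."""
    if model.status != "created":
        raise ValueError(
            f"model {model.name!r} has status {model.status!r}; expected 'created'"
        )
    train(model)
    model.status = "trained"  # the node's output model carries the Trained status
    return model


trained = run_train_node(Model(name="my-model", status="created"))
print(trained.status)  # trained
```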
Train node actions
The Train node supports the following actions:

- Open Service Page
- Edit Service Settings
- Edit Access Credentials
- Open GitHub Code
- Copy Node ID
- View Executions
- View Logs
Evaluate model node
The Evaluate Model node generates predictions from a trained model on a test dataset and compares them to ground truth annotations. It uploads evaluation scores for downstream use, such as in the Compare Models node. Performance is assessed using predefined or custom evaluation metrics.

If the node has no integration or secret configured, a notification appears with a link to set one.
When you click an Evaluate Model node, its details (Configuration, Executions, Logs, and available actions) appear in the right-side panel.
Config Tab
- Node Name: By default, the node is named Evaluate Model. You can rename it as needed.
- Model Application: Select an installed model for evaluation. If no model is available, click the Install Foundation Model link to install one from the Marketplace, then select it from the list.
- Node Input: By default, the following are selected:
- Type: Model: The trained model to evaluate.
- Type: Dataset: The dataset used to evaluate the trained model, for example, the ground-truth dataset in an active learning pipeline.
- Type: JSON (filters): The filter that determines which items are used to evaluate the trained model.
- Node Output: By default, the following are selected:
- Type: Model: The evaluated model.
- Type: Dataset: The dataset on which the trained model was evaluated.
- Trigger (Optional): An Event/Cron trigger can be set on this node, enabling you to initiate the pipeline run from this specific point.
For more information, see the Executions, Logs, and Instances articles.
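The metrics the Evaluate Model node actually computes depend on the model and its adapter. As one example of comparing predictions with ground-truth annotations, the Python sketch below scores single-label classification results keyed by item ID, reporting accuracy and macro-averaged precision, recall, and F1. The function name and data layout are hypothetical.

```python
from typing import Dict


def evaluate_classification(
    predictions: Dict[str, str], ground_truth: Dict[str, str]
) -> Dict[str, float]:
    """Score predicted labels against ground-truth labels, keyed by item ID.

    Returns accuracy plus precision, recall, and F1 macro-averaged over the
    ground-truth labels. Items with no prediction count as misses.
    """
    labels = sorted(set(ground_truth.values()))
    correct = sum(1 for item, gt in ground_truth.items() if predictions.get(item) == gt)
    accuracy = correct / len(ground_truth)

    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(1 for i, gt in ground_truth.items() if gt == label and predictions.get(i) == label)
        fp = sum(1 for i, gt in ground_truth.items() if gt != label and predictions.get(i) == label)
        fn = sum(1 for i, gt in ground_truth.items() if gt == label and predictions.get(i) != label)
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


ground_truth = {"a": "cat", "b": "cat", "c": "dog", "d": "dog"}
predictions = {"a": "cat", "b": "dog", "c": "dog", "d": "dog"}
print(evaluate_classification(predictions, ground_truth))
```

Scores like these are what a downstream Compare Models node would consume.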
Evaluate node actions
The Evaluate node supports the following actions:

Embeddings model node
The Embedding node allows you to integrate any embedding model into your pipeline, either directly from the Dataloop marketplace or using your own custom model. Extract embeddings from your data for tasks such as similarity search, clustering, data curation, RAG pipelines, and more.

If the node has no integration or secret configured, a notification appears with a link to set one.
When you click the Embeddings node, its details (Configuration, Executions, Logs, Instances, and available actions) appear in the right-side panel.
Config Tab:
- Node Name: By default, the node is named Embeddings. You can rename it if required.
- Model: Select an embedding model from the list. The list includes all model entities available in the project whose dpk.module provides an 'embed' or 'embed_items' function. Either Set a fixed model, or Set a variable (Type: Model) that can be updated dynamically during pipeline execution. If no model is available, click the Install Foundation Model link to install one from the Marketplace. You can also click the model and application names to view more information.
- Input: By default, Type: Item is selected.
- Output: By default, Type: Item and JSON are available.
- Trigger (Optional): An Event/Cron trigger can be set on this node, enabling you to initiate the pipeline run from this specific point.
For more information, see the Executions, Logs, and Instances articles.
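Once the Embeddings node has produced vectors, a task such as similarity search ranks items by how close their vectors are to a query vector. The Python sketch below uses cosine similarity over a plain dictionary; the dictionary is a hypothetical stand-in for wherever your pipeline stores the node's JSON output, and the 3-dimensional vectors are illustrative (real embedding models produce hundreds of dimensions).

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(
    query: List[float], index: Dict[str, List[float]], k: int = 3
) -> List[Tuple[str, float]]:
    """Rank stored embeddings by cosine similarity to the query vector."""
    scored = [(item_id, cosine_similarity(query, vec)) for item_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


index = {
    "item-a": [1.0, 0.0, 0.0],
    "item-b": [0.9, 0.1, 0.0],
    "item-c": [0.0, 1.0, 0.0],
}
print(top_k_similar([0.8, 0.3, 0.0], index, k=2))
```

A linear scan like this is fine for small collections; large collections usually rely on an approximate nearest-neighbor index instead.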
Embeddings node actions
The Embeddings node supports the following actions:

- Edit Service Settings
- Edit Access Credentials
- Open Model
- Open GitHub Code
- Copy Node ID
- View Executions
- View Logs
Compare models node
The Compare Models node compares two trained model versions using evaluation results from the same test set or training metrics. It requires a comparison configuration (JSON) and two models: Previous and New. If the New model outperforms the Previous model, it is marked as "Update model" for deployment; otherwise, it is labeled "Discard."

The parameters that need defining are:
- Node Inputs:
- previous_model: By default, Type: Model is selected.
- new_model: By default, Type: Model is selected.
- compare_config: The comparison configuration (JSON). By default, Type: JSON is selected.
- dataset: By default, Type: Dataset is selected.
- Output:
- winning_model: By default, Type: Model is selected. Available statuses are Update Model and Discard.
- Trigger (Optional): An Event/Cron trigger can be set on this node, enabling you to initiate the pipeline run from this specific point.
For more information, see the Executions, Logs, and Instances articles.
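The Compare Models decision (mark the New model "Update model" if it outperforms the Previous model, otherwise "Discard") can be sketched in Python as follows. The structure of compare_config used here (a metric name, a direction, and a minimum improvement) is a hypothetical example, not the node's actual JSON schema, and the scores are made up.

```python
from typing import Dict

UPDATE = "Update model"
DISCARD = "Discard"


def compare_models(
    previous_scores: Dict[str, float], new_scores: Dict[str, float], compare_config: Dict
) -> str:
    """Return UPDATE if the new model beats the previous one on the configured metric.

    compare_config is a hypothetical structure, for example:
      {"metric": "f1", "higher_is_better": True, "min_improvement": 0.01}
    """
    metric = compare_config["metric"]
    higher_is_better = compare_config.get("higher_is_better", True)
    min_improvement = compare_config.get("min_improvement", 0.0)

    prev, new = previous_scores[metric], new_scores[metric]
    improvement = new - prev if higher_is_better else prev - new
    # A tie is not an improvement, so the previous model is kept.
    return UPDATE if improvement > min_improvement else DISCARD


print(compare_models({"f1": 0.80}, {"f1": 0.85}, {"metric": "f1"}))
print(compare_models({"loss": 0.5}, {"loss": 0.4}, {"metric": "loss", "higher_is_better": False}))
```

Requiring a minimum improvement avoids redeploying a model whose gain is within noise on the test set.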
Compare node actions
The Compare node supports the following actions:

Create new model node
The 'Create New Model' node clones an existing model to create a new version for fine-tuning. It requires a base model, model configurations (in JSON), and datasets with subsets for training and validation. The node outputs the new model after execution.

The parameters that need defining are:
- Node Inputs:
- base_model: The model to clone.
- dataset: The dataset to train on.
- train_subset: The DQL query for the subset of training items (JSON).
- validation_subset: The DQL query for the subset of validation items (JSON).
- model_configuration: The model configuration to use for training (JSON).
- Output:
- new_model: The newly created model.
- base_model: The model to clone.
- Trigger (Optional): An Event/Cron trigger can be set on this node, enabling you to initiate the pipeline run from this specific point.
For more information, see the Executions, Logs, and Instances articles.
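The three JSON inputs of the Create New Model node might look like the Python dictionaries below. This is a sketch under assumptions: it assumes DQL filters use a {"filter": {"$and": [...]}} shape and that a preceding split step tagged items under metadata.system.tags.train and metadata.system.tags.validation; verify both against your own dataset. The training keys (epochs, batch_size, learning_rate) are hypothetical, since valid keys depend on the base model's adapter.

```python
import json

# Assumed DQL queries selecting items tagged by an earlier split step.
train_subset = {"filter": {"$and": [{"metadata.system.tags.train": True}]}}
validation_subset = {"filter": {"$and": [{"metadata.system.tags.validation": True}]}}

# Hypothetical training configuration; real keys depend on the model adapter.
model_configuration = {"epochs": 10, "batch_size": 8, "learning_rate": 0.001}

# Serialize to the JSON text a pipeline variable or node input would hold.
print(json.dumps(train_subset))
print(json.dumps(validation_subset))
print(json.dumps(model_configuration, indent=2))
```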
Create new model node actions
The Create New Model node supports the following actions:

Model data split node
The Model Data Split node divides your data into training, validation, and test subsets at runtime. You define the distribution, and the node automatically assigns each item to a subset by tagging it in its metadata.

The parameters that need defining are:
- Node Name: Provide a name for the split node.
- Subset Distribution: Defines how the data is distributed into train, validation, and test subsets.
- (Optional) Distribute equally: Select the checkbox to split the data equally among the subsets.
- Adjust the distribution percentages if required.
- Item Tags: By default, items are tagged based on their assigned subset name.
- Input: By default, Type: Item is selected.
- Output: By default, Type: Item is selected.
- Trigger (Optional): An Event/Cron trigger can be set on this node, enabling you to initiate the pipeline run from this specific point.
For more information, see the Executions, Logs, and Instances articles.
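The split behavior described above (a percentage per subset, an optional equal distribution, and a tag named after each item's subset) can be sketched in Python. The Model Data Split node handles this internally; the seeded shuffle and the rule that rounding remainders go to the last subset are assumptions made for this illustration, and split_items and equal_distribution are hypothetical names.

```python
import random
from typing import Dict, List


def split_items(
    item_ids: List[str], distribution: Dict[str, float], seed: int = 0
) -> Dict[str, List[str]]:
    """Assign items to subsets according to percentage weights.

    distribution maps subset name -> percentage, e.g.
    {"train": 80, "validation": 10, "test": 10}. Percentages must sum to 100.
    Rounding remainders go to the last subset.
    """
    if abs(sum(distribution.values()) - 100) > 1e-9:
        raise ValueError("distribution percentages must sum to 100")
    shuffled = list(item_ids)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible

    subsets: Dict[str, List[str]] = {}
    names = list(distribution)
    start = 0
    for i, name in enumerate(names):
        if i == len(names) - 1:
            end = len(shuffled)  # last subset takes whatever is left
        else:
            end = start + int(len(shuffled) * distribution[name] / 100)
        subsets[name] = shuffled[start:end]
        start = end
    return subsets


def equal_distribution(names: List[str]) -> Dict[str, float]:
    """'Distribute equally': give every subset the same percentage."""
    return {name: 100 / len(names) for name in names}


items = [f"item-{i}" for i in range(10)]
split = split_items(items, {"train": 80, "validation": 10, "test": 10})
# Tag each item with the name of the subset it was assigned to.
tags = {item: name for name, members in split.items() for item in members}
print({name: len(members) for name, members in split.items()})
```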
Model data split node actions
The Model Data Split node supports the following actions:

Model application nodes
Model application nodes in Dataloop's pipeline allow flexible interaction with machine learning models. They provide an interface to perform various operations, like prediction, generation, or embedding, and adapt easily to different tasks in your workflow without reconfiguring the model.
- Predict Operation: This function is typically used for generating outputs based on new input data. For example, if your model is designed for image classification, the "Predict" operation would produce the most likely category or label for a given image.
- Embed Operation: This function is used to create embeddings, which are compact, numerical representations of input data. These embeddings can be useful for various tasks, including clustering, similarity search, or as inputs to other models for more complex processing.
- Generative AI Operation: Creates new, original content rather than only analyzing existing data or predicting outcomes from it.
Install model application nodes: To add Dynamic Model Nodes, click the Plus icon in the left-side panel and install them from the Marketplace model library. You can also remove them from the list at any time.

Embeddings nodes
Model application embeddings nodes are pipeline shortcuts for working with ML embedding models. They let you quickly choose and apply the operations the model supports, such as Predict or Embed.
When you configure one of these nodes, the Node Configuration tab includes an Operation field where you select the operation the node performs. If the model supports both Predict and Embed, Predict is selected by default; you can switch to Embed manually if it better suits your needs.
Embeddings node actions
The model application embeddings node supports the following actions:

- Edit Service Settings
- Edit Access Credentials
- Open Model
- Open GitHub Code
- Copy Node ID
- View Executions
- View Logs
Generative model
Model application generate nodes are pipeline shortcuts for working with ML generative models. They let you quickly select and apply the operations the model supports, such as Generate or Embed.
When you configure one of these nodes, the Node Configuration tab includes an Operation field where you select the operation the node performs. If the model supports both Generate and Embed, Generate is selected by default; you can switch to Embed manually if it better suits your needs.
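For model application nodes, Predict is the default over Embed, and Generate is the default over Embed, unless you switch the Operation field manually. The Python sketch below illustrates that selection plus dispatching a call to the chosen operation on a single model. The relative order of Predict and Generate in PREFERENCE is an assumption (the behavior for a model supporting both is not specified), and ToyModel, default_operation, and run_node are hypothetical.

```python
from typing import List

# Assumed preference order: Predict and Generate are each preferred over Embed;
# the order between Predict and Generate is a guess.
PREFERENCE = ["predict", "generate", "embed"]


def default_operation(supported: List[str]) -> str:
    """Pick the operation a model application node selects by default."""
    for op in PREFERENCE:
        if op in supported:
            return op
    raise ValueError(f"no supported operation in {supported}")


class ToyModel:
    """Hypothetical model exposing several operations through one interface."""

    supported = ["generate", "embed"]

    def generate(self, prompt: str) -> str:
        return f"echo: {prompt}"

    def embed(self, prompt: str) -> List[float]:
        return [float(len(prompt))]


def run_node(model: ToyModel, payload: str, operation: str = "") -> object:
    """Run the configured operation, or the default one if none is set."""
    op = operation or default_operation(model.supported)
    if op not in model.supported:
        raise ValueError(f"operation {op!r} not supported")
    return getattr(model, op)(payload)


model = ToyModel()
print(run_node(model, "hi"))           # default operation: generate
print(run_node(model, "hi", "embed"))  # manually switched to embed
```

Because the operation is resolved at call time, the same model entity can serve different pipeline steps without being reconfigured.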
For the actions available on each node in the right-side panel, see the Pipeline Node Actions article.
Dynamic generative node actions
The dynamic generative node supports the following actions:

- Edit Service Settings
- Edit Access Credentials
- Open Model
- Open GitHub Code
- Copy Node ID
- View Executions
- View Logs