Model Nodes

15 Aug 2024

Pipeline automation involves creating an end-to-end automated workflow that includes data ingestion, preprocessing, model training, evaluation, and prediction. The Model Nodes are essential components in this automated pipeline. Here’s a detailed look at how each Model Node functions within pipeline automation.

Mandatory Secrets & Integrations

If an Integration and Secret have not been configured yet, a warning notification prompts you to set them.

Predict Model

A typical automated pipeline begins with the Predict node, which uses a pre-trained model to make predictions on unlabeled data. These predictions help identify samples that need human annotation. Unlike FaaS, the pipeline handles compute settings and resources for you.

The Predict Model node in Dataloop Pipelines is a valuable feature that lets you use pre-trained, trained, or deployed models such as SAM, YOLO, or ResNet, or your own custom models, for data annotation. This speeds up the annotation process and saves time and resources by automating annotation and producing pre-annotated data for human review and deployment.

Additionally, the node improves model performance through continuous learning by adding predicted annotations to the ground truth dataset. It can also be used for preparing data for honeypot or qualification tasks, customizing datasets for high-quality annotation work.
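
If you prefer to run the same prediction flow through the Dataloop Python SDK (dtlpy) rather than a pipeline node, a minimal sketch is shown below. The project, dataset, model, and file names are placeholders, and it assumes the model is already deployed and accessible to you.

```python
import dtlpy as dl

# Placeholder names; replace with entities from your own project
project = dl.projects.get(project_name='my-project')
dataset = project.datasets.get(dataset_name='unlabeled-data')
model = project.models.get(model_name='yolov8')

# Pick an unlabeled item and run prediction on it; the resulting
# annotations are uploaded as pre-annotations for human review
item = dataset.items.get(filepath='/images/sample.jpg')
execution = model.predict(item_ids=[item.id])
execution.wait()  # optionally block until the prediction execution finishes
```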

Details

When you click on a Predict Model node, its details, such as Configuration, Executions, Logs, Instances, and available Actions are shown on the right-side panel.

For the actions available on each node in the right-side panel, see the Pipeline Node Actions.

The Predict Model node details are presented in four tabs as follows:

Config Tab

  • Node Name: By default, Predict Model is displayed as the name. You can change it as needed.
  • Set Fixed Model or Set Variable: Allows you to set the selected model as a fixed model for prediction, or to set a Pipeline Variable instead (see Step 3).
  • Model: Select a model for prediction. You can select a specific trained, pre-trained, or deployed fixed model version from the list or use a variable that can be updated during the pipeline execution.
  • Node Input: Input channels are set to Type: Item by default. Click Set Parameter to set an input parameter for the Predict Model node. For more information, see the Node Inputs article.
  • Node Output: Output channels are set to be of type: Annotation by default.
  • Trigger (Optional): An Event/Cron trigger can be set on this node, enabling you to initiate the pipeline run from this specific point.

To use the Predict node, connect it to another node that accepts one of the following inputs:

  1. Item & annotation[]: Each predict node execution will initiate one execution of this node.
  2. Item & annotation: Each predict node execution will initiate N executions of this node, according to the number of items in the annotation list. Each execution will run with an item-annotation pair.
  3. Item/Annotation: Each predict node execution will initiate one execution of this node.
  4. Annotation[]: Each predict node execution will initiate one execution of this node.

For information on the Executions and Logs tabs, see the Node Details article.

Generate Model Node

The Generate Model node incorporates a generative AI model into your pipeline. The input is a prompt item that the model uses to generate a response, with the generated result returned as an annotation. You can easily adjust the model's configuration directly from the node settings, including options like system prompt, max tokens, temperature, and more.

Details

When you click on the Generate node, its details, such as Configuration, Executions, Logs, Instances, and available Actions are shown on the right-side panel.

The Generate model node details are presented in four tabs as follows:

Config Tab:

  • Node Name: By default, Generate is displayed as the name. You can change it if required.
  • Model: Select a Generate model from the list, either Set a fixed model or Set a variable (Type: Model) that can be updated dynamically during pipeline execution. If no model is available, click the Install Foundation Model link to install one from the Marketplace. You can also click the model or application name to view more information.
  • Model Configuration: You can easily adjust the model's configuration directly from the node settings (or via the SDK, as sketched after this list), including options like:
    • System prompt: The system prompt provides context or sets the tone for the model's responses. It acts as an initial instruction or guidance that influences how the model generates its output. Click Edit to make changes in the System Prompt.
    • Max tokens: It controls the maximum number of tokens (words or word pieces) the model can generate in response to a given prompt. By default, 1024 is displayed.
    • Temperature: It controls the randomness of the model's output. It influences how creative or conservative the generated text will be. The default temperature is set to 0.5, which strikes a balance between deterministic and random output. A lower temperature (closer to 0) makes the model more focused and deterministic, often generating more predictable responses. A higher temperature (closer to 1) increases randomness, leading to more diverse and creative responses.
    • Additional Parameters: Click Edit to make changes in the model configuration parameters.
  • Input: By default, Type: Item is selected.
  • Output: By default, Type: Item and Annotations are available.
  • Trigger (Optional): An Event/Cron trigger can be set on this node, enabling you to initiate the pipeline run from this specific point.
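
The same configuration values can also be read and updated through the SDK. The sketch below assumes a generative model is available in your project; the model name and the configuration keys (system_prompt, max_tokens, temperature) are illustrative and depend on the specific model adapter.

```python
import dtlpy as dl

project = dl.projects.get(project_name='my-project')
model = project.models.get(model_name='my-generate-model')  # placeholder name

# model.configuration is a plain dict; the keys below are adapter-specific
model.configuration['system_prompt'] = 'You are a concise technical assistant.'
model.configuration['max_tokens'] = 1024
model.configuration['temperature'] = 0.5
model.update()  # persist the updated configuration
```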

For information on the Executions, Logs, and Instances tabs, see the Node Details article.

For the actions available on each node in the right-side panel, see the Pipeline Node Actions.


Train Model

The Train Model node trains a newly created model version, optimizing its parameters on labeled data. You can include this AI model training node in your pipeline, using an untrained model created from a dataset as input. After training is complete, the trained model seamlessly continues to operate in the subsequent stages of the pipeline. The Train Model node allows you to trigger the train() function for any model that has a created status and is accessible to you.
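
As a rough SDK equivalent, the sketch below triggers train() on a model in created status and waits for the training execution to finish. The project and model names are placeholders.

```python
import dtlpy as dl

project = dl.projects.get(project_name='my-project')
model = project.models.get(model_name='yolov8-finetune-v1')  # a model in 'created' status

# Launch training; this returns an execution that runs on the training service
execution = model.train()
execution.wait()  # optionally block until training completes

# Refresh the entity to pick up the new status (expected: 'trained' on success)
model = project.models.get(model_id=model.id)
print(model.status)
```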

Details

When you click on a Train Model node, its details, such as Configuration, Executions, Logs, and available Actions are shown on the right-side panel.

Set an Integration

A notification with a link to set an integration will appear if no integration is set in the node.

The Train Model node details are presented in the following tabs:

Config Tab:

  • Node Name: By default, Train Model is displayed as the name. You can change it as needed.
  • Model Application: Allows you to select an installed trainable model for the training process. If no model is available, click the Install Foundation Model link to install a trainable model from the Marketplace. Once the model is installed, select it from the list.
  • Input: By default, Type: Model is selected. The input for the node is a model that you want to train.
  • Output: By default, Type: Model is selected. The output of the node is the trained model. The status will be updated to Trained.
  • Trigger (Optional): An Event/Cron trigger can be set on this node, enabling you to initiate the pipeline run from this specific point.

For information on the Executions and Logs tabs, see the Node Details article.

For the actions available on each node in the right-side panel, see the Pipeline Node Actions.

Evaluate Model

The Evaluate Model node creates predictions for a test dataset using the newly trained model and compares them against the ground truth annotations. The output is the same model used to generate the predictions. During execution, scores are uploaded to the platform so they are available further down the pipeline (e.g., for the Compare Models node). In this node, the performance of the trained model is evaluated using predefined evaluation metrics or custom evaluation logic.
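
For reference, a rough SDK sketch of the same step is shown below. It assumes model.evaluate accepts a dataset ID and a DQL filter as shown; the entity names and the subset tag used in the filter are placeholders.

```python
import dtlpy as dl

project = dl.projects.get(project_name='my-project')
model = project.models.get(model_name='yolov8-finetune-v1')  # trained model
test_dataset = project.datasets.get(dataset_name='ground-truth')

# Restrict evaluation to items tagged as the test subset (placeholder tag path)
filters = dl.Filters(field='metadata.system.tags.test', values=True)

# Predict on the filtered items, compare against ground truth, and upload scores
model.evaluate(dataset_id=test_dataset.id, filters=filters)
```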

Details

When you click on an Evaluate Model node, its details, such as Configuration, Executions, Logs, and available Actions are shown on the right-side panel.

For the actions available on each node in the right-side panel, see the Pipeline Node Actions.

The Evaluate Model node details are presented in the following tabs:

Config Tab

  • Node Name: By default, Evaluate Model is displayed as the name. You can change it as needed.
  • Model Application: Allows you to select an installed model for the evaluation process. If no model is available, click the Install Foundation Model link to install a model from the Marketplace. Once the model is installed, select it from the list.
  • Node Input: By default, the following are selected:
    • Type: Model: The trained model to evaluate.
    • Type: Dataset: The dataset used to evaluate the trained model, for example, the ground truth dataset in an active learning pipeline.
    • Type: JSON (filters): The DQL filter that selects the items used for the evaluation.
  • Node Output: By default, the following are selected:
    • Type: Model: The model after evaluation.
    • Type: Dataset: The dataset on which the trained model was evaluated.
  • Trigger (Optional): An Event/Cron trigger can be set on this node, enabling you to initiate the pipeline run from this specific point.

For information on the Executions and Logs tabs, see the Node Details article.


Embeddings Model

The Embeddings node allows you to integrate any embedding model into your pipeline, either directly from the Dataloop Marketplace or using your own custom model. It extracts embeddings from your data for tasks such as similarity search, clustering, data curation, RAG pipelines, and more.

Details

When you click on the Embeddings node, its details, such as Configuration, Executions, Logs, Instances, and available Actions are shown on the right-side panel.

The Embeddings model node details are presented in four tabs as follows:

Config Tab:

  • Node Name: By default, Embeddings is displayed as the name. You can change it if required.
  • Model: Select an Embedding model from the list (it shows all model entities available in the project with an 'embed'/'embed_items' function in the dpk.module), either Set a fixed model or Set a variable (Type: Model) that can be updated dynamically during pipeline execution. If no model is available, click the Install Foundation Model link to install one from the Marketplace. You can also click the model or application name to view more information.
  • Input: By default, Type: Item is selected.
  • Output: By default, Type: Item and JSON are available.
  • Trigger (Optional): An Event/Cron trigger can be set on this node, enabling you to initiate the pipeline run from this specific point.

For information on the Executions, Logs, and Instances tabs, see the Node Details article.
For the actions available on each node in the right-side panel, see the Pipeline Node Actions.


Compare Models

The 'Compare Models' node compares two trained model versions based on their evaluation results on the same test set, or on the metrics generated during the training process.

This node requires the following mandatory inputs: a comparison configuration (in JSON format) and two models for comparison, one designated as the Previous model and the other as the New model. The New model is tested, and if it demonstrates superior performance, it is output as 'Update model,' indicating it is ready for deployment. If not, it is labeled 'Discard.' For more details, refer to our Active Learning documentation.

The parameters that need defining are:

  • Node Inputs:
    • previous_model: By default, Type: Model is selected.
    • new_model: By default, Type: Model is selected.
    • compare_config: The configuration for the comparison (JSON). By default, Type: JSON is selected.
    • dataset: By default, Type: Dataset is selected.
  • Output:
    • winning_model: By default, Type: Model is selected. Available statuses are Update Model and Discard.
  • Trigger (Optional): An Event/Cron trigger can be set on this node, enabling you to initiate the pipeline run from this specific point.

Create New Model

The 'Create New Model' node allows you to generate a new model version by cloning an existing model, making it ready for fine-tuning.

This node requires the following inputs: a base model to clone (limited to trained or deployed models), model configurations (in JSON format), and datasets along with subsets (DQL filters) for training and validation. Inputs can be provided through parameters, either as fixed values or dynamic variables, or via node connections.

Upon execution, the node will produce the new model as output. For more details, refer to our Active Learning documentation.
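
For reference, the SDK sketch below clones a base model with a new configuration and train/validation subsets. It assumes clone() accepts the train_filter and validation_filter arguments shown; the entity names, configuration keys, and tag paths are placeholders.

```python
import dtlpy as dl

project = dl.projects.get(project_name='my-project')
base_model = project.models.get(model_name='pretrained-yolov8')
dataset = project.datasets.get(dataset_name='ground-truth')

# DQL filters defining the training and validation subsets (placeholder tag paths)
train_filter = dl.Filters(field='metadata.system.tags.train', values=True)
validation_filter = dl.Filters(field='metadata.system.tags.validation', values=True)

new_model = base_model.clone(
    model_name='yolov8-finetune-v2',
    dataset=dataset,
    configuration={'epochs': 25, 'batch_size': 4},  # adapter-specific keys, illustrative only
    train_filter=train_filter,
    validation_filter=validation_filter,
)
print(new_model.status)  # expected: 'created', ready for the Train Model node
```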

The parameters that need defining are:

  • Node Inputs:
    • base_model: The model to clone.
    • dataset: The dataset to train on.
    • train_subset: The DQL query for the subset of training items (JSON).
    • validation_subset: The DQL query for the subset of validation items (JSON).
    • model_configuration: The model configuration to use for training (JSON).
  • Output:
    • new_model: The newly created model.
    • base_model: The base model that was cloned.
  • Trigger (Optional): An Event/Cron trigger can be set on this node, enabling you to initiate the pipeline run from this specific point.

Model Data Split

The 'Model Data Split' node is a powerful data processing tool designed to streamline the process of dividing your data into subsets during runtime. This node allows you to easily segment your ground truth data into training, validation, and test sets, making data preparation more efficient.

You can define the desired distribution of these subsets, and the Data Split node will automatically assign each item to its respective subset by applying metadata tags, ensuring a smooth and organized data splitting process.
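
Once the split has run, the subset assignment can be queried with a DQL filter on the item metadata. The sketch below assumes the node writes a tag such as metadata.system.tags.train; the exact tag path and the entity names are placeholders.

```python
import dtlpy as dl

project = dl.projects.get(project_name='my-project')
dataset = project.datasets.get(dataset_name='ground-truth')

# Count the items that were tagged as the 'train' subset (placeholder tag path)
train_filter = dl.Filters(field='metadata.system.tags.train', values=True)
pages = dataset.items.list(filters=train_filter)
print(f'{pages.items_count} items assigned to the train subset')
```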

The parameters that need defining are:

  • Input: By default, Type: Item is selected.
  • Output: By default, Type: Item is selected.
  • Trigger (Optional): An Event/Cron trigger can be set on this node, enabling you to initiate the pipeline run from this specific point.

Dynamic Model Nodes

Dynamic Model Nodes are a versatile component within Dataloop pipelines that allows you to interact with machine learning models dynamically. These nodes provide an adaptable interface for selecting and performing the various operations supported by the model, such as predict, generate, or embed.

The flexibility of Dynamic Model Nodes lies in their ability to adapt to different tasks within your workflow. Whether you need to generate predictions from new data or extract embeddings for further analysis, these nodes make it easy to switch between functions without needing to reconfigure the entire model.

  • Predict Operation: This function is typically used for generating outputs based on new input data. For example, if your model is designed for image classification, the "Predict" operation would produce the most likely category or label for a given image.
  • Embed Operation: This function is used to create embeddings, which are compact, numerical representations of input data. These embeddings can be useful for various tasks, including clustering, similarity search, or as inputs to other models for more complex processing.
  • Generative AI Operation: The Generative AI Operation refers to the functionality within a machine learning model that is specifically designed to create new, original content rather than merely analyzing or predicting outcomes based on existing data.

Install Dynamic Model Nodes: You can add Dynamic Model Nodes by clicking the Plus icon on the left-side panel and installing them from the Marketplace model library. If needed, you can also remove them from the list at any time.


Dynamic Embeddings Nodes

Dynamic Embeddings Nodes are shortcuts in pipelines that provide a convenient way to interact with ML embedding models. They allow users to quickly choose and apply the different operations that the model supports, such as Predict or Embed.

When configuring these nodes, you will find an Operation field in the Node Configuration tab. This field allows you to select which operation you'd like the node to perform. If the ML model you're working with supports both Predict and Embed operations, the Predict operation will be selected by default. However, you can manually switch to the Embed operation if that better suits your needs.

For the actions available on each node in the right-side panel, see the Pipeline Node Actions.


Dynamic Generative Model

Dynamic Generate Nodes are shortcuts in pipelines that provide a convenient way to interact with ML generative models. They allow users to quickly select and apply the different operations that the model supports, such as Generate or Embed.

When configuring these nodes, you will find an Operation field in the Node Configuration tab. This field allows you to select which operation you'd like the node to perform. If the ML model you're working with supports both Generate and Embed operations, the Generate operation will be selected by default. However, you can manually switch to the Embed operation if that better suits your needs.

For the actions available on each node in the right-side panel, see the Pipeline Node Actions.