Model Evaluation
  03 Jun 2024

Overview

After training a model, you can evaluate the quality and accuracy of the new model using Dataloop's Model Management tool.

Model Management provides evaluation metrics, such as precision and recall, to help you determine the performance of your models. The metrics are calculated using the test set you specify.

Evaluating a model version in the Dataloop system creates a dedicated FaaS service that runs an evaluation execution on your data.
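If you prefer working programmatically, the following is a minimal sketch of starting an evaluation with the dtlpy Python SDK. The model and dataset IDs are placeholders, the test-tag field assumes the platform's standard ML-subset tagging, and the exact evaluate() signature may vary between dtlpy versions, so verify it against your SDK documentation.

    import dtlpy as dl

    # Placeholder IDs - replace with your own model version and test dataset.
    MODEL_ID = 'my-model-version-id'
    DATASET_ID = 'my-test-dataset-id'

    if dl.token_expired():
        dl.login()

    model = dl.models.get(model_id=MODEL_ID)

    # Restrict the evaluation to items tagged as the 'test' ML subset.
    # Adjust the field if you use a DQL filter or a folder instead.
    filters = dl.Filters(resource=dl.FiltersResource.ITEM)
    filters.add(field='metadata.system.tags.test', values=True)

    # evaluate() spins up the evaluation service and returns the execution
    # that runs predictions and metrics on the selected test set.
    execution = model.evaluate(dataset_id=DATASET_ID, filters=filters)
    print('Evaluation execution created:', execution.id)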


Evaluation Tab Details

The Evaluation tab lists the models under evaluation, with the following details:

  • Name: The name of the model. Click it to open the model's detailed page.
  • Created At: The timestamp of when the model's evaluation process was created.
  • Status: The status of the model evaluation. To learn more, refer to the Models Evaluation Status section.
  • Progress: The progress bar of the model evaluation process.
  • Duration: The time taken by the evaluation process.
  • Service: The name of the evaluation service. Click it to open the service page.

Search and Filter Models in Evaluation

The Dataloop platform allows you to search the models in evaluation by Model Name or Model ID, so you can narrow down the displayed models. Use the Search field to search and filter the models.

Filter Models by Evaluation Status

The Dataloop platform also allows you to filter the models in evaluation by status. Use the Filter by Training Status field to filter models by status. To learn more, refer to the Models Evaluation Status section.
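The same narrowing can be done with the dtlpy Python SDK. Below is a minimal sketch that filters models client-side by a name substring and a status value; the project name, substring, and status string are placeholders, and the pagination pattern assumes the SDK's usual paged entities.

    import dtlpy as dl

    project = dl.projects.get(project_name='My Project')   # placeholder name

    # models.list() returns paged entities; iterate pages, then models.
    pages = project.models.list()
    for page in pages:
        for model in page:
            # Mirror the Search and Filter-by-Status controls.
            if 'resnet' in model.name.lower() and str(model.status).lower() == 'trained':
                print(model.id, model.name, model.status)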


Models Evaluation Status

A model's evaluation status is indicated in the evaluation table. The status of a model version can be one of the following (a status-polling sketch follows the list):

  • Created: The model evaluation process has been created.
  • In Progress: The model evaluation process has started and is in progress.
  • Success: The model evaluation has been completed successfully.
  • Failed: Model evaluation failed. This could be caused by a problem with the model adapter, the evaluation method, the configuration, or compute resources. Refer to the logs for more information.
  • Aborted: The execution process of the model evaluation has been aborted. The evaluation stopped unexpectedly, possibly due to an unforeseen issue.
  • Terminated: The model evaluation has been terminated. A user intentionally ended the model's evaluation.
  • Rerun: The model evaluation is marked for rerun, and the evaluation process will be executed again.
  • Pending: The model version is pending evaluation. Evaluation will begin when resources are available on the evaluation service (whose name begins with 'mgmt-train' and ends with the model ID of the respective model architecture).
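If you track evaluations from scripts, the sketch below polls an evaluation execution until it reaches a terminal status. The execution ID is a placeholder, and the status strings and the shape of latest_status are assumptions based on the list above; verify them against your dtlpy version.

    import time

    import dtlpy as dl

    # Terminal statuses, lowercased to match the list above (assumed strings).
    TERMINAL_STATUSES = {'success', 'failed', 'aborted', 'terminated'}


    def wait_for_evaluation(execution_id: str, poll_seconds: int = 30) -> str:
        """Poll an evaluation execution until it reaches a terminal status."""
        while True:
            execution = dl.executions.get(execution_id=execution_id)
            # latest_status is expected to be a dict such as
            # {'status': 'in-progress', ...}.
            status = str(execution.latest_status.get('status', '')).lower()
            print('Evaluation status:', status)
            if status in TERMINAL_STATUSES:
                return status
            time.sleep(poll_seconds)


    # Usage (placeholder execution ID):
    # wait_for_evaluation(execution_id='my-evaluation-execution-id')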

Evaluate a Model

Prerequisites

To evaluate a model, you first need to make sure that (a quick programmatic check is sketched after this list):

  1. The models are located in one of your projects. Public models cannot be evaluated directly; you first need to import them into your project.
  2. The models are Pre-Trained, Trained, or Deployed.
  3. The model implements a predict() function.
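As a rough pre-flight check, the sketch below verifies the first two prerequisites with the SDK (whether the model provides a predict() function depends on its adapter and is not checked here). The model ID is a placeholder and the status strings are assumptions; verify them against the model.status values in your project.

    import dtlpy as dl

    # Assumed status strings for evaluatable model versions.
    EVALUATABLE_STATUSES = {'pre-trained', 'trained', 'deployed'}


    def can_evaluate(model: dl.Model) -> bool:
        """Rough pre-flight check before evaluating a model version."""
        in_project = model.project_id is not None          # model lives in a project
        status_ok = str(model.status).lower() in EVALUATABLE_STATUSES
        return in_project and status_ok


    model = dl.models.get(model_id='my-model-version-id')  # placeholder ID
    print('Ready for evaluation:', can_evaluate(model))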

To evaluate a Pre-Trained, Trained, or Deployed model version:

  1. From the Models -> Versions tab, search for and locate the model version you want to evaluate.
  2. Select the Evaluate option from the model's 3-dots actions button. A Model Version Evaluation window is displayed.
  3. In the Evaluation section:
    1. Select a Dataset from the list to use for the evaluation. By default, the model's dataset is displayed.
    2. Define the Test Subset using one of the following options:
      1. Select items marked with the Test tag. Click Show Filter to view the filter.
      2. Select a saved DQL filter.
      3. Select a specific folder to define the data scope for the test subset.
  4. Click the Next button. The Service Settings section is displayed.
  5. Review the following sections; hover over each one and click it to edit.
    1. Secrets: The secret values are displayed if the secret is defined in the dpk/dpkConfig. Click Edit Secret to select multiple secrets from your organization, or click Edit Secret -> Add New Secret to create a new secret for the model. Make sure you have sufficient permissions to view or add a secret.
    2. Service Configuration: Click Edit Configuration to make changes to the following fields, if needed (an SDK sketch follows this procedure):
      1. Machine Types:
        1. Change the machine type from the list.
        2. Click on the Machine Settings to enable Preemptible Machine.
      2. Docker Image
      3. Concurrency: By default, 1 is displayed.
      4. Scaling Type:
        1. Autoscale: A component of the Dataloop FaaS system that automatically adjusts the number of serverless computing resources allocated to the FaaS service based on current demand. The purpose of the autoscaler is to ensure that the system can handle peaks in demand without over-provisioning resources and incurring unnecessary costs.
          1. Min Instances: By default, 0. You cannot make changes.
          2. Max Instances: By default, 1. You cannot make changes.
          3. Queue Size: By default, 1000.
        2. Fixed size: By default, 1 instance is selected. You cannot make changes.
    3. Execution Configuration: Click Execution Configuration to make changes to the following fields:
      1. SDK Version: The dtlpy SDK version used by the service.
      2. Execution Timeout: By default, 3600 seconds (1 hour) is displayed.
      3. Drain Time: By default, 600 seconds is displayed.
      4. On Reset Action: Select one of the following when you reset the execution.
        1. Fail Execution.
        2. Rerun Execution.
      5. Max Attempts: Maximum number of execution attempts allowed. By default, 3 is displayed.
      6. Rerun executions as process: Enable to run the execution as a process.
  6. Once you are done, click Evaluate Model. A confirmation message is displayed.
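The service and execution settings from steps 5.2 and 5.3 can also be adjusted from the dtlpy SDK. The sketch below uses a placeholder service name (see the naming convention in the next section), the machine type is only an example from dl.InstanceCatalog, and the execution-setting attribute names are assumptions based on the Service entity; confirm them against your dtlpy version.

    import dtlpy as dl

    # Placeholder service name following the 'mgmt-train' + model-ID convention.
    service = dl.services.get(service_name='mgmt-train-<model-architecture-id>')

    # Runtime settings (step 5.2): machine type and concurrency.
    service.runtime.pod_type = dl.InstanceCatalog.GPU_K80_S   # example machine type
    service.runtime.concurrency = 1

    # Execution settings (step 5.3). Attribute names are assumptions.
    service.execution_timeout = 3600   # seconds
    service.drain_time = 600           # seconds
    service.max_attempts = 3

    service.update()
    print('Service configuration updated:', service.name)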

Evaluation Service & Resources

For every public Model Architecture, there is a matching training service with the respective configuration. This service is then used to train all model versions derived from that architecture. For example, all ResNet-based versions are trained using the same service.

The service is created upon the first installation; it is not pre-installed in a project, so you will not be able to see or configure it until you start your first training process.
The training service naming convention is "mgmt-train" + model-ID, where the model-ID is the model-architecture ID (e.g., ResNet, YoloV5).

Any private model added to the Dataloop platform must provide a Train function that performs the training process. When training a model, a Dataloop service is started for the model, using its Train function.

Like any service in Dataloop's FaaS, it has its own compute settings. Two settings worth mentioning here are listed below (an SDK sketch follows):

  • Instance type - by default, all training services use GPU instances.

  • Auto-scaling - set to 1 by default. Increase the auto-scaler parameter if you intend to train multiple model versions simultaneously.
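To locate that service and inspect or raise these two settings from the SDK, a minimal sketch follows. The service name is a placeholder built from the naming convention above, and the autoscaler attribute names are assumptions; confirm them in your dtlpy version.

    import dtlpy as dl

    # Placeholder name following the "mgmt-train" + model-ID convention.
    service = dl.services.get(service_name='mgmt-train-<model-architecture-id>')

    # Inspect the two compute settings called out above.
    print('Instance type (pod type):', service.runtime.pod_type)
    print('Autoscaler:', service.runtime.autoscaler)

    # Raise the autoscaler ceiling to run several trainings/evaluations at once.
    service.runtime.autoscaler.max_replicas = 2   # assumed attribute name
    service.update()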


Evaluation Metrics

While the evaluation is in progress, metrics are recorded and can be viewed from the Explore tab. To learn more, refer to the Training Metrics article.


Copy Evaluation Model ID

  1. Go to the Models page from the left-side menu.
  2. Select the Evaluation tab.
  3. Find the model whose ID you want to copy.
  4. Click the three-dots menu and select Copy Model ID from the list.

Copy Execution ID of a Model in Evaluation

  1. Go to the Models page from the left-side menu.
  2. Select the Evaluation tab.
  3. Find the model whose execution ID you want to copy.
  4. Click the three-dots menu and select Copy Execution ID from the list. (Both IDs can also be retrieved with the SDK, as sketched below.)
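The same identifiers are available as entity attributes in the dtlpy SDK; the sketch below uses placeholder IDs.

    import dtlpy as dl

    # Placeholder IDs - replace with your own.
    model = dl.models.get(model_id='my-model-version-id')
    execution = dl.executions.get(execution_id='my-evaluation-execution-id')

    # The same values that Copy Model ID / Copy Execution ID put on the clipboard.
    print('Model ID:    ', model.id)
    print('Execution ID:', execution.id)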

Abort the Evaluation Process of a Model

  1. Go to the Models page from the left-side menu.
  2. Select the Evaluation tab.
  3. Find the model whose evaluation process you want to abort.
  4. Click the three-dots menu and select Abort Evaluation from the list. (An SDK alternative is sketched below.)
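An evaluation can likely also be stopped from the SDK by terminating its execution, as in the sketch below. The execution ID is a placeholder, and the availability and exact behavior of terminate() may depend on your dtlpy version.

    import dtlpy as dl

    # Placeholder execution ID (use Copy Execution ID from the Evaluation tab).
    execution = dl.executions.get(execution_id='my-evaluation-execution-id')

    # Ask the FaaS service to stop the running evaluation execution.
    execution.terminate()
    print('Abort requested for execution:', execution.id)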

Error and Warning Indications

When a service encounters errors, such as Crashloop, ImagePullBackOff, OOM, etc., or when the service is paused, pending executions will get stuck in the queue (in "created" status). You can click or hover over the error or warning icon to view the details.

  • An error icon is displayed when a service fails to start running. The service link is provided; click it to view the service's Executions tab. The pending evaluation execution status will be Created, In Progress, or Rerun.
  • A warning icon is displayed when a service is inactive. The service link is provided; click it to view the respective service page. The pending evaluation execution status will be Created or Rerun.
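To spot such stuck executions from a script, the sketch below lists a service's executions and reports those still in a pending status. The service name is a placeholder, and the executions repository, pagination pattern, and status field shape are assumptions based on the SDK's usual conventions.

    import dtlpy as dl

    # Placeholder service name (open it from the Service column or the icon link).
    service = dl.services.get(service_name='mgmt-train-<model-architecture-id>')

    # List the service's executions and report the ones stuck in the queue.
    pages = service.executions.list()
    for page in pages:
        for execution in page:
            status = str(execution.latest_status.get('status', '')).lower()
            if status in ('created', 'rerun'):
                print('Stuck execution:', execution.id, status)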