GenAI Evaluation Studio is a tool for evaluating GenAI responses and collecting structured feedback on them. It enables you to design custom evaluation forms using the Multimodal Layout Builder and run structured assessments directly within the Dataloop platform.

Recipe Type - GenAI / Multimodal Recipe

Key Capabilities

The studio lets you evaluate GenAI output, structured as a .json item, within a custom layout designed in the Multimodal Layout Builder.

These layouts represent structured evaluation forms that can include:
- Text inputs
- Dropdowns, radio buttons, and checkboxes
- Star ratings and sliders
- Multimodal content such as conversations, URLs, images, audio, and video
- Custom validations, JavaScript logic, and CSS styling

Flexible Assignment Modes:
- Unified Layout for all items
- Per-Item Layouts for varied formats

Integrated Annotations: Every evaluator action is stored as an annotation.
Validation Controls: Built-in and custom rules to ensure completeness.
Multimodal Support: Evaluate text, images, audio, video, and mixed formats in a single interface.

Use Cases

Model evaluation & benchmarking
Human-in-the-loop (HITL) reviews
Multimodal model assessment
Data quality scoring
Feedback collection for fine-tuning

Workflow

Design the Layout: Create a Multimodal Layout Recipe in the Layout Builder to define the evaluation form.
Assign the Layout: Apply the layout at the task level (Unified Layout) or to individual items (Per-Item Layout).
Upload Items: Import evaluation items manually or via SDK.
Run the Task: Annotators evaluate items in the GenAI Evaluation Studio using your custom form.
Export Results: Export annotations for reporting, analysis, or integration into downstream workflows.

Any modifications made within the Evaluation Studio are treated as annotations.
This includes actions such as providing a rating, selecting options, entering comments, or any other form of input captured during the evaluation process.

Use GenAI Evaluation Studio

The GenAI Evaluation Studio is a dedicated interface within Dataloop’s platform designed for reviewing, scoring, and validating outputs from GenAI models using structured forms and logic defined in a Multimodal Recipe.

Choosing the Right Layout Mode

Unified Layout for All Items (Default Approach): In this mode, all items in the task share the same layout recipe.
1. Use Case: Ideal for structured evaluations where every item follows a common annotation format.
2. Recommendation: Use this as your go-to method for most evaluation scenarios.
Different Layout Assignment per Item: In this approach, different items within the same task or dataset can have distinct layout recipes.
1. Use Case: Suitable for mixed evaluation scenarios, such as comparing chats, images, and structured responses within the same dataset.
2. Important Note: A layout assigned to an item overrides the layout connected to the task or dataset.
3. Reference: A full example script can be found under Layout Editor → Sample Data → </> icon (Python script generator).
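
The precedence between the two modes can be summarized in a few lines of Python. This is only an illustrative sketch of the rule described above, not platform code: the function name `effective_layout` and the placeholder IDs are hypothetical, and the metadata keys follow the per-item metadata structure shown under Using Different Layouts.

```python
def effective_layout(item_metadata, task_recipe_id):
    """Return the layout an item would be evaluated with.

    A layout named in the item's metadata takes priority over the
    recipe connected to the task or dataset.
    """
    item_layout = (
        item_metadata.get("system", {})
        .get("evaluation", {})
        .get("layoutName")
    )
    return item_layout or task_recipe_id


# Unified mode: no per-item layout, so the task recipe applies.
print(effective_layout({}, "TASK_RECIPE_ID"))

# Per-item mode: the item's own layout wins.
per_item = {"system": {"evaluation": {"layoutName": "ITEM_RECIPE_ID"}}}
print(effective_layout(per_item, "TASK_RECIPE_ID"))
```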

Using a Unified Layout

All items share the same structure and evaluation format.

Example: Each item has one prompt, two images, and the evaluator selects the best image.

Create a Multimodal Recipe via Layout Builder.

Create a dataset and connect the recipe to it. If the dataset already exists, use the Switch Recipe option to change it to the newly created recipe.

Upload your GenAI items (JSON files) to the dataset. You can upload them manually or via the SDK. An example JSON structure is given below:

{
  "image": "https://upload.wikimedia.org/wikipedia/en/b/b9/MagrittePipe.jpg",
  "prompt": [
    {
      "role": "",
      "content": "Create an image of a pipe"
    }
  ]
}

If your items are in a CSV file, convert them into JSON files before uploading.
Create a task with the same recipe.
Users with the Annotator role can open the task in GenAI Evaluation Studio.
Provide feedback and ratings according to the layout structure.
Export annotations when completed.
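
The CSV conversion mentioned in the steps above could be scripted along the following lines. This is a minimal sketch using only the Python standard library. The column names `image` and `prompt`, the helper name `csv_to_json_items`, and the one-file-per-row naming scheme are assumptions chosen to match the example JSON structure, not platform requirements.

```python
import csv
import json
from pathlib import Path


def csv_to_json_items(csv_path, out_dir):
    """Write one JSON item file per CSV row and return the written paths.

    Assumes (hypothetically) that the CSV has 'image' and 'prompt' columns.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for index, row in enumerate(csv.DictReader(f)):
            item = {
                "image": row["image"],
                # Same shape as the example item: a list of role/content messages.
                "prompt": [{"role": "", "content": row["prompt"]}],
            }
            path = out_dir / f"item_{index:04d}.json"
            path.write_text(json.dumps(item, indent=2), encoding="utf-8")
            written.append(path)
    return written
```

The resulting folder of JSON files can then be uploaded manually or via the SDK, as described in the upload step.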

Using Different Layouts

Different items within the same dataset or task can have unique layouts.

Example: One prompt produces a text response, another produces an image.

Create a Multimodal Recipe via Layout Builder for each unique layout. You can use the Template Library to duplicate and modify similar layouts.

Create a dataset for the items if one does not already exist.
If your items are in a CSV file, convert them into JSON files before uploading.

Upload your GenAI items (JSON files) via the SDK, and specify the layout in each item's metadata as shown below:

'system': {
    'shebang': {
        'dltype': 'evaluation-studio'
    },
    'evaluation': {
        'layoutName': '{INSERT RELEVANT LAYOUT RECIPE ID HERE}'
    }
}

To obtain the Recipe ID, read here.
An example JSON structure is given below:

{
  "image": "https://upload.wikimedia.org/wikipedia/en/b/b9/MagrittePipe.jpg",
  "prompt": [
    {
      "role": "",
      "content": "Create an image of a pipe"
    }
  ]
}

Create a task. You can choose any recipe, because the layout specified in the item metadata overrides the recipe assigned at the task level.
Users with the Annotator role can open the task in GenAI Evaluation Studio.
Provide feedback and ratings according to the layout structure.
Export annotations when completed.
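
The per-item metadata described in the upload step can be generated programmatically before uploading. The following is a minimal sketch, assuming a hypothetical mapping from local file names to layout recipe IDs. The helper names `build_layout_metadata` and `plan_uploads` and the placeholder IDs are illustrative only, and the actual upload call is left to the SDK.

```python
def build_layout_metadata(layout_recipe_id):
    """Return item metadata that selects a layout for one item.

    Mirrors the metadata structure shown in the upload step.
    """
    if not layout_recipe_id:
        raise ValueError("layout_recipe_id must be a non-empty string")
    return {
        "system": {
            "shebang": {"dltype": "evaluation-studio"},
            "evaluation": {"layoutName": layout_recipe_id},
        }
    }


def plan_uploads(layout_by_file):
    """Pair each local JSON file with the metadata to send alongside it."""
    return [
        (path, build_layout_metadata(recipe_id))
        for path, recipe_id in layout_by_file.items()
    ]


# Hypothetical file names and recipe IDs, for illustration only.
uploads = plan_uploads({
    "text_response.json": "TEXT_LAYOUT_RECIPE_ID",
    "image_response.json": "IMAGE_LAYOUT_RECIPE_ID",
})
for path, metadata in uploads:
    # Pass `metadata` as the item metadata in your SDK upload call.
    print(path, metadata["system"]["evaluation"]["layoutName"])
```

In the Dataloop Python SDK, the metadata would typically be passed through the `item_metadata` argument of `dataset.items.upload`; treat that as an assumption and check the generated script under Layout Editor → Sample Data for the exact call.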

Open the JSON file

Open the Dataset Browser and select the JSON file.
Double-click on the JSON file. The GenAI Multimodal Studio is displayed.

Complete the evaluation steps defined by the layout. The form varies according to the structure of the GenAI recipe.
Click Save.

JSON Formats

You can export items, with or without their annotations, as JSON files.