RLHF JSON Format


This page describes RLHF JSON, a format for representing RLHF (reinforcement learning from human feedback) data in JavaScript Object Notation (JSON).

For more information on RLHF Studio, see the RLHF Studio documentation.

The RLHF Studio Item Format

An RLHF Studio item is built from two layers: a file layer and an annotation layer.
The file layer contains the prompts, and the annotation layer contains the responses to each prompt.

File or Prompt Layer Format

The following example shows the structure of a file that contains two prompts. The first prompt contains text and an image, and the second prompt contains text only.

{
	"shebang": "dataloop",
	"metadata": {
		"dltype": "prompt"
	},
	"prompts": {
		"prompt1": [
			{
				"mimetype": "application/text",
				"value": "What animal is in this image?"
			},
			{
				"mimetype": "image/jpeg",
				"value": "https://gate.dataloop.ai/api/v1/items/6489600c8d5a1c350e55116a/stream"
			}
		],
		"prompt2": [
			{
				"mimetype": "application/text",
				"value": "What is the eye color of this cat?"
			}
		]
	}
}
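
As a minimal sketch, a prompt file with the structure above could be assembled and serialized with Python's standard json module. The helper names text_part, image_part, and build_prompt_file are hypothetical and are not part of any Dataloop SDK; only the keys and values are taken from the example.

```python
import json


def text_part(text):
    """Return a text element of a prompt (hypothetical helper)."""
    return {"mimetype": "application/text", "value": text}


def image_part(url, mimetype="image/jpeg"):
    """Return an image element of a prompt (hypothetical helper)."""
    return {"mimetype": mimetype, "value": url}


def build_prompt_file(prompts):
    """Wrap a mapping of prompt name -> list of elements in the file layer."""
    return {
        "shebang": "dataloop",
        "metadata": {"dltype": "prompt"},
        "prompts": dict(prompts),
    }


prompt_file = build_prompt_file({
    "prompt1": [
        text_part("What animal is in this image?"),
        image_part("https://gate.dataloop.ai/api/v1/items/6489600c8d5a1c350e55116a/stream"),
    ],
    "prompt2": [text_part("What is the eye color of this cat?")],
})

# Serialize to the JSON text that would be stored as the prompt file.
serialized = json.dumps(prompt_file, indent=2)
print(serialized)
```

Each prompt is a list of elements, so a prompt that mixes text and images is expressed by adding more elements to the same list.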

Annotations or Response Layer Format

The following example shows the data structure of an item with one response, stored as an entry in the item's annotations list.

{
  "id": "64899218661aa36a1744112c",
  "datasetId": "64885c4d71e85c4f03c3758c",
  "url": "https://rc-gate.dataloop.ai/api/v1/items/64899218661aa36a1744112c",
  "dataset": "https://rc-gate.dataloop.ai/api/v1/datasets/64885c4d71e85c4f03c3758c",
  "createdAt": "2023-06-14T10:10:32.863Z",
  "dir": "/image_prompts",
  "filename": "/image_prompts/flyingdog.json",
  "type": "file",
  "hidden": false,
  "metadata": {
    "system": {
      "encoding": "7bit",
      "isBinary": false,
      "mimetype": "application/json",
      "originalname": "flyingdog.json",
      "refs": [],
      "shebang": {
        "dltype": "prompt"
      },
      "size": 161,
      "taskStatusLog": []
    }
  },
  "name": "flyingdog.json",
  "creator": "lior@dataloop.ai",
  "stream": "https://rc-gate.dataloop.ai/api/v1/items/64899218661aa36a1744112c/stream",
  "thumbnail": "https://rc-gate.dataloop.ai/api/v1/items/64899218661aa36a1744112c/thumbnail",
  "annotations": [
    {
      "id": "64899219bf9b191be0acdd2c",
      "datasetId": "64885c4d71e85c4f03c3758c",
      "itemId": "64899218661aa36a1744112c",
      "url": "https://rc-gate.dataloop.ai/api/v1/annotations/64899219bf9b191be0acdd2c",
      "item": "https://rc-gate.dataloop.ai/api/v1/items/64899218661aa36a1744112c",
      "dataset": "https://rc-gate.dataloop.ai/api/v1/datasets/64885c4d71e85c4f03c3758c",
      "type": "binary",
      "label": "q",
      "coordinates": "https://rc-gate.dataloop.ai/api/v1/items/648992159b8b5e823eda9972/stream",
      "metadata": {
        "system": {
          "automated": true,
          "promptId": "first"
        },
        "user": {
          "annotation_type": "prediction",
          "model": {
            "confidence": 0.9,
            "name": "model1"
          },
          "stream": true
        }
      },
      "creator": "lior@dataloop.ai",
      "createdAt": "2023-06-14T10:10:33.092Z",
      "updatedBy": "lior@dataloop.ai",
      "updatedAt": "2023-06-14T10:10:33.092Z",
      "hash": "64885c4d71e85c4f03c3758c_64899218661aa36a1744112c_q_lior@dataloop.ai",
      "source": "sdk"
    }
  ],
  "annotationsCount": 1,
  "annotated": true
}
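
Because each response is linked to its prompt through metadata.system.promptId, the responses of an item can be grouped by prompt. The following is a sketch under that assumption; the function name responses_by_prompt is hypothetical, and the sample item is a trimmed copy of the example above that keeps only the keys the function reads.

```python
from collections import defaultdict


def responses_by_prompt(item):
    """Group an item's annotations by metadata.system.promptId."""
    grouped = defaultdict(list)
    for annotation in item.get("annotations", []):
        metadata = annotation.get("metadata", {})
        prompt_id = metadata.get("system", {}).get("promptId")
        model = metadata.get("user", {}).get("model", {})
        grouped[prompt_id].append({
            "annotation_id": annotation.get("id"),
            "coordinates": annotation.get("coordinates"),
            "model_name": model.get("name"),
            "confidence": model.get("confidence"),
        })
    return dict(grouped)


# Trimmed copy of the example item, keeping only the keys used here.
item = {
    "id": "64899218661aa36a1744112c",
    "annotations": [
        {
            "id": "64899219bf9b191be0acdd2c",
            "coordinates": "https://rc-gate.dataloop.ai/api/v1/items/648992159b8b5e823eda9972/stream",
            "metadata": {
                "system": {"automated": True, "promptId": "first"},
                "user": {"model": {"confidence": 0.9, "name": "model1"}},
            },
        }
    ],
}

grouped = responses_by_prompt(item)
print(grouped["first"][0]["model_name"])
```

Annotations without a promptId are collected under the key None rather than dropped, so malformed entries remain visible to the caller.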

RLHF JSON Fields Description

| Key Name | Definition | Parent Key |
|---|---|---|
| id | Unique identifier for the item | Root |
| datasetId | Identifier for the dataset containing the item | Root |
| url | API URL to access the item | Root |
| dataset | API URL to access the dataset | Root |
| createdAt | Timestamp of when the item was created | Root |
| dir | Directory path of the item | Root |
| filename | File name of the item | Root |
| type | Type of the item (e.g., file) | Root |
| hidden | Boolean indicating if the item is hidden | Root |
| metadata | Metadata associated with the item | Root |
| system | System-related metadata | metadata |
| encoding | Encoding format of the file | system |
| isBinary | Boolean indicating if the file is binary | system |
| mimetype | MIME type of the file | system |
| originalname | Original name of the file | system |
| refs | References to related items | system |
| shebang | Additional metadata related to file type | system |
| dltype | Type of data stored in the file (e.g., prompt) | shebang |
| size | Size of the file in bytes | system |
| taskStatusLog | Log of task statuses related to the item | system |
| name | Name of the item | Root |
| creator | Email of the user who created the item | Root |
| stream | API URL to stream the item | Root |
| thumbnail | API URL to access the thumbnail | Root |
| annotations | List of annotations associated with the item | Root |
| id | Unique identifier for the annotation | annotations |
| datasetId | Identifier of the dataset associated with the annotation | annotations |
| itemId | Identifier of the item being annotated | annotations |
| url | API URL to access the annotation | annotations |
| item | API URL to access the annotated item | annotations |
| dataset | API URL to access the dataset of the annotation | annotations |
| type | Type of annotation (e.g., binary) | annotations |
| label | Label assigned to the annotation | annotations |
| coordinates | API URL to access coordinates of the annotation | annotations |
| metadata | Metadata related to the annotation | annotations |
| system | System metadata for annotation | metadata |
| automated | Boolean indicating if annotation was automated | system |
| promptId | Identifier for the prompt associated with annotation | system |
| user | User-related metadata | metadata |
| annotation_type | Type of annotation (e.g., prediction) | user |
| model | Model-related metadata | user |
| confidence | Confidence score of the model's prediction | model |
| name | Name of the model | model |
| stream | Boolean indicating if the annotation has a stream | user |
| creator | Email of the user who created the annotation | annotations |
| createdAt | Timestamp when annotation was created | annotations |
| updatedBy | Email of the user who updated the annotation | annotations |
| updatedAt | Timestamp when annotation was last updated | annotations |
| hash | Unique hash identifier for annotation | annotations |
| source | Source of the annotation (e.g., SDK) | annotations |
| annotationsCount | Total count of annotations for the item | Root |
| annotated | Boolean indicating if the item has annotations | Root |
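
The Key Name and Parent Key columns can be derived mechanically from an item's JSON. The sketch below is one way to do that, assuming that keys inside a list of objects (such as annotations) take the list's key as their parent; the function name key_parent_pairs and the abbreviated sample item are illustrative only.

```python
def key_parent_pairs(node, parent="Root"):
    """Yield (key, parent key) pairs for every key in a nested JSON value."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield key, parent
            yield from key_parent_pairs(value, key)
    elif isinstance(node, list):
        # Elements of a list inherit the list's own key as their parent.
        for element in node:
            yield from key_parent_pairs(element, parent)


# Abbreviated item with the same nesting as the full example.
sample = {
    "id": "64899218661aa36a1744112c",
    "metadata": {"system": {"shebang": {"dltype": "prompt"}}},
    "annotations": [
        {
            "id": "64899219bf9b191be0acdd2c",
            "metadata": {"system": {"promptId": "first"}},
        }
    ],
}

pairs = list(key_parent_pairs(sample))
for key, parent in pairs:
    print(f"{key}\t{parent}")
```

Key names such as id, metadata, and system appear at more than one level, so a key is only unambiguous when read together with its parent.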