RLHF Prompt Studio

03 Sep 2024

Overview

As generative AI grows in popularity and rapidly evolves, so does the need to fine-tune models for specific commercial needs, training them on proprietary data to extend their base capabilities.


RLHF Studio

Dataloop’s RLHF Studio (Reinforcement Learning from Human Feedback) enables prompt engineering, allowing annotators to provide feedback on responses to prompts. Both prompts and responses can be any type of item, for example, text or an image; support for video and audio is coming soon.

The purpose of the RLHF Studio is to enable organizations to fine-tune their generative AI models. The studio supports multiple prompts and responses, organized in a sequential chat flow, where annotators can provide feedback at any stage, rank the best response, and offer necessary feedback to enhance the model.

When you open the JSON file, the RLHF Studio is displayed with two sections: Conversation and Feedback.

For more information on the JSON format, see the RLHF JSON document.
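As an illustrative sketch of a prompt file (the field names below reflect the commonly used Dataloop prompt-item layout but are an assumption; the RLHF JSON document is the authoritative schema):

```python
import json

# Illustrative prompt-item structure -- verify field names against the
# RLHF JSON document before relying on them.
prompt_item = {
    "shebang": "dataloop",  # marks the file as a Dataloop prompt item (assumed)
    "prompts": {
        "prompt1": [
            {
                # A prompt element can be text or an image reference.
                "mimetype": "application/text",
                "value": "Summarize the quarterly report in three bullet points.",
            }
        ]
    },
}

# Save the file so it can be uploaded to a dataset.
with open("prompt_item.json", "w") as f:
    json.dump(prompt_item, f, indent=2)
```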


Conversation

This section contains prompts and responses. The text viewer for both prompts and responses supports Markdown formatting in addition to plain text. To create well-structured content with headings, bold, italics, and other formatting, use Markdown so the text is displayed correctly.

For example,

  • # Heading 1 -> Heading 1
  • ## Heading 2 -> Heading 2
  • **bold** -> bold

For more information on formatting, see the Markdown Syntax page.

Prompts

A prompt is an instruction, question, stimulus, or cue given to the model to elicit responses. Dataloop supports more than one prompt for generating model responses. Project owners or managers upload the prompt data to the dataset.

Response

A response is the answer or action a model produces as a result of a prompt. A prompt can have multiple responses, generated by different versions of the same model or by entirely different models.

Upload Responses

You can connect your models to the Dataloop platform or use the SDK to upload responses.
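As a hedged sketch, you might collect each model's response alongside provenance metadata before uploading it with the SDK. All field names and the helper function below are illustrative assumptions, not the SDK's actual API:

```python
def build_response_record(prompt_id, model_name, model_version, text):
    """Package a model response for upload.

    Hypothetical structure for illustration only -- check the Dataloop
    SDK documentation for the real entities.
    """
    if not text:
        raise ValueError("response text must not be empty")
    return {
        "promptId": prompt_id,          # which prompt this answers (assumed key)
        "model": model_name,            # provenance: which model produced it
        "modelVersion": model_version,  # provenance: which model version
        "mimetype": "application/text",
        "value": text,
    }

# Responses from two different models to the same prompt:
records = [
    build_response_record("prompt1", "model-a", "1.2", "Paris is the capital of France."),
    build_response_record("prompt1", "model-b", "0.9", "The capital is Paris."),
]
# The upload itself goes through the SDK (intentionally not executed here):
# dataset.items.upload(...)  # see the Dataloop SDK documentation
```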


Feedback

The Feedback section enables annotators to answer questions and provide feedback on the responses generated by the model. The questions in the Feedback section are defined and customized by the project owner or manager through the recipe.
The list of responses is displayed based on the selection made in the Conversation > Response section.


RLHF Studio Keyboard Shortcuts

General Shortcuts

Action                                  Keyboard Shortcut
Save                                    S
Delete                                  Delete
Undo                                    Ctrl + Z
Redo                                    Ctrl + Y
Zoom In/Out                             Scroll
Change Brightness                       Vertical Arrow + M
Change Contrast                         Vertical Arrow + R
Pan                                     Ctrl + Drag
Tool Selection                          0-9 (1-6 for Segmentation Studio)
Move Selected Annotations               Shift + Arrow Keys
Previous Item                           Left Arrow
Next Item                               Right Arrow
Add Item Description                    T
Mark Item as Done                       Shift + F
Mark Item as Discarded                  Shift + G
Enable Cross Grid Tool Helper           Alt + G
Show Cross Grid Measurements            Hold G
Hide/Show Annotations                   H
Show Unmasked Pixels                    Ctrl + M
Hide/Show Annotation Controllers        C
Set Object ID Menu                      O
Toggle Pixel Measurement                P
Use Tool Creation Mode                  Hold Shift
Copy Annotations from Previous Item     Shift + V


Work in the RLHF Studio

Project owners or managers perform the following actions in the RLHF Studio:

  • Upload prompt data.
  • Set feedback questions.
  • Approve feedback questions.
  • Report an issue.

Annotators receive tasks to validate the model's responses (predictions). Annotators perform the following actions in the RLHF Studio to validate the responses:

  • Verify the prompts and their responses.
  • Rank the responses.
  • Answer the questions in the feedback section.
  • Add comments.
  • Send the answers for review.

Using the SDK

Use the SDK to upload prompts and responses.
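Uploading a prompt file with the official dtlpy SDK might look like the sketch below; the calls shown (projects.get, datasets.get, items.upload) are common dtlpy entry points, but verify names and the authentication flow against the SDK documentation:

```python
def upload_prompt_file(project_name, dataset_name, local_path):
    """Upload a prompt JSON file to a Dataloop dataset.

    Sketch based on common dtlpy calls; confirm against the SDK docs.
    """
    import dtlpy as dl  # official Dataloop SDK (pip install dtlpy)

    if dl.token_expired():
        dl.login()  # opens a browser window for authentication
    project = dl.projects.get(project_name=project_name)
    dataset = project.datasets.get(dataset_name=dataset_name)
    # Uploading the JSON file creates a prompt item in the dataset.
    return dataset.items.upload(local_path=local_path)
```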

Set Feedback Questions

To set the questions presented in the Feedback section, the project owner or developer can edit the attributes of the recipe.
Dataloop supports multiple question types, such as scale, multiple choice, yes-no, and open questions; support for ranking questions is coming soon.
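To make the question types concrete, here is a hypothetical validator for annotator answers by question type. The question and answer shapes are illustrative assumptions; actual questions live in the recipe's attributes:

```python
def validate_answer(question, answer):
    """Check an annotator's answer against its question type.

    Question/answer dictionaries are hypothetical, for illustration only.
    """
    qtype = question["type"]
    if qtype == "scale":
        lo, hi = question["min"], question["max"]
        return isinstance(answer, int) and lo <= answer <= hi
    if qtype == "multiple-choice":
        return answer in question["options"]
    if qtype == "yes-no":
        return answer in ("yes", "no")
    if qtype == "open":
        return isinstance(answer, str) and answer.strip() != ""
    raise ValueError(f"unknown question type: {qtype}")

# One example question for each supported type:
scale_q = {"type": "scale", "min": 1, "max": 5}
choice_q = {"type": "multiple-choice", "options": ["A", "B", "C"]}
```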

Rank the Model's Responses

The Dataloop RLHF Studio allows the annotator to rank the model's responses. In the Response section, click the dropdown and select a response rank (0 to 3) from the list.
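Conceptually, ranking assigns each response an integer from 0 to 3. A small sanity-check helper (hypothetical, mirroring the studio's dropdown, and assuming a higher number means a better response, so confirm the convention in your recipe) might look like:

```python
def best_ranked(ranks):
    """Return the response ID with the highest rank.

    `ranks` maps response ID -> rank selected from the dropdown (0-3).
    Raises if any rank falls outside the allowed range. This helper is
    illustrative; the studio records ranks internally.
    """
    for resp_id, rank in ranks.items():
        if not 0 <= rank <= 3:
            raise ValueError(f"rank for {resp_id} must be 0-3, got {rank}")
    # Assumption: higher rank = better response.
    return max(ranks, key=ranks.get)

ranks = {"response_a": 2, "response_b": 3, "response_c": 0}
```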

Answer Feedback Questions

Annotators can answer multiple question types, such as scale, multiple choice, yes-no, and open questions, in the Feedback section of the RLHF Studio. You can also use the comment feature to elaborate on an answer. Once you have answered the questions, click the Save icon.

Identify the Best Response

Dataloop allows you to select the best response at the prompt level.
For example, if a prompt has responses A, B, and C generated by different models, the annotator can indicate which response is best in the Feedback section.
Once you are done, click the Save icon.

Note

Selecting the best response does not validate the response ranking. With the required permissions, you can customize the validation using JavaScript.

Report an Issue

If you identify an issue in the response feedback provided by annotators, click the Open Issue icon to report it. You can also use the comment feature to describe the issue.