RLHF Prompt Studio
- Updated On 03 Sep 2024
Overview
As generative AI grows in popularity and rapidly evolves, so does the need to fine-tune models for specific commercial needs, training them on proprietary data to extend their base capabilities.
RLHF Studio
Dataloop’s RLHF (Reinforcement Learning from Human Feedback) Studio enables prompt engineering, allowing annotators to provide feedback on responses to prompts. Both prompts and responses can be any type of item, for example, text or an image; support for video and audio is planned.
The purpose of the RLHF Studio is to enable organizations to fine-tune their generative AI models. The studio supports multiple prompts and responses, organized in a sequential chat flow, where annotators can provide feedback at any stage, rank the best response, and offer necessary feedback to enhance the model.
Once you open the JSON file, the RLHF Studio is displayed. It has two sections: Conversation and Feedback.
For more information on the JSON format, see the RLHF JSON document.
Conversation
This section contains prompts and responses. The Text viewer for both Prompts and Responses supports Markdown formatting in addition to standard text. To create well-structured content with elements like headings, bold, italics, and other formatting options, use Markdown format to ensure the text is displayed correctly.
For example:

| Markdown | Rendered as |
|---|---|
| `# Heading 1` | Heading 1 |
| `## Heading 2` | Heading 2 |
| `**bold**` | **bold** |
For more information on formatting, see the Markdown Syntax page.
Prompts
A prompt is an instruction, question, stimulus, or cue given to the model to elicit responses. Dataloop supports multiple prompts for generating model responses. Project owners or managers upload the prompt data to the dataset.
Response
A response is the output the model generates as a result of a prompt. Dataloop supports multiple responses per prompt; a single prompt might have responses generated by different versions of the same model or by entirely different models.
You can connect your models to the Dataloop platform or use the SDK to upload responses.
Feedback
The Feedback section enables annotators to answer questions and provide feedback on the responses generated by the model. The questions in this section are defined and customized by the project owner or manager through the recipe.
The list of responses is displayed based on the selection made in the Conversation > Response section.
RLHF Studio Keyboard Shortcuts
General Shortcuts
| Action | Keyboard Shortcut |
|---|---|
| Save | S |
| Delete | Delete |
| Undo | Ctrl + Z |
| Redo | Ctrl + Y |
| Zoom In/Out | Scroll |
| Change Brightness | Vertical Arrow + M |
| Change Contrast | Vertical Arrow + R |
| Pan | Ctrl + Drag |
| Tool Selection | 0-9 (1-6 for Segmentation Studio) |
| Move selected annotations | Shift + Arrow Keys |
| Previous Item | Left Arrow |
| Next Item | Right Arrow |
| Add Item Description | T |
| Mark Item as Done | Shift + F |
| Mark Item as Discarded | Shift + G |
| Enable Cross Grid Tool Helper | Alt + G |
| Show Cross Grid Measurements | Hold G |
| Hide/Show Annotations | H |
| Show Unmasked Pixels | Ctrl + M |
| Hide/Show Annotation Controllers | C |
| Set Object ID Menu | O |
| Toggle Pixel Measurement | P |
| Use Tool Creation Mode | Hold Shift |
| Copy Annotations from Previous Item | Shift + V |
Work in the RLHF Studio
Project owners or managers perform the following actions for the RLHF Studio:
- Upload prompt data.
- Set feedback questions.
- Approve feedback questions.
- Report an issue.
Annotators receive tasks to validate the models' responses (predictions). They perform the following actions in the RLHF Studio to validate the responses:
- Verify the prompts and their responses.
- Rank the responses.
- Provide answers to the questions in the feedback section.
- Add comments.
- Send the answers for review.
Use the SDK to upload prompts and responses.
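As a minimal sketch, the snippet below builds a single-prompt JSON item and writes it to a file ready for upload. The schema shown here (the `shebang`, `metadata`, and `prompts` fields) is an illustrative assumption, so consult the RLHF JSON document for the authoritative format; the commented dtlpy calls use hypothetical project and dataset names.

```python
import json
import tempfile

# Illustrative prompt-item structure -- an assumption, not the official
# schema; see the RLHF JSON document for the authoritative format.
prompt_item = {
    "shebang": "dataloop",
    "metadata": {"dltype": "prompt"},
    "prompts": {
        "prompt-1": [
            {
                "mimetype": "application/text",
                "value": "Summarize the attached contract in two sentences.",
            }
        ]
    },
}

# Write the prompt to a .json file so it can be uploaded as a dataset item.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(prompt_item, f, indent=2)
    local_path = f.name

# Upload with the Dataloop SDK (hypothetical project and dataset names):
# import dtlpy as dl
# project = dl.projects.get(project_name="my-project")
# dataset = project.datasets.get(dataset_name="rlhf-prompts")
# dataset.items.upload(local_path=local_path)
```

Model responses can be attached to the uploaded prompt items in the same way, following the response structure described in the RLHF JSON document.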
Set Feedback Questions
To set the questions presented in the feedback section, the project owner or developer can edit the attributes of the recipe.
Dataloop supports multiple question types, such as scale, multiple choice, yes-no, and open questions; support for ranking questions is planned.
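As an illustration only, feedback questions of these types might be expressed as recipe attributes along the following lines. All field names and values below are assumptions for the sake of the example; the actual schema is defined by the Dataloop recipe format.

```json
{
  "metadata": {
    "system": {
      "attributes": [
        {
          "key": "helpfulness",
          "title": "How helpful is the response?",
          "type": "slider",
          "values": {"min": 1, "max": 5, "step": 1}
        },
        {
          "key": "accurate",
          "title": "Is the response factually accurate?",
          "type": "yes/no"
        },
        {
          "key": "notes",
          "title": "Additional comments",
          "type": "free-text"
        }
      ]
    }
  }
}
```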
Rank the Model's Responses
Dataloop RLHF Studio allows the annotator to rank the model responses. In the Response section, click the dropdown and select a response rank (0 to 3) from the list.
Answer Feedback Questions
Annotators can answer multiple question types, such as scale, multiple choice, yes-no, and open questions, available in the Feedback section of the RLHF Studio. You can also use the comment feature to elaborate on an answer. Once you have answered the questions, click the Save icon.
Identify the Best Response
Dataloop allows you to select the best responses at the prompt level.
For example, if a prompt has responses A, B, and C generated by different models, the annotator can provide information on which response is the best in the Feedback section.
Once you are done, click the Save icon.
Selecting the best response does not automatically validate it against the response ranking. With the required permissions, you can implement custom validation logic in JavaScript, if needed.
Report an Issue
If you identify a problem in the response feedback provided by the annotators, click the Open Issue icon to report it as an issue. You can also use the comment feature to describe the issue.