Data Versioning
  • 28 Feb 2024

Dataloop lets you manage your datasets and items by cloning, merging, and moving them, as well as refining and segmenting your files.

Clone Dataset or Items

You can clone either datasets or items along with their annotations or metadata.

Important
  1. You cannot clone the item status, such as approved, completed, discarded, etc.
  2. Cloned datasets are generated using the same recipe as the original ones.
  3. Do not make any changes to items during the cloning process. This includes actions such as adding, editing, or deleting annotations, or moving items, etc.

Clone the Dataset

To clone an entire dataset, follow these instructions:

  1. From the left-side menu, go to Data.
  2. Find the desired dataset from the list, and click on the three-dots icon.
  3. Choose Clone Dataset from the list.
  4. In the Clone Dataset/Items window, decide whether to clone into an existing dataset or a new one:
    1. Existing Dataset:
      1. Select a dataset from the list.
      2. Search for and select the folder within that dataset where you want to place the cloned data (root folder, subfolders, etc.).
    2. New Dataset:
      1. Enter a name for the new dataset.
  5. Choose your cloning options:
    1. Whether you want to clone with item annotations.
    2. Whether you want to clone with item metadata.
  6. Once you've configured your options, click Clone to initiate the cloning process. A confirmation message is displayed.
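
The effect of the two cloning options can be pictured with a small model. This is only an illustrative sketch in plain Python, not Dataloop's API: the `Item` class and `clone_item` function are hypothetical names invented for this example. It assumes, as stated above, that annotations and metadata are copied only when selected and that the item status is never copied.

```python
from copy import deepcopy
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Item:
    """Hypothetical, simplified stand-in for a dataset item."""
    name: str
    annotations: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)
    status: Optional[str] = None  # e.g. "approved", "completed", "discarded"


def clone_item(item: Item, with_annotations: bool, with_metadata: bool) -> Item:
    """Copy an item according to the two cloning options.

    The status is always reset, because item status cannot be cloned.
    """
    return Item(
        name=item.name,
        annotations=deepcopy(item.annotations) if with_annotations else [],
        metadata=deepcopy(item.metadata) if with_metadata else {},
        status=None,
    )


original = Item(
    "cat.jpg",
    annotations=[{"label": "cat"}],
    metadata={"camera": "front"},
    status="approved",
)
# Clone with annotations but without metadata.
clone = clone_item(original, with_annotations=True, with_metadata=False)
print(clone)
```

In this model the clone keeps the annotation, starts with empty metadata, and has no status, while the original item is left unchanged.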

Clone Dataset's Items

Dataloop lets you clone items into target datasets. Cloning is supported:

  • From internal storage (e.g., Dataloop cloud storage) to internal storage.
  • From external storage (e.g., S3) to external storage, provided that the target storage also uses the same storage driver (e.g., using the same integration secret and storage driver pointing at the same location).
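
The storage rule above can be expressed as a simple check. The sketch below is a plain Python model, not part of any Dataloop SDK: `Storage` and `can_clone_items` are hypothetical names, and the `driver` string stands in for whatever identifies a storage driver. It also assumes that mixed internal/external cloning is unsupported, which is an inference from the two cases listed rather than an explicit statement.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Storage:
    """Hypothetical description of where a dataset's items are stored."""
    kind: str                      # "internal" or "external"
    driver: Optional[str] = None   # placeholder driver identifier (external only)


def can_clone_items(source: Storage, target: Storage) -> bool:
    """Return True when items may be cloned from source to target."""
    if source.kind == "internal" and target.kind == "internal":
        return True
    if source.kind == "external" and target.kind == "external":
        # Both sides must use the same storage driver.
        return source.driver is not None and source.driver == target.driver
    # Assumption: mixed internal/external cloning is not supported.
    return False


internal = Storage("internal")
bucket_a = Storage("external", driver="s3-driver-a")
bucket_b = Storage("external", driver="s3-driver-b")

print(can_clone_items(internal, internal))
print(can_clone_items(bucket_a, bucket_a))
print(can_clone_items(bucket_a, bucket_b))
```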

To clone items, follow these steps:

  1. From the left portal menu, select Data.
  2. Click on the dataset in the list.
  3. Select one or more items, then right-click or select File Actions > Clone.
  4. In the Clone Dataset/Items window, decide whether to clone the items into an existing dataset or a new one:
    1. Existing Dataset:
      1. Select a dataset from the list.
      2. Search for and select the folder within that dataset where you want to place the cloned items (root folder, subfolders, etc.).
    2. New Dataset:
      1. Enter a name for the new dataset.
  5. Choose your cloning settings:
    1. Whether you want to clone with item annotations.
    2. Whether you want to clone with item metadata.
  6. Once you've configured your options, click Clone to initiate the cloning process.

Parent ID or Dataset ID of the Cloned Items

After cloning an item, the metadata (JSON) of the cloned item will display both the parent item ID (srcItem) and parent dataset ID (srcDataset). However, in the Details tab, only the parent item ID is shown.
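
If you export an item's JSON, the parent IDs can be read programmatically. The sketch below is a minimal example under stated assumptions: the key names `srcItem` and `srcDataset` come from the description above, but the nesting shown in `item_json` and the ID values are made up for illustration, so the hypothetical helper `find_key` searches the whole structure instead of relying on a fixed path.

```python
from typing import Any, Optional


def find_key(data: Any, key: str) -> Optional[Any]:
    """Depth-first search for the first occurrence of `key` in nested dicts/lists."""
    if isinstance(data, dict):
        if key in data:
            return data[key]
        for value in data.values():
            found = find_key(value, key)
            if found is not None:
                return found
    elif isinstance(data, list):
        for value in data:
            found = find_key(value, key)
            if found is not None:
                return found
    return None


# Made-up item JSON: the nesting and the ID values are placeholders.
item_json = {
    "id": "clone-item-id",
    "metadata": {
        "system": {
            "srcItem": "parent-item-id",
            "srcDataset": "parent-dataset-id",
        }
    },
}

print("Parent item:", find_key(item_json, "srcItem"))
print("Parent dataset:", find_key(item_json, "srcDataset"))
```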

Merge Datasets

Dataloop lets you merge datasets. The result of a merge depends on how the datasets relate to each other:

  • Cloned datasets: When cloned datasets are merged, their items, annotations, and metadata are merged. This means that annotations from several datasets can be associated with the same item, so you can view and work with the combined annotations on a single item.

Merging items from cloned datasets is feasible only if the items being merged originated from the same master item, meaning that the cloned items must both reference the same source.

  • Different datasets (not clones) with similar recipes: The items of all datasets are added together, so similar items appear as duplicates.
  • Datasets with different recipes: Datasets with different default recipes cannot be merged. To merge datasets, use the Switch Recipe option at the dataset level (accessible through the ellipsis icon) to align recipes between datasets.
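
The three cases above can be summarized as a small decision function. This is a simplified sketch, not Dataloop's implementation: `DatasetInfo`, `same_origin`, and `merge_outcome` are hypothetical names, recipe compatibility is reduced to comparing a single recipe ID, and the clone relationship is tracked at dataset level even though the platform checks it per item.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DatasetInfo:
    """Hypothetical summary of a dataset for the purpose of this sketch."""
    name: str
    recipe_id: str
    cloned_from: Optional[str] = None  # name of the source dataset, if this is a clone


def same_origin(a: DatasetInfo, b: DatasetInfo) -> bool:
    """True when both datasets trace back to the same master dataset."""
    origin_a = a.cloned_from or a.name
    origin_b = b.cloned_from or b.name
    return origin_a == origin_b


def merge_outcome(a: DatasetInfo, b: DatasetInfo) -> str:
    """Classify what merging two datasets would do."""
    if a.recipe_id != b.recipe_id:
        return "blocked"    # align recipes with Switch Recipe first
    if same_origin(a, b):
        return "combined"   # items, annotations, and metadata are merged
    return "summed"         # items are added together; similar items duplicate


master = DatasetInfo("master", "recipe-1")
clone_a = DatasetInfo("clone-a", "recipe-1", cloned_from="master")
clone_b = DatasetInfo("clone-b", "recipe-1", cloned_from="master")
other = DatasetInfo("other", "recipe-1")
different = DatasetInfo("different", "recipe-2")

for pair in [(clone_a, clone_b), (master, other), (master, different)]:
    print(pair[0].name, "+", pair[1].name, "->", merge_outcome(*pair))
```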

To merge datasets, follow these instructions:

  1. From the left portal menu, select Data.
  2. Choose the datasets you want to merge from the list.
  3. Click Merge Datasets.
  4. In the Merge Datasets window, enter a name for the newly merged dataset in the Dataset Name field.
  5. Indicate whether you wish to merge With Items Annotations? and/or With Items Metadata? (i.e., including information added by annotators).

Upon successful completion of the merge process, the newly created dataset will be listed with the Dataset type labeled as Merge.

Move Folders or Items

  1. From the left portal menu, select Data.
  2. Click on the dataset in the list.
  3. Select one or more items, then right-click or select File Actions > Move.
  4. Select a folder from the list.
  5. Click Move.

You cannot move datasets.
