Schema Based Search
  • 19 Jun 2024


Overview

Filters are integral components of the Data Browser. They let you narrow down the displayed items based on specific criteria, which makes large datasets easier to manage and explore.


Filtering Criteria

The Data Browser lets you search items by their data. The following sections describe how to search and filter the items in your dataset.

You can filter by criteria related to the item's own data, its annotations, and its associated tasks:

  • Item Data: The Items filter is enabled by default and applies to every search query; it cannot be removed. It covers details such as the creation date, the creator, and other information associated with the item.
  • Annotation Data (Optional): Information about the annotations applied to the dataset items, such as labels, classifications, or any other data added to describe or categorize items within the dataset. You can remove the Annotations filter if you don't need it.
  • Tasks (Optional): Filter items by task ID and name. You can remove the Tasks filter if you don't need it. Activating the Tasks filter disables the folder-based view in the left-side panel.

To access additional search filters, click Add Filters.


Search Query Variables

| Filter | Variable | Description | Conditions | Data Type |
| --- | --- | --- | --- | --- |
| Items | annotated | Filter items by whether they are annotated. | =, != | Boolean (true or false) |
| Items | annotationsCount | Filter items by the number of annotations. | =, !=, >, >=, <, <=, IN, NOT-IN, EXIST, DOESNT-EXIST | String |
| Items | createdAt | Filter items by their creation date and time. | =, !=, >=, <=, <, > | dd/mm/yyyy |
| Items | creator | Filter items by their creator. | =, !=, IN, NOT-IN, EXIST, DOESNT-EXIST | String |
| Items | datasetid | Filter items by dataset ID. | =, !=, IN, NOT-IN, EXIST, DOESNT-EXIST | String |
| Items | described | Filter items by whether they have a description. | =, != | Boolean (true or false) |
| Items | description | Filter items whose description contains a specific part of the text. | =, !=, IN, NOT-IN, EXIST, DOESNT-EXIST | String |
| Items | dir | Filter items by their folder location within the dataset. | =, !=, IN, NOT-IN | String |
| Items | FilePath | Filter items by their file location. | =, !=, IN, NOT-IN | String |
| Items | hidden | Filter items by whether they are marked as hidden. | =, !=, IN, NOT-IN | Boolean (true or false) |
| Items | ItemHeight | Filter items by their height value. | =, !=, IN, NOT-IN | String |
| Items | ItemID | Filter items by their unique ID. | =, !=, IN, NOT-IN | String |
| Items | ItemWidth | Filter items by their width value. | =, !=, IN, NOT-IN | String |
| Items | MediaType | Filter items by media type, such as video or image. | =, !=, IN, NOT-IN | String |
| Items | metadata | Filter items by metadata information in the item's JSON file (for example, metadata.system and metadata.description). | =, !=, >=, <=, <, >, IN, NOT-IN | String |
| Items | ModelTestSet | Filter items designated for testing the model. | =, != | Boolean (true or false) |
| Items | ModelTrainSet | Filter items designated for training the model. | =, != | Boolean (true or false) |
| Items | ModelValidationSet | Filter items designated for validating the model. | =, != | Boolean (true or false) |
| Items | name | Filter items by name. | =, !=, IN, NOT-IN | String |
| Items | type | Filter items by type, such as file or folder. | =, !=, IN, NOT-IN | String |
| Items | updatedAt | Filter items by their last updated date. | =, !=, >, >=, <, <=, EXIST, DOESNT-EXIST | String |
| Items | updatedBy | Filter items by the email ID of the user who last updated them. | =, !=, IN, NOT-IN, EXIST, DOESNT-EXIST | String |
| Annotations | Confidence | Filter annotations by confidence level: a measure of certainty in the assigned labels, typically expressed as a percentage. Higher scores indicate more reliable annotations. | =, !=, >, >=, <, <=, IN, NOT-IN, EXIST, DOESNT-EXIST | String |
| Annotations | annotationId | Filter annotations by annotation ID. | =, !=, IN, NOT-IN, EXIST, DOESNT-EXIST | String |
| Annotations | attributeId | Filter annotations by attribute ID. | =, !=, IN, NOT-IN, EXIST, DOESNT-EXIST | String |
| Annotations | attributeName | Filter annotations by attribute name. | =, !=, IN, NOT-IN, EXIST, DOESNT-EXIST | String |
| Annotations | createdAt | Filter annotations by their creation date and time. | =, !=, >=, <=, <, > | dd/mm/yyyy |
| Annotations | creator | Filter annotations by the email ID of the user who created them. | =, !=, IN, NOT-IN | String |
| Annotations | datasetId | Filter annotations by dataset ID. | =, !=, IN, NOT-IN, EXIST, DOESNT-EXIST | String |
| Annotations | id | Filter annotations by annotation ID. | =, !=, IN, NOT-IN, EXIST, DOESNT-EXIST | String |
| Annotations | itemId | Filter annotations by item ID. | =, !=, IN, NOT-IN | String |
| Annotations | label | Filter annotations by label. | =, !=, IN, NOT-IN | String |
| Annotations | metadata | Filter items by annotation metadata, such as metadata.system.attributes (the annotation's attribute data) and metadata.system.status (the annotation's status). For more information, see Annotation Metadata. | =, !=, IN, NOT-IN | String |
| Annotations | modelName | Filter annotations by model name. | =, !=, IN, NOT-IN, EXIST, DOESNT-EXIST | String |
| Annotations | parentId | Filter annotations by the ID of their parent annotation. | =, !=, IN, NOT-IN | String |
| Annotations | source | Filter annotations by where they were created: UI or SDK. | =, !=, IN, NOT-IN | String |
| Annotations | type | Filter annotations by annotation type. | =, !=, IN, NOT-IN | String |
| Annotations | updatedBy | Filter annotations by the email ID of the user who last updated them. | =, !=, IN, NOT-IN, EXIST, DOESNT-EXIST | String |
| Tasks | TaskID | Filter items by task ID. | =, !=, IN, NOT-IN, EXIST, DOESNT-EXIST | String |
| Tasks | TaskName | Filter items by task name. | =, !=, IN, NOT-IN, EXIST, DOESNT-EXIST | String |
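To make the conditions in the table more concrete, the following minimal Python sketch shows how each condition could be evaluated against a single field of an item or annotation record. It is a hypothetical helper, not Dataloop's query engine or SDK: the function name `matches`, the plain-dictionary records, and the rule that every condition other than EXIST and DOESNT-EXIST fails when the field is missing are assumptions made for illustration.

```python
from typing import Any

# Hypothetical helper that mirrors the condition names in the table above.
# It is NOT Dataloop's DQL engine; it only illustrates the condition semantics.
_MISSING = object()

_COMPARISONS = {
    ">": lambda a, b: a > b,
    ">=": lambda a, b: a >= b,
    "<": lambda a, b: a < b,
    "<=": lambda a, b: a <= b,
}


def matches(record: dict, field: str, condition: str, value: Any = None) -> bool:
    """Return True if record[field] satisfies `condition` against `value`."""
    actual = record.get(field, _MISSING)
    if condition == "EXIST":
        return actual is not _MISSING
    if condition == "DOESNT-EXIST":
        return actual is _MISSING
    if actual is _MISSING:
        # Assumption: all other conditions fail when the field is absent.
        return False
    if condition == "=":
        return actual == value
    if condition == "!=":
        return actual != value
    if condition == "IN":
        return actual in value  # value is a list of accepted values
    if condition == "NOT-IN":
        return actual not in value
    if condition in _COMPARISONS:
        return _COMPARISONS[condition](actual, value)
    raise ValueError(f"Unsupported condition: {condition}")


item = {"name": "car_01.jpg", "annotated": True, "annotationsCount": 3}
print(matches(item, "annotated", "=", True))                   # True
print(matches(item, "annotationsCount", ">=", 2))              # True
print(matches(item, "name", "IN", ["car_01.jpg", "dog.jpg"]))  # True
print(matches(item, "description", "DOESNT-EXIST"))            # True
```

IN and NOT-IN take a list of values, while the comparison conditions take a single value, which matches how the table groups them.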

Search Items by Item's Data

  1. In the Data Browser, click the Items field.
  2. Select or enter the required search query.
  3. Click Search to view the search result.

For more search queries, see the How to Search? article.


Search Items by Annotation's Data

  1. In the Data Browser, click the Annotation field.
  2. Select or enter the required search query.
  3. Click Search to view the search result.

For more search queries, see the How to Search? article.


Search Items by Task's Data

  1. In the Data Browser, add the Tasks filter.
  2. In the Tasks field, select the task ID or Name criterion from the list.
  3. Enter the value and click Search to view the search result.

For more search queries, see the How to Search? article.


Search Items by Free Text - CLIP Based

CLIP (Contrastive Language-Image Pre-Training) combines natural language processing (NLP) with visual recognition. CLIP is a neural network trained on pairs of images and text that maps both into a shared embedding space, so a text description can be compared directly with image content.

You can enter any descriptive text, and the model retrieves the images or data entries that best match it, which makes search more intuitive and flexible. You can also combine it with other search fields, such as Annotations and Items, to further refine your results.

  1. In the Data Browser, click Add Filters.
  2. Select NLP (CLIP) from the list. A new NLP (CLIP) field is added.
  3. Enter the required text in the NLP (CLIP) field. For example, "Image with red cars".
  4. Click Search to view the search result.
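The general idea behind CLIP-style free-text search can be sketched in a few lines: the query text and each image are encoded as vectors in a shared embedding space, and images are ranked by cosine similarity to the query. This is a generic illustration under stated assumptions, not Dataloop's implementation. The three-dimensional vectors below are hypothetical stand-ins for real CLIP embeddings, which are produced by the model's text and image encoders and have hundreds of dimensions.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_text(query_vec, image_vecs, top_k=3):
    """Return the top_k (item_id, score) pairs, most similar first."""
    scored = [(item_id, cosine(query_vec, vec)) for item_id, vec in image_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical embeddings: stand-ins for real CLIP text and image vectors.
query = [0.9, 0.1, 0.0]  # e.g. the embedding of "Image with red cars"
images = {
    "red_car.jpg": [0.8, 0.2, 0.1],
    "blue_car.jpg": [0.3, 0.9, 0.0],
    "dog.jpg": [0.0, 0.1, 0.95],
}
for item_id, score in rank_by_text(query, images, top_k=2):
    print(item_id, round(score, 3))
```

Because ranking depends only on vector similarity, the same text query can surface images that carry no matching label or metadata, which is what makes free-text search complementary to the Items and Annotations filters.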

Filter Actions

The Dataloop platform lets you customize and save search queries, so you can reuse them and work with large datasets more efficiently.

For more search filter queries, see the How to Search? article.


Filter Operands

The following operators apply within and between filters, unless you specify otherwise when manually modifying a DQL query.

Cross Filter Operand

Multiple filters in a single query are combined with the AND operator.
For example, filtering by status Annotated AND the user john@doe.ai AND the annotation type Box returns items that are annotated, have at least one Box annotation, and have john@doe.ai as the annotator of one or more of their annotations.

Intra-Filter Operand

Multiple values within a single filter are combined with the OR operator.
For example, filtering by the labels Person and Dog returns all items that have annotations with either label, not necessarily both.
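The two operand rules can be modeled in a few lines of Python. In this minimal sketch, which uses hypothetical in-memory records rather than the Dataloop SDK, each filter is a (scope, field, values) tuple: the values within one filter are combined with OR, and the filters in a query are combined with AND. Following the AND example above, each annotation-level filter may be satisfied by a different annotation of the same item.

```python
# Hypothetical in-memory records; this is NOT the Dataloop SDK.
items = [
    {"name": "a.jpg", "annotated": True, "annotations": [
        {"label": "Person", "type": "box", "creator": "john@doe.ai"}]},
    {"name": "b.jpg", "annotated": True, "annotations": [
        {"label": "Dog", "type": "polygon", "creator": "john@doe.ai"},
        {"label": "Cat", "type": "box", "creator": "jane@doe.ai"}]},
    {"name": "c.jpg", "annotated": True, "annotations": [
        {"label": "Dog", "type": "box", "creator": "jane@doe.ai"}]},
    {"name": "d.jpg", "annotated": False, "annotations": []},
]


def matches_filter(item, scope, field, values):
    """OR within a filter: the filter passes if any listed value matches."""
    if scope == "items":
        return item.get(field) in values
    # Annotation-level filter: any annotation of the item may satisfy it.
    return any(ann.get(field) in values for ann in item["annotations"])


def search(items, filters):
    """AND across filters: an item is returned only if every filter passes."""
    return [item["name"] for item in items
            if all(matches_filter(item, *f) for f in filters)]


# Annotated AND created by john@doe.ai AND has a box annotation
print(search(items, [("items", "annotated", [True]),
                     ("annotations", "creator", ["john@doe.ai"]),
                     ("annotations", "type", ["box"])]))  # ['a.jpg', 'b.jpg']

# Label is Person OR Dog
print(search(items, [("annotations", "label", ["Person", "Dog"])]))  # ['a.jpg', 'b.jpg', 'c.jpg']
```

Note that b.jpg matches the first query even though its box annotation and its john@doe.ai annotation are different annotations, which is the behavior described in the AND example.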

Unsearchable Metadata

The Dataloop platform supports unsearchablePaths on the dataset's schema. This setting lets you blacklist specific keys or key prefixes, which removes them from the searchable schema. As a result:

  • Data under unsearchablePaths can't be queried using Dataloop Query Language (DQL), but it remains accessible in the item's metadata.
  • unsearchablePaths act as prefixes: every key under a listed prefix is unsearchable.
  • Removing a path from unsearchablePaths automatically makes the existing keys under it searchable again.
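The prefix behavior described in this list can be sketched as follows. This is a hypothetical illustration, not the Dataloop SDK: it flattens an item's metadata into dotted paths and drops any path that equals, or falls under, an unsearchable prefix. Matching on whole path segments (so that `metadata.key1` does not hide `metadata.key10`) is an assumption about how "under this prefix" is interpreted.

```python
# Hypothetical illustration of unsearchablePaths prefix matching (not the SDK).
def flatten(metadata: dict, prefix: str = "metadata") -> list[str]:
    """Return dotted leaf paths, e.g. {'user': {'notes': 1}} -> ['metadata.user.notes']."""
    paths = []
    for key, value in metadata.items():
        path = f"{prefix}.{key}"
        if isinstance(value, dict) and value:
            paths.extend(flatten(value, path))
        else:
            paths.append(path)
    return paths


def is_unsearchable(path: str, unsearchable_paths: list[str]) -> bool:
    """Assumption: a prefix hides itself and everything below it, matched on whole segments."""
    return any(path == p or path.startswith(p + ".") for p in unsearchable_paths)


def searchable_paths(metadata: dict, unsearchable_paths: list[str]) -> list[str]:
    """Keep only the leaf paths that no unsearchable prefix covers."""
    return [p for p in flatten(metadata) if not is_unsearchable(p, unsearchable_paths)]


metadata = {"system": {"height": 480}, "key1": {"raw": "...", "notes": "..."}, "key10": "x"}
print(searchable_paths(metadata, ["metadata.key1"]))  # ['metadata.system.height', 'metadata.key10']
print(searchable_paths(metadata, []))  # removing the prefix makes all four paths searchable again
```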

This helps you work around the regular metadata limitations (for example, 1024 characters per key or value and a maximum of 100 keys; see the Specifications for more information).
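Before uploading, you might want to find which metadata keys break these limits and are therefore candidates for unsearchablePaths. The following minimal sketch checks the two limits quoted above. It is hypothetical: measuring a value by the length of its string form and counting leaf keys toward the 100-key limit are assumptions, so check the Specifications for how the platform actually counts them.

```python
MAX_CHARS = 1024  # per key or value, as stated in the text above
MAX_KEYS = 100    # assumption: counted as leaf keys across all nesting levels


def oversized_paths(metadata: dict, prefix: str = "metadata") -> list[str]:
    """Return dotted paths whose key name or value exceeds MAX_CHARS.

    Assumption: a non-dict value is measured by the length of its string form.
    """
    found = []
    for key, value in metadata.items():
        path = f"{prefix}.{key}"
        if len(key) > MAX_CHARS:
            found.append(path)
        elif isinstance(value, dict):
            found.extend(oversized_paths(value, path))
        elif len(str(value)) > MAX_CHARS:
            found.append(path)
    return found


def count_leaf_keys(metadata: dict) -> int:
    """Count non-dict values at any nesting depth."""
    return sum(count_leaf_keys(v) if isinstance(v, dict) else 1 for v in metadata.values())


metadata = {"notes": "x" * 2000, "camera": {"model": "A7"}}
print(oversized_paths(metadata))             # ['metadata.notes']
print(count_leaf_keys(metadata) > MAX_KEYS)  # False
```

The returned paths are in the same dotted form as the paths passed to `dataset.schema.unsearchable_paths.add(paths=...)` later in this article.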

Important

Adding or removing an unsearchable path triggers an indexing process. Until indexing completes, you can't change the dataset schema or add new metadata values to the dataset's items.

Make Metadata Keys Unsearchable

This section shows how to retrieve a dataset's schema, mark specific keys as unsearchable, and make them searchable again.
Making a key unsearchable can help you avoid metadata errors when uploading items.

Step 1: Get the Dataset Schema

A dataset's schema describes the structure of the information it holds: the names of its data fields (keys) and the type of data each field contains (for example, text, number, or date). It shows how the data in the dataset is organized and what to expect in each field.

First, fetch the dataset's current schema, which lists the keys and their paths within the dataset structure.

```python
import dtlpy as dl

dataset = dl.datasets.get(dataset_id='datasetId')
schema = dataset.schema.get()  # dictionary of keys and their paths
```

This code retrieves the schema of the dataset identified by 'datasetId'. The schema, returned as a dictionary, shows keys and their paths within the dataset.

Step 2: Add Keys (Paths) to Unsearchable Paths

If certain metadata keys should not be searchable due to privacy concerns or irrelevance to search queries, you can add these keys to the list of unsearchable paths.

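```python
import dtlpy as dl

dataset = dl.datasets.get(dataset_id='datasetId')
# Every key under these prefixes becomes unsearchable; returns True on success
success = dataset.schema.unsearchable_paths.add(paths=['metadata.key1', 'metadata.key2'])
```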

Here, the add() method is used to make the paths metadata.key1 and metadata.key2 unsearchable. The method returns True if the paths are successfully added.

Step 3: Remove Keys (Paths) from Unsearchable Paths

If you decide that certain metadata keys should be searchable again, you can remove them from the list of unsearchable paths.

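```python
import dtlpy as dl

dataset = dl.datasets.get(dataset_id='datasetId')
# Existing keys under these prefixes become searchable again; returns True on success
success = dataset.schema.unsearchable_paths.remove(paths=['metadata.key1', 'metadata.key2'])
```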

This step uses the remove() method to delete metadata.key1 and metadata.key2 from the list of unsearchable paths, allowing them to be searchable again. The method returns True if the paths are successfully removed.


