Schema Based Search
- Updated On 21 Jan 2025
Overview
Filters are integral components of the Dataset Browser, providing users with the capability to refine and narrow down the displayed items based on specific criteria. These filters offer a powerful tool for managing and exploring large datasets efficiently.
Filtering Criteria
The Data Browser allows you to search for items by their data. Refer to the following sections to learn how to search and filter items in your dataset.
You can filter by specifying criteria on the item's own data and on the annotation and task data associated with the item.
- Item Data: By default, the Items filter is enabled. It includes details like creation date, creator, or any other relevant information associated with the item. The Items filter is permanent for all the search queries.
- Annotation Data (Optional): Annotation data is information related to any annotations applied to the dataset items. Annotations could include labels, classifications, or any additional data that has been added to enhance the understanding or categorization of items within the dataset. You can deselect the Annotation filter, if required.
- Tasks (Optional): You can filter the items in the dataset by using the Task's ID and Name. If necessary, you can remove the Tasks filter. When you activate the tasks filter, it will turn off the Folders based view option in the left-side panel.
To access additional search filters, click Add Filters.
Search Query Variables
To learn various search options using the schema search, see the Search Items article.
Filter Data Type | Filter Variable | Description | Conditions | Data Types |
---|---|---|---|---|
Items | annotated | Filter items based on whether they are annotated or not. | = , != | Boolean values (true or false ) |
Items | annotationsCount | Filter items based on the number of annotations. | = , != , > , >= , < , <= , IN , NOT-IN , EXIST , DOESNT-EXIST | String |
Items | createdAt | Filter items based on the date and time of their creation. | = , != , >= , <= , < , > | dd/mm/yyyy |
Items | creator | Filter items based on the creator of the item. | = , != , IN , NOT-IN , EXIST , DOESNT-EXIST | String |
Items | datasetid | Filter items based on the dataset ID. | = , != , IN , NOT-IN , EXIST , DOESNT-EXIST | String |
Items | described | Filter items based on the presence or absence of a description. | = , != | Boolean values (true or false ) |
Items | description | Filter items by searching for those that contain a specific part of the description text. | = , != , IN , NOT-IN , EXIST , DOESNT-EXIST | String |
Items | dir | Filter items based on their folder location within the dataset. | = , != , IN , NOT-IN | String |
Items | filePath | Filter items by the file location. | = , != , IN , NOT-IN | String |
Items | ItemHeight | Filter items based on the height value of each item. | = , != , IN , NOT-IN | String |
Items | ItemID | The unique ID of the item. | = , != , IN , NOT-IN | String |
Items | ItemStatus | Filter items based on the item's annotation status. | = , != , IN , NOT-IN , EXIST , DOESNT-EXIST | String (Discard, Complete, and Approve) |
Items | ItemWidth | Filter items according to the width value of each item. | = , != , IN , NOT-IN | String |
Items | MediaType | Filter items based on their media type, such as video or image. | = , != , IN , NOT-IN | String |
Items | metadata | Filter items based on metadata information (for example, metadata.system and metadata.description ) contained in the items' JSON file. | = , != , >= , <= , < , > , IN , NOT-IN | String |
Items | ModelTestSet | Filter items that are designated for testing the model. | = , != | Boolean values (true or false ) |
Items | ModelTrainSet | Filter items that are designated for training the model. | = , != | Boolean values (true or false ) |
Items | ModelValidationSet | Filter items that are designated for validating the model. | = , != | Boolean values (true or false ) |
Items | name | Filter items based on their name. | = , != , IN , NOT-IN | String |
Items | updatedAt | Filter items based on the item's last updated date. | = , != , > , >= , < , <= , EXIST , DOESNT-EXIST | String |
Items | updatedBy | Filter items based on the email ID of the user who last updated the item. | = , != , IN , NOT-IN , EXIST , DOESNT-EXIST | String |
Annotations | annotationId | Filter annotation based on the annotation ID. | = , != , IN , NOT-IN , EXIST , DOESNT-EXIST | String |
Annotations | attributeId | Filter annotation based on the attribute ID. | = , != , IN , NOT-IN , EXIST , DOESNT-EXIST | String |
Annotations | attributeName | Filter annotation based on the attribute name. | = , != , IN , NOT-IN , EXIST , DOESNT-EXIST | String |
Annotations | Confidence | Filter annotations based on their confidence level. Confidence is the measure of certainty or accuracy in the labels assigned to data, typically expressed as a percentage. Higher confidence scores indicate greater reliability of the annotations. | = , != , > , >= , < , <= , IN , NOT-IN , EXIST , DOESNT-EXIST | String |
Annotations | createdAt | Filter annotations based on the date and time of their creation. | = , != , >= , <= , < , > | dd/mm/yyyy |
Annotations | creator | Filter annotations by the user's email ID who created the annotation. | = , != , IN , NOT-IN | String |
Annotations | datasetId | Filter annotations based on the dataset ID. | = , != , IN , NOT-IN , EXIST , DOESNT-EXIST | String |
Annotations | id | Filter annotations based on the annotation ID. | = , != , IN , NOT-IN , EXIST , DOESNT-EXIST | String |
Annotations | itemId | Filter annotations based on the item ID. | = , != , IN , NOT-IN | String |
Annotations | label | Filter annotations based on the labels. | = , != , IN , NOT-IN | String |
Annotations | metadata | Filter items based on annotation metadata information, such as metadata.system.attributes (the annotation's attribute data) and metadata.system.status (the annotation's status, such as approved or completed). For more information, see the annotation metadata. | = , != , IN , NOT-IN | String |
Annotations | modelName | Filter annotations based on the model names. | = , != , IN , NOT-IN , EXIST , DOESNT-EXIST | String |
Annotations | parentId | Filter the data using the ID of the parent annotation. | = , != , IN , NOT-IN | String |
Annotations | source | Filter the data by where the annotation was created: UI or SDK. | = , != , IN , NOT-IN | String |
Annotations | type | Filter the data using the types of annotations. | = , != , IN , NOT-IN | String |
Annotations | updatedAt | Filter annotations based on the last updated date. | = , != , IN , NOT-IN , EXIST , DOESNT-EXIST | String |
Annotations | updatedBy | Filter annotations based on the email ID of the user who last updated the annotation. | = , != , IN , NOT-IN , EXIST , DOESNT-EXIST | String |
Tasks | TaskID | Filter the data using the task ID. | = , != , IN , NOT-IN , EXIST , DOESNT-EXIST | String |
Tasks | TaskName | Filter the data using the task name. | = , != , IN , NOT-IN , EXIST , DOESNT-EXIST | String |
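As a rough mental model of the Conditions column, the sketch below evaluates a single condition against an item represented as a plain Python dictionary. It is not how the platform evaluates queries; the `matches` function, the `SAMPLE_ITEM` data, and the rule that a comparison on a missing field never matches are all assumptions made for illustration.

```python
import operator

_MISSING = object()  # sentinel: the field is absent from the item

# Condition names as written in the table above, mapped to Python comparisons.
_COMPARATORS = {
    "=": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "IN": lambda actual, allowed: actual in allowed,
    "NOT-IN": lambda actual, allowed: actual not in allowed,
}


def matches(item, field, condition, value=None):
    """Evaluate one condition against a dict item (illustrative only)."""
    actual = item.get(field, _MISSING)
    if condition == "EXIST":
        return actual is not _MISSING
    if condition == "DOESNT-EXIST":
        return actual is _MISSING
    if actual is _MISSING:
        # Assumption: comparing against a missing field never matches.
        return False
    # An unknown condition name raises KeyError.
    return _COMPARATORS[condition](actual, value)


# Hypothetical item, reduced to a few of the filter variables listed above.
SAMPLE_ITEM = {"name": "cat.jpg", "annotated": True, "annotationsCount": 3}
```

For instance, `matches(SAMPLE_ITEM, "name", "IN", ["cat.jpg", "dog.jpg"])` checks membership in a list of values, while `matches(SAMPLE_ITEM, "description", "DOESNT-EXIST")` only checks that the field is absent and ignores the value argument.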
Search Items by Item's Data
- In the Data Browser, click on the Items field.
- Select or enter the required search query.
- Click Search to view the search result.
For more search queries, see the How to Search? article.
Search Items by Annotation's Data
- In the Data Browser, click on the Annotation field.
- Select or enter the required search query.
- Click Search to view the search result.
For more search queries, see the How to Search? article.
Search Items by Task's Data
- In the Data Browser, add the Tasks filter.
- In the Tasks field, select the task ID or Name criterion from the list.
- Enter the value and click Search to view the search result.
For more search queries, see the How to Search? article.
Search Items by Free Text - CLIP Based
NLP (Natural Language Processing) with CLIP (Contrastive Language-Image Pre-Training) refers to the integration of language understanding capabilities with visual recognition. CLIP is a neural network model that leverages NLP techniques to process and understand text while simultaneously analyzing and comprehending images. This enables the model to interpret and align textual descriptions with corresponding visual content effectively.
You can input any descriptive text, and the model will retrieve images or data entries that best match the query, significantly enhancing the user experience by making the search process more intuitive and flexible. Additionally, you can use it alongside other search fields such as Annotations and Items to further refine and improve your search results.
- In the Data Browser, click Add Filters.
- Select NLP (CLIP) from the list. A new NLP (CLIP) field is added.
- Enter the required text in the NLP (CLIP) field. For example, "Image with red cars".
- Click Search to view the search result.
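To make the idea of aligning text with images more concrete, here is a minimal sketch of a common approach to CLIP-style retrieval: text and images are embedded as vectors in the same space, and images are ranked by cosine similarity to the query text. The vectors below are invented toy values (real embeddings come from the model and have many more dimensions), and `cosine`, `rank_by_text`, `IMAGE_VECS`, and `QUERY_VEC` are hypothetical names; this does not describe the platform's internal implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_text(text_vec, image_vecs):
    """Return image names ordered from most to least similar to the text vector."""
    return sorted(
        image_vecs,
        key=lambda name: cosine(text_vec, image_vecs[name]),
        reverse=True,
    )


# Toy 3-dimensional "embeddings" (made up for illustration).
IMAGE_VECS = {
    "red_car.jpg": [0.9, 0.1, 0.0],
    "blue_boat.jpg": [0.1, 0.9, 0.2],
    "red_truck.jpg": [0.7, 0.2, 0.3],
}
QUERY_VEC = [1.0, 0.0, 0.1]  # stands in for the embedding of "Image with red cars"

ranking = rank_by_text(QUERY_VEC, IMAGE_VECS)
```

With these toy values, the image whose vector points in nearly the same direction as the query vector ranks first, which is the intuition behind retrieving the best-matching items for a free-text description.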
Filter Actions
The Dataloop platform allows you to customize and save search queries, which streamlines working with large datasets.
For more search filter queries, see the How to Search? article.
Filters Operands
The following operators are applied within and between filters, unless otherwise specified when manually modifying a DQL query.
Cross Filter Operand
A relationship between multiple filters in a single query is based on the AND operand.
For example, filtering by status Annotated AND the user john@doe.ai AND the annotation type Box will return items that are annotated, have a Box annotation, and have john@doe.ai as an annotator in one or more of the annotations.
Intra-Filter Operand
The operand relationship between multiple values in a specific filter is based on the OR operator.
For example, filtering by labels with the values Person and Dog, the filter will return all items with annotations of either of these labels, not necessarily both at the same time.
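The two rules (AND between filters, OR between the values of one filter) can be sketched in plain Python as follows. The item layout, the `ITEMS` sample data, and the `item_matches` and `select` helpers are hypothetical and exist only to illustrate the logic; they are not part of the SDK.

```python
def item_matches(item, query):
    """query maps a filter name to the list of values accepted by that filter."""
    return all(  # AND across the different filters
        any(value in item.get(name, []) for value in accepted)  # OR within one filter
        for name, accepted in query.items()
    )


def select(items, query):
    """Return the names of the items that satisfy the whole query."""
    return [item["name"] for item in items if item_matches(item, query)]


# Hypothetical items, each listing the labels and types found in its annotations.
ITEMS = [
    {"name": "a.jpg", "label": ["Person"], "type": ["box"]},
    {"name": "b.jpg", "label": ["Dog", "Cat"], "type": ["polygon"]},
    {"name": "c.jpg", "label": ["Cat"], "type": ["box"]},
]

either_label = select(ITEMS, {"label": ["Person", "Dog"]})
label_and_type = select(ITEMS, {"label": ["Person", "Dog"], "type": ["box"]})
```

Here `either_label` contains `a.jpg` and `b.jpg`, because either label is enough, while `label_and_type` contains only `a.jpg`, because the added type filter must hold as well.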
Unsearchable Metadata
The Dataloop platform provides the unsearchablePaths functionality on the dataset's schema. It allows users to specify certain keys or prefixes as blacklisted within the dataset's schema, effectively removing them from the searchable schema. Therefore:
- Data under unsearchablePaths can't be queried using Dataloop Query Language (DQL), but it can be found at the item metadata level.
- Users can access metadata at the item level, despite unsearchability.
- unsearchablePaths act as prefixes; all keys under a prefix will not be searchable.
- Removing an unsearchablePath automatically makes the existing paths under the removed unsearchable path searchable in the schema.
This functionality helps users work around the regular metadata limitations (1024 characters on keys or values, a maximum of 100 keys, and so on). Refer to the Specifications for more information.
When you add or remove a path that's not searchable in the dataset, it triggers an indexing process. During this time, you won't be able to make changes to the dataset schema or add new metadata values to dataset items until the indexing completes.
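The prefix behavior of unsearchable paths can be pictured with a small stand-alone sketch. The `is_searchable` helper is hypothetical, and matching prefixes on whole dot-separated segments is an assumption made here; the platform's exact matching rules are not stated in this article.

```python
def is_searchable(key_path, unsearchable_paths):
    """Return False when key_path equals, or is nested under, an unsearchable prefix."""
    for prefix in unsearchable_paths:
        # Assumption: a prefix covers the key itself and everything below it,
        # matched on whole dot-separated segments.
        if key_path == prefix or key_path.startswith(prefix + "."):
            return False
    return True


blocked = ["metadata.user.camera"]  # hypothetical unsearchable prefix
nested_key_searchable = is_searchable("metadata.user.camera.model", blocked)
sibling_key_searchable = is_searchable("metadata.user.location", blocked)
```

In this sketch `nested_key_searchable` is `False` and `sibling_key_searchable` is `True`. Calling the helper with an empty list of prefixes models the removal of an unsearchable path: every key underneath it becomes searchable again.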
Make the Metadata Key Unsearchable
This section explains how to retrieve the dataset schema, mark specific keys as unsearchable, and revert them to searchable.
Doing so helps you overcome metadata errors received while uploading items.
Step 1: Get the Dataset Schema
A dataset's schema is the structure of what kind of information it holds, such as the names of different data fields (keys), and the type of data each field contains (e.g., text, number, date, etc.). It helps you understand how the data within the dataset is organized and what kind of data you can expect in each field.
First, you need to fetch the current schema of a dataset, which details the keys and their paths within the dataset structure.
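import dtlpy as dl
# Fetch the dataset by its ID, then read its schema (returned as a dictionary)
dataset = dl.datasets.get(dataset_id='datasetId')
schema = dataset.schema.get()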
This code retrieves the schema of the dataset identified by 'datasetId'. The schema, returned as a dictionary, shows keys and their paths within the dataset.
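The exact layout of the returned dictionary is not shown in this article. As a hedged illustration of what a dotted key path refers to, the sketch below flattens a hypothetical nested dictionary into such paths; `dotted_paths` and `SAMPLE` are invented for this example, and the real schema may be structured differently.

```python
def dotted_paths(node, prefix=""):
    """Flatten a nested dict into dotted key paths, in insertion order."""
    paths = []
    for key, value in node.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and value:
            # Descend into non-empty nested dictionaries.
            paths.extend(dotted_paths(value, path))
        else:
            paths.append(path)
    return paths


# Hypothetical nested structure; not the actual shape of a dataset schema.
SAMPLE = {"metadata": {"key1": "text", "user": {"camera": "text"}}}

paths = dotted_paths(SAMPLE)
```

For the sample above, `paths` holds `metadata.key1` and `metadata.user.camera`, which is the dotted form used when referring to metadata keys.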
Step 2: Add Keys (Path) to Unsearchable Paths
If certain metadata keys should not be searchable due to privacy concerns or irrelevance to search queries, you can add these keys to the list of unsearchable paths.
dataset = dl.datasets.get(dataset_id='datasetId')
success = dataset.schema.unsearchable_paths.add(paths=['metadata.key1', 'metadata.key2'])
Here, the add() method is used to make the paths metadata.key1 and metadata.key2 unsearchable. The method returns True if the paths are successfully added.
Step 3: Remove Keys (Path) from Unsearchable Paths
If you decide that certain metadata keys should be searchable again, you can remove them from the list of unsearchable paths.
dataset = dl.datasets.get(dataset_id='datasetId')
success = dataset.schema.unsearchable_paths.remove(paths=['metadata.key1', 'metadata.key2'])
This step uses the remove() method to delete metadata.key1 and metadata.key2 from the list of unsearchable paths, allowing them to be searchable again. The method returns True if the paths are successfully removed.
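Because both methods report success as a boolean, it can be convenient to wrap them in one helper that fails loudly when a call does not succeed. The sketch below is one possible way to do that; `set_searchable` is a hypothetical helper, not part of the SDK, and it relies only on the `add(paths=...)` and `remove(paths=...)` calls shown in this article.

```python
def set_searchable(dataset, paths, searchable):
    """Make `paths` searchable or unsearchable; raise if the call reports failure."""
    registry = dataset.schema.unsearchable_paths
    if searchable:
        # Searchable again: take the paths off the unsearchable list.
        ok = registry.remove(paths=paths)
    else:
        # Unsearchable: put the paths on the unsearchable list.
        ok = registry.add(paths=paths)
    if not ok:
        action = "remove" if searchable else "add"
        raise RuntimeError(f"Could not {action} unsearchable paths: {paths}")
```

With a dataset fetched as in the steps above, `set_searchable(dataset, ['metadata.key1'], searchable=False)` would hide the key from search and `searchable=True` would restore it. Keep in mind that each change triggers indexing, during which further schema changes are blocked.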