- 14 Oct 2024
- Print
- DarkLight
- PDF
Getting Started
- Updated On 14 Oct 2024
- Print
- DarkLight
- PDF
The DQL - The Dataloop Query Language
While the Dataloop platform gives you different capabilities to organize your data in datasets, folders, and versioning systems, you still need the ability to query your data. This is where our Dataloop Query Language becomes handy.
The Dataloop Query Language allows you to navigate through and sort massive amounts of data and can be used in the SDK as part of your code. Also, in the user interface in the Dataset Browser and in triggers and filters in the Applications and Pipelines.
DQL Structure
Every DQL query has the following components:
- Resource : The target resource for the query. The resource can be items or annotations.
- Filter: The filter includes attributes and logical operators to filter items.
When using the DQL in the SDK, additional fields such as sort, page, and pageSize can be defined to sort the data that is returned from the query.
- sort: The properties and order by which results are sorted. Sort is set by default to include the attributes "type": “ascending” and "createdAt": “descending” (i.e., sort results by item type in ascending order and sort each item type by the date in which it was created in descending order – from newest to oldest).
- page: The page offset of the results (i.e., how many pages of the results should be skipped before presenting the results). The default value is 0.
- pageSize: The number of items per page. In the SDK, the default value is 1000 items per page.
For sorting and pagination of data in the UI, use the details view.
For example, the following query executed in the DQL Editor in the Dataset Browser will return only items that
- are of type file
- are not hidden
- have the attribute “createdAt'' greater than Jan 1, 2022
- and have the attribute “mimetype” set to “video*” (the wildcard character * makes this query search for any format of video – video/webm, video/mkv, etc.).
{
"filter": {
"$and": [{
"createdAt": {
"$gt": "2022-01-01T00:00:00.000Z"
}
}, {
"metadata.system.mimetype": {
"$eq": "video*"
}
}, {
"hidden": false
}, {
"type": "file"
}]
}
}
In this case, as the query was executed in the DQL Editor in the Dataset Browser, the system sets the "resource" to be “items”, "page": 0, "pageSize": 100, "sort" : {type: “ascending”, createdAt: “descending”}.
The "$and" array tells the system to apply the AND logical operator to the query fields within the array.
DQL Format
The query is written in JSON format: { }
JSON is a minimal text-based data exchange format that is used primarily to transmit data between a server and web application.
To learn more about JSON click here.
Data Types
The DQL uses the following data types:
- string: surround string constants with double quotes, e.g., "hello world"
- number: floating point and integer
- boolean: boolean variables are true or false
- array: collection of elements, e.g., [ "item1", "item2" ]
- null: no value
Operators
DQL uses the following logical operators in the following format:
- AND: “$and”
- OR: “$or”
- Exists: "$exists": true or false
- Equal: "$eq": "value"
- Equal not case sensitive (for strings): "$ieq": "value"
- Not equal: "$ne": "value"
- Greater than: "$gt": "value"
- Greater than or equal: "$gte": "value"
- Less than: "$lt": "value"
- Less than or equal: "$lte": "value"
- Field List Matching: "$in": ["option-1", "option-2"]
- Wildcard characters: * (representing zero or more characters) and ? (representing 1 character)
More on operators here.
DQL Supports Intersect and Except Operations
In Dataloop, support for intersect
and except
operators in DQL allows you to create more advanced queries that can compare and combine data sets based on specific conditions. These operations can be highly useful when working with multiple queries and data sets, enabling more refined and complex filtering.
To allow users to work with the new operators like INTERSECT and EXCEPT directly through the DQL editor, the query syntax can be designed to be intuitive and expressive. They function similarly to SQL's except
and intersect
operations. except
eliminates rows that meet the exclusion criteria from the final result, while intersect
retains only rows that fulfill the intersection condition.
To learn how to query, refer to the How to Use the DQL Query Editor to Search? section.
Below is a proposed structure for how the user might write these queries in the DQL editor.
{
"filter": {
"$and": [
{
"hidden": false
},
{
"type": "file"
}
]
},
"join": {
"filter": {
"$and": [
{
"type": {
"$in": [
"box"
]
}
}
]
},
"on": {
"resource": "annotations",
"local": "itemId",
"forigen": "id"
}
},
"intersect": {
"query": {
"filter": {
"$and": [
{
"annotated": false
},
]
},
"join": {
"filter": {
"$and": [{
"type":
"point"
}
]
},
"on": {
"resource": "annotations",
"local": "itemId",
"forigen": "id"
}
}
},
},
"except": {
"query": {
"filter": {
"$and": [
{
"annotated": true
},
]
},
"join": {
"filter": {
"$and": [
{
"type":
"point"
}
]
},
"on": {
"resource": "annotations",
"local": "itemId",
"forigen": "id"
}
}
}
}
}
INTERSECT in DQL
The INTERSECT operation returns the common elements between two or more queries. It is useful when you want to find data items that satisfy multiple conditions across different subsets of data.
DQL INTERSECT Interface:
interface intersect {
"query": {
"filter": {... },
"join": {
"filter": {... },
"on": {... }
},
}
}
filter
: The main filter is where you define the first set of conditions. These are the primary conditions that will narrow down the dataset.join.filter
: The secondary filter contains conditions for the second dataset or subset that will be intersected with the results of the main filter.on
: Theon
field will define the common fields that act as the intersection criteria between the two sets of filtered data. It specifies the relationship or matching key used to join the data from the two filters.
Key Points for the Intersect Interface
- Modularity: You can easily extend or modify the filter or join conditions.
- Flexibility: You can apply the
join
on different fields, allowing for more complex intersections across datasets. - Readability: The structure is clear and easy to understand for combining datasets or subsets.
To use INTERSECT functionality, you can manually combine multiple conditions using logical operators ($and
) and specify the filtering conditions that mimic an intersection.
For example, if you want to get data that appears in both Query A and Query B, you could write a query that joins the two conditions together with $and
, which functions similarly to an intersection.
EXCEPT in DQL
The EXCEPT operation returns the elements that are in one query but not in another. This is useful for excluding specific items from a query result.
Here’s a design for the EXCEPT operation in DQL, similar to the structure you suggested for the INTERSECT operation. The EXCEPT operation removes items that match a secondary filter from the results of a primary filter.
interface except {
"query": {
"filter": {... },
"join": {
"filter": {... },
"on": {... }
},
}
}
filter
: This is the main filter condition that defines the primary set of data to retrieve. These are the data points you want to include initially.join.filter
: This is the exclusion condition. It defines the filter criteria for items you want to exclude from the results of the primary filter.on
: This defines the field or fields that must match between the two datasets (from the primary filter and the exclusion filter). It acts as the condition for identifying which data items to exclude.
Key Points for the EXCEPT Interface
- Exclusion Logic: This interface allows you to specify which subset of the data to exclude based on matching conditions.
- Customizable Exclusions: The
join
can be customized to exclude items based on a variety of criteria, and you can apply exclusions across different fields. - Efficient Filtering: By defining the
on
clause, you can efficiently exclude only the items that match both the primary and exclusion criteria.
To achieve the EXCEPT operation, you can use the $not
or negation operators in the UI's query builder. By specifying the condition that you want to exclude, you can mimic the behavior of EXCEPT.
Actions Available with INTERSECT and EXCEPT DQL Query Operations
By leveraging INTERSECT and EXCEPT in DQL, you can streamline your data querying process, enabling more targeted and efficient data retrieval and manipulation. The following actions are available with INTERSECT and EXCEPT DQL Query Operations:
- Create new task
- Add items to existing task
- Cloning
- Predict/embed
- Run with FaaS
- Run with Pipeline
For example, you can create tasks using an intersect query to choose items, but you cannot choose items for export using an intersect query.
Intersect and Except Query Example
Scenario
image_3
has annotations with the labels "label_1", "label_2", and "label_3".image_2
has annotations with the labels "label_1" and "label_2".image_1
has an annotation with the label "label_1".
Goal
Retrieve items that have annotations with both "label_1" and "label_2", but exclude any items that have the "label_3" label.
Query
{
"filter": {
"$and": [
{
"hidden": false
},
{
"type": "file"
}
],
"join": {
"filter": {
"label": "label_1"
},
"on": {
"resource": "annotations",
"local": "itemId",
"forigen": "id"
}
}
},
"intersect": {
"query": {
"filter": {
"filename": { "$in": ["/image_1.png", "/image_2.png", "/image_3.png", "/image_4.png"] }
},
"join": {
"filter": {
"label": "label_2"
},
"on": {
"resource": "annotations",
"local": "itemId",
"forigen": "id"
}
}
}
},
"except": {
"query": {
"join": {
"filter": {
"label": "label_3"
},
"on": {
"resource": "annotations",
"local": "itemId",
"forigen": "id"
}
}
}
},
"sort": { "filename": "descending" }
}
Expected Result
Only Image 2 meets these conditions, as Image 1 does not have "test-inter" and Image 3 contains "test-except", which needs to be excluded.