Getting Started
  • 14 Oct 2024
  • Dark
    Light
  • PDF

Getting Started

  • Dark
    Light
  • PDF

Article summary

The DQL - The Dataloop Query Language

While the Dataloop platform gives you different capabilities to organize your data in datasets, folders, and versioning systems, you still need the ability to query your data. This is where our Dataloop Query Language becomes handy.

The Dataloop Query Language allows you to navigate through and sort massive amounts of data and can be used in the SDK as part of your code. Also, in the user interface in the Dataset Browser and in triggers and filters in the Applications and Pipelines.


DQL Structure

Every DQL query has the following components:

  • Resource : The target resource for the query. The resource can be items or annotations.
  • Filter: The filter includes attributes and logical operators to filter items.


When using the DQL in the SDK, additional fields such as sort, page, and pageSize can be defined to sort the data that is returned from the query.

  • sort: The properties and order by which results are sorted. Sort is set by default to include the attributes "type": “ascending” and "createdAt": “descending” (i.e., sort results by item type in ascending order and sort each item type by the date in which it was created in descending order – from newest to oldest).
  • page: The page offset of the results (i.e., how many pages of the results should be skipped before presenting the results). The default value is 0.
  • pageSize: The number of items per page. In the SDK, the default value is 1000 items per page.

Sorting and Pagination in the UI

For sorting and pagination of data in the UI, use the details view.

For example, the following query executed in the DQL Editor in the Dataset Browser will return only items that

  • are of type file
  • are not hidden
  • have the attribute “createdAt'' greater than Jan 1, 2022
  • and have the attribute “mimetype” set to “video*” (the wildcard character * makes this query search for any format of video – video/webm, video/mkv, etc.).
{
	"filter": {
		"$and": [{
			"createdAt": {
				"$gt": "2022-01-01T00:00:00.000Z"
			}
		}, {
			"metadata.system.mimetype": {
				"$eq": "video*"
			}
		}, {
			"hidden": false
		}, {
			"type": "file"
		}]
	}
}

In this case, as the query was executed in the DQL Editor in the Dataset Browser, the system sets the "resource" to be “items”, "page": 0, "pageSize": 100, "sort" : {type: “ascending”, createdAt: “descending”}.

The "$and" array tells the system to apply the AND logical operator to the query fields within the array.


DQL Format

The query is written in JSON format: { }

JSON

JSON is a minimal text-based data exchange format that is used primarily to transmit data between a server and web application.
To learn more about JSON click here.

Data Types

The DQL uses the following data types:

  • string: surround string constants with double quotes, e.g., "hello world"
  • number: floating point and integer
  • boolean: boolean variables are true or false
  • array: collection of elements, e.g., [ "item1", "item2" ]
  • null: no value

Operators

DQL uses the following logical operators in the following format:

  • AND: “$and”
  • OR: “$or”
  • Exists: "$exists": true or false
  • Equal: "$eq": "value"
  • Equal not case sensitive (for strings): "$ieq": "value"
  • Not equal: "$ne": "value"
  • Greater than: "$gt": "value"
  • Greater than or equal: "$gte": "value"
  • Less than: "$lt": "value"
  • Less than or equal: "$lte": "value"
  • Field List Matching: "$in": ["option-1", "option-2"]
  • Wildcard characters: * (representing zero or more characters) and ? (representing 1 character)


More on operators here.


DQL Supports Intersect and Except Operations

In Dataloop, support for intersect and except operators in DQL allows you to create more advanced queries that can compare and combine data sets based on specific conditions. These operations can be highly useful when working with multiple queries and data sets, enabling more refined and complex filtering.

To allow users to work with the new operators like INTERSECT and EXCEPT directly through the DQL editor, the query syntax can be designed to be intuitive and expressive. They function similarly to SQL's except and intersect operations. except eliminates rows that meet the exclusion criteria from the final result, while intersect retains only rows that fulfill the intersection condition.

To learn how to query, refer to the How to Use the DQL Query Editor to Search? section.

Below is a proposed structure for how the user might write these queries in the DQL editor.

{
	"filter": {
		"$and": [
			{
				"hidden": false
			},
			{
				"type": "file"
			}
		]
	},
	"join": {
		"filter": {
			"$and": [
				{
					"type": {
						"$in": [
							"box"
						]
					}
				}
			]
		},
		"on": {
			"resource": "annotations",
			"local": "itemId",
			"forigen": "id"
		}
	},
	"intersect": {
		"query": {
			"filter": {
				"$and": [
					{
						"annotated": false
					},
				]
			},
			"join": {
				"filter": {
					"$and": [{
						"type":
							"point"
					}
					]
				},
				"on": {
					"resource": "annotations",
					"local": "itemId",
					"forigen": "id"
				}
			}
		},
	},
	"except": {
		"query": {
			"filter": {
				"$and": [
					{
						"annotated": true
					},
				]
			},
			"join": {
				"filter": {
					"$and": [
						{
							"type":
								"point"
						}
					]
				},
				"on": {
					"resource": "annotations",
					"local": "itemId",
					"forigen": "id"
				}
			}
		}

	}
}

INTERSECT in DQL

The INTERSECT operation returns the common elements between two or more queries. It is useful when you want to find data items that satisfy multiple conditions across different subsets of data.

DQL INTERSECT Interface:

interface intersect {
	"query": {
		"filter": {... },
		"join": {
			"filter": {... },
			"on": {... }
		},
	}
}
  • filter: The main filter is where you define the first set of conditions. These are the primary conditions that will narrow down the dataset.
  • join.filter: The secondary filter contains conditions for the second dataset or subset that will be intersected with the results of the main filter.
  • on: The on field will define the common fields that act as the intersection criteria between the two sets of filtered data. It specifies the relationship or matching key used to join the data from the two filters.

Key Points for the Intersect Interface

  • Modularity: You can easily extend or modify the filter or join conditions.
  • Flexibility: You can apply the join on different fields, allowing for more complex intersections across datasets.
  • Readability: The structure is clear and easy to understand for combining datasets or subsets.

To use INTERSECT functionality, you can manually combine multiple conditions using logical operators ($and) and specify the filtering conditions that mimic an intersection.

For example, if you want to get data that appears in both Query A and Query B, you could write a query that joins the two conditions together with $and, which functions similarly to an intersection.

EXCEPT in DQL

The EXCEPT operation returns the elements that are in one query but not in another. This is useful for excluding specific items from a query result.

Here’s a design for the EXCEPT operation in DQL, similar to the structure you suggested for the INTERSECT operation. The EXCEPT operation removes items that match a secondary filter from the results of a primary filter.

interface except {
	"query": {
		"filter": {... },
		"join": {
			"filter": {... },
			"on": {... }
		},
	}
}
  • filter: This is the main filter condition that defines the primary set of data to retrieve. These are the data points you want to include initially.
  • join.filter: This is the exclusion condition. It defines the filter criteria for items you want to exclude from the results of the primary filter.
  • on: This defines the field or fields that must match between the two datasets (from the primary filter and the exclusion filter). It acts as the condition for identifying which data items to exclude.

Key Points for the EXCEPT Interface

  • Exclusion Logic: This interface allows you to specify which subset of the data to exclude based on matching conditions.
  • Customizable Exclusions: The join can be customized to exclude items based on a variety of criteria, and you can apply exclusions across different fields.
  • Efficient Filtering: By defining the on clause, you can efficiently exclude only the items that match both the primary and exclusion criteria.

To achieve the EXCEPT operation, you can use the $not or negation operators in the UI's query builder. By specifying the condition that you want to exclude, you can mimic the behavior of EXCEPT.

Actions Available with INTERSECT and EXCEPT DQL Query Operations

By leveraging INTERSECT and EXCEPT in DQL, you can streamline your data querying process, enabling more targeted and efficient data retrieval and manipulation. The following actions are available with INTERSECT and EXCEPT DQL Query Operations:

  • Create new task
  • Add items to existing task
  • Cloning
  • Predict/embed
  • Run with FaaS
  • Run with Pipeline

For example, you can create tasks using an intersect query to choose items, but you cannot choose items for export using an intersect query.

Intersect and Except Query Example

Scenario

  • image_3 has annotations with the labels "label_1", "label_2", and "label_3".
  • image_2 has annotations with the labels "label_1" and "label_2".
  • image_1 has an annotation with the label "label_1".

Goal

Retrieve items that have annotations with both "label_1" and "label_2", but exclude any items that have the "label_3" label.

Query

{
	"filter": {
		"$and": [
			{
				"hidden": false
			},
			{
				"type": "file"
			}
		],
		"join": {
			"filter": {
				"label": "label_1"
			},
			"on": {
				"resource": "annotations",
				"local": "itemId",
				"forigen": "id"
			}
		}
	},
	"intersect": {
		"query": {
			"filter": {
				"filename": { "$in": ["/image_1.png", "/image_2.png", "/image_3.png", "/image_4.png"] }
			},
			"join": {
				"filter": {
					"label": "label_2"
				},
				"on": {
					"resource": "annotations",
					"local": "itemId",
					"forigen": "id"
				}
			}
		}
	},
	"except": {
		"query": {
			"join": {
				"filter": {
					"label": "label_3"
				},
				"on": {
					"resource": "annotations",
					"local": "itemId",
					"forigen": "id"
				}
			}
		}
	},
	"sort": { "filename": "descending" }
	
}

Expected Result

Only Image 2 meets these conditions, as Image 1 does not have "test-inter" and Image 3 contains "test-except", which needs to be excluded.