DQL Explained
  • 04 Jun 2024
  • Dark
    Light
  • PDF

DQL Explained

  • Dark
    Light
  • PDF

Article summary

The DQL - The Dataloop Query Language

While the Dataloop platform gives you different capabilities to organize your data in datasets, folders, and versioning systems, you still need the ability to query your data. This is where our Dataloop Query Language becomes handy.

The Dataloop Query Language allows you to navigate through and sort massive amounts of data and can be used in the SDK as part of your code, as well as in the user interface in the Dataset Browser and in triggers and filters in the FaaS and Pipelines.


DQL Structure

Every DQL query has the following components:

  • Resource : The target resource for the query. The resource can be items or annotations.
  • Filter: The filter includes attributes and logical operators to filter items.


When using the DQL in the SDK, additional fields such as sort, page, and pageSize can be defined to sort the data that is returned from the query.

  • sort: The properties and order by which results are sorted. Sort is set by default to include the attributes "type": “ascending” and "createdAt": “descending” (i.e., sort results by item type in ascending order and sort each item type by the date in which it was created in descending order – from newest to oldest).
  • page: The page offset of the results (i.e., how many pages of the results should be skipped before presenting the results). The default value is 0.
  • pageSize: The number of items per page. In the SDK, the default value is 1000 items per page.

Sorting and Pagination in the UI

For sorting and pagination of data in the UI, use the details view.

For example, the following query executed in the DQL Editor in the Dataset Browser will return only items that

  • are of type file
  • are not hidden
  • have the attribute “createdAt'' greater than Jan 1, 2022
  • and have the attribute “mimetype” set to “video*” (the wildcard character * makes this query search for any format of video – video/webm, video/mkv, etc.).
{
	"filter": {
		"$and": [{
			"createdAt": {
				"$gt": "2022-01-01T00:00:00.000Z"
			}
		}, {
			"metadata.system.mimetype": {
				"$eq": "video*"
			}
		}, {
			"hidden": false
		}, {
			"type": "file"
		}]
	}
}

In this case, as the query was executed in the DQL Editor in the Dataset Browser, the system sets the "resource" to be “items”, "page": 0, "pageSize": 100, "sort" : {type: “ascending”, createdAt: “descending”}.

The "$and" array tells the system to apply the AND logical operator to the query fields within the array.


DQL Format

The query is written in JSON format: { }

JSON

JSON is a minimal text-based data exchange format that is used primarily to transmit data between a server and web application.
To learn more about JSON click here.

Data Types

The DQL uses the following data types:

  • string: surround string constants with double quotes, e.g., "hello world"
  • number: floating point and integer
  • boolean: boolean variables are true or false
  • array: collection of elements, e.g., [ "item1", "item2" ]
  • null: no value

Operators

DQL uses the following logical operators in the following format:

  • AND: “$and”
  • OR: “$or”
  • Exists: "$exists": true or false
  • Equal: "$eq": "value"
  • Equal not case sensitive (for strings): "$ieq": "value"
  • Not equal: "$ne": "value"
  • Greater than: "$gt": "value"
  • Greater than or equal: "$gte": "value"
  • Less than: "$lt": "value"
  • Less than or equal: "$lte": "value"
  • Field List Matching: "$in": ["option-1", "option-2"]
  • Wildcard characters: * (representing zero or more characters) and ? (representing 1 character)


More on operators here.


DQL Query

The following DQL attributes can be used in queries to filter and sort data. To learn more about the item JSON fields that the DQL queries, click here.

resource

Specifies the resource to be queried.
"resource": "items" | "annotations"

filter

Specifies the filter to be applied and may include a combination of field values and operators.
"filter": {...},

For example, to get all items named “foo” with extension “bar” in the dataset root folder, run this query in the DQL Editor:

{
    "filter": {
        "$and":[{
            "hidden":false
        }, {
            "filename": {
                "$eq":"/foo.bar"
            }}, {
            "type":"file"
        }]}}
Note

You need to add the relative path before the filename. For example, if you wish to filter for items named “foo” with extension “bar” in the subfolder “test,” you would write “filename”: { “$eq”: “/test/foo.bar”} in the example above.

join

[optional field]
Filters a resource by another resource, for example, items by the items' annotations.
"join": { "filter": {...},
"on": {...}},

For example, if you filter items in the dataset by annotation type box, you will see the DQL Editor produce the following query:

{
    "filter": {
        "$and":[{
            "hidden":false
        }, {
            "type":"file"
        }]},"join": {
        "filter": {
            "$and":[{
                "type": {
                    "$in":["box"]}}]},"on": {
            "resource":"annotations","local":"itemId","forigen":"id"
        }}}

The field "local" is the key on the joined resource (e.g., annotations).
The field "forigen" is the key on the returned resource (e.g., items).

DQL in the SDK

To learn more about filtering and DQL queries in the SDK, click here.

1. Filter: Get all items named foo.bar

import dtlpy as dl
if dl.token_expired():
    dl.login()
project = dl.projects.get(project_name='project_name')
dataset = project.datasets.get(dataset_name='dataset_name')
# Create filters instance
filters = dl.Filters()
filters.add(field='name', values='foo.bar.*')
# Get filtered item list in a page object
pages = dataset.items.list(filters=filters)
# Count the items
print('Number of filtered items in dataset: {}'.format(pages.items_count))
# Iterate through the items - go over all items and print the properties
for page in pages:
    for item in page:
        item.print()

2. Join: get items that have box annotations

import dtlpy as dl
if dl.token_expired():
    dl.login()
project = dl.projects.get(project_name='project_name')
dataset = project.datasets.get(dataset_name='dataset_name')
# Create filters instance
filters = dl.Filters()
filters.add_join(field='type', values='box')
# Get filtered item list in a page object
pages = dataset.items.list(filters=filters)
# Count the items
print('Number of filtered items in dataset: {}'.format(pages.items_count))
# Iterate through the items - go over all items and print the properties
for page in pages:
    for item in page:
        item.print()

3. Sort: Sort files of size 0-5 MB by name, in ascending order

import dtlpy as dl
if dl.token_expired():
    dl.login()
project = dl.projects.get(project_name='project_name')
dataset = project.datasets.get(dataset_name='dataset_name')
# Create filters instance
filters = dl.Filters()
filters.add(field='metadata.system.size', values='0', operator='gt')
filters.add(field='metadata.system.size', values='5242880', operator='lt')
filters.sort_by(field=dl.FILTERS_KNOWN_FIELDS_FILENAME, value=dl.FILTERS_ORDERBY_DIRECTION_ASCENDING)
# Get filtered item list in a page object
pages = dataset.items.list(filters=filters)
# Count the items
print('Number of filtered items in dataset: {}'.format(pages.items_count))
# Iterate through the items - go over all items and print the properties
for page in pages:
    for item in page:
        item.print()

4. Sort with multiple fields: Sort by labels ascending and createdAt descending

import dtlpy as dl
if dl.token_expired():
    dl.login()
project = dl.projects.get(project_name='project_name')
dataset = project.datasets.get(dataset_name='dataset_name')
# Create filters instance
filters = dl.Filters()
# set annotation resource
filters.resource = dl.FiltersResource.ANNOTATION
# return results sorted by descending label
filters.sort_by(field=dl.FILTERS_KNOWN_FIELDS_LABEL, value=dl.FILTERS_ORDERBY_DIRECTION_ASCENDING)
filters.sort_by(field=dl.FILTERS_KNOWN_FIELDS_CREATED_AT, value=dl.FILTERS_ORDERBY_DIRECTION_DESCENDING)
# Get filtered item list in a page object
pages = dataset.items.list(filters=filters)
# Count the items
print('Number of filtered items in dataset: {}'.format(pages.items_count))
# Iterate through the items - go over all items and print the properties
for page in pages:
    for item in page:
        item.print()

Response to DQL Query

A typical response to a DQL query will look like the following:

{
    "totalItemsCount": number,
    "items": Array,
    "totalPagesCount": number,
    "hasNextPage": boolean,
}

# A possible result:

{
    "totalItemsCount": 2,
    "totalPagesCount": 1,
    "hasNextPage": false,
    "items": [
        {
            "id": "5d0783852dbc15306a59ef6c",
            "createdAt": "2019-06-18T23:29:15.775Z",
            "filename": "/5546670769_8df950c6b6.jpg",
            "type": "file"
            // ...
        },
        {
            "id": "5d0783852dbc15306a59ef6d",
            "createdAt": "2019-06-19T23:29:15.775Z",
            "filename": "/5551018983_3ce908ac98.jpg",
            "type": "file"
            // ...
        }
    ]
}

Equal / Not equal

  • $eq specifies equality condition.
  • $ne operator returns all the items that do not match the specified value.
 {
  "resource": "items",
  "filter": {
    "createdAt": { "$eq": "2019-06-17T23:29:15.775Z" } // Match items that were created at the specified date
  }
}

Greater / Lower Than

  • $gt operator returns all items where the field value is greater than the specified value.
  • $lt operator returns all items where the field value is less than the specified value.
{
  "resource": "items",
  "filter": {
    "createdAt": { "$gt": "2019-06-17T23:29:15.775Z" } // Match items that were created after the specified date
  }
}

Logical AND / OR

  • $or operator specifies a list of expressions and matches all items that satisfy at least one of the expressions that may contain operators.
  • $and operator specifies a list of expressions and matches of all items that satisfy all of the expressions that may contain operators.

In this example, you will get items who have both box annotations and 'car' label.

"join": {
    "filter": {
        "$and": [
            {
                "type": "box"
            },
            {
                "label": "car"
            }
        ]
    },
    "on": {
        "resource": "annotations",
        "local": "itemId",
        "forigen": "id"
    }
}

Read more about 'join' filters here.


Field List Matching

$in operator specifies a list of values and matches all items where the field value matches one of the values.

In this example, you will get items that have annotations created by either "annotator1@dataloop.ai" or "annotator2@dataloop.ai".

{
    "join": {
        "filter": {
            "$and": [
                {
                    "creator": {
                        "$in": [
                            "annotator1@dataloop.ai",
                            "annotator2@dataloop.ai"
                        ]
                    }
                }
            ]
        },
        "on": {
            "resource": "annotations",
            "local": "itemId",
            "forigen": "id"
        }
    }
}

Read more about 'join' filters here.

Wildcards

Wildcards '*' and '?' are supported in field value

  • * Will match zero or more characters.
  • ? Will match one character at most.

Example:

// Match all filenames that contains magic

{ "filename": "*magic*" }

//Match mimetypes image/jpg and image/jpeg

{ "filename": "*jp?g" }

Commonly Used DQL Filters

All Completed Items

{
	"filter": {
		"$and": [{
				"metadata": {
					"system": {
						"refs": {
							"metadata": {
								"status": {
									"$eq": "completed"
								}
							}
						}
					}
				}
			},
			{
				"hidden": false
			},
			{
				"type": "file"
			}
		]
	}
}

All Approved Items

{
	"filter": {
		"$and": [{
				"metadata": {
					"system": {
						"refs": {
							"metadata": {
								"status": {
									"$eq": "approved"
								}
							}
						}
					}
				}
			},
			{
				"hidden": false
			},
			{
				"type": "file"
			}
		]
	}
}

All Completed and Approved Items

{
	"filter": {
		"$and": [{
				"metadata": {
					"system": {
						"refs": {
							"metadata": {
								"status": {
									"$eq": "completed"
								}
							}
						}
					}
				}
			}, {
				"metadata": {
					"system": {
						"refs": {
							"metadata": {
								"status": {
									"$eq": "approved"
								}
							}
						}
					}
				}
			}, {
				"hidden": false
			},
			{
				"type": "file"
			}
		]
	}
}

All Not Completed Items

{
	"filter": {
		"$and": [{
				"metadata": {
					"system": {
						"refs": {
							"metadata": {
								"status": {
									"$ne": "completed"
								}
							}
						}
					}
				}
			},
			{
				"hidden": false
			},
			{
				"type": "file"
			}
		]
	}
}

All Completed and Not Approved Items

{
	"filter": {
		"$and": [{
				"metadata": {
					"system": {
						"refs": {
							"metadata": {
								"status": {
									"$eq": "completed"
								}
							}
						}
					}
				}
			}, {
				"metadata": {
					"system": {
						"refs": {
							"metadata": {
								"status": {
									"$ne": "approved"
								}
							}
						}
					}
				}
			}, {
				"hidden": false
			},
			{
				"type": "file"
			}
		]
	}
}