Embeddings
  • 27 May 2025

The Embeddings tab in the Dataloop platform provides a powerful way to visualize and interact with feature sets derived from your datasets. These feature sets represent the data in a numerical form (embeddings) that models can process and learn from.

Embeddings in Dataloop are often generated by a model or a custom function and are stored per item (image, annotation, etc.).

  • A Feature Vector is the embedding of a single item: one individual numerical representation.
  • A Feature Set is a structured collection of these embeddings: a group of embeddings ready for analysis or ML.

Feature Sets

A Feature Set is a structured collection of Feature Vectors generated by a specific model. When a model is used for the first time to extract feature vectors from a dataset, a corresponding Feature Set is automatically created.

Subsequently, the same model can be applied to additional datasets to extract feature vectors. For each item in those datasets, one feature vector is generated. All resulting feature vectors—regardless of their originating dataset—are associated with the same Feature Set, as long as they are derived from the same model.

Each model corresponds to exactly one Feature Set per project, ensuring a consistent and centralized representation of feature vectors for that model within the project.

A Feature Set acts as the input matrix for machine learning tasks and is typically used for:

  • Training machine learning models
  • Evaluating model performance
  • Clustering data items
  • Similarity-based retrieval or search
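The creation rule described above can be modeled with a small in-memory sketch. Under that rule, the first extraction with a model creates its Feature Set, and later extractions from other datasets add to the same set. The `FeatureSet` and `FeatureSetRegistry` classes and their methods are hypothetical illustrations, not part of the Dataloop SDK. They only mirror the documented behavior: one Feature Set per model per project, and one vector per item.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureSet:
    """Hypothetical stand-in for a Feature Set (one per project + model)."""
    project_id: str
    model_id: str
    # item ID -> feature vector: one vector per item, whatever its dataset
    vectors: dict[str, list[float]] = field(default_factory=dict)
    dataset_ids: set[str] = field(default_factory=set)


class FeatureSetRegistry:
    """Hypothetical registry enforcing one Feature Set per model per project."""

    def __init__(self) -> None:
        self._sets: dict[tuple[str, str], FeatureSet] = {}

    def get_or_create(self, project_id: str, model_id: str) -> FeatureSet:
        key = (project_id, model_id)
        # First use of this model in this project creates the Feature Set;
        # every later extraction reuses it.
        if key not in self._sets:
            self._sets[key] = FeatureSet(project_id, model_id)
        return self._sets[key]

    def add_extraction(self, project_id: str, model_id: str, dataset_id: str,
                       item_vectors: dict[str, list[float]]) -> FeatureSet:
        feature_set = self.get_or_create(project_id, model_id)
        feature_set.dataset_ids.add(dataset_id)
        # Re-extracting an item replaces its vector, keeping one vector per item.
        feature_set.vectors.update(item_vectors)
        return feature_set


registry = FeatureSetRegistry()
fs_a = registry.add_extraction("proj-1", "model-x", "dataset-A", {"item-1": [0.1, 0.2]})
fs_b = registry.add_extraction("proj-1", "model-x", "dataset-B", {"item-2": [0.3, 0.4]})
print(fs_a is fs_b)  # True: same model and project, same Feature Set
```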

🧩 Structure:

  • Rows: Each row is a Feature Vector representing a single data item.
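  • Columns: Each column corresponds to one dimension of the vector, i.e., one feature or attribute (e.g., color value, texture score, sentiment score). In learned embeddings, individual dimensions are usually not directly interpretable.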
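The row/column layout can be shown in a few lines of Python. `build_feature_matrix` is a hypothetical helper, not a Dataloop API. It stacks per-item feature vectors into the rows of a matrix in a fixed item order. It also checks that every row has the same number of columns, because the vectors in a Feature Set come from one model and share its dimensionality.

```python
def build_feature_matrix(
    vectors_by_item: dict[str, list[float]],
) -> tuple[list[str], list[list[float]]]:
    """Stack per-item feature vectors into matrix rows.

    Returns the item IDs (the row order) and the matrix (a list of rows).
    """
    item_ids = sorted(vectors_by_item)  # reproducible row order
    rows = [list(vectors_by_item[item_id]) for item_id in item_ids]
    # Every row must have the same number of columns (dimensions).
    dims = {len(row) for row in rows}
    if len(dims) > 1:
        raise ValueError(f"inconsistent dimensionality: {sorted(dims)}")
    return item_ids, rows


item_ids, matrix = build_feature_matrix({
    "item-b": [0.4, 0.1, 0.9],
    "item-a": [0.2, 0.7, 0.3],
})
print(item_ids)        # ['item-a', 'item-b']
print(len(matrix[0]))  # 3 columns, one per dimension
```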

What are Embeddings or Feature Vectors?

Embeddings, also known as Feature Vectors, are numerical representations of data—such as images, text, video frames, or other objects—produced by AI models. These representations encode the most significant characteristics of the data in a compact, high-dimensional vector format.

The purpose of embeddings is to capture the underlying features and semantic relationships within the data, enabling efficient comparison, clustering, search, and downstream machine learning tasks. By transforming complex, unstructured data into structured numerical form, embeddings make it possible for models to perform tasks like classification, similarity detection, and pattern recognition with greater accuracy and speed.

In summary, a Feature Vector represents a single data item in this vector space, encapsulating its most meaningful traits in a format optimized for machine analysis and learning.
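Comparing items in this vector space usually means measuring the angle or distance between their vectors. The sketch below ranks items by cosine similarity to a query vector with a brute-force scan, using only the Python standard library. The function names and toy vectors are illustrative assumptions. Large feature sets are usually searched with an approximate nearest-neighbor index rather than a full scan.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(query: list[float], vectors_by_item: dict[str, list[float]],
                 top_k: int = 3) -> list[tuple[str, float]]:
    """Return up to top_k (item ID, score) pairs, most similar first."""
    scored = [(item_id, cosine_similarity(query, vec))
              for item_id, vec in vectors_by_item.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


catalog = {
    "cat-1": [0.9, 0.1, 0.0],
    "cat-2": [0.8, 0.2, 0.1],
    "car-1": [0.0, 0.1, 0.9],
}
print([item_id for item_id, _ in most_similar([1.0, 0.0, 0.0], catalog, top_k=2)])
# ['cat-1', 'cat-2']
```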

These embeddings can be used for tasks such as:

  • Similarity searches (e.g., finding similar images or texts)
  • Clustering (e.g., grouping items by similarity)
  • Visualization (e.g., projecting high-dimensional data into 2D or 3D space)
  • Training downstream models (e.g., using embeddings as input features for predictive models)
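To illustrate the clustering use case, the sketch below runs plain k-means (Lloyd's algorithm) over a handful of feature vectors. It is a standard-library teaching example with a deliberately simple initialization (the first k vectors become the starting centroids). It is not the clustering method used by the Dataloop platform, and the toy vectors are made up for the example.

```python
def squared_distance(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows: list[list[float]], k: int, iterations: int = 20) -> list[int]:
    """Assign each feature vector (row) to one of k clusters; returns labels."""
    # Simple deterministic initialization: the first k rows are the centroids.
    centroids = [list(row) for row in rows[:k]]
    labels = [0] * len(rows)
    for _ in range(iterations):
        # Assignment step: each row joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(row, centroids[c]))
            for row in rows
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [row for row, label in zip(rows, labels) if label == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels


labels = kmeans([[0.0, 0.0], [5.0, 5.0], [0.1, 0.2], [5.1, 4.9]], k=2)
print(labels)  # [0, 1, 0, 1]
```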

✨ Key Properties:

  • Structured: It's an array or list of numbers where each number represents a feature, i.e., a measurable property of the data.
  • Model-ready: These vectors are the foundation for ML tasks like classification, clustering, similarity search, etc.
  • Generated via Extraction: Typically, feature vectors are created by running an embedding model, such as a neural network, on the raw data.

📌 Example Use Cases:

  • An image might be converted into a 512-dimensional vector using a pre-trained vision model.
  • A labeled audio clip may be transformed into a numerical vector based on frequency and amplitude patterns.
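To make feature extraction concrete, the sketch below turns a raw mono audio signal into a small feature vector built from amplitude and frequency patterns: RMS amplitude, peak amplitude, zero-crossing rate, and dominant frequency. This is a hand-crafted stand-in, not a learned embedding. An embedding model such as the pre-trained vision model mentioned above learns its features instead of having them chosen by hand. The function name and the choice of features are assumptions for illustration, and the naive DFT is only practical for short clips.

```python
import cmath
import math


def audio_feature_vector(samples: list[float], sample_rate: int) -> list[float]:
    """Hand-crafted 4-dimensional feature vector for a short mono clip.

    Assumes at least 4 samples so there is a positive-frequency DFT bin.
    """
    n = len(samples)
    rms = math.sqrt(sum(s * s for s in samples) / n)
    peak = max(abs(s) for s in samples)
    # Fraction of adjacent sample pairs whose sign changes.
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    zero_crossing_rate = crossings / (n - 1)
    # Naive DFT magnitudes for bins 1 .. n/2 - 1 (positive frequencies).
    magnitudes = [
        abs(sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(1, n // 2)
    ]
    dominant_bin = 1 + max(range(len(magnitudes)), key=magnitudes.__getitem__)
    dominant_hz = dominant_bin * sample_rate / n
    return [rms, peak, zero_crossing_rate, dominant_hz]


# A 50 Hz sine wave, 200 samples at 1000 Hz.
clip = [0.5 * math.sin(2 * math.pi * 50 * t / 1000) for t in range(200)]
features = audio_feature_vector(clip, sample_rate=1000)
print(features[3])  # 50.0 (dominant frequency in Hz)
```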

Key Features

1. Feature Set Extraction

  • Enables you to extract embeddings (feature sets) from one or more datasets.
  • Supports various types of data, including images, text, and other structured/unstructured data formats.

2. View Feature Sets

  • Displays embeddings (numerical representations) generated from your datasets.
  • Helps understand patterns, clusters, or relationships within the data.

3. Detailed Overview

Lists all generated feature sets with comprehensive metadata:

  • Feature Set Name: Displays the name assigned to the generated feature set.
  • Associated Datasets: Shows the number of datasets used for embedding extraction. Click the number to view the related datasets and copy a dataset's ID.
  • Model Used: Specifies the model (e.g., a neural network) used to generate the embeddings. Click the model name to open the model's details page.
  • Dimensions: Indicates the dimensionality of the embeddings (e.g., 512-dimensional vectors).
  • Created By: Identifies the user or process that generated the feature set.
  • Creation Date: Records when the feature set was generated.
  • Updated Date: Records when the feature set was last modified.
  • Search by Feature Set Name: Enables you to search for specific feature sets using their names.
  • Filter Feature Sets by Creator Name: Allows filtering of feature sets based on their creator’s name.

4. Centralized Management

  • Provides a dedicated workspace for managing all embeddings generated within the platform.
  • Lets you search, sort, and filter embeddings by various attributes.

5. Integration and Usage

Integrates with other tools within the platform for tasks such as similarity search, model training, and data analysis.

6. Visualization and Analysis

  • Built-in visualization tools, where available, let you explore embeddings in 2D or 3D space (e.g., using UMAP).
  • Facilitates data analysis by revealing patterns, clusters, or anomalies.

7. Scalable and Customizable

  • Can handle large-scale datasets and multiple embedding models.
  • Supports custom models, so you can use your own trained models for feature extraction.
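Projecting embeddings into 2D for visualization can be approximated with principal component analysis (PCA). The sketch below substitutes PCA, a linear method, for UMAP. UMAP is nonlinear and normally requires a dedicated library such as umap-learn. The sketch finds the top two principal components by power iteration with deflation, using only the standard library. The function names, the fixed random seed, and the default iteration count are assumptions chosen for small examples.

```python
import math
import random


def _mat_vec(matrix: list[list[float]], vec: list[float]) -> list[float]:
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def _normalize(vec: list[float]) -> list[float]:
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def project_2d(rows: list[list[float]], iterations: int = 200) -> list[tuple[float, float]]:
    """Project feature vectors onto their top two principal components (PCA).

    Requires at least two rows and at least two dimensions.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(0)  # fixed seed keeps the projection reproducible
    components = []
    for _ in range(2):
        # Power iteration converges to the dominant eigenvector of cov.
        vec = _normalize([rng.uniform(-1.0, 1.0) for _ in range(d)])
        for _ in range(iterations):
            vec = _normalize(_mat_vec(cov, vec))
        eigenvalue = sum(a * b for a, b in zip(vec, _mat_vec(cov, vec)))
        components.append(vec)
        # Deflation: remove this component so the next pass finds the second one.
        cov = [[cov[i][j] - eigenvalue * vec[i] * vec[j] for j in range(d)]
               for i in range(d)]
    # Coordinates of each centered row along the two components.
    return [(sum(a * b for a, b in zip(row, components[0])),
             sum(a * b for a, b in zip(row, components[1])))
            for row in centered]


points = project_2d([[0.0, 0.0, 0.0], [4.0, 0.0, 4.0], [0.0, 1.0, 1.0], [4.0, 1.0, 5.0]])
# Each 3-dimensional vector is now an (x, y) point that can be plotted.
```

Because the signs of principal components are arbitrary, the plot may come out mirrored. Distances between points are what carry meaning.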

Extract Embeddings

Refer to the Extract Embeddings section.


Copy Feature Set ID

  1. In the Embeddings tab, identify the feature set from the list.
  2. Click the three-dot menu and select Copy Feature Set ID. The ID is copied to your clipboard.

Delete Feature Sets

  1. In the Embeddings tab, identify the feature set from the list.
  2. Click the three-dot menu and select Delete Feature Set.
  3. Type DELETE and click Delete. A confirmation message is displayed.