Text Embedding Module
This module provides abstractions and implementations for text embedding models.
BaseEmbedder
Bases: ABC
Abstract base class for text embedding models.
Source code in cogitator/embedding.py
encode(texts) (abstractmethod)
Encodes a list of texts into embedding vectors.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
texts | List[str] | A list of strings to encode. | required |
Returns:
Type | Description |
---|---|
List[ndarray] | A list of NumPy arrays, where each array is the embedding vector for the corresponding text. |
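The contract above can be illustrated with a minimal sketch. To stay self-contained, the block defines a stand-in BaseEmbedder with the documented signature instead of importing it from cogitator/embedding.py. ToyHashEmbedder and its dim parameter are hypothetical names introduced only for illustration and are not part of the library.

```python
from abc import ABC, abstractmethod
from typing import List
import zlib

import numpy as np


class BaseEmbedder(ABC):
    """Stand-in mirroring the documented interface (assumption, not the library source)."""

    @abstractmethod
    def encode(self, texts: List[str]) -> List[np.ndarray]:
        ...


class ToyHashEmbedder(BaseEmbedder):
    """Hypothetical embedder: hashes tokens into a fixed-size count vector."""

    def __init__(self, dim: int = 8) -> None:
        self.dim = dim

    def encode(self, texts: List[str]) -> List[np.ndarray]:
        vectors = []
        for text in texts:
            vec = np.zeros(self.dim, dtype=np.float32)
            for token in text.lower().split():
                # crc32 is deterministic across runs, unlike the built-in hash().
                vec[zlib.crc32(token.encode("utf-8")) % self.dim] += 1.0
            vectors.append(vec)
        return vectors


embedder = ToyHashEmbedder(dim=8)
vectors = embedder.encode(["hello world", "hello"])
print(len(vectors), vectors[0].shape)
```

Any subclass that returns one array per input text, in input order, satisfies the same contract. The base class itself cannot be instantiated because encode is abstract.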
SentenceTransformerEmbedder
Bases: BaseEmbedder
An embedder implementation using the sentence-transformers library.
This class uses a singleton pattern to avoid reloading the model multiple times.
__new__(model_name='all-MiniLM-L6-v2')
Creates or returns the singleton instance of the embedder.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model_name | str | The name of the sentence-transformer model to load. This argument is only used during the first instantiation. | 'all-MiniLM-L6-v2' |
Returns:
Type | Description |
---|---|
SentenceTransformerEmbedder | The singleton instance of SentenceTransformerEmbedder. |
__init__(model_name='all-MiniLM-L6-v2')
Initializes the SentenceTransformerEmbedder instance.
Note: Due to the singleton pattern implemented in __new__, the model_name argument here is effectively ignored after the first instantiation. The model loaded is determined by the model_name passed during the first call to __new__ or __init__.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model_name | str | The name of the sentence-transformer model. Defaults to "all-MiniLM-L6-v2". | 'all-MiniLM-L6-v2' |
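The interaction between __new__ and __init__ described above can be sketched as follows. This is a minimal illustration of the general pattern, not the library's actual source: SingletonEmbedderSketch, the _initialized flag, and the placeholder name "some-other-model" are all hypothetical, and the sketch records the model name instead of loading a model.

```python
from typing import Optional


class SingletonEmbedderSketch:
    """Hypothetical stand-in showing the singleton behaviour described above."""

    _instance: Optional["SingletonEmbedderSketch"] = None

    def __new__(cls, model_name: str = "all-MiniLM-L6-v2"):
        # Create the instance once; later calls return the same object.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._initialized = False
        return cls._instance

    def __init__(self, model_name: str = "all-MiniLM-L6-v2") -> None:
        # Python calls __init__ on every instantiation, so guard against
        # re-initialising the shared instance.
        if self._initialized:
            return
        # The real class would load the sentence-transformers model here;
        # this sketch only records the name.
        self.model_name = model_name
        self._initialized = True


first = SingletonEmbedderSketch("all-MiniLM-L6-v2")
second = SingletonEmbedderSketch("some-other-model")  # placeholder name, ignored
print(first is second, second.model_name)
```

Under this pattern, code that needs a different model in the same process cannot get one by passing a new model_name. The first instantiation fixes the model for every later caller.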
encode(texts)
Encodes a list of texts using the loaded sentence-transformer model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
texts | List[str] | The list of strings to encode. | required |
Returns:
Type | Description |
---|---|
List[ndarray] | A list of NumPy ndarray embeddings. |
Raises:
Type | Description |
---|---|
RuntimeError | If the embedding model has not been initialized correctly. |
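Callers may want to handle the RuntimeError listed above rather than let it propagate. The sketch below shows one way to do that on the caller side. UninitializedEmbedderSketch, safe_encode, the use of None to mark a failed load, and the error message text are all assumptions made for illustration; the actual class may detect and report the condition differently.

```python
from typing import List, Optional


class UninitializedEmbedderSketch:
    """Hypothetical stand-in whose model failed to load."""

    def __init__(self) -> None:
        # Assumption: None marks a model that was never loaded.
        self.model: Optional[object] = None

    def encode(self, texts: List[str]) -> list:
        if self.model is None:
            raise RuntimeError("Embedding model not initialized correctly.")
        # A real implementation would delegate to the loaded model here.
        raise NotImplementedError


def safe_encode(embedder, texts: List[str]) -> Optional[list]:
    """Return embeddings, or None when the embedder reports it is unusable."""
    try:
        return embedder.encode(texts)
    except RuntimeError as exc:
        print(f"Embedding unavailable: {exc}")
        return None


result = safe_encode(UninitializedEmbedderSketch(), ["hello world"])
```

Returning None is only one option. Depending on the application, re-raising with more context or falling back to another embedder may be more appropriate.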