Clustering Module

This module defines the abstract clustering interface (`BaseClusterer`) and a K-Means implementation (`KMeansClusterer`) used by some of the strategies.

`BaseClusterer`

Bases: ABC

Abstract base class for clustering algorithms.

Source code in cogitator/clustering.py

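# Imports this excerpt relies on.
from abc import ABC, abstractmethod
from typing import Any, Tuple

import numpy as np


class BaseClusterer(ABC):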
    """Abstract base class for clustering algorithms."""

    @abstractmethod
    def cluster(
        self, embeddings: np.ndarray, n_clusters: int, **kwargs: Any
    ) -> Tuple[np.ndarray, np.ndarray]:
        """Clusters the given embeddings into a specified number of clusters.

        Args:
            embeddings: A NumPy array where each row is an embedding vector.
            n_clusters: The desired number of clusters.
            **kwargs: Additional keyword arguments specific to the clustering implementation.

        Returns:
            A tuple containing:
                - A NumPy array of cluster labels assigned to each embedding.
                - A NumPy array of cluster centers.
        """
        ...

`cluster(embeddings, n_clusters, **kwargs)` `abstractmethod`

Clusters the given embeddings into a specified number of clusters.

Parameters:

Name	Type	Description	Default
`embeddings`	`ndarray`	A NumPy array where each row is an embedding vector.	required
`n_clusters`	`int`	The desired number of clusters.	required
`**kwargs`	`Any`	Additional keyword arguments specific to the clustering implementation.	`{}`

Returns:

Type	Description
`Tuple[ndarray, ndarray]`	A tuple `(labels, centers)`: a NumPy array of cluster labels, one per embedding, and a NumPy array of cluster centers.

Source code in cogitator/clustering.py

@abstractmethod
def cluster(
    self, embeddings: np.ndarray, n_clusters: int, **kwargs: Any
) -> Tuple[np.ndarray, np.ndarray]:
    """Clusters the given embeddings into a specified number of clusters.

    Args:
        embeddings: A NumPy array where each row is an embedding vector.
        n_clusters: The desired number of clusters.
        **kwargs: Additional keyword arguments specific to the clustering implementation.

    Returns:
        A tuple containing:
            - A NumPy array of cluster labels assigned to each embedding.
            - A NumPy array of cluster centers.
    """
    ...
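To illustrate the contract that `cluster` defines, here is a minimal sketch of a custom subclass. `NearestSeedClusterer` is a hypothetical name that does not exist in the library, and the `BaseClusterer` shown is a local stand-in that mirrors the documented interface so the snippet runs on its own. The sketch treats the first `n_clusters` rows as seeds and does a single assignment pass; it is not K-Means.

```python
from abc import ABC, abstractmethod
from typing import Any, Tuple

import numpy as np


class BaseClusterer(ABC):
    """Local stand-in mirroring the documented interface (assumption)."""

    @abstractmethod
    def cluster(
        self, embeddings: np.ndarray, n_clusters: int, **kwargs: Any
    ) -> Tuple[np.ndarray, np.ndarray]:
        ...


class NearestSeedClusterer(BaseClusterer):
    """Hypothetical clusterer: the first n_clusters rows act as seeds."""

    def cluster(
        self, embeddings: np.ndarray, n_clusters: int, **kwargs: Any
    ) -> Tuple[np.ndarray, np.ndarray]:
        if not 1 <= n_clusters <= len(embeddings):
            raise ValueError("n_clusters must be between 1 and the number of embeddings")
        seeds = embeddings[:n_clusters]
        # Pairwise Euclidean distances, shape [n_samples, n_clusters].
        distances = np.linalg.norm(embeddings[:, None, :] - seeds[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Center = mean of the members; fall back to the seed if a cluster is empty.
        centers = np.stack([
            embeddings[labels == k].mean(axis=0) if np.any(labels == k) else seeds[k]
            for k in range(n_clusters)
        ])
        return labels, centers


embeddings = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 1.0], [10.0, 11.0]])
labels, centers = NearestSeedClusterer().cluster(embeddings, n_clusters=2)
print(labels.tolist())   # [0, 1, 0, 1]
print(centers.tolist())  # [[0.0, 0.5], [10.0, 10.5]]
```

Any subclass that returns a label array of length `n_samples` and a center array with `n_clusters` rows satisfies the return contract described above. Instantiating `BaseClusterer` directly raises `TypeError` because `cluster` is abstract.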

`KMeansClusterer`

Bases: BaseClusterer

A clustering implementation using the K-Means algorithm from scikit-learn.

Source code in cogitator/clustering.py

# Import this excerpt relies on.
from sklearn.cluster import KMeans


class KMeansClusterer(BaseClusterer):
    """A clustering implementation using the K-Means algorithm from scikit-learn."""

    def cluster(
        self, embeddings: np.ndarray, n_clusters: int, **kwargs: Any
    ) -> Tuple[np.ndarray, np.ndarray]:
        """Clusters embeddings using K-Means.

        Args:
            embeddings: The embeddings to cluster (shape: [n_samples, n_features]).
            n_clusters: The number of clusters to form.
            **kwargs: Options forwarded to `sklearn.cluster.KMeans`.
                Only `random_seed` (or its alias `seed`) and `n_init` are read;
                any other keys are ignored.

        Returns:
            A tuple containing:
                - labels (np.ndarray): Integer labels array (shape: [n_samples,]).
                - centers (np.ndarray): Coordinates of cluster centers (shape: [n_clusters, n_features]).

        Raises:
            ValueError: If `n_clusters` is invalid or embeddings are incompatible.
        """
        # Use an explicit None check so that a seed of 0 is honoured.
        random_seed = kwargs.get("random_seed")
        if random_seed is None:
            random_seed = kwargs.get("seed")
        n_init = kwargs.get("n_init", "auto")
        kmeans = KMeans(
            n_clusters=n_clusters,
            random_state=random_seed,
            n_init=n_init,
            init="k-means++",
        )
        labels = kmeans.fit_predict(embeddings)
        return labels, kmeans.cluster_centers_

`cluster(embeddings, n_clusters, **kwargs)`

Clusters embeddings using K-Means.

Parameters:

Name	Type	Description	Default
`embeddings`	`ndarray`	The embeddings to cluster (shape: [n_samples, n_features]).	required
`n_clusters`	`int`	The number of clusters to form.	required
`**kwargs`	`Any`	Options forwarded to `sklearn.cluster.KMeans`. Only `random_seed` (or its alias `seed`, passed as `random_state`) and `n_init` (default `"auto"`) are read; any other keys are ignored.	`{}`

Returns:

Type	Description
`Tuple[ndarray, ndarray]`	A tuple `(labels, centers)`: `labels` is an integer label array (shape: [n_samples,]); `centers` holds the coordinates of the cluster centers (shape: [n_clusters, n_features]).

Raises:

Type	Description
`ValueError`	If `n_clusters` is invalid or embeddings are incompatible.
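The parameters, return shapes, and error case above can be sketched as follows. To keep the snippet self-contained, `kmeans_cluster` is a hypothetical stand-in function that mirrors the documented method body rather than importing the library class. It assumes scikit-learn is installed. `n_init=10` is passed explicitly because the `"auto"` default is only accepted by scikit-learn 1.2 and later.

```python
from typing import Any, Tuple

import numpy as np
from sklearn.cluster import KMeans


def kmeans_cluster(
    embeddings: np.ndarray, n_clusters: int, **kwargs: Any
) -> Tuple[np.ndarray, np.ndarray]:
    """Stand-in for the documented method (assumption, not the library object)."""
    random_seed = kwargs.get("random_seed")
    if random_seed is None:
        random_seed = kwargs.get("seed")
    kmeans = KMeans(
        n_clusters=n_clusters,
        random_state=random_seed,
        n_init=kwargs.get("n_init", "auto"),
        init="k-means++",
    )
    labels = kmeans.fit_predict(embeddings)
    return labels, kmeans.cluster_centers_


# Two well-separated synthetic blobs of five 3-dimensional points each.
rng = np.random.default_rng(0)
embeddings = np.vstack([
    rng.normal(0.0, 0.1, size=(5, 3)),
    rng.normal(5.0, 0.1, size=(5, 3)),
])

labels, centers = kmeans_cluster(embeddings, 2, random_seed=42, n_init=10)
print(labels.shape, centers.shape)  # (10,) (2, 3)

# The same seed gives the same assignment on a second run.
labels_again, _ = kmeans_cluster(embeddings, 2, seed=42, n_init=10)

# Asking for more clusters than samples makes scikit-learn raise ValueError.
try:
    kmeans_cluster(embeddings[:3], 5, n_init=10)
    raised = False
except ValueError:
    raised = True
```

Which blob receives label 0 and which receives label 1 is arbitrary, so downstream code should compare cluster membership rather than the label values themselves.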

Source code in cogitator/clustering.py

def cluster(
    self, embeddings: np.ndarray, n_clusters: int, **kwargs: Any
) -> Tuple[np.ndarray, np.ndarray]:
    """Clusters embeddings using K-Means.

    Args:
        embeddings: The embeddings to cluster (shape: [n_samples, n_features]).
        n_clusters: The number of clusters to form.
        **kwargs: Options forwarded to `sklearn.cluster.KMeans`.
            Only `random_seed` (or its alias `seed`) and `n_init` are read;
            any other keys are ignored.

    Returns:
        A tuple containing:
            - labels (np.ndarray): Integer labels array (shape: [n_samples,]).
            - centers (np.ndarray): Coordinates of cluster centers (shape: [n_clusters, n_features]).

    Raises:
        ValueError: If `n_clusters` is invalid or embeddings are incompatible.
    """
    # Use an explicit None check so that a seed of 0 is honoured.
    random_seed = kwargs.get("random_seed")
    if random_seed is None:
        random_seed = kwargs.get("seed")
    n_init = kwargs.get("n_init", "auto")
    kmeans = KMeans(
        n_clusters=n_clusters,
        random_state=random_seed,
        n_init=n_init,
        init="k-means++",
    )
    labels = kmeans.fit_predict(embeddings)
    return labels, kmeans.cluster_centers_
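Because `random_seed` and `seed` are aliases, how the two are resolved matters when the seed is 0, which is falsy in Python. The sketch below compares two resolution styles; both helper names are hypothetical and exist only for this illustration.

```python
from typing import Any, Optional


def resolve_seed(**kwargs: Any) -> Optional[int]:
    """Prefer `random_seed`; fall back to `seed` only when it is missing or None."""
    seed = kwargs.get("random_seed")
    if seed is None:
        seed = kwargs.get("seed")
    return seed


def resolve_seed_with_or(**kwargs: Any) -> Optional[int]:
    """Chains the aliases with `or`, which treats 0 like a missing value."""
    return kwargs.get("random_seed") or kwargs.get("seed")


print(resolve_seed(random_seed=0))                  # 0
print(resolve_seed_with_or(random_seed=0))          # None
print(resolve_seed_with_or(random_seed=0, seed=7))  # 7
print(resolve_seed(seed=7))                         # 7
```

With the `or` form, `random_seed=0` silently becomes `None`, which scikit-learn interprets as an unseeded, non-reproducible run.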
