Utility Functions
This module provides utility functions used throughout the Cogitator library, including metric calculations and text-processing helpers.
cogitator.utils.count_steps(cot)
Counts the number of reasoning steps in a Chain-of-Thought string.
A line counts as a step if it starts with digits followed by a period or parenthesis, or with a list marker such as -, *, or •.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
cot | str | The string containing the Chain-of-Thought reasoning. | required |
Returns:
Type | Description |
---|---|
int | The integer count of identified reasoning steps. |
Source code in cogitator/utils.py
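The line-matching rule described above can be illustrated with a minimal sketch. This is not the library's source: the name count_steps_sketch and the regular expression are assumptions for illustration, including the choice to require whitespace after the marker so that a line such as "3.14 is pi" is not counted.

```python
import re

# Hypothetical pattern (an assumption, not taken from cogitator/utils.py):
# optional indentation, then either digits followed by "." or ")",
# or one of the list markers "-", "*", "•", then at least one space.
_STEP_LINE = re.compile(r"^\s*(?:\d+[.)]|[-*•])\s+")


def count_steps_sketch(cot: str) -> int:
    """Count lines that look like numbered or bulleted reasoning steps."""
    return sum(1 for line in cot.splitlines() if _STEP_LINE.match(line))


cot = (
    "1. Read the problem\n"
    "2) Add 2 and 3\n"
    "- Check the result\n"
    "So the answer is 5."
)
print(count_steps_sketch(cot))  # 3
```

The final line has no marker, so it is not counted as a step.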
cogitator.utils.approx_token_length(text)
Approximates the number of tokens in a string.
Counts sequences of word characters and any non-whitespace, non-word characters as separate tokens. This provides a rough estimate, not a precise token count based on a specific tokenizer model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text | str | The input string. | required |
Returns:
Type | Description |
---|---|
int | An approximate integer count of tokens. |
Source code in cogitator/utils.py
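One way to read the counting rule above is as a single regular expression, sketched below. The name approx_token_length_sketch and the exact pattern are assumptions; in particular, the sketch treats each punctuation character as its own token, which may differ from the library's behavior.

```python
import re


def approx_token_length_sketch(text: str) -> int:
    """Rough token count: word runs plus individual punctuation characters."""
    # \w+      matches a run of word characters (letters, digits, underscore)
    # [^\w\s]  matches one character that is neither word nor whitespace
    return len(re.findall(r"\w+|[^\w\s]", text))


print(approx_token_length_sketch("Hello, world!"))  # 4: Hello , world !
print(approx_token_length_sketch("don't stop"))     # 4: don ' t stop
```

Because no tokenizer model is involved, the result will generally differ from the count produced by a model-specific tokenizer.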
cogitator.utils.exact_match(pred, gold)
Performs case-insensitive exact matching between two strings.
Strips leading/trailing whitespace and converts both strings to lowercase before comparison.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pred | str | The predicted string. | required |
gold | str | The ground truth (gold standard) string. | required |
Returns:
Type | Description |
---|---|
bool | True if the normalized strings are identical, False otherwise. |
Source code in cogitator/utils.py
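The normalization described above (strip, then lowercase) can be sketched in a few lines. The name exact_match_sketch is hypothetical and the body is an illustration of the documented behavior, not the library's source.

```python
def exact_match_sketch(pred: str, gold: str) -> bool:
    """Compare two strings after trimming whitespace and lowercasing."""
    return pred.strip().lower() == gold.strip().lower()


print(exact_match_sketch("  Paris ", "paris"))        # True
print(exact_match_sketch("Paris", "Paris, France"))   # False
```

Only leading and trailing whitespace is removed, so strings that differ in internal spacing or punctuation do not match.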
cogitator.utils.accuracy(preds, golds)
Calculates the exact match accuracy between lists of predictions and golds.
Uses the exact_match function for comparison. If the lists differ in length, only the pairs up to the length of the shorter list are compared, because zip is used with strict=False (its default). If golds is empty, returns 0.0.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
preds | list[str] | A list of predicted strings. | required |
golds | list[str] | A list of ground truth strings. | required |
Returns:
Type | Description |
---|---|
float | The accuracy score as a float between 0.0 and 1.0. |
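A minimal sketch of this calculation follows. The names accuracy_sketch and _exact_match_sketch are hypothetical, and dividing by len(golds) is an assumption: the description above says which pairs are compared but not which length is used as the denominator.

```python
def _exact_match_sketch(pred: str, gold: str) -> bool:
    """Case-insensitive comparison after trimming whitespace."""
    return pred.strip().lower() == gold.strip().lower()


def accuracy_sketch(preds: list[str], golds: list[str]) -> float:
    """Fraction of gold answers matched by the paired prediction."""
    if not golds:
        return 0.0
    # zip stops at the shorter list, so unpaired items are never compared.
    matches = sum(_exact_match_sketch(p, g) for p, g in zip(preds, golds))
    # Assumption: normalize by the number of gold answers.
    return matches / len(golds)


preds = ["Paris", " 4 ", "blue"]
golds = ["paris", "4", "red"]
print(accuracy_sketch(preds, golds))  # 2 of 3 pairs match
```

Under this assumption, a prediction list shorter than golds lowers the score, since the missing predictions count as misses.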