Utils
evals_hub.utils.sampling
stratified_subsampling(dataset_dict, seed, splits=['test'], label='label', n_samples=2048)
Subsamples the dataset with stratification by the supplied label. Returns a DatasetDict object.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dataset_dict | DatasetDict | The DatasetDict object. | required |
| seed | int | The random seed. | required |
| splits | list[str] | The splits of the dataset. | ['test'] |
| label | str | The label on which the stratified sampling is based. | 'label' |
| n_samples | int | Optional. Number of samples to draw. Default is max_n_samples. | 2048 |
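To illustrate what stratification by a label means, here is a minimal standard-library sketch that works on a plain list of dicts rather than a DatasetDict. It is not the code in evals_hub/utils/sampling.py; the helper name `stratified_subsample`, the proportional-quota rounding, and the example rows are assumptions made for illustration only.

```python
import random
from collections import defaultdict


def stratified_subsample(rows, label="label", n_samples=2048, seed=0):
    """Return up to n_samples rows, keeping label proportions roughly intact."""
    if n_samples >= len(rows):
        return list(rows)  # nothing to subsample

    rng = random.Random(seed)  # local RNG, so global random state is untouched

    # Group rows by their label value.
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label]].append(row)

    # Give each label a share of n_samples proportional to its frequency,
    # then hand the rounding remainder to the largest fractional parts.
    quotas = {k: n_samples * len(v) / len(rows) for k, v in by_label.items()}
    counts = {k: int(q) for k, q in quotas.items()}
    leftover = n_samples - sum(counts.values())
    ranked = sorted(quotas, key=lambda k: quotas[k] - counts[k], reverse=True)
    for k in ranked[:leftover]:
        counts[k] += 1

    # Draw each label's quota without replacement.
    sample = []
    for k, group in by_label.items():
        sample.extend(rng.sample(group, counts[k]))
    return sample


# Hypothetical data: 250 rows labelled True, 750 labelled False.
rows = [{"text": f"doc {i}", "label": i % 4 == 0} for i in range(1000)]
subset = stratified_subsample(rows, n_samples=100, seed=42)
```

With the same seed the sketch returns the same subset, which is the reason the documented function takes `seed` as a required argument.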
Source code in evals_hub/utils/sampling.py
evals_hub.utils.utils
assign_session_id(data)
Assign session_id column to polars dataframe.
Source code in evals_hub/utils/utils.py
backend_factory()
Configure the HTTP backend for requests to disable SSL verification.
Source code in evals_hub/utils/utils.py
create_repo_from_config(config_path, repo_name)
Create a Hugging Face dataset repository using configuration settings.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config_path | Path | Path to the Hugging Face configuration file containing: org_name (required), the organization name for the repository; resource_group_id (optional), the resource group identifier; private_repository (optional, default True), whether the repo should be private. | required |
| repo_name | str | Name of the repository to create. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| str | str | The full repository ID (org_name/repo_name) |
Raises:
| Type | Description |
|---|---|
| HTTPError | If repository creation fails (except for 409 conflicts when repo already exists) |
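The documented config fields and the returned ID format can be sketched without touching the network. The helper below is hypothetical (the name `resolve_repo_settings` and the `ValueError` for a missing org_name are assumptions, not part of evals_hub); it only shows how the three config keys and the repo name might combine into the values passed on to repository creation.

```python
def resolve_repo_settings(config, repo_name):
    """Turn a loaded config dict into the values needed to create a repo."""
    # org_name is documented as required; raising ValueError is an assumption.
    if not config.get("org_name"):
        raise ValueError("org_name is required in the Hugging Face config")
    return {
        "repo_id": f"{config['org_name']}/{repo_name}",  # org_name/repo_name
        "resource_group_id": config.get("resource_group_id"),  # optional
        "private": config.get("private_repository", True),  # defaults to private
    }


# Hypothetical organization and repository names.
settings = resolve_repo_settings({"org_name": "my-org"}, "my-dataset")
```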
Source code in evals_hub/utils/utils.py
get_device()
Automatically detect and return the best available device.
Returns:
| Type | Description |
|---|---|
| device | torch.device: The best available device (cuda, mps, or cpu) |
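A device check of this kind is commonly written as a short fallback chain. The sketch below is one plausible version, not the code in evals_hub/utils/utils.py: the preference order cuda, then mps, then cpu is assumed from the order given in the description, the helper name `pick_device_name` is hypothetical, and it returns a string rather than a torch.device so that it also runs where PyTorch is not installed.

```python
try:
    import torch  # optional here so the sketch still runs without PyTorch
except ImportError:
    torch = None


def pick_device_name():
    """Return "cuda", "mps", or "cpu", in that (assumed) order of preference."""
    if torch is None:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)  # absent in older PyTorch builds
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


device_name = pick_device_name()
```

When PyTorch is available, `torch.device(device_name)` converts the string into the device object that the documented function returns.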
Source code in evals_hub/utils/utils.py
load_huggingface_config(config_path=None)
Load Hugging Face configuration from huggingface_config.yaml.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config_path | Path \| None | Optional path to the config file. If None, uses the default path. | None |
Returns:
| Name | Type | Description |
|---|---|---|
| dict | dict | Configuration containing org_name and resource_group_id |
Source code in evals_hub/utils/utils.py
setup_logging()
Configure logging for the application.
Source code in evals_hub/utils/utils.py