API Reference
This API reference provides detailed documentation for the classes and functions in the pybmc
package. It is automatically generated from the docstrings in the source code.
Dataset
Manages datasets for Bayesian model combination workflows.
Supports loading data from HDF5 and CSV files, splitting data, and filtering.
Attributes:

Name | Type | Description |
---|---|---|
data_source | str | Path to data file. |
data | dict[str, DataFrame] | Dictionary of loaded data by property. |
domain_keys | list[str] | Domain columns used for data alignment. |
Source code in pybmc/data.py
__init__(data_source=None)
Initializes the Dataset instance.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
data_source | str | Path to data file (.h5 or .csv). | None |
Source code in pybmc/data.py
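A minimal construction sketch; the import path follows the source location above, and the file name is hypothetical:

```python
from pybmc.data import Dataset

# 'nuclear_data.h5' is a placeholder; pass any .h5 or .csv path.
dataset = Dataset(data_source="nuclear_data.h5")
```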
get_subset(property_name, filters=None, models_to_include=None)
Returns a filtered subset of data for a property.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
property_name | str | Property to filter. | required |
filters | dict | Domain filtering rules. | None |
models_to_include | list[str] | Models to include. | None |
Returns:

Type | Description |
---|---|
pandas.DataFrame | Filtered DataFrame. |
Raises:

Type | Description |
---|---|
ValueError | If property not found. |
Source code in pybmc/data.py
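A usage sketch, continuing from the Dataset example above. The (min, max) tuple convention for filters is an assumption borrowed from the domain_filter example on evaluate():

```python
# Restrict the 'BE' table to a proton-number window and two models.
subset = dataset.get_subset(
    "BE",
    filters={"Z": (20, 30)},               # assumed (min, max) convention
    models_to_include=["model1", "model2"],
)
```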
load_data(models, keys=None, domain_keys=None, model_column='model')
Loads data for multiple properties and models.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
models | list[str] | Model names to load. | required |
keys | list[str] | Property names to extract. | None |
domain_keys | list[str] | Domain columns (default: ['N', 'Z']). | None |
model_column | str | CSV column identifying models (default: 'model'). | 'model' |
Returns:

Type | Description |
---|---|
dict[str, pandas.DataFrame] | Dictionary of DataFrames keyed by property name. |
Raises:

Type | Description |
---|---|
ValueError | If the requested models or keys cannot be found in the data. |
FileNotFoundError | If the data file does not exist. |
Example:

```python
dataset = Dataset('data.h5')
data = dataset.load_data(
    models=['model1', 'model2'],
    keys=['BE', 'Rad'],
    domain_keys=['Z', 'N']
)
```
Source code in pybmc/data.py
separate_points_distance_allSets(list1, list2, distance1, distance2)
Separates points into groups based on proximity thresholds.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
list1 | list[tuple[float, float]] | Points to classify as (x, y) tuples. | required |
list2 | list[tuple[float, float]] | Reference points as (x, y) tuples. | required |
distance1 | float | First proximity threshold. | required |
distance2 | float | Second proximity threshold. | required |
Returns:

Type | Description |
---|---|
tuple[list[int], list[int], list[int]] | Three lists of indices from list1, grouped by proximity to the points in list2. |
Source code in pybmc/data.py
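A toy sketch; which threshold produces which group is an assumption based on the parameter names:

```python
from pybmc.data import separate_points_distance_allSets

list1 = [(1.0, 1.0), (4.0, 4.0), (20.0, 20.0)]  # points to classify
list2 = [(1.2, 1.0)]                            # reference points

# Three lists of list1 indices, grouped by proximity to list2.
group1, group2, group3 = separate_points_distance_allSets(
    list1, list2, distance1=1.0, distance2=6.0
)
```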
split_data(data_dict, property_name, splitting_algorithm='random', **kwargs)
Splits data into training, validation, and test sets.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
data_dict | dict[str, DataFrame] | Output from load_data(). | required |
property_name | str | Property to use for splitting. | required |
splitting_algorithm | str | 'random' or 'inside_to_outside'. | 'random' |
**kwargs | | Algorithm-specific parameters (e.g., split fractions for 'random'; distance thresholds for 'inside_to_outside'). | {} |
Returns:

Type | Description |
---|---|
tuple[pandas.DataFrame, pandas.DataFrame, pandas.DataFrame] | (train, validation, test) DataFrames. |
Raises:

Type | Description |
---|---|
ValueError | For invalid algorithm or missing parameters. |
Source code in pybmc/data.py
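A sketch of a random split, continuing from the load_data example above. The fraction keyword names are assumptions, not confirmed by the docstring; check the source for the exact kwargs each algorithm expects:

```python
# 'train_size'/'val_size'/'test_size' are hypothetical kwarg names.
train_df, val_df, test_df = dataset.split_data(
    data,
    "BE",
    splitting_algorithm="random",
    train_size=0.7, val_size=0.15, test_size=0.15,
)
```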
view_data(property_name=None, model_name=None)
Provides flexible data viewing options.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
property_name | str | Specific property to view. | None |
model_name | str | Specific model to view. | None |
Returns:

Type | Description |
---|---|
Union[dict[str, Union[pandas.DataFrame, str]], pandas.DataFrame, pandas.Series] | If no args: dict of available properties/models. If only property_name: DataFrame for that property. If both property_name and model_name: Series for that model. |
Raises:

Type | Description |
---|---|
RuntimeError | If no data loaded. |
KeyError | If property or model not found. |
Source code in pybmc/data.py
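An inspection sketch showing the three calling patterns described above, continuing from the earlier examples:

```python
overview = dataset.view_data()                    # dict of properties/models
be_frame = dataset.view_data(property_name="BE")  # one property's DataFrame
be_model = dataset.view_data(property_name="BE", model_name="model1")
```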
BayesianModelCombination
Implements Bayesian Model Combination (BMC) for aggregating predictions from multiple models.
This class performs orthogonalization of model predictions, trains the model combination using Gibbs sampling, and provides methods for prediction and evaluation.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
models_list | list[str] | List of model names to combine. | required |
data_dict | dict[str, DataFrame] | Dictionary from Dataset.load_data(). | required |
truth_column_name | str | Name of the column containing ground truth values. | required |
weights | list[float] | Initial weights for models. Defaults to equal weights. | None |
Attributes:

Name | Type | Description |
---|---|---|
models_list | list[str] | List of model names. |
data_dict | dict[str, DataFrame] | Loaded data dictionary. |
truth_column_name | str | Ground truth column name. |
weights | list[float] | Current model weights. |
samples | ndarray | Posterior samples from Gibbs sampling. |
current_property | str | Current property being processed. |
centered_experiment_train | ndarray | Centered experimental values. |
U_hat | ndarray | Reduced left singular vectors from SVD. |
Vt_hat | ndarray | Normalized right singular vectors. |
S_hat | ndarray | Retained singular values. |
Vt_hat_normalized | ndarray | Original right singular vectors. |
_predictions_mean_train | ndarray | Mean predictions across models. |
Example:

```python
bmc = BayesianModelCombination(
    models_list=["model1", "model2"],
    data_dict=data,
    truth_column_name="truth"
)
```
Source code in pybmc/bmc.py
__init__(models_list, data_dict, truth_column_name, weights=None)
Initializes the BMC instance.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
models_list | list[str] | List of model names to combine. | required |
data_dict | dict[str, DataFrame] | Dictionary of DataFrames from Dataset.load_data(). | required |
truth_column_name | str | Name of column containing ground truth values. | required |
weights | list[float] | Initial model weights. Defaults to None (equal weights). | None |
Raises:

Type | Description |
---|---|
ValueError | If the inputs are inconsistent (e.g., the number of weights does not match the number of models). |
Source code in pybmc/bmc.py
evaluate(domain_filter=None)
Evaluates model performance using coverage calculation.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
domain_filter | dict | Filtering rules for domain columns. Example: {"Z": (20, 30), "N": (20, 40)}. | None |
Returns:

Type | Description |
---|---|
list[float] | Coverage percentages for each percentile in [0, 5, 10, ..., 100]. |
Source code in pybmc/bmc.py
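A sketch using the domain_filter format from the table above, continuing from the class example:

```python
# Coverage at percentiles 0, 5, ..., 100 within a (Z, N) window.
coverages = bmc.evaluate(domain_filter={"Z": (20, 30), "N": (20, 40)})
```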
orthogonalize(property, train_df, components_kept)
Performs orthogonalization of model predictions using SVD.
This method centers model predictions, performs SVD decomposition, and retains the specified number of components for subsequent training.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
property | str | Nuclear property to orthogonalize (e.g., 'BE'). | required |
train_df | DataFrame | Training data from Dataset.split_data(). | required |
components_kept | int | Number of SVD components to retain. | required |
Note
This method must be called before training. Results are stored in instance attributes.
Source code in pybmc/bmc.py
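A sketch of the expected call order, continuing from the split_data and BayesianModelCombination examples:

```python
# Must run before train(); results are stored on the instance.
bmc.orthogonalize("BE", train_df, components_kept=2)
```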
predict(X)
Predicts values using the trained model combination with uncertainty quantification.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
X | DataFrame | Input data containing model predictions and domain information. | required |
Returns:

Type | Description |
---|---|
tuple[numpy.ndarray, pandas.DataFrame, pandas.DataFrame, pandas.DataFrame] | rndm_m: full posterior draws (n_samples, n_points); lower_df: lower bounds (2.5th percentile) with domain keys; median_df: median predictions with domain keys; upper_df: upper bounds (97.5th percentile) with domain keys. |
Raises:

Type | Description |
---|---|
ValueError | If the model has not been trained. |
Source code in pybmc/bmc.py
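A prediction sketch, continuing from the earlier examples (val_df comes from the split_data sketch):

```python
# Full posterior draws plus 95% credible band as DataFrames.
rndm_m, lower_df, median_df, upper_df = bmc.predict(val_df)

# Illustrative: credible band for the first data point.
print(lower_df.iloc[0], median_df.iloc[0], upper_df.iloc[0])
```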
predict2(property)
Predicts values for a specific property using the trained model combination.
This version uses the property name instead of a DataFrame input.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
property | str | Property name to predict (e.g., 'ChRad'). | required |
Returns:

Type | Description |
---|---|
tuple[numpy.ndarray, pandas.DataFrame, pandas.DataFrame, pandas.DataFrame] | rndm_m: full posterior draws (n_samples, n_points); lower_df: lower bounds (2.5th percentile) with domain keys; median_df: median predictions with domain keys; upper_df: upper bounds (97.5th percentile) with domain keys. |
Raises:

Type | Description |
---|---|
ValueError | If the model has not been trained. |
KeyError | If the property is not found in data_dict. |
Source code in pybmc/bmc.py
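A sketch of the property-keyed variant, continuing from the examples above:

```python
# Same return structure as predict(), keyed by property name instead.
rndm_m, lower_df, median_df, upper_df = bmc.predict2("ChRad")
```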
train(training_options=None)
Trains the model combination using Gibbs sampling.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
training_options | dict | Training configuration; see the options below. | None |

Options:
- iterations (int): Number of Gibbs iterations (default: 50000).
- sampler (str): 'gibbs_sampling' or 'simplex' (default: 'gibbs_sampling').
- burn (int): Burn-in iterations for simplex sampler (default: 10000).
- stepsize (float): Proposal step size for simplex sampler (default: 0.001).
- b_mean_prior (numpy.ndarray): Prior mean vector (default: zeros).
- b_mean_cov (numpy.ndarray): Prior covariance matrix (default: diag(S_hat²)).
- nu0_chosen (float): Degrees of freedom for variance prior (default: 1.0).
- sigma20_chosen (float): Prior variance (default: 0.02).
Note:
Requires a prior call to orthogonalize(). Stores posterior samples in self.samples.
Source code in pybmc/bmc.py
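A training sketch using a subset of the documented options, continuing from the orthogonalize example:

```python
# All keys are optional; defaults are listed in the options above.
bmc.train(training_options={
    "iterations": 50000,
    "sampler": "gibbs_sampling",
    "nu0_chosen": 1.0,
    "sigma20_chosen": 0.02,
})
```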
USVt_hat_extraction(U, S, Vt, components_kept)
Extracts reduced-dimensionality matrices from SVD results.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
U | ndarray | Left singular vectors. | required |
S | ndarray | Singular values. | required |
Vt | ndarray | Right singular vectors (transposed). | required |
components_kept | int | Number of components to retain. | required |
Returns:

Type | Description |
---|---|
tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray, numpy.ndarray] | Reduced left singular vectors, retained singular values, normalized right singular vectors, and original right singular vectors. |
Source code in pybmc/inference_utils.py
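A standalone sketch; the return order is an assumption inferred from the attribute names documented on BayesianModelCombination:

```python
import numpy as np
from pybmc.inference_utils import USVt_hat_extraction

# SVD of a small synthetic matrix standing in for centered predictions.
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 4))
U, S, Vt = np.linalg.svd(A, full_matrices=False)

# Keep the two leading components.
U_hat, S_hat, Vt_hat, Vt_hat_normalized = USVt_hat_extraction(U, S, Vt, 2)
```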
gibbs_sampler(y, X, iterations, prior_info)
Performs Gibbs sampling for Bayesian linear regression.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
y | ndarray | Response vector (centered). | required |
X | ndarray | Design matrix. | required |
iterations | int | Number of sampling iterations. | required |
prior_info | tuple[ndarray, ndarray, float, float] | Prior parameters (b_mean_prior, b_mean_cov, nu0_chosen, sigma20_chosen). | required |
Returns:

Type | Description |
---|---|
numpy.ndarray | Posterior samples of the regression coefficients and variance. |
Source code in pybmc/inference_utils.py
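A standalone sketch on synthetic data; the prior_info ordering is assumed to mirror train()'s option names:

```python
import numpy as np
from pybmc.inference_utils import gibbs_sampler

# Synthetic centered regression problem.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = X @ np.array([0.5, -0.2]) + rng.normal(scale=0.1, size=100)

# Assumed ordering: (b_mean_prior, b_mean_cov, nu0_chosen, sigma20_chosen).
prior_info = (np.zeros(2), np.eye(2), 1.0, 0.02)
samples = gibbs_sampler(y, X, iterations=5000, prior_info=prior_info)
```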
gibbs_sampler_simplex(y, X, Vt_hat, S_hat, iterations, prior_info, burn=10000, stepsize=0.001)
Performs Gibbs sampling with simplex constraints on model weights.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
y | ndarray | Centered response vector. | required |
X | ndarray | Design matrix of principal components. | required |
Vt_hat | ndarray | Normalized right singular vectors. | required |
S_hat | ndarray | Singular values. | required |
iterations | int | Number of sampling iterations. | required |
prior_info | list[float] | Prior parameters (e.g., nu0_chosen and sigma20_chosen). | required |
burn | int | Burn-in iterations (default: 10000). | 10000 |
stepsize | float | Proposal step size (default: 0.001). | 0.001 |
Returns:

Type | Description |
---|---|
numpy.ndarray | Posterior samples (after burn-in). |
Source code in pybmc/inference_utils.py
coverage(percentiles, rndm_m, models_output, truth_column)
Calculates coverage percentages for credible intervals.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
percentiles | list[int] | Percentiles to evaluate (e.g., [0, 5, 10, ..., 100]). | required |
rndm_m | ndarray | Posterior samples of predictions. | required |
models_output | DataFrame | DataFrame containing true values. | required |
truth_column | str | Name of column with true values. | required |
Returns:

Type | Description |
---|---|
list[float] | Coverage percentages for each percentile. |
Source code in pybmc/sampling_utils.py
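A sketch continuing from the predict() example; the 'truth' column name is hypothetical:

```python
from pybmc.sampling_utils import coverage

percentiles = list(range(0, 101, 5))
coverages = coverage(percentiles, rndm_m, val_df, truth_column="truth")
```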
rndm_m_random_calculator(filtered_model_predictions, samples, Vt_hat)
Generates posterior predictive samples and credible intervals.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
filtered_model_predictions | ndarray | Model predictions. | required |
samples | ndarray | Gibbs samples from training. | required |
Vt_hat | ndarray | Normalized right singular vectors. | required |
Returns:

Type | Description |
---|---|
tuple[numpy.ndarray, list[numpy.ndarray]] | Posterior predictive draws (rndm_m) and a list of credible-interval arrays. |
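Source code in pybmc/sampling_utils.py

A call sketch; the three inputs are assumed to come from a trained BayesianModelCombination instance (model predictions for the points of interest, posterior samples from train(), and the normalized right singular vectors from orthogonalize()):

```python
from pybmc.sampling_utils import rndm_m_random_calculator

rndm_m, intervals = rndm_m_random_calculator(
    filtered_model_predictions, samples, Vt_hat
)
```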