API Reference
This API reference provides detailed documentation for the classes and functions in the pybmc
package. It is automatically generated from the docstrings in the source code.
Dataset
A general-purpose dataset class for loading and managing model data for Bayesian model combination workflows.
Supports .h5 and .csv files, and provides data splitting functionality.
Source code in pybmc/data.py
__init__(data_source=None, verbose=True)
Initialize the Dataset object.
:param data_source: Path to the data file (.h5 or .csv).
:param verbose: If True, display warnings and informational messages. Default is True.
Source code in pybmc/data.py
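A minimal construction sketch; the file path is hypothetical, and the import path assumes the class lives in pybmc/data.py as noted above:

```python
from pybmc.data import Dataset

# Load model data from a hypothetical HDF5 file.
dataset = Dataset(data_source="models.h5", verbose=True)
```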
get_subset(property_name, filters=None, models_to_include=None)
Return a filtered, wide-format DataFrame for a given property.
:param property_name: Name of the property (e.g., "BE", "ChRad").
:param filters: Dictionary of filtering rules applied to the domain columns (e.g., {"Z": (26, 28)}).
:param models_to_include: Optional list of model names to retain in the output. If None, all model columns are retained.
:return: Filtered wide-format DataFrame with columns: domain keys + model columns.
Source code in pybmc/data.py
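A usage sketch with hypothetical model names, following the filter example from the docstring:

```python
# Keep only rows with 26 <= Z <= 28 and two hypothetical model columns.
subset = dataset.get_subset(
    "BE",
    filters={"Z": (26, 28)},
    models_to_include=["model_A", "model_B"],
)
```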
load_data(models, keys=None, domain_keys=None, model_column='model', truth_column_name=None)
Load data for each property and return a dictionary of synchronized DataFrames. Each DataFrame has columns: domain_keys + one column per model for that property.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `models` | `list` | List of model names (for HDF5 keys or filtering CSV). | *required* |
| `keys` | `list` | List of property names to extract (each will be a key in the output dict). | `None` |
| `domain_keys` | `list` | List of columns used to define the common domain (default `['N', 'Z']`). | `None` |
| `model_column` | `str` | Name of the column in the CSV that identifies which model each row belongs to. Only used for CSV files; ignored for HDF5 files. | `'model'` |
| `truth_column_name` | `str` | Name of the truth model. If provided, the truth data will be left-joined to the common domain of the other models, allowing the truth data to have a smaller domain than the models. | `None` |
Returns:

| Type | Description |
|---|---|
| `dict` | Dictionary where each key is a property name and each value is a DataFrame with columns: domain_keys + one column per model for that property. The DataFrames are synchronized to the intersection of the domains of all models. If `truth_column_name` is provided, truth data is left-joined (may contain NaN values). |
Supports both .h5 and .csv files.
Source code in pybmc/data.py
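A usage sketch under hypothetical model and property names, following the signature above:

```python
# Returns a dict of wide-format DataFrames, one per property.
data_dict = dataset.load_data(
    models=["model_A", "model_B", "truth"],
    keys=["BE", "ChRad"],
    domain_keys=["N", "Z"],
    truth_column_name="truth",
)
be_df = data_dict["BE"]  # columns: N, Z, plus one column per model
```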
separate_points_distance_allSets(list1, list2, distance1, distance2)
Separates points in list1 into three groups based on their proximity to any point in list2.
:param list1: List of (x, y) tuples.
:param list2: List of (x, y) tuples.
:param distance1: First threshold distance used to determine proximity.
:param distance2: Second threshold distance used to determine proximity.
:return: Three lists of points from list1, grouped by their proximity to the points in list2.
Source code in pybmc/data.py
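A hedged call sketch with hypothetical points and thresholds; the exact grouping semantics follow the description above:

```python
# Split hypothetical (N, Z) points into proximity groups around one stable point.
groups = dataset.separate_points_distance_allSets(
    list1=[(20, 20), (28, 28), (50, 82)],
    list2=[(28, 28)],
    distance1=5.0,
    distance2=10.0,
)
```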
split_data(data_dict, property_name, splitting_algorithm='random', **kwargs)
Split data into training, validation, and testing sets using random or inside-to-outside logic.
:param data_dict: Dictionary output from load_data, where keys are property names and values are DataFrames.
:param property_name: The key in data_dict specifying which DataFrame to use for splitting.
:param splitting_algorithm: 'random' (default) or 'inside_to_outside'.
:param kwargs: Additional arguments depending on the chosen algorithm.
    For 'random': train_size, val_size, test_size.
    For 'inside_to_outside': stable_points (list of (x, y)), distance1, distance2.
:return: Tuple of train, validation, and test datasets as DataFrames.
Source code in pybmc/data.py
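For instance, a random split with hypothetical size fractions:

```python
# 60/20/20 random split of the binding-energy DataFrame.
train_df, val_df, test_df = dataset.split_data(
    data_dict,
    property_name="BE",
    splitting_algorithm="random",
    train_size=0.6,
    val_size=0.2,
    test_size=0.2,
)
```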
view_data(property_name=None, model_name=None)
View data flexibly based on input parameters.
- No arguments: returns available property names and model names.
- property_name only: returns the full DataFrame for that property.
- model_name only: returns the model's values across all properties.
- property_name + model_name: returns a Series of values for the model.
:param property_name: Optional property name.
:param model_name: Optional model name.
:return: dict, DataFrame, or Series depending on input.
Source code in pybmc/data.py
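The four calling patterns, sketched with hypothetical names:

```python
overview = dataset.view_data()                        # available properties and models
be_df = dataset.view_data(property_name="BE")         # full DataFrame for one property
model_vals = dataset.view_data(model_name="model_A")  # one model across all properties
series = dataset.view_data(property_name="BE", model_name="model_A")
```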
BayesianModelCombination
Performs Bayesian model combination (BMC) on a set of models selected from the Dataset class. The class covers three steps: an orthogonalization step, Bayesian inference on the training data extracted from the Dataset class, and predictions for selected isotopes.
Source code in pybmc/bmc.py
__init__(models_list, data_dict, truth_column_name, weights=None)
Initialize the BayesianModelCombination class.
:param models_list: List of model names.
:param data_dict: Dictionary from load_data(), where each key is a property name and each value is a DataFrame.
:param truth_column_name: Name of the column containing the truth values.
:param weights: Optional initial weights for the models.
Source code in pybmc/bmc.py
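A construction sketch continuing the hypothetical names used above; the import path assumes the class lives in pybmc/bmc.py as noted:

```python
from pybmc.bmc import BayesianModelCombination

bmc = BayesianModelCombination(
    models_list=["model_A", "model_B"],
    data_dict=data_dict,        # from Dataset.load_data
    truth_column_name="truth",
)
```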
evaluate(domain_filter=None)
Evaluate the model combination using coverage calculation.
:param domain_filter: Dict of optional domain key ranges, e.g., {"Z": (20, 30), "N": (20, 40)}.
:return: Coverage list for each percentile.
Source code in pybmc/bmc.py
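For example, restricting the coverage calculation to a hypothetical domain window:

```python
coverage_list = bmc.evaluate(domain_filter={"Z": (20, 30), "N": (20, 40)})
```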
orthogonalize(property, train_df, components_kept)
Perform orthogonalization for the specified property using training data.
:param property: The nuclear property to orthogonalize on (e.g., 'BE').
:param train_df: Training DataFrame from split_data.
:param components_kept: Number of SVD components to retain.
Source code in pybmc/bmc.py
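A sketch with hypothetical values, reusing the training split from split_data:

```python
# Retain two SVD components for the binding-energy training split.
bmc.orthogonalize(property="BE", train_df=train_df, components_kept=2)
```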
predict(property)
Predict a specified property using the model weights learned during training.
:param property: The property name to predict (e.g., 'ChRad').
:return:
- rndm_m: array of shape (n_samples, n_points), full posterior draws
- lower_df: DataFrame with columns domain_keys + ['Predicted_Lower']
- median_df: DataFrame with columns domain_keys + ['Predicted_Median']
- upper_df: DataFrame with columns domain_keys + ['Predicted_Upper']
Source code in pybmc/bmc.py
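A sketch of the four documented outputs:

```python
rndm_m, lower_df, median_df, upper_df = bmc.predict("ChRad")
```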
train(training_options=None)
Train the model combination using training data and optional training parameters.
:param training_options: Dictionary of training options. Keys:
- 'iterations': (int) Number of Gibbs iterations (default 50000)
- 'b_mean_prior': (np.ndarray) Prior mean vector (default zeros)
- 'b_mean_cov': (np.ndarray) Prior covariance matrix (default diag(S_hat²))
- 'nu0_chosen': (float) Degrees of freedom for variance prior (default 1.0)
- 'sigma20_chosen': (float) Prior variance (default 0.02)
Source code in pybmc/bmc.py
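A sketch that makes the documented defaults explicit; the prior arrays are omitted so their stated defaults apply:

```python
bmc.train(training_options={
    "iterations": 50000,     # documented default
    "nu0_chosen": 1.0,       # documented default
    "sigma20_chosen": 0.02,  # documented default
})
```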
USVt_hat_extraction(U, S, Vt, components_kept)
Extracts reduced-dimensionality matrices from SVD results.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `U` | `ndarray` | Left singular vectors. | *required* |
| `S` | `ndarray` | Singular values. | *required* |
| `Vt` | `ndarray` | Right singular vectors (transposed). | *required* |
| `components_kept` | `int` | Number of components to retain. | *required* |
Returns:

| Type | Description |
|---|---|
| `tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray, numpy.ndarray]` | The reduced-dimensionality matrices retained after truncating to `components_kept` components. |
Source code in pybmc/inference_utils.py
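A sketch feeding NumPy SVD output into the helper; the input matrix is synthetic, and the four-tuple return follows the documented type:

```python
import numpy as np
from pybmc.inference_utils import USVt_hat_extraction

# SVD of a synthetic 50x4 prediction matrix.
A = np.random.default_rng(0).normal(size=(50, 4))
U, S, Vt = np.linalg.svd(A, full_matrices=False)
reduced = USVt_hat_extraction(U, S, Vt, components_kept=2)  # 4-tuple of reduced matrices
```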
gibbs_sampler(y, X, iterations, prior_info)
Performs Gibbs sampling for Bayesian linear regression.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `y` | `ndarray` | Response vector (centered). | *required* |
| `X` | `ndarray` | Design matrix. | *required* |
| `iterations` | `int` | Number of sampling iterations. | *required* |
| `prior_info` | `tuple[ndarray, ndarray, float, float]` | Prior parameters: prior mean vector, prior covariance matrix, degrees of freedom, and prior variance (see the `train` options above). | *required* |
Returns:

| Type | Description |
|---|---|
| `numpy.ndarray` | Posterior samples. |
Source code in pybmc/inference_utils.py
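A sketch with synthetic data; the prior tuple follows the documented (mean vector, covariance matrix, degrees of freedom, variance) layout, with values borrowed from the train() defaults:

```python
import numpy as np
from pybmc.inference_utils import gibbs_sampler

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                                    # synthetic design matrix
y = X @ np.array([0.5, -0.2]) + rng.normal(scale=0.1, size=100)
y -= y.mean()                                                    # center the response
prior_info = (np.zeros(2), np.eye(2), 1.0, 0.02)                 # hypothetical prior choices
samples = gibbs_sampler(y, X, iterations=5000, prior_info=prior_info)
```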
gibbs_sampler_simplex(y, X, Vt_hat, S_hat, iterations, prior_info, burn=10000, stepsize=0.001)
Performs Gibbs sampling with simplex constraints on model weights.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `y` | `ndarray` | Centered response vector. | *required* |
| `X` | `ndarray` | Design matrix of principal components. | *required* |
| `Vt_hat` | `ndarray` | Normalized right singular vectors. | *required* |
| `S_hat` | `ndarray` | Singular values. | *required* |
| `iterations` | `int` | Number of sampling iterations. | *required* |
| `prior_info` | `list[float]` | Prior parameters. | *required* |
| `burn` | `int` | Burn-in iterations (default: 10000). | `10000` |
| `stepsize` | `float` | Proposal step size (default: 0.001). | `0.001` |
Returns:

| Type | Description |
|---|---|
| `numpy.ndarray` | Posterior samples. |
Source code in pybmc/inference_utils.py
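A call sketch continuing the names above; Vt_hat and S_hat would come from USVt_hat_extraction, and the prior_info values are hypothetical since the docstring only specifies list[float]:

```python
from pybmc.inference_utils import gibbs_sampler_simplex

samples_simplex = gibbs_sampler_simplex(
    y, X, Vt_hat, S_hat,
    iterations=50000,
    prior_info=[1.0, 0.02],  # hypothetical prior values
    burn=10000,              # documented default
    stepsize=0.001,          # documented default
)
```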
coverage(percentiles, rndm_m, models_output, truth_column)
Calculates coverage percentages for credible intervals.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `percentiles` | `list[int]` | Percentiles to evaluate. | *required* |
| `rndm_m` | `ndarray` | Posterior samples of predictions. | *required* |
| `models_output` | `DataFrame` | DataFrame containing true values. | *required* |
| `truth_column` | `str` | Name of column with true values. | *required* |
Returns:

| Type | Description |
|---|---|
| `list[float]` | Coverage percentages for each percentile. |
Source code in pybmc/sampling_utils.py
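A sketch with hypothetical inputs; rndm_m would come from predict(), and the DataFrame holds the truth column:

```python
from pybmc.sampling_utils import coverage

cov = coverage(
    percentiles=[68, 90, 95],  # hypothetical percentile choices
    rndm_m=rndm_m,             # posterior draws from bmc.predict
    models_output=test_df,     # DataFrame containing the truth column
    truth_column="truth",
)
```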
rndm_m_random_calculator(filtered_model_predictions, samples, Vt_hat)
Generates posterior predictive samples and credible intervals.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `filtered_model_predictions` | `ndarray` | Model predictions. | *required* |
| `samples` | `ndarray` | Gibbs samples. | *required* |
| `Vt_hat` | `ndarray` | Normalized right singular vectors. | *required* |
Returns:

| Type | Description |
|---|---|
| `tuple[numpy.ndarray, list[numpy.ndarray]]` | Posterior predictive samples and credible intervals. |
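Source code in pybmc/sampling_utils.py

A call sketch reusing the names from earlier steps; the two-element unpacking follows the documented return type:

```python
from pybmc.sampling_utils import rndm_m_random_calculator

rndm_m, intervals = rndm_m_random_calculator(
    filtered_model_predictions,  # hypothetical model-prediction array
    samples,                     # Gibbs samples from gibbs_sampler
    Vt_hat,                      # normalized right singular vectors
)
```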