Usage Guide
This guide provides a comprehensive walkthrough of the pybmc package, demonstrating how to load data, combine models, and generate predictions with uncertainty quantification. We will use the selected_data.h5 file included in the repository for this example.
1. Load and Prepare Data
First, we import the necessary classes and specify the path to our data file. We then load the data, specifying the models and properties we're interested in.
import pandas as pd
from pybmc.data import Dataset
from pybmc.bmc import BayesianModelCombination
# Path to the data file
data_path = "pybmc/selected_data.h5"
# Initialize the dataset
dataset = Dataset(data_path)
# Load data for specified models and properties
data_dict = dataset.load_data(
    models=["FRDM12", "HFB24", "D1M",
            "UNEDF1", "BCPM", "AME2020"],
    keys=["BE"],
    domain_keys=["N", "Z"],
)
2. Split the Data
Next, we split the data into training, validation, and test sets. pybmc supports random splitting as shown below.
# Split the data into training, validation, and test sets
train_df, val_df, test_df = dataset.split_data(
    data_dict,
    "BE",
    splitting_algorithm="random",
    train_size=0.6,
    val_size=0.2,
    test_size=0.2,
)
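To make the 0.6/0.2/0.2 proportions concrete, here is a minimal sketch of a random three-way split written with plain NumPy and pandas. It is an illustration only and not pybmc's implementation: the helper `random_split`, its `seed` argument, and the toy table are assumptions introduced for this example.

```python
import numpy as np
import pandas as pd


def random_split(df, train_size=0.6, val_size=0.2, seed=0):
    """Shuffle the rows and cut them into train/validation/test parts.

    Hypothetical helper for illustration; the test set receives whatever
    rows remain after the training and validation parts are taken.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(df))
    n_train = int(round(train_size * len(df)))
    n_val = int(round(val_size * len(df)))
    train = df.iloc[order[:n_train]]
    val = df.iloc[order[n_train:n_train + n_val]]
    test = df.iloc[order[n_train + n_val:]]
    return train, val, test


# Toy table reusing the guide's column names (N, Z, BE); the values are made up.
toy = pd.DataFrame({
    "N": np.arange(100),
    "Z": np.arange(100) % 50,
    "BE": np.linspace(0.0, 1.0, 100),
})
train_part, val_part, test_part = random_split(toy)
print(len(train_part), len(val_part), len(test_part))
```

Because the three parts are slices of a single permutation, no row can land in more than one set.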
3. Initialize and Train the BMC Model
Now, we initialize the BayesianModelCombination class. We provide the list of models, the data dictionary, and the name of the column containing the ground truth values.
# Initialize the Bayesian Model Combination
bmc = BayesianModelCombination(
    models_list=["FRDM12", "HFB24", "D1M",
                 "UNEDF1", "BCPM", "AME2020"],
    data_dict=data_dict,
    truth_column_name="AME2020",
)
Before training, we orthogonalize the model predictions. This is a crucial step that improves the stability and performance of the Bayesian inference.
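As a rough illustration of what orthogonalizing model predictions can mean, the sketch below applies a truncated singular value decomposition to a matrix of correlated toy predictions. This is a generic technique shown under stated assumptions, not necessarily what pybmc does internally: the synthetic data, the centering choice, and the name `components_kept` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_points, n_models = 200, 5

# Correlated toy predictions: every model is a shared signal plus small noise.
shared = rng.normal(size=(n_points, 1))
preds = shared + 0.1 * rng.normal(size=(n_points, n_models))

# Center each data point across models, then factor the residuals (thin SVD).
mean_pred = preds.mean(axis=1, keepdims=True)
centered = preds - mean_pred
U, S, Vt = np.linalg.svd(centered, full_matrices=False)

components_kept = 3  # hypothetical truncation level
U_hat = U[:, :components_kept]

# The retained columns are orthonormal, so their Gram matrix is the identity.
gram = U_hat.T @ U_hat
print(np.allclose(gram, np.eye(components_kept)))
```

Working in an orthonormal basis like this removes the near-collinearity between similar models, which is one reason such a step can stabilize the subsequent inference.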
With the data prepared and the model predictions orthogonalized, we can train the model combination. We use Gibbs sampling to infer the posterior distribution of the model weights.
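For readers unfamiliar with Gibbs sampling, the following sketch shows the idea on a small Bayesian linear regression: the weights and the noise variance are drawn in turn from their conditional distributions. It is a textbook conjugate sampler under assumed priors (Normal on the weights, Inverse-Gamma on the noise variance), not pybmc's sampler; the function `gibbs_linear_weights` and all of its parameters are hypothetical.

```python
import numpy as np


def gibbs_linear_weights(X, y, n_iter=2000, burn_in=500,
                         tau2=10.0, a0=2.0, b0=1.0, seed=0):
    """Gibbs sampler for y = X w + noise (illustrative, assumed priors).

    Priors: w ~ Normal(0, tau2 * I), sigma2 ~ Inverse-Gamma(a0, b0).
    Returns an array of draws with columns (w_1, ..., w_p, sigma2).
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    XtX, Xty = X.T @ X, X.T @ y
    sigma2 = 1.0
    draws = []
    for it in range(n_iter):
        # Draw w | sigma2, y from its Normal conditional.
        V = np.linalg.inv(XtX / sigma2 + np.eye(p) / tau2)
        V = 0.5 * (V + V.T)  # symmetrize against round-off
        m = V @ Xty / sigma2
        w = rng.multivariate_normal(m, V)
        # Draw sigma2 | w, y from its Inverse-Gamma conditional.
        resid = y - X @ w
        shape = a0 + 0.5 * n
        rate = b0 + 0.5 * (resid @ resid)
        sigma2 = 1.0 / rng.gamma(shape, 1.0 / rate)
        if it >= burn_in:
            draws.append(np.append(w, sigma2))
    return np.array(draws)


# Synthetic data with known weights, used only to check the sampler.
data_rng = np.random.default_rng(1)
X_toy = data_rng.normal(size=(300, 3))
true_w = np.array([0.5, 0.3, 0.2])
y_toy = X_toy @ true_w + 0.1 * data_rng.normal(size=300)

posterior = gibbs_linear_weights(X_toy, y_toy)
print(posterior[:, :3].mean(axis=0))
```

The retained draws approximate the joint posterior, so summaries such as means or percentiles of the weight columns can be read off directly.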
4. Make Predictions
After training, we can use the predict2 method to generate predictions with uncertainty quantification. The method returns the full posterior draws, as well as DataFrames for the lower bound, median, and upper bound of the credible interval.
# Make predictions with uncertainty quantification
rndm_m, lower_df, median_df, upper_df = bmc.predict2("BE")
# Display the first 5 rows of the median predictions
print(median_df.head())
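The relationship between posterior draws and the lower, median, and upper summaries can be sketched with NumPy percentiles. The example assumes a 95% equal-tailed interval and synthetic draws; the interval level that predict2 actually uses is not stated in this guide, so treat the numbers below as placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for posterior predictive draws: 4000 samples for each of 3 points.
draws = rng.normal(loc=[1.0, 2.0, 3.0], scale=0.5, size=(4000, 3))

# Equal-tailed 95% interval (assumed level) and the median, per data point.
lower, median, upper = np.percentile(draws, [2.5, 50.0, 97.5], axis=0)

for lo, med, hi in zip(lower, median, upper):
    print(f"{lo:.2f} <= {med:.2f} <= {hi:.2f}")
```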
5. Evaluate the Model
Finally, we can evaluate the performance of our model combination using the evaluate method. This calculates the coverage of the credible intervals, which tells us how often the true values fall within the predicted intervals.
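Coverage as described above can be computed by hand in a few lines. The sketch below is independent of pybmc: the helper `interval_coverage` and the toy numbers are assumptions made for illustration, and the return format of the evaluate method may differ.

```python
import numpy as np


def interval_coverage(truth, lower, upper):
    """Percentage of true values that fall inside [lower, upper]."""
    truth, lower, upper = map(np.asarray, (truth, lower, upper))
    inside = (truth >= lower) & (truth <= upper)
    return 100.0 * inside.mean()


# Toy values: the first and third truths are covered, the other two are not.
truth = [1.0, 2.0, 3.0, 4.0]
lower = [0.5, 2.1, 2.5, 3.0]
upper = [1.5, 2.9, 3.5, 3.9]
coverage = interval_coverage(truth, lower, upper)
print(coverage)  # 50.0
```

A well-calibrated interval at a nominal level, say 95%, should yield an empirical coverage close to that level on held-out data.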