Bayesian Analysis (emcee): LBCO, HRPT¶
This tutorial demonstrates a practical two-stage workflow for powder diffraction analysis with EasyDiffraction.
In the first stage, we run a fast local refinement to obtain a sensible point estimate and parameter uncertainties. In the second stage, we use these refined values to define fit bounds and then sample the posterior distribution with emcee.
The example uses constant-wavelength neutron powder diffraction data for La0.5Ba0.5CoO3 measured on HRPT at PSI.
The goal is not only to obtain a good fit, but also to answer Bayesian questions such as:
- Which parameter values are most probable?
- How broad are the credible intervals?
- Which parameters are strongly correlated?
- How much uncertainty propagates into the calculated diffraction pattern?
🛠️ Import Library¶
import easydiffraction as ed
📦 Define Project¶
The project object keeps structures, experiments, fit settings, and plotting utilities together in a single place. We will build the full workflow inside this object.
Save the project to a directory early on so that you can easily reload it later if needed.
project = ed.Project(name='lbco_hrpt_emcee')
project.save_as(dir_path='projects/ed_25_lbco_hrpt_emcee')
Saving project 📦 'lbco_hrpt_emcee' to '../../../projects/ed_25_lbco_hrpt_emcee'
├── 📄 project.cif
├── 📁 structures/
├── 📁 experiments/
├── 📁 analysis/
│ └── 📄 analysis.cif
└── 📁 reports/
└── 📄 lbco_hrpt_emcee.html
🧩 Define Structure¶
We define a simple cubic perovskite model for LBCO. La and Ba share the same crystallographic site with equal occupancy, while Co and O occupy the remaining ideal perovskite positions.
project.structures.create(name='lbco')
structure = project.structures['lbco']
structure.space_group.name_h_m = 'P m -3 m'
structure.space_group.it_coordinate_system_code = '1'
structure.cell.length_a = 3.88
The atom-site definitions below form the starting structural model. The parameters are intentionally reasonable rather than fully optimized, because the refinement step will improve them.
structure.atom_sites.create(
label='La',
type_symbol='La',
fract_x=0,
fract_y=0,
fract_z=0,
wyckoff_letter='a',
adp_type='Biso',
adp_iso=0.5151,
occupancy=0.5,
)
structure.atom_sites.create(
label='Ba',
type_symbol='Ba',
fract_x=0,
fract_y=0,
fract_z=0,
wyckoff_letter='a',
adp_type='Biso',
adp_iso=0.5151,
occupancy=0.5,
)
structure.atom_sites.create(
label='Co',
type_symbol='Co',
fract_x=0.5,
fract_y=0.5,
fract_z=0.5,
wyckoff_letter='b',
adp_type='Biso',
adp_iso=0.2190,
)
structure.atom_sites.create(
label='O',
type_symbol='O',
fract_x=0,
fract_y=0.5,
fract_z=0.5,
wyckoff_letter='c',
adp_type='Biso',
adp_iso=1.3916,
)
With the structural model complete, render it to confirm the perovskite framework before configuring the experiment.
project.display.structure(struct_name='lbco')
Structure 🧩 'lbco' (Atom view type: 'covalent')
wheel = zoom
right-drag = pan
🔬 Define Experiment¶
Next we download the measured powder pattern, create a neutron powder experiment, and configure the instrument, profile, background, and excluded regions.
Download the measured data from the repository. Alternatively, you could use your own data file by providing the path to it instead of downloading from the repository.
data_path = ed.download_data(id=3, destination='data')
Getting data...
Data #3: La0.5Ba0.5CoO3, HRPT (PSI), 300 K
✅ Data #3 already present at '../../../data/ed-3.xye'. Keeping existing.
Create the experiment object and specify the sample form, beam mode, and radiation probe.
project.experiments.add_from_data_path(
name='hrpt',
data_path=data_path,
sample_form='powder',
beam_mode='constant wavelength',
radiation_probe='neutron',
)
Data loaded successfully
Experiment 🔬 'hrpt'. Number of data points: 3098.
experiment = project.experiments['hrpt']
Link the structural phase to the experiment.
experiment.linked_phases.create(id='lbco', scale=9.1351)
Set instrument and peak profile parameters.
These values provide the initial instrument description for the local refinement. Later, a subset of them will be refined.
experiment.instrument.setup_wavelength = 1.494
experiment.instrument.calib_twotheta_offset = 0.0
experiment.peak.broad_gauss_u = 0.1
experiment.peak.broad_gauss_v = -0.1
experiment.peak.broad_gauss_w = 0.1204
experiment.peak.broad_lorentz_y = 0.0844
Add background points and excluded regions.
The line-segment background is defined by a few anchor points. We also exclude regions that are not intended to contribute to the fit.
experiment.background.create(id='1', x=10, y=168.5585)
experiment.background.create(id='2', x=30, y=164.3357)
experiment.background.create(id='3', x=50, y=166.8881)
experiment.background.create(id='4', x=110, y=175.4006)
experiment.excluded_regions.create(id='1', start=0, end=10)
experiment.excluded_regions.create(id='2', start=100, end=180)
🚀 Initial Refinement¶
Before Bayesian sampling, it is useful to run a deterministic fit. This gives us:
- a good point estimate near the best-fit region,
- uncertainties from the local optimizer,
- a quick check that the model and experiment are configured sensibly.
In this tutorial we refine only a small set of parameters that are easy to interpret in the later Bayesian stage.
structure.cell.length_a.free = True
experiment.linked_phases['lbco'].scale.free = True
experiment.peak.broad_gauss_u.free = True
experiment.peak.broad_gauss_v.free = True
experiment.instrument.calib_twotheta_offset.free = True
We keep LMFIT Levenberg-Marquardt minimizer as a fast local optimizer. Its main purpose here is to provide a stable starting point and uncertainty estimates for the Bayesian run.
project.analysis.minimizer.show_supported()
Minimizer types
| Type | Description | ||
|---|---|---|---|
| 1 | bumps | BUMPS library using the default Levenberg-Marquardt method | |
| 2 | bumps (amoeba) | BUMPS library with Nelder-Mead simplex method | |
| 3 | bumps (de) | BUMPS library with differential evolution method | |
| 4 | bumps (dream) | BUMPS library with DREAM Bayesian sampling | |
| 5 | bumps (lm) | BUMPS library with Levenberg-Marquardt method | |
| 6 | dfols | DFO-LS library for derivative-free least-squares optimization | |
| 7 | emcee | emcee affine-invariant ensemble Bayesian sampling | |
| 8 | lmfit | LMFIT library using the default Levenberg-Marquardt method | |
| 9 | lmfit (least_squares) | LMFIT library with SciPy's trust region reflective algorithm | |
| 10 | * | lmfit (leastsq) | LMFIT library with Levenberg-Marquardt least squares method |
project.analysis.fit()
Standard fitting
📋 Using experiment 🔬 'hrpt' for 'single' fitting
🚀 Starting fit process with 'lmfit (leastsq)'...
📈 Goodness-of-fit progress:
| iteration | time (s) | χ² | change / status | |
|---|---|---|---|---|
| 1 | 1 | 0.24 | 377.51 | |
| 2 | 9 | 0.49 | 56.42 | 85.1% ↓ |
| 3 | 15 | 0.67 | 39.43 | 30.1% ↓ |
| 4 | 22 | 0.91 | 38.63 | 2.0% ↓ |
| 5 | 28 | 1.12 | 23.34 | 39.6% ↓ |
| 6 | 34 | 1.31 | 6.18 | 73.5% ↓ |
| 7 | 40 | 1.50 | 2.73 | 55.8% ↓ |
| 8 | 46 | 1.68 | 1.32 | 51.7% ↓ |
| 9 | 52 | 1.86 | 1.29 | 2.1% ↓ |
| 10 | 71 | 2.47 | 1.29 |
🏆 Best goodness-of-fit (reduced χ²) is 1.29 at iteration 70
✅ Fitting complete.
Saving project 📦 'lbco_hrpt_emcee' to '../../../projects/ed_25_lbco_hrpt_emcee'
├── 📄 project.cif
├── 📁 structures/
│ └── 📄 lbco.cif
├── 📁 experiments/
│ └── 📄 hrpt.cif
├── 📁 analysis/
│ └── 📄 analysis.cif
└── 📁 reports/
└── 📄 lbco_hrpt_emcee.html
project.display.fit.results()
⚙️ Settings used:
| Name | Value | Description | |
|---|---|---|---|
| 1 | max_iterations | 1000 | Maximum solver iterations. |
📋 Least-squares fit results:
| Metric | Value | |
|---|---|---|
| 1 | 🧪 Minimizer | lmfit (leastsq) |
| 2 | ✅ Overall status | success |
| 3 | ⏱️ Fitting time (seconds) | 2.47 |
| 4 | 🔁 Iterations | 68 |
| 5 | 📏 Goodness-of-fit (reduced χ²) | 1.29 |
| 6 | 📏 R-factor (Rf, %) | 5.65 |
| 7 | 📏 R-factor squared (Rf², %) | 4.92 |
| 8 | 📏 Weighted R-factor (wR, %) | 4.08 |
📈 Refined parameters:
| datablock | category | entry | parameter | units | start | value | s.u. | change | |
|---|---|---|---|---|---|---|---|---|---|
| 1 | lbco | cell | length_a | Å | 3.8800 | 3.8913 | 0.0001 | 0.29 % ↑ | |
| 2 | hrpt | linked_phases | lbco | scale | 9.1351 | 9.1330 | 0.0333 | 0.02 % ↓ | |
| 3 | hrpt | peak | broad_gauss_u | deg² | 0.1000 | 0.0816 | 0.0078 | 18.41 % ↓ | |
| 4 | hrpt | peak | broad_gauss_v | deg² | -0.1000 | -0.1169 | 0.0057 | 16.86 % ↑ | |
| 5 | hrpt | instrument | twotheta_offset | deg | 0.0000 | 0.6303 | 0.0019 | N/A |
• value = refined value from least-squares minimization
• s.u. = standard uncertainty (one sigma), from the covariance matrix
• change = relative change from start, in %; ↑ = increase, ↓ = decrease
The correlation plot shows how strongly the fitted parameters move together in the local refinement. The measured-vs-calculated plots show how well the refined model reproduces the data globally and in a zoomed region.
project.display.fit.correlations()
project.display.pattern(expt_name='hrpt')
🎲 Prepare Sampling¶
Bayesian samplers require finite bounds for the free parameters. Instead of setting them manually, we derive them from the uncertainties estimated in the local refinement.
The helper method set_fit_bounds_from_uncertainty centers the bounds
on the current parameter value and expands them by a chosen multiple of
the reported uncertainty.
The default multiplier is 4. If the local refinement is very tight,
or if you expect a broader posterior, increase it explicitly.
Show unset fit bounds before setting them from the local refinement uncertainties.
project.display.parameters.free()
Free parameters for both structures (🧩 data blocks) and experiments (🔬 data blocks)
| datablock | category | entry | parameter | value | uncertainty | min | max | units | |
|---|---|---|---|---|---|---|---|---|---|
| 1 | lbco | cell | length_a | 3.89132 | 0.00011 | -inf | inf | Å | |
| 2 | hrpt | linked_phases | lbco | scale | 9.13298 | 0.03329 | -inf | inf | |
| 3 | hrpt | peak | broad_gauss_u | 0.08159 | 0.00783 | -inf | inf | deg² | |
| 4 | hrpt | peak | broad_gauss_v | -0.11686 | 0.00566 | -inf | inf | deg² | |
| 5 | hrpt | instrument | twotheta_offset | 0.63028 | 0.00191 | -inf | inf | deg |
Set fit bounds for all free parameters using the default multiplier of
4. In this tutorial that means the posterior pair plot will later
refer to a ±4 × uncertainty region in its title. To use a different
region, pass another value, for example multiplier=6.
for param in project.free_parameters:
param.set_fit_bounds_from_uncertainty()
Displaying the free parameters again is a convenient way to confirm that the fit bounds have been assigned as expected before launching the sampler.
project.display.parameters.free()
Free parameters for both structures (🧩 data blocks) and experiments (🔬 data blocks)
| datablock | category | entry | parameter | value | uncertainty | min | max | units | |
|---|---|---|---|---|---|---|---|---|---|
| 1 | lbco | cell | length_a | 3.89132 | 0.00011 | 3.89089 | 3.89175 | Å | |
| 2 | hrpt | linked_phases | lbco | scale | 9.13298 | 0.03329 | 8.99984 | 9.26613 | |
| 3 | hrpt | peak | broad_gauss_u | 0.08159 | 0.00783 | 0.05027 | 0.11292 | deg² | |
| 4 | hrpt | peak | broad_gauss_v | -0.11686 | 0.00566 | -0.13950 | -0.09422 | deg² | |
| 5 | hrpt | instrument | twotheta_offset | 0.63028 | 0.00191 | 0.62263 | 0.63794 | deg |
🎲 Run Sampling¶
We now switch from the local minimizer to the Bayesian emcee sampler.
The settings below are intentionally small so the tutorial runs quickly. For production analysis you would usually increase the number of steps and often the burn-in as well. emcee also lets you tune how walkers are initialized, how many walkers are used, and which proposal move drives the ensemble.
The burn setting is auto-resolved when left unset. Here we override
steps with a smaller value to keep the tutorial fast, and the
effective burn-in is recomputed automatically.
project.analysis.minimizer.show_supported()
Minimizer types
| Type | Description | ||
|---|---|---|---|
| 1 | bumps | BUMPS library using the default Levenberg-Marquardt method | |
| 2 | bumps (amoeba) | BUMPS library with Nelder-Mead simplex method | |
| 3 | bumps (de) | BUMPS library with differential evolution method | |
| 4 | bumps (dream) | BUMPS library with DREAM Bayesian sampling | |
| 5 | bumps (lm) | BUMPS library with Levenberg-Marquardt method | |
| 6 | dfols | DFO-LS library for derivative-free least-squares optimization | |
| 7 | emcee | emcee affine-invariant ensemble Bayesian sampling | |
| 8 | lmfit | LMFIT library using the default Levenberg-Marquardt method | |
| 9 | lmfit (least_squares) | LMFIT library with SciPy's trust region reflective algorithm | |
| 10 | * | lmfit (leastsq) | LMFIT library with Levenberg-Marquardt least squares method |
project.analysis.minimizer.type = 'emcee'
⚠️ Switching minimizer type removes these settings: • max_iterations
⚠️ Switching minimizer type adds these settings with defaults: • burn_in_steps=1000 • initialization_method='ball' • parallel_workers=0 • population_size=32 • proposal_moves='de' • random_seed=None • sampling_steps=5000 • thinning_interval=1
Current minimizer changed to
emcee
project.analysis.minimizer.sampling_steps = 100 # lower than the default 5000
project.analysis.minimizer.burn_in_steps = 20 # lower than the default 1000
project.analysis.minimizer.population_size = 16 # lower than the default 32
project.analysis.minimizer.random_seed = 42 # fixed seed for reproducible output
project.analysis.fit()
Standard fitting
📋 Using experiment 🔬 'hrpt' for 'single' fitting
🚀 Starting fit process with 'emcee'...
📈 Bayesian sampling progress:
| iteration | progress | time (s) | log posterior | phase | |
|---|---|---|---|---|---|
| 1 | 1/121 | 0.70 | -1156.99 | pre-processing | |
| 2 | 7/121 | 5.8% | 2.17 | -1156.99 | burn-in |
| 3 | 14/121 | 11.6% | 3.75 | -1157.01 | burn-in |
| 4 | 20/121 | 16.5% | 5.09 | -1157.03 | burn-in |
| 5 | 21/121 | 17.4% | 5.31 | -1157.03 | sampling |
| 6 | 26/121 | 21.5% | 6.43 | -1157.14 | sampling |
| 7 | 31/121 | 25.6% | 7.91 | -1157.28 | sampling |
| 8 | 36/121 | 29.8% | 9.63 | -1157.15 | sampling |
| 9 | 41/121 | 33.9% | 10.76 | -1157.12 | sampling |
| 10 | 46/121 | 38.0% | 11.88 | -1157.66 | sampling |
| 11 | 51/121 | 42.1% | 13.01 | -1157.27 | sampling |
| 12 | 56/121 | 46.3% | 14.19 | -1157.33 | sampling |
| 13 | 61/121 | 50.4% | 15.40 | -1157.33 | sampling |
| 14 | 66/121 | 54.5% | 16.67 | -1157.31 | sampling |
| 15 | 71/121 | 58.7% | 18.09 | -1157.72 | sampling |
| 16 | 76/121 | 62.8% | 19.61 | -1157.97 | sampling |
| 17 | 81/121 | 66.9% | 21.14 | -1157.05 | sampling |
| 18 | 86/121 | 71.1% | 22.61 | -1157.05 | sampling |
| 19 | 91/121 | 75.2% | 24.01 | -1157.05 | sampling |
| 20 | 96/121 | 79.3% | 25.51 | -1157.05 | sampling |
| 21 | 101/121 | 83.5% | 26.67 | -1157.05 | sampling |
| 22 | 106/121 | 87.6% | 27.84 | -1157.49 | sampling |
| 23 | 111/121 | 91.7% | 29.17 | -1157.49 | sampling |
| 24 | 116/121 | 95.9% | 30.39 | -1157.70 | sampling |
| 25 | 121/121 | 100.0% | 31.60 | -1157.27 | sampling |
| 26 | 70.09 | post-processing |
✅ Bayesian sampling complete.
⚠️ Convergence diagnostics indicate the posterior may be poorly mixed.
Saving project 📦 'lbco_hrpt_emcee' to '../../../projects/ed_25_lbco_hrpt_emcee'
├── 📄 project.cif
├── 📁 structures/
│ └── 📄 lbco.cif
├── 📁 experiments/
│ └── 📄 hrpt.cif
├── 📁 analysis/
│ ├── 📄 analysis.cif
│ └── 📄 results.h5
└── 📁 reports/
└── 📄 lbco_hrpt_emcee.html
📊 Inspect Results¶
The fit-results display now includes sampler settings, convergence diagnostics, committed parameter values, and posterior summary statistics.
project.display.fit.results()
⚙️ Settings used:
| Name | Value | Description | |
|---|---|---|---|
| 1 | sampling_steps | 100 | Total sampler iterations per chain. |
| 2 | burn_in_steps | 20 | Sampler iterations discarded as warm-up. |
| 3 | thinning_interval | 1 | Sampler thinning interval. |
| 4 | population_size | 16 | Number of chains or walkers. |
| 5 | parallel_workers | 0 | Worker count; 0 uses all available CPUs. |
| 6 | initialization_method | ball | emcee walker initialization method. |
| 7 | random_seed | 42 | Random seed; None uses a system-derived seed. |
| 8 | proposal_moves | de | Single emcee proposal move; move mixtures are not persisted in v1. |
📋 Bayesian fit results:
| Metric | Value | |
|---|---|---|
| 1 | 🧪 Sampler | emcee |
| 2 | ❌ Overall status | failed |
| 3 | 💬 Engine message | emcee sampling completed |
| 4 | ⏱️ Fitting time (seconds) | 70.09 |
| 5 | 📏 Goodness-of-fit (reduced χ²) | 1.29 |
| 6 | 📏 R-factor (Rf, %) | 5.66 |
| 7 | 📏 R-factor squared (Rf², %) | 4.93 |
| 8 | 📏 Weighted R-factor (wR, %) | 4.10 |
| 9 | 📉 Best log-posterior | -1157.03 |
| 10 | 📊 Convergence status | failed |
| 11 | 📊 Max r-hat | 1.274 |
| 12 | 📊 Min ess bulk | 139.630 |
| 13 | 📊 Draws per chain | 101 |
| 14 | 📊 Chains | 16 |
📈 Committed parameters:
| datablock | category | entry | parameter | units | start | value | s.u. | change | |
|---|---|---|---|---|---|---|---|---|---|
| 1 | lbco | cell | length_a | Å | 3.8913 | 3.8913 | 0.0001 | 0.00 % ↑ | |
| 2 | hrpt | linked_phases | lbco | scale | 9.1330 | 9.1319 | 0.0260 | 0.01 % ↓ | |
| 3 | hrpt | peak | broad_gauss_u | deg² | 0.0816 | 0.0814 | 0.0061 | 0.29 % ↓ | |
| 4 | hrpt | peak | broad_gauss_v | deg² | -0.1169 | -0.1169 | 0.0045 | 0.06 % ↑ | |
| 5 | hrpt | instrument | twotheta_offset | deg | 0.6303 | 0.6303 | 0.0016 | 0.00 % ↓ |
• value = estimate written back to the project (best posterior sample)
• s.u. = standard uncertainty (one sigma), posterior standard deviation
• change = relative change from start, in %; ↑ = increase, ↓ = decrease
📊 Posterior distribution:
| datablock | category | entry | parameter | units | median | 95% CI | r-hat | ess bulk | |
|---|---|---|---|---|---|---|---|---|---|
| 1 | lbco | cell | length_a | Å | 3.8913 | [3.8911, 3.8915] | 1.128 | 174.5 | |
| 2 | hrpt | linked_phases | lbco | scale | 9.1353 | [9.0918, 9.1838] | 1.274 | 159.7 | |
| 3 | hrpt | peak | broad_gauss_u | deg² | 0.0817 | [0.0685, 0.0929] | 1.226 | 139.6 | |
| 4 | hrpt | peak | broad_gauss_v | deg² | -0.1168 | [-0.1250, -0.1060] | 1.197 | 141.3 | |
| 5 | hrpt | instrument | twotheta_offset | deg | 0.6304 | [0.6270, 0.6339] | 1.120 | 178.7 |
• 95% CI = 95% credible interval (2.5%-97.5%, asymmetric)
• r-hat = Gelman-Rubin diagnostic (good convergence: r-hat <= 1.01)
• ess bulk = bulk effective sample size (typically >= 400)
⚠️ ess bulk < 400: Consider longer sampling or reparameterization.
The correlation and posterior-pair plots are complementary:
plot_param_correlationssummarizes pairwise structure in a compact matrix.plot_posterior_pairsshows marginal densities on the diagonal and posterior contours off-diagonal. In this tutorial its title also reminds you that the display region follows the±4 × uncertaintybounds defined above, while numeric subplot ranges are omitted to keep the grid readable.
project.display.fit.correlations()
project.display.posterior.pairs()
The one-dimensional posterior distributions below make it easier to inspect individual parameters in isolation, including asymmetry or multimodality.
project.display.posterior.distribution()
Finally, the posterior predictive plot propagates the sampled parameter uncertainty into the calculated diffraction pattern. Comparing this to the zoomed measured-vs-calculated view helps assess whether the sampled model family explains the data in the region of interest.
project.display.posterior.predictive(expt_name='hrpt')
A final zoomed measured-vs-calculated plot is useful for checking how the posterior-supported model behaves in a narrow region of the pattern after the Bayesian run.
project.display.posterior.predictive(expt_name='hrpt', x_min=92, x_max=93)