Data Format
Before starting the data analysis workflow, it is important to define the data formats used in EasyDiffraction.
Crystallographic Information File
Each software package typically uses its own data format and parameter names for storing and sharing data. In EasyDiffraction, we use the Crystallographic Information File (CIF) format, which is widely used in crystallography and materials science. It provides both a human-readable syntax and a set of dictionaries that define the meaning of each parameter.
These dictionaries are maintained by the International Union of Crystallography (IUCr). The base dictionary, coreCIF, contains the most common parameters in crystallography. The pdCIF dictionary covers parameters specific to powder diffraction, and magCIF is used for magnetic structure analysis.
Because most parameters needed for diffraction data analysis are already covered by IUCr dictionaries, EasyDiffraction uses the strict CIF format and follows these dictionaries as closely as possible, for both input and output, throughout the workflow described in the Analysis Workflow section.
The key advantage of CIF is the standardized naming of parameters and categories, which promotes interoperability and familiarity among researchers.
If a required parameter is not defined in the standard dictionaries, EasyDiffraction introduces custom CIF keywords, documented in the Parameters section under the CIF name for serialization columns.
Format Comparison
Below, we compare CIF with another common data format in programming: JSON.
Scientific Journals
Let's assume the following structural data for La₀.₅Ba₀.₅CoO₃ (LBCO), as reported in a scientific publication. These parameters are to be refined during diffraction data analysis:
Table 1. Crystallographic data. Space group: Pm3̅m.
Parameter | Value |
---|---|
a | 3.8909 |
b | 3.8909 |
c | 3.8909 |
alpha | 90.0 |
beta | 90.0 |
gamma | 90.0 |
Table 2. Atomic coordinates (x, y, z), occupancies (occ), and isotropic displacement parameters (Biso).
Label | Type | x | y | z | occ | Biso |
---|---|---|---|---|---|---|
La | La | 0 | 0 | 0 | 0.5 | 0.4958 |
Ba | Ba | 0 | 0 | 0 | 0.5 | 0.4958 |
Co | Co | 0.5 | 0.5 | 0.5 | 1.0 | 0.2567 |
O | O | 0 | 0.5 | 0.5 | 1.0 | 1.4041 |
CIF
The data above would be represented in CIF as follows:
data_lbco

_space_group.name_H-M_alt "P m -3 m"
_space_group.IT_coordinate_system_code 1

_cell.length_a 3.8909
_cell.length_b 3.8909
_cell.length_c 3.8909
_cell.angle_alpha 90
_cell.angle_beta 90
_cell.angle_gamma 90

loop_
_atom_site.label
_atom_site.type_symbol
_atom_site.fract_x
_atom_site.fract_y
_atom_site.fract_z
_atom_site.Wyckoff_letter
_atom_site.occupancy
_atom_site.adp_type
_atom_site.B_iso_or_equiv
La La 0   0   0   a 0.5 Biso 0.4958
Ba Ba 0   0   0   a 0.5 Biso 0.4958
Co Co 0.5 0.5 0.5 b 1   Biso 0.2567
O  O  0   0.5 0.5 c 1   Biso 1.4041
Here, unit cell parameters are grouped under the _cell category, and atomic positions under the _atom_site category. The loop_ keyword indicates that multiple rows follow for the listed parameters. Each atom is identified using _atom_site.label.
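To make the category and loop_ structure concrete, here is a minimal Python sketch that reads single items and looped rows from an abridged copy of the example. It is not EasyDiffraction's reader and not a conformant CIF parser: the function names are hypothetical, and the sketch assumes no multi-line text fields, comments, or quoted values that begin with an underscore.

```python
import shlex

# Abridged copy of the LBCO example (fewer tags, same syntax).
CIF_TEXT = """\
data_lbco

_space_group.name_H-M_alt "P m -3 m"
_cell.length_a 3.8909

loop_
_atom_site.label
_atom_site.fract_x
_atom_site.occupancy
La 0   0.5
Ba 0   0.5
Co 0.5 1
O  0   1
"""


def is_keyword(token):
    """True for tokens that start a new item, loop, or data block."""
    return token.startswith(("_", "data_")) or token == "loop_"


def parse_simple_cif(text):
    """Hypothetical helper: return (items, loops) for a very simple CIF string."""
    # shlex keeps a quoted value such as "P m -3 m" together as one token.
    tokens = [tok for line in text.splitlines() for tok in shlex.split(line)]
    items = {}  # tag -> value for single (non-looped) items
    loops = []  # one list of row dictionaries per loop_
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token == "loop_":
            i += 1
            tags = []
            while i < len(tokens) and tokens[i].startswith("_"):
                tags.append(tokens[i])
                i += 1
            if not tags:
                raise ValueError("loop_ must be followed by at least one tag")
            values = []
            while i < len(tokens) and not is_keyword(tokens[i]):
                values.append(tokens[i])
                i += 1
            # Values are listed row by row, one value per tag.
            n = len(tags)
            loops.append(
                [dict(zip(tags, values[j:j + n])) for j in range(0, len(values), n)]
            )
        elif token.startswith("_"):
            items[token] = tokens[i + 1]  # a single item is a tag followed by its value
            i += 2
        else:
            i += 1  # skip the data_ block header
    return items, loops


items, loops = parse_simple_cif(CIF_TEXT)
atom_sites = loops[0]
print(items["_cell.length_a"])
print([row["_atom_site.label"] for row in atom_sites])
```

All values come back as strings; converting them to numbers is left out to keep the sketch short.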
JSON
Representing the same data in JSON results in a format that is more verbose and less human-readable, especially for large datasets. JSON is ideal for structured data in programming environments, whereas CIF is better suited for human-readable crystallographic data.
{
  "lbco": {
    "space_group": {
      "name_H-M_alt": "P m -3 m",
      "IT_coordinate_system_code": 1
    },
    "cell": {
      "length_a": 3.8909,
      "length_b": 3.8909,
      "length_c": 3.8909,
      "angle_alpha": 90,
      "angle_beta": 90,
      "angle_gamma": 90
    },
    "atom_site": [
      {
        "label": "La",
        "type_symbol": "La",
        "fract_x": 0,
        "fract_y": 0,
        "fract_z": 0,
        "occupancy": 0.5,
        "B_iso_or_equiv": 0.4958
      },
      {
        "label": "Ba",
        "type_symbol": "Ba",
        "fract_x": 0,
        "fract_y": 0,
        "fract_z": 0,
        "occupancy": 0.5,
        "B_iso_or_equiv": 0.4958
      },
      {
        "label": "Co",
        "type_symbol": "Co",
        "fract_x": 0.5,
        "fract_y": 0.5,
        "fract_z": 0.5,
        "occupancy": 1.0,
        "B_iso_or_equiv": 0.2567
      },
      {
        "label": "O",
        "type_symbol": "O",
        "fract_x": 0,
        "fract_y": 0.5,
        "fract_z": 0.5,
        "occupancy": 1.0,
        "B_iso_or_equiv": 1.4041
      }
    ]
  }
}
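The correspondence between the two formats is mechanical: a JSON object maps to a CIF category with single items, and a JSON array of objects maps to a loop_. The Python sketch below illustrates this with an abridged version of the data. The function names are hypothetical and this is not EasyDiffraction's serializer; it assumes every row in an array has the same keys and only quotes values that contain spaces.

```python
import json

# Abridged version of the LBCO data in JSON form.
JSON_TEXT = """
{
  "lbco": {
    "space_group": {"name_H-M_alt": "P m -3 m"},
    "cell": {"length_a": 3.8909, "angle_alpha": 90},
    "atom_site": [
      {"label": "La", "fract_x": 0, "occupancy": 0.5},
      {"label": "Co", "fract_x": 0.5, "occupancy": 1.0}
    ]
  }
}
"""


def format_value(value):
    """Quote values containing spaces so they remain a single CIF token."""
    text = str(value)
    return f'"{text}"' if " " in text else text


def to_cif(data):
    """Hypothetical helper: convert {block: {category: dict or list of dicts}} to CIF text."""
    lines = []
    for block_name, categories in data.items():
        lines.append(f"data_{block_name}")
        for category, content in categories.items():
            lines.append("")
            if isinstance(content, list):
                # Array of objects -> loop_: tag names first, then one row per object.
                tags = list(content[0])
                lines.append("loop_")
                lines.extend(f"_{category}.{tag}" for tag in tags)
                for row in content:
                    lines.append(" ".join(format_value(row[tag]) for tag in tags))
            else:
                # Plain object -> one "_category.tag value" line per key.
                for tag, value in content.items():
                    lines.append(f"_{category}.{tag} {format_value(value)}")
    return "\n".join(lines)


cif_text = to_cif(json.loads(JSON_TEXT))
print(cif_text)
```

In the loop_ form each tag name is written once, whereas the JSON array repeats every key for every atom, which is where most of the extra length comes from.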
Experiment Definition
The previous example described the sample model (crystallographic model), but how is the experiment itself represented?
The experiment is also saved as a CIF file. For example, background intensity in a powder diffraction experiment might be represented as:
loop_
_pd_background.line_segment_X
_pd_background.line_segment_intensity
_pd_background.X_coordinate
10.0 174.3 2theta
20.0 159.8 2theta
30.0 167.9 2theta
...
More details on how to define the experiment in CIF format are provided in the Experiment section.
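As an illustration of how such background points might be used, the Python sketch below evaluates the background at an arbitrary 2θ value. It assumes, based only on the line_segment tag names, that the background is linearly interpolated between neighbouring points; the function name is hypothetical, and only the three rows listed above are included.

```python
from bisect import bisect_right

# (2theta, intensity) pairs copied from the three rows shown above.
BACKGROUND_POINTS = [(10.0, 174.3), (20.0, 159.8), (30.0, 167.9)]


def background_at(x, points):
    """Hypothetical helper: linear interpolation between sorted background points."""
    xs = [px for px, _ in points]
    if not xs[0] <= x <= xs[-1]:
        raise ValueError(f"x={x} is outside the range {xs[0]}..{xs[-1]}")
    # Index of the right-hand point of the segment that contains x.
    i = min(bisect_right(xs, x), len(xs) - 1)
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


midpoint_value = background_at(15.0, BACKGROUND_POINTS)
print(round(midpoint_value, 2))
```

Values outside the listed range raise an error rather than being extrapolated, since the source does not say how the background behaves there.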
Other Input/Output Blocks
EasyDiffraction uses CIF consistently throughout its workflow, including in the following blocks:
- project: contains the project information
- sample model: defines the sample model
- experiment: contains the experiment setup and measured data
- analysis: stores fitting and analysis parameters
- summary: captures analysis results
Example CIF files for each block are provided in the Analysis Workflow and Tutorials sections.
Other Data Formats
While CIF is the primary format in EasyDiffraction, we also support other formats for importing measured data. These include plain text files with multiple columns. The meaning of the columns depends on the experiment type.
For example, in a standard constant-wavelength powder diffraction experiment:
- Column 1: 2θ angle
- Column 2: intensity
- Column 3: standard uncertainty of the intensity
More details on supported input formats are provided in the Experiment section.
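For the three-column layout listed above, a reader can be sketched in a few lines of Python. This is an illustration only, not EasyDiffraction's loader: the function name is hypothetical, the sample numbers are made up, and whitespace-separated columns with no header line are assumed.

```python
# Made-up sample values, for illustration only:
# column 1 = 2theta, column 2 = intensity, column 3 = standard uncertainty.
SAMPLE_DATA = """\
10.00  167.0  12.6
10.05  157.0  12.5

10.10  187.0  13.3
"""


def read_three_columns(text):
    """Hypothetical helper: return (two_theta, intensity, su) lists of floats."""
    two_theta, intensity, su = [], [], []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) < 3:
            raise ValueError(f"expected 3 columns, got {len(fields)}: {line!r}")
        x, y, e = (float(value) for value in fields[:3])
        two_theta.append(x)
        intensity.append(y)
        su.append(e)
    return two_theta, intensity, su


two_theta, intensity, su = read_three_columns(SAMPLE_DATA)
print(len(two_theta), two_theta[0], intensity[0], su[0])
```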