Data Format

Before starting the data analysis workflow, it is important to define the data formats used in EasyDiffraction.

Crystallographic Information File

Each software package typically uses its own data format and parameter names for storing and sharing data. In EasyDiffraction, we use the Crystallographic Information File (CIF) format, which is widely used in crystallography and materials science. It provides both a human-readable syntax and a set of dictionaries that define the meaning of each parameter.

These dictionaries are maintained by the International Union of Crystallography (IUCr).
The base dictionary, coreCIF, contains the most common parameters in crystallography. The pdCIF dictionary covers parameters specific to powder diffraction, and the magCIF dictionary covers magnetic structure analysis.

As most parameters needed for diffraction data analysis are already covered by IUCr dictionaries, EasyDiffraction uses the strict CIF format and follows these dictionaries as closely as possible, for both input and output, throughout the workflow described in the Analysis Workflow section.

The key advantage of CIF is the standardized naming of parameters and categories, which promotes interoperability and familiarity among researchers.

If a required parameter is not defined in the standard dictionaries, EasyDiffraction introduces custom CIF keywords. These are documented in the Parameters section, in the columns labeled "CIF name for serialization".

Format Comparison

Below, we compare CIF with another common data format in programming: JSON.

Scientific Journals

Let's assume the following structural data for La₀.₅Ba₀.₅CoO₃ (LBCO), as reported in a scientific publication. These parameters are to be refined during diffraction data analysis:

Table 1. Crystallographic data (lengths in Å, angles in degrees). Space group: Pm3̅m.

Parameter  Value
a          3.8909
b          3.8909
c          3.8909
alpha      90.0
beta       90.0
gamma      90.0

Table 2. Atomic coordinates (x, y, z), occupancies (occ), and isotropic displacement parameters (Biso, in Å²).

Label  Type  x    y    z    occ  Biso
La     La    0    0    0    0.5  0.4958
Ba     Ba    0    0    0    0.5  0.4958
Co     Co    0.5  0.5  0.5  1.0  0.2567
O      O     0    0.5  0.5  1.0  1.4041

CIF

The data above would be represented in CIF as follows:

data_lbco

_space_group.name_H-M_alt              "P m -3 m"
_space_group.IT_coordinate_system_code 1

_cell.length_a      3.8909
_cell.length_b      3.8909
_cell.length_c      3.8909
_cell.angle_alpha  90
_cell.angle_beta   90
_cell.angle_gamma  90

loop_
_atom_site.label
_atom_site.type_symbol
_atom_site.fract_x
_atom_site.fract_y
_atom_site.fract_z
_atom_site.Wyckoff_letter
_atom_site.occupancy
_atom_site.adp_type
_atom_site.B_iso_or_equiv
La La   0   0   0     a   0.5  Biso 0.4958
Ba Ba   0   0   0     a   0.5  Biso 0.4958
Co Co   0.5 0.5 0.5   b   1    Biso 0.2567
O  O    0   0.5 0.5   c   1    Biso 1.4041

Here, unit cell parameters are grouped under the _cell category, and atomic positions under the _atom_site category. The loop_ keyword indicates that multiple rows follow for the listed parameters. Each atom is identified using _atom_site.label.
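To make the category and loop_ structure concrete, here is a minimal Python sketch that reads a CIF fragment like the one above into plain dictionaries. It is not EasyDiffraction's reader: the function name parse_simple_cif is hypothetical, and the sketch assumes a single data block, one item per line, and no multi-line text fields. All values are kept as strings.

```python
import shlex


def parse_simple_cif(text):
    """Parse a small CIF subset: one data block, key-value items, and loops.

    Illustrative only. Multi-line text fields and save frames are not handled.
    """
    block = {"name": None, "items": {}, "loops": []}
    loop = None  # the loop currently being read, if any
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        if line.startswith("data_"):
            block["name"] = line[len("data_"):]
            loop = None
        elif line == "loop_":
            loop = {"tags": [], "rows": []}
            block["loops"].append(loop)
        elif line.startswith("_"):
            tokens = shlex.split(line)  # keeps "P m -3 m" as one token
            if len(tokens) == 1 and loop is not None and not loop["rows"]:
                loop["tags"].append(tokens[0])  # tag in a loop header
            elif len(tokens) == 2:
                loop = None  # a plain key-value item ends any open loop
                block["items"][tokens[0]] = tokens[1]
            else:
                raise ValueError(f"unsupported line: {line!r}")
        elif loop is not None:
            # A data row: pair each value with the tag in the same position.
            loop["rows"].append(dict(zip(loop["tags"], shlex.split(line))))
    return block


LBCO_CIF = """\
data_lbco

_space_group.name_H-M_alt              "P m -3 m"
_cell.length_a      3.8909

loop_
_atom_site.label
_atom_site.type_symbol
_atom_site.fract_x
_atom_site.fract_y
_atom_site.fract_z
_atom_site.Wyckoff_letter
_atom_site.occupancy
_atom_site.adp_type
_atom_site.B_iso_or_equiv
La La   0   0   0     a   0.5  Biso 0.4958
Ba Ba   0   0   0     a   0.5  Biso 0.4958
Co Co   0.5 0.5 0.5   b   1    Biso 0.2567
O  O    0   0.5 0.5   c   1    Biso 1.4041
"""

model = parse_simple_cif(LBCO_CIF)
print(model["items"]["_cell.length_a"])
for atom in model["loops"][0]["rows"]:
    print(atom["_atom_site.label"], atom["_atom_site.occupancy"])
```

Each loop row becomes a dictionary keyed by the full tag name, so a value such as the occupancy of Co is looked up by _atom_site.occupancy rather than by column position.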

JSON

Representing the same data in JSON is more verbose and harder to read, especially for large datasets: every atom repeats all of its keys, whereas a CIF loop_ lists them once. JSON is well suited to exchanging structured data between programs, whereas CIF is better suited to human-readable crystallographic data.

{
  "lbco": {
    "space_group": {
      "name_H-M_alt": "P m -3 m",
      "IT_coordinate_system_code": 1
    },
    "cell": {
      "length_a": 3.8909,
      "length_b": 3.8909,
      "length_c": 3.8909,
      "angle_alpha": 90,
      "angle_beta": 90,
      "angle_gamma": 90
    },
    "atom_site": [
      {
        "label": "La",
        "type_symbol": "La",
        "fract_x": 0,
        "fract_y": 0,
        "fract_z": 0,
        "occupancy": 0.5,
        "B_iso_or_equiv": 0.4958
      },
      {
        "label": "Ba",
        "type_symbol": "Ba",
        "fract_x": 0,
        "fract_y": 0,
        "fract_z": 0,
        "occupancy": 0.5,
        "B_iso_or_equiv": 0.4958
      },
      {
        "label": "Co",
        "type_symbol": "Co",
        "fract_x": 0.5,
        "fract_y": 0.5,
        "fract_z": 0.5,
        "occupancy": 1.0,
        "B_iso_or_equiv": 0.2567
      },
      {
        "label": "O",
        "type_symbol": "O",
        "fract_x": 0,
        "fract_y": 0.5,
        "fract_z": 0.5,
        "occupancy": 1.0,
        "B_iso_or_equiv": 1.4041
      }
    ]
  }
}
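The correspondence between the two layouts can be sketched in a few lines of Python. The helper below, atom_sites_to_cif_loop, is a hypothetical name and not part of EasyDiffraction. It takes a JSON-style atom_site list and writes it as a CIF loop_, assuming every entry has the same keys and no value contains whitespace (such values would need quoting in CIF).

```python
import json


def atom_sites_to_cif_loop(atom_sites, category="_atom_site"):
    """Write a list of dicts that share the same keys as a CIF loop_ block."""
    keys = list(atom_sites[0])
    lines = ["loop_"]
    # The tags are written once, in the header of the loop.
    lines += [f"{category}.{key}" for key in keys]
    # Each dict then becomes one row of values in the same order.
    for site in atom_sites:
        lines.append(" ".join(str(site[key]) for key in keys))
    return "\n".join(lines)


# Two of the four atoms from the JSON example, to keep the sketch short.
doc = json.loads("""
{"lbco": {"atom_site": [
  {"label": "La", "type_symbol": "La",
   "fract_x": 0, "fract_y": 0, "fract_z": 0,
   "occupancy": 0.5, "B_iso_or_equiv": 0.4958},
  {"label": "Co", "type_symbol": "Co",
   "fract_x": 0.5, "fract_y": 0.5, "fract_z": 0.5,
   "occupancy": 1.0, "B_iso_or_equiv": 0.2567}
]}}
""")

loop_text = atom_sites_to_cif_loop(doc["lbco"]["atom_site"])
print(loop_text)
```

The key names appear once in the loop header instead of once per atom, which is where most of the size difference between the two formats comes from.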

Experiment Definition

The previous example described the sample model (crystallographic model), but how is the experiment itself represented?

The experiment is also saved as a CIF file. For example, background intensity in a powder diffraction experiment might be represented as:

loop_
_pd_background.line_segment_X
_pd_background.line_segment_intensity
_pd_background.X_coordinate

 10.0  174.3  2theta
 20.0  159.8  2theta
 30.0  167.9  2theta
 ...

More details on how to define the experiment in CIF format are provided in the Experiment section.
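As an illustration of how such background points might be used, the following Python sketch evaluates the background at an arbitrary 2θ position. It assumes that "line segment" means linear interpolation between neighboring points and that the end-point intensity is held constant outside the covered range. Both are assumptions made for this sketch, the function name background_at is hypothetical, and only the three points shown in the loop above are used.

```python
from bisect import bisect_right

# Background points copied from the loop above: (2theta, intensity).
BACKGROUND_POINTS = [(10.0, 174.3), (20.0, 159.8), (30.0, 167.9)]


def background_at(x, points):
    """Linearly interpolate the background intensity at position x.

    Points must be sorted by x. Outside the covered range the nearest
    end-point intensity is returned (an assumption of this sketch).
    """
    xs = [p[0] for p in points]
    if x <= xs[0]:
        return points[0][1]
    if x >= xs[-1]:
        return points[-1][1]
    i = bisect_right(xs, x)  # index of the first point with xs[i] > x
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Halfway between the first two points.
print(background_at(15.0, BACKGROUND_POINTS))
```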

Other Input/Output Blocks

EasyDiffraction uses CIF consistently throughout its workflow, including in the following blocks:

  • project: contains the project information
  • sample model: defines the sample model
  • experiment: contains the experiment setup and measured data
  • analysis: stores fitting and analysis parameters
  • summary: captures analysis results

Example CIF files for each block are provided in the Analysis Workflow and Tutorials.

Other Data Formats

While CIF is the primary format in EasyDiffraction, we also support other formats for importing measured data. These include plain text files with multiple columns. The meaning of the columns depends on the experiment type.

For example, in a standard constant-wavelength powder diffraction experiment:

  • Column 1: 2θ angle
  • Column 2: intensity
  • Column 3: standard uncertainty of the intensity

More details on supported input formats are provided in the Experiment section.
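A three-column file of this kind can be read with a few lines of standard-library Python. This is a minimal sketch, not EasyDiffraction's importer: the function name read_xye is hypothetical, the sample values are made up for illustration, and treating lines that start with "#" as comments is an assumption.

```python
def read_xye(lines):
    """Parse whitespace-separated rows of 2theta, intensity, uncertainty."""
    data = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and (assumed) comment lines
        # Only the first three columns are used; any extras are ignored.
        two_theta, intensity, sigma = (float(v) for v in line.split()[:3])
        data.append((two_theta, intensity, sigma))
    return data


# Made-up values, for illustration only.
sample = """\
# 2theta  intensity  su
10.00  167.0  12.6
10.05  157.0  12.5
10.10  187.0  13.3
"""

points = read_xye(sample.splitlines())
for two_theta, intensity, sigma in points:
    print(two_theta, intensity, sigma)
```

Because the function accepts any iterable of lines, an open file object can be passed to it directly instead of the in-memory sample.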