Working with Structures#

In PyDicer, structure sets converted from the DICOM RTSTRUCT modality are stored within the directory structure. A common issue with real-world datasets is that structure names are often inconsistent, so they must be standardised before the data is analysed.

In PyDicer, structure name standardisation is achieved by defining structure mapping dictionaries which can be stored globally (applied to all structure sets) or locally (specific mapping per structure set or per patient).

In this guide we present some examples of how to define such structure name mappings and introduce the StructureSet class, which simplifies loading and working with structure objects.

[1]:
try:
    from pydicer import PyDicer
except ImportError:
    !pip install pydicer
    from pydicer import PyDicer

from pathlib import Path

import SimpleITK as sitk

from pydicer.utils import fetch_converted_test_data, add_structure_name_mapping
from pydicer.dataset.structureset import StructureSet

Setup PyDicer#

Here we load the LCTSC data which has already been converted. This is downloaded into the testdata_lctsc directory. We also initialise a PyDicer object.

[2]:
working_directory = fetch_converted_test_data("./testdata_lctsc", dataset="LCTSC")

pydicer = PyDicer(working_directory)
Working directory %s aready exists, won't download test data.

Load Structures with StructureSet#

The StructureSet class loads the structures in a structure set and gives dictionary-style access to them, with the structure name as the key and the SimpleITK Image of the mask as the value.

In the following cell, we create a StructureSet object, determine the names of the structures in that structure set, and iterate over each structure, printing the sum of all voxel values in the mask (for demonstration purposes).

[3]:
# Load the converted data
df = pydicer.read_converted_data()
df_structs = df[df.modality=="RTSTRUCT"]

# Create a StructureSet for the first row
struct_row = df_structs.iloc[0]
structure_set = StructureSet(struct_row)

structure_names = structure_set.structure_names
print(f"Structure names: {structure_names}")

for structure in structure_names:
    mask = sitk.GetArrayFromImage(structure_set[structure])
    print(f"Mask voxel sum for {structure}: {mask.sum()}")
pydicer.dataset.structureset - WARNING - No mapping file found with id default
Structure names: ['Lung_R', 'SpinalCord', 'Esophagus', 'Heart', 'Lung_L']
Mask voxel sum for Lung_R: 70924
Mask voxel sum for SpinalCord: 2785
Mask voxel sum for Esophagus: 1601
Mask voxel sum for Heart: 23415
Mask voxel sum for Lung_L: 53272

In the next cell, we iterate over all our structure sets, and print out the names of the structures available. Notice that for some structure sets, structures aren’t named consistently. In the next section we will resolve this with a structure name mapping.

[4]:
df = pydicer.read_converted_data()
df_structs = df[df.modality=="RTSTRUCT"]

for idx, struct_row in df_structs.iterrows():
    structure_set = StructureSet(struct_row)

    structure_names = structure_set.structure_names
    print(f"Patient: {struct_row.patient_id}, Structure names: {structure_names}")
pydicer.dataset.structureset - WARNING - No mapping file found with id default
Patient: LCTSC-Train-S1-006, Structure names: ['Lung_R', 'SpinalCord', 'Esophagus', 'Heart', 'Lung_L']
pydicer.dataset.structureset - WARNING - No mapping file found with id default
Patient: LCTSC-Train-S1-008, Structure names: ['Lung_R', 'SpinalCord', 'Esophagus', 'Heart', 'Lung_L']
pydicer.dataset.structureset - WARNING - No mapping file found with id default
Patient: LCTSC-Test-S1-102, Structure names: ['Lung_R', 'SpinalCord', 'Esophagus', 'Heart', 'Lung_L']
pydicer.dataset.structureset - WARNING - No mapping file found with id default
Patient: LCTSC-Train-S1-005, Structure names: ['Lung_R', 'SpinalCord', 'Esophagus', 'Heart', 'L_Lung']
pydicer.dataset.structureset - WARNING - No mapping file found with id default
Patient: LCTSC-Train-S1-004, Structure names: ['Lung_R', 'SC', 'Esophagus', 'Heart', 'Lung_L']
pydicer.dataset.structureset - WARNING - No mapping file found with id default
Patient: LCTSC-Train-S1-001, Structure names: ['Lung_R', 'SpinalCord', 'Esophagus', 'Heart', 'Lung_L']
pydicer.dataset.structureset - WARNING - No mapping file found with id default
Patient: LCTSC-Train-S1-002, Structure names: ['SpinalCord', 'Esophagus', 'Lung_Left', 'Heart', 'Lung_Right']
pydicer.dataset.structureset - WARNING - No mapping file found with id default
Patient: LCTSC-Train-S1-003, Structure names: ['Lung_R', 'SpinalCord', 'Esophagus', 'Heart', 'Lung_L']
pydicer.dataset.structureset - WARNING - No mapping file found with id default
Patient: LCTSC-Test-S1-101, Structure names: ['Lung_R', 'SpinalCord', 'Esophagus', 'Heart', 'Lung_L']
pydicer.dataset.structureset - WARNING - No mapping file found with id default
Patient: LCTSC-Train-S1-007, Structure names: ['Lung_R', 'SpinalCord', 'Esophagus', 'Heart', 'Lung_L']
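
With many structure sets, the odd names are easier to spot by tallying how often each name occurs. The following is a minimal sketch using only the Python standard library. The `names_by_patient` dictionary is a hypothetical helper filled by hand with four of the rows printed above; it is not something PyDicer provides.

```python
from collections import Counter

# Structure names per patient, copied from four rows of the output above.
names_by_patient = {
    "LCTSC-Train-S1-006": ["Lung_R", "SpinalCord", "Esophagus", "Heart", "Lung_L"],
    "LCTSC-Train-S1-005": ["Lung_R", "SpinalCord", "Esophagus", "Heart", "L_Lung"],
    "LCTSC-Train-S1-004": ["Lung_R", "SC", "Esophagus", "Heart", "Lung_L"],
    "LCTSC-Train-S1-002": ["SpinalCord", "Esophagus", "Lung_Left", "Heart", "Lung_Right"],
}

# Count how many structure sets use each name.
name_counts = Counter(name for names in names_by_patient.values() for name in names)

# Names that occur only once are candidates for name variations.
rare_names = sorted(name for name, count in name_counts.items() if count == 1)
print(rare_names)  # ['L_Lung', 'Lung_Left', 'Lung_Right', 'SC']
```

On a real dataset the dictionary could be filled inside the loop shown above, using `struct_row.patient_id` as the key and `structure_set.structure_names` as the value. The rare names are the ones that should be listed as variations in a mapping.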

Add Structure Name Mapping#

Structure name mappings are defined as Python dictionaries, with the standardised structure name as the key, and the value a list of name variations which should map to the standardised name.
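
To make the shape of such a dictionary concrete, here is a small sketch of how a mapping can be inverted into a lookup from any known name to its standardised name. This only illustrates the data structure; it is not PyDicer's internal implementation, and `build_name_lookup` and `example_mapping` are hypothetical names. Exact, case-sensitive matching is an assumption of the sketch.

```python
def build_name_lookup(mapping):
    """Invert a mapping so each known name points to its standardised name."""
    lookup = {}
    for standard_name, variations in mapping.items():
        # The standardised name always maps to itself.
        lookup[standard_name] = standard_name
        for variation in variations:
            lookup[variation] = standard_name
    return lookup


# Hypothetical mapping: standardised name -> list of name variations.
example_mapping = {
    "Lung_L": ["Lung_Left", "L_Lung"],
    "SpinalCord": ["SC"],
    "Heart": [],
}

lookup = build_name_lookup(example_mapping)
print(lookup.get("SC"))         # SpinalCord
print(lookup.get("Heart"))      # Heart
print(lookup.get("Esophagus"))  # None, since it is not part of this mapping
```

An empty list, as used for `Heart`, means the structure has no known variations but is still part of the mapping.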

Use the add_structure_name_mapping function to add a mapping. A mapping_id may be supplied to distinguish between different mappings. If no mapping_id is supplied, the default mapping id is used.

If a structure_set_row or patient_id is supplied, then the mapping will be stored at the corresponding level. If neither is supplied, the mapping will be stored globally for all structure sets in the dataset.

[5]:
mapping = {
    "Esophagus": [],
    "Heart": [],
    "Lung_L": ["Lung_Left", "L_Lung"],
    "Lung_R": ["Lung_Right"],
    "SpinalCord": ["SC"],
}

[6]:
pydicer.add_structure_name_mapping(mapping)
pydicer.utils - INFO - Adding mapping for project in testdata_lctsc/.pydicer

The default mapping has been saved. You can find the saved mapping in the testdata_lctsc/.pydicer/.structure_set_mappings directory.
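
To inspect a saved mapping, the file can be read back with the standard library. This is a sketch under two assumptions: that the mapping is stored as `<mapping_id>.json` in that directory, and that the file contains the mapping dictionary as plain JSON. `read_mapping_file` is a hypothetical helper, not part of PyDicer.

```python
import json
from pathlib import Path


def read_mapping_file(mapping_dir, mapping_id="default"):
    """Return the mapping stored as <mapping_id>.json in mapping_dir, or None if absent."""
    mapping_file = Path(mapping_dir) / f"{mapping_id}.json"
    if not mapping_file.exists():
        return None
    with open(mapping_file, "r", encoding="utf-8") as f:
        return json.load(f)


# Prints the saved mapping, or None if the file does not exist.
saved_mapping = read_mapping_file("testdata_lctsc/.pydicer/.structure_set_mappings")
print(saved_mapping)
```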

Now we can check our StructureSet to confirm the names are mapped properly.

[7]:
for idx, struct_row in df_structs.iterrows():
    structure_set = StructureSet(struct_row)

    structure_names = structure_set.structure_names
    print(f"Patient: {struct_row.patient_id}, Structure names: {structure_names}")
pydicer.dataset.structureset - DEBUG - Using mapping file in testdata_lctsc/.pydicer/.structure_set_mappings/default.json
Patient: LCTSC-Train-S1-006, Structure names: ['Esophagus', 'Heart', 'Lung_L', 'Lung_R', 'SpinalCord']
pydicer.dataset.structureset - DEBUG - Using mapping file in testdata_lctsc/.pydicer/.structure_set_mappings/default.json
Patient: LCTSC-Train-S1-008, Structure names: ['Esophagus', 'Heart', 'Lung_L', 'Lung_R', 'SpinalCord']
pydicer.dataset.structureset - DEBUG - Using mapping file in testdata_lctsc/.pydicer/.structure_set_mappings/default.json
Patient: LCTSC-Test-S1-102, Structure names: ['Esophagus', 'Heart', 'Lung_L', 'Lung_R', 'SpinalCord']
pydicer.dataset.structureset - DEBUG - Using mapping file in testdata_lctsc/.pydicer/.structure_set_mappings/default.json
Patient: LCTSC-Train-S1-005, Structure names: ['Esophagus', 'Heart', 'Lung_L', 'Lung_R', 'SpinalCord']
pydicer.dataset.structureset - DEBUG - Using mapping file in testdata_lctsc/.pydicer/.structure_set_mappings/default.json
Patient: LCTSC-Train-S1-004, Structure names: ['Esophagus', 'Heart', 'Lung_L', 'Lung_R', 'SpinalCord']
pydicer.dataset.structureset - DEBUG - Using mapping file in testdata_lctsc/.pydicer/.structure_set_mappings/default.json
Patient: LCTSC-Train-S1-001, Structure names: ['Esophagus', 'Heart', 'Lung_L', 'Lung_R', 'SpinalCord']
pydicer.dataset.structureset - DEBUG - Using mapping file in testdata_lctsc/.pydicer/.structure_set_mappings/default.json
Patient: LCTSC-Train-S1-002, Structure names: ['Esophagus', 'Heart', 'Lung_L', 'Lung_R', 'SpinalCord']
pydicer.dataset.structureset - DEBUG - Using mapping file in testdata_lctsc/.pydicer/.structure_set_mappings/default.json
Patient: LCTSC-Train-S1-003, Structure names: ['Esophagus', 'Heart', 'Lung_L', 'Lung_R', 'SpinalCord']
pydicer.dataset.structureset - DEBUG - Using mapping file in testdata_lctsc/.pydicer/.structure_set_mappings/default.json
Patient: LCTSC-Test-S1-101, Structure names: ['Esophagus', 'Heart', 'Lung_L', 'Lung_R', 'SpinalCord']
pydicer.dataset.structureset - DEBUG - Using mapping file in testdata_lctsc/.pydicer/.structure_set_mappings/default.json
Patient: LCTSC-Train-S1-007, Structure names: ['Esophagus', 'Heart', 'Lung_L', 'Lung_R', 'SpinalCord']

Subsets of Structures#

Structure name mappings are also useful if you only want to work with a subset of structures available. Simply leave them out of the mapping entirely, and they won’t be loaded as part of the StructureSet.

In this example, we use a mapping_id of struct_subset to keep this mapping separate from the mapping defined above.

[8]:
mapping_id = "struct_subset"
sub_mapping = {
    "Lung_L": ["Lung_Left", "L_Lung"],
    "Lung_R": ["Lung_Right"],
}

pydicer.add_structure_name_mapping(sub_mapping, mapping_id=mapping_id)

for idx, struct_row in df_structs.iterrows():
    structure_set = StructureSet(struct_row, mapping_id=mapping_id) # Provide the mapping_id!

    structure_names = structure_set.structure_names
    print(f"Patient: {struct_row.patient_id}, Structure names: {structure_names}")
pydicer.utils - INFO - Adding mapping for project in testdata_lctsc/.pydicer
pydicer.dataset.structureset - DEBUG - Using mapping file in testdata_lctsc/.pydicer/.structure_set_mappings/struct_subset.json
Patient: LCTSC-Train-S1-006, Structure names: ['Lung_L', 'Lung_R']
pydicer.dataset.structureset - DEBUG - Using mapping file in testdata_lctsc/.pydicer/.structure_set_mappings/struct_subset.json
Patient: LCTSC-Train-S1-008, Structure names: ['Lung_L', 'Lung_R']
pydicer.dataset.structureset - DEBUG - Using mapping file in testdata_lctsc/.pydicer/.structure_set_mappings/struct_subset.json
Patient: LCTSC-Test-S1-102, Structure names: ['Lung_L', 'Lung_R']
pydicer.dataset.structureset - DEBUG - Using mapping file in testdata_lctsc/.pydicer/.structure_set_mappings/struct_subset.json
Patient: LCTSC-Train-S1-005, Structure names: ['Lung_L', 'Lung_R']
pydicer.dataset.structureset - DEBUG - Using mapping file in testdata_lctsc/.pydicer/.structure_set_mappings/struct_subset.json
Patient: LCTSC-Train-S1-004, Structure names: ['Lung_L', 'Lung_R']
pydicer.dataset.structureset - DEBUG - Using mapping file in testdata_lctsc/.pydicer/.structure_set_mappings/struct_subset.json
Patient: LCTSC-Train-S1-001, Structure names: ['Lung_L', 'Lung_R']
pydicer.dataset.structureset - DEBUG - Using mapping file in testdata_lctsc/.pydicer/.structure_set_mappings/struct_subset.json
Patient: LCTSC-Train-S1-002, Structure names: ['Lung_L', 'Lung_R']
pydicer.dataset.structureset - DEBUG - Using mapping file in testdata_lctsc/.pydicer/.structure_set_mappings/struct_subset.json
Patient: LCTSC-Train-S1-003, Structure names: ['Lung_L', 'Lung_R']
pydicer.dataset.structureset - DEBUG - Using mapping file in testdata_lctsc/.pydicer/.structure_set_mappings/struct_subset.json
Patient: LCTSC-Test-S1-101, Structure names: ['Lung_L', 'Lung_R']
pydicer.dataset.structureset - DEBUG - Using mapping file in testdata_lctsc/.pydicer/.structure_set_mappings/struct_subset.json
Patient: LCTSC-Train-S1-007, Structure names: ['Lung_L', 'Lung_R']

Local Structure Name Mappings#

Next, we only specify a mapping for one specific structure set. We will use local_mapping as the mapping_id. In the output you will see that only one structure set has had the mapping applied.

[9]:
mapping_id = "local_mapping"
mapping = {
    "Esophagus": [],
    "Heart": [],
    "Lung_L": ["Lung_Left"],
    "Lung_R": ["Lung_Right"],
    "SpinalCord": ["SC"],
}

struct_row = df[(df.patient_id=="LCTSC-Train-S1-002") & (df.modality=="RTSTRUCT")].iloc[0]

# Only adding mapping for one structure set
pydicer.add_structure_name_mapping(
    mapping,
    mapping_id=mapping_id,
    structure_set_row=struct_row
)

for idx, struct_row in df_structs.iterrows():
    structure_set = StructureSet(struct_row, mapping_id=mapping_id) # Provide the mapping_id!

    structure_names = structure_set.structure_names
    print(f"Patient: {struct_row.patient_id}, Structure names: {structure_names}")
pydicer.utils - INFO - Adding mapping local_mapping for structure set f036b8
pydicer.utils - INFO - Adding mapping for stucture_set in testdata_lctsc/data/LCTSC-Train-S1-002/structures/f036b8
pydicer.dataset.structureset - WARNING - No mapping file found with id local_mapping
Patient: LCTSC-Train-S1-006, Structure names: ['Lung_R', 'SpinalCord', 'Esophagus', 'Heart', 'Lung_L']
pydicer.dataset.structureset - WARNING - No mapping file found with id local_mapping
Patient: LCTSC-Train-S1-008, Structure names: ['Lung_R', 'SpinalCord', 'Esophagus', 'Heart', 'Lung_L']
pydicer.dataset.structureset - WARNING - No mapping file found with id local_mapping
Patient: LCTSC-Test-S1-102, Structure names: ['Lung_R', 'SpinalCord', 'Esophagus', 'Heart', 'Lung_L']
pydicer.dataset.structureset - WARNING - No mapping file found with id local_mapping
Patient: LCTSC-Train-S1-005, Structure names: ['Lung_R', 'SpinalCord', 'Esophagus', 'Heart', 'L_Lung']
pydicer.dataset.structureset - WARNING - No mapping file found with id local_mapping
Patient: LCTSC-Train-S1-004, Structure names: ['Lung_R', 'SC', 'Esophagus', 'Heart', 'Lung_L']
pydicer.dataset.structureset - WARNING - No mapping file found with id local_mapping
Patient: LCTSC-Train-S1-001, Structure names: ['Lung_R', 'SpinalCord', 'Esophagus', 'Heart', 'Lung_L']
pydicer.dataset.structureset - DEBUG - Using mapping file in testdata_lctsc/data/LCTSC-Train-S1-002/structures/f036b8/.structure_set_mappings/local_mapping.json
Patient: LCTSC-Train-S1-002, Structure names: ['Esophagus', 'Heart', 'Lung_L', 'Lung_R', 'SpinalCord']
pydicer.dataset.structureset - WARNING - No mapping file found with id local_mapping
Patient: LCTSC-Train-S1-003, Structure names: ['Lung_R', 'SpinalCord', 'Esophagus', 'Heart', 'Lung_L']
pydicer.dataset.structureset - WARNING - No mapping file found with id local_mapping
Patient: LCTSC-Test-S1-101, Structure names: ['Lung_R', 'SpinalCord', 'Esophagus', 'Heart', 'Lung_L']
pydicer.dataset.structureset - WARNING - No mapping file found with id local_mapping
Patient: LCTSC-Train-S1-007, Structure names: ['Lung_R', 'SpinalCord', 'Esophagus', 'Heart', 'Lung_L']

Notice that the mapping has only been applied to the structure set for patient LCTSC-Train-S1-002.
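
The log output suggests that each level keeps its mappings in a `.structure_set_mappings` directory, with one JSON file per mapping_id. The sketch below shows how such a lookup could work across levels. The search order (most specific level first) is an assumption rather than documented behaviour, `find_mapping_file` is a hypothetical helper, and the patient-level directory is left out because its location is not shown in the output.

```python
from pathlib import Path


def find_mapping_file(candidate_dirs, mapping_id="default"):
    """Return the first existing mapping file among candidate_dirs, or None."""
    for directory in candidate_dirs:
        mapping_file = Path(directory) / ".structure_set_mappings" / f"{mapping_id}.json"
        if mapping_file.exists():
            return mapping_file
    return None


# Directories taken from the log output above, most specific first (assumed order).
candidate_dirs = [
    "testdata_lctsc/data/LCTSC-Train-S1-002/structures/f036b8",  # structure set level
    "testdata_lctsc/.pydicer",  # global (project) level
]

# Prints the path of the matching file, or None if no level has this mapping_id.
print(find_mapping_file(candidate_dirs, mapping_id="local_mapping"))
```

Under this reading, a structure set with no `local_mapping.json` at any level falls through to `None`, which matches the "No mapping file found" warnings printed for the other patients.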

Using Mappings in PyDicer#

Once mappings are defined, these can be used when you:

- Compute Dose Metrics
- Fetch Radiomics Features
- Analyse Auto-segmentations
- Prepare data for nnUNet training

Check out the documentation for those modules to see where you can supply your mapping_id to have the structure set standardisation applied. If you have used the default mapping_id, the standardisation will be applied automatically.
