Generating Data Objects#

Open In Colab

While PyDicer primarily deals with converting data objects from DICOM, there are instances where you may want to generate a new data object and have it integrated into your PyDicer dataset.

Some examples of when you may want to do this are: - Generate a new dose grid with EQD2 correction applied. - Generate a structure set of auto-segmented structures. - Generate a Pseudo-CT image from an MRI.

In this guide we show you how to generate new data objects to help you perform tasks such as those described in the examples mentioned above.

[1]:
try:
    from pydicer import PyDicer
except ImportError:
    !pip install pydicer
    from pydicer import PyDicer

from pathlib import Path
import SimpleITK as sitk

from pydicer.utils import fetch_converted_test_data, load_object_metadata, read_simple_itk_image

working_directory = fetch_converted_test_data("./testdata_hnscc", dataset="HNSCC")

pydicer = PyDicer(working_directory)
Working directory %s aready exists, won't download test data.

Generate Dose Objects#

In the following cell, we: - Iterate over each dose grid in our dataset - Load the dose grid using SimpleITK - Apply EQD2 dose correction (hard coded for demonstration purposes) - Save the corrected dose as a new object in our dataset

Once the dose object is saved, when you compute DVHs and dose metrics, this new dose will appear in that data.

[2]:
alpha_beta = 2

df = pydicer.read_converted_data()
df_doses = df[df["modality"] == "RTDOSE"]

for _, dose_row in df_doses.iterrows():
    if "EQD2_ab" in dose_row.hashed_uid:
        # This is an already scaled dose
        continue

    df_linked_plan = df[df["sop_instance_uid"] == dose_row.referenced_sop_instance_uid]

    linked_plan = df_linked_plan.iloc[0]
    ds_plan = load_object_metadata(linked_plan)

    # Read the planned fractions from the plan object
    fractions = int(ds_plan.FractionGroupSequence[0].NumberOfFractionsPlanned)

    print(f"{dose_row.patient_id} has {fractions} fractions")

    # Load the dose grid
    dose_path = Path(dose_row.path).joinpath("RTDOSE.nii.gz")
    dose = sitk.ReadImage(str(dose_path))
    dose = sitk.Cast(dose, sitk.sitkFloat64)

    dose_id = f"{dose_row.hashed_uid}_EQD2_ab{alpha_beta}"

    if len(df_doses[df_doses.hashed_uid == dose_id]) > 0:
        print(f"Already converted dose for {dose_id}")
        continue

    # Apply the EQD2 correction
    eqd2_dose = dose * (((dose / fractions) + alpha_beta) / (2 + alpha_beta))

    # Save off the new dose grid
    try:
        print(f"Saving dose grid with ID: {dose_id}")
        pydicer.add_dose_object(
            eqd2_dose, dose_id, dose_row.patient_id, linked_plan, dose_row.for_uid
        )
    except SystemError:
        print(f"Dose object {dose_id} already exists!")
HNSCC-01-0199 has 33 fractions
Saving dose grid with ID: c16e76_EQD2_ab2
HNSCC-01-0176 has 20 fractions
Saving dose grid with ID: 833a74_EQD2_ab2
HNSCC-01-0176 has 20 fractions
Saving dose grid with ID: bf3fba_EQD2_ab2
HNSCC-01-0019 has 21 fractions
Saving dose grid with ID: 309e1a_EQD2_ab2

Now we can load our data objects, and check that our new dose grids are stored alongside our converted data.

[3]:
df = pydicer.read_converted_data()
df[df.modality=="RTDOSE"]
[3]:
sop_instance_uid hashed_uid modality patient_id series_uid for_uid referenced_sop_instance_uid path
1 1.3.6.1.4.1.14519.5.2.1.1706.8040.264264397186... c16e76 RTDOSE HNSCC-01-0199 1.3.6.1.4.1.14519.5.2.1.1706.8040.233527028792... 1.3.6.1.4.1.14519.5.2.1.1706.8040.870916135819... 1.3.6.1.4.1.14519.5.2.1.1706.8040.287865632112... testdata_hnscc/data/HNSCC-01-0199/doses/c16e76
4 c16e76_EQD2_ab2 c16e76_EQD2_ab2 RTDOSE HNSCC-01-0199 c16e76_EQD2_ab2 1.3.6.1.4.1.14519.5.2.1.1706.8040.870916135819... 1.3.6.1.4.1.14519.5.2.1.1706.8040.287865632112... testdata_hnscc/data/HNSCC-01-0199/doses/c16e76...
9 1.3.6.1.4.1.14519.5.2.1.1706.8040.169033525924... 833a74 RTDOSE HNSCC-01-0176 1.3.6.1.4.1.14519.5.2.1.1706.8040.279793773343... 1.3.6.1.4.1.14519.5.2.1.1706.8040.706719210726... 1.3.6.1.4.1.14519.5.2.1.1706.8040.470253980284... testdata_hnscc/data/HNSCC-01-0176/doses/833a74
10 1.3.6.1.4.1.14519.5.2.1.1706.8040.267291308489... bf3fba RTDOSE HNSCC-01-0176 1.3.6.1.4.1.14519.5.2.1.1706.8040.283706688235... 1.3.6.1.4.1.14519.5.2.1.1706.8040.566662631858... 1.3.6.1.4.1.14519.5.2.1.1706.8040.173917268454... testdata_hnscc/data/HNSCC-01-0176/doses/bf3fba
15 833a74_EQD2_ab2 833a74_EQD2_ab2 RTDOSE HNSCC-01-0176 833a74_EQD2_ab2 1.3.6.1.4.1.14519.5.2.1.1706.8040.706719210726... 1.3.6.1.4.1.14519.5.2.1.1706.8040.470253980284... testdata_hnscc/data/HNSCC-01-0176/doses/833a74...
16 bf3fba_EQD2_ab2 bf3fba_EQD2_ab2 RTDOSE HNSCC-01-0176 bf3fba_EQD2_ab2 1.3.6.1.4.1.14519.5.2.1.1706.8040.566662631858... 1.3.6.1.4.1.14519.5.2.1.1706.8040.173917268454... testdata_hnscc/data/HNSCC-01-0176/doses/bf3fba...
18 1.3.6.1.4.1.14519.5.2.1.1706.8040.242809596262... 309e1a RTDOSE HNSCC-01-0019 1.3.6.1.4.1.14519.5.2.1.1706.8040.777975715563... 1.3.6.1.4.1.14519.5.2.1.1706.8040.290727775603... 1.3.6.1.4.1.14519.5.2.1.1706.8040.254865609982... testdata_hnscc/data/HNSCC-01-0019/doses/309e1a
21 309e1a_EQD2_ab2 309e1a_EQD2_ab2 RTDOSE HNSCC-01-0019 309e1a_EQD2_ab2 1.3.6.1.4.1.14519.5.2.1.1706.8040.290727775603... 1.3.6.1.4.1.14519.5.2.1.1706.8040.254865609982... testdata_hnscc/data/HNSCC-01-0019/doses/309e1a...

Generate Structure Set Objects#

In this example, we: - Iterate over each CT image in our dataset - Load the CT image using SimpleITK, and apply a threshold to segment bones - Save the segmented bones as a new structure set object

Note: This specific functionality is supported by the auto-segmentation inference module. If you are using this to generate auto-segmentations it is recommended you use that functionality.

[4]:
bone_threshold = 300 # Set threshold at 300 HU

df = pydicer.read_converted_data()
df_cts = df[df["modality"] == "CT"]

for idx, ct_row in df_cts.iterrows():

    # Load the image
    img = read_simple_itk_image(ct_row)

    # Apply the threshold
    bone_mask = img > bone_threshold

    # Save the mask in a new structure set
    structure_set_id = f"bones_{ct_row.hashed_uid}"
    new_structure_set = {
        "bones": bone_mask
    }


    try:
        print(f"Saving structure set with ID: {structure_set_id}")
        pydicer.add_structure_object(
            new_structure_set,
            structure_set_id,
            ct_row.patient_id,
            ct_row,
        )
    except SystemError:
        print(f"Structure Set {structure_set_id} already exists!")

Saving structure set with ID: bones_72b0f9
Saving structure set with ID: bones_c4ffd0
Saving structure set with ID: bones_8e0da9
Saving structure set with ID: bones_ec4aec
Saving structure set with ID: bones_33c44a
Saving structure set with ID: bones_b281ea

Let’s load out data objects to see if we have our new structure sets.

[5]:
df = pydicer.read_converted_data()
df[df.modality=="RTSTRUCT"]
[5]:
sop_instance_uid hashed_uid modality patient_id series_uid for_uid referenced_sop_instance_uid path
3 1.3.6.1.4.1.14519.5.2.1.1706.8040.166429645421... 06e49c RTSTRUCT HNSCC-01-0199 1.3.6.1.4.1.14519.5.2.1.1706.8040.243934637013... 1.3.6.1.4.1.14519.5.2.1.1706.8040.870916135819... 1.3.6.1.4.1.14519.5.2.1.1706.8040.240263316258... testdata_hnscc/data/HNSCC-01-0199/structures/0...
5 bones_72b0f9 bones_72b0f9 RTSTRUCT HNSCC-01-0199 bones_72b0f9 1.3.6.1.4.1.14519.5.2.1.1706.8040.870916135819... 1.3.6.1.4.1.14519.5.2.1.1706.8040.240263316258... testdata_hnscc/data/HNSCC-01-0199/structures/b...
14 1.3.6.1.4.1.14519.5.2.1.1706.8040.403955456521... cbbf5b RTSTRUCT HNSCC-01-0176 1.3.6.1.4.1.14519.5.2.1.1706.8040.276897558084... 1.3.6.1.4.1.14519.5.2.1.1706.8040.120880328745... 1.3.6.1.4.1.14519.5.2.1.1706.8040.334001018535... testdata_hnscc/data/HNSCC-01-0176/structures/c...
15 1.3.6.1.4.1.14519.5.2.1.1706.8040.323156708629... 6d2934 RTSTRUCT HNSCC-01-0176 1.3.6.1.4.1.14519.5.2.1.1706.8040.495627765798... 1.3.6.1.4.1.14519.5.2.1.1706.8040.310630617866... 1.3.6.1.4.1.14519.5.2.1.1706.8040.469610481459... testdata_hnscc/data/HNSCC-01-0176/structures/6...
18 bones_c4ffd0 bones_c4ffd0 RTSTRUCT HNSCC-01-0176 bones_c4ffd0 1.3.6.1.4.1.14519.5.2.1.1706.8040.120880328745... 1.3.6.1.4.1.14519.5.2.1.1706.8040.334001018535... testdata_hnscc/data/HNSCC-01-0176/structures/b...
19 bones_8e0da9 bones_8e0da9 RTSTRUCT HNSCC-01-0176 bones_8e0da9 1.3.6.1.4.1.14519.5.2.1.1706.8040.216161306702... 1.3.6.1.4.1.14519.5.2.1.1706.8040.107072817915... testdata_hnscc/data/HNSCC-01-0176/structures/b...
20 bones_ec4aec bones_ec4aec RTSTRUCT HNSCC-01-0176 bones_ec4aec 1.3.6.1.4.1.14519.5.2.1.1706.8040.216161306702... 1.3.6.1.4.1.14519.5.2.1.1706.8040.133948865586... testdata_hnscc/data/HNSCC-01-0176/structures/b...
21 bones_33c44a bones_33c44a RTSTRUCT HNSCC-01-0176 bones_33c44a 1.3.6.1.4.1.14519.5.2.1.1706.8040.310630617866... 1.3.6.1.4.1.14519.5.2.1.1706.8040.469610481459... testdata_hnscc/data/HNSCC-01-0176/structures/b...
25 1.3.6.1.4.1.14519.5.2.1.1706.8040.168221415040... 7cdcd9 RTSTRUCT HNSCC-01-0019 1.3.6.1.4.1.14519.5.2.1.1706.8040.103450757970... 1.3.6.1.4.1.14519.5.2.1.1706.8040.290727775603... 1.3.6.1.4.1.14519.5.2.1.1706.8040.418136430763... testdata_hnscc/data/HNSCC-01-0019/structures/7...
27 bones_b281ea bones_b281ea RTSTRUCT HNSCC-01-0019 bones_b281ea 1.3.6.1.4.1.14519.5.2.1.1706.8040.290727775603... 1.3.6.1.4.1.14519.5.2.1.1706.8040.418136430763... testdata_hnscc/data/HNSCC-01-0019/structures/b...

And we can also run the visualise module. Use the force=False flag to ensure that only the newly generated objects are visualised

[6]:
pydicer.visualise.visualise(force=False)
100%|██████████| 24/24 [00:46<00:00,  1.94s/objects, visualise]

Take a look inside the testdata_hnscc/data directory for the new structure set folders. See the visualised snapshot to check that our bone segmentation worked!

Generate Image Objects#

In this example, we: - Iterate over each CT image in our dataset - Load the CT image using SimpleITK, and apply a Laplacian Sharpening image filter to it - Save the image as a new image object

[7]:
df = pydicer.read_converted_data()
df_cts = df[df["modality"] == "CT"]

for idx, ct_row in df_cts.iterrows():

    # Load the image
    img = read_simple_itk_image(ct_row)

    # Sharpen the image
    img_sharp = sitk.LaplacianSharpening(img)

    # Save the sharpened image
    img_id = f"sharp_{ct_row.hashed_uid}"

    try:
        print(f"Saving image with ID: {img_id}")
        pydicer.add_image_object(
            img_sharp,
            img_id,
            ct_row.modality,
            ct_row.patient_id,
            for_uid=ct_row.for_uid
        )
    except SystemError:
        print(f"Image {img_id} already exists!")
Saving image with ID: sharp_72b0f9
Saving image with ID: sharp_c4ffd0
Saving image with ID: sharp_8e0da9
Saving image with ID: sharp_ec4aec
Saving image with ID: sharp_33c44a
Saving image with ID: sharp_b281ea

Now we can visualise the images and produce snapshots once more. Find the sharpened images in the working directory. Can you see the difference between the sharpened CT and the original?

[8]:
pydicer.visualise.visualise(force=False)
  0%|          | 0/30 [00:00<?, ?objects/s, visualise]/home/runner/.cache/pypoetry/virtualenvs/pydicer-pNPN9n62-py3.9/lib/python3.9/site-packages/pydicom/valuerep.py:443: UserWarning: Invalid value for VR DA: 'NaT'.
  warnings.warn(msg)
 20%|██        | 6/30 [00:00<00:01, 12.15objects/s, visualise]/home/runner/.cache/pypoetry/virtualenvs/pydicer-pNPN9n62-py3.9/lib/python3.9/site-packages/pydicom/valuerep.py:443: UserWarning: Invalid value for VR DA: 'NaT'.
  warnings.warn(msg)
 70%|███████   | 21/30 [00:00<00:00, 25.94objects/s, visualise]/home/runner/.cache/pypoetry/virtualenvs/pydicer-pNPN9n62-py3.9/lib/python3.9/site-packages/pydicom/valuerep.py:443: UserWarning: Invalid value for VR DA: 'NaT'.
  warnings.warn(msg)
/home/runner/.cache/pypoetry/virtualenvs/pydicer-pNPN9n62-py3.9/lib/python3.9/site-packages/pydicom/valuerep.py:443: UserWarning: Invalid value for VR DA: 'NaT'.
  warnings.warn(msg)
/home/runner/.cache/pypoetry/virtualenvs/pydicer-pNPN9n62-py3.9/lib/python3.9/site-packages/pydicom/valuerep.py:443: UserWarning: Invalid value for VR DA: 'NaT'.
  warnings.warn(msg)
 80%|████████  | 24/30 [00:01<00:00, 11.38objects/s, visualise]/home/runner/.cache/pypoetry/virtualenvs/pydicer-pNPN9n62-py3.9/lib/python3.9/site-packages/pydicom/valuerep.py:443: UserWarning: Invalid value for VR DA: 'NaT'.
  warnings.warn(msg)
100%|██████████| 30/30 [00:02<00:00, 13.53objects/s, visualise]
[ ]: