Converting Data

This example demonstrates how to preprocess and convert DICOM data. These are essential first steps before data can be analysed using PyDicer.

[1]:
try:
    from pydicer import PyDicer
except ImportError:
    !pip install pydicer
    from pydicer import PyDicer

from pathlib import Path

from pydicer.input.test import TestInput

Setup PyDicer

As in the Getting Started example, we must first define a working directory for our dataset. We also create a PyDicer object.

[2]:
directory = Path("./working")
pydicer = PyDicer(directory)

Fetch some data

A TestInput class is provided in pydicer to download some sample data to work with. Several other input classes exist if you’d like to retrieve DICOM data for conversion from somewhere else. See the docs for information on how the PyDicer input classes work.

Most commonly, if you have DICOM files stored within a folder on your file system you can simply pass the path to your DICOM directory to the pydicer.add_input() function.

[3]:
dicom_directory = directory.joinpath("dicom")
test_input = TestInput(dicom_directory)
test_input.fetch_data()

# Add the input DICOM location to the pydicer object
pydicer.add_input(dicom_directory)
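
If your own DICOM files are already on disk, it can help to confirm that the folder really contains DICOM files before passing it to pydicer.add_input(). The following is a minimal sketch using only the standard library. The helper names looks_like_dicom and list_dicom_files are hypothetical and not part of PyDicer. The check relies on the DICOM file format convention of a 128-byte preamble followed by the characters DICM, so files written without that preamble will be missed.

```python
from pathlib import Path


def looks_like_dicom(path: Path) -> bool:
    """Return True if the file carries the 'DICM' marker after the 128-byte preamble."""
    with open(path, "rb") as f:
        f.seek(128)
        return f.read(4) == b"DICM"


def list_dicom_files(dicom_dir: Path) -> list:
    """Recursively collect files under dicom_dir that look like DICOM files."""
    return sorted(
        p for p in Path(dicom_dir).rglob("*") if p.is_file() and looks_like_dicom(p)
    )


# Hypothetical usage (the path below is a placeholder, not from this example):
# my_dicom_dir = Path("/path/to/my/dicom")
# if list_dicom_files(my_dicom_dir):
#     pydicer.add_input(my_dicom_dir)
```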

Preprocess

With some DICOM data ready to work with, we must first use the PyDicer preprocess module. This module will crawl over all DICOM data available and will index all information required for conversion of the data.

[4]:
pydicer.preprocess()
100%|██████████| 1309/1309 [00:03<00:00, 415.00files/s, preprocess]

Inspect Preprocessed Data

Here we load the data that was indexed during preprocessing and output the first rows. This data will be used by the following step of data conversion.

[5]:
df_preprocessed = pydicer.read_preprocessed_data()
df_preprocessed.head()
[5]:
patient_id study_uid series_uid modality sop_class_uid sop_instance_uid for_uid file_path slice_location referenced_uid referenced_for_uid
1207 HNSCC-01-0019 1.3.6.1.4.1.14519.5.2.1.1706.8040.797724702538... 1.3.6.1.4.1.14519.5.2.1.1706.8040.233510441938... CT 1.2.840.10008.5.1.4.1.1.2 1.3.6.1.4.1.14519.5.2.1.1706.8040.418136430763... 1.3.6.1.4.1.14519.5.2.1.1706.8040.290727775603... working/dicom/HNSCC/HNSCC-01-0019/07-04-1998-N... -807.0 NaN NaN
1268 HNSCC-01-0019 1.3.6.1.4.1.14519.5.2.1.1706.8040.797724702538... 1.3.6.1.4.1.14519.5.2.1.1706.8040.233510441938... CT 1.2.840.10008.5.1.4.1.1.2 1.3.6.1.4.1.14519.5.2.1.1706.8040.206018114826... 1.3.6.1.4.1.14519.5.2.1.1706.8040.290727775603... working/dicom/HNSCC/HNSCC-01-0019/07-04-1998-N... -804.0 NaN NaN
1206 HNSCC-01-0019 1.3.6.1.4.1.14519.5.2.1.1706.8040.797724702538... 1.3.6.1.4.1.14519.5.2.1.1706.8040.233510441938... CT 1.2.840.10008.5.1.4.1.1.2 1.3.6.1.4.1.14519.5.2.1.1706.8040.100785615013... 1.3.6.1.4.1.14519.5.2.1.1706.8040.290727775603... working/dicom/HNSCC/HNSCC-01-0019/07-04-1998-N... -801.0 NaN NaN
1230 HNSCC-01-0019 1.3.6.1.4.1.14519.5.2.1.1706.8040.797724702538... 1.3.6.1.4.1.14519.5.2.1.1706.8040.233510441938... CT 1.2.840.10008.5.1.4.1.1.2 1.3.6.1.4.1.14519.5.2.1.1706.8040.113351005230... 1.3.6.1.4.1.14519.5.2.1.1706.8040.290727775603... working/dicom/HNSCC/HNSCC-01-0019/07-04-1998-N... -798.0 NaN NaN
1247 HNSCC-01-0019 1.3.6.1.4.1.14519.5.2.1.1706.8040.797724702538... 1.3.6.1.4.1.14519.5.2.1.1706.8040.233510441938... CT 1.2.840.10008.5.1.4.1.1.2 1.3.6.1.4.1.14519.5.2.1.1706.8040.112029189313... 1.3.6.1.4.1.14519.5.2.1.1706.8040.290727775603... working/dicom/HNSCC/HNSCC-01-0019/07-04-1998-N... -795.0 NaN NaN
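
The preprocessed index is an ordinary Pandas DataFrame, so it can be summarised before conversion. The sketch below counts indexed files per patient, modality and series, and checks the slice spacing of one series. It uses a small hand-made DataFrame with the same column names as above; the series UIDs are shortened placeholders and the rows are illustrative, not real output.

```python
import pandas as pd

# Illustrative stand-in for df_preprocessed; UIDs are shortened placeholders.
df_example = pd.DataFrame(
    {
        "patient_id": ["HNSCC-01-0019"] * 4 + ["HNSCC-01-0199"],
        "series_uid": ["series-a"] * 4 + ["series-b"],
        "modality": ["CT"] * 4 + ["RTSTRUCT"],
        "slice_location": [-807.0, -804.0, -801.0, -798.0, None],
    }
)

# Number of indexed files for each patient, modality and series.
files_per_series = (
    df_example.groupby(["patient_id", "modality", "series_uid"])
    .size()
    .rename("n_files")
    .reset_index()
)
print(files_per_series)

# Spacing between consecutive slices of one CT series.
ct_slices = df_example[df_example["series_uid"] == "series-a"]
spacing = ct_slices["slice_location"].sort_values().diff().dropna()
print(spacing.unique())
```

A single unique spacing value suggests the series is uniformly sampled; several values would point to missing or irregular slices.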

Convert Data

Now that the DICOM data has been indexed during preprocessing, we are ready to convert it into NIfTI format, which will be stored within the PyDicer standard directory structure.

Running the following cell will begin the conversion process. While this cell is running, take a look inside the working/data directory to see how the converted data is being stored.

Notice the converted.csv file stored for each patient. This tracks each converted data object. This will be loaded as a Pandas DataFrame for use throughout PyDicer.

[6]:
pydicer.convert.convert()
Conversion Progress:  24%|██▍       | 5/21 [00:51<03:11, 11.95s/it]WARNING: In /tmp/SimpleITK-build/ITK-prefix/include/ITK-5.3/itkImageSeriesReader.hxx, line 477
ImageSeriesReader (0x7f22b853e2b0): Non uniform sampling or missing slices detected,  maximum nonuniformity:0.000641026

Conversion Progress:  29%|██▊       | 6/21 [00:53<02:07,  8.48s/it]WARNING: In /tmp/SimpleITK-build/ITK-prefix/include/ITK-5.3/itkImageSeriesReader.hxx, line 477
ImageSeriesReader (0x7f22b853e2b0): Non uniform sampling or missing slices detected,  maximum nonuniformity:0.000641026

Conversion Progress: 100%|██████████| 21/21 [04:02<00:00, 11.54s/it]
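
Once conversion has finished, the per-patient converted.csv files can also be inspected directly. The sketch below gathers them using only the standard library. It assumes each file sits at working/data/<patient_id>/converted.csv, which is an assumption about the directory layout rather than something this example confirms, and read_converted_rows is a hypothetical helper, not a PyDicer function.

```python
import csv
from pathlib import Path


def read_converted_rows(data_dir: Path) -> list:
    """Collect the rows of every <patient>/converted.csv found under data_dir."""
    rows = []
    for csv_path in sorted(Path(data_dir).glob("*/converted.csv")):
        with open(csv_path, newline="") as f:
            for row in csv.DictReader(f):
                # Remember which patient folder the row came from.
                row["source_folder"] = csv_path.parent.name
                rows.append(row)
    return rows


# Hypothetical usage after conversion:
# rows = read_converted_rows(Path("./working/data"))
# print(len(rows), "converted objects")
```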

Load Converted DataFrame

Once data is converted, we can load a Pandas DataFrame which contains a description of each object converted.

The most useful columns in the DataFrame for working with this data in PyDicer are:

- hashed_uid: A 6-character hexadecimal hash of the associated DICOM SeriesInstanceUID. PyDicer refers to objects using this hashed identifier for a more concise representation.
- modality: The modality of the data object.
- patient_id: The ID of the patient this data object belongs to.
- path: The path within the working directory where files for this data object are stored.
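
To illustrate how a long DICOM UID can be shortened to a 6-character hexadecimal identifier, here is a minimal sketch. It assumes a SHA-256 digest truncated to its first six characters. The hash function PyDicer actually uses is not stated in this example, so the values produced here should not be expected to match the hashed_uid column. The function name short_hash and the UID are placeholders.

```python
import hashlib


def short_hash(uid: str, length: int = 6) -> str:
    """Return the first `length` hex characters of the SHA-256 digest of uid."""
    return hashlib.sha256(uid.encode("utf-8")).hexdigest()[:length]


example_uid = "1.2.3.4.5.6789"  # placeholder UID, not from the dataset
print(short_hash(example_uid))
```

Truncating to six characters keeps identifiers short, at the cost of a small chance of collisions in very large datasets.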

[7]:
df = pydicer.read_converted_data()
df
[7]:
sop_instance_uid hashed_uid modality patient_id series_uid for_uid referenced_sop_instance_uid path
0 1.3.6.1.4.1.14519.5.2.1.1706.8040.334001018535... c4ffd0 CT HNSCC-01-0176 1.3.6.1.4.1.14519.5.2.1.1706.8040.151938046710... 1.3.6.1.4.1.14519.5.2.1.1706.8040.120880328745... NaN working/data/HNSCC-01-0176/images/c4ffd0
1 1.3.6.1.4.1.14519.5.2.1.1706.8040.107072817915... 8e0da9 CT HNSCC-01-0176 1.3.6.1.4.1.14519.5.2.1.1706.8040.176143398282... 1.3.6.1.4.1.14519.5.2.1.1706.8040.216161306702... NaN working/data/HNSCC-01-0176/images/8e0da9
2 1.3.6.1.4.1.14519.5.2.1.1706.8040.133948865586... ec4aec CT HNSCC-01-0176 1.3.6.1.4.1.14519.5.2.1.1706.8040.192899726585... 1.3.6.1.4.1.14519.5.2.1.1706.8040.216161306702... NaN working/data/HNSCC-01-0176/images/ec4aec
3 1.3.6.1.4.1.14519.5.2.1.1706.8040.469610481459... 33c44a CT HNSCC-01-0176 1.3.6.1.4.1.14519.5.2.1.1706.8040.244362210503... 1.3.6.1.4.1.14519.5.2.1.1706.8040.310630617866... NaN working/data/HNSCC-01-0176/images/33c44a
4 1.3.6.1.4.1.14519.5.2.1.1706.8040.169033525924... 833a74 RTDOSE HNSCC-01-0176 1.3.6.1.4.1.14519.5.2.1.1706.8040.279793773343... 1.3.6.1.4.1.14519.5.2.1.1706.8040.706719210726... 1.3.6.1.4.1.14519.5.2.1.1706.8040.470253980284... working/data/HNSCC-01-0176/doses/833a74
5 1.3.6.1.4.1.14519.5.2.1.1706.8040.267291308489... bf3fba RTDOSE HNSCC-01-0176 1.3.6.1.4.1.14519.5.2.1.1706.8040.283706688235... 1.3.6.1.4.1.14519.5.2.1.1706.8040.566662631858... 1.3.6.1.4.1.14519.5.2.1.1706.8040.173917268454... working/data/HNSCC-01-0176/doses/bf3fba
6 1.3.6.1.4.1.14519.5.2.1.1706.8040.173917268454... 6f7db7 RTPLAN HNSCC-01-0176 1.3.6.1.4.1.14519.5.2.1.1706.8040.120111576192... 1.3.6.1.4.1.14519.5.2.1.1706.8040.566662631858... 1.3.6.1.4.1.14519.5.2.1.1706.8040.323156708629... working/data/HNSCC-01-0176/plans/6f7db7
7 1.3.6.1.4.1.14519.5.2.1.1706.8040.470253980284... a6b346 RTPLAN HNSCC-01-0176 1.3.6.1.4.1.14519.5.2.1.1706.8040.318927873561... 1.3.6.1.4.1.14519.5.2.1.1706.8040.706719210726... 1.3.6.1.4.1.14519.5.2.1.1706.8040.403955456521... working/data/HNSCC-01-0176/plans/a6b346
8 1.3.6.1.4.1.14519.5.2.1.1706.8040.403955456521... cbbf5b RTSTRUCT HNSCC-01-0176 1.3.6.1.4.1.14519.5.2.1.1706.8040.276897558084... 1.3.6.1.4.1.14519.5.2.1.1706.8040.120880328745... 1.3.6.1.4.1.14519.5.2.1.1706.8040.334001018535... working/data/HNSCC-01-0176/structures/cbbf5b
9 1.3.6.1.4.1.14519.5.2.1.1706.8040.323156708629... 6d2934 RTSTRUCT HNSCC-01-0176 1.3.6.1.4.1.14519.5.2.1.1706.8040.495627765798... 1.3.6.1.4.1.14519.5.2.1.1706.8040.310630617866... 1.3.6.1.4.1.14519.5.2.1.1706.8040.469610481459... working/data/HNSCC-01-0176/structures/6d2934
10 1.3.6.1.4.1.14519.5.2.1.1706.8040.240263316258... 72b0f9 CT HNSCC-01-0199 1.3.6.1.4.1.14519.5.2.1.1706.8040.261759476368... 1.3.6.1.4.1.14519.5.2.1.1706.8040.870916135819... NaN working/data/HNSCC-01-0199/images/72b0f9
11 1.3.6.1.4.1.14519.5.2.1.1706.8040.264264397186... c16e76 RTDOSE HNSCC-01-0199 1.3.6.1.4.1.14519.5.2.1.1706.8040.233527028792... 1.3.6.1.4.1.14519.5.2.1.1706.8040.870916135819... 1.3.6.1.4.1.14519.5.2.1.1706.8040.287865632112... working/data/HNSCC-01-0199/doses/c16e76
12 1.3.6.1.4.1.14519.5.2.1.1706.8040.287865632112... 664e96 RTPLAN HNSCC-01-0199 1.3.6.1.4.1.14519.5.2.1.1706.8040.137463901488... 1.3.6.1.4.1.14519.5.2.1.1706.8040.870916135819... 1.3.6.1.4.1.14519.5.2.1.1706.8040.166429645421... working/data/HNSCC-01-0199/plans/664e96
13 1.3.6.1.4.1.14519.5.2.1.1706.8040.166429645421... 06e49c RTSTRUCT HNSCC-01-0199 1.3.6.1.4.1.14519.5.2.1.1706.8040.243934637013... 1.3.6.1.4.1.14519.5.2.1.1706.8040.870916135819... 1.3.6.1.4.1.14519.5.2.1.1706.8040.240263316258... working/data/HNSCC-01-0199/structures/06e49c
14 1.3.6.1.4.1.14519.5.2.1.1706.8040.418136430763... b281ea CT HNSCC-01-0019 1.3.6.1.4.1.14519.5.2.1.1706.8040.233510441938... 1.3.6.1.4.1.14519.5.2.1.1706.8040.290727775603... NaN working/data/HNSCC-01-0019/images/b281ea
15 1.3.6.1.4.1.14519.5.2.1.1706.8040.242809596262... 309e1a RTDOSE HNSCC-01-0019 1.3.6.1.4.1.14519.5.2.1.1706.8040.777975715563... 1.3.6.1.4.1.14519.5.2.1.1706.8040.290727775603... 1.3.6.1.4.1.14519.5.2.1.1706.8040.254865609982... working/data/HNSCC-01-0019/doses/309e1a
16 1.3.6.1.4.1.14519.5.2.1.1706.8040.254865609982... 57b99f RTPLAN HNSCC-01-0019 1.3.6.1.4.1.14519.5.2.1.1706.8040.202542618630... 1.3.6.1.4.1.14519.5.2.1.1706.8040.290727775603... 1.3.6.1.4.1.14519.5.2.1.1706.8040.168221415040... working/data/HNSCC-01-0019/plans/57b99f
17 1.3.6.1.4.1.14519.5.2.1.1706.8040.168221415040... 7cdcd9 RTSTRUCT HNSCC-01-0019 1.3.6.1.4.1.14519.5.2.1.1706.8040.103450757970... 1.3.6.1.4.1.14519.5.2.1.1706.8040.290727775603... 1.3.6.1.4.1.14519.5.2.1.1706.8040.418136430763... working/data/HNSCC-01-0019/structures/7cdcd9
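
Because the converted index is a Pandas DataFrame, standard filtering and merging apply. The sketch below selects objects by modality and links each RTSTRUCT to the image it references by matching referenced_sop_instance_uid against sop_instance_uid. The hashed_uid, modality, patient_id and path values are copied from the table above, while the SOP instance UIDs are shortened placeholders because the real ones are truncated in the output.

```python
import pandas as pd

# Illustrative subset of the converted index; SOP instance UIDs are placeholders.
df_example = pd.DataFrame(
    {
        "sop_instance_uid": ["uid-ct-1", "uid-rs-1", "uid-ct-2", "uid-rs-2"],
        "hashed_uid": ["c4ffd0", "cbbf5b", "72b0f9", "06e49c"],
        "modality": ["CT", "RTSTRUCT", "CT", "RTSTRUCT"],
        "patient_id": [
            "HNSCC-01-0176",
            "HNSCC-01-0176",
            "HNSCC-01-0199",
            "HNSCC-01-0199",
        ],
        "referenced_sop_instance_uid": [None, "uid-ct-1", None, "uid-ct-2"],
        "path": [
            "working/data/HNSCC-01-0176/images/c4ffd0",
            "working/data/HNSCC-01-0176/structures/cbbf5b",
            "working/data/HNSCC-01-0199/images/72b0f9",
            "working/data/HNSCC-01-0199/structures/06e49c",
        ],
    }
)

# All structure sets in the index.
df_structs = df_example[df_example["modality"] == "RTSTRUCT"]

# Pair each structure set with the image it references.
df_linked = df_structs.merge(
    df_example[["sop_instance_uid", "hashed_uid", "path"]],
    left_on="referenced_sop_instance_uid",
    right_on="sop_instance_uid",
    suffixes=("_struct", "_image"),
)
print(df_linked[["patient_id", "hashed_uid_struct", "hashed_uid_image", "path_image"]])
```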

Data Quarantine

If anything goes wrong with a DICOM object during either the preprocess step or the conversion step, the problematic DICOM data will be copied to the working/quarantine directory.

It’s a good idea to regularly check your quarantine directory to ensure that no critical data objects are being quarantined. If they are, you may want to consider rectifying the issue and running the preprocess and conversion steps again.

As can be seen by running the cell below, several DICOM objects were placed in quarantine for our test dataset. This was due to there being multiple slices at the same location with differing pixel data in one CT image series.

[8]:
df_quarantine = pydicer.read_quarantined_data()
df_quarantine
[8]:
file error quarantine_dttm PatientID Modality SOPInstanceUID SeriesDescription
0 working/dicom/HNSCC/HNSCC-01-0176/03-05-2004-N... 2 slices at location 0.0 containing different ... 2025-03-13 20:34:21.029463 HNSCC-01-0176 CT 1.3.6.1.4.1.14519.5.2.1.1706.8040.181695106907... SCOUT/NECK-ORAL/NASO W/CON
1 working/dicom/HNSCC/HNSCC-01-0176/03-05-2004-N... 2 slices at location 0.0 containing different ... 2025-03-13 20:34:21.034055 HNSCC-01-0176 CT 1.3.6.1.4.1.14519.5.2.1.1706.8040.258957568007... SCOUT/NECK-ORAL/NASO W/CON
2 working/dicom/HNSCC/HNSCC-01-0176/03-05-2004-N... 2 slices at location -155.0 containing differe... 2025-03-13 20:34:21.048307 HNSCC-01-0176 CT 1.3.6.1.4.1.14519.5.2.1.1706.8040.308207714344... BONE
3 working/dicom/HNSCC/HNSCC-01-0176/03-05-2004-N... 2 slices at location -155.0 containing differe... 2025-03-13 20:34:21.053672 HNSCC-01-0176 CT 1.3.6.1.4.1.14519.5.2.1.1706.8040.189167578552... BONE
4 working/dicom/HNSCC/HNSCC-01-0176/03-05-2004-N... 2 slices at location -155.0 containing differe... 2025-03-13 20:34:21.058698 HNSCC-01-0176 CT 1.3.6.1.4.1.14519.5.2.1.1706.8040.146032766668... BONE
... ... ... ... ... ... ... ...
607 working/dicom/HNSCC/HNSCC-01-0176/03-05-2004-N... 2 slices at location -155.0 containing differe... 2025-03-13 20:34:25.587872 HNSCC-01-0176 CT 1.3.6.1.4.1.14519.5.2.1.1706.8040.190466192108... CONTRAST120CC@3CC/S,90S DELAY
608 working/dicom/HNSCC/HNSCC-01-0176/03-05-2004-N... 2 slices at location -155.0 containing differe... 2025-03-13 20:34:25.597725 HNSCC-01-0176 CT 1.3.6.1.4.1.14519.5.2.1.1706.8040.209452648754... CONTRAST120CC@3CC/S,90S DELAY
609 working/dicom/HNSCC/HNSCC-01-0176/03-05-2004-N... 2 slices at location -155.0 containing differe... 2025-03-13 20:34:25.607680 HNSCC-01-0176 CT 1.3.6.1.4.1.14519.5.2.1.1706.8040.174557835738... CONTRAST120CC@3CC/S,90S DELAY
610 working/dicom/HNSCC/HNSCC-01-0176/03-05-2004-N... 2 slices at location -155.0 containing differe... 2025-03-13 20:34:25.617611 HNSCC-01-0176 CT 1.3.6.1.4.1.14519.5.2.1.1706.8040.113716820433... CONTRAST120CC@3CC/S,90S DELAY
611 working/dicom/HNSCC/HNSCC-01-0176/03-05-2004-N... 2 slices at location -155.0 containing differe... 2025-03-13 20:34:25.627458 HNSCC-01-0176 CT 1.3.6.1.4.1.14519.5.2.1.1706.8040.141027169157... CONTRAST120CC@3CC/S,90S DELAY

612 rows × 7 columns
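
A long quarantine table is easier to review once it is summarised. The sketch below counts quarantined files per patient and series description and lists the distinct error messages. The DataFrame is a hand-made stand-in using the column names shown above; the error strings are shortened placeholders because the real messages are truncated in the output.

```python
import pandas as pd

# Illustrative stand-in for df_quarantine; error text is a placeholder.
df_example = pd.DataFrame(
    {
        "PatientID": ["HNSCC-01-0176"] * 5,
        "Modality": ["CT"] * 5,
        "SeriesDescription": [
            "SCOUT/NECK-ORAL/NASO W/CON",
            "SCOUT/NECK-ORAL/NASO W/CON",
            "BONE",
            "BONE",
            "BONE",
        ],
        "error": ["duplicate slice location"] * 5,
    }
)

# Count quarantined files per patient and series description.
summary = (
    df_example.groupby(["PatientID", "SeriesDescription"])
    .size()
    .sort_values(ascending=False)
)
print(summary)

# Distinct error messages, to see how many kinds of problem occurred.
print(df_example["error"].unique())
```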
