Preprocessing

class pydicer.preprocess.data.PreprocessData(working_directory)

Class for preprocessing the data information into a dictionary that holds the data in a structured hierarchy.

Parameters:

working_directory (Path) – The pydicer working directory

preprocess(input_directory: Union[Path, list], force: bool = True) → DataFrame

Preprocess information about the DICOM data located in one or more input directories.

Parameters:
  • input_directory (Path|list) – The directory (or list of directories) containing the DICOM input data

  • force (bool, optional) – When True, all files will be preprocessed. Otherwise only files not already scanned previously will be preprocessed. Defaults to True.

Returns: res_dict (pd.DataFrame) – A DataFrame containing one row for each DICOM file that was preprocessed, with the following columns:
  • patient_id: PatientID field from the DICOM header

  • study_uid: StudyInstanceUID field from the DICOM header

  • series_uid: SeriesInstanceUID field from the DICOM header

  • modality: Modality field from the DICOM header

  • sop_class_uid: SOPClassUID field from the DICOM header

  • sop_instance_uid: SOPInstanceUID field from the DICOM header

  • for_uid: FrameOfReferenceUID field from the DICOM header

  • file_path: The path to the file (as a pathlib.Path object)

  • slice_location: The real-world location of the slice (used for imaging modalities)

  • referenced_uid: For RTSTRUCT and RTDOSE files, the SeriesUID referenced by this DICOM file. For RTPLAN files, the SOPInstanceUID of the referenced structure set.

  • referenced_for_uid: The ReferencedFrameOfReferenceUID referenced by this DICOM file
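
The following is a minimal usage sketch rather than an excerpt from pydicer. It assumes pydicer is installed and that the working and input directories exist. `run_preprocess` and `files_by_series` are hypothetical helper names introduced here for illustration. `files_by_series` works on plain row records (for example, the result of pandas' `DataFrame.to_dict("records")`) and relies only on the `series_uid`, `slice_location` and `file_path` columns listed above. How missing slice locations are represented is an assumption, so both None and NaN are handled.

```python
from collections import defaultdict
from pathlib import Path


def run_preprocess(working_directory, input_directory, force=True):
    """Call PreprocessData.preprocess as documented above (requires pydicer)."""
    # Imported lazily so the helper below can be used without pydicer installed.
    from pydicer.preprocess.data import PreprocessData

    preprocessor = PreprocessData(Path(working_directory))
    return preprocessor.preprocess(input_directory, force=force)


def _slice_sort_key(row):
    """Sort rows by slice_location, placing rows without a location last."""
    location = row.get("slice_location")
    # Assumption: a missing location is either None or NaN (NaN != NaN).
    missing = location is None or location != location
    return (missing, 0.0 if missing else float(location))


def files_by_series(rows):
    """Group file paths by series_uid, ordered by slice_location.

    `rows` is an iterable of dict-like records with the documented columns,
    e.g. df.to_dict("records") on the DataFrame returned by preprocess.
    """
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["series_uid"]].append(row)

    return {
        series_uid: [r["file_path"] for r in sorted(series_rows, key=_slice_sort_key)]
        for series_uid, series_rows in grouped.items()
    }
```

With pydicer available, the two helpers could be combined as `files_by_series(run_preprocess("./work", Path("./dicom")).to_dict("records"))`, where both directory names are placeholders.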

scan_file(file: Union[str, Path]) → dict

Scan a DICOM file.

Parameters:

file (pathlib.Path|str) – The path to the file to scan.

Returns:

A dict containing the scanned information, or None if the file couldn’t be scanned.

Return type:

dict