Analyse

class pydicer.analyse.data.AnalyseData(working_directory='.')

Class that performs common analysis on converted data

Parameters:

working_directory (str|pathlib.Path, optional) – Directory in which data is stored. Defaults to “.”.

compute_dose_metrics(dataset_name: str = 'data', patient: Optional[str] = None, df_process: Optional[DataFrame] = None, d_point: Optional[Union[list, float, int]] = None, v_point: Optional[Union[list, float, int]] = None, d_cc_point: Optional[Union[list, float, int]] = None, structure_mapping_id: str = 'default') → DataFrame

Compute Dose metrics from a DVH

Parameters:
  • dataset_name (str, optional) – The name of the dataset from which to extract dose metrics. Defaults to “data”.

  • patient (list|str, optional) – A patient ID (or list of patient IDs) to compute dose metrics for. Must be None if df_process is provided. Defaults to None.

  • df_process (pd.DataFrame, optional) – A DataFrame of the objects to compute dose metrics for. Must be None if patient is provided. Defaults to None.

  • d_point (float|int|list, optional) – The point or list of points at which to compute the D metric. E.g. to compute D50, D95 and D99, supply [50, 95, 99]. Defaults to None.

  • v_point (float|int|list, optional) – The point or list of points at which to compute the V metric. E.g. to compute V5, V10 and V50, supply [5, 10, 50]. Defaults to None.

  • d_cc_point (float|int|list, optional) – The point or list of points at which to compute the Dcc metric. E.g. to compute Dcc5, Dcc10 and Dcc50, supply [5, 10, 50]. Defaults to None.

  • structure_mapping_id (str, optional) – ID of a structure mapping to use. Structure names will be replaced with this mapping if it is found; structures not in the mapping will be excluded. Defaults to ‘default’.

Raises:
  • ValueError – One of d_point, v_point or d_cc_point should be set

  • ValueError – Points must be of type float or int

Returns:

The DataFrame containing the requested metrics.

Return type:

pd.DataFrame
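To make the D and V points concrete, the following is a minimal sketch of how such metrics are conventionally read off a cumulative DVH. It is not PyDicer's implementation: the helper names (as_point_list, d_metric, v_metric) are hypothetical, volumes are assumed to be expressed in percent, and no interpolation between bins is performed.

```python
def as_point_list(points):
    """Accept a single number or a list of numbers and return a list of floats."""
    if points is None:
        return []
    if isinstance(points, (int, float)) and not isinstance(points, bool):
        points = [points]
    if not isinstance(points, list) or not all(
        isinstance(p, (int, float)) and not isinstance(p, bool) for p in points
    ):
        # Mirrors the documented error for non-numeric points.
        raise ValueError("Points must be of type float or int")
    return [float(p) for p in points]


def d_metric(doses, volumes_pct, x):
    """Dose received by at least x% of the volume.

    doses: ascending bin doses of a cumulative DVH.
    volumes_pct: non-increasing volume (in %) receiving at least each dose.
    """
    best = doses[0]
    for dose, volume in zip(doses, volumes_pct):
        if volume >= x:
            best = dose  # highest dose that still covers x% of the volume
    return best


def v_metric(doses, volumes_pct, x):
    """Volume (in %) at the first DVH bin whose dose is at least x."""
    for dose, volume in zip(doses, volumes_pct):
        if dose >= x:
            return volume
    return 0.0  # x lies beyond the last bin, so no volume receives it


# Hypothetical cumulative DVH used only for illustration.
doses = [0, 10, 20, 30, 40, 50]
volumes_pct = [100, 100, 95, 60, 20, 0]
d_values = {f"D{int(p)}": d_metric(doses, volumes_pct, p) for p in as_point_list([50, 95])}
v_values = {f"V{int(p)}": v_metric(doses, volumes_pct, p) for p in as_point_list(30)}
```

A single number and a list are both accepted, matching the float|int|list types documented for d_point, v_point and d_cc_point.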

compute_dvh(dataset_name='data', patient=None, df_process=None, force=True, bin_width=0.1, structure_meta_data_cols=None, dose_meta_data_cols=None)

Compute the Dose Volume Histogram (DVH) for dose volumes and linked structures.

Parameters:
  • dataset_name (str, optional) – The name of the dataset to compute DVHs on. Defaults to “data” (runs on all data).

  • patient (list|str, optional) – A patient ID (or list of patient IDs) to compute DVH for. Must be None if df_process is provided. Defaults to None.

  • df_process (pd.DataFrame, optional) – A DataFrame of the objects to compute DVHs for. Must be None if patient is provided. Defaults to None.

  • force (bool, optional) – When True, DVHs will be recomputed even if the output file already exists. Defaults to True.

  • bin_width (float, optional) – The bin width of the Dose Volume Histogram. Defaults to 0.1.

  • structure_meta_data_cols (list, optional) – A list of DICOM tags which will be extracted from the structure DICOM headers and included in the resulting table of DVHs. Defaults to None.

  • dose_meta_data_cols (list, optional) – A list of DICOM tags which will be extracted from the Dose DICOM headers and included in the resulting table of DVHs. Defaults to None.

Raises:

ValueError – Raised if patient is not None, a list of strings or a string.
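As an illustration of what bin_width controls, here is a minimal sketch of a cumulative DVH built from the dose values sampled inside one structure. It assumes every voxel has the same volume and is not the routine PyDicer uses internally; cumulative_dvh is a hypothetical helper.

```python
import math


def cumulative_dvh(dose_values, bin_width=0.1):
    """Return (bin_doses, volume_pct) for the voxel doses inside one structure.

    volume_pct[i] is the percentage of voxels receiving at least bin_doses[i].
    """
    if not dose_values:
        raise ValueError("dose_values must not be empty")
    # One extra bin past the maximum dose so the curve ends at 0%.
    n_bins = int(math.floor(max(dose_values) / bin_width)) + 2
    bin_doses = [i * bin_width for i in range(n_bins)]
    total = len(dose_values)
    volume_pct = [
        100.0 * sum(1 for d in dose_values if d >= dose) / total
        for dose in bin_doses
    ]
    return bin_doses, volume_pct


# Four hypothetical voxel doses, binned at 1.0 dose units for readability.
bins, volumes = cumulative_dvh([1.0, 2.0, 2.0, 4.0], bin_width=1.0)
```

A smaller bin_width gives a finer curve at the cost of more rows per structure in the resulting table.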

compute_radiomics(dataset_name='data', patient=None, df_process=None, force=True, radiomics=None, settings=None, structure_match_regex=None, structure_meta_data=None, image_meta_data=None, resample_to_image=False, custom_radiomics=None)

Compute radiomics for the data in the working directory. Results are saved as csv files in the structure directories processed.

Parameters:
  • dataset_name (str, optional) – The name of the dataset to compute radiomics on. Defaults to “data” (runs on all data).

  • patient (list|str, optional) – A patient ID (or list of patient IDs) to compute radiomics for. Must be None if df_process is provided. Defaults to None.

  • df_process (pd.DataFrame, optional) – A DataFrame of the objects to compute radiomics for. Must be None if patient is provided. Defaults to None.

  • force (bool, optional) – When True, radiomics will be recomputed even if the output file already exists. Defaults to True.

  • radiomics (dict, optional) – A dictionary of the pyradiomics features to compute. Each key is a radiomic feature class name and each value is a list of feature names in that class. See https://pyradiomics.readthedocs.io/en/latest/features.html for more information. Defaults to all first order features.

  • settings (dict, optional) – Settings to pass to pyradiomics. Defaults to PYRAD_DEFAULT_SETTINGS.

  • structure_match_regex (str, optional) – Regular expression to select structures to compute radiomics for. Defaults to None.

  • structure_meta_data (list, optional) – A list of DICOM tags which will be extracted from the structure DICOM headers and included in the resulting table of radiomics. Defaults to None.

  • image_meta_data (list, optional) – A list of DICOM tags which will be extracted from the image DICOM headers and included in the resulting table of radiomics. Defaults to None.

  • resample_to_image (bool, optional) – When True, the mask is resampled to the image. When False, the image is resampled to the mask. Defaults to False.

Raises:

ValueError – Raised if patient is not None, a list of strings or a string.
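The shape of the radiomics dictionary and the effect of structure_match_regex can be sketched as follows. The feature names are taken from the pyradiomics feature list. The select_structures helper is hypothetical, and whether PyDicer applies the expression with re.search or re.match is an assumption here.

```python
import re

# Feature class name -> list of feature names, as described for `radiomics`.
radiomics = {
    "firstorder": ["Mean", "Median", "Energy"],
    "shape": ["VoxelVolume", "SurfaceArea"],
}


def select_structures(structure_names, structure_match_regex=None):
    """Keep only the structure names matching the regular expression.

    Assumption: a match anywhere in the name counts (re.search semantics).
    """
    if structure_match_regex is None:
        return list(structure_names)  # no filter supplied, keep everything
    pattern = re.compile(structure_match_regex)
    return [name for name in structure_names if pattern.search(name)]


# Hypothetical structure names; only the target volumes are selected.
selected = select_structures(["GTV", "PTV_60", "Heart"], r"^(GTV|PTV)")
```

Anchoring the expression with ^ and $ keeps the selection explicit regardless of which matching function is applied.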

get_all_computed_radiomics_for_dataset(dataset_name='data', patient=None, structure_mapping_id='default')

Return a DataFrame of radiomics computed for this dataset

Parameters:
  • dataset_name (str, optional) – The name of the dataset on which to run analysis. Defaults to “data”.

  • patient (list|str, optional) – A patient ID (or list of patient IDs) to fetch radiomics for. Defaults to None.

  • structure_mapping_id (str, optional) – ID of a structure mapping to use. Structure names will be replaced with this mapping if it is found; structures not in the mapping will be excluded. Defaults to ‘default’.

Returns:

The DataFrame of all radiomics computed for dataset

Return type:

pd.DataFrame
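The effect of structure_mapping_id described above (names replaced, unmapped structures excluded) can be sketched in plain Python. The mapping layout used here, a standard name pointing to a list of variations, and the exact-match lookup are assumptions for illustration only; apply_structure_mapping is a hypothetical helper, not part of PyDicer.

```python
def apply_structure_mapping(structure_names, mapping):
    """Map raw structure names to standard names, dropping unmapped ones.

    mapping: standard name -> list of accepted variations (hypothetical layout).
    Returns a dict of raw name -> standard name.
    """
    lookup = {}
    for standard, variations in mapping.items():
        lookup[standard] = standard  # the standard name maps to itself
        for variation in variations:
            lookup[variation] = standard
    mapped = {}
    for name in structure_names:
        if name in lookup:  # structures not in the mapping are excluded
            mapped[name] = lookup[name]
    return mapped


# Hypothetical mapping and structure names.
mapping = {"Heart": ["heart", "HEART"], "Lung_L": ["Left Lung"]}
result = apply_structure_mapping(["HEART", "Left Lung", "Couch"], mapping)
```

In this sketch "Couch" is dropped because it does not appear in the mapping, which is the exclusion behaviour the parameter description refers to.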

get_all_dvhs_for_dataset(dataset_name='data', patient=None, df_process=None, structure_mapping_id='default')

Return a DataFrame of DVHs computed for this dataset

Parameters:
  • dataset_name (str, optional) – The name of the dataset on which to run analysis. Defaults to “data”.

  • patient (list|str, optional) – A patient ID (or list of patient IDs) to fetch DVHs for. Defaults to None.

  • df_process (pd.DataFrame, optional) – A DataFrame of the objects to fetch DVHs for. Must be None if patient is provided. Defaults to None.

  • structure_mapping_id (str, optional) – ID of a structure mapping to use. Structure names will be replaced with this mapping if it is found; structures not in the mapping will be excluded. Defaults to ‘default’.

Returns:

The DataFrame of all DVHs computed for dataset

Return type:

pd.DataFrame
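Taken together, the AnalyseData methods above could be chained as in the sketch below. It uses only the signatures documented in this section. The summarise_dose wrapper is hypothetical, and the assumption that DVHs must be computed before dose metrics are requested follows from the description "Compute Dose metrics from a DVH" rather than from a documented requirement.

```python
def summarise_dose(working_directory=".", patient=None):
    """Compute DVHs, fetch them, then derive dose metrics (requires pydicer)."""
    # Imported lazily so this snippet can be loaded without pydicer installed.
    from pydicer.analyse.data import AnalyseData

    analyse = AnalyseData(working_directory)

    # force=False reuses DVH output files that already exist.
    analyse.compute_dvh(patient=patient, force=False, bin_width=0.1)

    # Fetch every DVH computed for the default "data" dataset.
    df_dvh = analyse.get_all_dvhs_for_dataset(patient=patient)

    # D50, D95 and D99 plus V5, V10 and V50, as in the parameter examples.
    df_metrics = analyse.compute_dose_metrics(
        patient=patient, d_point=[50, 95, 99], v_point=[5, 10, 50]
    )
    return df_dvh, df_metrics
```

Calling summarise_dose("path/to/project") on a working directory that holds converted data would return the two DataFrames; the path shown is a placeholder.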

pydicer.analyse.compare.compute_contour_similarity_metrics(df_target: DataFrame, df_reference: DataFrame, segment_id: str, mapping_id: str = 'default', compute_metrics: Optional[list] = None, force: bool = False)

Computes structure similarity metrics between corresponding entries in a target DataFrame and a reference DataFrame. Targets are matched to references using the referenced_sop_instance_uid, which identifies the image to which these structure sets are attached.

Parameters:
  • df_target (pd.DataFrame) – DataFrame containing structure set rows to use as target for similarity metric computation.

  • df_reference (pd.DataFrame) – DataFrame containing structure set rows to use as reference for similarity metric computation. Each row in the reference will be matched to the target rows which share the same referenced_sop_instance_uid (the image to which they are attached).

  • segment_id (str) – ID to reference the segmentation for which these metrics are computed.

  • mapping_id (str, optional) – The mapping ID to use for structure name mapping. Defaults to DEFAULT_MAPPING_ID.

  • compute_metrics (list, optional) – The list of similarity metrics to compute. Defaults to [“DSC”, “hausdorffDistance”, “meanSurfaceDistance”, “surfaceDSC”].

  • force (bool, optional) – If True, metrics will be recomputed even if they have been previously computed. Defaults to False.
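Two ideas from this function can be illustrated in plain Python: pairing target and reference rows through referenced_sop_instance_uid, and the Dice similarity coefficient (DSC) itself. This is a sketch on hypothetical data, not PyDicer's implementation. The row dictionaries, their "id" key and the voxel sets are assumptions made for the example.

```python
def pair_by_image(target_rows, reference_rows):
    """Pair target and reference rows attached to the same image."""
    references = {}
    for row in reference_rows:
        references.setdefault(row["referenced_sop_instance_uid"], []).append(row)
    pairs = []
    for target in target_rows:
        uid = target["referenced_sop_instance_uid"]
        for reference in references.get(uid, []):
            pairs.append((target, reference))
    return pairs


def dice_similarity(target_voxels, reference_voxels):
    """DSC = 2 * |A intersect B| / (|A| + |B|) for two sets of voxel indices."""
    target, reference = set(target_voxels), set(reference_voxels)
    if not target and not reference:
        return 1.0  # convention chosen for this sketch when both are empty
    return 2.0 * len(target & reference) / (len(target) + len(reference))


# Hypothetical rows: only the structure sets attached to image "1.2.3" pair up.
targets = [
    {"id": "auto_a", "referenced_sop_instance_uid": "1.2.3"},
    {"id": "auto_b", "referenced_sop_instance_uid": "9.9.9"},
]
references = [{"id": "manual_a", "referenced_sop_instance_uid": "1.2.3"}]
pairs = pair_by_image(targets, references)

# Two of three voxels overlap, so DSC = 2 * 2 / (3 + 3).
dsc = dice_similarity({(0, 0), (0, 1), (1, 1)}, {(0, 1), (1, 1), (2, 2)})
```

A DSC of 1.0 means identical contours and 0.0 means no overlap; the distance-based metrics in the default list are not covered by this sketch.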

pydicer.analyse.compare.get_all_similarity_metrics_for_dataset(working_directory, dataset_name='data', patient=None, segment_id=None, structure_mapping_id='default')

Return a DataFrame of similarity metrics computed for this dataset.

Parameters:
  • working_directory (str|pathlib.Path) – The working directory of the PyDicer project.

  • dataset_name (str, optional) – The name of the dataset for which to extract metrics. Defaults to CONVERTED_DIR_NAME.

  • patient (list|str, optional) – A patient ID (or list of patient IDs) to fetch metrics for. Defaults to None.

  • segment_id (str, optional) – Only extract similarity metrics for this segment ID. If none is supplied, all similarity metrics will be fetched. Defaults to None.

  • structure_mapping_id (str, optional) – ID of a structure mapping to load computed metrics for. Defaults to DEFAULT_MAPPING_ID.

Returns:

The DataFrame of all similarity metrics computed for dataset

Return type:

pd.DataFrame

pydicer.analyse.compare.prepare_similarity_metric_analysis(working_directory: Union[str, Path], analysis_output_directory: Optional[Union[str, Path]] = None, df: Optional[DataFrame] = None, dataset_name: str = 'data', patient: Optional[Union[str, list]] = None, segment_id: Optional[str] = None, structure_mapping_id: str = 'default')

Prepares the similarity metric analysis and stores raw metrics and statistics as .csv files within the analysis_output_directory. Plots and statistics are also saved as .png files within this directory for inspection.

Parameters:
  • working_directory (Union[str, Path]) – The working directory of the PyDicer project.

  • analysis_output_directory (Union[str, Path], optional) – The directory in which to store the output. If none is provided analysis will be generated in a directory named similarity_analysis within the working_directory. Defaults to None.

  • df (pd.DataFrame, optional) – A DataFrame generated using the get_all_similarity_metrics_for_dataset function. This might be useful if you wish to further filter the DataFrame prior to generating analysis. If none is provided the get_all_similarity_metrics_for_dataset will be used to generate the DataFrame. Defaults to None.

  • dataset_name (str, optional) – The name of the dataset to analyse similarity metrics for. Defaults to CONVERTED_DIR_NAME.

  • patient (Union[str, list], optional) – The patients to analyse similarity metrics for. Defaults to None.

  • segment_id (str, optional) – The segment ID to analyse similarity metrics for. Defaults to None.

  • structure_mapping_id (str, optional) – ID of a structure mapping to load computed metrics for. Defaults to DEFAULT_MAPPING_ID.
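The two compare functions can be combined as sketched below, using only the signatures documented above. The resolve_output_directory helper is hypothetical and simply restates the documented default location (a similarity_analysis directory inside the working directory). The analyse_segment wrapper is likewise an illustration, not part of PyDicer.

```python
from pathlib import Path


def resolve_output_directory(working_directory, analysis_output_directory=None):
    """Return where the analysis output is expected, per the documented default."""
    if analysis_output_directory is None:
        return Path(working_directory) / "similarity_analysis"
    return Path(analysis_output_directory)


def analyse_segment(working_directory, segment_id):
    """Fetch metrics for one segment ID and generate the analysis (requires pydicer)."""
    # Imported lazily so this snippet can be loaded without pydicer installed.
    from pydicer.analyse.compare import (
        get_all_similarity_metrics_for_dataset,
        prepare_similarity_metric_analysis,
    )

    df = get_all_similarity_metrics_for_dataset(
        working_directory, segment_id=segment_id
    )
    # The DataFrame could be filtered further here before generating the analysis.
    prepare_similarity_metric_analysis(
        working_directory, df=df, segment_id=segment_id
    )
    return resolve_output_directory(working_directory)
```

Passing df explicitly is useful when only a subset of the fetched rows should appear in the generated .csv and .png files.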