Dataset Preparation

Prepare Dataset

class pydicer.dataset.preparation.PrepareDataset(working_directory: Union[str, Path] = '.')

Class that provides functionality for preparation of subsets of data.

Parameters:

working_directory (str | pathlib.Path, optional) – Main working directory for pydicer. Defaults to “.”.

add_object_to_dataset(dataset_name: str, data_object_row: Series)

Add one data object to a dataset.

Parameters:
  • dataset_name (str) – The name of the dataset to add the object to.

  • data_object_row (pd.Series) – The DataFrame row of the converted object.

prepare(dataset_name: str, preparation_function: Callable, patients=None, **kwargs)

Calls upon an appropriate preparation function to generate a clean dataset ready for use. Additional keyword arguments are passed through to the preparation_function.

Parameters:
  • dataset_name (str) – The name of the dataset to generate.

  • preparation_function (function|str) – The function to use for preparation, or the name of a known preparation function.

  • patients (list) – The list of patient IDs to use for the dataset. If None, all patients are considered. Defaults to None.

Raises:

AttributeError – Raised if preparation_function is not a function or a string defining a known preparation function.
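Since preparation_function is typed as Callable, a custom function can presumably be supplied in place of a name such as "rt_latest_dose". The sketch below assumes such a function receives the converted DataFrame plus any extra keyword arguments and returns the filtered rows, mirroring the signatures of the functions in pydicer.dataset.functions. The function name ct_only, the modality column and the sample rows are assumptions made for illustration only.

```python
import pandas as pd


def ct_only(df: pd.DataFrame, **kwargs) -> pd.DataFrame:
    """Hypothetical preparation function: keep only CT image objects.

    Assumes the converted DataFrame has a "modality" column.
    """
    return df[df["modality"] == "CT"]


# Stand-in for the converted DataFrame (illustrative values only).
df_converted = pd.DataFrame(
    {
        "patient_id": ["PAT1", "PAT1", "PAT2"],
        "modality": ["CT", "RTSTRUCT", "CT"],
    }
)

df_ct = ct_only(df_converted)
print(df_ct)

# With a pydicer working directory in place, the call would look like:
# PrepareDataset(working_directory).prepare("ct_only", ct_only)
```

Passing the function object rather than a string keeps project-specific selection logic outside the library while still going through prepare.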

prepare_from_dataframe(dataset_name: str, df_prepare: DataFrame)

Prepare a dataset from a filtered DataFrame of converted data.

Parameters:
  • dataset_name (str) – The name of the dataset to generate.

  • df_prepare (pd.DataFrame) – Filtered Pandas DataFrame containing rows of converted data.
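The filtering step is ordinary pandas. A minimal sketch, assuming the converted DataFrame has patient_id and modality columns (assumed names; the rows below are made up), of building a df_prepare that keeps CT objects for two patients:

```python
import pandas as pd

# Stand-in for the converted DataFrame (illustrative values only).
df_converted = pd.DataFrame(
    {
        "patient_id": ["PAT1", "PAT1", "PAT2", "PAT3"],
        "modality": ["CT", "RTDOSE", "CT", "MR"],
    }
)

keep_patients = ["PAT1", "PAT2"]

# Combine both conditions into a single boolean mask.
mask = df_converted["patient_id"].isin(keep_patients) & (
    df_converted["modality"] == "CT"
)
df_prepare = df_converted[mask]

# df_prepare would then be handed over:
# PrepareDataset(working_directory).prepare_from_dataframe("ct_subset", df_prepare)
```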

Preparation Functions

pydicer.dataset.functions.rt_latest_dose(df: DataFrame, **kwargs) → DataFrame

Select the latest RTDOSE and the image, structure set and plan to which it is linked. You can specify keyword arguments to match on any top-level DICOM attribute. You may also supply a list of values for an attribute; a series is selected if its value matches any one of them.

Example of matching the latest dose with Series Description being “FINAL” or “APPROVED”:

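from pydicer.dataset.preparation import PrepareDataset

# working_directory is the path to the pydicer working directory
prepare_dataset = PrepareDataset(working_directory)
prepare_dataset.prepare(
    "clean",
    "rt_latest_dose",
    SeriesDescription=["FINAL", "APPROVED"]
)

Parameters: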

df (pd.DataFrame) – DataFrame of converted data objects available for the dataset.

Returns:

The filtered DataFrame containing only the selected objects.

Return type:

pd.DataFrame
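The matching rule described above (a scalar must equal the attribute, a list matches if any one entry equals it) can be sketched in plain Python. This illustrates the semantics only and is not pydicer's implementation; the helper name matches and the attribute dictionaries are hypothetical.

```python
from typing import Any, Mapping


def matches(attributes: Mapping[str, Any], **criteria: Any) -> bool:
    """Return True if every criterion is satisfied by the attributes.

    A scalar criterion must equal the attribute value; a list criterion is
    satisfied when the attribute value equals any one of its entries.
    """
    for name, wanted in criteria.items():
        allowed = wanted if isinstance(wanted, (list, tuple, set)) else [wanted]
        if attributes.get(name) not in allowed:
            return False
    return True


# Made-up top-level DICOM attributes for two dose objects.
final_dose = {"Modality": "RTDOSE", "SeriesDescription": "FINAL"}
draft_dose = {"Modality": "RTDOSE", "SeriesDescription": "DRAFT"}

print(matches(final_dose, SeriesDescription=["FINAL", "APPROVED"]))  # True
print(matches(draft_dose, SeriesDescription=["FINAL", "APPROVED"]))  # False
```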

pydicer.dataset.functions.rt_latest_struct(df: DataFrame, **kwargs) → DataFrame

Select the latest structure set and the image to which it is linked. You can specify keyword arguments to match on any top-level DICOM attribute. You may also supply a list of values for an attribute; a series is selected if its value matches any one of them.

Example of matching the latest structure set with Series Description being “FINAL” or “APPROVED”:

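from pydicer.dataset.preparation import PrepareDataset

# working_directory is the path to the pydicer working directory
prepare_dataset = PrepareDataset(working_directory)
prepare_dataset.prepare(
    "clean",
    "rt_latest_struct",
    SeriesDescription=["FINAL", "APPROVED"]
)

Parameters: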

df (pd.DataFrame) – DataFrame of converted data objects available for the dataset.

Returns:

The filtered DataFrame containing only the selected objects.

Return type:

pd.DataFrame

Structure Sets

class pydicer.dataset.structureset.StructureSet(structure_set_row, mapping_id='default')

get_mapped_structure_name(item: str) → str

Get the structure set specific name for a structure that may have been mapped.

Parameters:

item (str) – The standardised name to look up.

Returns:

The structure set specific name if it could be mapped (returns the original name otherwise).

Return type:

str

get_standardised_structure_name(item: str) → str

Get the standardised name for a structure that is present in this structure set.

Parameters:

item (str) – The name of the structure in this structure set.

Returns:

The standardised name if it could be mapped (returns the original name otherwise).

Return type:

str

get_unmapped_structures() → list

Get a list of structure names for which no matching structure was found using the mapping. If no mapping is being used, this will always be empty.

Returns:

Names of structures that cannot be found using the mapping.

Return type:

list
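The relationship between the three lookups above can be sketched with a plain dictionary. This assumes a mapping of the form standardised name → list of name variations, which is an assumption about the mapping format rather than something stated here; the helper names and sample data are hypothetical and this is not the StructureSet implementation.

```python
from typing import Dict, List

# Hypothetical mapping: standardised name -> name variations seen in practice.
mapping: Dict[str, List[str]] = {
    "Heart": ["heart", "HEART", "Hrt"],
    "Lung_L": ["Left Lung", "L Lung"],
    "Esophagus": ["oesophagus", "Eso"],
}

# Names actually present in one made-up structure set.
structure_names = ["Hrt", "Left Lung", "PTV"]


def standardised_name(item: str) -> str:
    """Structure set specific name -> standardised name (or unchanged)."""
    for standard, variations in mapping.items():
        if item == standard or item in variations:
            return standard
    return item


def mapped_name(item: str) -> str:
    """Standardised name -> name used in this structure set (or unchanged)."""
    for name in [item] + mapping.get(item, []):
        if name in structure_names:
            return name
    return item


def unmapped_structures() -> List[str]:
    """Standardised names with no matching structure in this structure set."""
    return [std for std in mapping if mapped_name(std) not in structure_names]


print(standardised_name("Hrt"))   # Heart
print(mapped_name("Lung_L"))      # Left Lung
print(unmapped_structures())      # ['Esophagus']
```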

items() → a set-like object providing a view on D's items

keys() → a set-like object providing a view on D's keys

values() → an object providing a view on D's values

pydicer.dataset.structureset.get_mapping_for_structure_set(structure_set_row: Series, mapping_id: str) → dict

Searches the folder hierarchy to find a structure name mapping file with the given ID.

Parameters:
  • structure_set_row (pd.Series) – The converted dataframe row entry for the structure set.

  • mapping_id (str) – The ID of the mapping to find.

Returns:

The structure name mapping

Return type:

dict
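A search through a folder hierarchy is typically done by walking from the most specific directory towards the root and taking the first match. The sketch below shows that pattern under assumed conventions: the file name "<mapping_id>.json", the folder layout and the helper find_mapping are all hypothetical and do not describe where pydicer actually stores its mappings.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def find_mapping(start_dir: Path, mapping_id: str, stop_dir: Path) -> Optional[dict]:
    """Walk from start_dir up to stop_dir and load the first mapping file found.

    The file name "<mapping_id>.json" is a hypothetical convention.
    """
    current = start_dir.resolve()
    stop_dir = stop_dir.resolve()
    while True:
        candidate = current / f"{mapping_id}.json"
        if candidate.is_file():
            return json.loads(candidate.read_text())
        # Stop at the chosen top directory or at the filesystem root.
        if current == stop_dir or current == current.parent:
            return None
        current = current.parent


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    struct_dir = root / "PAT1" / "structures" / "abc123"
    struct_dir.mkdir(parents=True)
    # Mapping stored at patient level, two folders above the structure set.
    (root / "PAT1" / "default.json").write_text(json.dumps({"Heart": ["Hrt"]}))

    found = find_mapping(struct_dir, "default", root)
    missing = find_mapping(struct_dir, "other", root)

print(found)
print(missing)
```

Searching upwards lets a mapping placed close to the structure set take precedence over a more general one stored higher in the tree.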