Dataset Preparation#
Prepare Dataset#
- class pydicer.dataset.preparation.PrepareDataset(working_directory: Union[str, Path] = '.')#
Class that provides functionality for preparation of subsets of data.
- Parameters:
working_directory (str|Path) – Main working directory for pydicer. Defaults to “.”.
- add_object_to_dataset(dataset_name: str, data_object_row: Series)#
Add one data object to a dataset.
- Parameters:
dataset_name (str) – The name of the dataset to add the object to.
data_object_row (pd.Series) – The DataFrame row of the converted object.
- prepare(dataset_name: str, preparation_function: Callable, patients=None, **kwargs)#
Calls upon an appropriate preparation function to generate a clean dataset ready for use. Additional keyword arguments are passed through to the preparation_function.
- Parameters:
dataset_name (str) – The name of the dataset to generate
preparation_function (function|str) – The function to use for preparation, or a string naming a known preparation function.
patients (list) – The list of patient IDs to use for the dataset. If None, all patients are considered. Defaults to None.
- Raises:
AttributeError – Raised if preparation_function is not a function or a string defining a known preparation function.
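The "function or string" contract described above can be sketched as a small dispatcher. This is only an illustration of the pattern, not pydicer's implementation: the `KNOWN_FUNCTIONS` registry, the `resolve_preparation_function` helper and the placeholder `rt_latest_dose` body are all hypothetical names introduced for this sketch.

```python
from typing import Callable, Dict, List, Union


def rt_latest_dose(records: List[dict], **kwargs) -> List[dict]:
    # Placeholder body for illustration only; the real function filters a DataFrame.
    return records


# Hypothetical registry standing in for the functions in pydicer.dataset.functions.
KNOWN_FUNCTIONS: Dict[str, Callable] = {"rt_latest_dose": rt_latest_dose}


def resolve_preparation_function(preparation_function: Union[str, Callable]) -> Callable:
    """Return a callable for either a known function name or a user-supplied function."""
    if isinstance(preparation_function, str):
        try:
            return KNOWN_FUNCTIONS[preparation_function]
        except KeyError:
            raise AttributeError(
                f"Unknown preparation function: {preparation_function}"
            ) from None
    if callable(preparation_function):
        return preparation_function
    raise AttributeError(
        "preparation_function must be a function or the name of a known function"
    )
```

Accepting either form lets callers use a built-in selection by name while still being able to pass their own filtering function.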
- prepare_from_dataframe(dataset_name: str, df_prepare: DataFrame)#
Prepare a dataset from a filtered DataFrame of converted data.
- Parameters:
dataset_name (str) – The name of the dataset to generate
df_prepare (pd.DataFrame) – Filtered Pandas DataFrame containing rows of converted data.
Preparation Functions#
- pydicer.dataset.functions.rt_latest_dose(df: DataFrame, **kwargs) → DataFrame #
Select the latest RTDOSE and the image, structure and plan to which it is linked. You can specify keyword arguments to match on any top-level DICOM attribute. You may also supply a list of values for an attribute, in which case a series is selected if it matches any one of them.
Example of matching the latest dose with a Series Description of “FINAL” or “APPROVED”:
from pydicer.dataset.preparation import PrepareDataset

# working_directory is the main pydicer working directory (str or Path)
prepare_dataset = PrepareDataset(working_directory)
prepare_dataset.prepare(
    "clean", "rt_latest_dose", SeriesDescription=["FINAL", "APPROVED"]
)
- Parameters:
df (pd.DataFrame) – DataFrame of converted data objects available for dataset
- Returns:
The filtered DataFrame containing only the objects to select
- Return type:
pd.DataFrame
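The keyword matching described above (a single value, or a list of which any one may match) can be illustrated with plain dictionaries. This is a sketch of the semantics only, not pydicer's code: `matches_filters` and `latest_matching` are hypothetical helpers, and using `SeriesDate` to decide which series is "latest" is an assumption made for the example.

```python
from typing import Any, Dict, List, Optional


def matches_filters(attributes: Dict[str, Any], **filters: Any) -> bool:
    """True if every filter matches; a list-valued filter matches any one of its values."""
    for name, wanted in filters.items():
        allowed = wanted if isinstance(wanted, (list, tuple, set)) else [wanted]
        if attributes.get(name) not in allowed:
            return False
    return True


def latest_matching(
    series: List[Dict[str, Any]], date_key: str = "SeriesDate", **filters: Any
) -> Optional[Dict[str, Any]]:
    """Return the most recent series that satisfies all filters, or None."""
    candidates = [s for s in series if matches_filters(s, **filters)]
    if not candidates:
        return None
    # DICOM dates are YYYYMMDD strings, so string comparison orders them correctly.
    return max(candidates, key=lambda s: s[date_key])


series = [
    {"SeriesDescription": "DRAFT", "SeriesDate": "20230301"},
    {"SeriesDescription": "FINAL", "SeriesDate": "20230101"},
    {"SeriesDescription": "APPROVED", "SeriesDate": "20230201"},
]
# The DRAFT series is newer but is excluded by the filter.
selected = latest_matching(series, SeriesDescription=["FINAL", "APPROVED"])
```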
- pydicer.dataset.functions.rt_latest_struct(df: DataFrame, **kwargs) → DataFrame #
Select the latest structure set and the image to which it is linked. You can specify keyword arguments to match on any top-level DICOM attribute. You may also supply a list of values for an attribute, in which case a series is selected if it matches any one of them.
Example of matching the latest structure set with a Series Description of “FINAL” or “APPROVED”:
from pydicer.dataset.preparation import PrepareDataset

# working_directory is the main pydicer working directory (str or Path)
prepare_dataset = PrepareDataset(working_directory)
prepare_dataset.prepare(
    "clean", "rt_latest_struct", SeriesDescription=["FINAL", "APPROVED"]
)
- Parameters:
df (pd.DataFrame) – DataFrame of converted data objects available for dataset
- Returns:
The filtered DataFrame containing only the objects to select
- Return type:
pd.DataFrame
Structure Sets#
- class pydicer.dataset.structureset.StructureSet(structure_set_row, mapping_id='default')#
- get_mapped_structure_name(item: str) → str #
Get the structure set specific name for a structure that may have been mapped.
- Parameters:
item (str) – The standardised name to look up.
- Returns:
The structure set specific name if it could be mapped (returns the original name otherwise).
- Return type:
str
- get_standardised_structure_name(item: str) → str #
Get the standardised name for a structure that is present in this structure set.
- Parameters:
item (str) – The name of the structure in this structure set.
- Returns:
The standardised name if it could be mapped (returns the original name otherwise).
- Return type:
str
- get_unmapped_structures() → list #
Get a list of structure names defined in the mapping for which no matching structure was found in this structure set. If no mapping is being used, this will always be empty.
- Returns:
Names of structures that can’t be found using a mapping
- Return type:
list
- items() → a set-like object providing a view on D's items #
- keys() → a set-like object providing a view on D's keys #
- values() → an object providing a view on D's values #
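The three name-lookup methods above can be pictured with a toy class. This sketch is not the StructureSet implementation: `NameMapping` is a hypothetical stand-in, and the mapping shape (standardised name mapped to a list of name variants) is an assumption made for the example.

```python
from typing import Dict, List


class NameMapping:
    """Toy stand-in for the name-lookup behaviour described above."""

    def __init__(self, structure_names: List[str], mapping: Dict[str, List[str]]):
        self.structure_names = structure_names  # names present in this structure set
        self.mapping = mapping  # assumed shape: standardised name -> list of variants

    def get_mapped_structure_name(self, item: str) -> str:
        # Standardised name -> the variant actually present, else the original name.
        for variant in self.mapping.get(item, []):
            if variant in self.structure_names:
                return variant
        return item

    def get_standardised_structure_name(self, item: str) -> str:
        # Name in this structure set -> standardised name, else the original name.
        for standard, variants in self.mapping.items():
            if item in variants:
                return standard
        return item

    def get_unmapped_structures(self) -> List[str]:
        # Standardised names with neither themselves nor any variant present.
        return [
            standard
            for standard, variants in self.mapping.items()
            if standard not in self.structure_names
            and not any(v in self.structure_names for v in variants)
        ]


names = ["Heart", "Lung_L", "SpinalCord"]
mapping = {
    "Heart": ["heart", "Heart"],
    "Lung_Left": ["Lung_L", "L_Lung"],
    "Esophagus": ["Oesophagus"],
}
lookup = NameMapping(names, mapping)
```

With these inputs, "Lung_Left" resolves to "Lung_L", "SpinalCord" passes through unchanged in both directions, and "Esophagus" is reported as unmapped.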
- pydicer.dataset.structureset.get_mapping_for_structure_set(structure_set_row: Series, mapping_id: str) → dict #
Searches the folder hierarchy to find a structure name mapping file with the given ID.
- Parameters:
structure_set_row (pd.Series) – The converted dataframe row entry for the structure set.
mapping_id (str) – The ID of the mapping to find.
- Returns:
The structure name mapping
- Return type:
dict
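Searching a folder hierarchy for a mapping file can be sketched as a walk from the most specific directory up to the working directory. Everything concrete here is an assumption for illustration: the `find_mapping_file` helper, the `.structure_mappings` directory name, the `<mapping_id>.json` file name and the nearest-first search order are hypothetical and may differ from what pydicer actually does.

```python
import json
from pathlib import Path
from typing import Optional


def find_mapping_file(start_dir: Path, mapping_id: str, stop_dir: Path) -> Optional[dict]:
    """Walk from start_dir up to stop_dir and return the first mapping found, or None."""
    current = start_dir.resolve()
    stop = stop_dir.resolve()
    while True:
        # Hypothetical layout: <dir>/.structure_mappings/<mapping_id>.json
        candidate = current / ".structure_mappings" / f"{mapping_id}.json"
        if candidate.is_file():
            return json.loads(candidate.read_text())
        # Stop at the top of the search range or at the filesystem root.
        if current == stop or current == current.parent:
            return None
        current = current.parent
```

Searching nearest-first means a mapping stored alongside a single structure set would take precedence over a project-wide one in this sketch.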