Overview (version 0.0.1)

PDCP is a tool used to collect and process patients imaging data from the Orthanc research PACs in an Auscat data centre.

Orthanc provides a simple and standalone DICOM server (PACS). It supports the DICOM and DICOMweb protocols. It is currently used in the AUScat network to save patients anonymized radiotherapy imaging data exported from the Hospital PACS.

PDCP is a python module that consists of multiple classes used to prepare patients records. Object oriented programming and inheritance were followed to make the code reusable and to extend certain functions. The patientImaging class is the main module and it contains a set of functions used to collect, validate, and track the data preparation process.

Other classes that inherit from patientImaging were created to handle different cancer sites and projects:

patientImagingCRDP: a class that inherits the patientImaging class, used to collect and process cancer patients data from the AUScat data centers, where there is a need to link 4 patient modalities. (C: stands for CT, R: stands for RTSTRUCTS, D: stands for RTDOSE, P: stands for the RTPLAN)
patientImagingCRD: a class that inherits the patientImaging class, used to collect and process cancer patients radiotherapy data where the RTPLAN cannot be obtained
patientImagingCR: a class that inherits the patientImaging class, used to collect and process a patients data where there is a need to link the CT with the RTSTRUCT only (segmentation task).

Other classes/functions were implemented to facilitate the data preparation task:

ReadPatientImagingData: used to load the patient’s processed data into a python dictionary {‘CT’:,’masks’:,etc}.
ROIP : a class used to generate the central slices patient’s organs at risk (OARs) and target volumes (TV)
DVH : a class used to generate the dosimetry features for the patients.

Several python packages were utilized to collect the data and to process the images:

requests: a python built-in module used to retrieve required files from an orthanc server (targets the orthanc server rest API)
pyorthanc: a python library that wraps the Orthanc REST API and facilitates the manipulation of data with several utilities. It was used in this script to identify patient related files. It was initially used to retrieve data through some fuctions (get_instance_file(), get_instance_simplified_tags()), however it was noticed that it was slow compared to python built-in modules.
concurrent.futures : a python built-in module used to apply threading and multiprocessing.
pydicom: a python library used to handle dicom imaging tasks. i.e the orthanc server returns the files as a bytes array. pydicom functions were used to convert such arrays to pydicom FileDatasets before saving into the patient’s directory.
simpleITK: is a simplified, open-source interface to the Insight Segmentation and Registration Toolkit. It was used to handle resampling tasks for the dose grid.
numpy: used to save radiotherapy data into 3d arrays.
pandas: a python library used with tabular data, dataframes were used to save the images summaries in this script
json: used to save patient notes while collecting and processing the records
hdf5stogare: used to save and load matlab files that contains the patient’s related data (not used).

What can PDCP do?

PDCP facilitates the data collection and preparation for using imaging radiotherapy data. PDCP can:

Query, retrieve, and validate patient imaging summaries from an Orthanc PACS.
Analyse associations in patient studies (linking required modalities)
Retrieve patient imaging data into a local directory.
Prepare the records for use in various research questions (dosimetry analyses, contouring, image standardization)
Track the patients data collection process and idenfity reasons behind excluding patients data.

Next Steps

Patients data are currently retrieved via the http protocol. We aim to utilize the WebDav, which is an extended protocol that allows remote web content authoring operations, in future versions to be able to use dicom tags and indexed images from Orthanc directly. Within WebDav, data can be viewed by patients, studies, or uids, which will help in managing DICOM data at the centres.
We aim to utilize the Orthanc plugins such as serve-folders to be able to track patient notes from webpages.
At this stage, patients with multiple studies will require manual review for selecting the right study. We aim to automate this process.

Selecting Modalities in a Patient Study

The current linkage is conducted without retrieving patients data from the Orthanc server. Before retrieving data, the patients will be verified.

For each CT inside the study:
- Start the RTSTRUCT linkage:
  
  Select the ‘SOPInstanceUID’,’SeriesIdentifier’ of RTSTRUCTS in the study that has the referenced ct series uid as the selected CT.
  
  If no RTSTRUCT associated with the CT:
  
  Continue to the next CT
  
  Start the RTPLAN linkage:
  
  Select the ‘SOPInstanceUID’,’ReferencedStructureSetSequence’ of RTPLANS in the study
  
  Remove the RTPLANS that has no ReferencedStructureSetSequence
  
  Get the SOPInstanceUID of each of the remaining RTPLANS
  
  If no remaining SOPInstanceUID:
  
  Cannot proceed further with the CT as there is no association, discard this CT and proceed to the next CT
  
  Get the SOPs of the rtstructs referenced in the RTPLANS
  
  Select the RTPLANS linked to the RTSTRUCT selected earlier
  
  If no remaining RTPLAN SOPInstanceUID:
  
  Cannot proceed further with the CT as there is no association to the RTSTRUCT that links to the CT, proceed to the next CT, continue
  
  Start the RTDOSE linkage:
  
  Select the ‘SOPInstanceUID’,’ReferencedRTPlanSequence’ of the instances that has DoseSummationType as PLAN
  
  If no SOPInstanceUIDs:
  
  Cannot proceed further with the CT as there will be no RTDOSEs, continue
  
  Get the SOPInstanceUIDs of the doses that has ReferencedRTPlanSequence selected.
  
  If no SOPInstanceUIDs:
  
  Cannot proceed further with the CT, as there will be no RTDOSE linked to the RTPLAN linked to the RTSTRUCT linked to the CT. discard this CT and proceed to the next CT
  
  Association is now successful. Orthanc ids are required
  
  Get the CT orthanc identifier
  
  Get the RTSTRUCTS Orthanc identifier/s
  
  Get the RTPLANS Orthanc identifier/s
  
  Get the RTDOSE Orthanc identifier/s
  
  Append the values as a dictionary to a list
After linking the series inside the study.
- If the list is empty:
  
  no links between series CTs, discard the study
- If the list has one value:
  
  best scenario
- If the list contains multiple dictionaries with links:
  
  The patient has multiple association. i.e. there are two CTs that has approved links. Cannot take a decision for now.

Selecting a Patient Study

A study is selected if it contains the required modalities added by the user.
A study will be discarded if the selected CT series contains a large number of instances
A study will be discarded if the study has the required RTSTRUCT, with the RTSTRUCT not containing any of the required keywords (i.e. study with RTSTRUCT with contour names PATIENT, ISO) will not be used.
A study will be discarded if it contains multiple CTs with multiple associations, will be discarded and will require review.
A study will be discarded if it contains a keyword that should not be found in its study name (e.g. a breast cohort is being collecting while the study name shows ‘head and neck’).

Example Data Collection

The example below shows the steps followed to prepare a cohort of 10 patients.

This example has been conducted with patientImagingCRDP for a breast cohort, where a linkage between CT–> RTSTRUCT –> RTPLAN –> RTDOSE is targeted.

Step 1: Setup Configuration File

For each data collection and preparation task, a configuration file is required. The first step is to create the configuration in a json file:

{
  "version": "Westmead breast 0.0.1",
  "required_modalities_for_patient":[
    "CT",
    "RTSTRUCT",
    "RTDOSE",
    "RTPLAN"
  ],
  "required_modalities_for_patient_plus_rtplan":[
    "CT",
    "RTSTRUCT",
    "RTDOSE",
    "RTPLAN",
    "PT"
  ],
  "study_desc_should_not_contain":[
    "lung","head","prostate","rectum"
  ],
  "study_desc_may_contain":["breast"
  ],
  "possibilities":[
    "ctv",
    "gtv",
    "itv",
    "ptv",
    "heart",
    "lung",
    "tumour"
  ],
  "imagesummaries": "../../pdcptest/imagesummaries/",
  "patientnotesdir": "../../pdcptest/patientnotes/",
  "matlabfiles": "../../pdcptest/matlabfiles/",
  "picklefiles": "../../pdcptest/picklefiles/",
  "datadirectory": "../../pdcptest/data/",
  "thequarantine":"../../pdcptest/quarantine/",
  "link_to_ids": "../../pdcptest/ids/breast_ids_test.csv",
  "ipport": "http://localhost:8042",
  "username": "",
  "password": "",
  "CONNECTIONS": 20,
  "TIMEOUT": 100
}

Here is a brief overview of the required keys:

required_modalities_for_patient: a list of the required modalities [‘CT’,’RTSTRUCT’,’RTPLAN’,’RTDOSE’]
study_desc_should_not_contain: a list of keywords the patient study should not contain.
study_desc_may_contain: a list of keywords a patient study might contain
possibilities: a list of keywords, where one of them at least should be found in the RTSTRUCT contours
imagesummaries: a directory to host the patient instances dicom summaries tags found in the targeted Orthanc server.
patientnotesdir: a directory to host the patient notes
datadictionary: a directory to host the patients records.
quarantine: Patients might have a rescan through the course of treatment. Other patients might have multiple courses of treatment. Quarantine is used to host patients with multiple studies.
link_to_ids: a csv file that contains the patient ids and its corresponding indexed orthanc ids.
ipport: ip and port of the Orthanc server.
username: username of an account in the Orthanc server
password: password of an account in the Orthanc server. Leave empty if not needed.
CONNECTIONS: the number of connections to be used when targeting the server.
TIMEOUT: total number of seconds to wait when sending the request to the server.

Step 2: Prepare directories

Create a directory to host the patients data and notes. This directory should include:

ids : a directory used to save a .csv file that will contain the patient ids and its corresponding Orthanc ids (links_to_ids in the config file).
imagesummaries: a directory used to save csv files that contain summaries of patient’s imaging data in the targeted orthanc server
patientnotes: a directory that will be used to save json objects that contain each patient data collection and preparation notes
data: will be used to save patient data. For each patient, a new directory will be created.
quarantine: will be used to save patient quarantine data. Each study will be treated as the only study associated with the patient. Hence, once a study is selected it will be manually moved into the data directory.

Step 3: Retrieve Orthanc Ids

The third step is to retrieve patients assoicated ids from the Orthanc server. The Orthanc server will create a specific id for each patient. These values are currently saved in an SQLite database (by default) with options to utilize other servers as plugins (Postgress).

After running this script, each row in the generated .csv file will represent the patient identifier and its corresponding Orthanc identifier, which is used by the Orthanc server to index patient’s studies.

The script is currently used to get all the patients ids from the remote server.

from PDCP import patientImaging
codesconfig='codesconfig_test.json'
patientImaging.collect_pids_orthanc(codesconfig)

The csv file will be similar to :

ids,orthanc_ids
1,b589720b-b3e2d6bb-e8ad034c-f3443ebd-5d155c67
2,236b7e77-7dd097d0-8e78dcac-98aca88b-1d214fd1
3,8ab22a5c-1c79ed07-229b7721-3d92e770-c7fcdf0e
4,74686685-bb06152e-9525ef4c-f5f53674-fc880224
5,c8e45dnf-bf3b1367-85aa9baa-1017d2a0-1789c6d4

with ids representing the patient ids and orthac_ids representing the patient orthanc id.

Step 4: Prepare Patients Images Summaries

The fourth step is to collect a summary of the available data for each patient in the Orthanc server. Various dicom tags are required to handle the linkage between modalities and to prepare the patient sudy. These tags are retrieved for each patient instance and added as a row to the associated csv file. For each patient, a dataframe is expected to be generated. Three possible cases are axpected when collecting patients images summaries:

pid_SUCCESS.csv : This indicates that the patient images summary (dataframe) was successfully prepared .
pid_IMAGINGDATANOTFOUND.csv : This indicates that the patient images summary (dataframe) cannot be generated because patients data related to data collection task were not found.
pid_ERROR.csv : This indicates that the patient images summary was collected with errors. In most cases, this error occurs because the server failed to handle a high number of requests at the same time. This can be fixed by repeating the execution or by lowering the number of threads to target the server.

To collect patient images summaries:

import os,math,time,sys
sys.path.insert(0, os.path.abspath('../..'))
sys.path.insert(0, os.path.abspath('../../PDCP/code/'))

from PDCP import patientImagingCRDP
import pandas as pd
from concurrent.futures import ProcessPoolExecutor



if __name__ == '__main__':
    try:
        print('Collecting patient summaries from the orthanc server.')
        start=time.time()
        codesconfig="codesconfig_test.json"
        if not os.path.exists(codesconfig):
            print('Congfig file does not exist.')
            input('Press ENTER to exit')
        br=patientImagingCRDP(codesconfig)
        link_to_ids=br.codes['link_to_ids']
        x=pd.read_csv(link_to_ids)
        orthanc_ids=x['orthanc_ids'].tolist()
        patients_ids=x['ids'].tolist()

        #cpus=multiprocessing.cpu_count()
        cpus=2
        n=int(math.ceil(len(orthanc_ids)/(cpus)))#leave one CPU for other things :p
        #n=2 #I assume we have 2 processes available in the server.
        oids = [orthanc_ids[i * n:(i + 1) * n] for i in range((len(orthanc_ids) + n - 1) // n )]
        pids=[patients_ids[i * n:(i + 1) * n] for i in range((len(patients_ids) + n - 1) // n )]
        processes=[]
        with ProcessPoolExecutor(max_workers = cpus) as executor:
          results = executor.map(br.generate_orthanc_files_summaries, oids,pids)
        wpids=[]
        wpatient_comments=[]
        for result in results:#results is an iterable of tuples
            wpids=wpids+result[0]
            wpatient_comments=wpatient_comments+result[1]
        print(wpatient_comments)
        finish=time.time()
        print(f"time taken to get patient imaging data: {finish-start}")
        input('Press ENTER to exit')
        input('Press ENTER to exit')
    except Exception as e:
        print(e)
        print(f"An error occured.")
        input('Press ENTER to exit')

When running this script, the image summaries will be added to imagesummaries. Here is a result of running a cohort of 10 patients with patientImagingCRDP

Step 5: Generate Patient Data

The images summary for each patient has been collected. The next step is to verify linkage and retrieve patients data.

The fifth step includes retrieving data from the orthanc server into a local directory that consists of:

CT: a directory that contains the patient’s CT DICOM slices
RTDOSE: a direcotry that contains the patient’s RTDOSES
RTSTRUCT: a direcotry of directories that contains the patient’s RTSTRUCT. (each RTSTRUCT will be saved in a sub directory)
CTnifti: a directory that contains the patient’s CT nifti file
masks: a directory of directories that contains the patient’s structures masks. To generate the masks, a script was taken from platipy (Thanks Platipy !).
rtdosenifti: a directory that contains the patient’s converted RTDOSES

Part A: Example with one patient

import sys,os
sys.path.insert(0, os.path.abspath('../../PDCP/code/'))
from PDCP import patientImagingCRDP
import time
import json

start=time.time()
patients_ids=[]
codesconfig='codesconfig_test.json'
bcd=patientImagingCRDP(codesconfig)
patients_ids=[123456789]
patients_to_execlude,patients_to_review,patients_passed=bcd.generate_patients_data(patients_ids)
finish=time.time()

print(f'time taken for one patient: {finish-start}')


notesdir=bcd.codes['patientnotesdir']
filepath=notesdir+str(patients_ids[0])+'.json'
f = open(filepath,)
file=json.load(f)

The previous script shows the data collection process for one patient with an id 123456789. Running this script, a directory with the patient records is expected.

Part B: Example with multiple patients using multiple CPUs

import sys,os,time,math
sys.path.insert(0, os.path.abspath('../../PDCP/code/'))
from PDCP import patientImagingCRDP
import pandas as pd

from concurrent.futures import ProcessPoolExecutor

def main():
    start=time.time()
    patients_ids=[]
    codesconfig='codesconfig_test.json'
    bcd=patientImagingCRDP(codesconfig)
    thedir=bcd.codes['imagesummaries']
    for filename in os.listdir(thedir):
        if "SUCCESS" in filename:
            pid=filename.split("_")[0]
            patients_ids.append(int(pid))

    cpus=2
    n=int(math.ceil(len(patients_ids)/(cpus)))
    pids=[patients_ids[i * n:(i + 1) * n] for i in range((len(patients_ids) + n - 1) // n )]
    with ProcessPoolExecutor(max_workers = cpus) as executor:
      results = executor.map(bcd.generate_patients_data, pids)
    patients_to_execlude=[]
    patients_to_review=[]
    patients_passed=[]
    for result in results:
      patients_to_execlude=patients_to_execlude+result[0]
      patients_to_review=patients_to_review+result[1]
      patients_passed=patients_passed+result[2]
    finish=time.time()
    print(f"time taken to get patient imaging data: {finish-start}")
    print(f"patients_to_execlude: {len(patients_to_execlude)}")
    print(f"patients_to_review: {len(patients_to_review)}")
    print(f"patients_passed: {len(patients_passed)}")
    print("patients_to_execlude")
    print(patients_to_execlude)
    print("patients_to_review")
    print(patients_to_review)
    print("patients_passed")
    print(patients_passed)
    input('Press ENTER to exit')
    input('Press ENTER to exit')


if __name__ == '__main__':
   main()

For each patient, a .json file will be generated to report the data collection process. An example below shows the notes that will be added to the patientnotes directory.

{"raw_images_summary":{"PatId": 123456789,
    "number_of_patient_studies": 1,
    "number_of_CT_series": 1,
    "number_of_CT_instances": 131,
    "number_of_RTSTRUCT_series": 1,
    "number_of_RTSTRUCT_instances": 1,
    "number_of_RTDOSE_series": 1,
    "number_of_RTDOSE_instances": 1,
    "number_of_RTPLAN_series": 1,
    "number_of_RTPLAN_instances": 1, "number_of_PT_series": 0, "number_of_PT_instances": 0},
    "list_of_values": {"PatId": 123456789,
        "CT_SeriesIdentifier": "1.3.6.1.4.1.32722.1111111111",
        "RTSTRUCT_SOPInstanceUID": ["1.3.6.1.4.1.32722.22222222"],
        "StructureSetDate": "missing",
        "StudyDate": 20141222,
        "RTDOSE_SOPInstanceUID": ["1.3.6.1.4.1.32722.3333333"],
        "RTPLAN_SOPInstanceUID": ["1.3.6.1.4.1.32722.44444444444444"],
        "CT_orthanc_identifier": "97953882-c2ac9041-5c6a81b3-asdas21-12567e46",
        "RTSTRUCT_orthanc_identifiers": ["567b1457-9971653c-333222-0bbee186-a36cff64"],
        "RTDOSE_orthanc_identifiers": ["7005f8e5-e75dfd2b-111-44-306b4626"],
        "RTPLAN_orthanc_identifiers": ["e12583b3-555422-99ad91d5-31af02e4-fc62a157"],
        "thedir": "../../pdcptest/data/123456789/",
        "ct_directory": "../../pdcptest/data/123456789/CT/",
        "rtdoses_directory": "../../pdcptest/data/123456789/RTDOSE/",
        "rtstruct_directory": "../../pdcptest/data/123456789/RTSTRUCT/",
        "masks_directory": "../../pdcptest/data/123456789/masks/",
        "nifti_directory": "../../pdcptest/data/123456789/CTnifti/"},
 "verification_notes":
     ["I: Initial Verification: no phantom was found for the patient.",
     "I: Initial Verification: Patient has one associated study",
     "I: I_SUCC_001:PROCEED:123456789", "II: Study verification",
     "II: Study verification: Checking CT series: 1.3.6.1.4.1.32722.1111111111",
     "II: Study verification: Patient has one CT associated with RTSTRUCTS and RTDOSES",
     "II: II_SUCC_001:PROCEED:123456789"],
 "verified_images_summary": {"PatId": 123456789,
     "number_of_patient_studies": 1, "number_of_CT_series": 1,
     "number_of_CT_instances": 131, "number_of_RTSTRUCT_series": 1,
     "number_of_RTSTRUCT_instances": 1, "number_of_RTDOSE_series": 1,
     "number_of_RTDOSE_instances": 1, "number_of_RTPLAN_series": 1,
     "number_of_RTPLAN_instances": 1, "number_of_PT_series": 0, "number_of_PT_instances": 0},
  "version": "liverpool python v1.0.",
  "retrieval_notes": ["XI: CT retrieval:Took 5.89 s",
      "XI: CT retrieval: skipped, no SliceLocation: 0",
      "XI: CT retrieval: skipped, no ImagePositionPatient: 0",
       "XI: CT retrieval: retrieved the CT series instances to the local directory.",
        "XI: XI_SUCC_001:PROCEED",
         "XII: CTnifti generation",
          "XII: CTnifti generation: Successfully added the CT nifti file to its directory",
      "XII: XII_SUCC_001:PROCEED", "XIII: RTSTRUCTS retrieval",
      "XIII: RTSTRUCTS retrieval: added an RTSTRUCT instance to the RTSTRUCT directory.",
      "XIII: XIII_SUCC_001: PROCEED",
      "XIV: RTSTRUCTS nifti masks generation.",
      "XIV: XIV_SUCC_001:PROCEED ", "XV: RTDOSE retrieval.",
       "XV: RTDOSE retrieval: added 1 rtdose PLAN instances to the RTDOSE directory. ",
        "XV: XV_SUCC_001: PROCEED"],
    "loading_notes": ["XXI: load CT", "XXI: XXI_SUCC_001: PROCEED", "XXII: XXII loading masks",
    "XXII: XXII loading masks: loaded the mask to numpy array:COMBLUNG",
    "XXII: XXII loading masks: loaded the mask to numpy array:CTV",
    "XXII: XXII loading masks: loaded the mask to numpy array:CTV_TUMOURBED",
    "XXII: XXII loading masks: loaded the mask to numpy array:CTWIRE",
    "XXII: XXII loading masks: loaded the mask to numpy array:External_ROI",
    "XXII: XXII loading masks: loaded the mask to numpy array:Lung_L",
    "XXII: XXII loading masks: loaded the mask to numpy array:Lung_R",
    "XXII: XXII loading masks: loaded the mask to numpy array:PTV",
     "XXII: XXII loading masks: loaded all the patient's masks.",
     "XXII: XXII_SUCC_001:PROCEED",
     "XXIII: XXIII loading RTDOSES",
     "XXIII: XXIII loading RTDOSES: added an rt dose: 1.3.6.1.4.1.32722.3333333",
     "XXIII: XXIII_SUCC_001: PROCEED", "XXIV: XXIV select RTDOSES",
     "XXIV: XXIV select RTDOSES: Maximum of rt doses: [46.318787]",
     "XXIV: XXIV_SUCC_001:PROCEED"],
     "status": "SUCCESS", "patientimagingfilesready": true}

Step 6: Check Patients

The successfully collected patients are patients who has the key patientimagingfilesready equal to true in the patient notes.

import sys,os
sys.path.insert(0, os.path.abspath('../../PDCP/code/'))
from PDCP import ReadPatientImagingData
import json
configfile='codesconfig_test.json'
with open(configfile, "r") as read_file:
    conf = json.load(read_file)

patientnotes=conf['patientnotesdir']
spatients=[]
epatients=[]
for filename in os.listdir(patientnotes):
    jsonfile=f'{patientnotes}{filename}'
    pid=filename.split(".json")[0]
    with open(jsonfile, "r") as read_file:
        file = json.load(read_file)
    patientimagingfilesready = file['patientimagingfilesready'] if 'patientimagingfilesready' in file else False
    if patientimagingfilesready:
        spatients.append(pid)
    else:
        epatients.append(pid)

Step 7: Generate Dosimetry Features

Some dosimetry features can be generated for each generated mask at this stage and added to the patients directory.

import sys,os
sys.path.insert(0, os.path.abspath('../../PDCP/code/'))
from PDCP import DVH
import sys,os
sys.path.insert(0, os.path.abspath('../../PDCP/code/'))
from PDCP import ReadPatientImagingData
import json
configfile='codesconfig_test.json'
with open(configfile, "r") as read_file:
    conf = json.load(read_file)

patientnotes=conf['patientnotesdir']

for pid in spatients:
    jsonfile=f'{patientnotes}{str(pid)}.json'
    with open(jsonfile, "r") as read_file:
        file = json.load(read_file)

    patientimagingfilesready = file['patientimagingfilesready'] if 'patientimagingfilesready' in file else False
    if patientimagingfilesready:
        adict=file['list_of_values']


    #load patient imaging data into imagingdata
    imagingdata=ReadPatientImagingData.load_patient_images(pid,adict,configfile)


    dvh=DVH(pid)
    prescribedDose=None#if we can get the prescribed dose at this stage.
    dvh_df=dvh.compute_dvh_data(imagingdata,prescribedDose)

    datadirectory=conf['datadirectory']

    dosimetrydir=f'{datadirectory}{str(pid)}/dosimetrydata/'
    if not os.path.exists(dosimetrydir):# a directory that save all the patient's details
        os.mkdir(dosimetrydir)
    dvh_df.to_csv(f'{dosimetrydir}/{str(pid)}_.csv',index=False)

For each patient the following directory will be obtained:

Step 8: Patients in Quarantine

In most cases, patients will be in quarantine if they have mutliple studies with successful associations. In other words, patients with two studies ready to use. This can occure because:

a patient might have two studies, where one of them is a rescan
a patient might have multiple studies, representing multiple cancer treatments

Those patients will need manual review to select the correct study that matches the project research question.

import time,os,shutil,sys
sys.path.insert(0, os.path.abspath('../../PDCP/code/'))
from PDCP import patientImagingCRDP
import pandas as pd
import json

start=time.time()
patients_ids=[]

configfile='codesconfig_test.json'
with open(configfile, "r") as read_file:
        conf = json.load(read_file)

bcd=patientImagingCRDP(configfile)

thequarantine=conf['thequarantine']
if not os.path.exists(thequarantine):
        os.mkdir(thequarantine)
imagesummaries=conf['imagesummaries']
datadirectory=conf['datadirectory']
patientnotesdir=conf['patientnotesdir']
#set the patients who were initially execluded in here.
patients_ids=[]

for patient in patients_ids:#for each patient who is in quarantine
        images=pd.read_csv(f'{imagesummaries}{str(patient)}_SUCCESS.csv')
        studies=images['StudyInstanceUID'].unique().tolist()
        for idx,astudy in enumerate(studies):
                x=images.loc[images['StudyInstanceUID']==astudy].reset_index(drop=True)
                x.to_csv(f'{imagesummaries}{str(patient)}_SUCCESS.csv',index=False)
                patients_to_execlude,patients_to_review,patients_passed=bcd.generate_patients_data([patient])
                #now copy the patients files to a new location
                destination=f'{thequarantine}{str(patient)}_{str(idx)}/'
                if not os.path.exists(destination):
                        os.mkdir(destination)
                source=f'{datadirectory}{str(patient)}'
                 #move the generated files
                if os.path.exists(source):
                        shutil.move(source, destination)
                #move the notes
                source=f'{patientnotesdir}{str(patient)}.json'
                if os.path.exists(source):
                        shutil.move(source, destination)

                #move the image summary
                source=f'{imagesummaries}{str(patient)}_SUCCESS.csv'
                if os.path.exists(source):
                        shutil.move(source, destination)

        #move things back to normal
        images.to_csv(f'{imagesummaries}{str(patient)}_SUCCESS.csv',index=False)

finish=time.time()

print(f'time taken for one patient: {finish-start}')

The studies will be created and mapped to the quarantine directory. Once the correct study is selected, it can be mapped into the data directory to be included in any further analyses.

PDCP

`DVH`(pid)	A class that computes the DVH features for each patient imaging files.
`ROIP`([ptype])	A class to work with the patient’s organs at risk (OARs) and target volumes (TV)
`ReadPatientImagingData`	This class contains functions that read the patient directory.It collects the patients related data such as the ct_image, masks, roi names, pixel spacing.
`patientImaging`([codesconfig])	A class that contains all the required functions to prepare and process patient records from an Orthanc server.
`patientImagingCR`([codesconfig])	This is the class used to collect the patients data where the CT & the RTSTRUCT are only required.
`patientImagingCRD`([codesconfig])	This is the class used to collect the patients data where the CT & the RTSTRUCT & RTDOSE are only required.
`patientImagingCRDP`([codesconfig])	This class was used to collect and process datasets where connections between CT, RTSTRUCTS,RTDOSES, PLANS are required.

Welcome to PDCP’s documentation!