Clinical evaluation of deep learning autocontouring in prostate and head and neck cancer
Hague, Christina ; Beasley, William J ; McPartlin, Andrew J ; Owens, Susan E ; Price, Gareth J ; Saud, H. ; Slevin, Nicholas J ; Van Herk, Marcel ; Whitehurst, Philip ; Chuter, Robert
Abstract
Purpose or Objective
Manually contouring organs at risk (OARs) is time-consuming and subject to inter-observer variability. As the complexity and number of OARs increase, auto-contouring becomes increasingly important for standardising delineation and reducing clinician workload. The aim of this study was to evaluate the ability of deep-learning-based auto-contouring to produce clinically acceptable OAR contours.
Material and Methods
Two head and neck (H&N) models (A and B) trained on local data and two “generic” models trained on data from other centres (one H&N and one prostate model) were evaluated. OAR contours from ten randomly selected H&N patients and nine prostate patients were reviewed. The auto-contours (DLCExpert™, Mirada Medical) were reviewed by two independent observers and scored from 1 to 7 according to a ‘goodness of fit’ descriptive category. Scores of 5 (“requiring 20-50% manual edits to meet clinical standards”) or less were defined as acceptable. To compare the contours generated by the four models with manual contours, distances to agreement (DTA) were calculated. For the prostate model, the median, minimum and maximum times required for manual contouring were recorded and compared with the time required to edit the DLCExpert-generated contours.
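The abstract does not define how DTA was computed. As one possible reading, the sketch below treats each contour as a set of 3D surface points in millimetres and takes the symmetric mean nearest-neighbour distance between the auto-contour and the manual contour. The function names and this exact definition are assumptions for illustration, not the DLCExpert implementation or the authors' method.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, z) coordinates in mm


def _mean_nearest_distance(src: Sequence[Point], dst: Sequence[Point]) -> float:
    """Mean distance from each point in src to its nearest point in dst."""
    return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)


def distance_to_agreement(auto: Sequence[Point], manual: Sequence[Point]) -> float:
    """Hypothetical symmetric mean distance to agreement (mm).

    ASSUMPTION: DTA is taken as the average of the two directed mean
    nearest-neighbour distances; the abstract does not state its definition.
    Brute force, O(len(auto) * len(manual)).
    """
    if not auto or not manual:
        raise ValueError("both contours must contain at least one point")
    return 0.5 * (_mean_nearest_distance(auto, manual)
                  + _mean_nearest_distance(manual, auto))


if __name__ == "__main__":
    # Toy square contour and a copy shifted 2 mm along x (illustrative only).
    manual = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (10.0, 10.0, 0.0), (0.0, 10.0, 0.0)]
    auto = [(2.0, 0.0, 0.0), (12.0, 0.0, 0.0), (12.0, 10.0, 0.0), (2.0, 10.0, 0.0)]
    print(distance_to_agreement(auto, manual))  # prints 2.0
```

For clinical-sized contours a spatial index (e.g. a k-d tree) would replace the brute-force nearest-neighbour search; the brute-force form is kept here only to make the definition explicit.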
Results
Manually editing the contours generated by the DLCExpert model saved time compared with full manual contouring for all prostate OARs, particularly the bladder (Table 1). Average goodness-of-fit scores were similar between the two independent observers, as shown in Table 2. The generic H&N model met clinical standards for the mandible, oral cavity, brainstem and left submandibular gland, and outperformed models A and B, in particular for the left and right submandibular glands (3.9 vs 12.1 mm and 3.1 vs 3.8 mm DTA). However, for the brainstem, spinal cord, larynx, bilateral parotid glands and eyes, local models A and B performed better (e.g. 2.8 vs 4.2 mm for the brainstem and 6.6 vs 11.5 mm for the spinal cord). Irrespective of the model, the generated contours were not clinically acceptable for the optic chiasm, optic nerves and pharyngeal constrictor muscles, which required >50% manual edits.
Conclusion
A “generic” deep learning model has been shown to aid the clinical workflow by reducing the time taken to delineate OARs for prostate patients. However, auto-contouring performs poorly for small structures that are poorly visualised on CT, such as the optic apparatus. Integrating MR into the contouring of such structures may be a solution, but this remains to be validated. For H&N, the DTA and clinical acceptability results showed that combining contours from local and generic models could potentially give clinically acceptable contours. Standard models can be very useful if they match internal contouring guidelines. Clinical evaluation of these and other models is ongoing within the centre.
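The idea of mixing local and generic H&N models can be sketched as a per-OAR selection rule. This sketch uses only the DTA pairs quoted in the Results and assumes, as the text implies, that the lower value in each pair belongs to the better-performing model. "local" stands for models A and B together because the abstract does not say which local model each figure refers to. The helper name is hypothetical, and in practice DTA would be weighed alongside the observers' goodness-of-fit scores rather than used on its own.

```python
from typing import Dict

# DTA values in mm as quoted in the Results. ASSUMPTION: the lower value of
# each quoted pair is the better-performing model, as the text implies;
# "local" pools models A and B because the abstract does not separate them.
REPORTED_DTA_MM: Dict[str, Dict[str, float]] = {
    "left submandibular gland": {"generic": 3.9, "local": 12.1},
    "right submandibular gland": {"generic": 3.1, "local": 3.8},
    "brainstem": {"generic": 4.2, "local": 2.8},
    "spinal cord": {"generic": 11.5, "local": 6.6},
}


def pick_model_per_oar(dta_by_oar: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """Hypothetical helper: choose the model with the smallest DTA for each OAR."""
    return {oar: min(dtas, key=dtas.__getitem__) for oar, dtas in dta_by_oar.items()}


if __name__ == "__main__":
    for oar, model in pick_model_per_oar(REPORTED_DTA_MM).items():
        print(f"{oar}: {model}")
    # left submandibular gland: generic
    # right submandibular gland: generic
    # brainstem: local
    # spinal cord: local
```

A lower DTA is treated as better because it means the auto-contour lies closer to the manual reference; a real selection rule would also need to exclude structures, such as the optic apparatus, where no model was clinically acceptable.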
Date
2020
Type
Meetings and Proceedings
Citation
Hague C, Beasley W, McPartlin A, Owens S, Price G, Saud H, et al. PO-1719: Clinical evaluation of deep learning autocontouring in prostate and head and neck cancer. Radiotherapy and Oncology. 2020 Nov;152:S950.