Remote, Depth-Based Lung Function Assessment

<italic>Objective:</italic> We propose a remote, noninvasive approach to pulmonary function testing (PFT) using a depth sensor. <italic>Method:</italic> After generating a point cloud from scene depth values, we construct a three-dimensional model of the subject's chest. Then, by estimating the chest volume variation throughout a sequence, we generate volume–time and flow–time data for two prevalent spirometry tests: forced vital capacity (FVC) and slow vital capacity (SVC). The <italic>tidal volume</italic> and <italic>main effort</italic> sections of the volume–time data are analyzed and calibrated separately to remove the effects of the subject's torso motion. After automatic extraction of keypoints from the volume–time and flow–time curves, seven FVC measures (<italic>FVC, FEV1, PEF, FEF</italic><inline-formula><tex-math notation="LaTeX">$_{25\%}$</tex-math></inline-formula>, <italic>FEF</italic><inline-formula><tex-math notation="LaTeX">$_{50\%}$</tex-math></inline-formula>, <italic>FEF</italic><inline-formula><tex-math notation="LaTeX">$_{75\%}$</tex-math></inline-formula>, and <italic>FEF</italic><inline-formula><tex-math notation="LaTeX">$_{25\text{--}75\%}$</tex-math></inline-formula>) and four SVC measures (<italic>VC, IC, TV</italic>, and <italic>ERV</italic>) are computed and then validated against measures from a spirometer. A dataset of 85 patients (529 sequences in total), attending a respiratory outpatient service for spirometry, was collected and used to evaluate the proposed method. <italic>Results:</italic> We observed high correlation between the proposed method and the spirometer for FVC and SVC measures in both intra-test and intra-subject evaluations. <italic>Conclusion:</italic> Our proposed depth-based approach remotely computes eleven clinical PFT measures and gives highly accurate results when evaluated against a spirometer on a dataset comprising 85 patients.
<italic>Significance:</italic> Experimental results computed over an unprecedented number of clinical patients confirm that chest surface motion is linearly related to changes in lung volume, which establishes the potential for an accurate, low-cost, and remote alternative to traditional cumbersome methods, such as spirometry.


diseases. This can be achieved by a variety of measures, including exercise testing, lung volume measurement, and dynamic breathing tests. Traditional measures of pulmonary function, such as spirometry [1] and whole body plethysmography [2] (which measures lung volumes and gas transfer), require patient cooperation and direct contact with the equipment. There are other measures of lung physiology that are even more invasive, such as arterial blood gas sampling (direct arterial sampling) and cardiopulmonary exercise testing (treadmill or exercise bike) [1]. Among these methods, spirometry is the most prevalent for assessing lung function owing to its portability, price, and accuracy for medical diagnosis.
To perform a spirometry test, patients are asked to breathe through a mouthpiece while a nose clip is applied to prevent air leakage. The two primary clinical protocols undertaken with a spirometer are forced vital capacity (FVC) and slow vital capacity (SVC). The former comprises a maximal inspiration followed by a forced maximal expiration, and the latter a maximal inspiration followed by a slow, controlled, maximal expiration. Various clinical PFT measures, such as FVC, FEV1, PEF, and FEF 25-75% (FVC measures) and VC, IC, TV, and ERV (SVC measures), are calculated within a spirometry test [1], [3]. These PFT measures, and their combinations, are used in the diagnosis and assessment of obstructive lung diseases, e.g., chronic obstructive pulmonary disease (COPD) and asthma, and restrictive lung diseases, e.g., lung fibrosis.
Although spirometry is an accurate and reliable clinical method, it has some disadvantages that limit its application. The spirometer is a particularly challenging device for certain clinical populations to use, e.g., the frail elderly, children, and cognitively impaired patients. It needs to be recalibrated at least every couple of days, and a new mouthpiece and nose clip are needed for each patient.
In this paper, we propose a novel depth-based method for remote lung function assessment by estimating and tracking the volume of the chest to compute clinically acquired FVC and SVC measures. For depth sensing, we use the Microsoft Kinect V2 RGB-D sensor [4], which is based on time-of-flight technology. The estimated measures are correlated against the results obtained using a spirometer for 85 patients who attended a respiratory outpatient service for spirometry. In our previous work [5], we demonstrated that the Microsoft Kinect can be used to estimate chest volume and compute intra-test PFT measures. To the best of our knowledge, the only other work that remotely computes and reports PFT measures (just two, FVC and FEV1) is [6], which used the first generation, structured-light-based Kinect. Their study mainly focused on estimating passive airway resistance and was tested on 5 healthy subjects who were instructed to mark their inhalation and exhalation manually (using the computer mouse) during the test.
This work is licensed under a Creative Commons Attribution 3.0 License. For more information, see http://creativecommons.org/licenses/by/3.0/
We extend our previous work in [5] by the following: 1) Obtaining detailed analysis of volume-time data to automatically extract more reliable keypoints for calculating scaling factors and measures; 2) Obtaining three more FVC measures, i.e., FEF 25%, FEF 50%, and FEF 75%; 3) Performing comparative analysis of PFT measures obtained by the proposed method and spirometer; 4) Investigating subjects' upper body motion during the test and its effects on volume-time data; 5) Generalizing the intra-subject scaling factor; and 6) Evaluating the proposed method on 85 actual patients (compared to 40 in [5]). Our proposed system has been developed in response to increasing clinical interest in contactless or remote techniques for respiratory assessment. It can be exploited for a wide range of potential applications, such as screening for respiratory diseases, home monitoring, and gating controls for radiological imaging techniques. The proposed system is easy to set up and does not require calibration on a daily basis. Because it assesses the lungs remotely, it not only cuts costs (pneumotach and disposable accessories) but also reduces the infection risk associated with connecting to a pneumotach. Furthermore, our method requires no specialist training.
There are several recent studies that only estimate breathing rate, without performing PFT, using structured light [6]-[18], time-of-flight cameras [5], [19], [20], video cameras [21], [22], and other remote sensors [23], [24]. These are briefly considered in Section II. In Section III, we present an overview and schematic of the proposed approach. Then, in Section IV, we describe the Kinect noise analysis and filtering, three-dimensional (3-D) chest modeling, and volume estimation. This is followed by Section V with volume-time data keypoints computation and analysis. Extracting clinical PFT measures is presented in Section VI and our proposed method for scaling factor generalization is described in Section VII. The system configuration, the dataset, and the experimental results are presented in Section VIII. This paper is concluded in Section IX. A list of abbreviations used in this paper is provided in the Appendix.

II. LITERATURE REVIEW
Remote respiratory monitoring has recently attracted growing research interest, especially since the availability of affordable depth sensors, such as the first generation Microsoft Kinect, and then later the Microsoft Kinect V2, which use structured light and time-of-flight techniques, respectively. While many works, referred to below, have investigated breathing rate, respiratory waveform estimation, and respiration resistance using depth sensors, we know of only our own earlier work [5] and Ostadabbas et al. [6] that applied the Kinect to PFT measurement in particular. Compared with [6], we evaluate on many more subjects and compute a much wider range of PFT measures.
Structured-light approaches-Ostadabbas et al. [6] applied the first generation Kinect to compute two PFT measures (FVC and FEV1) for the estimation of airway resistance, defined as lung pressure divided by the airflow. Five healthy subjects were asked to blow through various numbers of straws (to induce varied airway resistance) while their lung volume was measured over time. They instructed subjects to press their back against the wall to restrict their body movement and use a wireless mouse to timestamp their inhalation and exhalation during the test. They reported an average 0.88 correlation between their method and spirometry for the FEV1 measure.
Aoki et al. [7] proposed a non-contact respiration measurement technique, using the first generation Kinect, by extracting the volume of the thoracoabdominal region formulated on the skeleton joint positions available from the sensor. Respiration waveforms were generated by computing the changes to this volume. Their results were validated against an expiration gas analyzer and flow meter and they reported 0.98 correlation between volume change (estimated by their method) and the air flow volume (measured by an expiratory gas analyzer). Yu et al. [8] developed an elaborate calibration technique, along with a predefined chest wall mask, to approximately extract the subject's chest wall region and dimensions. The respiratory volume was estimated by using the computed length per pixel and depth information. Correlation of 0.96 was reported against a spirometer for estimating respiratory volume. Similar to [8], Seppanen et al. [9] used the first generation Kinect to estimate the respiration rate (of healthy subjects) by generating a respiratory airflow waveform using several models from depth sensor data. The best coefficient of determination ($R^2$) between the spirometer signal and the estimated airflow signal was reported as 0.93. Benetazzo et al. [12] detected respiratory rates by applying a weighted averaging filter to the chest region pixels segmented by using the first generation Kinect skeleton's shoulder and torso joint positions. Their breathing rate results were evaluated against a spirometer, with an outcome of 0.98 correlation. Tahavori et al. [13] used a first generation Kinect placed above the body of a participant, who was required to be supine, and obtained the average depth value of 16 regions of interest on the chest and abdomen over time to analyze their motion. After applying principal component analysis (PCA) to the average depth values of these regions, they demonstrated that the first principal component describes nearly 70% of the motion data variance in chest and abdomen surfaces. Other works of note that extract the respiratory rate using the structured-light-based Kinect are [14]-[18].
In an example of a non-Kinect, yet structured-light, approach, De Boer et al. [10] deployed two cameras as a stereo pair to capture a predefined light pattern projected onto the chest wall and estimate chest volume changes. The volume was defined as the enclosed space between the chest surface and the work bench, for which $R^2 = 0.91$ was reported when compared with a spirometer. The authors reported that their PFT measures correlated with the spirometer at $R^2 = 0.97$, but provided no further details.
Time-of-flight approaches-In [19], Ostadabbas et al. proposed a noninvasive, passive method, using the Microsoft Kinect V2 and a pulse oximeter, to assess the severity of airway obstruction as mild, moderate, or severe. To estimate respiration airflow, 14 healthy subjects were asked to breathe through various straws to induce airway obstruction externally in a spontaneous breathing session while lying supine to minimize body movement effects. In a separate stage, they estimated breathing rate and tidal volume of 14 patients in a sitting position to classify their airway obstruction severity. In both parts, they asked each subject to follow some instructions, e.g., pressing the pulse oximeter buttons during the test. They reported 76.2% and 80% accuracy in detecting airway obstruction in healthy and ill subjects, respectively.
Penne et al. [20] employed a time-of-flight camera and used the flat clinic bed in a calibration stage and fit a reference plane onto it. Then, with a test subject present, the two best-fitting planes were found for the chest and abdomen regions, and were used to compute the breathing signal. They compared their respiration signal against that obtained from an ANZAI belt for chest and abdomen regions and reported 0.85 and 0.91 correlation, respectively.
In our preliminary work [5], we obtained several PFT measures of FVC and SVC tests remotely using Kinect V2 depth data by computing volume-time and flow-time curves of chest volume changes. We evaluated on 40 patients by comparing their computed measures to those obtained from a spirometer. These results are reproduced in Section VIII of this paper.
RGB video camera approaches-Tan et al. [21] proposed a single video camera approach that used image subtraction to detect the motion of the chest and abdomen regions in subjects wearing a striped pattern shirt. After applying an averaging filter, the breathing signal was obtained from the number of moving pixels given a threshold. They evaluated their results against a strain gauge, a thermistor, and a flow monitoring system, but reported only subjective assessments. Frigola et al. [22] used optical flow to detect body movement to monitor inhalation and exhalation during sleep. Although they used an elastic cloth band as their ground truth, comparative evaluation results were not reported.
Other sensor approaches-Other notable methods for monitoring respiratory rate include those of Scalise et al. [23], who used a laser Doppler vibrometer, and Sato and Nakajima [24], who employed a stereo system with an infrared beam fiber grating projection. Further, there have been a number of marker-based (motion capture system) clinical works [25]-[28]. These approaches are expensive and require a complicated calibration process. They mainly focus on the existence of correlation between chest wall motion and actual lung volume changes.
Fig. 1 presents an overview of the proposed method. After identifying and segmenting the chest region in each depth frame of the sequence captured by the Kinect, the volume of the thoracic wall is estimated and the Kinect volume-time and flow-time curves are generated. Next, the Kinect volume-time curve is smoothed using a moving average filter and then keypoints are automatically computed for both the depth and spirometry measurements. After establishing linear scaling factors, needed to calibrate the curves from the depth sensor, PFT measures are computed on the depth sensor curve and their stability over multiple runs for the same subject is analyzed and compared with the spirometer measures. We show that these scaling factors are subject-specific as they relate to the natural body motion of the subject while performing PFTs. Accordingly, by investigating subjects' trunk motion patterns, we generalize intra-subject scaling factors to compute intra-subject PFT measures.

IV. CHEST MODELING AND VOLUME-TIME DATA COMPUTATION
Kinect V2 Sensor Noise-Kinect depth estimation suffers from measurement noise caused by the depth sensor technology. Since the Kinect V2 was released recently, there is little public information on the nature and characteristics of its noise. We performed a planar noise analysis to find the optimal distance range between the sensor and the subject.
In this experiment, we estimated the sensor measurement error by placing the Kinect at various distances (from 60 to 500 cm at 20 cm intervals) in front of a white wall under normal room temperature and lighting conditions, with the sensor's optical axis approximately perpendicular to the wall. At each position, a sequence of 200 frames was recorded, 15,000 depth values were randomly sampled from a constant-size patch at the center of the sensor's viewpoint, and their standard deviation was computed. Fig. 2 illustrates this standard deviation in millimeters plotted against the sensor distance to the wall. It shows a nonlinear behavior similar to that of general time-of-flight depth sensors [29]. Furthermore, a similar noise curve was reported by Breuer et al. [30]. Noise increases between 60 and 80 cm, and then drops to its minimum at ∼150 cm. Accordingly, we carried out all our experiments with the Kinect placed at ∼150 cm from the subject. However, noise may vary under different environmental lighting and temperature conditions and also depends on the sensor temperature itself. These factors, therefore, require the optimal distance to be recomputed for the environment in which the device is to be used.
To filter noise in the measurements, an edge-preserving bilateral filter [31] was applied to each frame of our data:
$$BF[I]_p = \frac{1}{W_p} \sum_{q \in \mathcal{N}(p)} G_{\sigma_s}(\lVert p - q \rVert)\, G_{\sigma_r}(\lvert I_p - I_q \rvert)\, I_q$$
where $W_p$ is the normalization factor, $G_{\sigma_s}$ is the spatial Gaussian kernel, $G_{\sigma_r}$ is the range Gaussian kernel, $p$ and $q$ are the locations of the central and neighbor pixels (the sum runs over the neighbors $q$ in the window centered at $p$), $\lVert p - q \rVert$ is the Euclidean distance between pixel locations $p$ and $q$, and $I$ is the image to be filtered. The range parameter $\sigma_r$ of the bilateral filter was determined to be 1.5, which is approximately equal to the standard deviation of distance measurements obtained by the Kinect at the chosen distance of ∼150 cm. In particular, this value was selected, as in [32], according to the level of noise at this distance, to optimize the performance of the range component of the bilateral filter. For the spatial filter, we select $W_f = 13$, which guarantees a good trade-off between accuracy and processing speed, as also reported by Camplani et al. in a similar filtering approach. Consequently, $\sigma_s = W_f/6$, such that the significant part of the Gaussian kernel (up to $3\sigma_s$) is completely included within the selected window $W_f$ [33].
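As an illustration of the filter described above, the following is a minimal pure-Python sketch, not the implementation used in this work. It assumes the depth frame is a list of rows in the same units as $\sigma_r$, uses the stated $W_f = 13$, $\sigma_s = W_f/6$, and $\sigma_r = 1.5$, and simply skips neighbors that fall outside the image (a boundary choice made here for illustration). The function name `bilateral_filter` is hypothetical.

```python
import math

def bilateral_filter(img, window=13, sigma_r=1.5):
    """Bilateral-filter a 2-D depth image given as a list of rows."""
    sigma_s = window / 6.0            # so that 3*sigma_s equals half the window
    half = window // 2
    rows, cols = len(img), len(img[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc, norm = 0.0, 0.0      # weighted sum and normalization W_p
            for dr in range(-half, half + 1):
                for dc in range(-half, half + 1):
                    rr, cc = r + dr, c + dc
                    if not (0 <= rr < rows and 0 <= cc < cols):
                        continue      # assumption: ignore out-of-image neighbors
                    g_s = math.exp(-(dr * dr + dc * dc) / (2 * sigma_s ** 2))
                    diff = img[rr][cc] - img[r][c]
                    g_r = math.exp(-(diff * diff) / (2 * sigma_r ** 2))
                    acc += g_s * g_r * img[rr][cc]
                    norm += g_s * g_r
            out[r][c] = acc / norm
    return out
```

Because the range kernel suppresses neighbors whose depth differs by much more than $\sigma_r$, a sharp depth step survives filtering while small fluctuations on either side are averaged.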
Smoothed volume-time curve-The volume-time curve was obtained for each sequence by estimating the chest volume as a function of time. Smoothing of the volume-time curve, in one form or another, is routinely applied in other works, for example in [19] and [34]-[36]. Here, although the bilateral filter was applied to each frame of the depth sequence, the volume-time curve still remained considerably noisy (see Fig. 3) because the chest volume is estimated over time from a very limited chest wall motion, i.e., approximately ±2.5 cm. Thus, we used a noncausal moving average filter, which is a low-pass finite impulse response (FIR) filter [37], to eliminate high frequency noise from the Kinect volume-time curve:
$$V_{out}(k) = \frac{1}{N} \sum_{i=-(N-1)/2}^{(N-1)/2} V_{in}(k + i)$$
where $V_{in}(k)$ and $V_{out}(k)$ are the input and filtered volume-time curves, respectively, and $N$ is the averaging window size, which is computed as $N = 15$ based on the filter cut-off frequency of 1 Hz [38]. The cut-off frequency was chosen according to the range of respiratory rates (frequency) for healthy adults at 12-20 breaths/min (0.2-0.34 Hz) [39], elderly at 16-25 breaths/min (0.27-0.42 Hz) [40], and those with severe pulmonary disorders at 36 breaths/min (0.6 Hz) at most [41]. The computed range of respiratory rates for the 85 patients of our dataset, at 8-32 breaths/min (0.13-0.53 Hz), lies below the chosen cut-off frequency of 1 Hz.
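The smoothing step above can be sketched as a centered moving average. This is an illustrative version only: the handling of the first and last samples (shrinking the window at the boundaries) is an assumption, since the text does not specify it, and `moving_average` is a hypothetical name.

```python
def moving_average(v_in, n=15):
    """Centered (noncausal) moving average over n samples; n should be odd."""
    half = n // 2
    out = []
    for k in range(len(v_in)):
        lo = max(0, k - half)
        hi = min(len(v_in), k + half + 1)
        window = v_in[lo:hi]          # assumption: window shrinks at the edges
        out.append(sum(window) / len(window))
    return out
```

Because the window is symmetric about each sample, the filter introduces no phase delay, which keeps the smoothed keypoints aligned in time with the raw curve.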

3-D Modeling of Thoracic Wall-After obtaining a point cloud representing the captured scene from the filtered depth images, a subject's chest area was segmented automatically using body joints estimated by the Kinect software (SDK2.0), defined by the ShoulderRight, ShoulderLeft, SpineShoulder, and SpineMid joint positions. The chest wall surface was then reconstructed by applying a 2-D Delaunay triangulation [42] on the point cloud (see Fig. 4(a)).
3-D-chest-model-based volume estimation-Given the 2.5-D data, in [5] we proposed a method to approximate the chest volume by computing the volume between the model of the thoracic wall and a reference plane at a predefined distance from the camera. Our approach is sufficient to compute the volume-time curve $V(t)$ that models variations in the approximated volume, based on the assumption that body movements are minimal during PFT and can be ignored. The reconstructed chest wall surface was then enclosed by surrounding lateral surfaces and a reference plane (see Fig. 4(b)), and its volume was estimated using the Divergence Theorem. More information about our volume estimation can be found in [5].
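To make the idea of a volume enclosed between a triangulated surface and a reference plane concrete, here is a simplified sketch. It is not the Divergence Theorem formulation of [5]: it sums triangular prisms instead, and it assumes the surface points lie on a regular grid (two triangles per cell) rather than a Delaunay triangulation, with the reference plane at $z = z_{ref}$ behind the chest. All function names are hypothetical.

```python
def triangle_prism_volume(p1, p2, p3, z_ref):
    """Volume between a surface triangle and the plane z = z_ref; points are (x, y, z)."""
    # Area of the triangle projected onto the x-y plane
    area = abs((p2[0] - p1[0]) * (p3[1] - p1[1])
               - (p3[0] - p1[0]) * (p2[1] - p1[1])) / 2.0
    # Mean distance of the three vertices from the reference plane
    mean_height = (3 * z_ref - p1[2] - p2[2] - p3[2]) / 3.0
    return area * mean_height

def chest_volume(grid, z_ref):
    """grid[r][c] is an (x, y, z) point; each grid cell is split into two triangles."""
    total = 0.0
    for r in range(len(grid) - 1):
        for c in range(len(grid[0]) - 1):
            a, b = grid[r][c], grid[r][c + 1]
            d, e = grid[r + 1][c], grid[r + 1][c + 1]
            total += triangle_prism_volume(a, b, d, z_ref)
            total += triangle_prism_volume(b, e, d, z_ref)
    return total
```

Evaluating `chest_volume` on every frame yields an uncalibrated volume-time curve; only its variation over time matters, since the reference plane is arbitrary.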
Chest-averaging-based volume estimation-Similar to previous approaches [6], [12], [15], [17], [18], we also estimated the uncalibrated chest volume at time point $t$ by computing the average depth (distance from the sensor) of the pixels located in the chest region. Chest-averaging is simple and fast to compute.
We report results using both the 3-D-chest-model-based [5] and chest-averaging methods in Sections VIII-B and VIII-D.

V. VOLUME-TIME DATA KEYPOINTS AND ANALYSIS
All PFT tests start with a few cycles of normal breathing, called tidal volume, followed by the intended lung function test, called main effort. Since our Kinect volume-time data measures the chest volume in cubic meters ($m^3$) relative to an arbitrary plane, as opposed to the spirometer's air volume measure in liters, we need to linearly scale the y-axis in the volume-time curves (using computed scaling factors) to enable the correlation of computed measures. Note that this is not to imply that the Kinect truly measures lung volume: chest volume is a proxy for the amount of air within the lungs that we show is linearly related to air flow as measured by spirometry.

A. Keypoints Computation
Several keypoints were automatically computed from the volume-time curves to (a) identify tidal volume and the main effort, (b) establish scaling factors, and (c) compute PFT measures. Five keypoints are required for separating tidal volume and main effort in the FVC and SVC volume-time curve V (t), which are named as {C, D} (beginning and end of tidal volume) and {E, A, B} (beginning to the end of main effort), as illustrated in Fig. 5.
In order to compute keypoints correctly, we first need to find the FVC and SVC volume-time curve extrema that identify respiratory cycles during the PFT test. Since the curve can be noisy (e.g., because of chest movement and coughing), local minima or maxima may be incorrectly selected. To avoid false local extrema, the difference between two consecutive turning points, which are introduced as local extrema, needs to be greater than a threshold $\gamma$. Considering $V_{min}$ and $V_{max}$ as the smallest and greatest estimated chest volume in a sequence (volume-time curve global minimum and maximum), $[V_{max} - V_{min}]$ indicates the maximum volume of exchanged air that occurs during main effort. A fraction of this exchanged volume is defined as $\gamma$ to identify local extrema, i.e.,
$$\gamma = \frac{1}{\rho}\,[V_{max} - V_{min}]$$
where $\rho$ is defined as the ratio of the greatest exhaled air during main effort (6.8) to the smallest exhaled air during tidal volume (0.35) among all sequences, which is $\rho \approx 20$.
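One way to apply such a threshold is a hysteresis-style scan that only accepts a turning point once the curve has moved away from it by more than $\gamma$. The sketch below is an assumption about how the rule could be realized, not the procedure used in this work; `find_extrema` is a hypothetical name and $\rho = 20$ follows the value stated above.

```python
def find_extrema(v, rho=20.0):
    """Return (minima, maxima) index lists, ignoring swings smaller than gamma."""
    gamma = (max(v) - min(v)) / rho       # threshold as a fraction of the full range
    minima, maxima = [], []
    mn, mx = float("inf"), float("-inf")  # running candidate values
    mn_pos = mx_pos = 0                   # and their sample indices
    look_for_max = True
    for i, x in enumerate(v):
        if x > mx:
            mx, mx_pos = x, i
        if x < mn:
            mn, mn_pos = x, i
        if look_for_max:
            if x < mx - gamma:            # dropped far enough: accept the maximum
                maxima.append(mx_pos)
                mn, mn_pos = x, i
                look_for_max = False
        elif x > mn + gamma:              # rose far enough: accept the minimum
            minima.append(mn_pos)
            mx, mx_pos = x, i
            look_for_max = True
    return minima, maxima
```

With this scheme a shallow dip, such as one caused by a brief cough, does not split a breathing cycle into two, because the curve never departs from the running extremum by more than $\gamma$.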
Note that the SVC volume-time curve presents inhalation and exhalation in the opposite direction to the FVC volume-time curve. That is, while an increase in the FVC volume-time curve corresponds to exhalation, it indicates inhalation in the SVC volume-time curve. This is similar to the volume-time curves obtained from the spirometer.
FVC keypoints-In FVC, keypoints D and E are coincident in $V(t)$. Since lungs always contain a residual air volume, the amount of exhaled air volume in deep expiration is greater than inhaled air in a deep inspiration. Hence, keypoints A and B, indicating the beginning and end of deep expiration, respectively, are more detectable than other points. They were extracted, timestamped $t_A$ and $t_B$, respectively, as a pair of consecutive minimum and maximum points with the largest change in volume between them during expiration, such that
$$\{t_A, t_B\} = \underset{t_{x_i},\, t_{y_i}}{\arg\max}\; \lvert y(t_{y_i}) - x(t_{x_i}) \rvert, \quad i = 1, \ldots, n$$
where $X$ and $Y$ are the sets of volume-time curve extrema computed as minima and maxima, $x(\cdot) \in X$ and $y(\cdot) \in Y$, $t_{x_i}$ and $t_{y_i}$ are the timestamps of each minimum and maximum, and $i = 1, \ldots, n$ indexes the candidate pairs. The local maximum directly before $t_A$ was selected as E (and thus D). The first extremum of the curve was selected as C.
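The selection of A and B described above can be sketched as a search over each minimum and the maximum that follows it. This is illustrative only; it assumes the extrema have already been found and are given as sorted sample indices, and `fvc_main_effort` is a hypothetical name.

```python
def fvc_main_effort(v, minima, maxima):
    """Return (i_A, i_B): the minimum and following maximum with the largest rise."""
    best, best_rise = None, float("-inf")
    for i_min in minima:
        later = [i for i in maxima if i > i_min]
        if not later:
            continue                     # no maximum follows this minimum
        i_max = later[0]                 # first maximum after this minimum
        rise = v[i_max] - v[i_min]       # exhaled volume (FVC curve rises on exhalation)
        if rise > best_rise:
            best, best_rise = (i_min, i_max), rise
    return best
```

For example, with tidal swings of about one unit and a forced expiration of eight units, the pair bracketing the forced expiration is returned.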
In addition to the volume-time curve, we also used the flow-time curve to compute some FVC measures. The flow is defined as the rate of change of volume, i.e., $\dot{V}(t) = \partial V / \partial t$.
FVC peak flow and time zero-To compute some FVC test measures, such as FEV1, we also needed to compute the Peak Flow (PF) point and "time zero" $t_0$ (see Fig. 6). PF is the point at $t_{PF}$ with the maximum air flow speed during main effort exhalation:
$$t_{PF} = \underset{t_A \le t \le t_B}{\arg\max}\; \dot{V}(t)$$
Since FEV1 is a timed PFT measure, instead of keypoint A (timestamped $t_A$), a starting "time zero" $t_0$ keypoint is used for computing FEV1 (see Fig. 6). This is because keypoint A is affected by hesitant or delayed exhalation in the main effort maneuver, leading to an incorrect and decreased FEV1 value. After subtracting $V(t_A)$ from the estimated volume, $t_0$ is computed using the back-extrapolation approach [1]:
$$t_0 = t_{PF} - \frac{V(t_{PF})}{\dot{V}(t_{PF})}$$
SVC keypoints-In the volume-time curve $V(t)$, inhalation in SVC shows as exhalation in FVC. Thus, we still used the exhalation part of the main effort, which is more reliable, to extract B and A, similar to the FVC test and with the same notation. Keypoint E marks the beginning of inhalation in main effort and is determined as the local minimum directly before $t_B$. Like FVC, C is chosen as the first extremum of the curve and D is the local maximum directly before $t_B$. For computing SVC measures, four maxima and four minima keypoints (see $F_i$ and $G_i$ in Fig. 5(b)) from the tidal volume part are also required.
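The peak flow and time-zero computation for the FVC test can be sketched as follows. This is a minimal illustration under stated assumptions: flow is estimated with central differences at the 30 Hz frame rate, the tangent to the volume curve at peak flow is extrapolated back to zero volume (volume taken relative to keypoint A), and the names `flow` and `time_zero` are hypothetical.

```python
def flow(v, fs=30.0):
    """Central-difference estimate of dV/dt for samples taken at fs Hz."""
    n = len(v)
    out = [0.0] * n
    for k in range(n):
        lo, hi = max(0, k - 1), min(n - 1, k + 1)
        out[k] = (v[hi] - v[lo]) * fs / (hi - lo)
    return out

def time_zero(v, i_a, i_b, fs=30.0):
    """Return (t_pf, t_0) in seconds using back extrapolation from peak flow."""
    q = flow(v, fs)
    # Peak flow: sample with the largest flow between keypoints A and B
    i_pf = max(range(i_a, i_b + 1), key=lambda k: q[k])
    t_pf = i_pf / fs
    v_pf = v[i_pf] - v[i_a]               # volume relative to keypoint A
    # Tangent at peak flow reaches zero volume at t_0
    return t_pf, t_pf - v_pf / q[i_pf]
```

If the subject hesitates after keypoint A, the flat portion of the curve is skipped: $t_0$ lands where the steep part of the exhalation would have started, so a timed measure such as FEV1 is not shortened by the hesitation.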

B. Tidal Volume Analysis and Calibration
To be able to extract PFT measures from the Kinect volume-time curve and compare them with those given by the spirometer, and thus evaluate our proposed method, we needed to (a) temporally align Kinect and spirometer volume-time curves, (b) compute scaling factors, and (c) use them to calibrate the Kinect volume-time curve. We perform alignment and scaling separately for the tidal volume and main effort parts, to take into consideration any inevitable trunk movement when subjects take a deep inhalation, followed by a maximal exhalation.
After selecting the tidal volume parts of the Kinect and spirometer volume-time curves using the C and D keypoints, we performed some preprocessing operations on these two subsignals to allow them to be directly compared. The spirometer subsignal was subsampled to the Kinect sampling rate of 30 Hz. Both signals were normalized to zero mean. Finally, the two subsignals were synchronized by computing the optimal time delay using windowed cross-correlation:
$$\tau^{*} = \underset{\tau}{\arg\max} \sum_{t} V_k^{*}(t)\, V_s(t + \tau)$$
where $\tau$ is the candidate delay, and $V_k^{*}(t)$ and $V_s(t)$ denote the complex conjugate of the Kinect normalized tidal volume curve and the spirometer subsampled and normalized tidal volume curve, respectively.
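The synchronization step can be illustrated with a brute-force search over candidate lags. This sketch assumes both subsignals are already at the same sampling rate and real-valued (so the complex conjugate has no effect); the search range `max_lag` and the name `best_delay` are hypothetical choices for illustration.

```python
def best_delay(v_k, v_s, max_lag):
    """Lag (in samples) of v_s relative to v_k that maximizes their cross-correlation."""
    def zero_mean(x):
        m = sum(x) / len(x)
        return [xi - m for xi in x]

    a, b = zero_mean(v_k), zero_mean(v_s)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for t in range(len(a)):
            if 0 <= t + lag < len(b):     # only overlapping samples contribute
                score += a[t] * b[t + lag]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

A positive result means the spirometer subsignal lags the Kinect subsignal by that many samples, i.e., by `lag / 30` seconds at the Kinect frame rate.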
The tidal volume scaling factor can be computed using only a pair of consecutive minimum and maximum points [5]; however, this is not very reliable. We instead modeled it with a first degree polynomial, $V_s = \xi_{tv} \cdot V_k + \psi_{tv}$, where $V_s$ and $V_k$ are the subsampled spirometer and aligned Kinect tidal volume data, $\psi_{tv}$ is the offset between the Kinect and spirometer tidal volume parts, and $\xi_{tv}$ represents the tidal volume scaling factor. Since the Kinect and spirometer tidal volume parts were normalized to zero mean, $\psi_{tv} \approx 0$.
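Fitting the first degree polynomial above amounts to ordinary least squares on the aligned samples. The following sketch shows one way to obtain $\xi_{tv}$ and $\psi_{tv}$; the fitting method is an assumption (the text does not name it) and `fit_scaling` is a hypothetical name.

```python
def fit_scaling(v_k, v_s):
    """Least-squares fit of v_s = xi * v_k + psi; returns (xi, psi)."""
    n = len(v_k)
    mean_k = sum(v_k) / n
    mean_s = sum(v_s) / n
    cov = sum((k - mean_k) * (s - mean_s) for k, s in zip(v_k, v_s))
    var = sum((k - mean_k) ** 2 for k in v_k)
    xi = cov / var                        # slope: scaling factor
    psi = mean_s - xi * mean_k            # intercept: offset
    return xi, psi
```

When both inputs have zero mean, the intercept returned is zero up to rounding, which matches the observation that $\psi_{tv} \approx 0$.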
However, in many cases, this approach is insufficient to deal with an increasing or decreasing trend in the data that can appear in one or both of the Kinect and the spirometer data. Fig. 7(a) shows example Kinect and spirometer tidal volume curves each plotted on a different scale, with the left y-axis for the uncalibrated Kinect volume and the right y-axis for the spirometer volume (L). Both curves exhibit such a trend, which makes the extraction of a correct scaling factor (or an alignment process) a difficult task (see Fig. 7(b)). This trend might occur for one or more reasons: the use of a nasal oxygen mask by patients during the test (which affects only the spirometer data), lung hyperinflation, or the subject's body movements.
A simple approach to modeling the trend to help eliminate it would be a linear regression model. However, we found this to be insufficient due to the nonlinear nature of the trend; thus, we applied Empirical Mode Decomposition (EMD) [43] to estimate the trend more accurately. EMD is an adaptive method to decompose a nonlinear and nonstationary signal in the time domain into its individual components (Intrinsic Mode Functions or IMFs) and a residual $r$, from which no more IMFs can be extracted and which can be said to represent the signal's trend:
$$V(t) = \sum_{i=1}^{M} \mathrm{IMF}_i(t) + r(t) \qquad (10)$$
where $M$ is the number of extracted IMFs. Fig. 8(a) and (b) present the first three IMFs and the residual of a tidal volume curve (where the residual displays the signal trend), and the modified tidal volume curve after applying EMD. Fig. 7(c) shows the Kinect and spirometer tidal volume curves with their trend estimated and removed by EMD, where the Kinect curve has been calibrated using the correct tidal volume scaling factor. Note that we used the modified tidal volume curves to compute scaling factors only; all other analyses were performed on the original Kinect and spirometer data.

C. Main Effort Analysis and Calibration
As stated in Section V-B, the Kinect and spirometer volume-time curves were aligned by using only their tidal volume sections to avoid errors arising from the subject's upper body movement during main effort. Then, the main effort scaling factor ($\xi_{me}$) was obtained by solving $V_s = \xi_{me} \cdot V_k + \psi_{me}$, using only the A and B keypoints on each signal as they are less affected by motion artifacts and thus more reliable. Unlike in the tidal volume calibration process, where $\psi_{tv}$ was zero, $\psi_{me}$ here correlates with body movement and appears as an offset along the y-axis. However, in scenarios where subjects are stationary during the whole test (e.g., see Fig. 5(b)), $\psi_{me} \approx 0$ and there is no offset between the tidal volume and main effort parts.
We calibrated the tidal volume and main effort parts individually and generated two calibrated Kinect volume-time curves. For the first (tidal volume calibrated), the whole Kinect volume-time curve was scaled by multiplying by the tidal volume scaling factor $\xi_{tv}$, as computed in Section V-B. Then, it was vertically aligned with the spirometer tidal volume part by making both the Kinect and spirometer tidal volume parts zero mean, as shown in Fig. 9(a). For the second (main effort calibrated), the whole Kinect volume-time curve was scaled by multiplying by the main effort scaling factor $\xi_{me}$, computed in this section, and vertically aligned with the spirometer tidal volume part by adding the main effort offset $\psi_{me}$ to all Kinect volume-time data, as shown in Fig. 9(b).
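Since the main effort model is solved from only two keypoints, the scaling factor and offset follow directly from two linear equations. The sketch below illustrates this and the subsequent calibration of a curve; the function names and argument layout are hypothetical.

```python
def main_effort_scaling(vk_a, vk_b, vs_a, vs_b):
    """Solve v_s = xi_me * v_k + psi_me from the volumes at keypoints A and B."""
    xi_me = (vs_b - vs_a) / (vk_b - vk_a)
    psi_me = vs_a - xi_me * vk_a
    return xi_me, psi_me

def calibrate(v_k, xi, psi=0.0):
    """Scale a Kinect volume-time curve by xi and shift it by psi."""
    return [xi * v + psi for v in v_k]
```

Calling `calibrate` with $\xi_{tv}$ and a zero offset gives the tidal volume calibrated curve, while calling it with $\xi_{me}$ and $\psi_{me}$ gives the main effort calibrated curve.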

VI. COMPUTATION OF CLINICAL PFT MEASURES
FVC measures-Within an FVC spirometry test, several clinical measures are provided by the spirometer software. In addition to these numerical measures, there are two common "qualitative" presentations of the lung function test, i.e., the volume-time curve and the flow-volume loop (see Fig. 10(a) and (b)), which pulmonologists often use to visually diagnose problems in the patient's breathing function.
SVC measures-Within an SVC test, four clinical measures are provided by the spirometer software, along with only one "qualitative" presentation of lung function, i.e., the volume-time curve (see Fig. 11). We compute these measures on Kinect volume-time data as follows:
1) VC (vital capacity) as the volume change between full inspiration and complete expiration between keypoints B and A, i.e., $VC = V(t_B) - V(t_A)$;
2) IC (inspiratory capacity) as the volume change between taking a slow, full inspiration, and the passive end-tidal expiration, i.e., the difference of the volume at keypoint B and the average volume at group keypoints G within the tidal volume section, $IC = V(t_B) - \frac{1}{4} \sum_{i=1}^{4} V(t_{G_i})$;
3) TV (tidal volume) as the volume of air inspired and expired at rest condition, i.e., the average volume difference between group keypoints F and G, $TV = \frac{1}{4} \sum_{i=1}^{4} \left[ V(t_{F_i}) - V(t_{G_i}) \right]$; and
4) ERV (expiratory reserve volume) as the volume change between passive end-tidal expiration and complete expiration, i.e., the difference of the average volume at group keypoints G within the tidal volume section and the volume at keypoint A, $ERV = \frac{1}{4} \sum_{i=1}^{4} V(t_{G_i}) - V(t_A)$.
Note that, based on spirometry experiment protocols [1], each FVC and SVC test should be repeated several times (at least three) to ensure consistency.
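The four SVC measures can be computed directly from a calibrated curve once the keypoints are known. The sketch below is illustrative: it assumes the curve is already calibrated to liters, that the F keypoints are the tidal maxima and the G keypoints the tidal minima (end-tidal expiration), and `svc_measures` is a hypothetical name.

```python
def svc_measures(v, i_a, i_b, f_idx, g_idx):
    """Return VC, IC, TV, and ERV from a calibrated SVC volume-time curve.

    f_idx / g_idx: sample indices of the tidal-volume maxima F_i and minima G_i.
    """
    mean_f = sum(v[i] for i in f_idx) / len(f_idx)
    mean_g = sum(v[i] for i in g_idx) / len(g_idx)
    return {
        "VC": v[i_b] - v[i_a],     # full inspiration to complete expiration
        "IC": v[i_b] - mean_g,     # full inspiration above end-tidal level
        "TV": mean_f - mean_g,     # average resting breath size
        "ERV": mean_g - v[i_a],    # end-tidal level above complete expiration
    }
```

By construction the returned values satisfy VC = IC + ERV, which offers a quick consistency check on the extracted keypoints.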

VII. SCALING FACTOR GENERALIZATION
So far we have shown that we can compute PFT measures from the Kinect volume-time and flow-time curves that have been calibrated by applying scaling factors computed using the corresponding spirometer volume-time curve. We refer to this as an "intra-test" procedure. However, we need to remove this dependence, so that we can compute PFT measures for a new trial using only Kinect volume-time data, i.e., a more practical "intra-subject" procedure.
As the change in the distance of the Kinect to a subject's thoracic wall is directly related to the change in their lung volume, our scaling factors are specific to each subject. In theory, this relationship should remain unchanged for a subject who performs a test several times (even on different days) with the same system configuration. However, in practice, this is only true for the tidal volume scaling factors, but not for the main effort scaling factor, due to the subject's trunk motion. Since there is no significant movement during tidal volume, it should be possible to detect body movement during main effort by comparing scaling factors $\xi_{tv}$ and $\xi_{me}$. However, even when $\xi_{tv}$ and $\xi_{me}$ are very similar (i.e., $\xi_{tv}/\xi_{me} \approx 1$), which implies there is no torso motion, the Kinect volume-time curve might still be affected by body movements. These can be categorized in two ways: 1) backward motion at the beginning of deep inhalation (between E and A keypoints) for FVC and SVC tests and 2) forward lean at the beginning or middle of the deep and fast exhalation (after A in both tests), followed by a move back at the end of exhalation that compensates for the first forward lean, which might also be accompanied by the motion pattern in 1). Fig. 12(a) and (b) present two examples of volume-time curves related to categories 1) and 2) and their scaling factors. The effects of similar motion artifacts on chest volume estimation have also been reported previously by Yu et al. [8], Ostadabbas et al. [19], and Soleimani et al. [5].
The similarity of the motion patterns of trunk movements across different trials of a subject allows us to estimate the best matching scaling factors for calibrating the Kinect volume-time curve of a new trial. This means that unless there is unexpected body movement, we can train our system to learn the tidal volume and main effort scaling factors for each subject, which enables us to compute PFT measures directly from the Kinect volume-time curve without using spirometer data when testing.
Training phase: We used training data, provided as pairs of corresponding Kinect and spirometer volume-time curves from training trials, to compute the training tidal volume scaling factors $\{\xi_{tv}^{j}\}_{j=1}^{n_{tv}}$ and the training main effort scaling factors and offsets $\{(\xi_{me}^{j}, \psi_{me}^{j})\}_{j=1}^{n_{me}}$, as explained in Sections V-B and V-C, where $n_{tv}$ and $n_{me}$ are the numbers of tidal volume and main effort training trials.
Testing phase: We calibrated the Kinect volume-time curve of a test trial by applying the best matching scaling factors and offsets learned from the training phase. Our analysis showed that, because the spirometer volume-time curve is always correct, similar Kinect volume-time curves can be calibrated using similar scaling factors and offsets. Thus, to calibrate the test Kinect volume-time curve, we found the best matching scaling factors and offsets from the training phase using the curve similarity measures $F_{tv}$ and $F_{me}$ in (14) and (15), where $V_k(t)$ is the original Kinect volume-time curve, and $t_A$, $t_B$, $t_{F_i}$, and $t_{G_i}$ are automatically computed keypoint timestamps, as introduced in Section V-A. For the FVC test, the estimated main effort scaling factor $\xi_{me}$ was computed as in (16), where $F_{me}^{test}$ denotes the main effort curve similarity measure extracted from the test Kinect volume-time curve in (15), $F_{me}^{j}$ is the same measure for the $j$th training Kinect volume-time curve, $j$ indexes the training trials, $\{\xi_{me}^{j}\}_{j=1}^{n_{FS}}$ denotes the training main effort scaling factors, and $n_{FS}$ is the total number of training FVC and SVC trials for this subject. Since vital capacity, i.e., $V_s(t_A) - V_s(t_B)$, is equal for FVC and SVC tests (notwithstanding the reproducibility measurement error), we also used training SVC trials to estimate the best matching scaling factors for the FVC test trial. As no measure is computed from the tidal volume section in FVC tests, $F_{tv}$ was not extracted and therefore $\xi_{tv}$ was not computed.
Similarly, for the SVC test, the estimated tidal volume scaling factor $\xi_{tv}$ and the estimated main effort scaling factor and offset $(\xi_{me}, \psi_{me})$ were computed as in (17) and (18), where $n_S$ is the total number of SVC-only training trials. We do not use the tidal volume section of the FVC volume-time curves in the estimation of $\xi_{tv}$ because the tidal volume breathing cycles are too short in FVC tests and are not reliable for computing $F_{tv}$ and, consequently, the tidal volume scaling factor. Note that in all FVC and SVC tests, $\psi_{tv} \approx 0$.
After calibrating the Kinect volume-time curve of the test trial using the estimated tidal volume and main effort scaling factors, the PFT measures are computed as described in Section VI.
The proposed scaling factor generalization was evaluated using leave-one-out cross validation, which repeatedly takes one trial as the test data and the rest as the training data. Leave-one-out is more suitable than k-fold cross validation or other conventional validation methods because of the limited number of FVC and SVC trials for each subject.
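The matching and leave-one-out procedure can be sketched as below. This is an illustrative simplification under stated assumptions, not the method of (14)-(18): each trial is reduced to a single scalar similarity measure, "best matching" is taken to mean the smallest absolute difference between measures, and the names `best_match` and `leave_one_out` are hypothetical.

```python
def best_match(test_measure, training):
    """Return the calibration parameters of the training trial whose
    similarity measure is closest to that of the test trial.

    training: list of (similarity_measure, parameters) pairs.
    Assumption: closeness is the absolute difference of scalar measures.
    """
    closest = min(training, key=lambda pair: abs(pair[0] - test_measure))
    return closest[1]


def leave_one_out(trials):
    """For each trial of one subject, estimate its parameters from all
    the other trials of that subject."""
    estimates = []
    for i, (measure, _) in enumerate(trials):
        rest = trials[:i] + trials[i + 1:]  # every trial except the i-th
        estimates.append(best_match(measure, rest))
    return estimates


# Hypothetical (similarity measure, scaling factor) pairs for one subject.
subject_trials = [(0.10, 1.0), (0.12, 1.1), (0.50, 2.0)]
estimated = leave_one_out(subject_trials)
```

With only a handful of trials per subject, each held-out trial still leaves at least two training trials to match against, which is the practical reason for preferring leave-one-out here.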

A. System Configuration and Data Acquisition
In each acquisition, the subject was asked to sit up straight on a chair without armrests, facing the Kinect, which was placed 1.5 m from the subject at a height of 0.6 m (see Fig. 13). This distance was chosen based on our study in Section IV. The subject was asked to put on a reasonably tight T-shirt to help improve the tracking accuracy of chest motion. Although putting subjects in the supine position would have restricted their body movement during the Kinect test, we preferred to perform the test in the sitting position to simulate the spirometry setup. Moreover, it was difficult for fragile COPD patients to accomplish the main effort part of the test correctly in the supine position.
The instruments used in our experiments were the Microsoft Kinect V2 depth sensor and the "HDpft 1000 High Definition" spirometer, which provides raw volume-time and flow-time data at 200 Hz for FVC and 50 Hz for SVC. To validate the proposed method, we compared our results with measures taken from the spirometer software.
Following ethical approval, we collected 529 Kinect and spirometry sequences from 85 patients with a range of lung pathologies attending the respiratory clinic at Southmead Hospital in Bristol as they underwent their routine spirometry tests. The collection spanned several months between March and July of 2015. For each subject, at least three FVC and three SVC efforts were recorded. The 36 male and 49 female patients were aged between 24 and 83 years (mean of 61.7), with height between 147.9 and 191.2 cm (mean of 166.2 cm), weight between 19.1 and 146.8 kg (mean of 77.9 kg), and BMI between 6.9 and 45.7 kg/m$^2$ (mean of 28.1).

B. Intra-Test Results
Tables I and II report the 3-D-chest-model and the chest-averaging correlation coefficients ($\lambda_v$ and $\lambda_m$) between the Kinect and the spirometer for all FVC and SVC test measures, along with the mean ($\mu_v$ and $\mu_m$) and standard deviation ($\sigma_v$ and $\sigma_m$) of the $L_2$ error for all 85 subjects (529 sequences). For each measure, we also report the ratio of the mean of the $L_2$ error to the mean value of that measure ($\Omega_v$ and $\Omega_m$). These tables also present our previous results from [5] on 40 subjects (247 sequences). We note that the quality of the data for the first 40 subjects was very similar to that of the next 45 subjects (we verified this by observing the similarity of the correlation results for the two sets). This was expected, as all the data were captured under similar conditions in the same clinic. The results show that the Kinect and the spirometer correlate well for the FEV1 measure in the FVC tests and across all the SVC measures. The correlation for the other FVC measures is less strong due to the potential issues described later in Section VIII-E. The results from both volume estimation methods are very close, with those from the chest-averaging-based method just edging ahead. This confirms that the 3-D-chest-model volume estimation method, with its greater space requirements and time complexity, does not necessarily obtain better results than the simple and fast chest-averaging approach. The FVC and VC results (gray background rows) are highly correlated due to the rescaling of the y-axis in the volume-time curves using their respective keypoints (A and B).
In comparison to our previous work [5], where we performed only intra-tests for 40 patients, the proposed method achieved extremely similar, if not better, results. For example, across 85 patients including the same 40 from [5], we obtained reduced mean error (in $\mu_m$) for all measures except VC and FEF$_{25\text{--}75\%}$, and an improvement in the TV measure correlation coefficient ($\lambda_v$ and $\lambda_m$) and mean error ($\mu_v$ and $\mu_m$).
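For readers who want to reproduce this kind of summary on their own data, the sketch below computes the four reported quantities for one measure. Several choices are assumptions of this example rather than details confirmed by the text: the correlation is taken to be Pearson's, the $L_2$ error of a scalar measure is taken as the absolute difference, the standard deviation is the population form, and the ratio is normalized by the mean spirometer value.

```python
from math import sqrt
from statistics import mean, pstdev


def agreement(kinect, spiro):
    """Summarize agreement between paired Kinect and spirometer values
    of a single measure (one value per sequence)."""
    mk, ms = mean(kinect), mean(spiro)
    cov = sum((k - mk) * (s - ms) for k, s in zip(kinect, spiro))
    var_k = sum((k - mk) ** 2 for k in kinect)
    var_s = sum((s - ms) ** 2 for s in spiro)
    errors = [abs(k - s) for k, s in zip(kinect, spiro)]
    mu = mean(errors)
    return {
        "lambda": cov / sqrt(var_k * var_s),  # Pearson correlation (assumed)
        "mu": mu,                             # mean error
        "sigma": pstdev(errors),              # standard deviation of error
        "omega": mu / ms,                     # mean error / mean measure (assumed)
    }


# Hypothetical paired values in litres.
stats = agreement([1.0, 2.0, 3.0], [1.1, 2.1, 3.1])
```

Note that a constant offset between the two devices leaves the correlation at 1 while the mean error stays non-zero, which is why both quantities are worth reporting side by side.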

C. Intra-Subject Results
Generalizing the scaling factor to compute intra-subject FVC and SVC measures is one of the major extensions in this study compared to our previous work [5]. Tables III and IV present the correlation coefficients ($\lambda_v$ and $\lambda_m$), and the mean ($\mu_v$ and $\mu_m$) and standard deviation ($\sigma_v$ and $\sigma_m$) of the $L_2$ error for the FVC and SVC computed measures for all 85 subjects. They also report the ratio of the mean of the $L_2$ error to the mean value of that measure ($\Omega_v$ and $\Omega_m$). Similar to the intra-test results, the chest-averaging-based method provides slightly better results.
In the FVC test results in Table III, $\lambda_v$ and $\lambda_m$ indicate strong correlation of the FVC and FEV1 measures against the spirometer, with the other five measures correlating reasonably well at a minimum of 0.603 for FEF$_{75\%}$ in the chest-averaging model. Furthermore, good correlation can be seen between the intra-subject and intra-test FVC measures (see Tables I and III).
Fig. 18. The proposed method's error increases due to a subject's inevitable trunk movement while they blow harder and faster into the spirometer to achieve higher (better) PEF and FEF$_{25\%}$ measures.
The SVC results $\lambda_v$ and $\lambda_m$ in Table IV also show strong correlation against the spirometer for the VC, IC, and TV measures and good correlation for ERV. However, the differences between the intra-subject mean ($\mu_v$ and $\mu_m$) and standard deviation ($\sigma_v$ and $\sigma_m$) of errors (see Table IV) and their intra-test counterparts ($\mu_v$ and $\mu_m$, and $\sigma_v$ and $\sigma_m$ from Table II) are higher than the corresponding differences in the FVC test. This is because SVC requires two scaling factors for the tidal volume and main effort parts of the curve, in addition to estimating the offset $\psi_{me}$.

D. Statistical Analysis of Intra-Subject Scaling Factors
The tidal volume and main effort test trials are calibrated using intra-subject scaling factors $\xi_{tv}$ and $\xi_{me}$, which are chosen from the training sets $\{\xi_{tv}^{j}\}_{j=1}^{n_S}$ and $\{\xi_{me}^{j}\}_{j=1}^{n_{FS}}$, respectively, using (16), (17), and (18) based on the similarity measures in (14) and (15). The performance of the similarity measures, in terms of choosing the best intra-subject scaling factors from the training set, is evaluated by computing the normalized $L_2$ errors in (19) and (20), where $\xi_{tv}^{c}$ and $\xi_{me}^{c}$ are the closest scaling factors in the training set to the original scaling factors of the test trial, $\xi_{tv}^{o}$ and $\xi_{me}^{o}$. The original scaling factors were computed using the corresponding spirometer data as explained in Sections V-B and V-C. Fig. 14(a) and (b) report the distribution of these errors for all tidal volume and main effort trials, respectively, in the range 0-30% at 5% intervals and then in the entire 30-100% range. As can be seen, ∼83% of tidal volume scaling factors and ∼83% of main effort scaling factors are within an error of less than 10%. Only ∼2% of tidal volume scaling factors and ∼1% of main effort scaling factors have errors of greater than 30%.
Further, for each test trial, to compare the estimated intra-subject tidal volume and main effort scaling factors $\xi_{tv}$ and $\xi_{me}$ to the original scaling factors $\xi_{tv}^{o}$ and $\xi_{me}^{o}$, their normalized $L_2$ error is computed similarly to (19) and (20). As seen in Fig. 14(c) and (d), which present the distribution of errors for all tidal volume and main effort trials, ∼81% of tidal volume scaling factors and ∼87% of main effort scaling factors have an error of less than 15%. Only ∼4% of tidal volume scaling factors and ∼2% of main effort scaling factors have an error of greater than 30%.
We also analyzed the correlation between the tidal volume and main effort scaling factor normalized $L_2$ errors, $E^{o}_{\xi_{tv}}$ and $E^{o}_{\xi_{me}}$, and the error of the FVC and SVC computed measures. Fig. 15(a) and (b) present this correlation for the FVC and TV measures. As can be seen, there is a high correlation between the FVC measure error and the main effort scaling factor error across all trials. This correlation is less strong for the TV measure error and the tidal volume scaling factor error. The reason is that the tidal volume scaling factors are computed using all data points of the tidal volume part of the volume-time curve, whereas the TV measure itself is computed using only group keypoints F and G (see (12)). In contrast, the FVC measure and the main effort scaling factors are both computed using the same keypoints A and B. Thus, they are better correlated (see Fig. 15(a)) than the TV measure error and the tidal volume scaling factor error (see Fig. 15(b)).
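The error analysis above can be illustrated with a short sketch. The exact form of (19) and (20) is not reproduced here; as an assumption, the normalized error is taken to be the absolute difference relative to the original scaling factor, expressed in percent. The function names and the seven-bin layout (six 5%-wide bins from 0 to 30%, then one bin for everything at or above 30%) are choices made for this example.

```python
def normalized_error(estimate, original):
    """Relative error of a scaling factor in percent (assumed form)."""
    return abs(estimate - original) / abs(original) * 100.0


def closest_factor(training_factors, original):
    """Training scaling factor closest to the original (spirometer-derived) one."""
    return min(training_factors, key=lambda xi: abs(xi - original))


def error_histogram(errors_pct):
    """Count errors in bins [0,5), [5,10), ..., [25,30), and [30, inf)."""
    counts = [0] * 7
    for e in errors_pct:
        counts[min(int(e // 5), 6)] += 1  # anything >= 30% lands in the last bin
    return counts


# Hypothetical training factors and original factor for one test trial.
xi_c = closest_factor([0.8, 1.05, 1.4], 1.0)
err = normalized_error(xi_c, 1.0)
hist = error_histogram([2.0, 7.5, 12.0, 45.0])
```

Comparing the error of the closest available factor with the error of the factor actually selected separates two failure sources: a training set that simply lacks a good factor, and a similarity measure that fails to pick it.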

E. Measurement Stability
It is important to note that even spirometer readings differ between multiple consecutive trials for the same subject, thus requiring at least three trials with similar readings before a clinician considers the results. This is illustrated in Fig. 16(a)-(d), which presents some example measures (FVC, FEF$_{25\%}$, PEF, and TV) provided by the spirometer and the proposed method for one subject from four consecutive trials.
To examine the relationship between spirometry reproducibility and the proposed method's error in the computation of measures, we obtained the standard deviation of each measure and its corresponding error in all repeated trials for each patient. Fig. 16(e)-(h) shows the computed correlation for the FVC, FEF$_{25\%}$, PEF, and TV measures. These results indicate that when the measures provided by the spirometer are less consistent, the error between measures obtained by the proposed method and the spirometer increases.
The subject's body movement during a test is a primary reason for poor correlation, and this is more evident in main effort measures. A specific example of how body movement (due to expiration pressure) can affect the FEF$_{25\text{--}75\%}$ measure is shown in Fig. 17, where the estimation of 0.75FVC is sometimes compromised. In another observation, illustrated in Fig. 18, we found that as the FEF$_{25\%}$ and PEF readings from the spirometer increase, our proposed method's error also increases. To the best of our knowledge, this happens as subjects try to attain better lung function measures by blowing faster into the spirometer, which inevitably results in more trunk movement.
PEF and FEF$_{25\%}$ are more affected by the patient's trunk translation because (a) they are calculated using flow data, which is the first derivative of the volume over time and so is sensitive to displacements, and (b) they are located at the beginning of the main effort section (see Fig. 19), which is more affected by the movement. Even subtle movements caused by leaning forward, due to forcible expiration, affect the keypoint positions of these measures. In Fig. 19(a), although the main effort parts of the curves match very well, their flow-volume loops in Fig. 19(b) differ considerably between the start of exhalation and the location of the FEF$_{50\%}$ point.
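The sensitivity of flow-based measures to small displacements can be seen in a toy example. The sketch below differentiates a volume-time curve with finite differences and takes the peak flow magnitude; the differencing scheme, the function names, and the sample values are assumptions for illustration and are not the authors' processing chain.

```python
def flow_from_volume(volume, fs):
    """Approximate flow (L/s) from volume samples (L) taken at fs Hz,
    using central differences inside the curve and one-sided
    differences at its two ends."""
    n = len(volume)
    flow = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        flow.append((volume[hi] - volume[lo]) * fs / (hi - lo))
    return flow


def peak_flow(volume, fs):
    """Largest flow magnitude over the curve."""
    return max(abs(f) for f in flow_from_volume(volume, fs))


# Hypothetical exhaled-volume ramp sampled at 10 Hz, with and without
# a 0.05 L artifact on a single sample (e.g., a brief forward lean).
clean = [0.0, 0.1, 0.2, 0.3]
perturbed = [0.0, 0.15, 0.2, 0.3]
pef_clean = peak_flow(clean, 10)
pef_perturbed = peak_flow(perturbed, 10)
```

In this toy case the artifact changes the final volume not at all, yet it raises the peak flow from about 1.0 to about 1.5 L/s, which mirrors the observation that volume curves can match well while flow-volume loops differ.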

IX. CONCLUSION AND FUTURE WORK
We proposed a remote, noninvasive depth-based approach for PFT. The proposed system generates Kinect-based volume-time and flow-time curves, and by locating several keypoints automatically, we computed several FVC and SVC measures, which we compared with those of a spirometer and evaluated for reproducibility. We analyzed the subject's trunk motion pattern to generalize scaling factors so that intra-subject PFT measures can be computed for each subject without having to use a spirometer to calibrate each trial. We validated our system in a clinical environment with 85 actual patients and achieved high intra-test and intra-subject correlation against the spirometer.
This paper is a considerable step forward in the development of remote non-contact monitoring of patients with respiratory disease. This "real world" clinical data, collected from a large group of patients with a wide range of lung function, is unique. We are able to accurately obtain respiratory measures remotely, which has potential clinical applications for monitoring of patients in the home, gating (timing) of thoracic imaging, and synchronization with ventilatory support. In summary, in this paper, we have taken a vital step toward the aim of applying the Kinect as an independent surrogate for spirometry by only needing the spirometer one time for each patient to obtain a personalized scaling factor.
In our future work, we plan to use two Kinects to decouple body motion and chest motion to increase the accuracy of our PFT measures. We also plan to use machine learning techniques to generalize the scaling factors by introducing parameters such as height, weight, and age in the estimation, and so remove the need for subject-specific spirometry.
Data from this study are unavailable for sharing due to insufficient consent from the study participants.