Report Cover Table of Contents Sec. 1 Sec. 2 Sec. 3 Sec. 4 Sec. 5 Sec. 6 Sec. 7 Sec. 8

SECTION 4
SAMPLING RESULTS AND COMPARISON

4.1 DATA PRESENTATION

Field measurements collected during this demonstration are summarized in Tables 3.2 and 3.3. Laboratory analytical results are included on CD as an attachment to this report.

4.2 DATA VALIDATION

A project-specific Level III data validation protocol was performed, which evaluated the sample data, the QC data, and the results summarized on AFCEE reporting forms. In performing the data validation, it was assumed that the laboratory's documentation was acceptable and that the data reported by the laboratory were an accurate representation of the raw data. The raw data were not reviewed. A complete review of the applicable data was performed, and the project-specific QAPP and the McClellan QAPP 5.0 were used as the primary tools in the validation of the data. The data quality assessment report (Appendix A) is based on the reviewed information and on the data quality specifications of the project QAPP, as well as Sections 1-17 of the McClellan AFB QAPP 5.0 and the appended SOP McAFB-028 (Data Review Procedures) and SOP McAFB-029 (Data Validation Standard Operating Procedures).

In accordance with the Work Plan (Parsons, 2004a) and as described in Section 3.4, QA/QC samples were collected during this demonstration. These samples included field duplicate, MS/MSD, equipment rinseate, source water blank, purified water, and trip blank samples. A brief summary of the data validation results is provided in the following paragraphs, and more complete details are presented in Appendix A.
Some data quality issues were noted either in the laboratory case narratives or during the data validation process. Despite these issues, nearly all of the validated data were deemed usable for the intended purposes (only one result was rejected) based on this validation. The reader is directed to Appendix A for a detailed discussion of the data validation results.

4.3 WELL-SPECIFIC DATA PLOTS

Figures were prepared that present the concentrations of selected analytes in each well, as reported for each sampling method used and for each sampling depth (shallow, intermediate, and deep). These figures are included in Appendix B. Graphs were prepared for one VOC of concern (trichloroethene [TCE]), one anion (sulfate), one reduction-oxidation (redox)-sensitive metal (iron), one metal that is less redox-sensitive (zinc), 1,4-dioxane, and hexavalent chromium.

Results for the three-volume purge are shown using a vertical line across all depths because that method is not depth specific. Results for the low-flow purge are shown as a single point located at the intermediate depth, despite uncertainty about the depth-discrete nature of a low-flow sample. When a low-flow sample is collected, determining the portion of the screened interval of the aquifer that contributed water to the sample can be problematic. As noted in Section 4.4, for instances where more than one value was available per comparison, the maximum value was used in the sampling results comparison.

4.4 SAMPLING RESULTS COMPARISON

Numerous potential methods of data evaluation are possible due to the relatively large amount of analytical data and number of comparisons. For this report, three different types of evaluation processes were used to compare the data sets: conventional statistical analyses (Section 4.4.1), other quantitative comparative tools (Section 4.4.2), and a holistic qualitative assessment (Section 4.4.3).
Each of these processes was applied with the objective of identifying general trends or tendencies present in the data sets. After all of the processes were applied, overall conclusions related to sampler performance were made. In this comparative analysis, the results for each sampler type were compared to the corresponding results (i.e., same well, same depth, same analyte) for each of the other sampling methods. Additionally, the analytical data set was subdivided into the following six categories for comparison purposes: all data combined, 1,4-dioxane only, anions only, hexavalent chromium only, metals only, and VOCs only.
Each sampler-to-sampler comparison was performed for each of the analytical subgroups listed above, resulting in a total of 113 dataset comparisons. The quantitative evaluation processes described in Sections 4.4.1 and 4.4.2 (Conventional Statistical Methods and Other Quantitative Comparative Tools, respectively) were applied to each of these 113 comparison instances. The results of these quantitative comparisons were then considered using a holistic qualitative review to derive final conclusions about each specific comparison. Prior to applying the statistical analysis tools, the datasets used for comparison were pared down through the application of several logical filters. These filters are described below.
Tables 4.1 through 4.6 show the number of data pairs that were available for each comparison after all filters had been applied to the data set. For the quantitative evaluation processes, a lower confidence or meaning was ascribed to comparisons with fewer data pairs. In Tables 4.1 through 4.6, instances where fewer than 10 data pairs were available for a particular comparison are highlighted in red in the "Number of Comparisons" column.

TABLE 4.1 TABLE 4.2 TABLE 4.3 TABLE 4.4 TABLE 4.5 TABLE 4.6

4.4.1 CONVENTIONAL STATISTICAL ANALYSES

The distribution of the data was evaluated in order to select the most appropriate statistical methods to apply to the data. Conventional statistical methods were then selected and used to evaluate the data sets that were being compared.

4.4.1.1 DATA DISTRIBUTION

Each of the data sets was tested for normality in order to determine whether parametric or nonparametric statistical tests were appropriate for the data analysis. The Shapiro-Wilk W test was used to determine data distribution. Several groupings of data were evaluated for normality as described below.

Initially, data sets for each of the eight different sampling methods were tested for each of the six different compound groupings (all data combined, 1,4-dioxane only, anions only, hexavalent chromium only, metals only, and VOCs only). Data used for normality testing in this application included both primary and field duplicate sample results; results that were not detected or that were rejected during data validation were excluded.

Additionally, because the difference between two sampling methods was the end use of the data for comparison purposes, the Shapiro-Wilk W test also was applied to the populations of differences between the two sampling methods being compared. Data used for this variant of normality testing were taken from the pared-down data sets described in Section 4.4.
As an example of this variant of normality testing, each time there was an available comparison between two sampling methods (e.g., all VOC concentrations at the shallow sample depth in well MW-1065 obtained using the PDBS and RCS sampling methods), the difference between those two concentrations was calculated. After the differences were calculated for all possible comparisons of analytical results obtained using those two sampling methods, the Shapiro-Wilk W test was then applied to that population (e.g., PDBS versus RCS for VOCs only).

The results of all normality tests are included in Appendix C. Tests for normality failed (i.e., data sets are not normally distributed) for almost all of the data subsets evaluated. Some exceptions were noted (see Appendix C) but were due mostly to small sample populations (e.g., 1,4-dioxane in the difference between PsMS and RPPS results). The overall lack of normally distributed data sets supports the use of nonparametric statistical tools as described below.

4.4.1.2 WILCOXON MATCHED-PAIRS SIGNED RANKS TEST

The Wilcoxon Matched-Pairs Signed Ranks Test (Wilcoxon test) was applied to determine if two dependent variables (e.g., RPPS and HydraSleeve® analytical results) represent two different populations. The Wilcoxon test is a nonparametric procedure used in hypothesis testing when one or more of the assumptions of the Student's paired t-test (e.g., normal population distributions) are violated. The Wilcoxon test determines if the median of the differences between the pairs of data (e.g., the difference D between the RPPS measurement and the HydraSleeve® measurement for a given well) is equal to zero. If a significant difference is obtained, it indicates that there is a high likelihood that the two data sets represent different populations. A test statistic (the Wilcoxon T statistic) is calculated and associated with a p-value (and corresponding confidence level).
For example, a Wilcoxon T statistic that resulted in a p-value of 0.03 would correspond to a confidence level of 97 percent that the two samples represent two different populations. Tables 4.1 through 4.6 include summaries of the results of the Wilcoxon test analyses. Values presented in these tables are the confidence level (i.e., 1 minus the p-value) that the two sampling methods represent different populations. For this analysis, if the confidence was greater than or equal to 90 percent, the two populations were deemed to be different at a statistically significant level, and the value is highlighted in yellow.

4.4.1.3 SIGN TEST

The sign test is a nonparametric alternative to the Student's paired t-test for dependent samples. The test is applicable to situations where the researcher has two measures (e.g., under two conditions) for each subject and wants to establish whether or not the two measurements (or conditions) are different. The only assumption required by this test is that the underlying distribution of the variable of interest is continuous; no assumptions about the nature or shape of the underlying distribution are required. The test simply computes the number of times (across subjects) that the value of the first variable (A) is larger than that of the second variable (B). Under the null hypothesis, which states that the two variables are not different from each other, this is expected to be the case about 50 percent of the time. Based on the number of observed cases where A is greater than B, a p-value and associated confidence level can be calculated for the data set.

Tables 4.1 through 4.6 include summaries of the results of the sign test analyses. Values presented in these tables are the confidence that the two sampling methods represent different populations. For this analysis, if the confidence was greater than or equal to 90 percent, the two populations were deemed to be different at a statistically significant level, and the value is highlighted in yellow.
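The two nonparametric tests described in Sections 4.4.1.2 and 4.4.1.3 can be illustrated with a minimal Python sketch. The function names, the use of a normal approximation (with tie correction and no continuity correction) for the Wilcoxon p-value, and the dropping of zero differences are assumptions made for illustration; the report does not state which software or variant was used. The reported "confidence" corresponds to 1 minus the p-value.

```python
import math

def sign_test(a, b):
    """Two-sided exact sign test on paired results a[i], b[i].

    Returns (p_value, percent of non-equal pairs where A > B).
    Pairs with equal values are dropped (assumed convention).
    """
    n_pos = sum(1 for x, y in zip(a, b) if x > y)
    n_neg = sum(1 for x, y in zip(a, b) if x < y)
    n = n_pos + n_neg
    if n == 0:
        return 1.0, float("nan")
    k = min(n_pos, n_neg)
    # Exact binomial tail under the null hypothesis P(A > B) = 0.5
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail), 100.0 * n_pos / n

def wilcoxon_signed_rank(a, b):
    """Two-sided Wilcoxon matched-pairs signed ranks test.

    Uses the large-sample normal approximation (an assumption of
    this sketch); zero differences are dropped and tied absolute
    differences receive average ranks.
    """
    d = [x - y for x, y in zip(a, b) if x != y]
    n = len(d)
    if n == 0:
        return 1.0
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, diff in zip(ranks, d) if diff > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    if var <= 0:
        return 1.0
    z = (w_plus - mean) / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value

def confidence_percent(p_value):
    """Confidence level as used in the tables: 1 minus the p-value."""
    return 100.0 * (1.0 - p_value)
```

With these helpers, a comparison would be flagged as statistically different when `confidence_percent(p)` is at least 90, mirroring the threshold stated above. For small numbers of data pairs an exact Wilcoxon distribution would be preferable to the normal approximation.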
The results of the sign test also indicate the percentage of times that non-equal values are greater than or less than the comparative set of values (i.e., the percent of times that values in population A were greater than values in population B). These values also are shown in Tables 4.1 through 4.6.

4.4.2 OTHER QUANTITATIVE COMPARATIVE TOOLS

In addition to the traditional statistical tools discussed in Section 4.4.1, two other quantitative tools were used to compare the various combinations of data sets.

4.4.2.1 LINEAR REGRESSION

The results for each sampling method were plotted against the corresponding results for each of the other sampling methods using X-Y scatter plots. Best-fit linear trend lines were then fitted to these data sets, and the slope and goodness-of-fit (R²) value for each line were calculated. Best-fit linear trend lines were fitted to each of the subgroups of compounds/analytes listed in Section 4.4. These plots are included in Appendix D.

Slopes that are close to 1 suggest that, on average, the results from the two sampling devices being compared approach a 1-to-1 ratio, whereas higher or lower slopes suggest that one sampling method is more likely to result in higher or lower concentrations than the other method. Likewise, the closer the R² value is to 1, the better the fit of the data to the trend line, and the lower the degree of scatter of data about the best-fit linear trend line.

Tables 4.1 through 4.6 include summaries of the slope and R² values for each of the figures discussed above. The slope and R² values shown in Tables 4.1 through 4.6 are highlighted in yellow to indicate when the two populations being compared were deemed to not be similar to each other based on the magnitude of the slope. The following guidelines were followed when applying highlighting to these values:

Slope Guidelines
R² Guidelines
The threshold values described above were selected somewhat arbitrarily, but also were based on a qualitative review of the data as described in Section 4.4.3. The guidelines established for R² values were used primarily in the qualitative evaluation.

4.4.2.2 MEDIAN RPD

Another quantitative analysis tool applied to the data sets is referred to as the median relative percent difference (RPD). The first step in this analysis is to calculate the RPD of each data pair using the following equation:
Where:
A positive RPD indicates that the result from sampling method A is higher than the result from sampling method B, while a negative RPD indicates the opposite. RPDs close to zero generally indicate that results from both sampling methods were similar.

Once all the RPDs were calculated, the median of the RPDs for each data comparison group was calculated by ranking the RPD values from lowest to highest and choosing the middle value of the ranked set of calculated RPDs. If the number of RPD values was even, the median was taken as the mean of the middle two values. This median RPD was then used as an indicator of the comparability of the two sampling methods for each compound/analyte subset.

A positive value for the median RPD indicated that sampling method A results were more frequently higher than sampling method B results (the reverse is true for negative values). Additionally, the closer the median RPD was to zero, the more likely the two sampling methods returned similar results (essentially, for every time sampling method A was greater than sampling method B, there was a corresponding time where sampling method B was greater than sampling method A). Conversely, the farther the median RPD was from zero in either direction, the more likely one sampling method was to return results that were significantly greater than those of the other method.

For this analysis, a median RPD that was either greater than or equal to 10 or less than or equal to -10 was considered to indicate that one method was more likely to return a meaningfully higher (or lower) concentration than the other sampling method compared. Median RPD values between -10 and 10 were considered to indicate that both sampling methods returned similar concentrations. As with the guideline values described for the linear regression analysis, these values were selected somewhat arbitrarily, but also were based on a qualitative review of the data as described in Section 4.4.3.
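The median RPD procedure described above can be sketched in a few lines of Python. Because the RPD equation itself is not reproduced in this copy of the section, the sketch assumes the conventional signed form, 100 × (A − B) / ((A + B) / 2), which is consistent with the sign convention stated in the text; the function names and the example values are hypothetical.

```python
from statistics import median

def rpd(a, b):
    """Signed relative percent difference between paired results.

    Assumes the conventional definition: the difference divided by
    the mean of the two results, expressed in percent. Positive when
    method A is higher than method B.
    """
    mean = (a + b) / 2.0
    if mean == 0:
        raise ValueError("RPD is undefined when both results are zero")
    return 100.0 * (a - b) / mean

def median_rpd(pairs):
    """Median of the RPDs over all (A, B) data pairs in a comparison.

    statistics.median averages the two middle values when the number
    of pairs is even, matching the rule described in the text.
    """
    return median([rpd(a, b) for a, b in pairs])

def classify(median_value, threshold=10.0):
    """Apply the +/-10 guideline stated in Section 4.4.2.2."""
    if median_value >= threshold:
        return "method A higher"
    if median_value <= -threshold:
        return "method B higher"
    return "similar"

# Hypothetical data pairs (method A result, method B result)
pairs = [(10.0, 10.0), (12.0, 8.0), (8.0, 12.0), (30.0, 10.0)]
m = median_rpd(pairs)
print(m, classify(m))
```

For the hypothetical pairs shown, the individual RPDs are 0, 40, -40, and 100, so the median is the mean of 0 and 40.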
Tables 4.1 through 4.6 include summaries of the results of the median RPD analysis. Median RPD results greater than or equal to 10 or less than or equal to -10 are highlighted in yellow.

4.4.3 HOLISTIC QUALITATIVE ASSESSMENT

Each of the statistical analyses described above was applied to the 113 possible comparison combinations. Of the 113 possible comparison combinations, 26 (23 percent) had sufficiently small populations (i.e., fewer than 10 data pairs) that the results of the statistical analyses are not considered to be particularly meaningful. Of the remaining 87 combinations, there were 41 instances where both the conventional statistical and other quantitative comparison tests resulted in consistent observations. Conversely, there were 46 instances where the results of these tests were not internally consistent.

If the results of each of the four quantitative comparisons were consistent for a particular comparison (as shown in Tables 4.1 through 4.6 by having consistent highlighting of all four comparative test results), the resulting observation was validated and deemed correct without further review. The resulting observation is shown in Tables 4.1 through 4.6 under the column titled "Holistic Conclusion." For those instances where the results of the four quantitative analyses varied, the results of the two populations being compared were scrutinized qualitatively, and a general conclusion regarding the comparison was made based not only on the results of the statistical analyses, but also on professional judgment. For example, some of the following criteria were considered during the holistic qualitative evaluation.
A discussion of the comparison results is presented in Section 6. However, readers interested in better understanding the comparison results for particular analytes and/or sampling methods are encouraged to perform a detailed review of all of the comparison results presented in Tables 4.2 through 4.6 rather than limiting their review to the holistic conclusions.

For example, the holistic conclusion for comparison of VOC concentrations obtained using the three-volume purge and HydraSleeve® methods is that these methods provided essentially equivalent results. The orange highlight indicates a lower degree of confidence in this conclusion because the results of the comparison tests were not all internally consistent. Further inspection of the comparison results shown in Table 4.6 indicates that the sign and Wilcoxon tests both indicated that the two data sets are statistically similar. However, the RPD and X-Y scatter slope/R² tests both indicate differences in the data sets. The slope result (0.59) indicates that the VOC concentrations obtained using the three-volume purge method tended to be higher than the concentrations obtained using the HydraSleeve®. However, the relatively low R² value (0.50) indicates a high degree of scatter about the best-fit trend line and a correspondingly low confidence that the slope value is accurate and meaningful. Therefore, some comparisons termed "equal" in the holistic sense are more equal than others (i.e., "equal" defines a range of conditions rather than one specific condition).

The combined (i.e., all data) results presented in Table 4.1 are solely for illustration purposes as a way to provide a summary analysis of the entire evaluation. However, these summary data may be misleading when compared with the results for the individual analytes or analyte groups presented in Tables 4.2 through 4.6.
Tables 4.2 through 4.6 should be used to evaluate a particular sampling method's utility for a specific analyte or analyte group.