Hydrasleeve No-purge Passive Sampler
U.S. Patents No. 6,481,300; No. 6,837,120; others pending

SECTION 4

SAMPLING RESULTS AND COMPARISON

4.1 DATA PRESENTATION

Field measurements collected during this demonstration are summarized in Tables 3.2 and 3.3. Laboratory analytical results are included on CD as an attachment to this report.

4.2 DATA VALIDATION

A project-specific “Level III” data validation was performed; it evaluated the sample data, the QC data, and the results summarized on AFCEE reporting forms. In performing the data validation, it was assumed that the laboratory’s documentation was acceptable and that the data reported by the laboratory were an accurate representation of the raw data. The raw data were not reviewed. A complete review of the applicable data was performed, and the project-specific QAPP and the McClellan QAPP 5.0 were used as the primary tools in the validation of the data.

The data quality assessment report (Appendix A) is based on the reviewed information, and on the data quality specifications of the project QAPP, as well as Sections 1-17 of the McClellan AFB QAPP 5.0 and the appended SOP McAFB-028 (“Data Review Procedures”) and SOP McAFB-029 (“Data Validation Standard Operating Procedures”).

In accordance with the Work Plan (Parsons, 2004a) and as described in Section 3.4, QA/QC samples were collected during this demonstration. These samples included field duplicate, MS/MSD, equipment rinseate, source water blank, purified water, and trip blank samples. A brief summary of the data validation results is provided in the following paragraphs, and more complete details are presented in Appendix A.

  • Accuracy is considered acceptable for all VOC, 1,4 dioxane, and anion results, all but one hexavalent chromium result, and all metals results with the exception of the aluminum result in several samples.
  • Overall precision (sampling and analysis) is considered to be acceptable for all parameters, recognizing that, as shown in Table 3.7, a field duplicate HydraSleeve® sample was not collected. Therefore, information regarding precision of the HydraSleeve® sampling process is not available.
  • Analytical precision is considered to be acceptable, recognizing that in the instances where project samples were not analyzed as MS/MSDs, measurements of accuracy and analytical precision based on MS/MSD results were not developed for samples collected using a given sampling technology.
  • Representativeness is considered to be acceptable for all parameters, with the exception of many of the extremely low (below or near the practical quantitation limit [PQL]) results for VOCs, anions, and metals that have been qualified as undetected (“U”) due to associated contamination of laboratory method blanks or field blanks.
  • Completeness is considered to be acceptable for all parameters.

Some data quality issues were noted either in the laboratory case narratives or during the data validation process. Despite these issues, nearly all of the validated data were deemed usable for the intended purposes (only one result was rejected) based on this validation. The reader is directed to Appendix A for a detailed discussion of the data validation results.

4.3 WELL-SPECIFIC DATA PLOTS

Figures were prepared that present the concentrations of selected analytes in each well, as reported for each sampling method used and for each sampling depth (shallow, intermediate, and deep). These figures are included in Appendix B. Graphs were prepared for one VOC of concern (trichloroethene [TCE]), one anion (sulfate), one reduction-oxidation (redox)-sensitive metal (iron), one metal that is less redox-sensitive (zinc), 1,4 dioxane, and hexavalent chromium. Results for the three-volume purge are shown using a vertical line across all depths since that method is not depth specific. Results for the low-flow purge are shown as a single point located at the intermediate depth, despite uncertainty about the depth-discrete nature of a low-flow sample. When a low-flow sample is collected, determining the portion of the screened interval of the aquifer that contributed water to the sample can be problematic. As noted in Section 4.4, for instances where more than one value was available per comparison, the maximum value was used in the sampling results comparison.

4.4 SAMPLING RESULTS COMPARISON

Numerous potential methods of data evaluation are possible due to the relatively large amount of analytical data and number of comparisons. For this report, three different types of evaluation processes were used to compare the data sets:

  • Conventional statistical methods,
  • Other quantitative comparative tools, and
  • Holistic qualitative data evaluation.

Each of these processes was applied with the objective of identifying general trends or tendencies present in the data sets. After all of the processes were applied, overall conclusions related to sampler performance were made. In this comparative analysis, the results for each sampler type were compared to the corresponding results (i.e., same well, same depth, same analyte) for each of the other sampling methods. Additionally, the analytical data set was subdivided into the following six categories for comparison purposes:

  1. All data combined,
  2. 1,4 dioxane,
  3. Anions,
  4. Hexavalent chromium,
  5. Metals, and
  6. VOCs.

Each sampler-to-sampler comparison was performed for each of the analytical subgroups listed above, resulting in a total of 113 dataset comparisons. The quantitative evaluation processes described in Sections 4.4.1 and 4.4.2 (Conventional Statistical Methods and Other Quantitative Comparative Tools, respectively) were applied to each of these 113 comparison instances. The results of these quantitative comparisons were then considered using a holistic qualitative review to derive final conclusions about each specific comparison.

Prior to applying the statistical analysis tools, the datasets used for comparison were “pared down” through the application of several logical filters. These filters are described below.

  • Instances where both results being compared were not detected (e.g., TCE was not detected in both low-flow purge and PDB samples) were excluded from the data set.
  • If a result was qualified as non-detect (U) based on data validation only, it was excluded from the comparative analyses. This alleviates any concerns about skewing the dataset comparisons due to biases that may be caused by laboratory or field contamination.
  • One result was rejected based on the data validation; this value was not used in the statistical analyses.
  • For instances where more than one value was available per comparison (i.e., multidepth sampling versus single-point sampling, primary and duplicate samples), the maximum value was used in the statistical analyses.
  • For instances where a result was not detected (ND) at the method detection limit (MDL) using one sampling method, and the corresponding result using the other sampling method was a detected value, a value of one-half of the MDL was used for the ND measurement in the statistical analyses. This permitted use of a log-log scale to plot results; if a value of zero had been assigned to the result instead, it would not plot on that type of scale. One exception to this filter was applied, however. In circumstances where one result was ND and the other result being compared was detected but had a lower MDL, the comparison was excluded from the statistical analyses. This prevented the comparison from being biased, because one-half of the MDL for the non-detected analyte may have been greater than the detected result.
  • One PDB sample was analyzed for 1,4 dioxane to see if that compound would diffuse through the membrane. Although 1,4 dioxane was detected in other samples from the same well, it was not detected in the PDB sample. This indicates that the PDBS method is not suitable for monitoring 1,4 dioxane. Therefore, the PDBS results for 1,4 dioxane were excluded from the statistical analyses.
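The pair-level filters above can be sketched in Python as follows. This is a minimal illustration, not the procedure actually used in the report: the `Result` record, its field names, and the function `pair_for_comparison` are hypothetical, and the "maximum value" and PDBS 1,4 dioxane filters are assumed to have been applied beforehand.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Result:
    """Hypothetical record for one analytical result (field names are assumptions)."""
    value: Optional[float]        # reported concentration; None when not detected
    mdl: float                    # method detection limit
    detected: bool
    validation_u: bool = False    # qualified "U" during data validation only
    rejected: bool = False        # rejected during data validation


def pair_for_comparison(a: Result, b: Result) -> Optional[Tuple[float, float]]:
    """Return the (A, B) values to use in the statistics, or None if the pair is excluded."""
    if a.rejected or b.rejected:
        return None               # rejected results are not used
    if a.validation_u or b.validation_u:
        return None               # validation-only "U" results are excluded
    if not a.detected and not b.detected:
        return None               # both results not detected
    if a.detected and b.detected:
        return (a.value, b.value)
    nd, det = (a, b) if not a.detected else (b, a)
    if det.mdl < nd.mdl:
        return None               # exception: detected result has the lower MDL
    half_mdl = nd.mdl / 2.0       # substitute one-half of the MDL for the ND result
    return (half_mdl, b.value) if nd is a else (a.value, half_mdl)


detected = Result(value=5.0, mdl=1.0, detected=True)
not_detected = Result(value=None, mdl=1.0, detected=False)
print(pair_for_comparison(detected, not_detected))
```

With the example values, the non-detect is replaced by one-half of its MDL, so the pair can still be plotted on a log-log scale.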

Tables 4.1 through 4.6 show the number of data pairs that were available for each comparison after all filters had been applied to the data set. For the quantitative evaluation processes, a lower confidence or meaning was ascribed to comparisons with fewer data pairs. In Tables 4.1 through 4.6, instances where fewer than 10 data pairs were available for a particular comparison are highlighted in red in the “Number of Comparisons” column.

TABLE 4.1
STATISTICAL SUMMARY - ALL DATA
NO-PURGE SAMPLER DEMONSTRATION
McCLELLAN AFB, CALIFORNIA
View TABLE 4.1

TABLE 4.2
STATISTICAL SUMMARY - 1,4 DIOXANE
NO-PURGE SAMPLER DEMONSTRATION
McCLELLAN AFB, CALIFORNIA
View TABLE 4.2

TABLE 4.3
STATISTICAL SUMMARY - ANIONS
NO-PURGE SAMPLER DEMONSTRATION
McCLELLAN AFB, CALIFORNIA
View TABLE 4.3

TABLE 4.4
STATISTICAL SUMMARY - HEXAVALENT CHROMIUM
NO-PURGE SAMPLER DEMONSTRATION
McCLELLAN AFB, CALIFORNIA
View TABLE 4.4

TABLE 4.5
STATISTICAL SUMMARY - METALS
NO-PURGE SAMPLER DEMONSTRATION
McCLELLAN AFB, CALIFORNIA
View TABLE 4.5

TABLE 4.6
STATISTICAL SUMMARY - VOCs
NO-PURGE SAMPLER DEMONSTRATION
McCLELLAN AFB, CALIFORNIA
View TABLE 4.6

4.4.1 CONVENTIONAL STATISTICAL ANALYSES

The distribution of the data was evaluated in order to select the most appropriate statistical methods to apply to the data. Conventional statistical methods were then selected and used to evaluate the data sets that were being compared.

4.4.1.1 DATA DISTRIBUTION

Each of the data sets was tested for normality in order to determine whether parametric or non-parametric statistical tests were appropriate for the data analysis. The Shapiro-Wilk's W test was used to determine data distribution. Several groupings of data were evaluated for normality as described below.

Initially, data sets for each of the eight different sampling methods were tested for each of the six different compound groupings (all data combined, 1,4 dioxane only, anions only, hexavalent chromium only, metals only, and VOCs only). Data used for normality testing in this application included both primary and field duplicate sample results; results that were not detected or rejected during data validation were excluded. Additionally, since the difference between two sampling methods was the end-use of the data for comparison purposes, the Shapiro-Wilk’s W test also was applied to the populations of differences between two sampling methods being compared. Data used for this variation of normality testing were taken from the “pared-down” data sets described in Section 4.4. As an example of this variation of normality testing, each time there was an available comparison between two sampling methods (e.g., all VOC concentrations at the shallow sample depth in well MW-1065 obtained using the PDBS and RCS sampling methods), the difference between those two concentrations was calculated. After the differences were calculated for all possible comparisons of analytical results obtained using those two sampling methods, the Shapiro-Wilk’s W test was then applied to that population (e.g., PDBS versus RCS for VOCs only). The results of all normality tests are included in Appendix C.

Tests for normality failed (i.e., data sets are not normally distributed) for almost all of the data subsets evaluated. Some exceptions were noted (see Appendix C) but were due mostly to small sample populations (e.g., 1,4 dioxane in the difference between PsMS and RPPS results). The overall lack of normally-distributed data sets supports the use of nonparametric statistical tools as described below.

4.4.1.2 WILCOXON MATCHED-PAIRS SIGNED RANKS TEST

The Wilcoxon Matched-Pairs Signed Ranks Test (Wilcoxon test) was applied to determine if two dependent variables (e.g., RPPS and HydraSleeve® analytical results) represent two different populations. The Wilcoxon test is a nonparametric procedure used in hypothesis testing when one or more of the assumptions of the Student’s paired t-test (e.g., normal population distributions) are violated. The Wilcoxon test determines if the median of the differences between the pairs of data (e.g., the difference [D] between the RPPS measurement and the HydraSleeve® measurement for a given well) is equal to zero. If a significant difference is obtained, it indicates that there is a high likelihood that the two data sets represent different populations.

A test statistic (the Wilcoxon T statistic) is calculated and associated with a p-value (and corresponding confidence level). For example, a Wilcoxon T statistic that resulted in a p-value of 0.03 would correspond to a confidence level of 97 percent that the two samples represent two different populations.

Tables 4.1 through 4.6 include summaries of the results of the Wilcoxon test analyses. Values presented in these tables are the confidence level (i.e., 1 minus the p-value) that the two sampling methods represent different populations. For this analysis, if the confidence was greater than or equal to 90 percent, the two populations were deemed to be different at a statistically significant level, and the value is highlighted in yellow.
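The calculation described in this section can be sketched as follows. The report does not state which software was used or whether an exact or approximate p-value was computed; this sketch assumes a two-sided exact p-value, drops zero differences, and assigns average ranks to ties. The function name `wilcoxon_signed_rank` and the example concentrations are hypothetical.

```python
from typing import Sequence, Tuple


def wilcoxon_signed_rank(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (T statistic, two-sided exact p-value) for paired results a and b."""
    diffs = [x - y for x, y in zip(a, b) if x != y]   # zero differences are dropped
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    # Rank the absolute differences; tied values share the average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1                          # mean of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    t_stat = min(w_plus, w_minus)
    # Exact null distribution of the positive rank sum: each rank is equally
    # likely to be positive or negative. Ranks are doubled so that
    # half-integer (tied) ranks become integers.
    doubled = [int(round(2 * r)) for r in ranks]
    counts = [0] * (sum(doubled) + 1)
    counts[0] = 1
    for r in doubled:
        for s in range(len(counts) - 1, r - 1, -1):
            counts[s] += counts[s - r]
    lower_tail = sum(counts[: int(round(2 * t_stat)) + 1]) / 2 ** n
    return t_stat, min(1.0, 2 * lower_tail)


# Hypothetical paired concentrations from two sampling methods.
method_a = [10, 12, 14, 16, 18, 20]
method_b = [9, 10, 11, 12, 13, 14]
t_stat, p_value = wilcoxon_signed_rank(method_a, method_b)
confidence = 1 - p_value
print(t_stat, p_value, confidence >= 0.90)
```

In this made-up example every difference is positive, so T is 0 and the exact two-sided p-value is 2/64 (about 0.03), which corresponds to a confidence level of about 97 percent and would meet the 90 percent criterion.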

4.4.1.3 SIGN TEST

The sign test is a nonparametric alternative to the Student’s paired t-test for dependent samples. The test is applicable to situations where the researcher has two measures (e.g., under two conditions) for each subject and wants to establish whether or not the two measurements (or conditions) are different.

The only assumption required by this test is that the underlying distribution of the variable of interest is continuous; no assumptions about the nature or shape of the underlying distribution are required. The test simply computes the number of times (across subjects) that the value of the first variable (A) is larger than that of the second variable (B). Under the null hypothesis, which states that the two variables are not different from each other, this is expected to be the case about 50 percent of the time. Based on the number of observed cases where A is greater than B, a p-value and associated confidence level can be calculated for the data set.

Tables 4.1 through 4.6 include summaries of the results of the sign test analyses. Values presented in these tables are the confidence that the two sampling methods represent different populations. For this analysis, if the confidence was greater than or equal to 90 percent, the two populations were deemed to be different at a statistically significant level, and the value is highlighted in yellow. The results of the sign test also indicate the percentage of times that non-equal values are greater than or less than the comparative set of values (i.e., the percent of times that values in population A were greater than values in population B). These values also are shown in Tables 4.1 through 4.6.
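As an illustration of the sign test described above, the following sketch counts the pairs where A exceeds B and computes a two-sided exact binomial p-value under the null hypothesis that each direction occurs about 50 percent of the time. It assumes tied pairs are dropped; the function name `sign_test` and the example values are hypothetical, and the report does not specify the exact computation used.

```python
from math import comb
from typing import Sequence, Tuple


def sign_test(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (two-sided exact p-value, percent of non-equal pairs where A > B)."""
    n_pos = sum(1 for x, y in zip(a, b) if x > y)
    n_neg = sum(1 for x, y in zip(a, b) if x < y)
    n = n_pos + n_neg                      # tied pairs carry no sign and are dropped
    if n == 0:
        return 1.0, float("nan")
    k = min(n_pos, n_neg)
    # Probability of k or fewer of the less frequent sign when each sign
    # is equally likely.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    p_value = min(1.0, 2 * tail)
    return p_value, 100.0 * n_pos / n


# Hypothetical paired concentrations: A is higher in 7 of 8 pairs.
method_a = [5, 6, 7, 8, 9, 10, 11, 12]
method_b = [4, 5, 6, 7, 8, 9, 10, 13]
p_value, pct_a_greater = sign_test(method_a, method_b)
print(p_value, 1 - p_value >= 0.90, pct_a_greater)
```

Here the p-value is 18/256 (about 0.07), giving a confidence of about 93 percent, and A exceeds B in 87.5 percent of the non-equal pairs.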

4.4.2 OTHER QUANTITATIVE COMPARATIVE TOOLS

In addition to the traditional statistical tools discussed in Section 4.4.1, two other quantitative tools were used to compare the various combinations of data sets.

4.4.2.1 LINEAR REGRESSION

The results for each sampling method were plotted against the corresponding results for each of the other sampling methods using X-Y scatter plots. Best-fit linear trend lines were then fitted to these data sets, and the slope and goodness-of-fit (R2) value for each line were calculated. Best-fit linear trend lines were fitted to each of the subgroups of compounds/analytes listed in Section 4.4. These plots are included in Appendix D. Slopes that are close to 1 suggest that, on average, the results from the two sampling devices being compared approach a 1 to 1 ratio, whereas higher or lower slopes suggest that one sampling method is more likely to result in higher or lower concentrations than the other method. Likewise, the closer the R2 value is to 1, the better the fit of the data to the trend line, and the lower the degree of scatter of data about the best-fit linear trend line.

Tables 4.1 through 4.6 include summaries of the slope and R2 values for each of the figures discussed above. The slope and R2 values shown in Tables 4.1 through 4.6 are highlighted in yellow to indicate when the two populations being compared were deemed to not be similar to each other based on the magnitude of the slope. The following guidelines were followed when applying highlighting to these values:

  Slope Guidelines

  • If the slope was between 0.90 and 1.10, the two sets of sampling results were deemed to be similar.
  • If the slope was equal to or greater than 1.10, the sampling device represented on the Y-axis of the plot was deemed to be more likely to return a higher-magnitude result than the sampling device represented on the X-axis.
  • If the slope was equal to or less than 0.90, the sampling device represented on the X-axis of the plot was deemed to be more likely to return a higher-magnitude result than the sampling device represented on the Y-axis.

  R2 Guidelines

  • If the R2 value was greater than or equal to 0.90, the degree of scatter of the data relative to the best-fit linear trend line was deemed to be low; therefore the observation made based on the slope was considered more meaningful.
  • If the R2 value was less than 0.90, the degree of scatter of the data relative to the best-fit linear trend line was deemed to be significant; therefore the observation made based on the slope was considered less meaningful.

The threshold values described above were selected somewhat arbitrarily, but also were based on a qualitative review of the data as described in Section 4.4.3. The guidelines established for R2 values were used primarily in the qualitative evaluation.
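The slope and R2 guidelines can be sketched as shown below. This assumes an ordinary least-squares fit with an intercept on untransformed concentrations; the report does not state whether the trend lines were forced through the origin or fitted to log-transformed values, so treat that as an assumption. The function names `fit_line` and `interpret_fit` are hypothetical.

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least-squares fit; returns (slope, intercept, R2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = (sxy * sxy) / (sxx * syy)       # squared correlation coefficient
    return slope, intercept, r2


def interpret_fit(slope: float, r2: float) -> Tuple[str, str]:
    """Apply the slope and R2 guideline thresholds from this section."""
    if slope >= 1.10:
        direction = "Y-axis method tends higher"
    elif slope <= 0.90:
        direction = "X-axis method tends higher"
    else:
        direction = "similar"
    weight = "more meaningful" if r2 >= 0.90 else "less meaningful"
    return direction, weight


# Slope and R2 values quoted in Section 4.4.3 for one VOC comparison.
print(interpret_fit(0.59, 0.50))
```

For a slope of 0.59 with an R2 of 0.50, the guidelines flag the X-axis method as tending higher but treat that observation as less meaningful because of the scatter.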

4.4.2.2 MEDIAN RPD

Another quantitative analysis tool applied to the data sets is referred to as the median relative percent difference (RPD). The first step in this analysis is to calculate the RPD of each data pair using the following equation:

RPD = 100*[(A-B)/{(A+B)/2}]

Where:

A = Result from sampling method A; and

B = Result from sampling method B.

A positive RPD indicates that the result from sampling method A is higher than the result from sampling method B, while a negative RPD indicates the opposite. RPDs close to zero generally indicate that results from both sampling methods were similar. Once all the RPDs were calculated, the median of the RPDs for each data comparison group was calculated by ranking the RPD values from lowest to highest and choosing the middle value of the ranked set. If the number of RPD values was even, the median was taken as the mean of the middle two values. This median RPD was then used as an indicator of the comparability of the two sampling methods for each compound/analyte subset. A positive value for the median RPD indicated that sampling method A results were more frequently higher than sampling method B results (the reverse is true for negative values). Additionally, the closer the median RPD was to zero, the more likely the two sampling methods returned similar results (essentially, for every time sampling method A was greater than sampling method B, there was an equal number of times when sampling method B was greater than sampling method A). Conversely, the farther the median RPD was from zero, the more likely one sampling method was to return results that were significantly greater than those of the other method. For this analysis, a median RPD that was either greater than or equal to 10 or less than or equal to -10 was considered to indicate that one method was more likely to return a meaningfully higher (or lower) concentration than the other sampling method compared. Median RPD values between -10 and 10 were considered to indicate that both sampling methods returned similar concentrations. As with the guideline values described for the linear regression analysis, these values were selected somewhat arbitrarily, but also were based on a qualitative review of the data as described in Section 4.4.3.

Tables 4.1 through 4.6 include summaries of the results of the median RPD analysis. Median RPD values greater than or equal to 10 or less than or equal to -10 are highlighted in yellow.
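The median RPD calculation and its thresholds can be illustrated with a short sketch. The function names and the example data pairs are hypothetical; the equation and the 10 and -10 thresholds are those given in this section.

```python
from statistics import median
from typing import Iterable, Tuple


def rpd(a: float, b: float) -> float:
    """Relative percent difference: 100 * (A - B) / ((A + B) / 2)."""
    return 100.0 * (a - b) / ((a + b) / 2.0)


def median_rpd(pairs: Iterable[Tuple[float, float]]) -> float:
    """Median of the RPDs; for an even count this is the mean of the middle two values."""
    return median(rpd(a, b) for a, b in pairs)


def interpret_median_rpd(value: float) -> str:
    """Apply the 10 and -10 guideline thresholds from this section."""
    if value >= 10:
        return "method A tends higher"
    if value <= -10:
        return "method B tends higher"
    return "similar"


# Hypothetical (A, B) data pairs for one comparison group.
pairs = [(10.0, 10.0), (12.0, 10.0), (8.0, 10.0), (15.0, 10.0)]
m = median_rpd(pairs)
print(round(m, 2), interpret_median_rpd(m))
```

The four example RPDs are about -22.2, 0, 18.2, and 40, so the median is the mean of the middle two values (about 9.1), which falls inside the "similar" range.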

4.4.3 HOLISTIC QUALITATIVE ASSESSMENT

Each of the statistical analyses described above was applied to the 113 possible comparison combinations. Of the 113 possible comparison combinations, 26 (23 percent) had sufficiently small populations (i.e., fewer than 10 data pairs) that the results of the statistical analyses are not considered to be particularly meaningful. Of the remaining 87 combinations, there were 41 instances where both the conventional statistical and other quantitative comparison tests resulted in consistent observations. Conversely, there were 46 instances where the results of each of these tests were not internally consistent.

If the results of each of the four quantitative comparisons were consistent for a particular comparison (as shown in Tables 4.1 through 4.6 by having consistent highlighting of all four comparative test results), the resulting observation was validated and deemed correct without further review. The resulting observation is shown in Tables 4.1 through 4.6 under the column titled “Holistic Conclusion”.

For those instances where the results of the four quantitative analyses varied, the results of the two populations being compared were scrutinized qualitatively, and a general conclusion regarding the comparison was made based not only on the results of the statistical analyses, but also on professional judgment. For example, some of the following criteria were considered during the holistic qualitative evaluation.

  • The paired data sets were reviewed to identify whether outlier points may have contributed to anomalous comparison results.
  • The R2 value calculated as part of the linear regression was reviewed to evaluate the degree of confidence in the linear regression results.
  • The median RPD and linear regression results were compared to the threshold values derived for those comparative methods (i.e., 10 and -10 for median RPD and 0.90 to 1.10 for linear regression slope) to determine if the results were close to those values.

A discussion of the comparison results is presented in Section 6. However, if the reader is interested in better understanding the comparison results for particular analytes and/or sampling methods, they are encouraged to perform a detailed review of all of the comparison results presented in Tables 4.2 through 4.6 rather than limiting their review to the holistic conclusions. For example, the holistic conclusion for comparison of VOC concentrations obtained using the three-volume purge and HydraSleeve® methods is that these methods provided essentially equivalent results. The orange highlight indicates a lower degree of confidence in this conclusion because the results of all of the comparison tests were not internally consistent. Further inspection of the comparison results shown in Table 4.6 indicates that the Sign and Wilcoxon tests both indicated that the two data sets are statistically similar. However, the RPD and X-Y Scatter Slope/R2 tests both indicate differences in the data sets. The slope result (0.59) indicates that the VOC concentrations obtained using the 3-volume purge method tended to be higher than the concentrations obtained using the HydraSleeve®. However, the relatively low R2 value (0.50) indicates a high degree of scatter about the best-fit trend line and a correspondingly low confidence that the slope value is accurate and meaningful. Therefore, some comparisons termed “equal” in the holistic sense are more equal than others (i.e., “equal” defines a range of conditions rather than one specific condition).

The combined (i.e., “all data”) results presented in Table 4.1 are solely for illustration purposes as a way to provide a summary analysis of the entire evaluation. However, these summary data may be misleading when compared with the results for the individual analytes or analyte groups presented in Tables 4.2 through 4.6. Tables 4.2 through 4.6 should be used to evaluate a particular sampling method’s utility for a specific analyte or analyte group.



