The Virtual Marchant Subsampler
Some further thoughts on subsampling after Walsh (1997)
By Chris Walsh
- Is the Marchant subsampler a truly random subsampler?
A few trials with the Virtual Marchant Subsampler (VMSS) show that the
Marchant box subsampler is not, in reality, a perfect random sampler. The
test of randomness used in Marchant (1989) (which was also used in Walsh
1997 to show that the simulated subsampler mimicked the real thing well)
showed that 1% subsamples from the subsampler are almost always distributed
randomly. However, the distribution for a given count actually becomes more
even (i.e. 95% CLs decrease) for larger percentage subsamples. For example,
take a standard count of 16 animals:
| a sample contains | subsample size | mean count | approx. 95% CL |
|-------------------|----------------|------------|----------------|
| 1600              | 1%             | 16         | 50             |
| 320               | 5%             | 16         | 49             |
| 160               | 10%            | 16         | 47             |
| 80                | 20%            | 16         | 43             |
| 50                | 32%            | 16         | 40             |
| 32                | 50%            | 16         | 35             |
| 20                | 80%            | 16         | 20             |
(These CLs, which are expressed as a percentage of the mean, are
approximate estimates based on trials with the virtual subsampler.) Similar
results were obtained for an even smaller count of 5 animals:
| a sample contains | subsample size | mean count | approx. 95% CL |
|-------------------|----------------|------------|----------------|
| 500               | 1%             | 5          | 95             |
| 100               | 5%             | 5          | 92             |
| 50                | 10%            | 5          | 88             |
| 25                | 20%            | 5          | 80             |
| 15                | 33%            | 5          | 72             |
| 10                | 50%            | 5          | 60             |
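The pattern in both tables is what one would expect if each animal lands independently in one of the subsampler's cells. The following is a minimal Python sketch of such a trial, not the VMSS itself. It assumes a box of 100 cells (`N_CELLS`), animals scattered independently and uniformly among the cells, and an approximate 95% CL taken as 1.96 standard deviations and expressed as a percentage of the mean. All function names are hypothetical. Under these assumptions the count in a subsample is binomial, so the CL shrinks by a factor of sqrt(1 - p) as the subsampled fraction p grows.

```python
import math
import random
import statistics

N_CELLS = 100  # assumed number of cells in the subsampler box


def simulate_counts(n_animals, cells_taken, trials, rng):
    """Count animals falling in `cells_taken` cells, repeated `trials` times."""
    counts = []
    for _ in range(trials):
        # Assumption: each animal settles independently in a uniformly chosen
        # cell, so which cells are taken does not matter; use the first ones.
        count = sum(1 for _ in range(n_animals)
                    if rng.randrange(N_CELLS) < cells_taken)
        counts.append(count)
    return counts


def cl_percent(counts):
    """Approximate 95% CL (1.96 SD) as a percentage of the mean count."""
    return 100 * 1.96 * statistics.pstdev(counts) / statistics.mean(counts)


def binomial_cl_percent(n_animals, fraction):
    """The same quantity derived from the binomial variance n*p*(1-p)."""
    mean = n_animals * fraction
    return 100 * 1.96 * math.sqrt(mean * (1 - fraction)) / mean


if __name__ == "__main__":
    rng = random.Random(1)
    cases = [(1600, 1), (320, 5), (160, 10), (80, 20), (32, 50), (20, 80)]
    for n_animals, cells_taken in cases:
        simulated = cl_percent(simulate_counts(n_animals, cells_taken, 1000, rng))
        expected = binomial_cl_percent(n_animals, cells_taken / N_CELLS)
        print(f"{n_animals:5d} animals, {cells_taken:2d}% subsample: "
              f"simulated {simulated:.0f}%, binomial {expected:.0f}%")
```

For a mean count of 16, the binomial values fall from about 49% at a 1% subsample to about 22% at an 80% subsample, close to the trial estimates in the first table.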
- How does this finding affect the relative merits of fixed-number vs.
fixed-percentage subsampling?
Not much, it appears. For example, take the first set of hypothetical
samples above and apply a set percentage subsample of, say, 20%:
| a sample contains | subsample size | mean count | approx. 95% CL |
|-------------------|----------------|------------|----------------|
| 1600              | 20%            | 320        | 8              |
| 320               | 20%            | 64         | 24             |
| 160               | 20%            | 32         | 35             |
| 80                | 20%            | 16         | 43             |
| 50                | 20%            | 10         | 70             |
| 32                | 20%            | 6          | 80             |
| 20                | 20%            | 4          | 100            |
This subsampling strategy requires picking 452 animals from the seven
samples, and produces a very large range of subsampling errors (95% CLs
from 8% to 100% of the mean). Subsampling to a fixed 16 animals requires
picking only 112 animals, and results in a much more uniform range of
errors (20% to 50%).
Thus a set-number subsampling strategy controls subsampling error more
consistently than a set-percentage strategy (and with less picking effort),
even though the subsampler subsamples more evenly as the percentage
increases.
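The picking effort for the two strategies can be checked with a few lines of Python. This is only a sketch: the function names are hypothetical, and the error estimate is a binomial approximation (each animal assumed to fall independently inside or outside the subsample), not the output of the virtual subsampler.

```python
import math

SAMPLES = [1600, 320, 160, 80, 50, 32, 20]  # animals per sample, from the table above


def approx_cl_percent(n_animals, picked):
    """Binomial approximation to the 95% CL, as a percentage of the mean count."""
    fraction = picked / n_animals
    return 100 * 1.96 * math.sqrt((1 - fraction) / picked)


def fixed_percentage(samples, percent):
    """Animals picked from each sample when a set percentage is subsampled."""
    return [round(n * percent / 100) for n in samples]


def fixed_number(samples, target):
    """Animals picked from each sample when subsampling to a set number."""
    return [min(n, target) for n in samples]


strategies = [
    ("20% of each sample", fixed_percentage(SAMPLES, 20)),
    ("16 animals per sample", fixed_number(SAMPLES, 16)),
]
for label, picked in strategies:
    cls = [approx_cl_percent(n, k) for n, k in zip(SAMPLES, picked)]
    print(f"{label}: {sum(picked)} animals picked, "
          f"95% CL from {min(cls):.0f}% to {max(cls):.0f}% of the mean")
```

The totals reproduce the 452 and 112 animals quoted above. Under the binomial assumption the fixed-percentage errors span roughly 10% to 88% of the mean, against roughly 22% to 49% for the fixed number, which is the same contrast the trial estimates show.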
- Multivariate implications
I believe this control of error translates to the multivariate context well
because, within a group of replicate samples, the relative abundances of
the common taxa tend to be similar. Therefore, even if total abundances
vary greatly between replicates, the control of error for each taxon will
be constant within replicated treatments.
For example, let's say we had four replicate samples containing three
species:
| Replicate   | 1    | 2    | 3   | 4   |
|-------------|------|------|-----|-----|
| sp. A       | 5000 | 1000 | 500 | 100 |
| sp. B       | 120  | 20   | 8   | 1   |
| sp. C       | 600  | 100  | 38  | 4   |
| total abund | 5720 | 1120 | 546 | 114 |
And try a set 20% subsampling strategy. Each cell shows the mean count,
with the approximate +/-95% CL, expressed as a percentage of the mean, in
parentheses:
| Replicate | 1         | 2        | 3          | 4           |
|-----------|-----------|----------|------------|-------------|
| sp. A     | 1000 (20) | 200 (10) | 100 (20)   | 20 (40)     |
| sp. B     | 24 (40)   | 4 (100)  | 1.6 (>300) | 0.2 (>1000) |
| sp. C     | 120 (10)  | 20 (40)  | 7.6 (70)   | 1.8 (1000)  |
(total number counted 1500)
As opposed to subsampling 10% of the sample or 300 animals, whichever is
larger (just as an example):
| Rep.       | 1       | 2        | 3        | 4       |
|------------|---------|----------|----------|---------|
| %SS needed | 10      | 27       | 55       | 100     |
| sp. A      | 500 (6) | 270 (9)  | 275 (8)  | 100 (0) |
| sp. B      | 12 (70) | 5.4 (75) | 4.4 (80) | 1 (0)   |
| sp. C      | 60 (25) | 27 (40)  | 21 (30)  | 4 (0)   |
(total number counted 1030)
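The "%SS needed" row and the mean counts follow from a simple rule, sketched here in Python. The function names are hypothetical, the replicate totals are summed from the species rows, and the percentage is rounded to a whole number on the assumption that this is how the table values were derived.

```python
ABUNDANCES = {
    "sp. A": [5000, 1000, 500, 100],
    "sp. B": [120, 20, 8, 1],
    "sp. C": [600, 100, 38, 4],
}


def percent_needed(total, min_percent=10, min_animals=300):
    """Whole-number subsample percentage: 10% or 300 animals, whichever is larger."""
    percent = max(min_percent, 100 * min_animals / total)
    return min(100, round(percent))  # a sample cannot be more than fully sorted


def expected_counts(abundances, percents):
    """Mean count per species and replicate for the given subsample percentages."""
    return {
        species: [n * p / 100 for n, p in zip(counts, percents)]
        for species, counts in abundances.items()
    }


totals = [sum(replicate) for replicate in zip(*ABUNDANCES.values())]
percents = [percent_needed(total) for total in totals]
print("%SS needed:", percents)
for species, counts in expected_counts(ABUNDANCES, percents).items():
    print(species, counts)
```

Replicate 4 holds fewer than 300 animals, so the rule sorts the whole sample and its subsampling error is zero.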
Both methods give an adequate estimate of abundance for sp. A across all
four replicates. The second method gives adequate estimates for sp. C in
all replicates, but 20% subsampling results in a very high error for sp. C
in replicate 4 and a moderately high error in replicate 3. It's sp. B
that's the problem. The probable ranges of estimated abundance for sp. B
from the two methods look like this:
| Rep.       | 1      | 2    | 3    | 4   |
|------------|--------|------|------|-----|
| 20%        | 70-170 | 0-40 | 0-25 | 0-5 |
| 10%/300    | 40-200 | 4-33 | 2-16 | 1   |
| true abund | 120    | 20   | 8    | 1   |
The 10%/300 method produces a smaller range of errors, and more
importantly, is less likely to miss less abundant taxa.
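The chance of missing a taxon altogether can be estimated with a short Python sketch. It assumes each animal falls independently inside or outside the subsample (a binomial simplification, not the virtual subsampler), and the function name is hypothetical.

```python
SP_B = [120, 20, 8, 1]          # true abundance of sp. B in replicates 1 to 4
FIXED_20 = [20, 20, 20, 20]     # subsample percentage under the set 20% strategy
TEN_OR_300 = [10, 27, 55, 100]  # subsample percentage under the 10%/300 strategy


def prob_missed(abundance, percent):
    """Chance that none of `abundance` animals lands in the subsample."""
    return (1 - percent / 100) ** abundance


for rep, (n, p20, p300) in enumerate(zip(SP_B, FIXED_20, TEN_OR_300), start=1):
    print(f"replicate {rep}: P(miss) at 20% = {prob_missed(n, p20):.3f}, "
          f"at 10%/300 = {prob_missed(n, p300):.3f}")
```

Under this assumption, sp. B in replicate 3 (8 animals) is missed about 17% of the time with a 20% subsample and less than 1% of the time with the 55% subsample. In replicate 4 the single animal is missed 80% of the time at 20% and never when the whole sample is sorted.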
The results of this hypothetical exercise are consistent with the empirical
findings of Walsh (1997).