
The Virtual Marchant Subsampler

Some further thoughts on subsampling after Walsh (1997)

By Chris Walsh

  1. Is the Marchant subsampler a truly random subsampler?
    A few trials with the Virtual Marchant Subsampler (VMSS) show that the Marchant box subsampler is not, in reality, a perfectly random sampler.  The test of randomness used in Marchant (1989), which was also used in Walsh (1997) to show that the simulated subsampler mimicked the real thing well, showed that 1% subsamples from the subsampler are almost always distributed randomly.  However, the distribution for a given count actually becomes more even (i.e. 95% CLs decrease) for larger percentage subsamples.  For example, take a standard count of 16 animals:
Sample contains   Subsample size   Mean count   Approx. 95% CL (% of mean)
1600              1%               16           50
320               5%               16           49
160               10%              16           47
80                20%              16           43
50                32%              16           40
32                50%              16           35
20                80%              16           20

(These CLs, which are expressed as a percentage of the mean, are approximate estimates based on trials with the virtual subsampler.)  Similar results were obtained for an even smaller count of 5 animals:

Sample contains   Subsample size   Mean count   Approx. 95% CL (% of mean)
500               1%               5            95
100               5%               5            92
50                10%              5            88
25                20%              5            80
15                33%              5            72
10                50%              5            60
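
The trend in the two tables above is roughly what a simple binomial model predicts. The sketch below is not the VMSS. It assumes, purely for illustration, that each animal independently ends up in the picked portion with probability equal to the subsample fraction, and takes the 95% CL as 1.96 standard deviations. The function name `approx_cl_percent` is hypothetical.

```python
import math

def approx_cl_percent(sample_total, fraction, z=1.96):
    """Approximate 95% CL of a subsample count, as a percentage of its mean.

    Assumption: the count is binomial(sample_total, fraction), i.e. each
    animal is picked independently. This is a simplification, not the VMSS.
    """
    mean = sample_total * fraction
    sd = math.sqrt(sample_total * fraction * (1.0 - fraction))
    return 100.0 * z * sd / mean

# The seven hypothetical samples that each give a mean count of 16 animals.
for total, fraction in [(1600, 0.01), (320, 0.05), (160, 0.10), (80, 0.20),
                        (50, 0.32), (32, 0.50), (20, 0.80)]:
    cl = approx_cl_percent(total, fraction)
    print(f"{total:5d} animals, {fraction:4.0%} subsample: CL about {cl:.0f}% of mean")
```

Under this assumption the CL shrinks by a factor of sqrt(1 - fraction) relative to a purely random (Poisson) count, which is close to the decrease seen in the count-of-16 table. Agreement is looser for the count-of-5 table, where a normal approximation is a poor fit for such small counts.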

 

  2. How does this finding affect the relative merits of fixed-percentage vs. fixed-count subsampling?

    Not much, it appears.  For example, take the first set of hypothetical samples above and apply a set percentage subsample of, say, 20%:
Sample contains   Subsample size   Mean count   Approx. 95% CL (% of mean)
1600              20%              320          8
320               20%              64           24
160               20%              32           35
80                20%              16           43
50                20%              10           70
32                20%              6            80
20                20%              4            100

This subsampling strategy requires picking 452 animals from the seven samples and produces a very wide range of subsampling errors (approximate 95% CLs from 8% to 100% of the mean). Subsampling to a fixed count of 16 animals requires picking only 112 animals and gives a much more uniform range of errors (20% to 50%).

Thus a set-number subsampling strategy controls subsampling error more consistently than a set-percentage strategy, and with less picking effort, even though the subsampler subsamples more evenly as the percentage increases.
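
The picking effort and the spread of errors for the two strategies can be tallied with a short sketch. As an illustrative assumption (not the VMSS itself), the CL is approximated from a binomial model as 1.96 standard deviations; the names `compare`, `fixed_percent` and `fixed_count` are hypothetical.

```python
import math

# Total animals in each of the seven hypothetical samples.
SAMPLES = [1600, 320, 160, 80, 50, 32, 20]

def cl_percent(total, picked, z=1.96):
    """Approximate 95% CL (% of mean) when `picked` of `total` animals are expected.

    Assumption: binomial picking with probability picked / total.
    """
    fraction = picked / total
    return 100.0 * z * math.sqrt((1.0 - fraction) / picked)

def compare(picked_per_sample):
    """Return (total animals picked, smallest CL, largest CL) for a strategy."""
    cls = [cl_percent(t, p) for t, p in zip(SAMPLES, picked_per_sample)]
    return sum(picked_per_sample), min(cls), max(cls)

fixed_percent = [round(0.20 * t) for t in SAMPLES]   # set 20% subsample
fixed_count = [min(16, t) for t in SAMPLES]          # set count of 16 animals

for name, picked in [("20% subsample", fixed_percent), ("count of 16", fixed_count)]:
    effort, lo, hi = compare(picked)
    print(f"{name}: {effort} animals picked, CL from about {lo:.0f}% to {hi:.0f}%")
```

The totals match the 452 and 112 animals quoted above. The modelled CLs only approximate the tabulated ones, particularly for the smallest counts, but they show the same pattern: a much narrower spread of errors for the fixed count.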

  3. Multivariate implications

I believe this control of error translates well to the multivariate context because, within a group of replicate samples, the relative abundances of the common taxa tend to be similar.  Therefore, even if total abundances vary greatly between replicates, the control of error for each taxon will be constant within replicated treatments.

For example, let's say we had four replicate samples with three species:

Replicate     1      2      3     4
sp. A         5000   1000   500   100
sp. B         120    20     8     1
sp. C         600    100    38    4
total abund   5720   1120   546   105

Now try a set 20% subsampling strategy. Each entry shows the mean count, with the approximate +/-95% CL (expressed as a percentage of the mean) in parentheses:

Replicate   1          2         3           4
sp. A       1000(20)   200(10)   100(20)     20(40)
sp. B       24(40)     4(100)    1.6(>300)   0.2(>1000)
sp. C       120(10)    20(40)    7.6(70)     0.8(1000)

(total number counted 1500)

Compare this with a strategy of subsampling 10% or 300 animals, whichever is greater (just as an example):

Rep.         1         2         3         4
%SS needed   10        27        55        100
sp. A        500(6)    270(9)    275(8)    100(0)
sp. B        12(70)    5.4(75)   4.4(80)   1(0)
sp. C        60(25)    27(40)    21(30)    4(0)

(total number counted approximately 1280)
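
The "%SS needed" row can be derived mechanically. The sketch below assumes the rule is read as "subsample 10%, or a larger fraction if that is what it takes to reach about 300 animals, up to the whole sample"; that reading, and the name `fraction_needed`, are assumptions made for illustration.

```python
# Hypothetical abundances from the replicate table (species -> count per replicate).
ABUNDANCES = {
    "sp. A": [5000, 1000, 500, 100],
    "sp. B": [120, 20, 8, 1],
    "sp. C": [600, 100, 38, 4],
}

def fraction_needed(total, min_fraction=0.10, min_count=300):
    """Fraction to subsample: at least `min_fraction`, raised until about
    `min_count` animals are expected, and capped at the whole sample."""
    return min(1.0, max(min_fraction, min_count / total))

# Total abundance per replicate, summed across the three species.
totals = [sum(column) for column in zip(*ABUNDANCES.values())]
fractions = [fraction_needed(t) for t in totals]

for rep, f in enumerate(fractions, start=1):
    expected = {sp: round(counts[rep - 1] * f, 1) for sp, counts in ABUNDANCES.items()}
    print(f"rep {rep}: subsample {100 * f:.0f}%, expected counts {expected}")
```

The expected counts differ slightly from the table in places because the table works from the rounded percentages.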

Both methods give an adequate estimate of abundances across all four replicates for sp. A.  The second method gives adequate estimates for sp. C in all replicates, but 20% subsampling results in a very high error for sp. C in replicate 4, and a moderately high error in replicate 3.  It's sp. B that's the problem.  The probable ranges of estimated abundances for sp. B from the two methods look like this:

Rep.         1        2      3      4
20%          70-170   0-40   0-25   0-5
10%/300      40-200   4-33   2-16   1
true abund   120      20     8      1

The 10%/300 method produces a smaller range of errors and, more importantly, is less likely to miss the less abundant taxa.
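
The chance of missing a rare taxon altogether can be put in rough numbers. The sketch below assumes, for illustration only, that each animal is picked independently with probability equal to the subsample fraction, so a taxon with n animals is missed with probability (1 - fraction)^n. The name `p_missed` is hypothetical, and the fractions are the rounded values from the "%SS needed" row.

```python
def p_missed(abundance, fraction):
    """Probability that none of `abundance` animals turns up in the subsample,
    assuming each is picked independently with probability `fraction`."""
    return (1.0 - fraction) ** abundance

SP_B = [120, 20, 8, 1]                  # true abundance of sp. B per replicate
FIXED_20 = [0.20, 0.20, 0.20, 0.20]     # set 20% strategy
TEN_OR_300 = [0.10, 0.27, 0.55, 1.00]   # 10%/300 strategy, rounded fractions

for rep, (n, f20, f300) in enumerate(zip(SP_B, FIXED_20, TEN_OR_300), start=1):
    print(f"rep {rep}: P(miss sp. B) = {p_missed(n, f20):.3f} at 20%, "
          f"{p_missed(n, f300):.3f} at 10%/300")
```

Under this assumption the difference is negligible in the two large replicates and substantial in replicates 3 and 4, which is where the 20% strategy is most likely to record sp. B as absent.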

The results of this hypothetical exercise are consistent with the empirical findings of Walsh (1997).