SAMPLING
MODULE – I : Sampling
i) Concept of population and sample in
Qualitative, Quantitative and Mixed research
ii) Techniques of sampling – Probability and
non-probability sampling – different types.
(8 hours)
Introduction
The
quality of a piece of research stands or falls not only by the appropriateness
of methodology and instrumentation but also by the suitability of the sampling
strategy that has been adopted (see also Morrison 1993: 112–17). Researchers must take sampling decisions
early in the overall planning of a piece of research. Factors such as expense,
time, and accessibility frequently prevent researchers from gaining information
from the whole population. Therefore they often need to be able to obtain data
from a smaller group or subset of the total population in such a way that the
knowledge gained is representative of the total population (however defined)
under study. This smaller group or subset is the sample.
Experienced
researchers start with the total population and work down to the sample. By
contrast, less experienced researchers often work from the bottom up, that is,
they determine the minimum number of respondents needed to conduct the research
(Bailey 1978). However, unless they identify the total population in advance,
it is virtually impossible for them to assess how representative the sample is
that they have drawn.
Researchers face a number of decisions
and problems in settling on the sampling strategy to be used.
Judgements have to be made about four
key factors in sampling:
1 the sample size
2 the representativeness and parameters of the sample
3 access to the sample
4 the sampling strategy to be used.
The
decisions here will determine the sampling strategy to be used. This assumes
that a sample is actually required; there may be occasions on which the
researcher can access the whole population rather than a sample.
The sample size
A
question that often plagues novice researchers is just how large their samples
for the research should be. There is no clear-cut answer, for the correct
sample size depends on the purpose of the study and the nature of the
population under scrutiny. However, it is possible to give some advice on this
matter. Generally speaking, the larger the sample the better, as this not only
gives greater reliability but also enables more sophisticated statistics to be
used. Thus, a sample size of thirty
is held by many to be the minimum number of cases if researchers plan to use
some form of statistical analysis on their data, though this is a very small
number and we would advise very considerably more. Researchers need to think
out in advance of any data collection the sorts of relationships that they wish
to explore within subgroups of their eventual sample. The number of variables
researchers set out to control in their analysis and the types of statistical
tests that they wish to make must inform their decisions about sample size
prior to the actual research undertaking. Typically an anticipated minimum of thirty cases per variable
should be used as a ‘rule of thumb’, i.e. one must be assured of having a
minimum of thirty cases for each variable (of course, the thirty cases for
variable one could also be the same thirty as for variable two), though this is
a very low estimate indeed. This number rises rapidly if different subgroups of
the population are included in the sample. Further, depending on the kind of
analysis to be performed, some statistical tests will require larger samples.
For example, let us imagine that one wished to calculate the chi-square
statistic on cross-tabulated data, looking, for example, at two subgroups of
stakeholders in a primary school containing sixty 10-year-old pupils and twenty
teachers and their responses to a question on a 5-point scale. Here one can
notice that the sample size is eighty cases, an apparently reasonably sized
sample. However, six of the ten cells of responses (60 per cent) contain fewer
than five cases.
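To make the point concrete, the following sketch checks the cell counts for such a cross-tabulation. The response figures are hypothetical, invented purely for illustration, since the text does not report the actual distribution:

    # Hypothetical responses of 60 pupils and 20 teachers on a 5-point
    # scale; the actual figures are not given in the text.
    observed = [
        [2, 4, 20, 24, 10],   # pupils (n = 60)
        [1, 3, 4, 8, 4],      # teachers (n = 20)
    ]

    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    # Expected frequency per cell = (row total x column total) / grand
    # total; chi-square is unreliable when expected counts fall below 5.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    sparse_observed = sum(1 for row in observed for n in row if n < 5)
    sparse_expected = sum(1 for row in expected for e in row if e < 5)
    print(f"{sparse_observed} of 10 observed cells hold fewer than 5 cases")
    print(f"{sparse_expected} of 10 expected counts fall below 5")

With these invented figures, six of the ten observed cells hold fewer than five cases, exactly the situation described above.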
In the fuller worked example from which these figures
are taken, an original sample size of 278 increases,
very quickly, to 428. The message is very clear: the
greater the number of strata (subgroups), the larger the sample will be. Much
educational research concerns itself with strata rather than whole samples, so
the issue is significant. One can rapidly generate the need for a very large
sample.
If subgroups are
required then the same rules for calculating overall sample size apply to
each of the subgroups.
Further,
determining the size of the sample will also have to take account of
non-response, attrition and respondent mortality, i.e. some participants will
fail to return questionnaires, leave the research, and return incomplete or
spoiled questionnaires (e.g. missing out items, putting two ticks in a row of
choices instead of only one). Hence it is advisable to overestimate rather than
to underestimate the size of the sample required, to build in redundancy
(Gorard 2003: 60). Unless one has guarantees of access, response and, perhaps,
the researcher’s own presence at the time of conducting the research (e.g.
presence when questionnaires are being completed), then it might be advisable
to estimate up to double the size of required sample in order to allow for such
loss of clean and complete copies of questionnaires or responses.
In some circumstances, meeting the
requirements of sample size can be done on an evolutionary basis. For example,
let us imagine that you wish to sample 300 teachers, randomly selected. You
succeed in gaining positive responses from 250 teachers to, for example, a
telephone survey or a questionnaire survey, but you are 50 short of the
required number. The matter can be resolved simply by adding another 50 to the
random sample, and, if not all of these are successful, then adding some more
until the required number is reached.
Borg and Gall (1979: 195) suggest that, as a
general rule, sample sizes should be large where:
- there are many variables
- only small differences or small relationships are expected or predicted
- the sample will be broken down into subgroups
- the sample is heterogeneous in terms of the variables under study
- reliable measures of the dependent variable are unavailable.
Oppenheim (1992: 44) adds to this the view
that the nature of the scales to be used also exerts an influence on the sample
size. For nominal data the sample sizes may well have to be larger than for
interval and ratio data (a variant of the subgroups issue: the greater
the number of subgroups or possible categories, the larger the sample
will have to be).
Borg and Gall (1979) set out a formula-driven approach to determining sample
size (see also Moser and Kalton 1977; Ross and Rust 1997: 427–38), and they
also suggest using correlational tables for correlational studies – available
in most texts on statistics – as it were ‘in reverse’ to determine sample size
(Borg and Gall 1979: 201), i.e. looking at the significance levels of
correlation coefficients and then reading off the sample sizes usually required
to demonstrate that level of significance. For example, a correlational
significance level of 0.01 would require a sample size of 10 if the estimated
coefficient of correlation is 0.65, or a sample size of 20 if the estimated
correlation coefficient is 0.45, and a sample size of 100 if the estimated
correlation coefficient is 0.20. Again, an inverse relationship can be
seen – the larger the sample, the smaller the correlation coefficient
needs to be in order to be deemed significant at a given level.
With both qualitative and quantitative data,
the essential requirement is that the sample is representative of the
population from which it is drawn. In a dissertation concerned with a life
history (i.e. n = 1), the sample is the population!
Qualitative data
In a qualitative study of thirty highly able girls
of similar socio-economic background following
an A level Biology course, a sample of five or
six may suffice for the researcher who is prepared to
obtain additional corroborative data by way of
validation.
Where there is heterogeneity in the population,
then a larger sample must be selected on
some basis that respects that heterogeneity. Thus,
from a staff of sixty secondary school teachers
differentiated by gender, age, subject specialism,
management or classroom responsibility, etc., it
would be insufficient to construct a sample consisting
of ten female classroom teachers of Arts
and Humanities subjects.
Quantitative data
For quantitative data, a precise sample number
can be calculated according to the level of accuracy
and the level of probability that researchers require
in their work. They can then report in their
study the rationale and the basis of their research
decisions (Blalock 1979).
By way of example, suppose a teacher/researcher
wishes to sample opinions among 1,000 secondary
school students. She intends to use a 10-point
scale ranging from 1 = totally unsatisfactory to
10 = absolutely fabulous. She already has data
from her own class of thirty students and suspects
that the responses of other students will be
broadly similar. Her own students rated the
activity (an extracurricular event) as follows: mean
score = 7.27;
standard deviation = 1.98. In other
words, her students were pretty much ‘bunched’
about a warm, positive appraisal on the 10-point
scale. How many of the 1,000 students does she
need to sample in order to gain an accurate (i.e.
reliable) assessment of what the whole school
(n = 1,000) thinks of the extracurricular event?
It all depends on
what degree of accuracy and what level
of probability she
is willing to accept.
A simple calculation from a formula by Blalock
(1979: 215–18) shows that:
- if she is happy to be within ±0.5 of a scale point and accurate 19 times out of 20, then she requires a sample of 60 out of the 1,000;
- if she is happy to be within ±0.5 of a scale point and accurate 99 times out of 100, then she requires a sample of 104 out of the 1,000;
- if she is happy to be within ±0.5 of a scale point and accurate 999 times out of 1,000, then she requires a sample of 170 out of the 1,000;
- if she is a perfectionist and wishes to be within ±0.25 of a scale point and accurate 999 times out of 1,000, then she requires a sample of 679 out of the 1,000.
It is clear that sample size is a matter of
judgement as well as mathematical precision; even
formula-driven approaches make it clear that there
are elements of prediction, standard error and
human judgement involved in determining sample
size.
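Blalock's formula itself is not reproduced here, but his four figures can be recovered from the standard normal-approximation estimate for a mean, n = (z × SD / E)², where E is the tolerable error and z the normal deviate for the chosen confidence level. The sketch below is illustrative only and may differ in detail from Blalock's own derivation:

    def sample_size(sd, error, z):
        """Sample size so the sample mean falls within +/- error of the
        population mean at the confidence implied by z (normal approx.)."""
        return round((z * sd / error) ** 2)

    sd = 1.98  # standard deviation observed in the teacher's own class

    cases = [
        (0.5, 1.960, "within 0.5, 19 times out of 20"),
        (0.5, 2.576, "within 0.5, 99 times out of 100"),
        (0.5, 3.291, "within 0.5, 999 times out of 1,000"),
        (0.25, 3.291, "within 0.25, 999 times out of 1,000"),
    ]
    for error, z, label in cases:
        print(f"{label}: n = {sample_size(sd, error, z)}")
    # Prints 60, 104, 170, 679 -- matching the four figures above.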
Sampling error
If many samples are taken from the same
population, it is unlikely that they will all have
characteristics identical with each other or with
the population; their means will be different. In
brief, there will be sampling error (see Cohen
and Holliday 1979, 1996). Sampling error is often
taken to be the difference between the sample
mean and the population mean. Sampling error
is not necessarily the result of mistakes made
in sampling procedures. Rather, variations may
occur due to the chance selection of different
individuals. For example, if we take a large
number of samples from the population and
measure the mean value of each sample, then
the sample means will not be identical. Some
will be relatively high, some relatively low, and
many will cluster around an average or mean value
of the samples. We show this diagrammatically in
Box 4.2 (see http://www.routledge.com/textbooks/
9780415368780 – Chapter 4, file 4.4.ppt).
Why should this occur? We can explain the
phenomenon by reference to the Central Limit
Theorem which is derived from the laws of
probability. This states that if random large
samples of equal size are repeatedly drawn from
any population, then the mean of those samples
will be approximately normally distributed. The
distribution of sample means approaches the
normal distribution as the size of the sample
increases, regardless of the shape – normal or
otherwise – of the parent population (Hopkins
et al. 1996: 159, 388). Moreover, the average or
mean of the sample means will be approximately
the same as the population mean. Hopkins et al.
(1996: 159–62) demonstrate this by reporting
the use of computer simulation to examine the
sampling distribution of means when computed
10,000 times (a method that we discuss in Chapter 10).
[Box 4.2: Distribution of sample means, showing the spread of a selection of sample means (Ms) around the population mean (Mpop). Source: Cohen and Holliday 1979]
Rose and Sullivan (1993: 144)
remind us that 95 per cent of all sample means
fall within ±1.96 standard errors of the population
mean, i.e. that we have a 95 per cent chance that a
single sample mean will fall within these limits of
the population mean.
By drawing a large number of samples of equal
size from a population, we create a sampling
distribution. We can calculate the error involved
in such sampling (see http://www.routledge.
com/textbooks/9780415368780 – Chapter 4, file
4.5.ppt). The standard deviation of the theoretical
distribution of sample means is a measure of
sampling error and is called the standard error
of the mean (SEM). Thus:
SE = SD_s / √N
where SD_s = the standard deviation of the sample
and N = the number in the sample.
Strictly speaking, the formula for the standard
error of the mean is:
SE = SD_pop / √N
where SD_pop = the standard deviation of the
population.
However, as we are usually unable to ascertain the
SD of the total population, the standard deviation
of the sample is used instead. The standard error
of the mean provides the best estimate of the
sampling error. Clearly, the sampling error depends
on the variability (i.e. the heterogeneity) in the
population, as measured by SD_pop, as well as on the
sample size (N) (Rose and Sullivan 1993: 143).
The smaller the SD_pop, the smaller the sampling
error; the larger the N, the smaller the sampling
error. Where the SD_pop is very large, then N
needs to be very large to counteract it. Where
SD_pop is very small, then N, too, can be small
and still give a reasonably small sampling error.
As the sample size increases the sampling error
decreases. Hopkins et al. (1996: 159) suggest that,
unless there are some very unusual distributions,
samples of twenty-five or greater usually yield a
normal sampling distribution of the mean. For
further analysis of steps that can be taken to cope
with the estimation of sampling in surveys we refer
the reader to Ross and Wilson (1997).
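The relationship between the formula and the simulation approach reported by Hopkins et al. can be illustrated with a short sketch; the parent population here is hypothetical and deliberately skewed, to show that the result does not depend on normality:

    import random
    import statistics

    random.seed(42)  # reproducible illustration

    # A hypothetical, skewed parent population of 10,000 scores (0-10).
    population = [random.betavariate(2, 5) * 10 for _ in range(10_000)]

    n = 25
    # Draw repeated samples of size n and record each sample mean.
    sample_means = [statistics.mean(random.sample(population, n))
                    for _ in range(2_000)]

    # The SD of the sample means is the empirical standard error of the
    # mean; the formula estimates it as SD_pop / sqrt(N).
    print(f"empirical SE = {statistics.stdev(sample_means):.3f}")
    print(f"formula SE   = {statistics.pstdev(population) / n ** 0.5:.3f}")

The two printed values agree closely, and the distribution of the sample means is approximately normal despite the skewed parent population, as the Central Limit Theorem predicts.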
The standard error of proportions
We said earlier that one answer to ‘How big a
sample must I obtain?’ is ‘How accurate do I want
my results to be?’ This is well illustrated in the
following example:
A school principal finds that the 25 students she talks
to at random are reasonably in favour of a proposed
change in the lunch break hours, 66 per cent being in
favour and 34 per cent being against. How can she be
sure that these proportions are truly representative of
the whole school of 1,000 students?
A simple calculation of the standard error of
proportions provides the principal with her answer.
SE = √(P × Q / N)
where P = the percentage in favour,
Q = 100 per cent − P, and
N = the sample size.
The formula assumes that each sample is drawn
on a simple random basis. A small correction factor
called the finite population correction (fpc) is
generally applied as follows:
SE of proportions = √((1 − f) × P × Q / N)
where f is the proportion of the population
included in the sample.
Where, for example, a sample is 100 out of 1,000,
f is 0.1, and:
SE of proportions = √((1 − 0.1) × (66 × 34) / 100) = 4.49
With a sample of twenty-five, the SE = 9.4. In
other words, the favourable vote can vary between
56.6 per cent and 75.4 per cent; likewise, the unfavourable
vote can vary between 43.4 per cent
and 24.6 per cent. Clearly, a voting possibility
ranging from 56.6 per cent in favour to 43.4 per
cent against is less decisive than 66 per cent as opposed
to 34 per cent. Should the school principal
enlarge her sample to include 100 students, then
the SE becomes 4.5 and the variation in the range
is reduced to 61.5 per cent−70.5 per cent in favour
and 38.5 per cent−29.5 per cent against. Sampling
the whole school’s opinion (n = 1, 000) reduces
the SE to 1.5 and the ranges to 64.5 per cent−67.5
per cent in favour and 35.5 per cent−32.5 per cent
against. It is easy to see why political opinion surveys
are often based upon sample sizes of 1,000 to
1,500 (Gardner 1978).
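The principal's calculations can be reproduced in a few lines. Note that the worked figures above apply the finite population correction only in the n = 100 case; the sketch below prints the SE both with and without it (with the correction, a full census of all 1,000 students would of course have no sampling error at all):

    import math

    def se_props(p, n, population=None):
        """SE of a percentage p (0-100) for a sample of n; the finite
        population correction is applied when a population size is given."""
        q = 100 - p
        fpc = 1 - n / population if population else 1.0
        return math.sqrt(fpc * p * q / n)

    for n in (25, 100, 1000):
        print(f"n = {n:>4}: SE = {se_props(66, n):.2f} (no fpc), "
              f"{se_props(66, n, population=1000):.2f} (with fpc)")
    # n = 25 gives ~9.5; n = 100 with the fpc gives 4.49;
    # n = 1,000 gives 1.50 without the fpc, 0.00 with it.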
What is being suggested here generally is that,
in order to overcome problems of sampling error,
in order to ensure that one can separate random
effects and variation from non-random effects,
and in order for the power of a statistic to be
felt, one should opt for as large a sample as
possible. As Gorard (2003: 62) says, ‘power is an
estimate of the ability of the test you are using
to separate the effect size from random variation’,
and a large sample helps the researcher to achieve
statistical power. Samples of fewer than thirty are
dangerously small, as they allow the possibility of
considerable standard error, while beyond around
eighty cases further increases in sample size have
little additional effect on the standard error.
The representativeness of the sample
The researcher will need to consider the extent
to which it is important that the sample in fact
represents the whole population in question (in
the example above, the 1,000 students), if it is
to be a valid sample. The researcher will need
to be clear what it is that is being represented,
i.e. to set the parameter characteristics of the
wider population – the sampling frame – clearly
and correctly. There is a popular example of
how poor sampling may be unrepresentative and
unhelpful for a researcher. A national newspaper
reports that one person in every two suffers
from backache; this headline stirs alarm in every
doctor’s surgery throughout the land. However,
the newspaper fails to make clear the parameters
of the study which gave rise to the headline.
It turns out that the research took place in a
damp part of the country where the incidence
of backache might be expected to be higher
than elsewhere, in a part of the country which
contained a disproportionate number of elderly
people, again who might be expected to have more
backaches than a younger population, in an area
of heavy industry where the working population
might be expected to have more backache than
in an area of lighter industry or service industries,
and used only two doctors’ records, overlooking
the fact that many backache sufferers went to
those doctors’ surgeries because the two doctors
concerned were known to be overly sympathetic
to backache sufferers rather than responsibly
suspicious.
These four variables – climate, age group,
occupation and reported incidence – were seen
to exert a disproportionate effect on the study,
i.e. if the study were to have been carried
out in an area where the climate, age group,
occupation and reporting were to have been
different, then the results might have been
different. The newspaper report sensationally
generalized beyond the parameters of the data,
thereby overlooking the limited representativeness
of the study.
It is important to consider adjusting the
weightings of subgroups in the sample once the
data have been collected. For example, in a
secondary school where half of the students are
male and half are female, consider pupils’ responses
to the question ‘How far does your liking of the
form teacher affect your attitude to work?’
Variable: How far does your liking of the form
teacher affect your attitude to school work?

            Very little   A little   Somewhat   Quite a lot   A very great deal
Male             10           20         30          25              15
Female           50           80         30          25              15
Total            60          100         60          50              30
Let us say that we are interested in the attitudes
according to the gender of the respondents, as well
as overall. In this example one could surmise that
generally the results indicate that the liking of the
form teacher has only a small to moderate effect
on the students’ attitude to work. However, we
have to observe that twice as many girls as boys
are included in the sample, and this is an unfair
representation of the population of the school,
which comprises 50 per cent girls and 50 per cent
boys, i.e. girls are over-represented and boys are
under-represented. If one equalizes the two sets
of scores by gender to be closer to the school
population (either by doubling the number of boys
or halving the number of girls) then the results
look very different.
Variable: How far does your liking of the form
teacher affect your attitude to school work?

            Very little   A little   Somewhat   Quite a lot   A very great deal
Male             20           40         60          50              30
Female           50           80         30          25              15
Total            70          120         90          75              45
In this latter case a much more positive picture is
painted, indicating that the students regard their
liking of the form teacher as a quite important
feature in their attitude to school work. Here
equalizing the sample to represent more fairly
the population by weighting yields a different
picture. Weighting the results is an important
consideration.
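A minimal sketch of the equalizing adjustment just described: boys' responses are weighted by two so that the sample mirrors the school's 50:50 gender balance.

    categories = ["Very little", "A little", "Somewhat",
                  "Quite a lot", "A very great deal"]
    responses = {
        "Male":   [10, 20, 30, 25, 15],
        "Female": [50, 80, 30, 25, 15],
    }
    weights = {"Male": 2, "Female": 1}  # boys are under-represented by half

    weighted = {group: [count * weights[group] for count in counts]
                for group, counts in responses.items()}
    totals = [sum(col) for col in zip(*weighted.values())]
    for label, total in zip(categories, totals):
        print(f"{label}: {total}")
    # Reproduces the weighted table above: 70, 120, 90, 75, 45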
The access to the sample
Access is a key issue and is an early factor that must
be decided in research. Researchers will need to
ensure that access is not only permitted but also, in
fact, practicable. For example, if a researcher were
to conduct research into truancy and unauthorized
absence from school, and decided to interview
a sample of truants, the research might never
commence as the truants, by definition, would not
be present! Similarly, access to sensitive areas might
be not only difficult but also problematic, both
legally and administratively; for example, access
to child abuse victims, child abusers, disaffected
students, drug addicts, school refusers, bullies and
victims of bullying. In some sensitive areas access
to a sample might be denied by the potential
sample participants themselves, for example AIDS
counsellors might be so seriously distressed by their
work that they simply cannot face discussing with
a researcher the subject matter of their traumatic
work; it is distressing enough to do the job without
living through it again with a researcher.
Access might also be denied by the potential
sample participants themselves for very practical
reasons, for example a doctor or a teacher
simply might not have the time to spend with
the researcher. Further, access might be denied
by people who have something to protect, for
example a school which has recently received
a very poor inspection result or poor results on
external examinations, or people who have made
an important discovery or a new invention and
who do not wish to disclose the secret of their
success; the trade in intellectual property has
rendered this a live issue for many researchers.
There are very many reasons that might prevent
access to the sample, and researchers cannot afford
to neglect this potential source of difficulty in
planning research.
In many cases access is guarded by ‘gatekeepers’
– people who can control researchers’ access to
those whom they really want to target. For school
staff this might be, for example, headteachers,
school governors, school secretaries, form teachers;
for pupils this might be friends, gang members,
parents, social workers and so on. It is critical
for researchers to consider not only whether
access is possible but also how access will be
undertaken – to whom does one have to go, both
formally and informally, to gain access to the target
group?
Not only might access be difficult but also
its corollary – release of information – might be
problematic. For example, a researcher might gain
access to a wealth of sensitive information and
appropriate people, but there might be a restriction
on the release of the data collected; in the field
of education in the UK reports have been known
to be suppressed, delayed or ‘doctored’. It is not
always enough to be able to ‘get to’ the sample, the
problem might be to ‘get the information out’ to
the wider public, particularly if it could be critical
of powerful people.
The sampling strategy to be used
There are two main methods of sampling (Cohen
and Holliday 1979; 1982; 1996; Schofield 1996).
The researcher must decide whether to opt for
a probability (also known as a random sample)
or a non-probability sample (also known as a
purposive sample). The difference between them
is this: in a probability sample the chances of
members of the wider population being selected
for the sample are known, whereas in a non-probability
sample the chances of members of the
wider population being selected for the sample
are unknown. In the former (probability sample)
every member of the wider population has an
equal chance of being included in the sample;
inclusion or exclusion from the sample is a matter
of chance and nothing else. In the latter (non-probability
sample) some members of the wider
population definitely will be excluded and others
definitely included (i.e. not every member of the
wider population has an equal chance of being
included in the sample). In this latter type the
researcher has deliberately – purposely – selected
a particular section of the wider population to
include in or exclude from the sample.
Probability samples
A probability sample, because it draws randomly
from the wider population, will be useful if the
researcher wishes to be able to make generalizations,
because it seeks representativeness of the
wider population. It also permits two-tailed tests
to be administered in statistical analysis of quantitative
data. Probability sampling is popular in
randomized controlled trials. On the other hand,
a non-probability sample deliberately avoids representing
the wider population; it seeks only to
represent a particular group, a particular named
section of the wider population, such as a class
of students, a group of students who are taking
a particular examination, a group of teachers
(see http://www.routledge.com/textbooks/
9780415368780 – Chapter 4, file 4.6.ppt).
A probability sample will have less risk of
bias than a non-probability sample, whereas,
by contrast, a non-probability sample, being
unrepresentative of the whole population, may
demonstrate skewness or bias. (For this type of
sample a one-tailed test will be used in processing
statistical data.) This is not to say that the former is
bias free; there is still likely to be sampling error in a
probability sample (discussed below), a feature that
has to be acknowledged, for example opinion polls
usually declare their error factors, e.g. ±3 per cent.
There are several types of probability samples:
simple random samples; systematic samples; stratified
samples; cluster samples; stage samples, and
multi-phase samples. They all have a measure of
randomness built into them and therefore have a
degree of generalizability.
Simple random sampling
In simple random sampling, each member of the
population under study has an equal chance of
being selected and the probability of a member
of the population being selected is unaffected
by the selection of other members of
the population, i.e. each selection is entirely
independent of the next. The method involves
selecting at random from a list of the population
(a sampling frame) the required number of
subjects for the sample. This can be done by
drawing names out of a container until the required
number is reached, or by using a table of
random numbers set out in matrix form (these
are reproduced in many books on quantitative
research methods and statistics), and allocating
these random numbers to participants or cases
(e.g. Hopkins et al. 1996: 148–9). Because of
probability and chance, the sample should contain
subjects with characteristics similar to the
population as a whole; some old, some young,
some tall, some short, some fit, some unfit,
some rich, some poor etc. One problem associated
with this particular sampling method
is that a complete list of the population is
needed and this is not always readily available
(see http://www.routledge.com/textbooks/
9780415368780 – Chapter 4, file 4.7.ppt).
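In code, simple random sampling amounts to drawing without replacement from the sampling frame; the frame below is hypothetical.

    import random

    # A hypothetical sampling frame: the complete list of the population.
    sampling_frame = [f"student_{i:04d}" for i in range(1, 2001)]

    random.seed(1)  # reproducible illustration
    # Every member has an equal chance of selection, and each selection
    # is independent of the next.
    sample = random.sample(sampling_frame, k=100)
    print(sample[:5])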
Systematic sampling
This method is a modified form of simple random
sampling. It involves selecting subjects from a
population list in a systematic rather than a
random fashion. For example, if from a population
of, say, 2,000, a sample of 100 is required,
then every twentieth person can be selected.
The starting point for the selection is chosen at
random (see http://www.routledge.com/textbooks/
9780415368780 – Chapter 4, file 4.8.ppt).
One can decide how frequently to make
systematic sampling by a simple statistic – the total
number of the wider population being represented
divided by the sample size required:
f = N / sn
where f = the frequency interval,
N = the total number of the wider population, and
sn = the required number in the sample.
Let us say that the researcher is working with a
school of 1,400 students; by looking at the table
of sample size (Box 4.1) required for a random
sample of these 1,400 students we see that 302
students are required to be in the sample. Hence
the frequency interval (f) is:
f = 1,400 ÷ 302 = 4.635 (which rounds up to 5)
Hence the researcher would pick out every fifth
name on the list of cases.
Such a process, of course, assumes that the
names on the list themselves have been listed in a
random order. A list of females and males might
list all the females first, before listing all the males;
if there were 200 females on the list, the researcher
might have reached the desired sample size before
reaching that stage of the list which contained
males, thereby distorting (skewing) the sample.
Another example might be where the researcher
decides to select every thirtieth person identified
from a list of school students, but it happens that:
(a) the school has just over thirty students in each
class; (b) each class is listed from high ability to
low ability students; (c) the school listing identifies
the students by class.
In this case, although the sample is drawn
from each class, it is not fairly representing the
whole school population since it is drawing almost
exclusively on the lower ability students. This is
the issue of periodicity (Calder 1979). Not only is
there the question of the order in which names
are listed in systematic sampling, but also there
is the issue that this process may violate one of
the fundamental premises of probability sampling,
namely that every person has an equal chance
of being included in the sample. In the example
above where every fifth name is selected, this
guarantees that names 1–4, 6–9 etc. will be
excluded, i.e. everybody does not have an equal
chance to be chosen. The ways to minimize this
problem are to ensure that the initial listing is
selected randomly and that the starting point for
systematic sampling is similarly selected randomly.
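A sketch of systematic selection with both safeguards just mentioned: the listing is shuffled first, and the starting point is chosen at random. (The interval here is floored rather than rounded up, an assumption of this sketch, so that the frame always yields at least the required number of names.)

    import random

    def systematic_sample(frame, sample_size):
        frame = list(frame)
        random.shuffle(frame)  # randomize the listing against periodicity
        interval = max(1, len(frame) // sample_size)  # f = N // sn
        start = random.randrange(interval)            # random starting point
        return frame[start::interval][:sample_size]

    school_roll = [f"pupil_{i}" for i in range(1, 1401)]
    print(len(systematic_sample(school_roll, 302)))  # 302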
Stratified sampling
Stratified sampling involves dividing the population
into homogeneous groups, each group
containing subjects with similar characteristics.
For example, group A might contain males and
group B, females. In order to obtain a sample
representative of the whole population in
terms of sex, a random selection of subjects
from group A and group B must be taken. If
needed, the exact proportion of males to females
in the whole population can be reflected
in the sample. The researcher will have to identify
those characteristics of the wider population
which must be included in the sample, i.e. to
identify the parameters of the wider population.
This is the essence of establishing the sampling
frame (see http://www.routledge.com/textbooks/
9780415368780 – Chapter 4, file 4.9.ppt).
To organize a stratified random sample is a
simple two-stage process. First, identify those
characteristics that appear in the wider population
that must also appear in the sample, i.e. divide
the wider population into homogeneous and, if
possible, discrete groups (strata), for example
males and females. Second, randomly sample
within these groups, the size of each group
being determined either by the judgement of
the researcher or by reference to Boxes 4.1
or 4.2.
The decision on which characteristics to include
should strive for simplicity as far as possible, as
the more factors there are, not only the more
complicated the sampling becomes, but often the
larger the sample will have to be to include
representatives of all strata of the wider population.
A stratified random sample is, therefore, a
useful blend of randomization and categorization,
thereby enabling both a quantitative and
qualitative piece of research to be undertaken.
A quantitative piece of research will be able
to use analytical and inferential statistics, while
a qualitative piece of research will be able to
target those groups in institutions or clusters of
participants who will be able to be approached to
participate in the research.
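The two-stage process can be sketched as follows; the frame and the even gender split are hypothetical, and each stratum is sampled in proportion to its share of the population.

    import random
    from collections import defaultdict

    random.seed(3)
    # Hypothetical frame: each member carries a stratum label.
    frame = [(f"teacher_{i}", "female" if i % 2 else "male")
             for i in range(600)]

    # Stage 1: divide the population into homogeneous strata.
    strata = defaultdict(list)
    for member, stratum in frame:
        strata[stratum].append(member)

    # Stage 2: randomly sample within each stratum, proportionately.
    sample_size = 60
    sample = []
    for members in strata.values():
        k = round(sample_size * len(members) / len(frame))
        sample.extend(random.sample(members, k))
    print(len(sample))  # 60, split 30:30 across the two strata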
Cluster sampling
When the population is large and widely dispersed,
gathering a simple random sample poses
administrative problems. Suppose we want to survey
students’ fitness levels in a particularly large
community or across a country. It would be completely
impractical to select students randomly
and spend an inordinate amount of time travelling
about in order to test them. By cluster sampling,
the researcher can select a specific number of
schools and test all the students in those selected
schools, i.e. a geographically close cluster is sampled
(see http://www.routledge.com/textbooks/
9780415368780 – Chapter 4, file 4.10.ppt).
One would have to be careful to ensure that
cluster sampling does not build in bias. For
example, let us imagine that we take a cluster
sample of a city in an area of heavy industry or
great poverty; this may not represent all kinds of
cities or socio-economic groups, i.e. there may be
similarities within the sample that do not catch
the variability of the wider population. The issue
here is one of representativeness; hence it might be
safer to take several clusters and to sample lightly
within each cluster, rather than to take fewer clusters
and sample heavily within each.
Cluster samples are widely used in small-scale
research. In a cluster sample the parameters of the
wider population are often drawn very sharply; a
researcher, therefore, would have to comment on
the generalizability of the findings. The researcher
may also need to stratify within this cluster sample
if useful data, i.e. those which are focused and
which demonstrate discriminability, are to be
acquired.
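The 'several clusters, sampled lightly' advice can be expressed directly; the schools and pupil numbers below are hypothetical.

    import random

    random.seed(5)
    # Hypothetical frame: 50 schools (clusters), each with 200 pupils.
    schools = {f"school_{s}": [f"s{s}_pupil_{p}" for p in range(200)]
               for s in range(50)}

    chosen = random.sample(list(schools), k=10)  # take several clusters...
    sample = [pupil
              for school in chosen
              for pupil in random.sample(schools[school], k=20)]
              # ...and sample lightly within each
    print(len(sample))  # 200 pupils spread across 10 clusters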
Stage sampling
Stage sampling is an extension of cluster sampling.
It involves selecting the sample in stages, that
is, taking samples from samples. Using the large
community example in cluster sampling, one type
of stage sampling might be to select a number of
schools at random, and from within each of these
schools, select a number of classes at random,
and from within those classes select a number of
students.
Morrison (1993: 121–2) provides an example
of how to address stage sampling in practice. Let
us say that a researcher wants to administer a
questionnaire to all 16-year-old pupils in each
of eleven secondary schools in one region. By
contacting the eleven schools she finds that there
are 2,000 16-year-olds on roll. Because of questions
of confidentiality she is unable to find out the
names of all the students so it is impossible to
draw their names out of a container to achieve
randomness (and even if she had the names, it
would be a mind-numbing activity to write out
2,000 names to draw out of a container!). From
looking at Box 4.1 she finds that, for a random
sample of the 2,000 students, the sample size is
322 students. How can she proceed?
The first stage is to list the eleven schools on
a piece of paper and then to write the names of
the eleven schools on to small cards and place
each card in a container. She draws out the first
name of the school, puts a tally mark by the
appropriate school on her list and returns the
card to the container. The process is repeated 321
times, bringing the total to 322. The final totals
might appear thus:
School                    1   2   3   4   5   6   7   8   9  10  11   Total
Required no. of students 22  31  32  24  29  20  35  28  32  38  31     322
For the second stage the researcher then
approaches the eleven schools and asks each of
them to select randomly the required number of
students for each school. Randomness has been
maintained in two stages and a large number
(2,000) has been rendered manageable. The
process at work here is to go from the general to
the specific, the wide to the focused, the large to
the small. Caution has to be exercised here, as the
assumption is that the schools are of the same size
and are large; that may not be the case in practice,
in which case this strategy may be inadvisable.
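Morrison's draw-and-replace tallying can be simulated in a few lines; as the text cautions, drawing school names with equal probability implicitly assumes the schools are of similar size.

    import random
    from collections import Counter

    random.seed(7)  # reproducible illustration
    schools = [f"School {i}" for i in range(1, 12)]

    # Stage 1: 322 draws with replacement, tallying one sample place
    # per draw.
    tallies = Counter(random.choices(schools, k=322))
    for school in schools:
        print(f"{school}: {tallies[school]}")
    print("Total:", sum(tallies.values()))  # 322
    # Stage 2 would then ask each school to select its tallied number
    # of pupils at random.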
Multi-phase sampling
In stage sampling there is a single unifying purpose
throughout the sampling. In the previous example
the purpose was to reach a particular group of
students from a particular region. In a multi-phase
sample the purposes change at each phase, for
example, at phase one the selection of the sample
might be based on the criterion of geography
(e.g. students living in a particular region); phase
two might be based on an economic criterion
(e.g. schools whose budgets are administered in
markedly different ways); phase three might be
based on a political criterion (e.g. schools whose
students are drawn from areas with a tradition
of support for a particular political party), and
so on. What is evident here is that the sample
population will change at each phase of the research
(see http://www.routledge.com/textbooks/
9780415368780 – Chapter 4, file 4.11.ppt).
Non-probability samples
The selectivity which is built into a non-probability
sample derives from the researcher
targeting a particular group, in the full knowledge
that it does not represent the wider population; it
simply represents itself. This is frequently the case
in small-scale research, for example, as with one
or two schools, two or three groups of students, or
a particular group of teachers, where no attempt
to generalize is desired; this is frequently the case
for some ethnographic research, action research
or case study research (see http://www.routledge.
com/textbooks/9780415368780 – Chapter 4, file
4.12.ppt). Small-scale research often uses non-probability
samples because, despite the disadvantages
that arise from their non-representativeness,
they are far less complicated to set up, are considerably
less expensive, and can prove perfectly
adequate where researchers do not intend to generalize
their findings beyond the sample in question,
or where they are simply piloting a questionnaire
as a prelude to the main study.
Just as there are several types of probability sample,
so there are several types of non-probability
sample: convenience sampling, quota sampling,
dimensional sampling, purposive sampling and
snowball sampling. Each type of sample seeks only
to represent itself or instances of itself in a similar
population, rather than attempting to represent
the whole, undifferentiated population.
Convenience sampling
Convenience sampling – or, as it is sometimes
called, accidental or opportunity sampling –
involves choosing the nearest individuals to serve
as respondents and continuing that process until
the required sample size has been obtained, or
simply choosing those who happen to be available and
accessible at the time. Captive audiences such as students or
student teachers often serve as respondents based
on convenience sampling. Researchers simply
choose the sample from those to whom they
have easy access. As it does not represent any
group apart from itself, it does not seek to
generalize about the wider population; for a
convenience sample that is an irrelevance. The
researcher, of course, must take pains to report
this point – that the parameters of generalizability
in this type of sample are negligible. A
convenience sample may be the sampling strategy
selected for a case study or a series of case
studies (see http://www.routledge.com/textbooks/
9780415368780 – Chapter 4, file 4.13.ppt).
Quota sampling
Quota sampling has been described as the
non-probability equivalent of stratified sampling
(Bailey 1978). Like a stratified sample, a
quota sample strives to represent significant characteristics
(strata) of the wider population; unlike
stratified sampling it sets out to represent these
in the proportions in which they can be found
in the wider population. For example, suppose
that the wider population (however defined) were
composed of 55 per cent females and 45 per cent
males, then the sample would have to contain 55
per cent females and 45 per cent males; if the
population of a school contained 80 per cent of
students up to and including the age of 16 and
20 per cent of students aged 17 and over, then
the sample would have to contain 80 per cent of
students up to the age of 16 and 20 per cent of students
aged 17 and above. A quota sample, then,
seeks to give proportional weighting to selected
factors (strata), reflecting the proportions in
which they can be found in the wider population
(see http://www.routledge.com/textbooks/
9780415368780 – Chapter 4, file 4.14.ppt). The
researcher wishing to devise a quota sample can
proceed in three stages:
1 Identify those characteristics (factors) which
appear in the wider population that must
also appear in the sample, i.e. divide the wider
population into homogeneous and, if possible,
discrete groups (strata), for example males
and females, or Asian, Chinese and African
Caribbean.
2 Identify the proportions in which the selected
characteristics appear in the wider population,
expressed as a percentage.
3 Ensure that the percentaged proportions of
the characteristics selected from the wider
population appear in the sample.
Ensuring correct proportions in the sample may
be difficult to achieve if the proportions in the
wider community are unknown or if access to the
sample is difficult; sometimes a pilot survey might
be necessary in order to establish those proportions
(and even then sampling error or a poor response
rate might render the pilot data problematical).
It is straightforward to determine the minimum
number required in a quota sample. Let us say that
the total number of students in a school is 1,700,
made up thus:
Performing arts 300 students
Natural sciences 300 students
Humanities 600 students
Business and Social Sciences 500 students
The proportions being 3:3:6:5, a minimum of 17
students might be required (3 + 3 + 6 + 5) for
the sample. Of course this would be a minimum
only, and it might be desirable to go higher than
this. The price of having too many characteristics
(strata) in quota sampling is that the minimum
number in the sample very rapidly could become
very large, hence in quota sampling it is advisable
to keep the numbers of strata to a minimum. The
larger the number of strata, the larger the number
in the sample will become, usually at a geometric
rather than an arithmetic rate of progression.
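The minimum quota sample follows from reducing the strata to their smallest whole-number ratio, as this short sketch shows:

    from functools import reduce
    from math import gcd

    strata = {
        "Performing arts": 300,
        "Natural sciences": 300,
        "Humanities": 600,
        "Business and Social Sciences": 500,
    }

    divisor = reduce(gcd, strata.values())                # 100
    ratio = {k: v // divisor for k, v in strata.items()}  # 3 : 3 : 6 : 5
    print(ratio)
    print("minimum quota sample =", sum(ratio.values()))  # 17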
Purposive sampling
In purposive sampling, often (but by no means
exclusively) a feature of qualitative research,
researchers handpick the cases to be included
in the sample on the basis of their judgement
of their typicality or possession of the particular
characteristics being sought. In this way, they build
up a sample that is satisfactory to their specific
needs. As its name suggests, the sample has been
chosen for a specific purpose, for example: a group
of principals and senior managers of secondary
schools is chosen as the research is studying the
incidence of stress among senior managers; a group
of disaffected students has been chosen because
they might indicate most distinctly the factors
which contribute to students’ disaffection (they
are critical cases, akin to ‘critical events’ discussed
in Chapter 18, or deviant cases – those cases which
go against the norm (Anderson and Arsenault
1998: 124)); one class of students has been selected
to be tracked throughout a week in order to report
on the curricular and pedagogic diet which is
offered to them so that other teachers in the
school might compare their own teaching to that
reported. While it may satisfy the researcher’s
needs to take this type of sample, it does not
pretend to represent the wider population; it
is deliberately and unashamedly selective and
biased (see http://www.routledge.com/textbooks/
9780415368780 – Chapter 4, file 4.15.ppt).
In many cases purposive sampling is used in
order to access ‘knowledgeable people’, i.e. those
who have in-depth knowledge about particular
issues, maybe by virtue of their professional
role, power, access to networks, expertise or
experience (Ball 1990). There is little benefit
in seeking a random sample when most of
the random sample may be largely ignorant of
particular issues and unable to comment on
matters of interest to the researcher, in which
case a purposive sample is vital. Though they may
not be representative and their comments may not
be generalizable, this is not the primary concern
in such sampling; rather the concern is to acquire
in-depth information from those who are in a
position to give it.
Another variant of purposive sampling is the
boosted sample. Gorard (2003: 71) comments on
the need to use a boosted sample in order to include
those who may otherwise be excluded from, or
under-represented in, a sample because there are
so few of them. For example, one might have a very
small number of special needs teachers or pupils in
a primary school or nursery, or one might have a
very small number of children from certain ethnic
minorities in a school, such that they may not
feature in a sample. In this case the researcher will
deliberately seek to include a sufficient number of
them to ensure appropriate statistical analysis or
representation in the sample, adjusting any results
from them, through weighting, to ensure that they
are not over-represented in the final results. This
is an endeavour, perhaps, to reach and meet the
demands of social inclusion.
A further variant of purposive sampling
is negative case sampling. Here the researcher
deliberately seeks those people who might
disconfirm the theories being advanced (the
Popperian equivalent of falsifiability), thereby
strengthening the theory if it survives such
disconfirming cases. A softer version of negative
case sampling is maximum variation sampling,
selecting cases from as diverse a population as
possible (Anderson and Arsenault 1998: 124) in
order to ensure strength and richness to the data,
their applicability and their interpretation. In this
latter case, it is almost inevitable that the sample
size will increase or be large.
Dimensional sampling
One way of reducing the problem of sample size in
quota sampling is to opt for dimensional sampling.
Dimensional sampling is a further refinement of
quota sampling. It involves identifying various
factors of interest in a population and obtaining
at least one respondent of every combination of
those factors. Thus, in a study of race relations,
for example, researchers may wish to distinguish
first, second and third generation immigrants.
Their sampling plan might take the form of a
multidimensional table with ‘ethnic group’ across
the top and ‘generation’ down the side. A second
example might be of a researcher who may be interested
in studying disaffected students, girls and
secondary-aged students and who may find a single
disaffected secondary female student, i.e. a respondent
who is the bearer of all of the sought characteristics
(see http://www.routledge.com/textbooks/
9780415368780 – Chapter 4, file 4.16.ppt).
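Since dimensional sampling requires at least one respondent per combination of factors, the sampling plan is simply the Cartesian product of the factor levels; the labels below are placeholders.

    from itertools import product

    ethnic_groups = ["group A", "group B", "group C"]  # hypothetical labels
    generations = ["first", "second", "third"]

    plan = list(product(ethnic_groups, generations))
    for ethnic_group, generation in plan:
        print(f"need at least one respondent: {ethnic_group}, "
              f"{generation} generation")
    print(f"{len(plan)} cells in the sampling plan")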
Snowball sampling
In snowball sampling researchers identify a small
number of individuals who have the characteristics
in which they are interested. These people are
then used as informants to identify, or put the
researchers in touch with, others who qualify
for inclusion and these, in turn, identify yet
others – hence the term snowball sampling. This
method is useful for sampling a population where
access is difficult, maybe because it is a sensitive
topic (e.g. teenage solvent abusers) or where
communication networks are undeveloped (e.g.
where a researcher wishes to interview stand-in
‘supply’ teachers – teachers who are brought in
on an ad-hoc basis to cover for absent regular
members of a school’s teaching staff – but finds
it difficult to acquire a list of these stand-in
teachers), or where an outside researcher has
difficulty in gaining access to schools (going
through informal networks of friends/acquaintance
and their friends and acquaintances and so on
rather than through formal channels). The task for
the researcher is to establish who are the critical or
key informants with whom initial contact must be
made (see http://www.routledge.com/textbooks/
9780415368780 – Chapter 4, file 4.17.ppt).
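Snowball sampling can be pictured as a breadth-first traversal of a referral network; the network below is entirely invented for illustration.

    from collections import deque

    # Hypothetical referral network: who can put the researcher in
    # touch with whom.
    referrals = {
        "key_informant": ["supply_teacher_1", "supply_teacher_2"],
        "supply_teacher_1": ["supply_teacher_3"],
        "supply_teacher_2": ["supply_teacher_3", "supply_teacher_4"],
        "supply_teacher_3": [],
        "supply_teacher_4": ["supply_teacher_5"],
        "supply_teacher_5": [],
    }

    sample, queue = [], deque(["key_informant"])  # start from key informant
    while queue:
        person = queue.popleft()
        if person not in sample:
            sample.append(person)                    # include the participant...
            queue.extend(referrals.get(person, []))  # ...and follow referrals
    print(sample)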
Volunteer sampling
In cases where access is difficult, the researcher may
have to rely on volunteers, for example, personal
friends, or friends of friends, or participants who
reply to a newspaper advertisement, or those who
happen to be interested from a particular school,
or those attending courses. Sometimes this is
inevitable (Morrison 2006), as it is the only kind
of sampling that is possible, and it may be better
to have this kind of sampling than no research
at all.
In these cases one has to be very cautious
in making any claims for generalizability or
representativeness, as volunteers may have a range
of different motives for volunteering, e.g. wanting
to help a friend, interest in the research, wanting
to benefit society, an opportunity for revenge on a
particular school or headteacher. Volunteers may
be well intentioned, but they do not necessarily
represent the wider population, and this would
have to be made clear.
Theoretical sampling
This is a feature of grounded theory. In grounded
theory the sample size is relatively immaterial, as
one works with the data that one has. Indeed
grounded theory would argue that the sample size
could be infinitely large, or, as a fall-back position,
large enough to saturate the categories and issues,
such that new data will not cause the theory that
has been generated to be modified.
Theoretical sampling requires the researcher
to have sufficient data to be able to generate
and ‘ground’ the theory in the research context,
however defined, i.e. to create a theoretical
explanation of what is happening in the situation,
without having any data that do not fit the theory.
Since the researcher will not know in advance
how much data, or what range of data, will be required,
it is difficult, sometimes to the point of impossibility
(or of exhaustion or time limitations), to know in advance
the sample size required. The researcher proceeds
in gathering more and more data until the theory
remains unchanged or until the boundaries of
the context of the study have been reached,
until no modifications to the grounded theory are
made in light of the constant comparison method.
Theoretical saturation (Glaser and Strauss 1967:
61) occurs when no additional data are found that
advance, modify, qualify, extend or add to the
theory developed.
Glaser and Strauss (1967) write that
theoretical sampling is the process of data collection
for generating theory whereby the analyst jointly
collects, codes, and analyzes his [sic.] data and decides
what data to collect next and where to find them, in
order to develop his theory as it emerges.
(Glaser and Strauss 1967: 45)
The two key questions for the grounded theorist
using theoretical sampling are, first, to which
groups does one turn next for data? Second, for
what theoretical purposes does one seek further
data? In response to the first, Glaser and Strauss
(1967: 49) suggest that the decision is based on
theoretical relevance, i.e. those groups that will
assist in the generation of as many properties and
categories as possible.
Hence the size of the data set may be fixed by the
number of participants in the organization, or the
number of people to whom one has access, but
the researcher has to consider that the door may
have to be left open for him/her to seek further
data in order to ensure theoretical adequacy and to
check what has been found so far with further data
(Flick et al. 2004: 170). In this case it is not always
possible to predict at the start of the research just
how many, and who, the research will need for the
sampling; it becomes an iterative process.
Non-probability samples also reflect the issue
that sampling can be of people but it can also
be of issues. Samples of people might be selected
because the researcher is concerned to address
specific issues, for example, those students who
misbehave, those who are reluctant to go to school,
those with a history of drug dealing, those who
prefer extra-curricular to curricular activities. Here
it is the issue that drives the sampling, and so the
question becomes not only ‘whom should I sample’
but also ‘what should I sample’ (Mason 2002:
127–32). In turn this suggests that it is not only
people who may be sampled, but texts, documents,
records, settings, environments, events, objects,
organizations, occurrences, activities and so on.
Planning a sampling strategy
There are several steps in planning the sampling
strategy:
1 Decide whether you need a sample, or whether
it is possible to have the whole population.
2 Identify the population, its important features
(the sampling frame) and its size.
3 Identify the kind of sampling strategy you
require (e.g. which variant of probability or
non-probability sample you require).
4 Ensure that access to the sample is guaranteed.
If not, be prepared to modify the sampling
strategy (step 2).
5 For probability sampling, identify the confidence
level and confidence intervals that you
require. For non-probability sampling, identify
the people whom you require in the sample.
6 Calculate the numbers required in the sample,
allowing for non-response, incomplete or
spoiled responses, attrition and sample
mortality, i.e. build in redundancy.
7 Decide how to gain and manage access
and contact (e.g. advertisement, letter,
telephone, email, personal visit, personal
contacts/friends).
8 Be prepared to weight (adjust) the data, once
collected.
Conclusion
The message from this chapter is the same as for
many of the others – that every element of the
research should not be arbitrary but planned and
deliberate, and that, as before, the criterion of
planning must be fitness for purpose. The selection
of a sampling strategy must be governed by the
criterion of suitability. The choice of which
strategy to adopt must be mindful of the purposes
of the research, the time scales and constraints on
the research, the methods of data collection, and
the methodology of the research. The sampling
chosen must be appropriate for all of these factors
if validity is to be served.
To the question ‘how large should my sample
be?’, the answer is complicated. This chapter has
suggested that it all depends on:
- the population size
- the confidence level and confidence interval required
- the accuracy required (the smallest sampling error sought)
- the number of strata required
- the number of variables included in the study
- the variability of the factor under study
- the kind of sample (different kinds of sample within probability and non-probability sampling)
- the representativeness of the sample
- the allowances to be made for attrition and non-response
- the need to keep proportionality in a proportionate sample.
That said, this chapter has urged researchers to
use large rather than small samples, particularly in
quantitative research.