Monday, July 7, 2014


MODULE I: Sampling
i) Concept of population and sample in Qualitative, Quantitative and Mixed research
ii) Techniques of sampling: probability and non-probability sampling, different types.
(8 hours)
The quality of a piece of research stands or falls not only by the appropriateness of methodology and instrumentation but also by the suitability of the sampling strategy that has been adopted (see also Morrison 1993: 112–17).  Researchers must take sampling decisions early in the overall planning of a piece of research. Factors such as expense, time, and accessibility frequently prevent researchers from gaining information from the whole population. Therefore they often need to be able to obtain data from a smaller group or subset of the total population in such a way that the knowledge gained is representative of the total population (however defined) under study. This smaller group or subset is the sample.
Experienced researchers start with the total population and work down to the sample. By contrast, less experienced researchers often work from the bottom up, that is, they determine the minimum number of respondents needed to conduct the research (Bailey 1978). However, unless they identify the total population in advance, it is virtually impossible for them to assess how representative their sample is.
Researchers face a number of decisions and problems in choosing the sampling strategy to be used. Judgements have to be made about four key factors in sampling:
1 the sample size
2 representativeness and parameters of the sample
3 access to the sample
4 the sampling strategy to be used.
The decisions here will determine the sampling strategy to be used. This assumes that a sample is actually required; there may be occasions on which the researcher can access the whole population rather than a sample.
The sample size
A question that often plagues novice researchers is just how large their samples should be. There is no clear-cut answer, for the correct sample size depends on the purpose of the study and the nature of the population under scrutiny. However, it is possible to give some advice on this matter. Generally speaking, the larger the sample the better, as this not only gives greater reliability but also enables more sophisticated statistics to be used. A sample size of thirty is held by many to be the minimum number of cases if researchers plan to use some form of statistical analysis on their data, though this is a very small number and we would advise very considerably more. Researchers need to think out, in advance of any data collection, the sorts of relationships that they wish to explore within subgroups of their eventual sample. The number of variables researchers set out to control in their analysis and the types of statistical tests that they wish to make must inform their decisions about sample size prior to the actual research undertaking. Typically an anticipated minimum of thirty cases per variable should be used as a ‘rule of thumb’, i.e. one must be assured of having a minimum of thirty cases for each variable (of course, the thirty cases for variable one could also be the same thirty as for variable two), though this is a very low estimate indeed. This number rises rapidly if different subgroups of the population are included in the sample. Further, depending on the kind of analysis to be performed, some statistical tests will require larger samples. For example, let us imagine that one wished to calculate the chi-square statistic with cross-tabulated data, looking at two subgroups of stakeholders in a primary school, sixty 10-year-old pupils and twenty teachers, and their responses to a question on a 5-point scale. The cross-tabulation therefore has ten cells (two groups × five response categories), and the sample size is eighty cases, an apparently reasonably sized sample.
However, six of the ten cells of responses (60 per cent) contain fewer than five cases.
The chi-square statistic requires there to be five cases or more in 80 per cent of the cells (i.e. eight out of the ten cells); strictly, this rule of thumb applies to the expected rather than the observed cell frequencies. In this example only 40 per cent of the cells contained five or more cases, so even with a comparatively large sample, the statistical requirements for reliable data with a straightforward statistic such as chi-square have not been met. The message is clear: one needs to anticipate, as far as one is able, some possible distributions of the data and see if these will prevent appropriate statistical analysis. If the distributions look unlikely to enable reliable statistics to be calculated then one should increase the sample size, or exercise great caution in interpreting the data because of problems of reliability, or not use particular statistics, or, indeed, consider abandoning the exercise if the increase in sample size cannot be achieved. The point here is that each variable may need a reasonably large number of cases (a minimum of perhaps six to ten). Indeed Gorard (2003: 63) suggests that one can start from the minimum number of cases required in each cell, multiply this by the number of cells, and then double the total. In the example above, with six cases in each cell, the minimum sample would be 120 (6 × 10 × 2), though, to be on the safe side and to try to ensure ten cases in each cell, a minimum sample of 200 (10 × 10 × 2) might be better, though even this is no guarantee. The example also shows that one can observe considerable variation in the responses from the participants in the research. Gorard (2003: 62) suggests that if a phenomenon contains a lot of potential variability then a larger sample will be needed. Surveying a variable such as intelligence quotient (IQ), for example, with a potential range from 70 to around 150, may require a larger rather than a smaller sample.
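A minimal Python sketch of the two checks just described: the share of cross-tabulation cells holding at least five cases, and Gorard's (2003: 63) rule of minimum per cell × number of cells × 2. The function names are illustrative, and the 2 × 5 table of counts is hypothetical, invented so that six of the ten cells fall below five as in the example. Like the text, it counts observed cases.

```python
def share_of_cells_with_min(table, minimum=5):
    """Fraction of cells in a cross-tabulation holding at least `minimum` cases."""
    cells = [count for row in table for count in row]
    return sum(1 for count in cells if count >= minimum) / len(cells)


def meets_chi_square_rule(table, minimum=5, required_share=0.8):
    """Rule of thumb: at least `minimum` cases in at least 80% of the cells."""
    return share_of_cells_with_min(table, minimum) >= required_share


def gorard_minimum_sample(min_per_cell, n_cells):
    """Gorard's rule: minimum cases per cell x number of cells, then doubled."""
    return min_per_cell * n_cells * 2


# Hypothetical counts: rows are the two stakeholder groups,
# columns are the five points of the response scale.
table = [
    [4, 10, 30, 12, 4],  # sixty pupils
    [1, 3, 12, 3, 1],    # twenty teachers
]

print(share_of_cells_with_min(table))  # 0.4
print(meets_chi_square_rule(table))    # False
print(gorard_minimum_sample(6, 10))    # 120
print(gorard_minimum_sample(10, 10))   # 200
```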
As well as the requirement of a minimum number of cases in order to examine relationships between subgroups, researchers must obtain the minimum sample size that will accurately represent the population being targeted. With respect to size, will a large sample guarantee representativeness? Not necessarily! A researcher could interview a sample of 450 females and still not have represented the male population. Will a small sample guarantee representativeness? Again, not necessarily! A very small sample falls into the trap of reporting that 50 per cent of those who expressed an opinion said that they enjoyed science, when the 50 per cent was only one student, the researcher having interviewed only two students in all. Furthermore, too large a sample might become unwieldy and too small a sample might be unrepresentative. Where simple random sampling is used, the sample size needed to reflect the population value of a particular variable depends both on the size of the population and the amount of heterogeneity in the population (Bailey 1978). Generally, for populations of equal heterogeneity, the larger the population, the larger the sample that must be drawn. For populations of equal size, the greater the heterogeneity on a particular variable, the larger the sample that is needed. To the extent that a sample fails to represent accurately the population involved, there is sampling error, discussed below. Sample size is also determined to some extent by the style of the research. For example, a survey style usually requires a large sample, particularly if inferential statistics are to be calculated. In ethnographic or qualitative research it is more likely that the sample size will be small. Sample size might also be constrained by cost, in terms of time, money, stress, administrative support, the number of researchers, and resources.
Borg and Gall (1979: 194–5) suggest that correlational research requires a sample size of no fewer than thirty cases, that causal-comparative and experimental methodologies require a sample size of no fewer than fifteen cases, and that survey research should have no fewer than 100 cases in each major subgroup and twenty to fifty in each minor subgroup. Borg and Gall (1979: 186) advise that sample size has to begin with an estimation of the smallest number of cases in the smallest subgroup of the sample, and ‘work up’ from that, rather than vice versa. So, for example, if 5 per cent of the sample must be teenage boys, and this subsample must be thirty cases (e.g. for correlational research), then the total sample will be 30 ÷ 0.05 = 600; if 15 per cent of the sample must be teenage girls and the subsample must be forty-five cases, then the total sample must be 45 ÷ 0.15 = 300 cases. The size of a probability (random) sample can be determined in two ways: either by the researcher exercising prudence and ensuring that the sample represents the wider features of the population with the minimum number of cases, or by using a table which, from a mathematical formula, indicates the appropriate size of a random sample for a given size of the wider population (Morrison 1993: 117). One such example is provided by Krejcie and Morgan (1970), whose work suggests that if the researcher were devising a sample from a wider population of thirty or fewer (e.g. a class of students or a group of young children in a class) then she or he would be well advised to include the whole of the wider population as the sample. Krejcie and Morgan (1970) indicate that the smaller the number of cases in the wider, whole population, the larger the proportion of that population that must appear in the sample.
The converse of this is also true: the larger the number of cases in the wider, whole population, the smaller the proportion of that population that needs to appear in the sample. They note that as the population increases, the proportion of the population required in the sample diminishes and, indeed, the required sample size levels off at around 384 cases (Krejcie and Morgan 1970: 610). Hence, for example, a piece of research involving all the children in a small primary or elementary school (up to 100 students in all) might require between 80 per cent and 100 per cent of the school to be included in the sample, while a large secondary school of 1,200 students might require a sample of 25 per cent of the school in order to achieve representativeness. As a rough guide in a random sample, the larger the sample, the greater is its chance of being representative. In determining sample size for a probability sample one has to consider not only the population size but also the confidence level and confidence interval, two further pieces of terminology. The confidence level, usually expressed as a percentage (typically 95 per cent or 99 per cent), is an index of how sure we can be (95 per cent of the time or 99 per cent of the time) that the responses lie within a given variation range, a given confidence interval. The confidence interval is that degree of variation or variation range (e.g. ±1 per cent, or ±2 per cent, or ±3 per cent) that one wishes to ensure. For example, the confidence interval in many opinion polls is ±3 per cent; this means that, if a voting survey indicates that a political party has 52 per cent of the votes, then it could be as low as 49 per cent (52 − 3) or as high as 55 per cent (52 + 3). A confidence level of 95 per cent here would indicate that we could be sure of this result within this range (±3 per cent) for 95 per cent of the time. If we want a very high confidence level (say 99 per cent of the time) then the sample size will be large.
On the other hand, if we want a less stringent confidence level (say 90 per cent of the time), then the sample size will be smaller. Usually a compromise is reached, and researchers opt for a 95 per cent confidence level. Similarly, if we want a very small confidence interval (i.e. a limited range of variation, e.g. 3 per cent) then the sample size will be high, and if we are comfortable with a larger degree of variation (e.g. 5 per cent) then the sample size will be lower. A full table of sample sizes for a probability sample is given in Box 4.1, with three confidence levels (90 per cent, 95 per cent and 99 per cent) and three confidence intervals (5 per cent, 4 per cent and 3 per cent).
We can see that the required sample size grows ever more slowly as the population size increases; generally (but, clearly, not always) the larger the population, the smaller the proportion of it that the probability sample needs to include. Also, the higher the confidence level, the larger the sample, and the narrower the confidence interval, the larger the sample. A conventional sampling strategy will be to use a 95 per cent confidence level and a 3 per cent confidence interval. There are several web sites that offer sample size calculation services for random samples. One free site at the time of writing is from Creative Service Systems (http://www.surveysystem.com/sscalc.htm), and another is from Pearson NCS (… sample-calc.htm), in which the researcher inputs the desired confidence level, confidence interval and the population size, and the sample size is automatically calculated. If different subgroups or strata (discussed below) are to be used then the requirements placed on the total sample also apply to each subgroup. For example, let us imagine that we are surveying a multiethnic school of 1,000 students. Sample-size tables suggest that, at a 95 per cent confidence level and a 5 per cent confidence interval, we need 278 students in our random sample to ensure representativeness. However, let us imagine that we wished to stratify our groups into, for example, Chinese (100 students), Spanish (50 students), English (800 students) and American (50 students). From tables of random sample sizes we then work out the random sample required for each stratum.
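The per-stratum calculation can be sketched in Python with the formula behind the Krejcie and Morgan (1970) table, s = X²NP(1 − P) / (d²(N − 1) + X²P(1 − P)). Here X² is the chi-square value for one degree of freedom at the chosen confidence level (3.841 for 95 per cent), N is the population size, P is the assumed population proportion (0.5, which gives the largest sample) and d is the confidence interval expressed as a proportion (0.05). The helper name is illustrative, and results are rounded to the nearest whole case.

```python
def krejcie_morgan(population, chi_sq=3.841, p=0.5, d=0.05):
    """Required simple random sample size for a finite population.

    chi_sq: chi-square value for 1 degree of freedom (3.841 = 95% level)
    p:      assumed population proportion (0.5 maximises the sample)
    d:      confidence interval (margin of error) as a proportion
    """
    numerator = chi_sq * population * p * (1 - p)
    denominator = d ** 2 * (population - 1) + chi_sq * p * (1 - p)
    return int(numerator / denominator + 0.5)  # round to the nearest case


strata = {"Chinese": 100, "Spanish": 50, "English": 800, "American": 50}
per_stratum = {name: krejcie_morgan(size) for name, size in strata.items()}

print(krejcie_morgan(1000))          # whole school, unstratified: 278
print(per_stratum)                   # {'Chinese': 80, 'Spanish': 44, 'English': 260, 'American': 44}
print(sum(per_stratum.values()))     # 428
print(krejcie_morgan(1_000_000))     # levels off at 384 for very large populations
print(krejcie_morgan(1000, d=0.03))  # narrower interval, larger sample: 516
```

Passing a larger `chi_sq` (for example 6.635 for a 99 per cent level) or a smaller `d` shows the effects of confidence level and interval discussed above.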
Our original sample size of 278 has now increased, very quickly, to 428. The message is very clear: the greater the number of strata (subgroups), the larger the sample will be. Much educational research concerns itself with strata rather than whole samples, so the issue is significant. One can rapidly generate the need for a very large sample.
If subgroups are required then the same rules for calculating overall sample size apply to each of the subgroups.
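Borg and Gall's (1979: 186) advice to ‘work up’ from the smallest subgroup, using the teenage boys and girls figures from their example earlier in this section, can be sketched as follows. The helper is illustrative. Taking the larger of the two totals when both subgroups must appear in one sample is our own inference rather than part of their example.

```python
def total_sample_from_subgroup(min_cases, percent_of_sample):
    """Smallest total sample in which a subgroup forming `percent_of_sample`
    per cent of the sample still contains `min_cases` cases.

    Works in whole numbers (cases * 100 / percent, rounded up) so that
    floating-point error cannot nudge the answer up by one.
    """
    return -(-(min_cases * 100) // percent_of_sample)


requirements = {
    "teenage boys": (30, 5),    # 30 cases, 5 per cent of the sample
    "teenage girls": (45, 15),  # 45 cases, 15 per cent of the sample
}
totals = {group: total_sample_from_subgroup(*spec) for group, spec in requirements.items()}

print(totals)                # {'teenage boys': 600, 'teenage girls': 300}
print(max(totals.values()))  # 600: the size that satisfies both subgroups
```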
Further, determining the size of the sample will also have to take account of non-response, attrition and respondent mortality, i.e. some participants will fail to return questionnaires, leave the research, and return incomplete or spoiled questionnaires (e.g. missing out items, putting two ticks in a row of choices instead of only one). Hence it is advisable to overestimate rather than to underestimate the size of the sample required, to build in redundancy (Gorard 2003: 60). Unless one has guarantees of access, response and, perhaps, the researcher’s own presence at the time of conducting the research (e.g. presence when questionnaires are being completed), then it might be advisable to estimate up to double the size of required sample in order to allow for such loss of clean and complete copies of questionnaires or responses.
 In some circumstances, meeting the requirements of sample size can be done on an evolutionary basis. For example, let us imagine that you wish to sample 300 teachers, randomly selected. You succeed in gaining positive responses from 250 teachers to, for example, a telephone survey or a questionnaire survey, but you are 50 short of the required number. The matter can be resolved simply by adding another 50 to the random sample, and, if not all of these are successful, then adding some more until the required number is reached.
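The two ideas in the preceding paragraphs, inflating the initial sample to allow for non-response and topping up a random sample until the required number is reached, can be sketched in Python. The population of 5,000 teacher identifiers and the rule deciding who responds are hypothetical stand-ins for a real sampling frame and real returns, and the function names are illustrative.

```python
import math
import random


def invitations_needed(required, expected_response_rate):
    """Inflate the target to allow for non-response (a 0.5 rate doubles it)."""
    return math.ceil(required / expected_response_rate)


def top_up_sample(frame, target, responds, rng):
    """Draw a simple random sample, then keep drawing further cases (without
    replacement) in batches equal to the shortfall until `target` usable
    responses are obtained or the sampling frame is exhausted."""
    pool = list(frame)
    rng.shuffle(pool)  # a random ordering of the whole frame
    usable, contacted = [], 0
    shortfall = target
    while shortfall > 0 and contacted < len(pool):
        batch = pool[contacted:contacted + shortfall]
        contacted += len(batch)
        usable.extend(case for case in batch if responds(case))
        shortfall = target - len(usable)
    return usable, contacted


def responds(teacher_id):
    """Hypothetical response rule: every sixth teacher fails to reply."""
    return teacher_id % 6 != 0


rng = random.Random(2014)
frame = range(1, 5001)  # hypothetical teacher IDs

sample, contacted = top_up_sample(frame, 300, responds, rng)
print(invitations_needed(300, 0.5))  # 600
print(len(sample), contacted)        # 300 usable responses, from more than 300 contacts
```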
 Borg and Gall (1979: 195) suggest that, as a general rule, sample sizes should be large where
• there are many variables
• only small differences or small relationships are expected or predicted
• the sample will be broken down into subgroups
• the sample is heterogeneous in terms of the variables under study
• reliable measures of the dependent variable are unavailable.
Oppenheim (1992: 44) adds to this the view that the nature of the scales to be used also exerts an influence on the sample size. For nominal data the sample sizes may well have to be larger than for interval and ratio data (a variant of the issue of the number of subgroups to be addressed: the greater the number of subgroups or possible categories, the larger the sample will have to be). Borg and Gall (1979) set out a formula-driven approach to determining sample size (see also Moser and Kalton 1977; Ross and Rust 1997: 427–38), and they also suggest using correlational tables for correlational studies (available in most texts on statistics) ‘in reverse’ to determine sample size (Borg and Gall 1979: 201), i.e. looking at the significance levels of correlation coefficients and then reading off the sample sizes usually required to demonstrate that level of significance. For example, at the 0.05 level of significance (two-tailed), a sample size of 10 would be required if the estimated correlation coefficient is 0.65, a sample size of 20 if it is 0.45, and a sample size of 100 if it is 0.20. Again, an inverse relationship can be seen: the larger the sample, the smaller the correlation coefficient can be and still be deemed significant.
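The ‘in reverse’ reading of correlation tables can be approximated in Python with Fisher's z transformation, under which atanh(r)·√(n − 3) is roughly standard normal when the population correlation is zero. This is a sketch under that approximation, not the exact t-based tables Borg and Gall refer to, and the function name is illustrative. It reproduces the sample sizes of 10 and 20 quoted above and gives 97 where the table gives a rounded 100.

```python
import math
from statistics import NormalDist


def min_n_for_correlation(r, alpha=0.05):
    """Smallest n at which a correlation of size r is significant (two-tailed),
    using the Fisher z approximation: atanh(r) * sqrt(n - 3) >= z_critical."""
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil((z_critical / math.atanh(r)) ** 2 + 3)


for r in (0.65, 0.45, 0.20):
    print(r, min_n_for_correlation(r), min_n_for_correlation(r, alpha=0.01))
```

Tightening alpha to 0.01 raises the required samples (to 15 for r = 0.65), so the figures of 10, 20 and 100 correspond to the 0.05 level.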
With both qualitative and quantitative data, the essential requirement is that the sample is representative of the population from which it is drawn. In a dissertation concerned with a life history (i.e. n = 1), the sample is the population!

Qualitative data
In a qualitative study of thirty highly able girls
of similar socio-economic background following
an A level Biology course, a sample of five or
six may suffice the researcher who is prepared to
obtain additional corroborative data by way of
Where there is heterogeneity in the population,
then a larger sample must be selected on
some basis that respects that heterogeneity. Thus,
from a staff of sixty secondary school teachers
differentiated by gender, age, subject specialism,
management or classroom responsibility, etc., it
would be insufficient to construct a sample consisting
of ten female classroom teachers of Arts
and Humanities subjects.
Quantitative data
For quantitative data, a precise sample number
can be calculated according to the level of accuracy
and the level of probability that researchers require
in their work. They can then report in their
study the rationale and the basis of their research
decisions (Blalock 1979).
By way of example, suppose a teacher/researcher
wishes to sample opinions among 1,000 secondary
school students. She intends to use a 10-point
scale ranging from 1 = totally unsatisfactory to
10 = absolutely fabulous. She already has data
from her own class of thirty students and suspects
that the responses of other students will be
broadly similar. Her own students rated the
activity (an extracurricular event) as follows: mean
score = 7.27; standard deviation = 1.98. In other
words, her students were pretty much ‘bunched’
about a warm, positive appraisal on the 10-point
scale. How many of the 1,000 students does she
need to sample in order to gain an accurate (i.e.
reliable) assessment of what the whole school
(n = 1,000) thinks of the extracurricular event?
It all depends on what degree of accuracy and what level
of probability she is willing to accept.
A simple calculation from a formula by Blalock
(1979: 215–18) shows that:
• if she is happy to be within ± 0.5 of a scale point and accurate 19 times out of 20, then she requires a sample of 60 out of the 1,000;
• if she is happy to be within ± 0.5 of a scale point and accurate 99 times out of 100, then she requires a sample of 104 out of the 1,000;
• if she is happy to be within ± 0.5 of a scale point and accurate 999 times out of 1,000, then she requires a sample of 170 out of the 1,000;
• if she is a perfectionist and wishes to be within ± 0.25 of a scale point and accurate 999 times out of 1,000, then she requires a sample of 679 out of the 1,000.
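The four figures above can be reproduced, in a minimal Python sketch, with the usual formula for the sample size needed to estimate a mean, n = (z × SD / E)². Here z is the normal critical value for the chosen level of accuracy, SD is the standard deviation from the pilot class (1.98) and E is the acceptable margin in scale points. We assume this is the calculation intended. It ignores the finite population correction, and the function name is illustrative.

```python
from statistics import NormalDist


def sample_size_for_mean(sd, margin, confidence):
    """n = (z * sd / margin) ** 2, rounded to the nearest case.

    No finite population correction is applied; applying one would make
    the figures somewhat smaller for a population of 1,000.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    return int((z * sd / margin) ** 2 + 0.5)


pilot_sd = 1.98  # from the teacher's own class of thirty
for margin, confidence in [(0.5, 0.95), (0.5, 0.99), (0.5, 0.999), (0.25, 0.999)]:
    print(margin, confidence, sample_size_for_mean(pilot_sd, margin, confidence))
# Prints 60, 104, 170 and 679 in the final column.
```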
It is clear that sample size is a matter of
judgement as well as mathematical precision; even
formula-driven approaches make it clear that there
are elements of prediction, standard error and
human judgement involved in determining sample size.
Sampling error
If many samples are taken from the same
population, it is unlikely that they will all have
characteristics identical with each other or with
the population; their means will be different. In
brief, there will be sampling error (see Cohen
and Holliday 1979, 1996). Sampling error is often
taken to be the difference between the sample
mean and the population mean. Sampling error
is not necessarily the result of mistakes made
in sampling procedures. Rather, variations may
occur due to the chance selection of different
individuals. For example, if we take a large
number of samples from the population and
measure the mean value of each sample, then
the sample means will not be identical. Some
will be relatively high, some relatively low, and
many will cluster around an average or mean value
of the samples. We show this diagrammatically in
Box 4.2 (see
9780415368780 – Chapter 4, file 4.4.ppt).
Why should this occur? We can explain the
phenomenon by reference to the Central Limit
Theorem which is derived from the laws of
probability. This states that if large random
samples of equal size are repeatedly drawn from
any population, then the means of those samples
will be approximately normally distributed. The
distribution of sample means approaches the
normal distribution as the size of the sample
increases, regardless of the shape – normal or
otherwise – of the parent population (Hopkins
et al. 1996: 159, 388). Moreover, the average or
mean of the sample means will be approximately
the same as the population mean. Hopkins et al.
(1996: 159–62) demonstrate this by reporting
the use of computer simulation to examine the
sampling distribution of means when computed
10,000 times (a method that we discuss in Chapter 10). Rose and Sullivan (1993: 144) remind us that 95 per cent of all sample means fall within plus or minus 1.96 standard errors of the population mean, i.e. that there is a 95 per cent chance that any single sample mean will fall within these limits around the population mean.

Box 4.2 Distribution of sample means showing the spread of a selection of sample means around the population mean (Mpop = population mean; Ms = sample means). Source: Cohen and Holliday 1979.
By drawing a large number of samples of equal
size from a population, we create a sampling
distribution. We can calculate the error involved
in such sampling (see http://www.routledge.com/textbooks/9780415368780 – Chapter 4, file
4.5.ppt). The standard deviation of the theoretical
distribution of sample means is a measure of
sampling error and is called the standard error
of the mean (SEM). Thus,
$$SE = \frac{SD_s}{\sqrt{N}}$$
where SDs = the standard deviation of the sample
and N = the number in the sample.
Strictly speaking, the formula for the standard
error of the mean is:
$$SE = \frac{SD_{pop}}{\sqrt{N}}$$
where SDpop = the standard deviation of the
population.
However, as we are usually unable to ascertain the
SD of the total population, the standard deviation
of the sample is used instead. The standard error
of the mean provides the best estimate of the
sampling error. Clearly, the sampling error depends
on the variability (i.e. the heterogeneity) in the
population as measured by SDpop as well as the
sample size (N) (Rose and Sullivan 1993: 143).
The smaller the SDpop the smaller the sampling
error; the larger the N, the smaller the sampling
error. Where the SDpop is very large, then N
needs to be very large to counteract it. Where
SDpop is very small, then N, too, can be small
and still give a reasonably small sampling error.
As the sample size increases the sampling error
decreases. Hopkins et al. (1996: 159) suggest that,
unless there are some very unusual distributions,
samples of twenty-five or greater usually yield a
normal sampling distribution of the mean. For
further analysis of steps that can be taken to cope
with the estimation of sampling in surveys we refer
the reader to Ross and Wilson (1997).
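The standard error formula and the Central Limit Theorem described in this section can be illustrated with a small Python simulation in the spirit of the one Hopkins et al. report. The parent population here is hypothetical and deliberately skewed (exponential, with mean and SD both about 1). Samples of twenty-five are drawn 10,000 times, and the spread of the resulting sample means is compared with SDpop / √N.

```python
import math
import random
import statistics


def standard_error(sample):
    """Standard error of the mean estimated from a single sample: SD_s / sqrt(N)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


rng = random.Random(42)
# Hypothetical, strongly skewed parent population.
population = [rng.expovariate(1.0) for _ in range(100_000)]
mu_pop = statistics.fmean(population)
sd_pop = statistics.pstdev(population)

n = 25
samples = [rng.sample(population, n) for _ in range(10_000)]
sample_means = [statistics.fmean(s) for s in samples]

sem_theory = sd_pop / math.sqrt(n)             # SDpop / sqrt(N)
sem_observed = statistics.stdev(sample_means)  # spread of the sampling distribution
within_196 = sum(abs(m - mu_pop) <= 1.96 * sem_theory for m in sample_means) / len(sample_means)

print(round(mu_pop, 3), round(statistics.fmean(sample_means), 3))  # nearly equal
print(round(sem_theory, 3), round(sem_observed, 3))                # nearly equal
print(round(within_196, 3))                                        # close to 0.95
print(round(standard_error(samples[0]), 3))  # single-sample estimate of the SEM
```

Even though the parent population is far from normal, the sample means cluster around the population mean, and roughly 95 per cent of them lie within 1.96 standard errors of it.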
The standard error of proportions
We said earlier that one answer to ‘How big a
sample must I obtain?’ is ‘How accurate do I want
my results to be?’ This is well illustrated in the
following example:
A school principal finds that the 25 students she talks
to at random are reasonably in favour of a proposed
change in the lunch break hours, 66 per cent being in
favour and 34 per cent being against. How can she be
sure that these proportions are truly representative of
the whole school of 1,000 students?
A simple calculation of the standard error of
proportions provides the principal with her answer.
$$SE = \sqrt{\frac{P \times Q}{N}}$$
where
P = the percentage in favour
Q = 100 per cent − P
N = the sample size
The formula assumes that each sample is drawn
on a simple random basis. A small correction factor
called the finite population correction (fpc) is
generally applied as follows:
$$SE_{\text{proportions}} = \sqrt{\frac{(1 - f)\,P \times Q}{N}}$$
where f is the proportion included in the sample.
Where, for example, a sample is 100 out of 1,000, f is 0.1:
$$SE_{\text{proportions}} = \sqrt{\frac{(1 - 0.1)(66 \times 34)}{100}} = 4.49$$
With a sample of twenty-five, the SE = 9.4. In other words, taking one standard error either side, the favourable vote can vary between 56.6 per cent and 75.4 per cent; likewise, the unfavourable vote can vary between 43.4 per cent and 24.6 per cent. Clearly, a voting possibility ranging from 56.6 per cent in favour to 43.4 per cent against is less decisive than 66 per cent as opposed to 34 per cent. Should the school principal enlarge her sample to include 100 students, then the SE becomes 4.5 and the variation in the range is reduced to 61.5–70.5 per cent in favour and 38.5–29.5 per cent against. Sampling the whole school's opinion (n = 1,000) reduces the SE, calculated without the finite population correction, to 1.5 and the ranges to 64.5–67.5 per cent in favour and 35.5–32.5 per cent against; with the correction applied, a census of the whole school (f = 1) has no sampling error at all. It is easy to see why political opinion surveys are often based upon sample sizes of 1,000 to 1,500 (Gardner 1978).
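The principal's figures can be checked with a short Python sketch of the standard error of a proportion, with and without the finite population correction. The function name is illustrative, and percentages are used throughout, as in the formulas for the standard error of proportions.

```python
import math


def se_proportion(p, n, population=None):
    """Standard error of a percentage p from a simple random sample of size n.

    If `population` is given, the finite population correction is applied:
    SE = sqrt((1 - f) * P * Q / N), with f = n / population.
    """
    q = 100 - p
    f = n / population if population else 0.0
    return math.sqrt((1 - f) * p * q / n)


for n in (25, 100, 1000):
    plain = se_proportion(66, n)
    corrected = se_proportion(66, n, population=1000)
    print(n, round(plain, 1), round(corrected, 1))
```

The corrected column gives the 9.4 and 4.5 quoted for samples of 25 and 100; the uncorrected column gives 1.5 for n = 1,000, whereas the correction reduces a full census to zero sampling error.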
What is being suggested here generally is that,
in order to overcome problems of sampling error,
in order to ensure that one can separate random
effects and variation from non-random effects,
and in order for the power of a statistic to be
felt, one should opt for as large a sample as
possible. As Gorard (2003: 62) says, ‘power is an
estimate of the ability of the test you are using
to separate the effect size from random variation’,
and a large sample helps the researcher to achieve
statistical power. Samples of fewer than thirty are
dangerously small, as they allow the possibility of
considerable standard error, and, beyond around
eighty cases, further increases to the sample size
have little effect on the standard error.
The representativeness of the sample
The researcher will need to consider the extent
to which it is important that the sample in fact
represents the whole population in question (in
the example above, the 1,000 students), if it is
to be a valid sample. The researcher will need
to be clear what it is that is being represented,
i.e. to set the parameter characteristics of the
wider population – the sampling frame – clearly
and correctly. There is a popular example of
how poor sampling may be unrepresentative and
unhelpful for a researcher. A national newspaper
reports that one person in every two suffers
from backache; this headline stirs alarm in every
doctor’s surgery throughout the land. However,
the newspaper fails to make clear the parameters
of the study which gave rise to the headline.
It turns out that the research took place in a
damp part of the country where the incidence
of backache might be expected to be higher
than elsewhere, in a part of the country which
contained a disproportionate number of elderly
people, again who might be expected to have more
backaches than a younger population, in an area
of heavy industry where the working population
might be expected to have more backache than
in an area of lighter industry or service industries,
and used only two doctors’ records, overlooking
the fact that many backache sufferers went to
those doctors’ surgeries because the two doctors
concerned were known to be overly sympathetic
to backache sufferers rather than responsibly
These four variables – climate, age group,
occupation and reported incidence – were seen
to exert a disproportionate effect on the study,
i.e. if the study were to have been carried
out in an area where the climate, age group,
occupation and reporting were to have been
different, then the results might have been
different. The newspaper report sensationally
generalized beyond the parameters of the data,
thereby overlooking the limited representativeness
of the study.
It is important to consider adjusting the
weightings of subgroups in the sample once the
data have been collected. For example, in a
secondary school where half of the students are
male and half are female, consider pupils’ responses
to the question ‘How far does your liking of the
form teacher affect your attitude to work?’
Variable: How far does your liking of the form teacher affect your attitude to school work?

Gender | Very little | A little | Somewhat | Quite a lot | A very great
Male | 10 | 20 | 30 | 25 | 15
Female | 50 | 80 | 30 | 25 | 15
Total | 60 | 100 | 60 | 50 | 30
Let us say that we are interested in the attitudes
according to the gender of the respondents, as well
as overall. In this example one could surmise that
generally the results indicate that the liking of the
form teacher has only a small to moderate effect
on the students’ attitude to work. However, we
have to observe that twice as many girls as boys
are included in the sample, and this is an unfair
representation of the population of the school,
which comprises 50 per cent girls and 50 per cent
boys, i.e. girls are over-represented and boys are
under-represented. If one equalizes the two sets
of scores by gender to be closer to the school
population (either by doubling the number of boys
or halving the number of girls) then the results
look very different.
Variable: How far does your liking of the form teacher affect your attitude to school work?

Gender | Very little | A little | Somewhat | Quite a lot | A very great
Male | 20 | 40 | 60 | 50 | 30
Female | 50 | 80 | 30 | 25 | 15
Total | 70 | 120 | 90 | 75 | 45
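In this latter case a more positive picture is painted: the two highest categories now account for 120 of 400 responses (30 per cent, against 80 of 300, or about 27 per cent, before weighting) and the two lowest for 190 of 400 (47.5 per cent, against about 53 per cent), suggesting that the students regard their liking of the form teacher as a more important feature in their attitude to school work than the unweighted figures imply. Here, weighting the sample to represent the population more fairly yields a different picture. Weighting the results is an important consideration.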
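The weighting step can be sketched in Python as simple post-stratification: each group's counts are multiplied by a weight so that the group's share of the weighted sample matches its share of the school (50 per cent each). This keeps the total at 300 rather than doubling the boys to 400 as in the second table, but the resulting proportions are identical. The function names are illustrative.

```python
CATEGORIES = ["Very little", "A little", "Somewhat", "Quite a lot", "A very great"]
observed = {
    "Male": [10, 20, 30, 25, 15],
    "Female": [50, 80, 30, 25, 15],
}
population_share = {"Male": 0.5, "Female": 0.5}  # the school is half boys, half girls


def weighted_totals(counts, shares):
    """Weight each group so its share of the sample equals its population share."""
    n_total = sum(sum(row) for row in counts.values())
    totals = [0.0] * len(CATEGORIES)
    for group, row in counts.items():
        weight = shares[group] * n_total / sum(row)  # boys 1.5, girls 0.75 here
        for i, count in enumerate(row):
            totals[i] += weight * count
    return totals


def as_percentages(totals):
    """Express category totals as percentages of the grand total."""
    grand = sum(totals)
    return [round(100 * t / grand, 1) for t in totals]


unweighted = [sum(col) for col in zip(*observed.values())]
weighted = weighted_totals(observed, population_share)
print(dict(zip(CATEGORIES, as_percentages(unweighted))))
print(dict(zip(CATEGORIES, as_percentages(weighted))))
```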
The access to the sample
Access is a key issue and is an early factor that must
be decided in research. Researchers will need to
ensure that access is not only permitted but also, in
fact, practicable. For example, if a researcher were
to conduct research into truancy and unauthorized
absence from school, and decided to interview
a sample of truants, the research might never
commence as the truants, by definition, would not
be present! Similarly access to sensitive areas might
be not only difficult but also problematical both
legally and administratively, for example, access
to child abuse victims, child abusers, disaffected
students, drug addicts, school refusers, bullies and
victims of bullying. In some sensitive areas access
to a sample might be denied by the potential
sample participants themselves, for example AIDS
counsellors might be so seriously distressed by their
work that they simply cannot face discussing with
a researcher the subject matter of their traumatic
work; it is distressing enough to do the job without
living through it again with a researcher.
Access might also be denied by the potential
sample participants themselves for very practical
reasons, for example a doctor or a teacher
simply might not have the time to spend with
the researcher. Further, access might be denied
by people who have something to protect, for
example a school which has recently received
a very poor inspection result or poor results on
external examinations, or people who have made
an important discovery or a new invention and
who do not wish to disclose the secret of their
success; the trade in intellectual property has
rendered this a live issue for many researchers.
There are very many reasons that might prevent
access to the sample, and researchers cannot afford
to neglect this potential source of difficulty in
planning research.
In many cases access is guarded by ‘gatekeepers’
– people who can control researchers’ access to
those whom they really want to target. For school
staff this might be, for example, headteachers,
school governors, school secretaries, form teachers;
for pupils this might be friends, gang members,
parents, social workers and so on. It is critical
for researchers to consider not only whether
access is possible but also how access will be
undertaken – to whom does one have to go, both
formally and informally, to gain access to the target group?
Not only might access be difficult but also
its corollary – release of information – might be
problematic. For example, a researcher might gain
access to a wealth of sensitive information and
appropriate people, but there might be a restriction
on the release of the data collection; in the field
of education in the UK reports have been known
to be suppressed, delayed or ‘doctored’. It is not
always enough to be able to ‘get to’ the sample, the
problem might be to ‘get the information out’ to
the wider public, particularly if it could be critical
of powerful people.
The sampling strategy to be used
There are two main methods of sampling (Cohen
and Holliday 1979; 1982; 1996; Schofield 1996).
The researcher must decide whether to opt for
a probability (also known as a random sample)
or a non-probability sample (also known as a
purposive sample). The difference between them
is this: in a probability sample the chances of
members of the wider population being selected
for the sample are known, whereas in a nonprobability
sample the chances of members of the
wider population being selected for the sample
are unknown. In the former (probability sample)
every member of the wider population has an
equal chance of being included in the sample;
inclusion or exclusion from the sample is a matter
of chance and nothing else. In the latter (non-probability
sample) some members of the wider
population definitely will be excluded and others
definitely included (i.e. every member of the wider
population does not have an equal chance of being
included in the sample). In this latter type the
researcher has deliberately – purposely – selected
a particular section of the wider population to
include in or exclude from the sample.
Probability samples
A probability sample, because it draws randomly
from the wider population, will be useful if the
researcher wishes to be able to make generalizations,
because it seeks representativeness of the
wider population. It also permits two-tailed tests
to be administered in statistical analysis of quantitative
data. Probability sampling is popular in
randomized controlled trials. On the other hand,
a non-probability sample deliberately avoids representing
the wider population; it seeks only to
represent a particular group, a particular named
section of the wider population, such as a class
of students, a group of students who are taking
a particular examination, a group of teachers
(see http://www.routledge.com/textbooks/9780415368780 – Chapter 4, file 4.6.ppt).
A probability sample will have less risk of
bias than a non-probability sample, whereas,
by contrast, a non-probability sample, being
unrepresentative of the whole population, may
demonstrate skewness or bias. (For this type of
sample a one-tailed test will be used in processing
statistical data.) This is not to say that the former is
bias free; there is still likely to be sampling error in a
probability sample (discussed below), a feature that
has to be acknowledged, for example opinion polls
usually declare their error factors, e.g. ±3 per cent.
There are several types of probability samples:
simple random samples; systematic samples; stratified
samples; cluster samples; stage samples, and
multi-phase samples. They all have a measure of
randomness built into them and therefore have a
degree of generalizability.
Simple random sampling
In simple random sampling, each member of the
population under study has an equal chance of
being selected and the probability of a member
of the population being selected is unaffected
by the selection of other members of
the population, i.e. each selection is entirely
independent of the next. The method involves
selecting at random from a list of the population
(a sampling frame) the required number of
subjects for the sample. This can be done by
drawing names out of a container until the required
number is reached, or by using a table of
random numbers set out in matrix form (these
are reproduced in many books on quantitative
research methods and statistics), and allocating
these random numbers to participants or cases
(e.g. Hopkins et al. 1996: 148–9). Because of
probability and chance, the sample should contain
subjects with characteristics similar to the
population as a whole; some old, some young,
some tall, some short, some fit, some unfit,
some rich, some poor etc. One problem associated
with this particular sampling method
is that a complete list of the population is
needed and this is not always readily available
(see http://www.routledge.com/textbooks/9780415368780 – Chapter 4, file 4.7.ppt).
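The draw described above can be sketched in Python, with the standard library's random module standing in for a container of names or a printed table of random numbers. This is a minimal illustration, not a prescribed procedure; the frame of 1,400 student IDs and the seed value are hypothetical.

```python
import random


def simple_random_sample(sampling_frame, n, seed=None):
    """Draw n cases without replacement, each case having the same chance of selection."""
    if n > len(sampling_frame):
        raise ValueError("sample size exceeds the size of the sampling frame")
    rng = random.Random(seed)  # a seeded generator lets the draw be reproduced and checked
    return rng.sample(sampling_frame, n)


# Hypothetical sampling frame: 1,400 student ID numbers
frame = [f"S{i:04d}" for i in range(1, 1401)]
sample = simple_random_sample(frame, 302, seed=42)
print(len(sample), len(set(sample)))  # 302 302 (no case is drawn twice)
```

The sketch also shows the limitation noted above: nothing can be drawn until the complete list (frame) exists.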
Systematic sampling
This method is a modified form of simple random
sampling. It involves selecting subjects from a
population list in a systematic rather than a
random fashion. For example, if from a population
of, say, 2,000, a sample of 100 is required,
then every twentieth person can be selected.
The starting point for the selection is chosen at
random (see http://www.routledge.com/textbooks/9780415368780 – Chapter 4, file 4.8.ppt).
One can decide how frequently to sample in
systematic sampling by a simple calculation: the total
number of the wider population being represented
divided by the sample size required:
$$f = \frac{N}{\mathit{sn}}$$
where
f = frequency interval
N = the total number of the wider population
sn = the required number in the sample.
Let us say that the researcher is working with a
school of 1,400 students; by looking at the table
of sample size (Box 4.1) required for a random
sample of these 1,400 students we see that 302
students are required to be in the sample. Hence
the frequency interval (f) is:
$$f = \frac{1400}{302} \approx 4.636$$
which rounds up to 5.
Hence the researcher would pick out every fifth
name on the list of cases.
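The calculation of f and the selection from a random starting point can be sketched as follows. The numbered list of 1,400 students is hypothetical, and the helper names (frequency_interval, systematic_sample) are illustrative, not taken from any package.

```python
import math
import random


def frequency_interval(N, sn):
    """f = N / sn: the gap between successive selections."""
    return N / sn


def systematic_sample(frame, interval, seed=None):
    """Take every interval-th case, starting from a randomly chosen point."""
    rng = random.Random(seed)
    start = rng.randrange(interval)  # random start within the first interval
    return frame[start::interval]


f = frequency_interval(1400, 302)
print(round(f, 3))  # 4.636
k = math.ceil(f)    # the text rounds up to 5
frame = list(range(1, 1401))  # hypothetical numbered list of 1,400 students
sample = systematic_sample(frame, k, seed=1)
print(k, len(sample))  # 5 280
print(len(systematic_sample(frame, 4, seed=1)))  # 350
```

One thing the sketch makes visible: rounding the interval up to 5 yields 280 names, fewer than the 302 required, whereas an interval of 4 yields 350. A researcher who must reach the target figure might therefore round down and accept a slightly larger sample.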
Such a process, of course, assumes that the
names on the list themselves have been listed in a
random order. A list of females and males might
list all the females first, before listing all the males;
if there were 200 females on the list, the researcher
might have reached the desired sample size before
reaching that stage of the list which contained
males, thereby distorting (skewing) the sample.
Another example might be where the researcher
decides to select every thirtieth person identified
from a list of school students, but it happens that:
(a) the school has just over thirty students in each
class; (b) each class is listed from high ability to
low ability students; (c) the school listing identifies
the students by class.
In this case, although the sample is drawn
from each class, it is not fairly representing the
whole school population since it is drawing almost
exclusively on the lower ability students. This is
the issue of periodicity (Calder 1979). Not only is
there the question of the order in which names
are listed in systematic sampling, but also there
is the issue that this process may violate one of
the fundamental premises of probability sampling,
namely that every person has an equal chance
of being included in the sample. In the example
above where every fifth name is selected, this
guarantees that names 1–4, 6–9 etc. will be
excluded, i.e. everybody does not have an equal
chance to be chosen. The ways to minimize this
problem are to ensure that the initial listing is
selected randomly and that the starting point for
systematic sampling is similarly selected randomly.
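The periodicity problem can be reproduced with a small simulation. The sketch assumes ten classes of exactly thirty students each, listed class by class from highest (rank 1) to lowest (rank 30) ability; with "just over thirty" per class, as in the text, the selections drift slowly instead of staying fixed, but the bias is similar.

```python
import random

CLASS_SIZE = 30  # assumption: exactly thirty students per class
N_CLASSES = 10

# Hypothetical school list: classes one after another, each ordered from
# highest ability (rank 1) to lowest ability (rank 30)
frame = [(cls, rank) for cls in range(1, N_CLASSES + 1)
         for rank in range(1, CLASS_SIZE + 1)]


def systematic(frame, interval, start):
    """Every interval-th entry from index start onwards."""
    return frame[start::interval]


# Every thirtieth name, starting from the thirtieth name on the list
picked = systematic(frame, 30, start=29)
print({rank for _, rank in picked})  # {30}: only the lowest-ability student in each class

# Remedy suggested in the text: randomize the list itself and the starting point
rng = random.Random(7)
shuffled = frame[:]
rng.shuffle(shuffled)
picked_shuffled = systematic(shuffled, 30, start=rng.randrange(30))
print(len(picked_shuffled))  # 10, now drawn from random positions in the list
```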
Stratified sampling
Stratified sampling involves dividing the population
into homogeneous groups, each group
containing subjects with similar characteristics.
For example, group A might contain males and
group B, females. In order to obtain a sample
representative of the whole population in
terms of sex, a random selection of subjects
from group A and group B must be taken. If
needed, the exact proportion of males to females
in the whole population can be reflected
in the sample. The researcher will have to identify
those characteristics of the wider population
which must be included in the sample, i.e. to
identify the parameters of the wider population.
This is the essence of establishing the sampling
frame (see http://www.routledge.com/textbooks/9780415368780 – Chapter 4, file 4.9.ppt).
To organize a stratified random sample is a
simple two-stage process. First, identify those
characteristics that appear in the wider population
that must also appear in the sample, i.e. divide
the wider population into homogeneous and, if
possible, discrete groups (strata), for example
males and females. Second, randomly sample
within these groups, the size of each group
being determined either by the judgement of
the researcher or by reference to Boxes 4.1
or 4.2.
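The two-stage process can be sketched as below. The sketch assumes proportionate allocation (each stratum's share of the sample matches its share of the population), which is one of the options mentioned above; the school of 600 females and 400 males is hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(frame, stratum_of, n, seed=None):
    """Proportionate stratified random sample in two stages."""
    rng = random.Random(seed)
    # Stage 1: divide the frame into strata (e.g. females and males)
    strata = defaultdict(list)
    for case in frame:
        strata[stratum_of(case)].append(case)
    # Stage 2: simple random sample within each stratum, sized in proportion
    # to the stratum's share of the frame (rounding may shift totals by one or two)
    sample = {}
    for name, members in strata.items():
        size = round(n * len(members) / len(frame))
        sample[name] = rng.sample(members, size)
    return sample


# Hypothetical school of 1,000 pupils: 600 females and 400 males
frame = [("F", i) for i in range(600)] + [("M", i) for i in range(400)]
sample = stratified_sample(frame, stratum_of=lambda case: case[0], n=100, seed=5)
print({name: len(cases) for name, cases in sample.items()})  # {'F': 60, 'M': 40}
```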
The decision on which characteristics to include
should strive for simplicity as far as possible, as
the more factors there are, not only the more
complicated the sampling becomes, but often the
larger the sample will have to be to include
representatives of all strata of the wider population.
A stratified random sample is, therefore, a
useful blend of randomization and categorization,
thereby enabling both a quantitative and
qualitative piece of research to be undertaken.
A quantitative piece of research will be able
to use analytical and inferential statistics, while
a qualitative piece of research will be able to
target those groups in institutions or clusters of
participants who will be able to be approached to
participate in the research.
Cluster sampling
When the population is large and widely dispersed,
gathering a simple random sample poses
administrative problems. Suppose we want to survey
students’ fitness levels in a particularly large
community or across a country. It would be completely
impractical to select students randomly
and spend an inordinate amount of time travelling
about in order to test them. By cluster sampling,
the researcher can select a specific number of
schools and test all the students in those selected
schools, i.e. a geographically close cluster is sampled
(see http://www.routledge.com/textbooks/9780415368780 – Chapter 4, file 4.10.ppt).
One would have to be careful to ensure that
cluster sampling does not build in bias. For
example, let us imagine that we take a cluster
sample of a city in an area of heavy industry or
great poverty; this may not represent all kinds of
cities or socio-economic groups, i.e. there may be
similarities within the sample that do not catch
the variability of the wider population. The issue
here is one of representativeness; hence it might be
safer to take several clusters and to sample lightly
within each cluster, rather than to take fewer clusters
and sample heavily within each.
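A minimal sketch of one-stage cluster sampling: whole schools are chosen at random and every student in them is included. The twenty schools and their rosters are invented for illustration. Raising n_clusters spreads the sample over more schools, which addresses the representativeness concern raised above.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters (here, schools) and keep every member of each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}


# Hypothetical region: 20 schools with rosters of different sizes
schools = {
    f"school_{s:02d}": [f"{s:02d}-{i:03d}" for i in range(50 + 5 * s)]
    for s in range(1, 21)
}
selected = cluster_sample(schools, n_clusters=4, seed=3)
students = [sid for roster in selected.values() for sid in roster]
print(sorted(selected), len(students))
```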
Cluster samples are widely used in small-scale
research. In a cluster sample the parameters of the
wider population are often drawn very sharply; a
researcher, therefore, would have to comment on
the generalizability of the findings. The researcher
may also need to stratify within this cluster sample
if useful data, i.e. those which are focused and
which demonstrate discriminability, are to be obtained.
Stage sampling
Stage sampling is an extension of cluster sampling.
It involves selecting the sample in stages, that
is, taking samples from samples. Using the large
community example in cluster sampling, one type
of stage sampling might be to select a number of
schools at random, and from within each of these
schools, select a number of classes at random,
and from within those classes select a number of students.
Morrison (1993: 121–2) provides an example
of how to address stage sampling in practice. Let
us say that a researcher wants to administer a
questionnaire to all 16-year-old pupils in each
of eleven secondary schools in one region. By
contacting the eleven schools she finds that there
are 2,000 16-year-olds on roll. Because of questions
of confidentiality she is unable to find out the
names of all the students so it is impossible to
draw their names out of a container to achieve
randomness (and even if she had the names, it
would be a mind-numbing activity to write out
2,000 names to draw out of a container!). From
looking at Box 4.1 she finds that, for a random
sample of the 2,000 students, the sample size is
322 students. How can she proceed?
The first stage is to list the eleven schools on
a piece of paper and then to write the names of
the eleven schools on to small cards and place
each card in a container. She draws out the first
name of the school, puts a tally mark by the
appropriate school on her list and returns the
card to the container. The process is repeated 321
times, bringing the total to 322. The final totals
might appear thus:
School                      1   2   3   4   5   6   7   8   9  10  11  Total
Required no. of students   22  31  32  24  29  20  35  28  32  38  31    322
For the second stage the researcher then
approaches the eleven schools and asks each of
them to select randomly the required number of
students for each school. Randomness has been
maintained in two stages and a large number
(2,000) has been rendered manageable. The
process at work here is to go from the general to
the specific, the wide to the focused, the large to
the small. Caution has to be exercised here, as the
assumption is that the schools are of the same size
and are large; that may not be the case in practice,
in which case this strategy may be inadvisable.
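Morrison's two-stage procedure can be simulated as below. random.choice, with the card effectively returned each time, mirrors drawing a school's card and putting it back; the seeds and the rosters of 180 pupils per school are hypothetical (in practice the second stage is carried out by the schools themselves). The tallies will differ from the figures in the table above, since they depend on the chance of the draw.

```python
import random
from collections import Counter


def stage_one(schools, total_draws, seed=None):
    """Stage 1: draw a school 'card' total_draws times, returning it to the container each time."""
    rng = random.Random(seed)
    return Counter(rng.choice(schools) for _ in range(total_draws))


def stage_two(rosters, quotas, seed=None):
    """Stage 2: each school randomly selects its tallied number of students."""
    rng = random.Random(seed)
    return {school: rng.sample(rosters[school], quotas[school]) for school in quotas}


schools = [f"School {i}" for i in range(1, 12)]
quotas = stage_one(schools, 322, seed=11)
print(sum(quotas.values()))  # 322

# Hypothetical rosters of 180 16-year-olds per school
rosters = {s: [f"{s} pupil {i}" for i in range(180)] for s in schools}
selected = stage_two(rosters, quotas, seed=12)
print(sum(len(v) for v in selected.values()))  # 322
```

Because every card has the same chance of being drawn, each school has the same expected quota whatever its size, which is exactly the equal-size assumption cautioned against above.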
Multi-phase sampling
In stage sampling there is a single unifying purpose
throughout the sampling. In the previous example
the purpose was to reach a particular group of
students from a particular region. In a multi-phase
sample the purposes change at each phase, for
example, at phase one the selection of the sample
might be based on the criterion of geography
(e.g. students living in a particular region); phase
two might be based on an economic criterion
(e.g. schools whose budgets are administered in
markedly different ways); phase three might be
based on a political criterion (e.g. schools whose
students are drawn from areas with a tradition
of support for a particular political party), and
so on. What is evident here is that the sample
population will change at each phase of the research
(see http://www.routledge.com/textbooks/9780415368780 – Chapter 4, file 4.11.ppt).
Non-probability samples
The selectivity which is built into a non-probability
sample derives from the researcher
targeting a particular group, in the full knowledge
that it does not represent the wider population; it
simply represents itself. This is frequently the case
in small-scale research, for example, as with one
or two schools, two or three groups of students, or
a particular group of teachers, where no attempt
to generalize is desired; this is frequently the case
for some ethnographic research, action research
or case study research (see http://www.routledge.com/textbooks/9780415368780 – Chapter 4, file
4.12.ppt). Small-scale research often uses non-probability
samples because, despite the disadvantages
that arise from their non-representativeness,
they are far less complicated to set up, are considerably
less expensive, and can prove perfectly
adequate where researchers do not intend to generalize
their findings beyond the sample in question,
or where they are simply piloting a questionnaire
as a prelude to the main study.
Just as there are several types of probability sample,
so there are several types of non-probability
sample: convenience sampling, quota sampling,
dimensional sampling, purposive sampling and
snowball sampling. Each type of sample seeks only
to represent itself or instances of itself in a similar
population, rather than attempting to represent
the whole, undifferentiated population.
Convenience sampling
Convenience sampling – or, as it is sometimes
called, accidental or opportunity sampling –
involves choosing the nearest individuals to serve
as respondents and continuing that process until
the required sample size has been obtained, or choosing
those who happen to be available and accessible
at the time. Captive audiences such as students or
student teachers often serve as respondents based
on convenience sampling. Researchers simply
choose the sample from those to whom they
have easy access. As it does not represent any
group apart from itself, it does not seek to
generalize about the wider population; for a
convenience sample that is an irrelevance. The
researcher, of course, must take pains to report
this point – that the parameters of generalizability
in this type of sample are negligible. A
convenience sample may be the sampling strategy
selected for a case study or a series of case
studies (see http://www.routledge.com/textbooks/9780415368780 – Chapter 4, file 4.13.ppt).
Quota sampling
Quota sampling has been described as the
non-probability equivalent of stratified sampling
(Bailey 1978). Like a stratified sample, a
quota sample strives to represent significant characteristics
(strata) of the wider population; unlike
stratified sampling it sets out to represent these
in the proportions in which they can be found
in the wider population. For example, suppose
that the wider population (however defined) were
composed of 55 per cent females and 45 per cent
males, then the sample would have to contain 55
per cent females and 45 per cent males; if the
population of a school contained 80 per cent of
students up to and including the age of 16 and
20 per cent of students aged 17 and over, then
the sample would have to contain 80 per cent of
students up to the age of 16 and 20 per cent of students
aged 17 and above. A quota sample, then,
seeks to give proportional weighting to selected
factors (strata), reflecting the weighting with
which they can be found in the wider population
(see http://www.routledge.com/textbooks/9780415368780 – Chapter 4, file 4.14.ppt). The
researcher wishing to devise a quota sample can
proceed in three stages:
1 Identify those characteristics (factors) which
appear in the wider population which must
also appear in the sample, i.e. divide the wider
population into homogeneous and, if possible,
discrete groups (strata), for example, males
and females, Asian, Chinese and African
2 Identify the proportions in which the selected
characteristics appear in the wider population,
expressed as a percentage.
3 Ensure that the percentaged proportions of
the characteristics selected from the wider
population appear in the sample.
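The three stages, together with the filling of quotas, might be sketched like this. The 55/45 split comes from the example above; the sample size of 200 and the stream of arriving respondents are assumptions. Note that nobody is selected at random here: the first people to turn up fill each quota.

```python
def make_quotas(proportions, n):
    """Stages 2-3: convert population percentages into target counts for a sample of n."""
    return {group: round(n * pct / 100) for group, pct in proportions.items()}


def fill_quotas(respondents, group_of, quotas):
    """Accept respondents in the order they arrive until every quota is full."""
    counts = {g: 0 for g in quotas}
    sample = []
    for person in respondents:
        g = group_of(person)
        if g in counts and counts[g] < quotas[g]:
            sample.append(person)
            counts[g] += 1
        if counts == quotas:  # stop once all quotas are met
            break
    return sample, counts


quotas = make_quotas({"female": 55, "male": 45}, 200)
print(quotas)  # {'female': 110, 'male': 90}

# Hypothetical arrivals: roughly two females for every male
respondents = [("female" if i % 3 else "male", i) for i in range(400)]
sample, counts = fill_quotas(respondents, group_of=lambda p: p[0], quotas=quotas)
print(counts)  # {'female': 110, 'male': 90}
```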
Ensuring correct proportions in the sample may
be difficult to achieve if the proportions in the
wider community are unknown or if access to the
sample is difficult; sometimes a pilot survey might
be necessary in order to establish those proportions
(and even then sampling error or a poor response
rate might render the pilot data problematical).
It is straightforward to determine the minimum
number required in a quota sample. Let us say that
the total number of students in a school is 1,700,
made up thus:
Performing arts 300 students
Natural sciences 300 students
Humanities 600 students
Business and Social Sciences 500 students
The proportions being 3:3:6:5, a minimum of 17
students might be required (3 + 3 + 6 + 5) for
the sample. Of course this would be a minimum
only, and it might be desirable to go higher than
this. The price of having too many characteristics
(strata) in quota sampling is that the minimum
number in the sample very rapidly could become
very large, hence in quota sampling it is advisable
to keep the numbers of strata to a minimum. The
larger the number of strata, the larger the number
in the sample will become, usually at a geometric
rather than an arithmetic rate of progression.
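Reducing the 1,700 students to a 3:3:6:5 ratio amounts to dividing each stratum size by the greatest common divisor of all of them (here 100). A short sketch using only math.gcd and functools.reduce from the standard library; the function name is illustrative.

```python
from functools import reduce
from math import gcd


def minimum_quota_sample(counts):
    """Reduce stratum sizes to their smallest whole-number ratio; the sum is the minimum sample."""
    divisor = reduce(gcd, counts.values())
    ratio = {name: size // divisor for name, size in counts.items()}
    return ratio, sum(ratio.values())


faculties = {
    "Performing arts": 300,
    "Natural sciences": 300,
    "Humanities": 600,
    "Business and Social Sciences": 500,
}
ratio, minimum = minimum_quota_sample(faculties)
print(list(ratio.values()), minimum)  # [3, 3, 6, 5] 17
```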
Purposive sampling
In purposive sampling, often (but by no means
exclusively) a feature of qualitative research,
researchers handpick the cases to be included
in the sample on the basis of their judgement
of their typicality or possession of the particular
characteristics being sought. In this way, they build
up a sample that is satisfactory to their specific
needs. As its name suggests, the sample has been
chosen for a specific purpose, for example: a group
of principals and senior managers of secondary
schools is chosen as the research is studying the
incidence of stress among senior managers; a group
of disaffected students has been chosen because
they might indicate most distinctly the factors
which contribute to students’ disaffection (they
are critical cases, akin to ‘critical events’ discussed
in Chapter 18, or deviant cases – those cases which
go against the norm; see Anderson and Arsenault
1998: 124); one class of students has been selected
to be tracked throughout a week in order to report
on the curricular and pedagogic diet which is
offered to them so that other teachers in the
school might compare their own teaching to that
reported. While it may satisfy the researcher’s
needs to take this type of sample, it does not
pretend to represent the wider population; it
is deliberately and unashamedly selective and
biased (see http://www.routledge.com/textbooks/9780415368780 – Chapter 4, file 4.15.ppt).
In many cases purposive sampling is used in
order to access ‘knowledgeable people’, i.e. those
who have in-depth knowledge about particular
issues, maybe by virtue of their professional
role, power, access to networks, expertise or
experience (Ball 1990). There is little benefit
in seeking a random sample when most of
the random sample may be largely ignorant of
particular issues and unable to comment on
matters of interest to the researcher, in which
case a purposive sample is vital. Though they may
not be representative and their comments may not
be generalizable, this is not the primary concern
in such sampling; rather the concern is to acquire
in-depth information from those who are in a
position to give it.
Another variant of purposive sampling is the
boosted sample. Gorard (2003: 71) comments on
the need to use a boosted sample in order to include
those who may otherwise be excluded from, or
under-represented in, a sample because there are
so few of them. For example, one might have a very
small number of special needs teachers or pupils in
a primary school or nursery, or one might have a
very small number of children from certain ethnic
minorities in a school, such that they may not
feature in a sample. In this case the researcher will
deliberately seek to include a sufficient number of
them to ensure appropriate statistical analysis or
representation in the sample, adjusting any results
from them, through weighting, to ensure that they
are not over-represented in the final results. This
is an endeavour, perhaps, to reach and meet the
demands of social inclusion.
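The weighting mentioned above can take several forms; one common choice, assumed here, gives each group a weight equal to its population share divided by its share of the sample. The figures (5 per cent of pupils with special needs, boosted to 20 of a 100-pupil sample) are hypothetical.

```python
def design_weights(population_share, sample_counts):
    """Weight = population share / sample share, so a boosted group counts
    for no more than its real share in combined results."""
    n = sum(sample_counts.values())
    return {g: population_share[g] / (sample_counts[g] / n) for g in sample_counts}


# Hypothetical primary school: 5% of pupils have special needs,
# boosted to 20 of a 100-pupil sample
population_share = {"special_needs": 0.05, "other": 0.95}
sample_counts = {"special_needs": 20, "other": 80}
weights = design_weights(population_share, sample_counts)
print({g: round(w, 4) for g, w in weights.items()})  # {'special_needs': 0.25, 'other': 1.1875}
```

Multiplying each group's count by its weight gives 5 and 95 weighted cases, restoring the population proportions while keeping 20 special needs pupils available for separate analysis.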
A further variant of purposive sampling
is negative case sampling. Here the researcher
deliberately seeks those people who might
disconfirm the theories being advanced (the
Popperian equivalent of falsifiability), thereby
strengthening the theory if it survives such
disconfirming cases. A softer version of negative
case sampling is maximum variation sampling,
selecting cases from as diverse a population as
possible (Anderson and Arsenault 1998: 124) in
order to lend strength and richness to the data,
their applicability and their interpretation. In this
latter case, it is almost inevitable that the sample
size will increase or be large.
Dimensional sampling
One way of reducing the problem of sample size in
quota sampling is to opt for dimensional sampling.
Dimensional sampling is a further refinement of
quota sampling. It involves identifying various
factors of interest in a population and obtaining
at least one respondent of every combination of
those factors. Thus, in a study of race relations,
for example, researchers may wish to distinguish
first, second and third generation immigrants.
Their sampling plan might take the form of a
multidimensional table with ‘ethnic group’ across
the top and ‘generation’ down the side. A second
example might be of a researcher who may be interested
in studying disaffected students, girls and
secondary-aged students and who may find a single
disaffected secondary female student, i.e. a respondent
who is the bearer of all of the sought characteristics
(see http://www.routledge.com/textbooks/9780415368780 – Chapter 4, file 4.16.ppt).
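A dimensional sampling plan is the Cartesian product of the factor levels, with at least one respondent required per cell. The sketch below builds that table for the race relations example; the three ethnic group labels are placeholders, not categories from the text.

```python
from itertools import product


def dimensional_cells(factors):
    """Every combination of factor levels; each cell needs at least one respondent."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]


def unfilled_cells(cells, respondents):
    """Cells for which no respondent has yet been found."""
    return [cell for cell in cells
            if not any(all(r.get(k) == v for k, v in cell.items()) for r in respondents)]


factors = {
    "ethnic_group": ["Group A", "Group B", "Group C"],  # hypothetical labels
    "generation": ["first", "second", "third"],
}
cells = dimensional_cells(factors)
print(len(cells))  # 9
found = [{"ethnic_group": "Group A", "generation": "first"}]
print(len(unfilled_cells(cells, found)))  # 8
```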
Snowball sampling
In snowball sampling researchers identify a small
number of individuals who have the characteristics
in which they are interested. These people are
then used as informants to identify, or put the
researchers in touch with, others who qualify
for inclusion and these, in turn, identify yet
others – hence the term snowball sampling. This
method is useful for sampling a population where
access is difficult, maybe because it is a sensitive
topic (e.g. teenage solvent abusers) or where
communication networks are undeveloped (e.g.
where a researcher wishes to interview stand-in
‘supply’ teachers – teachers who are brought in
on an ad-hoc basis to cover for absent regular
members of a school’s teaching staff – but finds
it difficult to acquire a list of these stand-in
teachers), or where an outside researcher has
difficulty in gaining access to schools (going
through informal networks of friends/acquaintances
and their friends and acquaintances and so on
rather than through formal channels). The task for
the researcher is to establish who are the critical or
key informants with whom initial contact must be
made (see http://www.routledge.com/textbooks/9780415368780 – Chapter 4, file 4.17.ppt).
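Snowball sampling behaves like a breadth-first walk through a referral network: key informants first, then the people they name, then the people those name. The referral network of supply teachers below is entirely hypothetical, and max_size is an illustrative stopping rule.

```python
from collections import deque


def snowball_sample(seeds, referrals, max_size):
    """Start from key informants and follow referrals wave by wave (breadth-first)."""
    sample, seen = [], set(seeds)
    queue = deque(seeds)
    while queue and len(sample) < max_size:
        person = queue.popleft()
        sample.append(person)
        for contact in referrals.get(person, []):
            if contact not in seen:  # avoid recruiting the same person twice
                seen.add(contact)
                queue.append(contact)
    return sample


# Hypothetical referral network among stand-in 'supply' teachers
referrals = {
    "T1": ["T2", "T3"],
    "T2": ["T4"],
    "T3": ["T4", "T5"],
    "T5": ["T6"],
}
print(snowball_sample(["T1"], referrals, max_size=10))
# ['T1', 'T2', 'T3', 'T4', 'T5', 'T6']
```

Everything hinges on the seeds: anyone outside the informants' networks can never be reached, which is why identifying the right key informants matters so much.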
Volunteer sampling
In cases where access is difficult, the researcher may
have to rely on volunteers, for example, personal
friends, or friends of friends, or participants who
reply to a newspaper advertisement, or those from
a particular school who happen to be interested,
or those attending courses. Sometimes this is
inevitable (Morrison 2006), as it is the only kind
of sampling that is possible, and it may be better
to have this kind of sampling than no research
at all.
In these cases one has to be very cautious
in making any claims for generalizability or
representativeness, as volunteers may have a range
of different motives for volunteering, e.g. wanting
to help a friend, interest in the research, wanting
to benefit society, an opportunity for revenge on a
particular school or headteacher. Volunteers may
be well intentioned, but they do not necessarily
represent the wider population, and this would
have to be made clear.
Theoretical sampling
This is a feature of grounded theory. In grounded
theory the sample size is relatively immaterial, as
one works with the data that one has. Indeed
grounded theory would argue that the sample size
could be infinitely large, or, as a fall-back position,
large enough to saturate the categories and issues,
such that new data will not cause the theory that
has been generated to be modified.
Theoretical sampling requires the researcher
to have sufficient data to be able to generate
and ‘ground’ the theory in the research context,
however defined, i.e. to create a theoretical
explanation of what is happening in the situation,
without having any data that do not fit the theory.
Since the researcher will not know in advance
how much, or what range of data will be required,
it is difficult, to the point of either impossibility,
exhaustion or time limitations, to know in advance
the sample size required. The researcher proceeds
in gathering more and more data until the theory
remains unchanged or until the boundaries of
the context of the study have been reached,
until no modifications to the grounded theory are
made in light of the constant comparison method.
Theoretical saturation (Glaser and Strauss 1967:
61) occurs when no additional data are found that
advance, modify, qualify, extend or add to the
theory developed.
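Saturation is a judgement, not a formula, but a crude stopping rule can illustrate the idea: keep collecting until a set number of consecutive batches of data add no new category. The patience parameter and the coded interviews below are assumptions for illustration only.

```python
def sample_until_saturated(batches, patience=2):
    """Collect batches of coded data until `patience` consecutive batches
    add no new category; return the categories and the number of batches used."""
    categories = set()
    quiet_runs = 0
    used = 0
    for batch in batches:
        used += 1
        new = set(batch) - categories
        categories |= new
        quiet_runs = 0 if new else quiet_runs + 1
        if quiet_runs >= patience:
            break
    return categories, used


# Hypothetical codes from successive interviews
interviews = [
    {"workload", "isolation"},
    {"workload", "status"},
    {"isolation"},
    {"status", "workload"},
    {"pay"},  # never reached: 'saturation' is declared before this interview
]
cats, used = sample_until_saturated(interviews, patience=2)
print(sorted(cats), used)  # ['isolation', 'status', 'workload'] 4
```

The unseen "pay" category shows why such a rule should not be applied mechanically, and why the door may have to be left open for further data.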
Glaser and Strauss (1967) write that
theoretical sampling is the process of data collection
for generating theory whereby the analyst jointly
collects, codes, and analyzes his [sic.] data and decides
what data to collect next and where to find them, in
order to develop his theory as it emerges.
(Glaser and Strauss 1967: 45)
The two key questions for the grounded theorist
using theoretical sampling are, first, to which
groups does one turn next for data? Second, for
what theoretical purposes does one seek further
data? In response to the first, Glaser and Strauss
(1967: 49) suggest that the decision is based on
theoretical relevance, i.e. those groups that will
assist in the generation of as many properties and
categories as possible.
Hence the size of the data set may be fixed by the
number of participants in the organization, or the
number of people to whom one has access, but
the researcher has to consider that the door may
have to be left open for him/her to seek further
data in order to ensure theoretical adequacy and to
check what has been found so far with further data
(Flick et al. 2004: 170). In this case it is not always
possible to predict at the start of the research just
how many, and who, the research will need for the
sampling; it becomes an iterative process.
Non-probability samples also reflect the issue
that sampling can be of people but it can also
be of issues. Samples of people might be selected
because the researcher is concerned to address
specific issues, for example, those students who
misbehave, those who are reluctant to go to school,
those with a history of drug dealing, those who
prefer extra-curricular to curricular activities. Here
it is the issue that drives the sampling, and so the
question becomes not only ‘whom should I sample’
but also ‘what should I sample’ (Mason 2002:
127–32). In turn this suggests that it is not only
people who may be sampled, but texts, documents,
records, settings, environments, events, objects,
organizations, occurrences, activities and so on.
Planning a sampling strategy
There are several steps in planning the sampling strategy:
1 Decide whether you need a sample, or whether
it is possible to have the whole population.
2 Identify the population, its important features
(the sampling frame) and its size.
3 Identify the kind of sampling strategy you
require (e.g. which variant of probability and
non-probability sample you require).
4 Ensure that access to the sample is guaranteed.
If not, be prepared to modify the sampling
strategy (step 3).
5 For probability sampling, identify the confidence
level and confidence intervals that you require.
For non-probability sampling, identify the
people whom you require in the sample.
6 Calculate the numbers required in the sample,
allowing for non-response, incomplete or
spoiled responses, attrition and sample
mortality, i.e. build in redundancy.
7 Decide how to gain and manage access
and contact (e.g. advertisement, letter,
telephone, email, personal visit, personal
8 Be prepared to weight (adjust) the data, once collected.
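Step 6 (building in redundancy) is simple arithmetic: divide the number required by the proportion of contacts expected to yield usable data. The 60 per cent response rate and 10 per cent attrition below are assumed figures, not recommendations, and the function name is illustrative.

```python
import math


def contacts_needed(required, expected_response_rate, expected_attrition=0.0):
    """Inflate the required sample to allow for non-response and attrition."""
    usable_fraction = expected_response_rate * (1 - expected_attrition)
    return math.ceil(required / usable_fraction)  # round up: a part-person cannot be contacted


print(contacts_needed(302, 0.6))       # 504
print(contacts_needed(302, 0.6, 0.1))  # 560
```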
The message from this chapter is the same as for
many of the others – that every element of the
research should not be arbitrary but planned and
deliberate, and that, as before, the criterion of
planning must be fitness for purpose. The selection
of a sampling strategy must be governed by the
criterion of suitability. The choice of which
strategy to adopt must be mindful of the purposes
of the research, the time scales and constraints on
the research, the methods of data collection, and
the methodology of the research. The sampling
chosen must be appropriate for all of these factors
if validity is to be served.
To the question ‘how large should my sample
be?’, the answer is complicated. This chapter has
suggested that it all depends on:
• population size
• confidence level and confidence interval
• accuracy required (the smallest sampling error sought)
• number of strata required
• number of variables included in the study
• variability of the factor under study
• the kind of sample (different kinds of sample within probability and non-probability sampling)
• representativeness of the sample
• allowances to be made for attrition and non-response
• need to keep proportionality in a proportionate sample.
That said, this chapter has urged researchers to
use large rather than small samples, particularly in
quantitative research.
