For my current thesis project, I'm developing an inference methodology for snowball samples. Snowball samples, or more generally respondent-driven samples, are used to sample hard-to-reach populations or to explore network structure. The sampling protocol I'm working with is:
1. Select someone to interview from the population by simple random sampling.
2. In the interview, ask that person how many connections they have. The study defines what counts as a 'connection'; it could be business partners, sexual partners, close friends, or fellow injection drug users.
3. Give the interviewee one recruitment coupon per connection, which they pass on to invite that connected person in for an interview, often with an incentive.
4. Check who responds:
   - If no connected person responds, go back to Step 1.
   - If one connected person responds, go back to Step 2 with that person.
   - If several people respond, go back to Step 2 and interview each of them.

Two further rules apply throughout:

- Once the desired sample size has been reached, no one else is interviewed, even if more coupon holders respond (a budget limitation).
- Connections that lead back to people already in the sample are ignored; not even the fact that such a connection exists is recorded (a confidentiality limitation).
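To make the protocol concrete, here is a minimal Python sketch that simulates it on a population network assumed to be fully known, which is handy for checking an inference method against ground truth. The function `snowball_sample` and its parameter `p_respond` (a constant probability, independent across coupons, that a coupon holder comes in) are hypothetical illustrations, not part of the protocol. I also assume responders are interviewed first come, first served, and that a coupon handed to someone who has already responded but not yet been interviewed counts as a link back into the sample.

```python
import random
from collections import deque


def snowball_sample(graph, n, p_respond, rng=None):
    """Simulate the snowball protocol on a known population network.

    graph: dict mapping each person to the set of their connections
           (assumed undirected and fully known to the simulator).
    n: desired sample size (the budget limit).
    p_respond: hypothetical probability that a coupon holder comes in,
               assumed independent across coupons.

    Returns (order, degrees, edges): people in interview order, the degree
    each interviewee reported, and the observed recruiter -> recruit links.
    """
    rng = rng if rng is not None else random.Random()
    sampled = set()  # people interviewed or waiting to be interviewed
    order, degrees, edges = [], {}, []
    queue = deque()  # responders awaiting interview, first come first served

    while len(order) < n:
        if not queue:
            # Step 1: nobody is waiting, so draw a new seed by simple random
            # sampling from the people not yet in the sample.
            remaining = [v for v in graph if v not in sampled]
            if not remaining:
                break  # the whole population has been sampled
            seed = rng.choice(remaining)
            sampled.add(seed)
            queue.append(seed)

        # Step 2: interview the next person and record their reported degree.
        person = queue.popleft()
        order.append(person)
        degrees[person] = len(graph[person])

        # Step 3: one coupon per connection (sorted so a seeded rng is reproducible).
        for contact in sorted(graph[person]):
            if contact in sampled:
                continue  # link back into the sample: not recorded at all
            if rng.random() < p_respond:
                sampled.add(contact)
                queue.append(contact)
                edges.append((person, contact))

    # Budget limit: responders still in the queue are never interviewed,
    # so drop the recruitment links that point to them.
    interviewed = set(order)
    edges = [(a, b) for a, b in edges if b in interviewed]
    return order, degrees, edges
```

For example, calling `snowball_sample(g, n=4, p_respond=0.5, rng=random.Random(1))` on a small dict-of-sets graph `g` returns the interview order, the reported degrees, and the recruitment tree; comparing `edges` with the full `g` shows how little of the network structure the sample actually reveals.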
Figure 1: A network measure based only on the observed network structure.

Figure 2: A network measure based on the sampling order.