Beware of Black Boxes – The Strength and Validity of Internet Surveys

Thursday, February 28, 2008

ZINC Research was recently a sponsor of Net Gain 2.0 in Ottawa.

This was an excellent conference on the current state and future of online research and the use of wireless technologies. I recommend that fellow researchers attend future iterations of this conference.

One area to which many speakers dedicated considerable attention was the reliability of online polls, and specifically the representativeness of the sample. I take the firm stance that online polling, with demographic-based weighting, is able to deliver findings reflective of the general population.

The fact is that online is just another medium for data collection and should be treated as such. With 70% of North Americans being Internet users (and a higher incidence in Canada), and depending on the nature of the study being conducted, we have reached a point where online general-population polling is acceptable.

At the conference there were a number of issues raised about online data collection. Let me elaborate on the topics and my associated thoughts.

First, there was discussion of how to overcome the issue of the offline population. This is the traditional objection from researchers who argue that telephone polling is better at generating a "random and representative" sample. Two points here. First, young people are opting not to have a land line. Given the diminished ability to contact this group, one must question the weighting procedure applied to younger respondents within a telephone sample. Second, telephone response rates are probably at the lowest point they have ever been. Telephone polling now relies on a host of tools, such as predictive dialers and pre-screened sample, to assist in the trade. So the question arises: are they approaching a representative population? Like the online population, there are in reality limits on who can be contacted and who wants to be contacted via telephone surveys. Thus, the offline population likely shares much in common with this "uncontactable" population.

Another discussion was on self-selection bias with online surveys. As indicated above, given declining telephone response rates there is likely a self-selection process in participating in a phone survey as well. Many of us are aware that respondents pick and choose whether to participate in a telephone survey, especially after asking how long it will take and what the topic is. Online researchers have been challenged to conduct parallel studies with telephone methods, and the findings have repeatedly shown minimal self-selection differences. Further, the online medium has delivered innovation in using incentives to motivate participation without any major effects on quality.

Mode effects also raised some questions. Does the medium attract and produce skewed results? There is ample evidence, from polling companies and academic researchers, that the presence of "hyperactive" panel members is a myth and that, where any are present within a survey, they have had little effect on data quality. Further, there are enough controls to detect fraudulent responses and "automaton"-type responses via pattern recognition algorithms. The latter is easily investigated within longer surveys, and truth be told, we as market researchers do a great injustice to our discipline with lengthy surveys, regardless of the medium.
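
On the point of pattern recognition, here is a minimal sketch, in Python, of the kind of simple checks I have in mind. The field names, thresholds and records are hypothetical for illustration; they are not the actual algorithms any panel provider uses.

```python
# A minimal sketch (not any vendor's actual algorithm) of flagging suspect
# "automaton"-style responses: respondents who give the same answer to every
# item in a long rating grid, or who finish implausibly fast. Field names
# and thresholds are hypothetical.

import statistics

def flag_suspect(respondent):
    """Return a list of quality flags for one respondent's record."""
    flags = []
    ratings = respondent["grid_ratings"]          # e.g. a long 1-5 scale grid
    if len(set(ratings)) == 1:
        flags.append("straightlining")            # identical answer to every item
    elif statistics.pstdev(ratings) < 0.5:
        flags.append("low_variance")              # barely differentiated answers
    if respondent["seconds_to_complete"] < 120:   # hypothetical speed threshold
        flags.append("speeder")
    return flags

# Example usage with made-up records
panel = [
    {"id": "r1", "grid_ratings": [4, 4, 4, 4, 4, 4, 4, 4], "seconds_to_complete": 95},
    {"id": "r2", "grid_ratings": [3, 5, 2, 4, 1, 4, 3, 5], "seconds_to_complete": 610},
]
for r in panel:
    print(r["id"], flag_suspect(r) or "clean")
```

In practice such flags would be reviewed rather than used to drop respondents automatically; the point is that these checks are straightforward and transparent.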

Finally, there is the subject of social desirability. The presence of a live interviewer automatically creates a "barrier" and a "perceived filter" that affects the honesty of respondents. We have seen this across the breadth of research we have conducted, including self-complete and administered onsite projects. We have noted that online respondents tend to be MORE honest on contentious issues, as they know that their feedback is anonymous. Clients should appreciate this, as there is less "sugar coating" within the actual data. Ultimately, what needs to be assessed is the consistency of method: if it is a tracking study, keeping a consistent data collection method is critical to mitigate any surprises that may result from a shift in methodology.

Considering all these points, one thing I note is that many government departments (which I regard as the most stringent in their data collection methods, as they cannot afford to get it wrong), regardless of the order of government, have moved over to online methodologies. So they must be comfortable with its application.

My last thought is on the notion of proprietary weighting schemes. I personally disapprove of these. As they are proprietary, by definition they are neither transparent nor reproducible. And the question arises: is the weighting applied designed to "fit the data"? We as researchers have a responsibility to apply simple methods to rebalance data against publicly available sources (such as the census) to ensure that our approach is reproducible and easily validated. I personally believe that, given the magnitude of the online population at this time, weighting by age, gender, region and (if the data exists) education is sufficient. For those who have proprietary "black box" weighting schemes, it has been demonstrated that their data falls within the margins of error of simple demographic weightings. With the very notion of margins of error being debatable for online studies, the goal should be to deliver insights from a sample reflective of the population, to ensure and reinforce the credibility of this means of data collection.
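
To illustrate what I mean by a simple, transparent approach, here is a minimal sketch of raking (iterative proportional fitting) against published census margins. The target proportions and sample records below are made up for illustration and are not real census figures.

```python
# A minimal sketch of transparent demographic weighting via raking (iterative
# proportional fitting) against published census margins. The census targets
# and the sample records below are illustrative, not real figures.

from collections import defaultdict

# Published population margins (proportions) -- illustrative values only
targets = {
    "gender": {"male": 0.49, "female": 0.51},
    "age":    {"18-34": 0.30, "35-54": 0.38, "55+": 0.32},
    "region": {"east": 0.35, "central": 0.40, "west": 0.25},
}

# Hypothetical online sample, one dict per respondent
sample = [
    {"gender": "male",   "age": "18-34", "region": "east"},
    {"gender": "female", "age": "35-54", "region": "central"},
    {"gender": "female", "age": "55+",   "region": "west"},
    {"gender": "male",   "age": "35-54", "region": "central"},
]

weights = [1.0] * len(sample)

for _ in range(50):                      # iterate until the margins settle
    for dim, margin in targets.items():
        # current weighted share of each category on this dimension
        totals = defaultdict(float)
        for w, resp in zip(weights, sample):
            totals[resp[dim]] += w
        grand = sum(weights)
        # scale each respondent so the sample margin matches the census margin
        for i, resp in enumerate(sample):
            cat = resp[dim]
            weights[i] *= (margin[cat] * grand) / totals[cat]

for resp, w in zip(sample, weights):
    print(resp, round(w, 3))
```

The point is not this particular recipe; it is that anyone with the published census tables can reproduce and validate the weights, which is exactly what a "black box" scheme prevents.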
