I'm going to talk largely about something called address-based online surveying, or at least that's what we call it within our company, Kantar. Address-based online surveying is really part of a larger family of designs called push-to-web, a term you may well have come across before now. What I'm not talking about, though, is online panels, such as the one YouGov has; Kantar itself has a very large online panel as well. These are used for far more online research than anything I'll talk about; the vast majority of online research comes through online panels. But at its heart an online panel is a convenience sample. There is a large pool of volunteers, and YouGov, Kantar and all the rest will sample from these panels to bring in broadly representative samples that are demographically balanced. Ultimately, though, this pool is of people who are quite keen to do surveys very, very frequently indeed, which is what I would call a convenience sample. Probably only about 0.5% of the population is a member of an online panel. In so far as there are biases within that group that go beyond the demographic factors, they will carry through into the surveys themselves. Now, that isn't always a problem. Almost all opinion polling, for example, is carried out using online panels, because you can put the survey through and, sometimes within only 24 hours or even less, get answers back. No form of random sampling with the kind of response maximisation that is typical of social research is going to work in a polling scenario, though Curtis may mention a few counter-examples to that. So I'm not talking about that; I'm really talking about when we're trying to get a random sample of some sort, i.e. when we believe that random sampling is better than sampling from convenience samples. Anyway, that's the prelude. So first of all: what, and why, is push-to-web? What is push-to-web surveying?
Effectively, a sample of any population is contacted offline, which might be by post, by telephone or in person, but not online, because there is no sample frame of people you can contact online that is at all representative. So they're contacted offline and asked to complete a questionnaire online, while other response modes that may be available are downplayed, or are only offered to people after they have not responded to the request to do it online. That's effectively what it is. But why do we want to push people to respond online in the first place? First of all, and this is probably 98% of the reason in most circumstances, online completions cost less than other types. That means the research commissioner can either simply spend less on research, spending more on other things or less in grand total, or spend just as much but get larger samples. Pat showed, for example, how with the Science Education Tracker he was able to get a sample of 4,000 where previously he had a sample of 450, so a lot more analytical options were available to him once he had the larger sample size. That's the main reason. It's also more convenient for some people, because there are no interview appointments to be kept and the questionnaire can be done bit by bit; it doesn't have to be done all in one chunk. In theory interviews can be done in little chunks as well, but in practice that's quite impractical, and most people, once they've started an interview, feel they've got to finish it, whereas online survey completion can be fitted around other things. So there's a convenience factor there. Also, with online surveying you can use visual prompts.
So you can use pictures, even video, and more dynamic kinds of question design than are possible with, say, paper or telephone. You can do most of these things with face-to-face interviewing, but face-to-face interviewing is the most expensive type, so it's not really the rival to online surveying; the rivals are really these other modes like paper and telephone. Finally, and least consequentially, online seems modern to many research commissioners, or at least not as antiquated as interviewing. Many research commissioners, and especially their political masters, think it's vaguely ridiculous to send somebody round to a person's house and take up their time with an interview, or even to call people up to do that, because they themselves would never do any of those things, and they don't know any of their friends who would either, whereas they can imagine it more the other way round. The government itself also has a digital-by-default attitude, if you like, that has percolated through: they try the digital route first, even to the point where, if somebody calls up and says "I'm not online, I don't want to do this online", the first approach is to try to help them do it online rather than to offer some alternative mode of getting a service. They are very much in that mindset, so that's another, rather smaller, advantage of online. Now, there are lots of varieties of push-to-web design, and the contact approach of course depends on what is known about the sampled household, individual or organisation. Although I've said the contact is offline, and that could be through the telephone or in person, it's almost always written contact rather than expensive personal contact: as soon as you get people involved you get cost involved, and one of the principal advantages of online surveying is lost.
Now, the contact mode itself can include mailed letters, emails, SMS text messages, various social media platforms and so on, all depending on what you've got, and all available contact modes tend to be used when they are in fact available. It's also typical to send two to three reminders if you're doing social research, as most of the literature suggests that maximising the number of contact opportunities maximises the probability of response. More than anything else you can put into your design, simply putting the survey in front of people's faces is by far the most effective thing you can do. It's quite easy to hear this and say, well, this sounds like bombardment, and potentially that could reduce response probabilities for your second mode. For instance, if somebody doesn't respond online you might then offer them, say, a paper questionnaire as an alternative, but you think: just a minute, I've bombarded them with the online request, now they're definitely not going to do the paper questionnaire. So there's some risk of that. There's also the concern that, if you're using online within a longitudinal study, all this bombardment can put people off. These are the theories, at least. So far I have not seen much evidence that any of these things actually matter in practice, but it's also fair to say that push-to-web designs are pretty new in social research, and so the absence of evidence of this sort of effect is not really evidence of absence.
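The contact strategy just described, an online invitation followed by two or three reminders with the fallback mode held back until the online requests are exhausted, can be sketched as a simple rule. This is a purely illustrative sketch, not Kantar's actual fieldwork system; the mailing names, the five-step schedule and the `SampledAddress` structure are my own assumptions for the example.

```python
from dataclasses import dataclass, field
from typing import Optional

# Assumed contact rules: a web invitation letter, then reminders,
# and only at the end a paper questionnaire as the fallback mode.
SCHEDULE = ["web_invite", "reminder_1", "reminder_2", "reminder_3", "paper_fallback"]

@dataclass
class SampledAddress:
    address_id: str
    responded: bool = False
    contacts_sent: list = field(default_factory=list)

def next_contact(case: SampledAddress) -> Optional[str]:
    """Return the next mailing for a case, or None if no more are due."""
    if case.responded:
        return None                  # stop contacting once they respond
    sent = len(case.contacts_sent)
    if sent >= len(SCHEDULE):
        return None                  # contact strategy exhausted
    return SCHEDULE[sent]

case = SampledAddress("A001")
case.contacts_sent.append(next_contact(case))   # "web_invite"
case.contacts_sent.append(next_contact(case))   # "reminder_1"
print(case.contacts_sent)
```

The point of the sketch is simply that the cheaper fallback mode sits at the end of the sequence rather than being offered up front, which is what distinguishes push-to-web from a conventional concurrent mixed-mode design.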
To give a couple of examples: in the UK there are several major longitudinal surveys that have used push-to-web as part of their design. Understanding Society itself, the largest survey carried out in Britain, is push-to-web from wave seven onwards, and if respondents don't do it by web it goes straight to a face-to-face interview. It doesn't go through a telephone mode, although a very tiny number of people do complete it by telephone in complete desperation at the end of a wave; basically it's just web and then face-to-face. Since wave seven a gradually greater percentage of Understanding Society has been put into this model, although a reserved one-fifth of it goes straight to face-to-face, and that's done so that some effort can be made to identify measurement effects between completing a questionnaire on the web and completing it through an interview. Initially it was quite a struggle to get people to do it by web, but now, of those put through this design, about 50% complete it by web and a little under that complete it face-to-face. Another example is the National Child Development Study, which started in 1958; this was the age-55 sweep. They had already varied the mode, with face-to-face for most waves and sometimes telephone, and in this particular sweep they used web and then telephone for the remainder, primarily because it was the age-55 sweep and we'd seen in other surveys that this is the absolute sweet spot for getting people to respond to an online survey: not young people, but people in middle age and late middle age. And indeed the response rate for the web part was about 62-63%, and the overall response rate, adding on the telephone part, was greater than telephone on its own, which isn't always the case: a lot of these mixed-mode studies don't have higher response rates than a single-mode study, but in this case it did slightly better. Another one is Next
Steps, which follows a group of people who were 13 or 14 in a particular year, every year for about seven years, until they're either in higher education or well into the labour market. The DfE sponsors that, and there are two cohorts: waves 5 to 8 of the original cohort were done with web, then telephone for the rest, then face-to-face, and that model is used from wave 4 of the new cohort as well. In this case the response rate to online is about 45%, and another 45% respond by one of the other modes; face-to-face is barely used at all in Next Steps, with only about 10 or 15% completing that way. Both Next Steps and Understanding Society have response rates around the 85-90% mark, because they're well-established studies that are well along now. Mixed-mode studies like this are far more complex to manage than single-mode studies, and the data is more complex to use, so it's not just a free lunch in using these cheaper modes. The measurement effects in particular are sometimes substantial, which makes causal inference more difficult for those using the data. I had a look at the very latest wave of Next Steps. In this wave most sample members have gone web, then telephone, then face-to-face, but there are others that have gone telephone then face-to-face, and another set that went straight to face-to-face, so there are a number of experimental subgroups. I hoped this approach would allow me to identify some measurement effects, given that response to each of these mode sequences didn't seem particularly different, and in fact there are masses of measurement effects, if indeed they are measurement effects, all over the place. So for those using that data, in almost every bit of analysis you need in some sense to account for the fact that the data is collected from different modes. What about data collection savings in a model which is web and then face-to-face, which is Understanding Society? Now, the
ESRC very much pressed ISER, who run Understanding Society, to make sure that web data collection was in there, but in fact the savings in data collection costs are only about 10%. The reason is that interviewers still have to travel to all the same sample areas as they otherwise would. They've got a smaller set of households to go to, but they're all the harder ones, the ones that don't respond online; all the low-hanging fruit has gone. So the cost per unit is higher than it otherwise would be, and you still need a vast field force, so although online is used there, the principal cost saving isn't being realised that much, although it still amounts to a couple of million quid because it's a massive study. Cross-sectional studies: there are two types of these. First are those with name-based samples, which are great. Pat's given a good example with the Science Education Tracker, but another similar design was with the DfE, a pupil and parent survey. Again they were sampled from the National Pupil Database; we wrote to the parents, the parents completed a survey, and then the sampled child also completed one, and the data was put together with confidentiality separating the two off, so one group didn't see the other's answers. These are great cases because there's no additional sampling: if you're sampling addresses you're never actually interested in the population of addresses, so you've got to do a bit more sampling beyond that, whereas here you don't. There's a better chance of response if you're writing to somebody personally, and if you've got their name right, for starters, that's nice. There are stronger data safeguards too: if you're writing to an address and then trying to randomly sample somebody, almost anybody, it's quite difficult to put together data safeguards so that the data of the one person who completed the survey can't be looked at by somebody else
in the household. With named samples there's a little bit more protection there. Also, if the sample frame is informative enough, as I think the National Pupil Database is, there is the potential for some tailored communications to encourage people to take part. Tailored communications when you don't know anything, such as when you're writing to addresses, have almost undetectably small effects and are not really worth spending a massive amount of time on, but if you've actually got some reasonable data from a sample frame you've got some options there. What I'm going to talk about, though, is of course address-based samples. Most of the big national surveys done in Britain are sampled from the Postcode Address File, with interviewers going out to the addresses, and we use the same frame, the same Postcode Address File, for this. It's comprehensive, covering about 98% of the general population of Britain, but it's sparse: it hasn't got anything else on it. You can attach some things to it. You can use small-area data, which is mostly census data but also some benefits data, medical data and so on, aggregated at the small-area level, which tells you something about the place. Also, if you go to vendors like CACI or Experian, they'll tell you they can give you some address-level or even individual-level data which you can stick onto this sample to make it richer. However, we've done some work with that, and the accuracy is probably not what you'd hope for. It's useful in the sense that it can increase the efficiency of sample designs, but that's all it can do; despite what they say, they can't even get the count of people in the household right a lot of the time, so I wouldn't trust it altogether, though it can be useful in terms of survey design. Our approach with address-based samples, which we call ABOS within Kantar, address-based online surveying, is ultimately based on designs that were originally worked up in the
United States by Don Dillman and some of his colleagues, adapted for the UK. So what is ABOS itself? Well, it's got a very simple basic design: a stratified random sample of addresses is drawn from the Postcode Address File, and an invitation letter is sent to the residents of each address containing one or more usernames and passwords plus the URL of a survey website. Questionnaires are always device-agnostic these days, which really means you've designed them to be done on a smartphone, or at least to work fairly well there. There are a few still going where the questionnaire is not what you might call mobile-optimised, but for the most part that is routine these days. Within that basic design, though, there are lots of variants possible. Firstly, how do you sample within these addresses? Do you take one person at random who lives there, or all of them, or one or two of them? What sort of reminder communications should you use: letters, postcards? What kind of responsive design might you use to try to make sure the responding sample is as representative as possible? Is there an evolving message strategy you can adopt to get yourself through the various stages of this communication, because letters are all you've got in this case? Incentives: what value should they be, and should they be targeted at particular sorts of addresses rather than others? And complementary modes: should you offer a paper option for those who can't or won't do it online, or a separate interview survey? There are lots of variants here, and we've done at least one survey of most of these types. Here is an example of a letter, and I realise, having sat there myself, that nobody's going to be able to read any of it. This is actually taken from the 2016-17 Community Life Survey, which was originally a Cabinet Office survey about social action and volunteering and is now in the Department for
Culture, Media and Sport, and this was the original ABOS study in the UK. I've printed the left- and right-hand sides so you can see there's a nice logo for the study and a very short introduction to what it's about. It's incredibly vague, because to be honest, with random sampling I'd rather have a vague introduction to what the survey is about than something super-specific that's trying to engage people with the topic, because I don't really want a sample full of people who are interested in the topic; I want people to have different motivations for taking part. So there's a vague but friendly introduction, then underneath it the passcodes and passwords to use, highlighted, and on the back, though you don't see it here, lots of frequently asked questions, that sort of thing. It's in colour, and importantly, at the top left is "HM Government". We don't tend to use the actual department's name; we always try to use HM Government, with a couple of exceptions, those being the Home Office, HMRC and one or two others that have equal impact to being written to by HM Government. Now, we originally conceptualised ABOS as a lower-cost alternative to RDD, random digit dialling, which Pat mentioned, in which Ofcom's grand scheme of telephone numbers is used as a sample frame and numbers are randomly generated from it. It used to be that you just used landline numbers, but now lots of people don't answer landlines, lots of people don't even plug their phones in, and lots of people don't have them at all, so now you have to have mobile numbers as well. The problem is that mobile is tremendously expensive, because every call is screened, so it's incredibly difficult to get people to even answer the phone, which means interviewers spend much longer than they otherwise would, and the call costs themselves are greater. There are masses of answerphones and all that sort of thing. So it's no
longer a particularly viable format. The most recent comparison we've done, actually only last month, had an ABOS on one side and a parallel RDD survey, and the ABOS cost half as much; the RDD came out at about £50 per completed interview, so RDD is no longer the cheap method it used to be. But the advantage of ABOS is also that it's more flexible, with regard to both size and shape, than interview surveys. The biggest ABOS study currently in Britain has a sample size of, I think, about 170,000 annually, which is certainly bigger than any telephone survey there is. But we've also done about a dozen hyper-local surveys. We did a survey of people who live just around Liverpool's Anfield stadium, and a number of other locations, where we're effectively sampling a whole chunk of addresses and getting a sample of 300 or 400 respondents from hyper-local areas: impossible to do with RDD and hard enough to do with face-to-face interviews as well. So there are some advantages there. It's also unclustered, whereas even an online panel will have been clustered if it was ultimately sourced by interviews, so there are some advantages in terms of the precision of the data. Now, although we originally thought, well, we're sick of dual-frame RDD, we've got to have something else, the first test we actually did was against a face-to-face interview survey. Community Life was the successor of a much longer-running survey called the Citizenship Survey, a face-to-face interview survey; in 2012, in fact, we ran it again as the first year of Community Life, with a response rate of about 60% and an interview of about half an hour. Now, half an hour is traditionally quite long for online, and 60% is of course a very high response rate, so the new design was always going to be different from that. There's been a lot of parallel running and lots of testing over the last few years, and ABOS is now used as the sole data collection method for Community
Life, so it broadly passed the tests that were set for it over that period. We've also used it for about six other clients, most recently with the Financial Conduct Authority on a vast survey of unbelievable complexity about the financial products people have bought or not bought. And, as I say, Ipsos MORI uses a variant of it for Active Lives, the vast sport survey I mentioned before. Now, I wrote an article about a year ago, published in the SRA's Social Research Practice journal, which attempted to answer seven questions, and I'm going to go through each of them here. First, if you sample addresses, how do you convert that into a sample of individuals? Second, how do you verify that the data you get is in fact from the sampled individual? Third, how do you cover offline individuals, who are still part of the population here? Fourth, what response rate do you get, and what's the impact of the design features you've tested? Fifth, how does the response rate vary between subpopulations, and what, if anything, can you do about it? Sixth, what evidence is there of non-response bias on the actual substantive topics? And seventh, how much does it cost? I attempted to answer all those questions in the article; I've got more evidence now, and I've brought more in today than I had then. So, firstly, how do you convert a sample of addresses into a sample of individuals? Well, a small number of addresses, about 2%, contain multiple households. They're reducing in number, and there are more of them in Scotland and in London than in other parts of the country. There's nothing much we can do about that: if they haven't been identified by the Royal Mail, they won't be identified by us, so we accept a certain amount, and it's whichever household picks up the letter. But also, of course, we don't really know who lives in each household.
There is an electoral roll, but the part of it that's available for commercial use only contains about 35% to 40% of households, and it's dated, in the sense that it only really tells you who was living there at the point when the roll was updated, and some of that data can be out of date, especially for certain groups, renters for instance. As for other databases: we keep hearing about ONS building a master database of absolutely everybody, but even if they eventually do, they'll probably extremely constrain who can use it and how, so I don't think the creation of that frame will necessarily mean address-based sampling goes out the window. There are also, as I say, CACI, Experian and lots of other groups who effectively mulch together masses of data, do a huge amount of modelling and spit some stuff out for you at the end, but although that can be helpful in certain circumstances, it's not accurate enough to tell you exactly who lives in a household so that you can randomly sample one person within it. So we initially thought we'd test the classic methods used in postal research, which are quasi-random: you ask for the person in the household who had the last birthday, and for maybe half of them you ask for the person with the next birthday, in case there's some seasonal bias for some reason or other. We used these classic methods, but we also put in little items so we could check whether the person who responded really was the selected person, and we found quite substantial non-compliance: about a quarter of respondents were the "wrong" respondents, if you like, and given that some households only have one person, that was actually about a one-in-three non-compliance rate where non-compliance was possible. So is that a problem?
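The last/next-birthday rule just described can be sketched in a few lines. This is a minimal illustration of the quasi-random selection idea, not Kantar's production code; the household roster and the crude day-of-year arithmetic are assumptions for the example.

```python
# Hypothetical household roster: name -> birthday as (month, day).
household = {"Ana": (3, 14), "Ben": (11, 2), "Cleo": (7, 30)}

def select_by_birthday(members, today=(6, 1), rule="last"):
    """Quasi-random within-household selection used in postal surveys:
    pick the member whose birthday falls closest before ('last') or
    closest after ('next') a reference date, wrapping round the year."""
    doy = lambda m, d: (m - 1) * 31 + d      # crude day-of-year, fine here
    def days_since(bday):
        return (doy(*today) - doy(*bday)) % 372
    if rule == "last":
        # smallest number of days since their last birthday
        return min(members, key=lambda p: days_since(members[p]))
    # "next": smallest number of days until their next birthday
    return min(members, key=lambda p: (-days_since(members[p])) % 372)

print(select_by_birthday(household, rule="last"))   # Ana (birthday 14 March)
print(select_by_birthday(household, rule="next"))   # Cleo (birthday 30 July)
```

The catch described above is that nothing forces the person who opens the letter to hand it to the selected member, which is why the questionnaire also carried check items, and why roughly a quarter of responses turned out to come from the wrong household member.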
Well, we'll come on to that, but there was certainly plenty of non-compliance. At around the same time, the European Social Survey tested doing it properly: anybody in the household could go online and fill out the household composition exactly, and the computer would then select one person; if that wasn't the person who'd started, they'd be sent off to fetch the person who had been randomly sampled and was supposed to do it. They also included various items to check whether it was the right person, and found non-compliance at pretty much exactly the same rate as we found with the quasi-random birthday methods. So we moved on from there and said, well, why don't we just get rid of within-household sampling and test an "all individuals" method: please, everybody, take part. Now, this brings, first of all, clustered data. It's an unclustered sample of addresses, but if more than one person responds per household then the data is clustered by household, and people who live in the same household are more like each other than a random pair picked from the population, which tends to reduce the statistical value of the data. In fact, when you look at it across lots and lots of variables, you find that the statistical loss is quite similar to the loss of value you get when you have to weight to compensate for randomly sampling one person in the household, so the statistical arguments against it don't stack up particularly well. Secondly, there are certainly lower printing and postage costs per completed questionnaire if you can get an increase in the average number of people responding per address. There are also what we call risks of contamination. These contamination risks affect all household surveys, but I think they particularly affect them if you've got online data collection, where there's no interviewer somewhat governing
how the data is collected on site. There is also, for instance, a risk of one person completing multiple questionnaires in order to pick up more incentives than they otherwise would, which is a risk we wanted to look at and have always worried about to some extent. But I've done a recent study which suggests that Community Life estimates, at least, are not systematically different under the all-individuals design from what I'd get if I could magically randomly sample one person per household without any non-compliance whatsoever. I tested this using the fact that the individuals who did respond give information about the household, so you can post hoc randomly sample one of those people, and if you've got data from that person, they go into the mix, as it were. Then you can see what difference it makes if you take this sub-sample and weight it in just the same way as you would the data you've actually got. As an example, what I've done here is compute an effect size: the difference between this post hoc sub-sample, where everything's done perfectly, and what we actually did, which was to ask everybody to take part. These are all proportion estimates, and the effect sizes shown are at least twice as big as the real differences between the two, but you can see that by far the most common effect size is 1%, there are a few at 2% or 3%, and one variable had a slightly bigger effect size. Those of you who read the literature on effect sizes will know these would be considered extremely small, suggesting that it doesn't really make very much difference whether you randomly sample one person or get everybody to take part. In fact I looked across lots of other selection methods, such as taking the youngest person, or the first person who responded, or the first two people who responded, and it doesn't make a massive amount of difference, because you've constrained
the set of households who can actually take part in the first place. For instance, in one-person households it makes absolutely no difference what sampling method you use, of course, and even in two-person households it's not a great deal: you can pick one or the other at random, and the first to respond has a 50% chance of being the same person as a random pick, so you wouldn't expect masses of difference between them. But the all-individuals method appears to be the one that gets closest to perfect random sampling within the household. This is still very much a live issue, though. ONS is developing some work, and I think Ipsos as well, in which you use a single username and passcode which any individual in the household can use. That person completes the household composition data, and then the same username and passcode can be used by everybody else, but who can use it, and how many, is constrained by the household composition the first person has plugged in. Alternatively you could generate new usernames and passcodes based on the data from that first person, which I think has some data security advantages over reusing the same passcode, and there are lots of options for how you could use that. You could say the first respondent notes all these new codes down and passes them on, very lo-fi, and I believe that's what some tests have used. The second option is that the first respondent gives email or mobile contact details for these individuals, and the research agency then sends a link directly to them, but you've got to collect a lot of information from one person, and there's likely to be a lot of non-compliance with that. The third option is that, as long as you've got the names of these individuals, you can post new letters out with new codes and they can go and complete the survey. So there are lots of things still to be done, but, as I say, it probably doesn't have much impact on
any of the survey estimates themselves. But with random sampling it's often as much about showing you've done everything you can to meet the theoretical requirements as it is about the practical impact. Question two: how do you verify that data is from the sampled individuals? Well, first of all, any verification of self-completion survey data has to be proportionate: if you spend masses of money verifying the data, one of the reasons for collecting it that way in the first place is lost. But you do have far less control than with interview surveys, so we use a mixture of three strategies. The first is simply stressing, as part of the questionnaire, the importance of data validity, and explaining that we as an agency will be using verification processes, even if we don't describe exactly what those processes are. So on the one hand it's an open request for honesty, but we also say that we do do some checking, giving people alternative motivations to comply. Strategy two is very much as I've described: you can control the release of usernames and passcodes within each sampled household. Strategy three, which we rely on quite heavily, is a constructed algorithm to weed out completed questionnaires that fail on set criteria, usually multiple criteria. We use a red-flag system, so one flag doesn't normally mean we'll get rid of the data, but it will make us want to inspect it a bit more, and for multiple flags we use a threshold, a sort of three-strikes-and-you're-out process, with some signals of bad data treated as stronger signals of invalidity than others. But this isn't really tested, in the sense of asking whether the algorithm actually identifies all the bad data. We did attempt to test it by trying to re-contact people, but those we thought had given bad data were extremely unlikely to have given contact details for us
So there's a risk there that your algorithm, while very clever, isn't actually doing quite the job intended. Just as a guide, we tend to exclude about 5% of the completed questionnaires for the Community Life Survey, but different clients have had different thresholds, so that's ranged between 2 and 10% on other surveys depending on the edit criteria. We can generally live with that; we'll just do a larger sample to accommodate it. And as I say, it's certainly a live question whether the combination of these strategies is sufficient, even if it is in fact proportionate: in other words, what is the risk to inference of including bad data through failing to identify it?

Third question: how do you cover offline individuals? Based on data I slotted together yesterday, which is 2015-16, it looks like about 13% of the adult population has not used the internet in the last 12 months, so they are effectively not covered by an online survey method. Or at least you'd think so: in fact, if you look at the internet usage data from ABOS surveys, you'll find about 1% or 2% who've never used the internet before, so their first go at the internet is actually completing the questionnaire. That group does appear in small numbers, and they of course play no part in online panels, but you'd be surprised how many people will do that for the first time, or will have somebody helping them, a younger person for the most part. Anyway, 13% haven't used the internet in the last 12 months. This is shrinking fairly quickly, not because that group is changing its behaviour particularly, but simply because they're becoming a smaller part of the population. They are, though, particularly distinctive with respect to birth cohort and educational level; I'll show you a chart to demonstrate that in a moment. And government studies can't usually miss them out; indeed, government studies can't usually miss out any
minority group for the most part, or at least not clearly discriminate against them, so you do need to find some way of including them. Here's an example; I don't know if you can see the question about using the internet at home or elsewhere in the last 12 months. It's almost universal amongst the under-45 population but not amongst the older population, and here I've divided up those who have any educational qualification from those who don't, which is a fairly substantial subset. You can see there are some big internet usage differences: for those aged 75 plus, even with educational qualifications, only a little over half have used the internet in the last 12 months, and for those without educational qualifications, which is half of them, it's only about 20%. So there's a fair non-coverage bias risk there.

Most ABOS studies use a paper option as the alternative. It's usually available on request right from the first letter, which will tell you how you can get hold of one, but it's also used in reminder packs, usually in the second and final reminder. And usually we don't include paper questionnaires in every one of the second-reminder packs; we tend to apply them selectively, so that certain strata that don't respond as well online as others get the paper questionnaires in the reminder packs, whereas in other, generally more affluent areas we don't tend to include them. Now, paper certainly brings in different demographic types, we can certainly see that, but it won't necessarily improve overall sample balance. If you said, right, I've got all this data now and I'm going to weight it, the weighting efficiency, which is effectively how much you lose as a result of having to weight the data, isn't very different whether you've included these paper questionnaires or not. So though they bring in lots of different people, the balance isn't necessarily much
better, and we haven't found a way to make it perfect or anywhere close to it. Also, a paper option enforces simple questionnaire design, or at least a paper version that's limited more or less to the headline measures. An electronic questionnaire will often have filters upon filters, and we have to cut those for the paper version. Some questionnaires, of course, have so much filtering that you just can't do them on paper; with others you can just about get away with it, because you can get about 80% of the data without too much filtering, so it's suitable for that. The Financial Lives Survey that we did for the FCA was the most complicated questionnaire I've ever seen in my life: no chance at all of doing that on paper. So in that example we did an alternative, which was a parallel interview survey limited to offline people under 70 plus people aged over 70, because people aged over 70 aren't brilliantly represented in the ABOS sample; 60-69 is fine, but 70 plus not so good. So we did a parallel face-to-face interview survey to pick up these groups. You might say, why not just follow up some of the non-responders and do interviews there? But that would have forced the whole sample, or at least a very large chunk of it, to be clustered, and we preferred not to do that. Also, logistically, the two surveys could run in parallel rather than the face-to-face having to be stuck onto the end of the online survey period; we had quite a constrained fieldwork period, so that's one of the reasons for that. So it's a parallel interview survey. Now, the impact of including all this alternative-mode data on the total population estimates is not always substantial, but for subpopulations it can make quite a big difference; it's just that unfortunately your subpopulation sample sizes often aren't quite large enough to detect the effect and tell whether it's actually large or not. But as an example of total population
estimates, here's a graph. The x-axis shows a proportion estimated from ABOS with both modes, online and paper, going in; the y-axis shows it with just online, excluding the 30% of completions that were on paper. Both were weighted to the same set of population totals, so they had that in common. But as you can see, with the introduction of paper, well, we've got an R squared here of 0.998, so all of that paper data wasn't really making very much difference, and it actually costs quite a bit. In a sense you might say the paper data was really just for show, but this survey only had a sample of 2,000, so it might well have been improving various subpopulation representations; you can't really tell, because it wasn't being used for that, it was being used largely for total population estimation.

Question four: what response rate does the ABOS method get? It very much depends on the sponsor, the topic, the design features and so on. Our worst has been 7% and our highest 25%, though we could have got higher than that, so there's a big range. I've seen some very recent ONS data from similar sorts of surveys, much shorter though, which have got over 30% under some conditions. The sample profile quality seems almost unrelated to the response rate itself: by far our best had a 9% response rate, where the response probability seemed to be more or less random and the population just fell out across eight different dimensions almost exactly as it should have done. So that was a 9% response rate survey, and the 25% one isn't as good as that; as I say, that's a fairly well-known feature of survey research. Certainly conditional incentives increase the response rate, and they sometimes pay for themselves through lower printing and postage costs, because you don't have to sample as big a number of addresses if your response rate is going to be higher. It's about a 3 or 4 percentage point difference per £5 that we put into the pot, but there are diminishing returns, so £10 isn't quite double that, and £20 certainly isn't.
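The cost logic behind conditional versus unconditional incentives can be sketched as a back-of-envelope calculation. All the figures below (address counts, mailing costs, response rates) are invented for illustration, not the actual costs quoted in the talk; the point is only that a conditional incentive is paid per complete, while an unconditional one is paid per address mailed.

```python
# Toy cost-per-complete comparison for incentive strategies,
# holding the number of sampled addresses fixed for simplicity.

def cost_per_complete(addresses, response_rate, mailing_cost,
                      conditional_incentive=0.0, unconditional_incentive=0.0):
    completes = addresses * response_rate
    total = (addresses * (mailing_cost + unconditional_incentive)  # paid to everyone
             + completes * conditional_incentive)                  # paid on completion
    return total / completes

# A £10 conditional incentive at an (assumed) 20% response rate:
with_cond = cost_per_complete(10_000, 0.20, 3.0, conditional_incentive=10.0)
# A £2 unconditional incentive to every address at an (assumed) 16% rate:
with_uncond = cost_per_complete(10_000, 0.16, 3.0, unconditional_incentive=2.0)
print(round(with_cond, 2), round(with_uncond, 2))  # 25.0 31.25
```

Even a small unconditional incentive multiplies across every mailed address, which is why, at low response rates, it struggles to beat a much larger conditional incentive per unit of expenditure.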
I know ONS, and we ourselves, have tested unconditional incentives, which are very popular in America. But when you're sending them out to all the addresses and the response rate is actually quite low, the cost of even quite a small unconditional incentive is about the same as quite a large conditional one. So, as expected, in experiments it doesn't usually beat the conditional incentive in terms of response rate per unit of expenditure. Adding paper questionnaires to one of the reminders is an even stronger boost. Though you'd think everybody's doing everything online, the paper questionnaires don't just come from people who don't, or don't want to, do things online; they actually bring in lots of people who simply prefer to do it that way. If you put paper questionnaires in every second reminder, half the completes would be paper. So even people who are online may not want to do it that way; give them one chance at paper and they'll take it. Because paper data is, as I said, quite constrained in what it includes, we don't really want to do that; we want to push them to web, but we have to include paper in some way if we want a higher response rate, as well as to include some of these offline individuals. A government sponsor is incredibly good; it needs to be clear on the envelope so people actually open it. We tested this recently: the BBC has adopted this method for its main audience tracker, and when we tested it, it did no better than a barely known government agency. I would count the Financial Conduct Authority as a barely known government agency, and we've also done surveys for the Competition and Markets Authority and so on; they all did better than the BBC, so I was quite surprised at that. A government sponsor is the one to have; if you can't get one, try your best to get one. Our current focus is on trying to limit variance in response rates between strata, because the response rate is never going
to be very high, but the next best thing you can do is make sure that it's pretty even across different strata, so you're hoping that the correlation between response probability and the things you're measuring won't be that strong. That's why we target reminder packs that include paper questionnaires at particular types of addresses: generally the more deprived areas tend to get them, the affluent areas don't. Varying the number of reminders is a good tool. There's some evidence for varying the type of reminder, but we've recently been burned on a postcard: we sent some postcards out and they did absolutely nothing at all, so that was a good use of money. Letters seem to be more effective than postcards, but it's quite new; you might see some examples elsewhere. Beyond that, there's varying the messaging strategy between waves: after somebody hasn't responded once, then hasn't responded twice, should you say different things in your letter? Well, we've tried that, and we've not really detected any effect based on what we've written. I think that's mainly because people don't really read very much; they skip to the important parts. We have to include all this information because it's almost a statutory requirement of doing surveys for government or academia, but in fact people just look down and go: £10, there's the site, there are the passcodes, and all the rest of the stuff that we might write is not being picked up that much. So we probably shouldn't expect very large effects, and our experiments aren't really large enough to detect the effects that may be there.

Just a quick example of how we use paper questionnaires. The Community Life Survey has five strata based on the Index of Multiple Deprivation, from the 20% most deprived areas to the least deprived 20%. As I say, we put paper questionnaires into all of the more deprived areas, a random sample of the middle ones, and none at all in the others. In the right-hand column you can see what the
person-level response rate is web only, and in the left-hand one what it is when you add in both modes. You can see that we managed to almost even it up: the response rate in the most deprived areas was 19% and in the least deprived areas 24%, so not a massive amount of variation, whereas the range was 10 to 23% on the online data alone, which requires a bit more weighting to deal with.

Question five: how does response rate vary between subpopulations? Well, we've done lots of these surveys and they're all slightly different in terms of what comes through, but a couple of things clearly do. Online respondents are more educated than average: the percentage with degrees will certainly be 8 or 9 percentage points higher than in the population. They're much less likely to rent, and as I say, online response is lower in the deprived areas; there's also lower response in areas with lots of flats that aren't necessarily deprived. Paper questionnaires tend to bring in more people aged 60 plus, as you'd expect, and especially those aged 75 plus, who are not particularly well represented in the ABOS samples. We also think they probably bring in under-60s who have long-term illnesses or disabilities and/or live in social rented accommodation; paper questionnaires are helpful in bringing those groups in. As I say, the sample balance measured by population percentages might not be brilliant, but you're bringing all these people in, so you've got the raw material for weighting the data to those population totals, whereas sometimes you just don't have enough cases from certain groups to be able to do that. Overall these profiles are less accurate than face-to-face interview profiles, which tend to have response rates of 50 through to 70%, so perhaps it's not unexpected that the profiles won't be as accurate. They're quite similar to dual-frame RDD, which as I say was our original rival model. It may be critical to survey sufficient numbers of each type within the population, because you're going to use multi-dimensional post-stratification.
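The evening-up effect of the targeted paper packs can be shown with a toy calculation. The per-stratum figures below are invented, chosen only to match the ranges quoted above (roughly 10-23% web only, narrowing to about 19-24% once paper is added to the more deprived strata):

```python
# Illustrative response rates by IMD stratum, web only vs both modes,
# with paper reminder packs targeted at the more deprived strata.

web_only = {"most deprived": 0.10, "2nd": 0.14, "middle": 0.17,
            "4th": 0.20, "least deprived": 0.23}
paper_gain = {"most deprived": 0.09, "2nd": 0.06, "middle": 0.03,
              "4th": 0.01, "least deprived": 0.01}  # biggest gains where targeted

both = {s: web_only[s] + paper_gain[s] for s in web_only}

def spread(rates):
    """Range of response rates across strata."""
    return max(rates.values()) - min(rates.values())

print(round(spread(web_only), 2), round(spread(both), 2))  # 0.13 0.05
```

The absolute rates barely move in the affluent strata, but the between-stratum spread shrinks, which is the stated goal: limiting the variance in response rates rather than maximising the overall rate.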
I tend to think that with low response rate surveys, non-response bias and to some degree non-coverage bias are usually the major problems, and you can afford to weight the data much more radically than you would with face-to-face data, where you've spent an awful lot for every single case. Here you've not spent so much per case, you've got bigger samples, and you can shift them around a bit; on balance I think it's better to do that than to be too cautious about weighting. Also, the design effects due to weighting are fairly modest. I don't know if the concept of design effects is that familiar, but I'll show you some examples in a minute; generally, weighting seems to reduce the effective sample size to about 70% or so of the actual sample size, so it's not that much of a reduction, and that suggests there's a reasonable raw sample balance even if it's not perfect. It's certainly worth assessing the weighting efficiency for each marginal subpopulation in the weighting matrix, because then you can see, for a subgroup, say 16 to 24 year olds, how much messing around you've had to do to get that group as aligned as possible to population totals; and the less messing around you've had to do, the more confident I'd be about that sample for other things that you don't have population totals for. Here's an example; I suspect you won't be able to see the percentages, but this is the weighting efficiency, which is the effective sample size expressed as a proportion of the actual sample size. This is for the Community Life Survey: a 22% response rate, with paper questionnaires used as the secondary mode. Overall efficiency is about 80%, with efficiencies of over 90% for the under-30s and again for the over-70s; it's a little bit weaker, but still in the mid-to-high 80s, for all of the middle age groups. That's because age is by far the biggest discriminator in terms of response.
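The weighting efficiency described above can be computed with Kish's approximation for the effective sample size, applied either to the whole sample or to each marginal subgroup in the weighting matrix. The weights below are invented for illustration:

```python
# Weighting efficiency: Kish's effective sample size,
# n_eff = (sum of weights)^2 / (sum of squared weights),
# expressed as a proportion of the actual sample size.

def weighting_efficiency(weights):
    n_eff = sum(weights) ** 2 / sum(w * w for w in weights)
    return n_eff / len(weights)

# Fairly even weights -> little "messing around", high efficiency:
print(round(weighting_efficiency([0.9, 1.0, 1.1, 1.0]), 3))  # 0.995
# More radical weighting -> lower efficiency:
print(round(weighting_efficiency([0.3, 0.5, 1.0, 2.2]), 3))  # 0.647
```

Running this per subgroup (e.g. on just the 16-24 year olds' weights) gives exactly the per-marginal diagnostic discussed above: the closer to 1.0, the less the weighting has had to distort that group.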
So once you're actually within an age group, everything looks okay; there's not a lot of messing around required. There's quite a substantial weighting matrix here: gender by age group, highest qualification by age group, housing tenure, number of people in the household, region, ethnic group. So the fact that the weighting efficiency is high is not simply a product of not weighting enough: obviously, if I didn't weight at all, the weighting efficiency would be 100%, but I'd have a lot more bias in my data. The Financial Lives Survey, which I've mentioned a few times: this was a 7% ABOS online response rate, with face-to-face interviews used as a parallel mode for people aged 70 plus and offline people who are younger, and here the weighting efficiency is more like 70%, so I had to do a bit more to it. The weighting matrix is actually slightly more comprehensive, but not by much. But here's a slightly different pattern, where the least weighting was needed for the middle age groups and the most for the older age groups and the youngest, so it was actually the reverse of the Community Life Survey; two ABOS surveys aren't exactly the same.

That brings me on to almost the end: non-response bias itself. Most survey data lack benchmarks, which makes non-demographic bias pretty hard to quantify. And for some variables, of course, interviews and self-completion questionnaires would produce very different distributions even if exactly the same sample completed both, especially if you're using response scales, agree/disagree or any other type: you tend to find self-completion respondents clump more into the middle and are a bit more negative about whatever you're asking them about than in interview surveys. So interview surveys aren't necessarily brilliant benchmarks if you're using a self-completion questionnaire rather than an interview. Now, the three-year parallel run with
the Community Life Survey provided an opportunity for us to assess whole-system effects, which is simply the difference between the ABOS result and the interview result. These system effects were of variable size, but they were sometimes large, especially when rating scales were used. One of the things we did was a kind of ABOS-style follow-up of interview respondents, asking them to complete the questionnaire online; we weighted it as well as we could and then compared that to the ABOS data itself. That suggested to us that most, but probably not all, of the system effects were due to measurement differences between the two modes, rather than because one survey had a response rate of 60% and the other a response rate of 20% or so, with the differences in sample characteristics that follow from that. That sort of study is quite valuable when considering a new data collection system, because in this case the client could accept some measurement effects but didn't particularly want to deal with selection effects, which would naturally suggest that the ABOS sample was worse than the interview sample it was replacing. Of course there are lots of ways of analysing that data, and NTRM themselves are going to re-analyse the data I used, apply different assumptions, and see how robust that finding is to changing the specification. Just as an example, if we map the system effects, which I think are on the Y axis, against the measurement effects on the X axis, we can see that by and large most of the measurement effects are very similar to the system effects, with an R squared of 0.77 and a certain amount of scatter around it, which suggests there are some other effects at play beyond measurement. The Y axis is the actual difference between the ABOS result and the interview result for all the results in the survey, each result being a dot; the X axis is our calculation of what was due to the different modes used, and not to the different samples.
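That decomposition can be sketched in a few lines. The percentages below are invented: the system effect is the ABOS-minus-interview difference, the measurement effect is estimated from the follow-up study (the same interview respondents re-asked online), and the selection effect is taken as the remainder.

```python
# Sketch of the system / measurement / selection decomposition
# for a single survey estimate (all figures in percentage points).

def decompose(abos_pct, interview_pct, followup_online_pct):
    system = abos_pct - interview_pct
    # Same respondents, two modes: this difference is attributed to measurement.
    measurement = followup_online_pct - interview_pct
    # What measurement doesn't explain is attributed to selection.
    selection = system - measurement
    return system, measurement, selection

# e.g. 52% agree in ABOS, 60% in the interview survey, and 54% when
# the interview respondents repeated the question online:
print(decompose(52, 60, 54))  # (-8, -6, -2)
```

In this invented example, three quarters of the 8-point system effect is attributed to measurement, which mirrors the pattern reported above, where measurement effects tracked system effects far more closely than selection effects did.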
So, as I say, system effects are on the Y axis and measurement effects across the X axis, but basically they're clustered around the line sufficiently that one might think measurement effects are the primary cause. Just for completeness, here are the selection effects calculated from the same study. Although there is a correlation, the R squared here is about 0.13, which is quite small, and I think if you took away the red line, which is the line of best fit, you'd probably think it just seems to be scattered about a bit. The largest selection effect is about 5 or 6 percentage points, whereas the largest measurement effect is about 15 percentage points. So from this I drew the conclusion that, generally speaking, this was mostly due to measurement and not due to selection.

And the very final question: how much does it cost? Obviously the specific combination of design features adopted will influence cost, but for general population surveys it seems to be about 50 to 70% of the cost of an RDD survey and about 15 to 20% of the cost of a random face-to-face interview survey. But ABOS surveys are far more expensive than using convenience samples: I think Pat showed YouGov at £5 per complete, which seems extremely cheap to me, and I'm sure we'd quote higher prices than that, but still, this is several times more expensive. So you are spending money on getting random sampling. There are certain things it can do that panels can't: you can't target a hyperlocal area with a panel, because panels have got people spread all over the place, and so on. But mostly the panels have a little bit greater flexibility. You can do surveys specifically of subpopulations, but here of course you're effectively paying for people who aren't in the subpopulation as much as for those within it, unless you manage to do some fairly clever work with the sample design. The Community Life Survey now
incorporates an ethnic minority population boost purely through applying different sampling fractions in different strata, with those strata defined by the ethnic mix according to the 2011 census. I didn't think that was quite going to work, but actually it worked almost completely bang on, so it is possible to do some of those things if your population is reasonably well clustered and identifiable in the census. There's also some potential to use Experian data to improve efficiency, but it's pretty marginal; it's best for particular age groups, not for any other subpopulation. And finally, piloting is essential, due to the wide variation in response rates, and that's as much to work out the cost as anything else. I don't think we're quite at the point where we can say it's going to be this percentage; we're still guessing to within a couple of percentage points, rather than being absolutely on it.
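As a sketch of how such a boost works through differential sampling fractions: strata with a richer ethnic mix on the census get a higher sampling fraction, so the expected number of completes from the target group rises without screening. All figures below (stratum sizes, mixes, rates, boost factors) are invented for illustration.

```python
# Toy boost via differential sampling fractions across census-defined strata.
# Each stratum: (addresses on the frame, share of residents in the target group).

strata = {"high mix": (50_000, 0.40),
          "medium mix": (150_000, 0.15),
          "low mix": (800_000, 0.03)}
response_rate = 0.20           # assumed flat across strata for simplicity
base_fraction = 0.002          # sampling fraction in the low-mix stratum
boost = {"high mix": 5.0, "medium mix": 2.5, "low mix": 1.0}

expected_target = 0.0
for name, (addresses, share) in strata.items():
    sampled = addresses * base_fraction * boost[name]
    expected_target += sampled * response_rate * share  # completes from target group
print(round(expected_target))  # expected target-group completes
```

The same arithmetic run with all boost factors at 1.0 shows how few target-group completes an unboosted design would yield, which is the comparison that justifies the differential fractions.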