 how we do in RailsConf. Thank you guys for coming to join. I know there's a lot of other great sessions right now, so I appreciate you being here with me. My name is Andy Glass. I'm a Brooklyn-based Rubyist entrepreneur. A third of my time I'm a nomad, I'm a maker, I'm a US Open ball person, and I'm also an adjudicator for Guinness World Records. So I'm perfect for the unusual Rails app track. But I'm here to talk about how to create a human-powered API with Ruby on Rails and Mechanical Turk. So first, why are we talking about M-Turk? Who has heard of M-Turk? Show of hands. Nice, a lot of people. And who's actually used M-Turk? Okay, a few people. Come on in, come on in. Cool, so yeah, I actually briefly ran a company, which failed pretty miserably, that built custom APIs off of Turk to clean data, and I thought it was an interesting enough experience that I wanted to create a talk about it. The gist of it is that you can integrate with M-Turk, which is a scalable, 24/7, always-available workforce. There's a little bit of controversy around Mechanical Turk, which we'll get to. But first, I want to say thanks, Rails. I owe everything in my career to you, not any of you personally, maybe you. But definitely the Rails community at large, which I'm so thankful to be a part of. It's an honor to speak at RailsConf for the first time. I'm thankful every day that I create software off the backs of giants. And being a programmer has given me the financial freedom to live an unusual life. But more importantly, I think it's given me the confidence to pursue unusual pursuits. And that's what being a Rails developer taught me: that anything can be done. So what do I owe to you guys? I think I owe you a little unusual. Based on this being the unusual Rails apps track, I think it's appropriate: your time is valuable and you're spending it with me in this room. After telling you this, you probably won't leave, although feel free to leave if you want to leave.
But maybe it's not a talk about Mechanical Turk. Yeah, it still is a talk about Mechanical Turk, we will get there. Maybe it's not a talk about crowdsourcing, although it kind of is. Maybe it is just a talk about being an imposter. And I know a bunch of people so far at RailsConf have talked about imposter syndrome. I struggle with it. I don't feel like I'm a good enough developer. Many days I feel like a shitty Rails developer. I don't deserve to share this program with people from Rails core and GitHub and Heroku. I tried to print my company logo on a t-shirt so I'd feel like I belong; it came out like shit, so I didn't wear it. But I think we're in good company. According to the Wikipedia page for imposter syndrome, these people suffer from imposter syndrome. And there's talks like, is it bad? Is it good? Yeah, I think it's good. This was a pretty cool article I checked out that you should read: why imposter syndrome is good for you. And I thought this was a pretty money quote: if you're interested in personal growth and development, by definition you are always going to be pushing yourself into something which is new. And when things are new, of course, we don't feel as comfortable in our own skin as when we're doing something which is deeply familiar to us, which we've been doing for five or ten years. So yeah, it's about growth. I think you need to fake it. So what does that mean? What the article also posits is that it's the imposter experience, not imposter syndrome. This is something that we should expect to experience in our lives if we're gonna push ourselves. It's something that we should brace for. It's something that we should think about what our personal response to it is. And we need to realize that we belong, that our success is not a mistake, and that we're exactly where we need to be. So it's not a talk about Mechanical Turk. It's a talk about learning to fake shit.
We're gonna build a human-powered API, but at the end of the day it is really just an API. And yeah, Mechanical Turk is at its core about faking shit. Cool, so now let's talk about Turk. So how did it start? It followed the path of a few other AWS products, where it was an internal tool at first and then they sort of made it open to the world. Basically it's a marketplace for online micro jobs. So requesters can post tasks at different price points and then workers can complete them. Does anyone know how Mechanical Turk got its name? A few people. It is named after the Turk, which was an 18th-century chess-playing robot. Of course it was not an actual robot. As you can see, there's a little area inside where there was a person that was actually mechanically controlling the Turk. So yeah, it was a fucking hoax. I told you this was about faking shit from the beginning. So what can we actually use M-Turk for? The main use cases fall into four buckets: image and video processing, data verification and cleanup, information gathering, and data processing. When I was using Turk I was doing a lot of analyzing or dividing up video. A lot of data validation for leads, lead generation, and lead enrichment also. So, providing business data. I would give a worker the address of a salon and I would say, figure out if the salon offers X service, and also figure out how much it costs for X service. Turk is widely used nowadays for a lot of random things; it's cited in hundreds of academic journals. It's been used to analyze satellite data over big swaths of area that want human views. It's used a lot of the time for training machine learning algorithms. It is kind of an interesting thing, because a lot of the stuff that's on Turk could be completed with AI. So it's a little bit of a regressive technology in some ways, I would say, in that people are doing stuff that could be done with AI, but it's interesting.
So let's make some money as a Turk worker, just to give you a basic overview of what the Turk environment looks like. These are some rough estimates I got from just checking it over the past few weeks: 1,500 groups of HITs. A HIT, which you'll see me reference a few times, is a human intelligence task. So that means 1,500 kind of broad-stroke questions, and then 300,000 individual assignments. Each broad-stroke question might have many individual assignments. For example, broad-stroke question: does this salon offer microdermabrasion? And then each assignment being a specific salon. The most HITs in a single group was 15,000, meaning that in that example, one person would be requesting 15,000 salons. That's just an example. The lowest reward was a penny, which is crazy. I don't know why anyone would do that. And the highest reward was 150 bucks, if you were a verified worker, for a task to transcribe two hours of audio. So I took some screenshots of some stuff that I saw on Turk. This was kind of interesting: Turkers tracking when finger spelling was occurring. So they actually used this little widget up here to start and stop a timer when the finger spelling occurred and didn't occur. Maybe they were doing this to get that information and then process it elsewhere or something like that. This was a spammy one that I thought was funny and wanted to show. This was someone saying, sign up with my Robinhood referral link and I'll pay you a dollar or whatever, and then they were making more money off the referral. I think this is banned, or should be banned, by Amazon. This one was interesting. I don't know if you can see the things that came out, but this person was having people box different fashion items on each person. So, using it to pinpoint parts of an image. And this is a task that I see so much on Turk that I am so confused how AI is not able to do this yet.
It's basically saying, extract data from a shopping receipt. If you come up with extract-data-from-a-shopping-receipt.com and you can figure out how to do it with AI, I bet you can make a lot of money. So what should we use it for today? Let's go through a use case. I spent way too much time thinking about what we were gonna do for this talk. I put a few requirements together. I didn't want it to be so simple that we'd have some people being like, oh yeah, but AI can do that, even though maybe I didn't succeed in that, we'll see. I wanted it to be social media content, because I thought if we're gonna build an API we should definitely have a scraper kind of feed the data in constantly. So I thought that'd be a good use case for this. I wanted it to be Pittsburgh-centric because we're in Pittsburgh, and I wanted to make y'all excited for happy hour because I know that's coming up next. So I came up with: does this sandwich have french fries in it? Pittsburgh is known for putting french fries in sandwiches. Quick show of hands: who is vehemently against this practice? One person, no, that's chill. And everyone else is down with it? Alrighty. So it was between this or identifying different types of bridges, and I thought this would be more fun. So let's go through the process of doing this without Ruby first, without Rails. What we're gonna do: we're gonna assemble some sample data, some Instagram posts. We're gonna create a new Mechanical Turk project. We're gonna load up the batch, and then we're gonna review the results. So this is how we started doing it with the Turk GUI. We had a title: look at a picture of a delicious sandwich and determine if there are french fries in it. Description: you'll be provided with an image of a sandwich, determine if there are french fries in it. Next up, we set a few different properties on it, like how much money we're gonna pay per assignment.
One thing that's kind of important here, which we'll get to later, is the number of assignments per HIT. That's saying we want each sandwich to go to two people, and we'll have two people look at it and validate and corroborate their results. And then some of the time allotted. So, we're gonna use the GUI. This was their categorization template; they have a bunch of different templates. If you guys end up doing this, I would recommend, before even building anything, just throw some shit up on Turk to test it out, and they have a few templates that make that pretty easy to do. And then this was the layout that I made. You'll see very explicit instructions, if you can read it: do not count the side of the sandwich. If there's fries on the side, we only care if they're in the sandwich. And I also said to pay extra caution to whether the sandwich is cut in half, because then you'd see a cross section of a fry. So we wanted to make sure that workers got the eagle eye for that. We made the template with the Instagram post embedded. That was easy enough to do, just interpolating the image URL in the view. Uploaded a CSV as the batch and we were good to go. This is how much it cost: 21 sandwiches, two assignments per sandwich, at 15 cents per assignment, plus Mechanical Turk takes a fee on top. I loaded that in. I played the new Cardi B album for 23 minutes, which is lit, I recommend everyone listen to it. And 23 minutes later, it was finished. Speed is dependent on a few different things, which I will get to later in the talk, but in 23 minutes this one was done. And this is what the output looks like. A few columns: one is the HIT ID, which is for each item, so essentially each sandwich. The assignment ID is for each individual assignment, so two per item. The worker ID is the worker that worked on it, and how long they took to work on it.
The input of it, and then also the answer output. So whatever output fields you have, it'll preface them with Answer. We see little stats on this: anywhere from 4 to 700 seconds. The time figures are actually not so helpful on the high end, because people just leave it open. The median time was 42 seconds. It's also interesting to see that a lot of workers worked on more than one of my assignments. So people are like, oh, there's an easy task, I'll crank out 10 of them or whatever. The max someone could have done would have been 21, because they wouldn't have been able to do two assignments on the same sandwich. And these are the results. We actually had pretty good consensus on this, which was cool, consensus being both workers agreeing that there either were french fries in the sandwich or not. The accuracy was also pretty good. I will show you the edge case later, but these were correctly identified. Some cross-section fries might have slipped through the cracks. These were correctly identified as no fries, which I was happy about. Maybe the chips would have been tricky, I don't know. And this was the edge case. They said there were no fries in this, and at first glance it does look like there are no fries in this, but if we look deeper, there are some french fries in there. So that one definitely slipped through the cracks. There are some tips and tricks to getting really accurate results on Turk; I'm gonna get into that at the end. But we are software developers, we're not here to use CSVs. I saw a few of you cringe when I said upload a CSV batch. So let's automate this process. Let's say that we already have a scraper for Twitter and Instagram. Not gonna touch that stuff, let's assume we have it. And we're gonna push any posts with hashtag sandwich and hashtag Pittsburgh to our API. And then our API will process it through Turk, and then we'll post it out for another API to read. So how are we gonna do this?
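The batch results described above can be crunched with plain Ruby before any automation. This is a sketch using made-up rows; the column names follow the MTurk batch-results CSV layout (HITId, AssignmentId, WorkerId, WorkTimeInSeconds, and Answer.-prefixed output fields), and "Answer.category" is assumed to be our form's output field.

```ruby
require "csv"

# Sample results in the MTurk batch CSV shape (values are made up).
rows = CSV.parse(<<~CSV, headers: true)
  HITId,AssignmentId,WorkerId,WorkTimeInSeconds,Answer.category
  H1,A1,W1,42,yes
  H1,A2,W2,51,yes
  H2,A3,W1,38,no
  H2,A4,W3,700,yes
CSV

# Consensus: for each HIT, did every worker give the same answer?
consensus = rows.group_by { |r| r["HITId"] }.transform_values do |assignments|
  assignments.map { |a| a["Answer.category"] }.uniq.size == 1
end
# consensus => {"H1"=>true, "H2"=>false}

# Median work time; the high end is noisy since workers leave tabs open.
times = rows.map { |r| r["WorkTimeInSeconds"].to_i }.sort
median = times[times.size / 2]
```

Grouping by HITId rather than AssignmentId is the key move, since each sandwich appears once per worker.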
First thing we're gonna do is create the Ruby on Rails service for M-Turk. We're gonna create a process for loading the tasks in. We're gonna create a process for approving the results and re-inputting the tasks as needed. And then we're also gonna serve our results via an API. So of course, we gotta give credit to the people whose backs we build on. I used these two gems to build this project. They're really great gems, they're super helpful. The first is Turkee by Jim Jones. It's built on top of rturk, which is the bottom gem. It really helps out with the database models, with easily converting our forms, and with launching and importing our Turk tasks. And that was built on top of rturk, by Ryan Tate, which is kind of just the simpler Mechanical Turk Ruby layer. Admittedly, these are not optimized for Rails 5. My old app was on Rails 4, and rturk was built on top of an Amazon gem which is now deprecated, but that's kind of the state of them. They're still really great. So this is our basic data model. Let's just say right now a batch is an overall task. So a batch is: does the sandwich have french fries or not? The output field name, which we'll use, is category, and the output field options will be the options in that select box. A batch item is gonna be each item, so each sandwich would be a batch item. And then each one of those will have a result. I know it was a little small, and I tried to not make it too dense of code, but we'll create the batch with a name, a title, a description, some instructions, the output field name on it, the output field options on it. And then also we'll take some post IDs and we'll create a batch item for each post. Then we're gonna bring in Turkee. So rails g turkee loads it right in. That's gonna do two things. The first is it's gonna create a pretty simple config file that we'll fill up with some AWS credentials.
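The batch setup just described can be sketched in plain Ruby. In the real app these would be ActiveRecord models; the attribute names come from the talk, but the class shape and method bodies here are assumptions for illustration.

```ruby
# A plain-object sketch of the Batch model described above.
class Batch
  attr_reader :attrs, :items

  def initialize(attrs)
    @attrs = attrs
    @items = []
  end

  # One batch item per scraped Instagram post; results fill in later.
  def add_items(post_ids)
    post_ids.each { |id| @items << { post_id: id, results: [] } }
    self
  end
end

batch = Batch.new(
  name: "sandwich_spotter",
  title: "Does this sandwich have french fries in it?",
  description: "You'll be provided with an image of a sandwich.",
  output_field_name: "category",
  output_field_options: %w[yes no]
).add_items([101, 102, 103])
```

The output field name and options live on the batch, so every item in the batch renders the same question.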
The second thing it's gonna do is create two database models for us. So our schema is gonna look a little bit like this. The first thing it creates is a Turkee task. A Turkee task corresponds with each task that gets put into Turk, so think of each row in that CSV input file that we loaded. And then each Turkee imported assignment, excuse me, will be each result that comes back out, so think two of those per item. Now, the Turkee imported assignment does not actually store the result data. That needs to be connected to something else, so it's actually connected to our batch item result. Cool, so now we're just gonna do batch.batch_items.each(&:launch). This is within the launch method of a batch item. We're just gonna set some pretty simple variables that are gonna get pushed to Turk. I'll highlight some of the important parts. The first thing, which is pretty critical with using Turkee, is that you need to actually specify the model that's gonna be created when we pull in the data from the form. So here we're gonna say it's a batch item result. Also the number of assignments, which we've talked about multiple times: two workers on each assignment. We'll set some qualifications, so we want only workers with approval rates greater than 95%. And then also an important thing is the form URL. We'll say that this will be our form URL, with the ID in it for that batch item, that will be iframed into Turk. Then we're gonna use this method from Turkee to post to Turk. It takes all of these arguments individually, and we just pass all that stuff in. Then we're gonna have an iframed form. This lives at the URL that we specified in that form URL that we passed to Turkee. The important part to note here is that we have this turkee_form form helper. And again, this is one of the things that Turkee helps with: it helps with the database modeling and it helps with the forms. So the forms are cool.
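The launch step above can be sketched as building the parameter set we hand off when posting a HIT. The keys below mirror the things the talk calls out (model name, assignment count, qualifications, form URL), but the exact key names, BASE_URL, and the helper are assumptions, not Turkee's real signature.

```ruby
# Hypothetical base URL for the app whose form gets iframed into Turk.
BASE_URL = "https://example.com"

# Build the parameters for launching one batch item as a HIT.
def hit_params(batch, batch_item)
  {
    form_url: "#{BASE_URL}/batch_items/#{batch_item[:id]}/form", # iframed into Turk
    hit_title: batch[:title],
    hit_description: batch[:description],
    hit_reward: 0.15,                 # dollars per assignment
    hit_num_assignments: 2,           # two workers corroborate each item
    hit_lifetime: 24 * 60 * 60,       # seconds the HIT stays listed
    qualifications: { approval_rate: { greater_than: 95 } },
    result_model: "BatchItemResult"   # model built when the form data comes back
  }
end

params = hit_params(
  { title: "Fries in it?", description: "Look at a picture of a sandwich." },
  { id: 7 }
)
```

The two critical pieces the talk highlights are result_model, which tells the importer what to build, and form_url, which points Turk at our own rendered form.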
The form helper will set the form URL automatically to mturk.com/mturk/externalSubmit, and it will also add some hidden fields. So this lets our form mimic a regular Rails form, but in actuality it's not; in actuality it's submitting to Mechanical Turk. We don't get the data right in our server then; we actually have to import the tasks afterwards. So that's a pretty important point to note, and that's one of the things that makes this a pretty helpful gem to use. The next thing: importing the result data. Here we're gonna use another Turkee process. That's gonna create the imported assignment records, and then it's also gonna create our batch item result records. Just so you see what that looks like: again, here's a Turkee imported assignment. It has the assignment ID from Turk, it has the worker ID from Turk, it has the task ID that it's associated with, and it also has a result ID which corresponds to the model that we specified, which is a batch item result in this case. And in that, I'm just storing all the results in a JSON hash. So if we revisit our original schema, which I wanted to start with as just batch, batch item, and batch item result, and then add these in, it actually could be this: batch and batch item, where each batch item corresponds to a Turkee task, which again goes into Turk. The Turkee task has many assignments on it, two per, or maybe more. And then each assignment is connected to a result. And then batch items would probably be better off getting their results that way. How I had it was obviously duplicative, with a few extra things, but not really that important. Just something I wanted to bring up if you're ever making up your own schema. So the other thing we need to do is actually process and confirm that this data goes through. Remember, each item has two responses. So this is a basic flow we can use for that.
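The import step described above can be sketched in plain Ruby: each imported assignment from Turk gets linked to a result record whose answers live in a JSON-style hash. The field names mirror the schema described in the talk; the IDs and data are made up.

```ruby
# What comes back from Turk for one batch item: two imported assignments.
imported = [
  { assignment_id: "A1", worker_id: "W1", task_id: 1,
    answers: { "category" => "yes" } },
  { assignment_id: "A2", worker_id: "W2", task_id: 1,
    answers: { "category" => "yes" } }
]

# Build a result record per assignment; the answers hash would be the
# JSON column on BatchItemResult in the real app.
results = imported.map do |a|
  { result: a[:answers], assignment_id: a[:assignment_id] }
end

# A batch item is ready for adjudication once all expected results are in.
expected_assignments = 2
ready = results.size == expected_assignments
```

Keeping the raw answers in a hash per assignment means the adjudicator can flatten them however it likes later.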
We have our batch item with completed results. We're gonna push it to our adjudicator model. And we're basically gonna say either the batch item is complete and it works, or we're gonna reprocess it in Turk. So this is the batch item's process method. We're gonna take a new adjudicator and pass it the batch item. If the adjudicator approves it, we're gonna update the attributes to be complete; otherwise we're gonna queue it for reprocessing. And queue-for-reprocessing will actually send it back to Turk for two more opinions. This is one of the cool things that we can do with this flow: if one person said there were french fries in there and one person said there weren't, we can actually send it back to Turk for two more opinions and see what we get back. This is a little meaty, but it's a basic adjudicator model. Basically, we're just making a flat array with the results, then I'm making a histogram, and I'm saying if more than 50% of the people agree, approve it; if not, disapprove it. That's pretty much it. And you can see that here: adjudicator dot new, if approved, give me the result and give me the confidence level. So that's pretty much it. Then we just have a rake task, and the rake task says: process the HITs from the Turkee tasks, import them into our app, then take every batch item that has an incomplete status and has all the results in, and process them. So that'll look at it, it'll take all the results, it'll flatten them up, it'll say do we approve the results or not. And that's it. We'll just throw that in a rake task, we'll keep it on a cron job every five minutes, and that's it. So then we can serve our batch items via an API, which is great. However you want to spin that up with an API, you can do that. So, it is lit, sweet. The thing that we need to do though is, I gave you guys at first a basic use case, right?
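The adjudicator logic just described can be shown as a plain-Ruby sketch, outside Rails: build a histogram of the workers' answers and approve when one answer clears the 50% agreement bar. The class name matches the talk; the exact method names are assumptions.

```ruby
# Histogram-based adjudicator: approve if one answer has majority agreement.
class Adjudicator
  def initialize(results, acceptance: 0.5)
    @results = results.flatten # results may arrive as nested arrays
    @acceptance = acceptance
  end

  def approved?
    !winner.nil?
  end

  # The winning answer, or nil if no answer clears the acceptance bar.
  def winner
    answer, count = histogram.max_by { |_, c| c }
    count.to_f / @results.size > @acceptance ? answer : nil
  end

  # Share of workers who gave the most common answer.
  def confidence
    histogram.values.max.to_f / @results.size
  end

  private

  def histogram
    @results.tally
  end
end

Adjudicator.new(%w[yes yes]).approved? # => true, both workers agree
Adjudicator.new(%w[yes no]).approved?  # => false, split: send back to Turk
```

A 1-1 split fails the strict greater-than check, which is exactly what triggers the queue-for-reprocessing path for two more opinions.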
Our schema, our data model, would probably only be good for one use case, the sandwich spotter. But what we really wanna do is make it extensible so it can handle any use case. You know, maybe this is a part of your app that only needs to do one thing. That's okay, but if you actually wanted to build a robust Turk app, it would probably need to do a lot of different shit. So how can we extend it? A few things that my Turk app did: it had multiple batch items in a Turkee task, so each task would have three sandwiches to look at instead of just one. That's helpful for some of the pricing and volume considerations. A more complex reprocessing flow: let's say I was looking for data on businesses, and I was asking for name, address, phone number, email. I have confirmed the phone number and email, but I need the name and address reconfirmed. I don't really need to send all that stuff back to Turk, so we can work on a more complex flow for sending back what we need and keeping the rest. The thing that I wanna focus on right now is multiple inputs and outputs for batch items, and having different output types. So we might wanna have different inputs for a model, right? Instead of just a post ID from Instagram, which we had right in our schema, which was silly, we can actually take a name or an email or a phone number, website, whatever. We also might wanna ask for different outputs. We can ask yes or no, did it have french fries? Or we can say, did it look delicious? Yes, no, maybe: different outputs. Or we can say, how many french fries did it have? We want a number. So I think the ability to have multiple different inputs and outputs on each batch item is key. Another thing that's key is actually having different output types. That would affect how we're gonna adjudicate, or determine the success of, the results, right?
For that first sample, yes, looking at the histogram and saying, oh, above 50% of people agreed, that would work. But what if we're asking for the number of french fries in a sandwich, and one person said nine, one person said eight, one person said seven? We should maybe, depending on the use case, be cool with taking eight as an accepted result there, right? So again, we have batches and batch items. We're gonna kill a few fields on those: no more output fields on batch, no more input fields on batch item, and no result and confidence on batch item. And then we're gonna change to this. We're gonna have a batch input and a batch output model, both of which are one-to-many off batch. A batch input has a batch ID, a key, a format, and settings. And a batch output has a batch ID, a key, a format, some display settings, and also some adjudicator settings. So we can figure out how we wanna display it, and also how we wanna confirm it on the backend when we're processing it. So this is how we can create it now. Batch dot create, and we'll create the inputs on it. This one will have a format of an Instagram post and a key of sandwich_ig, so that's what we'll be looking for in the batch input. Then on the batch item, we'll create it, and the input data, which we're just gonna store as a Postgres JSON hash, will just have sandwich_ig as the key. And then when we render that batch item, we know to look for that batch input, we know it's an Instagram post so we'll render that template, and then we'll know to look for that key in the input data. That's how we'll have that. The next thing we're gonna do is the output. So again, this was has_fries, this is what we had before, and this is how we're gonna display it.
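The extensible setup just described can be sketched with plain hashes: the batch declares its inputs and outputs as records, and each batch item just carries a JSON-style hash keyed to them. The keys and formats come from the talk; the concrete values are made up.

```ruby
# A batch declares what it takes in and what it asks for.
batch = {
  name: "sandwich_spotter",
  inputs: [
    # The format tells the view which template to render for this input.
    { key: "sandwich_ig", format: "instagram_post" }
  ],
  outputs: [
    { key: "has_fries", format: "categories",
      display_settings: { options: { "yes" => "Yes", "no" => "No" } },
      adjudicator_settings: { type: "mode", acceptance: 0.5 } }
  ]
}

# A batch item carries only a data hash; in the real app this would be
# a Postgres JSON column. The post ID here is a made-up example.
batch_item = { input_data: { "sandwich_ig" => "BhQ3xyz" } }

# Rendering: for each declared input, look up its key in the item's data.
values = batch[:inputs].map { |i| batch_item[:input_data][i[:key]] }
```

Because the schema no longer hard-codes a post ID column, the same three tables can drive a salon-data batch, a transcription batch, or the sandwich spotter.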
So it's a format of categories, which would be like some radio buttons or something to display. We'll have some display settings, yes and no, with the labels for them, and then we'll also have a different specified adjudicator for it. So we'll say, okay, this one's a mode one, so let's look at the frequency, and on that let's have an acceptance criterion of 0.5. So again, over half the people have to agree to accept it. Next, let's do a fry count. We'll say, if yes, how many fries? This is gonna be a format of a counter. So we're gonna say, okay, our app has a different sort of counter view, it's got a min and a max of 50 in the view, and then we'll accept anything with a variance of 20%. So we can definitely have a lot of different input formats. The app that I created had a bunch of these different input formats: business listing information, which was super helpful because you might have a name, address, and email, or you might only have a name and phone number, et cetera. An image type, because obviously a lot of image stuff is going on. Social posts. A Twilio embed, which I think is really cool, because we did phone calls right from Turk. And then video, so you can actually do video, like the finger-signing one before, with some JS helpers for splicing the video, et cetera. Output formats, you can do different formats: text, where obviously we can have different validators for phone, email, websites, et cetera; radio buttons; multi-select categories; numbers; dates. These were some of the adjudicator types we had: single text outputs, multiple text outputs. Multiple text outputs get kind of tricky. Let's say we were like, oh, tag this image. Combining the results of multiple workers' information and data, it's not too complex, but you have to kind of figure out what different options you want for the logic there. Number, we talked about. Tips for Turk accuracy.
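The number adjudicator with a variance tolerance can be sketched like this: take the median answer and accept it when every answer falls within the configured variance of it (20% here). The method name and the exact acceptance rule are assumptions; the talk only specifies "a variance of 20%".

```ruby
# Accept the median answer if all answers are within `variance` of it.
def adjudicate_number(answers, variance: 0.2)
  sorted = answers.sort
  median = sorted[sorted.size / 2]
  within = answers.all? { |a| (a - median).abs <= median * variance }
  within ? median : nil # nil means no agreement: queue for reprocessing
end

adjudicate_number([9, 8, 7])  # => 8, all within 20% of the median
adjudicate_number([2, 8, 30]) # => nil, spread too wide
```

This mirrors the nine/eight/seven fry-count example: a mode adjudicator would reject it outright, but a variance-based one accepts eight.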
So I think this is actually pretty practical advice if you are looking to rock some things on Turk. Some of these are basic UX instructions, right? Provide explicit instructions, have good UX, have simple and straightforward tasks. If you're making someone Google something, give them the Google URL; don't make them copy and paste it themselves. There are also two sort of tricky-ish Turk things that you can do. The first is gold data. You can actually put in data that you know is correct, test the workers to make sure that they get that information right, and then use that as a bit of a screening process, or just to check their data. The second is that you can approve specific workers based off of different criteria. You can have different worker pools, and there's a lot of different qualifications you can use. We saw that we said greater than 95% approval, but you could also make people do tasks: if it's a French translation, you can give them a basic French quiz to make sure they actually know French. Another thing is the rate at which your tasks will be completed. Again, it's a market, so when we were putting things in and it was going slow, we would usually bump the price, but we'd also look at these things: the ease of the task, the reward, and the number of tasks. The number of tasks is the interesting one here, because workers want to work on tasks that have a higher number of HITs available. If I'm sandwich hunting, I don't want to do it 10 times; I want to do it a lot of times, or as many times as I'm able, because otherwise you have to learn a new task and search Turk and all that stuff. So the more tasks you have in there, the more your rate will actually increase. And that's Andy's theorem. So yeah, I think it's actually really important to talk about the ethics of M-Turk.
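The gold-data trick above can be sketched as a small screening step: seed a batch with items whose answers you already know, then score each worker against those before trusting the rest of their work. The names and threshold here are illustrative assumptions.

```ruby
# Items with known-correct answers, seeded into the batch.
GOLD = { "item_1" => "yes", "item_5" => "no" }.freeze

# Score a worker's submissions against the gold items only.
# Returns nil if the worker hasn't hit any gold items yet.
def worker_accuracy(submissions)
  graded = submissions.select { |item, _| GOLD.key?(item) }
  return nil if graded.empty?
  correct = graded.count { |item, answer| GOLD[item] == answer }
  correct.to_f / graded.size
end

worker_accuracy({ "item_1" => "yes", "item_2" => "no", "item_5" => "no" })
# => 1.0, both gold items correct; trust this worker's other answers
```

A real pipeline might block results from any worker scoring below, say, 0.8, which complements Turk's built-in qualification filters.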
These are two articles that have been written about it. One was a University of Colorado study that looked at the use of Turk and its low wages. As I said, it's pretty prevalent in academia, with research studies and stuff like that. Another was a letter-writing campaign to Bezos asking for higher wages. It's actually surprisingly mostly a U.S. worker pool, but a lot of people do this from other countries, countries with lower wages. So I think it's a really serious question whether this is fair. Is this taking advantage of low-wage workers? Yes, it is a market, but can these people really communicate with each other? Requesters can also commit fraud and reject work dishonestly, and people won't get paid. So I think it's important to talk about. Another thing that's kind of interesting is the culture of this. There are three websites that are pretty robust and popular about it: Turker Nation, Turkopticon, and the M-Turk subreddit. And people are saying, it's not beer money for me, it's rent money. People are saying they make $100 a day. Someone's saying that you can make $6 to $12 an hour. I'm really curious about this, and I've seen very varied estimates of how much workers actually make, anywhere from $2 to $15 an hour. I think if you're doing those transcription tasks, that's a different game, but it's interesting. Another controversy I wanna bring up is Cambridge Analytica, which has been in the news. People know that people took these Facebook quizzes, but what's actually been released in the past few months is that people were being paid on Turk to take them. So Alexander Kogan of Cambridge Analytica was paying people one to two dollars to take these personality quizzes, and 240,000 people took them. And they were banned by Amazon, but not until after a year. So M-Turk really got Donald Trump elected, which sucks, because Donald Trump sucks.
So this could have been this, which would have been more interesting. So yeah, thank you guys very much for coming. I hope you picked up some technical tidbits about Turk and crowdsourcing. I hope you leave with a sense of curiosity about what's possible with Turk. I liked my experience with Turk, not really because it was a marketplace for micro tasks or anything, but because I built a tool that helped coordinate people to process data quicker and more accurately than I thought was possible. That was something that was exciting for me. And that's why I love being a Rails developer. In many ways, what we do is simple: it's practice, it's experience, it's learning new things. But the ethos of our community is the imposter experience. It's pushing our software and ourselves to accomplish what we didn't think was possible before. And I think that even extends beyond programming. So thanks for listening to me talk about M-Turk. One shameless plug: I'm in New York, and I started a landing page service. We serve high-growth companies, we make their landing pages, we build them dynamically, and we run dynamic experiments for them. I'm looking to hire Rails developers, in New York or remotely, if you're interested. With that, I thank you for your time, and yeah, any questions? Yeah, clap, sure. Yes.