Okay, so I guess I'll get started now. I'm Gregory Brown, I'm from New Haven, and I have to start with an apology, because the contrast on the screen really sucks. I had originally gone to all the trouble of using the syntax highlighting from TextMate, but now you're just gonna see standard black on white, and I'm sorry about that. So this talk is on Ruby Mendicant, which is really about Prawn and what I've been doing over the last six months. Ruby Mendicant was essentially a grassroots funding project for hacking on free and open source software. It started with a post I wrote when I was in the middle of a contracting job that was particularly boring to me at the time, and I was completely joking. What I said is that there are a lot of projects I would like to work on, but because I obviously have to work for a living, I don't have time to do them. Some projects you can do on the side, and others require a decent amount of dedicated time just to get off the ground. People eventually took it seriously, and the short story is that I got enough funding to take 22 weeks off of work, which is great. I decided to spend it on Prawn, a PDF generation library that's designed to become the PDF generation library for Ruby.

I'm gonna put a bunch of material up on this web page; the slides are already there, along with some background material. My talk description covered a lot of different things, but as far as what's coming from me, I only wanna talk about three things, and we'll have some time at the end where I can show some examples, answer some questions, or do whatever we wanna do. The first of the three topics is being a hippie, because you pretty much have to be one to take six months off of work and live on bare living expenses to do free software.
I also wanna talk about the bare internals of Prawn, because, okay, most of you will never need to do low-level PDF generation. But it's very likely that at some point in your career or your hobby hacking, you're gonna have to approach something big like PDF. The PDF specification is 1,310 pages, and the pattern I'll show may be useful if you're attacking other problems with a similar feel. And of course, I think some of you are here to see the shiny outside of Prawn: what you can do with it now and where it's heading.

But before I get into the technical side of things, I wanna give you a tiny backstory. Aside from Ruby, I've got three main hobbies: I play Go, I home-brew beer, and I study Buddhism. The only relevant one here is studying Buddhism. I'm sure some of you are thinking, shit, it's going to be a philosophical talk now. That's not the case; I just wanna give some background on where Ruby Mendicant came from in the first place. Buddhism, like many other religions, has a concept of generosity, but what I find fascinating is that it turned into an interesting system for supporting people who want to do things that are good for the community. Throughout Asia there are tons of what they call wandering mendicants, who have taken a vow of poverty to study the spiritual life. The community believes these people are beneficial and do things that help them, so it gives them food and sustenance so they can do their work. Even in the West, Buddhist teachers who lead meditation retreats and give talks typically don't charge for any of that. It's all entirely by donation.
What's interesting is that out of this you end up with something like a true meritocracy: if someone is not producing something people perceive as valuable, they are not supported and therefore can't continue doing what they're doing, but people who bring something others appreciate can easily make their way. Of course, Ruby is not a religion, and it shouldn't be. People do open source for a lot of reasons, and that's awesome. You don't need an ethical reason to do open source hacking. You can do it for fun or for your company's profit; whatever it is, I think that's great, and it's what keeps our ecosystem so diverse. But my reason is that I wanna help people. Now, I don't really know that much about religion, despite the fact that I study it, or about pretty much anything else, but I'm pretty good at Ruby. So that's why we have Ruby Mendicant.

What's really nice is that lots of people helped me. This is the list of donors; there were 70 of them in total, plus Ruby Central and the Mountain West Ruby Conference did some donation matching. These are the people who allowed me to take half a year out of my life and dedicate it to doing things for the Ruby community. It's worth mentioning that these folks had an idea of some of the projects I wanted to work on, but this was not a bounty system. I was saying: I've got a few ideas of what I wanna do, I'm open to other ideas, please give me money so that I can take the time off and then decide what to do. So a lot of these people put their trust in me without knowing for sure what they were gonna get in return, which I think is fantastic. What's amazing is that Prawn has also had a tremendous amount of non-monetary support. In the left column here, you can see all the people who have contributed patches to Prawn.
I mean, we're talking about a PDF library, which I thought would be untouchable until it got to the super high level, but all these people have been helping out over the last six months, and it's been amazing. The logos you've seen here were also contributed by the community. One person I wanna give specific thanks to is James Healy. Is anyone familiar with who he is? Okay. James Healy was formerly a developer on Ruby Reports, and he's also the person who wrote PDF::Reader. Because PDF::Reader exists, Prawn has specs. They were originally in RSpec and we moved to test/spec, but either way we can actually test the output we produce, which I think may make us the only PDF library doing that. He's also the one who goes ahead and does pretty much all of the exploratory work. If you like doing things like embedding images or using Unicode in your PDFs, he's the guy who started that, and I followed along and solidified it a bit. Without these people, Prawn would not exist, so I'd like to give them a round of applause.

So the question is: if you're gonna take six months off of work, why work on such an awful problem as PDF generation? The answer is complicated, but it mainly boils down to this: I needed convincing, and the person who convinced me was James Gray. I imagine most of you have heard of him; can we see a show of hands? Okay, yeah. James knows my situation, which is that I've been working on Ruport all this time and we've been fighting against PDF::Writer to do the things we needed to do. He wrote a really, really awesome essay about how it's okay to let software die. He wrote it about PDF::Writer, and it's ultimately what determined what I wanted to do. This is the last bit of meta stuff in the talk, and then we'll move on to code, but I'd like to read it first. This is from James Gray. He says: I think we should let PDF::Writer die.
Why single out a specific library? Just because I'm fairly familiar with some details of it. It's nothing personal, and the message behind this post is intended to apply to many projects. For example, the Ruby core team has publicly stated that they want to see the standard cgi.rb library replaced. I'm sure we all feel that way about some software. I'll stick with PDF::Writer, and you can mentally replace it with a project you're familiar with. Now, back to the point. I think we should let PDF::Writer die. I guess that sounds kind of drastic, but give me a chance to explain. There's a great quote that Matz, the creator of Ruby, showed on a slide in a talk he recently gave at Google. It said: open source software should move forward or die. That's an important truth. Why are Matz and I so ready to start handing out the destruction? The reason is not at all complicated: a project can get to the point where it's hindering more than it's helping. I believe PDF::Writer is there. I haven't lost respect for Austin and his work to build PDF::Writer; back then it was a welcome effort. Today is a different time, though, and the landscape has changed. For instance, Austin no longer keeps up PDF::Writer. PDF::Writer's new maintainers, more like patch suppliers, don't completely understand the system. (He was talking about me there.) There are several known issues that just aren't practical to fix for various reasons, given PDF::Writer's vast and complex code base. There are serious performance issues. The API is far from ideal, and it would be a substantial effort to port it to Ruby 1.9. If we put all this together, the picture becomes clear. PDF::Writer has stopped moving forward, as Matz put it. It's on life support. That's worse than being dead, because it means we're burning valuable effort to keep things in this obviously less-than-ideal state. Now, if we could just get the coroner to call the time of death for PDF::Writer, we could move on. Where would we go next? Who knows?
Anywhere is better, though, because we would again be moving forward. Some options we might explore in the immediate future are using a different format such as RTF, piping some HTML through html2ps, or Prince. The fact is, we've used all three of those options in production applications at work within the last two years. None of them are perfect. Prince is amazing, but so is the price tag. html2ps is just shy of being as useful as we'd love it to be in some areas. If you really need PDF, substituting a different format is probably just not an option. That said, all three of these support our needs better than PDF::Writer. Perhaps the only viable long-term solution is a shiny, sleek rewrite of PDF::Writer. We know we have at least a few people interested in the project, so if we could free them up from monitoring the life-support systems, we might just have the beginning of a rebirth effort. That's the way to get things moving again. The moral is simple. It is not just okay to let PDF::Writer, or whatever project, die. It could actually be a blessing. Sure, we would mourn the loss of a once-great resource, but eventually we would also choose to move on. That's for the good of us all.

And that's why I've been spending six months doing PDF stuff. So that's enough of the intro stuff; I wanna talk about the bare internals now. This is the most trivial example I could come up with: drawing a line from the top left to the bottom right of a page, with respect to the margins. In Prawn that's absolutely trivial to do, and it can't get much simpler if we read it in English: Prawn::Document.generate "lines.pdf", then stroke a line from the bounds' top left to the bounds' bottom right. Does anyone see a way that could be better? Okay, but when we look at what it actually generates, PDF is non-trivial. On the left-hand side here, you've got things that are pretty much gonna be in every PDF:
metadata, a container for the pages, and things like that. On the right side, we've got the stuff that's actually specific to this problem. When we look at it, we can sort of decipher it. Ignoring the first four lines, when you see "0 0 0 rg" and then "0 0 0" with an uppercase RG, does anyone wanna venture a guess at what that might be? Yeah, color, right on. Uppercase RG sets the stroke color and lowercase rg sets the fill color. Lowercase q and uppercase Q save and restore the graphics state. This lets you go somewhere in the document, do something, and then wipe all of that and return to the state you were in before. Inside that block we've got "36 756 m", then "576 36 l", and then "S". Who wants to guess what that first line, "36 756 m", is doing? Right, it's moving to a position. PDF works sort of like turtle graphics: you say move here, then go somewhere else, and then draw something. The next line creates the path down to the bottom of the page, and S strokes the path. So in PDF you basically build a path point to point to point and then say, okay, stroke it (or fill it in). On the right-hand side, does anyone wanna guess what the MediaBox represents? Page size, right on. Obviously we don't have to put that stuff in manually in Prawn, but you've pretty much just seen and understood what a PDF looks like. The reason I'm showing you this is to show what it's like to build on top of this stuff. Letting Prawn handle the positioning and the pages, we can do the line drawing manually. You can see that with just some formatted string substitution we've got a move, a path, and then a stroke. So this is equivalent to what you saw before.
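To make the "formatted string substitution" idea concrete, here is a minimal Ruby sketch that computes the corners of the margin box and emits the same kind of operators. It assumes US Letter paper (612 by 792 points) with 36-point margins, which matches the coordinates on the slide; the method names are hypothetical and this is not Prawn's source.

```ruby
# Hypothetical sketch (not Prawn's code): raw content-stream operators for a
# line from the top-left to the bottom-right corner of the margin box.

PAGE_WIDTH  = 612.0 # US Letter, in points (assumed)
PAGE_HEIGHT = 792.0
MARGIN      = 36.0  # assumed half-inch margin on every side

# Corners of the area inside the margins, as [x, y] pairs; PDF puts the
# origin at the bottom-left of the page.
def margin_box_corners(width, height, margin)
  {
    top_left:     [margin, height - margin],
    bottom_right: [width - margin, margin]
  }
end

# "x y m" moves to a point, "x y l" appends a straight segment, "S" strokes
# the path, and q/Q save and restore the graphics state around it.
def line_ops(from, to)
  [
    "q",
    format("%.3f %.3f m", *from),
    format("%.3f %.3f l", *to),
    "S",
    "Q"
  ].join("\n")
end

corners = margin_box_corners(PAGE_WIDTH, PAGE_HEIGHT, MARGIN)
puts line_ops(corners[:top_left], corners[:bottom_right])
```

Keeping the corner calculation separate from the operator formatting mirrors the split in the talk: Prawn works out the bounds, and the drawing code only has to say move, line, stroke.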
Now, the interesting thing here is that you don't need to dig into the internals to do it. If you know the PDF operations you need to emit, say to add some functionality, it's pretty easy to do in Prawn; you could even do it meaningfully in a subclass. Now, this is a little more hardcore. Here Prawn is only doing the stuff that was originally on the left-hand side, the stuff that's part of every PDF, and we're manually creating a PDF page object. If you look at the parameters we're passing, they look very similar to what you see in the raw output. But of course, don't ever, ever do this. The real point is that PDF is hard, and these low-level APIs make it easier for contributors to come in and build stuff with us. Going farther down the rabbit hole, you can see that we've got support all the way down. This is building low-level objects from scratch: we give them an ID and some Ruby data, and you can see that they produce PDF objects. The cool thing, and this is what I want you to keep in mind when you're attacking something else, is that I didn't create some sort of domain-specific layer for this. When we do graphics drawing or something like that, we're not writing a bunch of code that requires reading everything about PDF to understand how it works. What we're literally doing is going from Ruby to PDF through object serialization. When you look at this, you've got just a hash of arrays of strings, and when you look at the PDF object, that's exactly what you have. Really strongly consider this when you're working with a low-level format: if you take the lowest thing you have and wrap it in something that lets you write Ruby, you will write Ruby for the entire rest of your project, which of course is a good thing.
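The "hash of arrays of strings becomes a PDF object" idea can be sketched in a few lines of Ruby. This is an illustration only: `pdf_object`, `indirect_object`, and `Reference` are hypothetical names rather than Prawn's API, and the string escaping covers only backslashes and parentheses.

```ruby
# Minimal sketch of serializing plain Ruby objects into PDF object syntax.
# Names are hypothetical; escaping handles only backslashes and parentheses.

# Stands in for an indirect reference such as "2 0 R".
Reference = Struct.new(:id)

def pdf_object(obj)
  case obj
  when nil         then "null"
  when true, false then obj.to_s
  when Reference   then "#{obj.id} 0 R"
  when Integer     then obj.to_s
  when Float       then format("%.3f", obj)
  when Symbol      then "/#{obj}" # PDF name
  when String                     # PDF literal string
    escaped = obj.gsub(/[\\()]/) { |c| "\\#{c}" }
    "(#{escaped})"
  when Array
    "[" + obj.map { |o| pdf_object(o) }.join(" ") + "]"
  when Hash                       # PDF dictionary; keys become names
    pairs = obj.map { |k, v| "#{pdf_object(k.to_sym)} #{pdf_object(v)}" }
    "<< " + pairs.join(" ") + " >>"
  else
    raise ArgumentError, "cannot serialize #{obj.class}"
  end
end

# Wrap a value as a numbered indirect object.
def indirect_object(id, value)
  "#{id} 0 obj\n#{pdf_object(value)}\nendobj"
end

page = { Type: :Page, Parent: Reference.new(2), MediaBox: [0, 0, 612, 792] }
puts indirect_object(3, page)
```

Because every Ruby class maps directly onto one PDF type, code above this layer can describe a page object as an ordinary Hash and never touch PDF syntax.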
Our entire PDF object generator is this. I'm not expecting you to memorize it, but it's as simple as taking Ruby classes and mapping them to their equivalents in another language. So what do I want you to get out of this? I'm not trying to convert everyone here into people who work on PDFs, of course not. But I do want you to know that extensibility is not just important at the high level. If you approach some of the other PDF libraries out there, they assume you've already read the spec. We wanted to avoid that, and that's probably why we have so many contributors on Prawn. Another thing is that these developer APIs simplify interactions with the low-level systems. They make it possible to have code that isn't at the top level, but lower down in the system, that people can still meaningfully understand. You can wrap these things pretty easily and get meaningful methods for things like adding content to a page. This is general good practice when we're coding a Rails application or writing a small Ruby program, but it doesn't come as naturally when we're working with something so low level; we generally just end up in a mess. And as I mentioned before, wrapping your domain lets you focus on writing Ruby instead of whatever is underneath. After the first few weeks of this, I was able to forget about the low-level PDF constructs: as long as I could sight-recognize what kind of objects they were, I didn't need to know their particular syntax, which really is a big win.

So now I'm going to move on to some of the nice, fancy stuff. How many people were here to see how to use Prawn rather than the underlying stuff? Okay, a couple of you.
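Before moving on, the point about wrapping raw operators in meaningful methods can be sketched as a thin developer API. `PageContent` and all of its methods are hypothetical names chosen for illustration, not Prawn's classes; the only raw layer is `add_content`, and everything else is built on it.

```ruby
# Hypothetical sketch of a thin developer API over raw content-stream
# operators. PageContent and its methods are illustrative names only.
class PageContent
  def initialize
    @ops = []
  end

  # Lowest level: append a raw operator string to the page's content.
  def add_content(str)
    @ops << str
    self
  end

  # Meaningful names layered on top of add_content.
  def move_to(x, y)
    add_content(format("%.3f %.3f m", x, y))
  end

  def line_to(x, y)
    add_content(format("%.3f %.3f l", x, y))
  end

  def stroke
    add_content("S")
  end

  def draw_line(from, to)
    move_to(*from)
    line_to(*to)
    stroke
  end

  # Bracket the block with q/Q so graphics-state changes stay local.
  def with_saved_state
    add_content("q")
    yield self
    add_content("Q")
  end

  def to_s
    @ops.join("\n")
  end
end

content = PageContent.new
content.with_saved_state { |c| c.draw_line([36, 756], [576, 36]) }
puts content
```

A contributor who knows one new operator only needs another small method calling `add_content`; nothing above that method has to know the operator's syntax.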
So a problem I had when putting this talk together is that I didn't think anyone used Prawn in production yet, but I was wrong: apparently GitHub is using it for their invoices. Since real code is more fun, and they neglected to leave the source code on their page, we'll reverse-engineer it right now and hopefully not get sued. This is what their invoice looks like: fairly simple, fairly standard. How many people have needs in this realm? Okay, awesome. So we're going to do it right now. This is also valuable because it approaches the problem from a different angle than usual: we're not looking at code that already exists, we're looking at a document we want to reproduce. I eyeballed this, but you could obviously do some direct calculations. Let's do it step by step.

This is generating some text in Prawn. You don't need to position it; it will flow automatically if you want. We're saying we want the text "Invoice/Receipt" at size 24, in bold, using the default font, which is Helvetica. We render that and realize it's a little too high up on the page, so we set a larger top margin; not perfect, but close enough. Next we want to move down the page and add that bit of text with the billing name and email address, and we do that. Now we get a little fancier: we want to generate a table. In Prawn, tables are dirt simple. They're basically arrays of arrays of strings or cell objects, and you can mix and match the two. Setting headers is just a single array of headers; you can set up fancy header alignment and things like that, but we'll start simple for now. Row colors are cycled over, and by default, if you don't specify a header color, the header uses the first row color and the cycling continues from there.
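Here is a rough sketch of the table model just described: an array of arrays of strings with background colors cycled over the rows. `stripe_rows` is a hypothetical helper, not Prawn's table code, and the exact rule (the header falls back to the first row color, and body rows cycle starting from the first color) is an assumption made for illustration. The cell values are placeholders.

```ruby
# Hypothetical sketch: a table as an array of arrays of strings, with hex
# background colors cycled over the rows. Assumed rule: without an explicit
# header color, the header reuses the first row color.
def stripe_rows(header, rows, row_colors:, header_color: nil)
  body = rows.each_with_index.map do |cells, i|
    { cells: cells, background: row_colors[i % row_colors.size] }
  end
  [{ cells: header, background: header_color || row_colors.first }] + body
end

table = stripe_rows(
  ["Item", "Amount"],
  [["Item 1", "10.00"], ["Item 2", "20.00"], ["Item 3", "5.00"]],
  row_colors: ["FFFFFF", "DDDDDD"]
)
table.each { |row| puts "#{row[:background]}  #{row[:cells].join(' | ')}" }
```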
We're using HTML color codes for this, which I assume everyone here is familiar with, and we can set alignment per column. But the one thing you don't see here is a lot of complex objects for doing these things. How many people have worked with PDF::Writer before? Okay, so does this look better than PDF::Writer's tables? Okay. So we do that, and that's our simple table. Then we notice that they're not actually fitting things into the smallest area possible; they're using fixed widths for their columns. So we go back and fix that, and I also fixed the typo in the email here. There's a weird issue in Prawn right now where, if you specify column widths, you also have to set the header alignment, which I'll look into. I found out about it last night, so it's not fixed yet, and I wanted to give you code that actually runs. We also want to align that text field to the right, so it looks like that, and now we're looking pretty good.

The next step is a horizontal rule, which is trivial. We move down the page a little more, and Prawn has a function that requires no calculations: you just say stroke_horizontal_rule, and it goes from the left boundary to the right boundary at the current y position, no matter what your bounds are. When we do that, we get this. Looking good so far? Next we want to put these logos on there. Sorry, first we want to put the addresses on here. That's just some more text; nothing special.
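The layout arithmetic behind fixed column widths, right alignment, and a bounds-relative horizontal rule is simple enough to sketch. These helpers are hypothetical, not Prawn's API; in particular, real text widths come from font metrics, so here they are passed in as plain numbers, and the 5-point padding is an assumed default.

```ruby
# Hypothetical helpers illustrating the layout arithmetic; not Prawn's API.

# Left edge of each column when widths are fixed: a running sum of widths.
def column_lefts(origin_x, widths)
  lefts = []
  x = origin_x
  widths.each do |w|
    lefts << x
    x += w
  end
  lefts
end

# x coordinate where a right-aligned string of the given width starts.
# text_width would normally come from font metrics; padding is assumed.
def right_aligned_x(column_left, column_width, text_width, padding: 5)
  column_left + column_width - padding - text_width
end

# A horizontal rule spans the current bounds at the current y position,
# so callers never compute the endpoints themselves.
def horizontal_rule_ops(bounds_left, bounds_right, y)
  format("%.3f %.3f m\n%.3f %.3f l\nS", bounds_left, y, bounds_right, y)
end

lefts = column_lefts(36, [300, 120, 120])
p lefts
p right_aligned_x(lefts[2], 120, 48)
puts horizontal_rule_ops(36, 576, 500)
```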
At this point you might notice that we're giving some positions and telling it where to move, but we haven't done much absolute positioning. That's one of the nice things about working with Prawn: you'll tend to do fewer coordinate calculations, which, as anyone who has done graphics work knows, can be very painful. So we have the text looking like that, and now we want to do something a little fancier: put the images on. I decided I didn't want to use local copies; I wanted to pull them straight from the web page. That may be too low on the screen for you; can you see the bottom line there? Prawn's image support lets you pass either a filename or any object that responds to read, and we're trying to follow that approach throughout. So if you use open-uri, open the URL, and pass the result in, it works. Then we add the second image and do some scaling so everything ends up the right size. Sorry if you want to see that code a little longer; basically I had some scaling code, and for the second image you can use relative or absolute paths, either way. So you end up with this. Looking at it on its own, is this convincing enough? Pretty similar. Okay, so don't go phishing with it, please. Now I'd like to relax a little and use whatever time we have left to go over examples you're curious about or to take questions.
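The "filename or anything that responds to read" convention is plain duck typing. Here is a small sketch of how a helper might dispatch on it; `read_image_data` is a hypothetical name and this is not Prawn's internal code.

```ruby
require "stringio"

# Hypothetical helper illustrating the "filename or anything that responds
# to #read" convention; the name is made up and this is not Prawn's code.
def read_image_data(source)
  if source.respond_to?(:read)
    source.read               # File, StringIO, open-uri result, ...
  else
    File.binread(source.to_s) # otherwise treat it as a path on disk
  end
end

fake_png = StringIO.new("\x89PNG\r\n".b)
p read_image_data(fake_png).bytesize
```

With open-uri loaded, `URI.open(url)` returns an IO-like object, so its result could be passed in exactly the same way (not run here, since it needs network access).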
Okay, so now we'll go into a question-and-answer, open-discussion sort of thing. I can answer whatever questions you have, and if you want to look at some code, or there's something easy you want me to demonstrate, I can do it. That's the URL; that's my blog. We've got a lot of time to fill, so hopefully you'll have questions.

Coming from the same kind of altruistic place you mentioned, did you have different folks looking at different things? Maybe some people didn't have a specific requirement or a bounty out for you, but had an interest?

I would say most of the contributors are just fixing things they need for their job, and that's why I think it's awesome that open source doesn't require some deep spirit of altruism: some people have it and some people don't. As for the monetary donors, it was impossible for them to put a bounty on anything, because I didn't specify beforehand which project I'd be working on. In fact, I didn't even have all of my ideas out until about halfway through the donation process. So those people were altruistic, or at least trusting of me to produce something decent. Other questions?
Okay, so it was hard. It was difficult. I've done other things like this: I've done Google Summer of Code, I've done Codefest grants, and James Gray and I worked on a sort of game framework for Ruby, funded by Ruby Central, that never really came to anything. Those are very different, because basically someone is saying you're getting X dollars to do this, and it's part of a sponsored organization. It's totally different to openly ask people to just give you money. The thing is, I really undershot how much money I would need; for instance, I didn't account for travel, so when I wanted to travel I had to do consulting to make up the shortfall. I would love to do it again, but I would rethink the way I did it. I made a pledge of hours, which apparently no one cares about, maybe; that's just a random statistic. But around three-quarters of the way through I realized you can't track an open source project the way you bill for a contract. It's hard. There are a lot of things I'd have to think about before doing it again, but I think it was a success, because we've got lots and lots of people using this out of the box, and it's allowing people to migrate from other software. So I think that's a win. I'd be happy to talk with or help other people doing this sort of thing.

Okay, Dave. One of the features I'm using right now in PDF::Writer lets you capture a PDF into a context and then use it multiple times. In our particular case, there's a loophole in customs documentation: you need two invoices to accompany a shipment, but they don't say they have to be on two separate pieces of paper. So I generate one invoice, rotate it, scale it, and put two side by side on one page. Will Prawn let me do that?

Probably not yet. So you're talking about being able to serialize it, basically. Now I see: you create a context for a PDF drawing, draw into it,
and then just reuse it. The way it worked in PDF::Writer is that you were able to marshal it out to disk and pull it back in later; is that what you're talking about? The concern is that it keeps a bunch of procs all over the place, but we can talk about that. I think this calls for some sort of feature comparison, so if you want to catch me after this, we can talk about it a bit. Other questions?

No, there's nothing in Ruby that does it, so I don't even have something I can look at. You can do that with Perl, so if you're not averse to running a shell script and doing some merging between documents, you can do it that way, but Prawn doesn't have that feature. It turns out to be hard, because you need to be able to fully process the PDF to do it meaningfully. Let me try to pull up my slides again. All right. So, I mean, maybe I'm coming from a Perl background, but I feel like the Perl solution might be lighter. Yeah, that's the situation right now, because it's external. Let me back up one more slide. The problem with doing that in PDF is that you've got this cross-reference table. I glossed over it before, but it tells you exactly where every object is in a PDF, so in order to add things into different pages and so on, we'd have to update the cross-reference table. To do that meaningfully, we would either have to find a sensible way to stub out all of the existing objects, ignore them, and inject new ones, picking up just their lengths, or we would have to fully process the file. PDF::Reader has come a very long way; it's very cool, and if people are interested in how to test their PDFs, I could show that stuff, but we're not there yet. If people are interested in working on it, I would support it for sure, because it's one of the most commonly requested features, but we've stayed far away from it because it's hard, and hard in a different way than
some of this other stuff. Other questions? Okay, so if people don't know what to ask... what time is it, two o'clock? Okay, so I can show a couple of other examples of the more advanced stuff in Prawn, because the invoice didn't show any multi-page documents or anything that required you to position things in a particular way. So let's see. Here we're playing around with things called bounding boxes. How many people came into this somewhat familiar with Prawn? Okay, not many. Prawn fixes a lot of the issues with positioning things on documents by letting you box off areas and remap the origin at different places in your document, so you can do a whole lot of relative positioning without absolute-positioning calculations. In this example, we place a bounding box whose top-left corner is 100 points from the left and 600 up from the bottom, with a width of 200. We flow some text inside it and then draw some lines: we make an X by going from the box's top left to its bottom right and from its bottom left to its top right. Then we make another bounding box with the circle and the X through it, and you can see that you can nest them. When you nest bounding boxes, they're relative to the one they're nested in, so there's no absolute positioning going on in this code. Just to show that example again, we get something like that. Does that seem interesting to people? Do you have any questions about how the bounding box stuff works?
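The nesting behaviour can be modelled in a few lines of Ruby. This is a simplified, hypothetical model rather than Prawn's internals: the `Box` class is made up, and unlike Prawn it requires every box to have an explicit height. A child's top-left corner is given relative to its parent's bottom-left corner, and local points are translated to page coordinates by adding the parent's offsets. The 100/600/200 numbers follow the example above; the heights and the inner box are invented for illustration, and the page margin box assumes US Letter with 36-point margins.

```ruby
# Simplified, hypothetical model of nested bounding boxes (not Prawn's code).
class Box
  attr_reader :abs_left, :abs_top, :width, :height

  def initialize(abs_left, abs_top, width, height)
    @abs_left = abs_left
    @abs_top = abs_top
    @width = width
    @height = height
  end

  def abs_bottom
    abs_top - height
  end

  # Place a child box; (x, y) is its top-left corner measured from this
  # box's bottom-left corner, so nesting never needs page coordinates.
  def child(x, y, width:, height:)
    Box.new(abs_left + x, abs_bottom + y, width, height)
  end

  # Convert a point in this box's local coordinates to page coordinates.
  def to_absolute(x, y)
    [abs_left + x, abs_bottom + y]
  end

  def top_left
    [0, height]
  end

  def bottom_right
    [width, 0]
  end
end

# Margin box for Letter paper with 36pt margins (assumed values).
margin_box = Box.new(36, 756, 540, 720)
outer = margin_box.child(100, 600, width: 200, height: 150)
inner = outer.child(50, 100, width: 80, height: 80)

p outer.to_absolute(*outer.top_left)
p inner.to_absolute(*inner.top_left)
```

Moving `outer` moves `inner` with it, which is the property that lets the drawing code inside each box stay free of absolute coordinates.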
Okay, so other things we have: built-in UTF-8 support; it just works. Another thing I should mention is that all of this works on Ruby 1.9 and Ruby 1.8, because we wrote Prawn on Ruby 1.9, so you can move forward with it and get those speed enhancements and so on. And because Prawn will eventually become part of Ruby Reports, and FasterCSV is now 1.9-compatible, Ruport will quickly move up to 1.9 too. So if you're using this sort of stuff, you can bring it along when you want to migrate to 1.9. You don't need to do anything special: as long as you pass UTF-8 strings, they just (I should have opened this at the bottom) they just work. Wrapping works; all of this stuff works. The key is that it only works with TrueType fonts, so you need a font that supports whatever language you're working in, but otherwise you shouldn't need to do anything special. So if you've got multilingualization needs, it's pretty nice. On Ruby 1.8, Prawn expects your strings to be in UTF-8. On Ruby 1.9, you can pass in strings in any encoding that can be converted to Unicode, and it'll transcode them for you, which is not a fully robust model, but it works pretty well for most cases.

Other things. There's this concept of spans. A span is sort of like a simplified bounding box that lets you flow text in a column: you give it a width and a position, which can be relative, centered, or whatever you want. So this is text flowing in the column, and here's some text in a bounding box. When you use spans inside a bounding box, you have to do something a little tricky, because they're not really meant to be used that way. The difference between a span and a bounding box in Prawn is that if you flow to the bottom of a bounding box, it will bring you to the
next page, but it's going to put the text at the top-left corner of that bounding box, wherever you put it. That often becomes a problem for people, because it's not what they want. So when you work with spans inside a bounding box, you have to offset them from the margin, because that's how they always work, whether or not they're inside one; they don't nest the way bounding boxes do. But you can see that we're very easily flowing text in a centered column and so on. Let's see. Images don't need to be absolutely positioned either; this is another cool thing. Here we're saying: from the current y position, go to the center of the page, then right, then left again, and if you don't specify anything at all, the image is placed flush with the left edge at your current y position. What happened there? Something is broken with that. Okay, that was broken this weekend during the hackfest; I'll fix it. So, after seeing some of that, are any other questions coming up? Yes, I'll show that. So this is just a bunch of content; you can see that you can use UTF-8 strings inside all of this other stuff. This table has a lot more options than the other one, so you can set the padding and things like that for cells. So that's a table, and across pages it'll just repeat the headers and pick up where you left off on the next page. You can also flow tables within a bounding box, so you can subsection an area of your document and say, I want it to be this size. When you make a bounding box, you're essentially pretending that you're temporarily moving the document's margins inside that block and then doing everything relative to that, and it turns out to be really, really handy. That's one of the main things I like about Prawn. There are also some interesting things about the way we do fonts. Prawn has all of this stuff where you've got
these blocks that do something just for that block, so you can apply some parameters here. These are all sorts of ways of dealing with fonts: you can set everything manually; you can say, I'm just going to set the font size for this area, and then override it inside; this one just sets new defaults; or you can specify everything at once, the styles and so on, then use all the stuff nested inside, and it all just works. So that's that. Since we only have about two minutes left, that's all I have; I'll put the page with the URL up again. There you go. Any last questions before we get out of here? I'm leaving now, so unfortunately I won't be around the conference anymore, but please get in touch. Prawn is a really, really approachable project. It's got a lot of great people making casual contributions, and some people getting a bit more active than that. We're always active on IRC and the mailing list, and even if you're not looking to contribute directly, I'm more than happy to help look over projects you're working on and try to improve things. I've still got quite a few hours left to finish up on this for the Ruby Mendicant project, which means I've got some dedicated time to help work on your problems, and that's pretty cool. So that's my talk. Thank you.