So our next presenter is presenting three talks today. This is her first. She's a programmer who works for the Bureau of Meteorology on the next generation weather forecasting system. Please welcome Brianna Lau.

Thanks for coming to my talk, or at least remaining motionless in your seat while talks continue around you. I work at the research department of the Bureau of Meteorology, which is called CAWCR. It's a climate and weather something research. And as Chris mentioned, I work on a project which is called the Next Generation Forecast and Warning System, I think. A component of that is the automatic generation of text forecasts. So when you open the newspaper and it says "cloudy, chance of rain, clearing by the evening", that text is generally generated by the program that I work on, which is called the Graphical Forecast Editor.

This talk might be a little bit strange for this stream, and I'm not going to go into a great deal of detail about the programming language we use, which is Python. This talk is more focused on how you can use really simple features, which are available in any of the languages that will be mentioned in this stream, to explicitly program language. And by language I mean English language rather than some kind of meta computer language.

If you're working on a program, it's more than likely that you'll have to output some information, which may well be in text or in language. You have to display some information to your user. If you can just display canned information then strings are awesome and you can have as many strings as you like. That's not even getting into translation of strings or internationalization. But if you need a little context sensitivity, or you want something that's not quite so canned, then you might need to do something that looks a little bit like text generation.
So natural language generation, or text generation, generally refers to when you take some input which is not language to start with and produce some language output or some textual output. You might be summarizing some statistics. You might be summarizing an appointment or an invitation or something like that. But if you work with strings and you go to strings too early, and then you find that you need to modify your strings, it can be a real pain. So this is about what you can do in the middle layer, in terms of encoding some representation of language that is going to give you a bit more flexibility before you finally output your strings.

Okay. It says the AV system will turn off in 15 minutes, but I'm going to ignore that.

So the program that I work on is called the Graphical Forecast Editor. It's used by forecasters currently in Victoria and New South Wales, and it's being rolled out to Tasmania and South Australia this year and the other states next year. This is what they use when they're sitting in their office forecasting. There's a bunch of grids on the side here, and they have names like Wind, T which is temperature, MaxT, MinT, Sky, and then the one that's highlighted down here is PoP, which is probability of precipitation. And then this is the gridded output, and I think this has got some wind barbs overlaid on top of it.
So the gray areas are saying PoP is probably below, I think that's maybe 10 or 5%, so we're probably not going to get any precipitation there, going up to the dark blue-purple color where that might be 40%, so there's a reasonable chance of precipitation in that area. These grids come in from numerical weather prediction models, and the forecasters play with these grids and tweak them and modify them until they're happy with them. Then they press one of these buttons up the top and the text is automatically generated from these grids. So they're not handwriting the forecasts, although they can edit them if they don't like the text.

This is a little bit of a graph of what the forecast process is. The numerical weather prediction models and observations come in, and you have these grids which we just saw before. Then smart tools are used, and these are supposed to represent the meteorological or scientific processes that produce this weather grid. What I'm concentrating on is just the text generation for the weather phrase. From this we have statistics, and then the text formatters, which is the text generation, basically want to simplify the statistics and then produce some words, and that becomes the text forecast that we get in the paper or on the website.

We saw with this grid that we talk about gridded forecasts. There are some forecasts which are for areas and some which are for points, but even though it's a point it's still three or six kilometers across. So there are some interesting problems in calculating coverage when you have this point which is kind of a point but not really, and then you want to extrapolate to a larger area as well.

When we talk about weather in this context, it's actually a subset of what you might think of as weather, which is everything that's going to be forecast. So the cloud, if it's sunny or cloudy, that's not weather. Windy, that's not weather. If it's going to be hot, that's not weather. But all
these things, so if it's going to rain, anything related to rain, thunderstorms, fog, frost, these other things, that's what we consider weather. In this weather grid, which is created from grids like the PoP grid, probability of precipitation, I think there are about 15 different types of weather, and we have these keys. They've each got these four components, which are coverage, type, intensity and then these optional attributes. Attributes are optional and there can be several of them, things like hail and flash flooding. Down on the bottom one we have thunderstorms, where often there are no attributes, but you do have to have the coverage, intensity and type.

So these are some examples of keys and how they would be represented in text. We can see that there's a pretty decent correspondence. If we wanted to convert the key to words, we'd basically say the coverage words come first, then the intensity words, then the type words, and last of all the attribute words. As a first pass that seems like a fairly decent model for how we're going to convert these keys into words.

Just with coverage, there are a lot of terms. There are basically four different levels of coverage. I think wide is like 55 plus, so more than 55% chance of rain, isolated or patches is like 0 to 30, and scattered or areas is like 30 to 55. I think that's about right. Depending on the type that we're combining it with, and depending on the type of forecast, so whether it's for a point or an area, these are some of the terms on the right that we use to correspond to these in our weather keys. A reason that we don't just say "scattered" all the time is that if you're forecasting for what's supposed to be a point, it doesn't really make sense to say scattered. There are either showers or there aren't. So we say "at times", and that's hedging our bets.

So we have all these keys. Maybe we could just list every possible combination
and just have a massive table and list them all, but there are too many, so maybe that's not the best idea. So we need to do a little bit of generation.

So, what the old infrastructure did. This Graphical Forecast Editor is something that Australia inherited or borrowed from the US, and their forecast texts are much simpler when it comes to forecasting weather. They would pretty much just say "showers" or "rain". They wouldn't have this "at times" or "easing to light drizzle", or a lot of the elements that we use. So the old system, which we inherited, had a lot of these lookup tables. This would look a lot nicer in Prolog, but not many people use Prolog. What this is saying is: if your key has this coverage term and matches this type and this intensity, use this string on the end. Then there'd also be this part order list, and this would say: if your key has this coverage, this type, this intensity, put everything in this order.

If we look at this bottom one, this is the default one. We can see it's got prefix, okay, let's ignore that, then coverage, intensity, type, attributes. That's basically what we said before when we looked at the four examples. It's also got this TD, which is the time descriptor, so "in the morning", and as SN, which is "as snow", and we won't worry about that either.

If we look at a particular example, suppose we had isolated showers for a point forecast. The name of the function is point weather coverage, so we're looking up for a point, and what we say is "a shower or two". So the coverage component of that is going to be the "or two", and the prefix is actually going to be the "a". They go together, "a shower or two" should be one thing, but they're actually separated in this infrastructure. And then we see here that we have intensity prefix, intensity, type, coverage, and so the fact that the coverage comes after the
type in this list is how we get the "shower or two" happening in the right order, as opposed to saying "or two showers", which would not be grammatical.

So that's our basic thing. Our basic thing was coverage, intensity, type, attributes, but there are lots of exceptions. In the first case, if we have thunderstorms with hail and flash flooding, we don't want to say "with hail with flash flooding". We want to be able to combine the hail and flash flooding so we don't repeat the preposition "with".

This "isolated thunderstorms, possibly severe" is such a pain. "Possibly severe" is an adjective, but because it's kind of wordy we want to push it out to the end and add this comma. It's a total nightmare.

We've got "showers at times", and "at times" is also used as a time descriptor when something happens intermittently, so it actually has this double meaning, which apparently doesn't seem to matter. We've got "a shower or two". We've got "dry thunderstorms", and dry is actually an attribute, but you wouldn't say "thunderstorms with dry", so we need to recognize that dry is an adjective and it's going to go in a different spot compared to the other attributes, which are preposition phrases. And we've got "very hazy". This isn't even a noun phrase. Unlike everything else, this is an adjective phrase. I don't know why they decided to do that.

We've got "patchy morning fog", where instead of the time phrase being a preposition phrase like "in the morning", we actually use it as an adjective, and so it's got to slot in with the other adjectives so you have the right order there. The adjective ordering thing is a little bit subtle. There are definite preferred orders and then a whole lot of vague ones as well. So "morning patchy fog" versus "patchy morning fog": which sounds better? I think "patchy morning fog" is slightly better, but it's very hard to say why. This is part of this adjective ordering thing which speakers of English learn, but it's very
hard to explicitly say what the rules are. And then we have "showers falling as snow", which I'm not really going to talk about because that's a big pain as well.

There's also this thing called co-reportability. It's not really a language restriction. Well, maybe we could consider it a language restriction, but we could just consider it a constraint that comes from the meteorological basis of what we're doing. If you had thunderstorms and showers, the forecasters really want them to be in the same sentence. They want you to say "showers and thunderstorms", because in their heads they're related meteorological things, and so it gives them great satisfaction to see them in the same sentence. Whereas if you had "Showers. Thunderstorms." it would just rub them up the wrong way and they would change it. So we have to put showers and thunderstorms together. But if you had showers and fog at the same time, they don't consider those meteorologically related, so if you wrote "showers and fog" they would change it to "Showers. Fog." So pretty much precipitation and thunderstorms need to go together, frost and fog need to go together, and everything else pretty much just has its own sentence.

This is not a syntactic restriction. We're not breaking any rules of grammar if we say "showers and fog". That's totally valid, and lay people who don't know meteorology wouldn't consider there's anything wrong with it either. But the forecasters, the meteorologists, have this extra level of knowledge, and for them it's almost ungrammatical. So maybe we consider it a semantic restriction, or it's just some other type of restriction that we have on our system.

So that's looking at single blocks of weather which aren't changing. But then we also have this idea of trends, where we have some weather and it's changing and becoming some other weather. If we just have "heavy rain easing to light drizzle", that's quite
simple. You just list everything, you list your connector, then you list everything again. But if you want to be a little bit smarter and have some kind of context-sensitive rendering of your sentence, you want to avoid repeating things that you've already said. In this middle example we have chance of showers becoming scattered showers. Scattered is a greater coverage term than chance, so we could say "becoming scattered", but "becoming more likely" is a comparative term and that's nicer, closer to what a human is going to do.

If we look at how the old system modeled a noun phrase, so just going back to one block of weather, it was something like this. It was just super flat. This was your default, and then you had lots of exceptions to override the defaults. Equally, if we look at what you might do if you were analyzing it in an English class, you'd have this really complex thing. You've got a noun phrase at that level and a noun phrase up at that level and recursive adjective phrases over here, for "scattered, very dangerous thunderstorms with hail and flash flooding". That's not even getting into flash flooding. But that's a little bit complex for what we're doing, because we're not modeling all of English, and we don't want to model all of English because it's overkill and it's going to be too complex. We want to pick something in the middle that has enough expressiveness for what we're actually doing but is not biting off more than we can chew. So we're going to make something a little bit flatter, and at certain levels we're going to flatten things out.

Natural language generation often has a model similar to this, where your input goal is your statistics, and your text planning is deciding, okay, what am I actually going to say, because normally you have to leave things out or simplify them or use some comparative forms, and
then the linguistic realization is actually coming out to your string. In rewriting this infrastructure, the most striking thing to me was that the text planning you can do, or the simplification you can do, is entirely dependent on what you are able to realize, the kinds of things you are able to say. That just drives everything. So this presentation of them as separate stages is quite misleading, because they're very intertwined.

So we want to use the rules of English a little bit but not too much, and we want to avoid recursion because it's just scary. It's not scary, but it's hard. So we want to keep it simple and flat.

The first thing that we notice about forecast English, let's say it's a subset of the entire expressiveness of English, is that it's really focused on noun phrases, and things that are grammatical in forecast English are not grammatical in regular English. In regular English you can't just say "rain". You can say "it is raining", "it will rain", "it rained". But in forecast English that's a totally fine thing to say. And in English, when you say "it is raining", you go, what is "it"? It's actually a dummy subject, and we only put it there because English requires it. Forecast English doesn't.

So this is the kind of data structure that we ended up using. It's quite simple. We're not going to use recursion. We have this thing called a conjoined noun phrase, which is not a part of speech in English, but we're going to use it. It's quite simple: there's a list of stuff, and then we've got a dictionary mapping from part of speech to some words, and we use the part of speech to control in what order things appear.

So this is the part of speech ordering for our noun phrase, and you can see that we've got this time prep phrase, snow height prep phrase, and prep phrase, which represents the attributes. Now, they're all preposition phrases, but we need a way to control where the different types of prep phrases appear in the sentence.
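
The flat structure described here can be sketched roughly as follows. This is only an illustrative Python sketch, not the Graphical Forecast Editor's code: the names `POS_ORDER`, `render_noun_phrase` and `render_conjoined`, and the part-of-speech labels themselves, are assumptions. It shows a noun phrase as a dictionary from part of speech to words, with one ordering list deciding where each part appears.

```python
# Hypothetical part-of-speech ordering for one flat noun phrase.
# The three kinds of prep phrase get their own labels so that their
# relative order is fixed by this list.
POS_ORDER = [
    "coverage",                 # e.g. "scattered"
    "intensity",                # e.g. "heavy"
    "type",                     # head noun, e.g. "showers"
    "time_prep_phrase",         # e.g. "in the morning"
    "snow_height_prep_phrase",  # e.g. "above 1500 meters"
    "prep_phrase",              # attributes, e.g. "with hail"
]


def render_noun_phrase(phrase):
    """Join the words of one noun phrase in POS_ORDER order."""
    words = []
    for pos in POS_ORDER:
        words.extend(phrase.get(pos, []))
    return " ".join(words)


def render_conjoined(phrases, conjunction="and"):
    """Render a flat list of noun phrases as one conjoined phrase."""
    rendered = [render_noun_phrase(p) for p in phrases]
    if len(rendered) <= 1:
        return "".join(rendered)
    return ", ".join(rendered[:-1]) + " " + conjunction + " " + rendered[-1]


showers = {
    "coverage": ["scattered"],
    "type": ["showers"],
    "snow_height_prep_phrase": ["above 1500 meters"],
    "time_prep_phrase": ["in the morning"],
}
print(render_noun_phrase(showers))
# scattered showers in the morning above 1500 meters
```

Because the order lives in one list, changing where a kind of phrase appears means editing that list rather than the code that chooses the words.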
And so we use part of speech to control this, because normally prep phrases have multiple places they can appear, but that makes it more complicated. Whereas if we create these fake parts of speech, I mean they're not fake, they exist for our purposes in our limited language domain, we can use them to control in what order things appear. So the snow height prep phrase is like "above 1500 meters" and the time prep phrase is like "in the morning", and you'll always have "in the morning above 1500 meters". You'll never have it the other way around, because of this ordering.

And my time is running out. So one of the things that we need to worry about with the conjoined noun phrases, where we might have fog and frost, is that you don't want to repeat things. If you have patchy fog and patchy frost you want to say "patchy fog and frost". You don't want to repeat the "patchy". But depending on the type of forecast and the specification for the particular type, you'll actually be using different words. So it's not quite enough to say, if you said "patchy" before, don't say it now, because you might have said "patches of fog". You don't want to say "patches of fog and patchy frost". It's still repetitive. So you need to account for synonyms even though they might be in different parts of speech. And "fog patches" is a particularly nasty one, because normally the type is the head of the noun phrase, so fog is the head, but when you say "fog patches" the head is actually patches, or patch. So you need to somehow rip that out again and make fog the head again.

As I was saying, these prep phrases are really annoying. This is the well-known ambiguity problem, which is that if you have two prep phrases it may not be clear which noun the second one applies to. You've got your head noun, which is going to be showers, then you've got a noun in the first prep phrase, which is morning in the first example,
and then "with hail": is that applying to the morning or is that applying to the showers? In the first example it doesn't matter, because morning is not something that comes with hail and showers is something that comes with hail, so the reader can use the meaning of these words to deduce that there's only one logical reading. This is like the joke, "I shot an elephant in my pyjamas. How he got in my pyjamas I'll never know." So "in my pyjamas" can apply to "I" or to the elephant, just to totally deconstruct that joke for you. This is the same problem. And then with "showers with hail in the morning", it's not clear if "in the morning" is limiting the hail or the showers. Did the showers go all day and the hail was just in the morning, or were the showers and hail both just in the morning? So it matters a little bit, but maybe not a huge amount.

We also have a location prep phrase, for types of sentences called local effects. If we have a large area but something is just happening in part of the area, then we want to use another preposition phrase to limit the location of it.

So, rewriting this: we were motivated to do this by the difficulty with the old system, where the logic for controlling the word order was distributed among 12 or more different functions, and we were quite afraid to change anything because we were unsure what the repercussions would be. With this system it's localised. It's in one place, and the logic of the English, or the language, is not mixed in with the logic of choosing what you want to say, which is the document or text planning. Creating this kind of simple infrastructure is pretty simple, because we weren't trying to model all of English. We were just trying to model what we had to. But integrating it with other stuff proved to be not quite so simple.

So I think it's something that is worth considering if you are dealing with a system that has a lot
of output and you need to be a little bit context sensitive, or you find you're doing a lot of string manipulation and it's quite painful. Maybe you need to consider encoding some English explicitly somewhere in your system. Thank you, everyone.

Thank you, Brianna. So while our presenters are changing over we'll take some questions. If you could please wait till the microphone shows up somewhere near you before you say anything. Any questions?

Thanks for the exercise. How long have you actually been running this? Python, I think you said. Well, how long have you actually been running this now?

The whole system? The whole system started being used in Victoria in 2006, I think, and in New South Wales in September last year, and the US has been using it for years longer than that. For Queensland the rollout is not going to happen this year, but I think next year or the year after. One of the good side effects of it is that you will get a lot more forecasts, for a lot more points and for a lot more days. You'll have a lot more towns that have seven-day forecasts. That's quite a normal thing in Victoria and New South Wales now, whereas up here towns will only have one- or two-day forecasts. Because they don't have to write each one manually, they're just automatically generated.

Any other questions? Yep.

Would you say the forecasts...

So we monitor when they get edited, and the rates vary from around 10% for the really short forecasts to about 50% for the coastal waters forecast, which focuses on wind, and we don't do the wind phrase very well. But there's a lot to be untangled: is the problem the statistics that we got in the first place? Is it how we simplified them? Are they just being picky, so it's not really a necessary change? Or is it a big mistake? So we're monitoring the changes that they make and trying to figure out whether it's something we can fix, to reduce the editing rates, because if they have to edit it, it's a big workload
for them.

Just wondering, with your new system, how long have you gone so far with that? Any grammatical errors whatsoever?

Well, I think it's hard for it to make grammatical errors actually, just because of the way it's built. This new system hasn't gone live in the regions yet. It's going out late January, so if I get a firestorm of complaints in February I'll know that I need to do some work. But we've had it running on a review server for quite a few weeks now and given it a fair bit of testing, and it looks pretty good. To go back to the weather grid, the weather keys, it's quite spelled out for you. You don't have to do that much interpretation in terms of how it should be said. So it's pretty straightforward. It should be something we can get right, and we're getting a lot closer, I think.

Right, so this has been a talk on something that probably not many of the people in the audience would have considered before, so thank you very much for making us aware of it.

Thank you, Chris.

Okay, since we're running a tad late, for our next presenter to get his as close as possible