All right, I believe we are live. Fantastic. Okay, let me bring everyone over into this session. Good, and welcome back, everybody, to day three of the conference. Can't believe we're here already. Another action-packed, delightful day of talks coming up. I am very excited about everything that we have coming up. We've got six wonderful talks on the program for today. But first and foremost for the day, we have our third keynote speaker. I am extremely, extremely excited to welcome Susan Hunston from the University of Birmingham, from the English language and linguistics program there, who's going to talk to us about language and the construction of knowledge in the scientific community. I was just saying, when we were getting everything connected, that I'm really excited to build this link in particular. I think what linguists are up to recently is extremely cool. There has been some phenomenal research going on that I'm really excited to get to know a little bit more about, and I hope you are too. So without further ado, please take it away. Thanks so much.

Thank you, Charles, and good morning, good afternoon, good evening, good middle of the night, everybody. It's a great privilege to be able to take part in this conference. I'm going to be talking about language. I'm going to be talking about some ideas about language that predate digital approaches, and then I'm going to update with some of the more recent work on digital approaches to investigating science. So there are two ideas that I'm going to talk about first. One is an area of linguistics called systemic functional linguistics and the contributions that it has made to the study of the language of science. The second is a notion that I came up with many years ago, epistemic status, and the contribution that I think that might make. And then I'm going to take three case studies that build on the notion of epistemic status.
But after that, we kind of meander a bit into this project on interdisciplinary discourse, and I try to expand the ideas presented to you a bit.

So, the idea of systemic functional linguistics: it was developed by Michael Halliday. There are some references at the bottom of the screen there. It sees language as social semiotic. I'm going to talk a bit about transitivity, and about a key concept in SFL called grammatical metaphor, how that contributes and how it can be quantified in a corpus of scientific discourse. So a little digression, first of all, into what grammar is. I heard a definition recently that it is something that provides a model of how language is best represented in the brain, and for many people, that's what it is. But for many other people, it is different. It is a model that provides the best account of how society construes the world. So it can be seen either as an account of structures and constructions or as an account of systems of meaning. Now, systemic functional linguistics, language as a social semiotic, is the social side of this. So for people who do this, grammar is a model based on evidence that provides the best account of how society construes the world, and it doesn't have much to do with people's brains. And grammar is an account of systems of meaning.

When we look at transitivity, Halliday proposed looking at verbs, and accounts of things that happen, as broadly belonging to three kinds. It sees the world as things that happen: things that people do, things that animals do, etc. It sees the world as mediated through thought, perception and words: the things that people say or think or observe or feel. And it sees the world as relations between things, people and qualities. So examples of things that happen are: the child kicked the ball, the ball sailed through the air, the window broke, the parent scolded the child, the child cried.
The world mediated through thought, perception and words might be: she told the child to stop, the child saw the window break, the parent loved the child. And the world as relations between things, people and qualities gives examples such as: the child had a ball, the ball was red and black, the window was made of glass. So, three different ways of construing the world.

Now, a lot of the time, nouns are the things that construe people and things. And if you look at that the other way, people and things are construed by nouns. Verbs construe processes, and processes are construed by verbs. And circumstances are construed by adverbials and conjunctions. When those things happen, this is what Halliday calls congruent grammar. In other words, the grammar kind of matches our experience of the world. But when we have something called grammatical metaphor, the nouns construe processes and the verbs construe relations between processes.

Here's an example I picked up a couple of weeks ago from the magazine New Scientist. It's from the question-and-answer page. "Many birds are able to remember and mimic sequences of sounds they hear, including human speech. What evolutionary advantage does this give them?" And the answer is: "The evolution of the advanced ability to mimic the bird song of other species is usually driven by sexual selection." Then we have an example of the superb lyrebird, which has a sophisticated repertoire of songs and sounds. Now we'll look at the bits in bold there. "Many birds are able to remember and mimic sequences of sounds they hear." "The evolution of the advanced ability, etc., is usually driven by sexual selection." If we take the first of those, this is congruent grammar. "Many birds are able to remember." The nouns are "many birds"; birds are things. "They" refers to the birds. "Sequences of sounds": not quite a thing, but close.
And then the verb phrases are to do with remembering and mimicking and hearing. So, basically, doing things. But if we look at the metaphoric version, the noun phrases here are, for one thing, very long, and the main noun is not a thing but an abstraction. The evolution: something evolves. The ability: something is able to do something. Sexual selection: one animal selects another. Biological fitness: somebody is fit. So those nouns are coming from verbs or from adjectives. The exception to that is birdsong, which is the name of a thing. And then the verb phrases here are not about something doing something, but about the relationship between ideas. So evolution is driven by sexual selection. Sexual selection causes evolution. That's the relationship between them. The ability signals greater biological fitness. It gives us the idea, it shows, greater biological fitness. So it shows the relationship between the two things rather than somebody doing something.

And one of the things I find great fun to do is changing metaphoric to congruent or vice versa. So we have "the advanced ability to mimic birdsong". If we were to make that more congruent, we would say: some birds are able to mimic birdsong very well. "The evolution of the advanced ability" might be: some birds have evolved so that they are able to mimic, etc. Then it starts to get a bit more difficult to know what the correct translation would be. Sexual selection, I presume, is something like the partners they want to mate with. So "the evolution is driven by sexual selection" becomes: because birds select mates, some birds have evolved so that they are able to mimic birdsong. And "greater biological fitness", I guess, is something like: some birds have more robust genes. But we're starting to move quite a long way from the original. Still, we could paraphrase the original metaphoric statement by putting together all those congruent bits.
So "the evolution of the advanced ability" and the rest becomes something like: some birds are able to mimic birdsong very well, and this gives them the appearance of having better genes. And at this point I'm starting to become quite creative in deciding how to paraphrase the original. But what I'm essentially doing is taking those abstract nouns that construe an activity or an ability, changing them into verbs, and then moving the other nouns around them.

If we look at the differences between the two, the metaphoric version, as you will have seen, is quite inexplicit. The congruent version is more explicit. A lot of information has to be added to make it congruent. The metaphoric version is shorter. My congruent version was longer. And the metaphoric version is also probably more difficult to understand, and there's evidence that, unsurprisingly, children grasp congruent grammar before they grasp metaphoric grammar. But all those things are really stylistic things: the length of the sentence, whether it's difficult to understand or not. What is also important is that the metaphoric version construes entities which are central to biology: evolution, sexual selection, biological fitness. These become things; they become entities in biology. Biology works by treating these entities as things and working out how they fit together. As Halliday proposed, grammatical metaphor is central to the development of a science.

Now, Halliday did that work many years ago, before it was possible to quantify this except on very small amounts of text, where you could work through texts and count the different kinds of clauses. But more recently, of course, with corpus linguistics, we're able to do this on much larger collections of text. So Biber et al. and other colleagues at Northern Arizona University have worked a lot on this.
In the context of clause complexity and noun complexity, they look at the kind of noun you get in these texts and the amount of pre-modification and post-modification of the noun phrases. If we look at something like "the advanced ability to mimic the bird song of other species", that's got pre-modification, "advanced", and post-modification, "to mimic the bird song of other species". And in the examples I'm going to show you, they compare registers (they compare conversation and academic writing) and they measure progress through the university, from first-year undergraduate to postgraduate.

So here are some results from comparing registers. These are the language features on the left. This is the average per thousand words in conversation and the average per thousand words in academic writing. And you will see that the things that are associated with long noun phrases, attributive adjectives and modifiers, are all more frequent in the academic discourse, whereas the things that are associated with more clauses are more frequent in the conversation. And if we take the second example, comparing levels from first-year undergraduate to level four, postgraduate, again per thousand words: these nominalisation nouns are nouns that end with things like -tion and -ity, all the suffixes that make verbs and adjectives in English into nouns. So those kinds of nouns, the adjectives, the noun modifiers and the post-modifiers with "of" all increase in relative frequency from level one to level four.

So all of this is confirmation. Without actually having started from Halliday's point of view, what they've done is nonetheless confirm the importance of grammatical metaphor in academic discourse. So we might say one of the conclusions of this is that heavy use of grammatical metaphor is indeed an indicator of academic style, but, as I said before, it's more than just style.
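As an illustrative aside, the "per thousand words" figures described above can be sketched in a few lines of Python. This is a rough sketch under stated assumptions, not the procedure used by Biber and colleagues, who work with a grammatically tagged corpus. The function name `nominalisation_rate`, the suffix list and the minimum word length are all hypothetical choices made for illustration; matching on word endings is a crude heuristic that will miscount some words (for example "nation").

```python
import re

# Hypothetical suffix list. The talk mentions endings like -tion and -ity;
# the remaining entries (and the plural forms) are assumptions.
NOMINAL_SUFFIXES = (
    "tion", "tions", "sion", "sions",
    "ment", "ments", "ness", "ity", "ities",
)


def nominalisation_rate(text, suffixes=NOMINAL_SUFFIXES, min_len=6):
    """Return suffix-matched nouns per 1,000 running words.

    Words are found with a simple letters-only pattern; a word counts as a
    candidate nominalisation if it is at least `min_len` letters long and
    ends with one of `suffixes`. No part-of-speech tagging is done.
    """
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if len(w) >= min_len and w.endswith(suffixes))
    return 1000.0 * hits / len(words)


# Two tiny made-up samples, loosely echoing the talk's examples.
academic = "the evolution of the ability signals selection"
conversation = "the birds sing and we hear them"

print(round(nominalisation_rate(academic), 1))
print(round(nominalisation_rate(conversation), 1))
```

Normalising by text length is what makes a short conversation sample and a long academic corpus comparable; on real data the suffix heuristic would normally be replaced by a tagger's noun labels.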
The nominalisations, the use of these particular kinds of nouns, impose objectivity and they remove agency. They distance the science from the scientist, or they distance the scientist from the science. And those nominalisations become the entities that make up science.

Now I'm going to turn to the second idea, and this is the notion of epistemic status. The idea is that every proposition in any kind of discourse is evaluated in terms of its epistemic status, that is, the nature and strength of the alignment between that proposition and the world. And sometimes the status of a proposition is there by default. It's not actually signalled. So here are some examples of different kinds of sentence, if you like. The first one is from a novel; a prize for anybody who can tell me where it comes from. "Directly I began to cross the Common I realised I had the wrong umbrella, the rain ran down", etc. The second one is from a newspaper: "The forest fire in the sparsely populated area 50 miles north of Tokyo continued to burn yesterday." The third one is from the same newspaper, from the opinion pages: "The psychodrama engulfing the Scottish National Party is remarkable for a party that prides itself", etc. The next one comes from an academic paper: "Regression analyses were run." And the last one comes from the same paper: "The learners in the two studies were very similar in L1, age, aptitude and learning context."

One way of thinking about this is: what happens if you disagree with the statement? What does a challenge to the statement mean? Because challenges mean different things. A challenge to the first one, the novel, is kind of irrelevant. There is no point saying "you did not cross this common, you did not see Henry". This is a novel. It's not supposed to fit the real world, so there's no truth value there.
For the second one, if you wanted to say "this is not true", it would essentially be "you are lying, there is no forest fire 50 miles north of Tokyo". For the next one, a challenge might be "that's your opinion: I don't agree there is a psychodrama engulfing the SNP and I don't agree that it's remarkable". But that would be different from saying "you're lying"; it would be a disagreement. For the next one, again, if you were to say "this is not true, regression analyses were not run", the consequence would be that it was a lie. But for the final one, if you said "no, the learners in the two studies are not similar in L1, age, aptitude and so on", this would not be "you're lying"; this would be "I disagree with your interpretation of the two cohorts of students". So they're very different.

And one thing that struck me when I was working on this was how important epistemic status seems to be to people, just ordinary people. There are a couple of examples here. There was a nature program on British television which showed polar bear cubs in a den, and there was a tremendous furore in the newspapers because it turned out they weren't filmed in the wild, they were filmed in a zoo. So there was a debate about what the epistemic status of the film was. I don't have time to go into this second example in great detail, but it was basically a book that was written about the experiences of a child in a concentration camp during the Second World War, and the question was whether it was fiction based on true events or whether it was things that had actually happened to the writer. And again, a tremendous fuss was caused by which of those it was.

But I'm going to move on now to where this becomes relevant to digital work. With what I've just talked about, classifying those instances can only be a manual process. But here we start to get status being marked by words like "show" and "idea" and "observation" and "implication" and "finding" and "likelihood", and these are the nouns, the verbs, the adjectives that mark
status in English, and they can be put together to construct a web of knowledge. So here we have an example of a proposition, "all living organisms on this planet are descended from a single organism", and this is labeled in three different ways: as suspicions, as evidence, and as fact, in the adjective "clear". And so we have a single sentence that tracks us through how this proposition arrived at being currently accepted.

I'm going to miss that bit out. So it's the averred alignment between word and world. "Averred" there means it's the alignment as stated by the writer; we're not getting into the philosophy of truth here. When writers write, what status they confer is a matter of choice. So the choice to call something a suspicion or evidence or a fact is a matter of choice. It may be marked or unmarked, but it's always there, and that, incidentally, makes status different from other related concepts. And status entities are, again, the building blocks of science, so they're about the thing that science is as well as expressing opinion. And when this has been quantified, what is quantified is these markers of status, because you can't quantify the default status on large amounts of data, or at least I haven't found a way of doing it.

So one example is a paper (there are several papers, but I've just picked one) comparing a corpus of dissertations in materials science and political science. So we've got the difference between physical science, where writers build on previous discoveries, and political science, where previous arguments are reinterpreted. Maggie Charles took the "noun plus that-clause" pattern, so words like "argument", "conclusion", "discovery", "evidence" that are followed by a that-clause: "the argument that", "the conclusion that", "the discovery that" and so on. And she divided those nouns into different kinds: the kind that express ideas, or arguments, or evidence, or possibility. You'll notice that there are many more instances of these in politics than in materials (these are averages per 100,000 words) and
that politics has more of the idea and argument nouns and materials has more of the evidence nouns. She then also asked what it is that's being evaluated: is it the writer's own ideas or other people's? In politics it's mostly other people's ideas, and in materials science it's mostly the self. You see the items involved. So a typical example from political science is "the assertion that nationalism and patriotism are incompatible causes complete confusion". What's being evaluated here is somebody else's idea, being evaluated as incorrect. The typical example from materials science, based on those figures, is that something that has been done "provides good evidence that the bonding of the QD layer to the substrate is excellent", which is an evaluation of the writer's own work, own findings. So this is one indicator of the representation and evaluation of propositions. We see the difference between the disciplines, and these are consistent with other accounts of differences in epistemology between physical science and social science.

Now, my second example is a piece of work that was done on a book called The Rough Guide to Evolution (great book, if you don't know it), and the author was kind enough to give me a probably illegal text-readable copy of the book, from which I selected those verbs that are followed by that-clauses. So this was a simple task of taking all the instances of the word "that" and looking to see the verbs that came before them. I did a simple categorization of those into what is known and what is thought (words like "argue"), also called averred alignment or potential alignment: so "this is true" or "this might be true". I also categorized the subjects of those verbs as being a person or a non-person: a thing, a finding, an experiment, whatever it is. And I came to the conclusion (spoiler alert) that what this writer is doing mostly is saying "man, or person, proposes; nature disposes". People come up with ideas; nature tells them whether the ideas are right or not. If we look at the human subjects, we'll see that
most, the larger quantity, indicate potential alignment, whereas if we look at the non-human subjects, the greater quantity is averred alignment, not potential. And if we look at individual verbs where you can have either one, where you can have either "Darwin showed" or "Darwin's experiments showed", then where the verb is averred alignment it's more likely to have a non-human subject, and where the verb is potential alignment, "suggest" and "assume", it's more likely to have a human subject. So if you look at it both ways, you get the same result.

So again, what does this mean in terms of the typical? Well, if you have a human subject and a potential alignment verb, you get something like a person suggesting that the earth is much older than biblical accounts allow. And if you get a non-human subject and an averred alignment verb, you get something like "these findings show the Rift Valley did not present a barrier" or "fossils show the Rift Valley did not present a barrier to chimpanzee occupation". And this pattern of "person suggests, findings show", as the figures tell us, is repeated throughout the book. Although it's a popular book, it does follow what you might call a standard scientific view of how things are done, ascribing the construction of facts to human ideas and then to objective, non-personal evidence.

Now, the third example is my longest example and it's the most recent. It comes from a project carried out on interdisciplinary discourse by a team led by Paul Thompson at the University of Birmingham. I've put at the bottom a reference to a book that Paul and I published last year with Routledge, and I've also put a little reference there to Sketch Engine, which is the corpus environment where we did a lot of the studies, in case I forget to mention them again, because they helped us enormously to do this study. We built a corpus with the cooperation of Elsevier, the publishers, and we used journals from Elsevier on the general topic of environment. We had what we called the B11
corpus, which was 11 journals from (sorry, that's not quite true, but never mind, almost true) 2000 to 2010. We also had the B4 corpus, from 2000 to 2010, and we focused particularly on one journal called Global Environmental Change, and we had access to that from 1990 to 2010. Global Environmental Change is one of those 11, which is why I said that's not quite true. These were the people who were working on the project.

I'm going to go into a little bit more detail now about the B4 corpus. This was four journals: Plant Science, a journal about agriculture and environment, Global Environmental Change, and a journal on economics and environment. We selected those four because two of them are broadly about physical science and two of them are more social-sciency; two of them are single-discipline, Plant Science and REE, and two of them are interdisciplinary, AEE and GEC. And we determined the nature of the discipline using Elsevier's own way of mapping journals and disciplines. We used a lot of different methods, from the qualitative to the quantitative, or, if you prefer, from the text-based, sitting reading things on a piece of paper, right the way through to the highly quantitative digitised methods, including topic modelling, which was the subject of the preliminary talk yesterday.

But I'm going to start by carrying on with the study of status markers, because I was able to undertake a study of status markers (nouns, verbs, adjectives) in the B4 corpus. Many of them were unevenly spread. So, for example, 66% of the instances of the noun "perception" are in GEC, 60% of the instances of "deduce" are in Plant Science, and 60% of "assumption" are in the economics journal. And through a bit of not very clever statistics, looking to see which of these words were very distinctive of particular journals, it was then possible to quantify the number of distinctive status markers, and you've got this at the bottom of the screen. Really the only thing to notice here is that GEC, this interdisciplinary journal, is the most
distinctive: it has the most status markers that are most frequent in that journal. If we look at the status markers that are distinctive, that is, very frequent in one of the journals and not in the others, then for the physical science journal, Plant Science, the markers relate to the empirical research process: things like "analysis", "confirm", "determine", "hypothesize", "indicate", "observe". In the economics journal you've got distinctive markers relating to the mathematical research process, which, I believe, include "conclude" and so on. AEE, the agriculture journal, actually doesn't have many distinctive markers, but they really seem to relate to judgments about the future: "recommendation", "estimate", "prediction" and so on. And GEC, the most distinctive one, has markers related to research theories, so in that case not so much processes but theories ("ideas", "notion", "theory"); human thought and communication ("agree", "believe", "decision", "opinion"); risk ("danger", "risk", "threat"); research processes; and reflexivity, which I'll come back to in a moment.

And if we compare, the word "observation" is about what you see; what you see is aligned with the world, the proposition is aligned with the world. This is more common than the word "argument" in three of the journals. But the word "argument", which means I am constructing something, I'm not looking at something, I'm explaining, interpreting, theorizing about something, is more frequent proportionally in our social science interdisciplinary journal.

Interestingly, "observation" turns out to mean not quite what I thought it might mean. Observations can be the foundation of knowledge, as in these examples from Plant Science. So it's things like "the observations indicate that", "the observations agree with", "the observation is evidence of". The observation is taken for granted, and it's a foundation of other parts of knowledge. But in these other journals, observations are part of a debate. So if we look at this example, "consistency with herders' observations", you get the observations of non-scientists that are
brought into connection with the observations of scientists, or the measurements of scientists. You get this rather curious example, the one above it, "some evidence supports the observation that organic fertilizers diminish herbivorous insect populations", which is kind of odd because it suggests that the observation about organic fertilizers still needs other evidence to support it. And then the final example on that screen is something which you might say is not an observation at all: "the fragmented nature of social science research" is an opinion, really, but it's being labeled here an observation, which gives it a higher epistemic status.

And we can look at things like the adjectives that come in front of "argument", and in particular at the contrast between the domain-related arguments, "economic argument", "scientific argument", which are used when talking to people outside the discipline in these interdisciplinary journals, and these specifying arguments, which are very discipline-internal: "the acid growth argument", "the guilt argument", "the indignation argument". You have to know the discipline to know what those things are. And one interesting contrast between the two interdisciplinary journals is this: if you take something like "scientific arguments", the agriculture journal, the sciencey one, usually takes the scientific arguments at face value, as in "are there scientific arguments in favour of domestic support to agriculture", whereas the GEC examples are questioning scientific arguments: "the validity of scientific arguments is open to dispute".

And this led on to us looking at what we call disciplinary reflexivity in the GEC journal, because it would appear that writing for an interdisciplinary audience can focus attention on the notion of the discipline. Disciplines are mentioned, compared, commented on and challenged in the interdisciplinary journals much, much, much, much more than in the single-discipline ones. So one of the very qualitative studies that we did was simply reading
the introductions to the articles and finding examples which are conciliatory, in the first set. So: is there a dichotomy between indigenous knowledge and scientific knowledge? No, there isn't. Or: we who do social science should be cautious in moving too far from the philosophy and structured experimental approaches that characterize science. Or these introductions can be antagonistic: something cannot be adequately understood relying on science alone; "positive science which pretends to maintain...", back to this notion of science not really being as objective as it pretends to be. And if we look at just those words, "science" and "scientific", you will see here how vastly more frequent they are in the GEC journal than in any of the others. So a science journal like Plant Science here doesn't talk about "science" or "scientific" very much at all. Sorry, those figures are per million words. And we can follow that up.

But I'm going to turn now to topic modelling, which we also tried out. It's not really a standard procedure in our field, but we decided to do it, and I don't have to tell you what it is because the excellent paper yesterday by Christophe Malaterre gave you all the detail that you need to know. I'll just add, though, that something we did a bit differently is that we had a much reduced stop list. So we didn't try to exclude all grammatical words, because we believe that grammatical words are useful contributors to topics. We worked with 50 topics, and I was interested to hear that for Christophe, too, it was trial and error; we looked at 10, 20, 30, all the way up to 100. But the most important thing we did differently was that instead of working on whole texts, we split all our files up into text chunks of about 300 words, running on to the next end of a paragraph. So you're talking about two or three paragraphs, not whole papers. Our results were that we found topics that occurred, and we looked at the processes in the paper, and we could do this because of our method. We looked
at topic change over time. One thing that I think was particularly interesting, although I'm not going to talk about it today, is that there were papers that had a lot of a particular topic in them, and other papers had a broad spread of topics. So this seemed to relate to the notion of being the focused paper or the broad discussion paper. And we had to try to classify what kinds of topics we were coming across. So these were some of the topics that we had, and you'll see some of them are to do with the natural world, some of them to do with people, some of them to do with political things. And we looked at how the reduced stop list introduces some grammatical words here, and then really quite odd things like "more", "than", "less", "not", "greater", which we said were actually part of the research process. So this is the part of the paper that brings together the empirical research.

Again, just some examples of the topics that we found from GEC, from Global Environmental Change. One interesting thing that we could see was the same words, or the same kinds of words, occurring in more than one topic, and so indicating how, when they were in different papers, they gave you a different sense of what that thing is about. I've said that extremely, extremely incoherently. This is part of the problem with this topic modelling: these topics are not actual things, and you have to kind of interpret what a great long list of words actually means. So "agriculture" can occur in the context of "crop", "production", "soil", "food", "yield", "increase", "fertility", "use", "plant", which is a sort of industrial agriculture, I guess; or agriculture as part of the human activity of farming, raising "cattle", "livestock", "pasture", etc.; and then right down to what you might call the local or household level, with this topic here. So it goes from the more industrial to the more personal, household-based view of agriculture. So there isn't a single topic that is agriculture, but there are these nuanced topics. And another thing that was very apparent was that a lot of
the topics had human involvement in natural entities. So the topic that had the word "river" in it also had "irrigate" and "irrigation"; "forest" had "conservation"; "sea" had "flood" and "impact". So the way that these entities are being construed in the papers is in relation to human involvement. And then there was a predominance of risk and mitigation. So again, a lot of topics included a natural thing alongside risk words: "water" and "stress", "environment" and "problem", "sea" and "protect" and "loss". So, risk and mitigation in relation to natural entities. And, coming back to the notion of disciplinary reflexivity, we had topics that included "knowledge" in their list of words.

This is just an example of how we showed topics that changed in frequency over time. So this set of topics was very prominent in 1991 to 1995, with "planning" and "agenda". But here we're having to give a single name to a topic, which is in a sense very subjective, the name you give to that topic. That's just showing you that it can be done, I suppose. Incidentally, you'll see here that we've got these things that we call topical topics and then these other things. The other things are to do with research processes, whereas topical topics are to do with the content, the thing they're actually talking about.

Now I want to diverge, really, from what I said I was going to talk about, because I did just want to introduce this other thing that we used in that project, just because it is a common digital procedure to carry out and I thought it might be of interest to the participants in the conference. It's the use of something called multidimensional analysis, developed again by Biber, the same Biber as I mentioned before, first in 1988 but then in many publications since. The way this works is that a corpus is tagged using features of grammar and lexis, and the corpus will consist of different registers. Factor analysis is then used to cluster those features, positive and negative, in each factor, and the factors are interpreted semantically
or stylistically, and they're given names, which are dimensions. Then the registers in the corpus are compared along each of the dimensions. What Biber showed was that if you look at two things like conversation and academic discourse, as we saw before, they will differ from each other more or less depending on which dimension you take. So that's the way that Biber did it.

We did it a little bit differently, because when we looked at a particular journal we didn't have ready divisions within that journal. If you've got a corpus that's made up of different registers, you obviously compare one register with another. Our corpus was made up of different journals, and one of the things we did was to compare the journals with each other, obviously. But then we also wanted to see if we could, from the bottom up, derive groups of papers from within a single journal, rather than start off by saying we're going to divide the papers in this journal into different groups. So we did that by obtaining the dimension profile for each paper, and then we clustered the papers based on the dimension profiles, and we got what we called constellations, because we were running out of vocabulary at this point.

So this is just to give you an idea of this. These are the features that the corpus is tagged with. This comes from Biber's tag set, and in fact he helped us to do it: he tagged our corpus for us using his own tagger. You'll see it's an eclectic set of features that includes some grammatical things and some other things. So you've got, for example, the uninflected verb, you've got the word, but you've also got semantic categories like activity verbs. You've got first person pronoun, which is very clear, but also things like cognitive nouns. So this was one of our dimensions. These things are brought together by the factor analysis, and then you have to interpret what that means. We did that partly by looking at the list but also by reading the papers
that had high loadings, positive or negative loadings. The interpretation we came to was that this dimension is, at the positive end, system oriented and, at the negative end, action oriented. So the high scoring papers are not about action; they are about descriptions of systems, models, abstract concepts. And low scoring papers are oriented towards actions, what people did at particular times. Here's an example from a higher scoring paper. The features that are in that list are in blue here, the abstract nouns and the other things there, and if you just glance at that you can see that it's an abstract paragraph. This is a lower scoring paper, so it has some of the high loaded features in blue and some of the negatively loaded features in red, and this is about somebody doing something, essentially.

The constellations: we ended up with six constellations, and you can see here the statistical outcome of the analysis on each dimension. So cluster one, for example, has 118 papers in it, and the averages for the papers along each of the dimensions 145.6 are shown there in those boxes and in the visual presentation. And then again your job is to look at those and think, well, what does it all mean? And this is what we thought it all meant: constellation one are papers to do with quantification and measuring, constellation two to do with quantification, and so on through our six constellations.

There was divergence between constellations. The ones that diverged most were one and five. Constellation one has a focus on the physical, argumentation is implicit, it talks about specific sites of human-environment interaction, and there is quantified data about changes in the environment. Constellation five focuses on the abstract, the argumentation is explicit, and it gives you human perspectives on the environment and social perspectives on physical science. So within this interdisciplinary journal we can identify those papers that take this different
perspective on the topic of environment. We've done it through calculating the co-occurrence of the features. So this is an example from constellation one, and it's about things that happen in particular places. And an example from constellation five, which is about the personal and about human beings reacting with the environment. So our constellations were partly about style, but they were also about content: what are these papers about?

We could then do other corpus searches and calculations, looking for example at the collocates of the word "environment". So we looked at minus three to plus three collocates of "environment" in Sketch Engine. In constellation one it's obviously quite a long list, but in red you can see the collocates that are about the natural world, features of the natural world, the environment. Constellation five has some of those, and it also has some other things, like departments and things like that, but also in green it has these words that are about the human reaction to the environment, human interaction with the environment: protecting, having a campaign, care, protect, responsibility and so on.

So I've talked about the way that language studies tell us about style. They tell us about the distinctive features of a particular kind of writing, the relative frequency of particular linguistic features. And these stylistic features are essential to the maintenance of communities of practice in science, as people like Hyland have said. But I think it's important that they also constitute the things that make up knowledge. I've talked particularly about the nominalized processes, the evolution, the fitness, the selection, the things that make up knowledge, but also labels for all knowledge entities, those epistemic status words like arguments and findings and evidence. I've talked a bit about how ideas that came up before digital methods were widely used have continued to be tested out and quantified digitally, but also how we've been able to
use data-driven and quantitative methods to come up with new ideas as well. That's the end of my talk. My slideshow has some references on it as well. Actually, I'll just leave that one up.

Fantastic, thank you so much. This is really neat stuff. I am going to once again take chair's prerogative while we wait on questions to come in through Crowdcast, as people collect their thoughts. Since I get a little bit of a preview thanks to the tape delay, I wanted to ask about something that I'm sure you're thinking about, because it sits just behind so much of this work in the Global Environmental Change journal. It seems like, in some sense, one of the things that your methods are letting you see in a very profound way is the sort of inherent friction in doing interdisciplinary work. I keep asking questions to people like this, but I just want to invite you to speculate on, to talk a little bit about that, because I know that's got to be something you're thinking about, because it's exactly what sits underneath these. If everyone thinks they're talking about the same thing, but then they're justifying it in radically different ways and in the structure of their arguments, what do you think about that?
Yes, you're absolutely right. It became very apparent that our two interdisciplinary journals, the agriculture science one and the global environmental change social science one, were very different in the way they approached interdisciplinary work. And we became quite influenced by the work of Barry and Born on different questions, and what I have referred to in this paper as the conciliatory as opposed to the antagonistic. So you did have people who are saying we must work together, we must find the answer, and then people who are saying science has got it all wrong. I'm exaggerating to some extent. If you want a personal reaction, I ended up being quite annoyed with social scientists, actually, and thought, oh for goodness' sake, get over it, we need the scientific evidence as well as everything else. But yes, Global Environmental Change does have quite a lot of that. I have to be careful here: it has proportionally quite a lot of that discourse, but actually it also has a lot of articles that aren't like that. The figures I've given you today are sort of global figures across the corpus, and when we had time to go into more detail in the book, what we could do was say, look, within Global Environmental Change we can pick out a subset of a few dozen papers which really push this anti-science agenda, and a lot of the other papers don't so much. So although that is a journal that gives room to that kind of discourse, it isn't necessarily the dominant kind of discourse, even within that journal.

Sure, yeah, that's always an interesting thing about significant ratios. You always have to remind yourself, right, that you're still dealing with what might be, in an absolute sense, a small number.

Question from Sarah Davies coming in. This is a great question: extremely interesting to see research writing unpacked. I wonder how this affects your own writing at all. Do you make particular choices, and how do you write now, based on your findings?

Yes, yes
and no, or no and yes. When I write something I don't think about this. When I go back and reread, in the way that you do when you're editing, I start to think, I've used this particular way of saying things here, or maybe not. And I'm terribly conscious of whether I'm using the word "assumption" or "hypothesis", or I can agonize for ages over whether something is a notion or a concept or an idea. But I did it in my presentation: when I was going over it this morning I thought, oh yes, I've used a nominalization on that slide when explaining nominalization.

That's great. Another question here from Catherine Steven, who asks: did your findings tie in with any of your assumptions, predictions or hypotheses about how language might fit into the philosophical approaches or paradigms adopted by various disciplines, or were you surprised by anything when you started to analyze this corpus?

I know very little about philosophy of science, I have to say. Okay, let me answer that question in two ways. When I started to work on the language of science, which was, I have to say, a very long time ago, probably before large parts of the audience were born, it was assumed that scientific language was entirely objective and had no subjectivity in it. And whereas science might have subjectivity, because it was the days of Gilbert and Mulkay investigating discourses in science, contingent and empiricist discourse, it was assumed that when things got to publication they no longer had any subjectivity in them. My work was about saying that even this language that looks very objective is actually full of evaluation. So epistemic status is an evaluation of status. When you say "so and so argues that", you are evaluating the proposition as being only somebody's argument and not the truth. I mean, it sounds obvious, but it wasn't so obvious at the time. So I think that was kind of a surprise. But on the other hand you could say, well, that evaluation is expressed in a way that is consistent with what
people say about scientific discourse, which is that it prioritizes the objective. So it's kind of the objectification of the subjective, if you like.

When you come to work which compares disciplines, I would say that on the whole what you find is what you expect. So take the work that Maggie Charles did on materials science and political science. If you start from the position that materials science is going to be people doing experiments, a set of experiments built on the first set of experiments, and we all go in a direct straight line towards the truth, whereas political science is going to be "oh, that person said this, and I'm going to reinterpret it and I'm going to disagree with it", and that's how social science moves forward, what she found was exactly that. And so in a sense it wasn't a surprise; it was to justify exactly how that happens: putting together each individual materials scientist and political scientist and adding them all together, this is how it works.

When we came to the interdisciplinary study we were really batting in the dark. We had no idea what we were going to find or indeed how we were going to find it. Most of what we wrote about it was actually: how do we adapt methodologies to go into this new thing? But I think it is both disheartening and heartening to find that other people have said the same thing. So you find what you think is wonderful, and then you discover that the philosopher of science has already said this, but they've said it in a general way, and what you're doing is saying yes, and this is how, when people sit there putting pen to paper or fingertip to keys, this is the mechanism by which what you've observed happens. Sorry, I don't know if that answers the question. I seem to have gone on for a long time.

I have a question coming in: I was very interested in the markers of epistemic status that you mentioned in the middle of your talk. Did you identify and quantify them by close reading or through text mining methods? Did
you develop up front a list of such markers that you specifically searched for? We started developing a list of those kinds of markers, like we're maybe reinventing the wheel.

Well, when the markers are nouns or verbs or adjectives they are always (I say with inverted commas, because you always have to have inverted commas around "always", don't you?) they are predominantly words which can be followed by appositive that-clauses. So things like "it shows that", "the idea that", "it is clear that": all of those words are followed by that-clauses. Now, in some work that was done not to do with science at all but in the field of lexicography, back in the 1990s, with colleagues we produced, believe it or not, lists of all the words in all those grammatical patterns in English. I can send you the reference for that, if really you're wanting ways of getting to sleep. But the good news about that was that we ended up with a list of words, verbs, nouns and adjectives, that are followed by appositive that-clauses, and we could then do a search. I didn't actually do this, but it was Paul Thompson who somehow magically did some kind of algorithm that found in the corpus all those words, whether or not they were followed by a that-clause, because that's also quite important. So I can follow that up offline if that's of interest. So it wouldn't get everything, adverbs for example, but it would get most things.

Let me just piggyback on that, just because I'm interested in something that came up in the context of your answer. What is the meaning you ascribe to the use of those terms when they don't get followed up by an appositive that-clause?

Oh, well, very often they are very generic. So one of the examples I have, that I can't be bothered to find now, was things like "the implication of these findings is that". So if you only search for "implication" followed by "that" you won't pick up that example. But also you've got the phrase "these findings", and that refers away to a previous
sentence, paragraph, whatever, even a different paper, but they're still indicating the status of the thing being referred to. So it's a different notion from signalling nouns, where the reference has to be in the immediate surroundings. This is where it's a bit different. I mean, we had to be very careful with the word "perception". In plant science, for example, the perception was often just... plants don't perceive, do they? Anyway, whatever it was, I can't remember now, but it was people and animals perceiving, as opposed to "my perception of this is that".

Yeah, that's funny. Oh yes, a question here from Virginia Petrovich, who asks: do your findings about scientific language match up with the advice that one might find in a manual of professional scientific writing? So is there a training information question here, more generally? Sorry, I cut my own mic too early.

Kind of yes and no. I think I have to say this is not a study that I've carried out, although I know people have made comments on this. The comments are very often along the lines of: people writing scientific papers use "I" much more than style guides tell them to. So that's always the advice that's offered. I work a lot with PhD students writing theses, and one thing I find is I am constantly telling them: be careful which report verb you use. There is a big difference between saying somebody "says that" and somebody "insists that" and somebody "claims that". You know, you've gone to your dictionary and you've found a whole load of words that you think mean the same thing. They don't mean the same thing. Be careful in your choice of word. When it comes to nominalization and grammatical metaphor, I find sometimes I'm telling students this sentence is too difficult, make it simpler by changing the nouns to verbs, and the other half of the time I'm saying this sentence sounds too simplistic, make it better by changing the verbs to nouns. So, you know, I'm a terrible person to
have as a supervisor.

That, actually... it's an inchoate question, but I wanted to ask something about this metaphor idea, because this also of course lines up with, as I'm sure you know at this point, at least in a glancing way, a lot of literature in the philosophy of science thinking about conceptual foundations, what it means to construct theories, structures of theories and this kind of thing. And I'm really interested by exactly this idea that this might be a process that is sort of corpus-level visible, if you will. So I guess one way to ask this question would be to say: where does that kind of work get done, in your experience? So you found a lot of these differences in language patterns, more abstract, less concrete type interactions in an interdisciplinary context, and that makes good sense. Have you picked up, for instance, sort of self-reflectively more theoretical journals? Do people do the same kind of thing in those sorts of contexts? Like, if plant science is just filled with, you know, "and then I cut the plant into tiny pieces and I put it under the microscope", where can we catch scientists doing more of this more abstract work?

Now, I think this grammatical metaphor is everywhere. I mean, it wasn't something that we looked at specifically in our project, but you pick up any journal on any topic, any academic journal, and it'll be just crammed full of grammatical metaphor. It really is absolutely pervasive. I mean, I bought an issue of the New Scientist, and that's not an academic journal, but still, as soon as scientists write, they're into the grammatical metaphor. You just can't get away from it. And Halliday talks about this. He talks about it in the context of scientists complaining that English doesn't have the resources they need to express the vibrancy and the organic progression of their discipline. So
I'm quoting him now: he says that scientists say "English doesn't allow me to talk about the process-driven nature of my discipline", and he says, well, actually it could do, if you didn't use a scientific style when you're writing. What's happened is that scientists themselves have changed English, and instead of using it in a verb-dominated way they're using it in a noun-dominated way. And he uses the metaphor of crystalline versus choreographic style. He actually also uses two Greek words, Attic and Doric, but I can never remember which one is which, so I don't use those. The crystalline is where everything is very static, and that's where you go noun, noun, noun, and the choreographic is when it goes verb, verb, verb, and so you've got more clauses and shorter noun phrases. But it's worth reading Halliday on that, because he says what I've said but a lot better.

That's really interesting. Yeah, I'll definitely pick that up. Good question coming in from Christophe Malaterre. I'm even going to broaden his question a little bit, and then I'll ask it exactly as he wrote it. But let me start by asking: have you applied this kind of analysis as well to humanities research, as a separate kind of journal writing? So he asks, you know, I wonder whether the metaphoric versus congruent properties of discourse would pick up a marked difference or not between the classic distinction between continental and analytic philosophy, or between sort of older styles in humanities research versus the current style, where people often accuse us of having become more scientific in recent years in our journal writing.

No is the short answer. I haven't done that. I do find writing in the humanities fascinating, and I think it certainly should be studied. I do have a colleague who's worked on something similar in relation to comparing history with English literature journals, and showing a difference between the two. I can only say that, let's say again, grammatical metaphor is
everywhere, and whether you look at history or anything else it is absolutely there. But actually quantifying it, no, I don't know. And that suggestion about the two different kinds of philosophy would make a very interesting study, I think, so thank you for the suggestion.

Very cool. Oh, yeah, good question. So another question from Christophe as well. This bridges a bit back to the question about training, although here from a different direction. So do you know whether there's been any work, maybe in psychology, linking the way that people produce these kinds of grammars to personality traits or other characteristics of the authors, of the people themselves?

And again the short answer is no, I don't know whether any work has been done. I certainly don't know of any work like that that's been done. There has been work on different cultures. So I know there are papers comparing, for example, people from China writing in English and their use of markers of epistemic status with a similar group of people from the US or the UK writing in English, and linking the differences to perceived cultural differences. But individuals? I would be surprised if there were individual differences, just because these things tend to get knocked out of one by editorial review and reviewers and things like that.

Yeah, strong homogenizing effect of the discipline, for sure. I had a question from Beckett Sterner, who asks: I'm curious how the rise of big databases like WordNet, which get used heavily in computer science, can interact with more fine-grained studies of terminology and linguistic patterns. So have you seen good examples of how expert analysis can augment or improve some of these more big-data-based lexical knowledge resources?

I think there's been a little bit on improving WordNet by including phrases, because one of the things that comes out of the kind of work that's
being done in corpus linguistics is the importance of phrases rather than words, that meanings belong to phrases and not to words. So there's been that influence, I think. But sometimes the view is that WordNet starts from the wrong place, you know, an "I wouldn't start from here, therefore I'm not going to improve it" kind of thing.

Yeah, that's always a risk. Let's see, let me see if there are any other questions coming in over the chat at the moment. I don't see any, so will I self-indulge again? I probably shouldn't. I'll stall instead. Okay, I'll be self-indulgent one more time. So actually this connects a little bit to something that was just being mentioned. I wonder about these changes in patterns in epistemic status markers. I guess what I'm trying to grasp my way towards is a question about causation. That is to say, what exactly is it giving rise to the difference? Is it that the scientists themselves genuinely have something that they want to express, that this is what they actually believe, it's the epistemic attitude that they really have toward the proposition in their head? Is it more sort of discipline membership signaling? Is it really being pushed in by editors or even copy editors in the journal publication process? The publication process is long and complex and thorny, and I wonder if you have any sense about breaking it down into its smaller parts.

I think to know that for certain you'd have to do a study of drafting and redrafting. And I've seen that done: a colleague of mine did that on distance master's dissertations, and what happened as students rewrote draft after draft of their master's dissertation. Obviously there were all sorts of changes, and this was one of them, that they changed the way that they expressed their attitude towards the propositions
that they were writing. But I don't know of any study that's been done on a broader scale on that. It would be interesting. I like to believe that people express it because that's what they think, because that's what they mean. So choosing to say, you know, "somebody shows that" means that you are aligning yourself with that view. You can't say "so and so shows that" and they're wrong; it's got to be "they show that" and therefore they are right. It's like saying "as Darwin says": you can't then disagree with Darwin. So I like to think that people are in control of what they're doing and that they don't change that because some editor has told them. The exception to that, of course, is that this is an international process, and I was talking to a writer who had quoted me and said "Hunston claims that", and I said, oh, so you mean I'm wrong then? It just meant "argues that". So these meanings are not as fixed as sometimes we pretend they are.

Yeah, that's true. I'm sure we could have a whole separate conversation, and it's come up several times over the course of the conference so far, this idea of internationalization, multilingualism, the questions of translation. Of course that's just a whole separate can of worms for another day, unfortunately, as we are at this point out of time. Thank you so much, this was really fantastic. I really very