As an undergraduate I was taught assembler by a grey-haired lady. My pal called her Mrs Brady, Old Lady like the one from Viz so often that I can’t remember if Brady was actually her name. I do remember that in the labs we would recite what she had said in lectures: load the accumulator in a robotic voice, LDA X etc., and laugh hysterically.
I had forgotten all about Mrs B until the other day when I was giving a mini-lecture called Writing functions in python and all the lovely memories came back: the lecture theatre, her grey hair, her voice and I had to suppress a laugh as I realised that I am now the Mrs Brady, Old Lady at the front of the room speaking in a robotic voice with grey hair.
I think she lectured on acetate slides, but I was completely overexcited to create my slides in Jupyter Notebook with the RISE extension, using notes I had typed in WordPress and converted to Markdown. I put python code into the slides too, so that I could execute it during the lecture. If I had needed to write anything new to test out, it would all have been in one place and easy to do without losing my students’ attention. How amazing is that?
Writing a function in python
The thing is, python does actually remind me a bit of assembler in that you can robotically load the accumulator and get immediate results, in fact it is really easy to behave like a crazy cat banging away at a keyboard without any idea of what you are doing.
However, I was taught great coding and software design as an undergraduate, the likes of which I have never seen on any course anywhere since, and I like hanging about universities so I have seen a lot of teaching and a lot of how-to-code courses.
I love thinking about good code and software design and how to teach it. Occasionally, I watch Agile Bob Martin on YouTube – SOLID principles of OOD – because he has nice grey hair and it makes me laugh to remember how, as a child when I was ill, my mother used to say: Oh you need a Bob Martin’s which I found out years later was a tonic for dogs.
Taking car 2 to the garage
When preparing the mini-lecture, I googled about for a function I could snaffle as I was pushed for time, but soon gave up, as a lot of the python code I’ve seen online could totally do with a Bob Martin’s. It seems that because python is so easy to get started on and people are dealing with a computer, they forget that the code needs to be legible to everyone involved, and they do silly things like define their variables as car 1, car 2. Imagine, if I had two cars, I would never say to my husband: I am taking car 2 to the garage. I would say: I am taking the jag to the garage. Nor would I say: I am picking up child 1 from football, or I have fed cat 2.
So why are people saying that in python?
I have blogged before about how I like to think of the user and the computer as having a dialogue, in the hope of making a new discovery. When we talk to someone, or write a letter, blog or essay, we don’t say everything we know all in one breath. We structure it and present one idea at a time, otherwise we only end up repeating ourselves in small blocks until we are understood. In the same way, when we write a function, we don’t want it to do everything all at once, we want it to do one thing.
Therefore, perhaps we can view the function as a sentence in an essay. It says one thing, it makes one point and then we use more sentences to say more things and add more points, until it adds up to a whole essay.
The function as a sentence
A function is a small piece of code. The idea of grouping lines of code together is to define the scope of your problem and the reach of your variable, so that variables inside your function are only recognised locally, and variables defined outside of the function are recognised globally, that is inside and outside of your functions. In python defining a function looks like this:
    def name_of_function(parameters):
        # some code here which takes the parameters and uses them to make value
        return value
Where def tells us that we are defining the name_of_function and then passing it some input (parameters) which the code inside will use in order to find something out and return an output (value).
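To make the local and global idea concrete, here is a minimal sketch. The names greeting, greet and message are made up for illustration and are not from the lecture.

```python
greeting = 'Hello'  # defined outside any function, so it is recognised globally


def greet(name):
    # 'message' is local: it only exists while greet() is running
    message = greeting + ', ' + name + '!'
    return message


print(greet('Ruth'))
```

If you tried to print(message) outside the function, python would complain that it has never heard of message, which is exactly the point of keeping it local.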
DRY = don’t repeat yourself
We then move onto the next function to do the next thing, in the same way in an essay, having stated one fact in one sentence, we move onto another sentence to express another idea or state another fact. We don’t repeat ourselves (DRY) and we don’t try to express more than one idea in a given sentence.
So far we have used pre-defined functions, and each one does one thing. So, when we want to write to our console, we call the print() function, which ‘prints’ what we want to say to the screen:
    print('Hello, Ruth!')
    Hello, Ruth!
print() only does one thing. It doesn’t print out a string and give the time; time is dealt with by a different function. We wouldn’t want print() to print out our string and give us the time as well; that would be annoying. Imagine:
Hello, Ruth! 12:34:22
So, we are precise and DRY and our function is a building block, rather like a sentence in an essay. When we write an essay we structure it, we have an overall theme, we have a beginning, a middle and end. Each section is made up of paragraphs and each paragraph is made up of sentences, each making one point. Our sentences are the building blocks which do one thing in order to construct meaning. In python, once we have our function, we then write more and eventually we create modules and packages. Python or essay writing, we make sure our function or our sentence works. We make sure it is clear, it doesn’t repeat any other code or sentence and we add more and more functions or sentences to build up a body of work.
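Here is a rough sketch of the one-function-one-job idea, using the greeting and the time from above. The function names make_greeting and current_time are my own invention for illustration; the time formatting uses python's standard datetime module.

```python
from datetime import datetime


def make_greeting(name):
    # one job only: build the greeting
    return 'Hello, ' + name + '!'


def current_time():
    # one job only: give the time as hours:minutes:seconds
    return datetime.now().strftime('%H:%M:%S')


# each building block is called when it is needed, and not before
print(make_greeting('Ruth'))
print(current_time())
```

If we ever do want the greeting and the time together, we can combine the two calls ourselves, without either function having to know about the other.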
Sentiment analysis #marmite peanut butter
Sentiment analysis is big business nowadays for companies. Simply put, data scientists capture and interpret what people are saying about a company’s product so that the company can use the information to inform future strategy.
So, I thought for our first function, I would perform some sentiment analysis on behalf of #marmite peanut butter on Twitter.
I bought a jar the other week and quite enjoyed it but wasn’t sure if I would want to repeat the experience so I took to Twitter to harvest other people’s opinions.
Inspired by the last time I blogged about sentiment analysis, I decided to interpret emoticons. There is a whole list here, though it’s not as extensive as the ascii range on WhatsApp. They have amazing ones like this: (@_@). Anyway, I digress. I decided to keep it simple when I got sentimental about #marmite peanut butter.
marmite_love_hate(tweet)
First of all I declared two sets of emoticons, love and hate. I kept them quite small so we could see what we were up to. Then I wrote a function marmite_love_hate(tweet) which takes a tweet typed in at the command prompt and decides whether the tweet expresses love, hate or is neutral about #marmite peanut butter.
I could have written my code as a chain of tests (if the tweet has 🙂 then return ‘happy’, elif… else and so on). However, the logic is more stylish if you take the intersection of the set of words in the tweet with each of the emoticon sets. That is, using the set command, put each word of the tweet into a set and then see if any of its elements overlap with the elements in the predefined sets emoticons_love and emoticons_hate. So, the function looks like this:
    emoticons_love = set([':-)', ':)', ';)'])
    emoticons_hate = set([':{', ':<', ':('])

    def marmite_love_hate(tweet):
        # split the string into words and put them into a set
        words = set(tweet.split())
        # count up the number of emoticons of each kind
        num_positives = len(emoticons_love & words)
        num_negatives = len(emoticons_hate & words)
        result = 'neutral'  # defaults to neutral and makes the logic neater
        if num_positives > num_negatives:
            result = 'love'
        elif num_positives < num_negatives:
            result = 'hate'
        # return the result
        return result
And because the emoticon sets sit outside the function, other functions can use them too.
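As a small illustration of that reuse, here is a sketch of a second function sharing the same two sets. The function has_emoticon is hypothetical and was not part of the lecture; the sets are repeated here only so the example runs on its own.

```python
emoticons_love = set([':-)', ':)', ';)'])
emoticons_hate = set([':{', ':<', ':('])


def has_emoticon(tweet):
    # put the words of the tweet into a set, as before
    words = set(tweet.split())
    # | joins the two emoticon sets, & finds what the tweet has in common with them
    return bool(words & (emoticons_love | emoticons_hate))


print(has_emoticon('Marmite on toast :)'))
print(has_emoticon('Marmite on toast'))
```

A function like this could be used to throw away tweets that carry no emoticon at all before any love or hate counting begins.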
Then I called the function from the command line and fed it a tweet.
It worked really well for:
marmite_love_hate('Is this not the best breakfast one could enjoy? ;)')
Result: love.
marmite_love_hate('Ugh, I hate marmite it makes me want to vom :(')
Result: hate.
marmite_love_hate(' Trying #Marmite peanut butter for breakfast. I could take it or leave it. #ambivalent')
Result: neutral.
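However, it came unstuck with this:
marmite_love_hate("Ugh, I hate marmite it makes me want to vom :( Please don't hate me :)")
Result: neutral, because one hate emoticon and one love emoticon cancel each other out, even though the tweet is plainly not neutral. (Note the double quotes around the tweet: the apostrophe in don't would otherwise end a single-quoted string early.) This was part of my cunning plan to get the students to think about creating meaning: how do we decide what meaning is, and how do we then reason with the meaning we are creating?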
In a computer we only have two states, 1 or 0, which, it turns out, is exactly how we humans and our brains function, and the opposites of on/off, yes/no, light/dark are how we create meaning. If we want to do anything different we could think about fuzzy logic and redundancy or the excitingly named quantum computing, where we have a computer in multiple states simultaneously using techniques such as entanglement. But, let’s keep it simple for now.
So, we could take all the words in a tweet set and rank them from 0 to 1 like this: 0, 0.1, …, 0.9, 1, where 0.1 is representative of less hate than 0 and 0.9 is not as loved up about #marmite as the 1 of love. Of course, this is again simple, and in the lecture I actually said 1 to 10 instead of 0.1 to 0.9 so that people felt less put off thinking in numbers, as it is the meaning we give which matters, albeit in numerical form. Assigning meaning is something humans do so easily, computers less so, especially when you get down to the machine code level.
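A minimal sketch of the ranking idea might look like this. The dictionary word_scores, its numbers and the function tweet_score are all made up for illustration, and averaging the scores is only one assumption about how to combine them.

```python
# hypothetical scores: 0 is pure hate, 1 is pure love
word_scores = {'hate': 0.0, 'ugh': 0.1, 'ambivalent': 0.5, 'enjoy': 0.8, 'love': 1.0}


def tweet_score(tweet):
    scores = []
    for word in tweet.lower().split():
        # strip punctuation and hashtags stuck to the word
        word = word.strip('.,!?#')
        if word in word_scores:
            scores.append(word_scores[word])
    if not scores:
        # none of the words are in our dictionary, so we cannot say
        return None
    # assumption: the tweet's score is the average of its known words
    return sum(scores) / len(scores)


print(tweet_score('Ugh, I hate marmite'))
print(tweet_score('I love it'))
```

Returning None for a tweet with no known words keeps "we don't know" separate from "neutral", which matters once you start counting opinions.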
Hello! looks like this: 01001000 01100101 01101100 01101100 01101111 00100001
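If you want to check the ones and zeros for yourself, here is a small sketch that turns text into 8-bit binary. The function name to_binary is my own, and it assumes the text is plain ASCII.

```python
def to_binary(text):
    # encode the text as ASCII bytes, then show each byte as 8 binary digits
    return ' '.join(format(byte, '08b') for byte in text.encode('ascii'))


print(to_binary('Hello!'))
# prints: 01001000 01100101 01101100 01101100 01101111 00100001
```

Capital and lower case letters have different codes, so 'H' and 'h' do not look the same in binary.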
And, this is how we segued easily from our first function into talking about extending our program into more functions, and we talked also about whether Twitter was the best place to draw our data. Is it representative of the #marmite eating population? I am on Twitter (sort of) and I love #marmite, and am neutral about marmite peanut butter, but I didn’t shout about it on Twitter, so how would I count?
Would I have to change my behaviour and start sharing opinions about #marmite on Twitter if I wanted to be counted in the sentiment analysis process?
Isn’t technology supposed to help us, not change us? Again I don’t want to be codependent on my phone, or Twitter, or anything, yet I fear it may be too late. In this blog of four years ago I said I didn’t need a phone, but because of the way companies have embraced technology I now need one to park my car, check I am not missing GP appointments, get messages from school and so on. Is this progress? In a word: NO (01001110 01001111).
Extending our function
So, we have our one function, and we could improve the logic to cover a whole range of tweets. We could have a function for each emotion, and then we could write more functions to:
- pull our tweets direct from Twitter (Tweepy and Twitter APIs),
- read and write to files, because you need more than three tweets,
- and once we have our tweets in files, we need to include functions which do data munging, which is the term used when we clean up that data and figure out whether to put in spaces, or throw away tweets which don’t make sense,
- explore different ways of expressing our logic and our interpretations to get different uses,
- analyse the words in the tweets and translate them into numerical values on the fly, as there is no way we would have included all the words,
- we could even have functions which ask a human to numerically define a word, say, like blergh, because a computer can’t do it alone. It needs human-computer interaction.
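A few of the functions in that list can be sketched in a handful of lines each. Everything here is hypothetical: the function names, the idea of one tweet per line in a file, and the very simple munging are my assumptions for illustration. Pulling tweets from Twitter itself is left out, as that depends on the Twitter APIs.

```python
def clean_tweet(tweet):
    # data munging at its simplest: collapse stray spaces, tabs and newlines
    return ' '.join(tweet.split())


def read_tweets(filename):
    # assumes one tweet per line; empty lines are thrown away
    tweets = []
    with open(filename, encoding='utf-8') as tweet_file:
        for line in tweet_file:
            tweet = clean_tweet(line)
            if tweet:
                tweets.append(tweet)
    return tweets


def write_tweets(filename, tweets):
    # writes one tweet per line
    with open(filename, 'w', encoding='utf-8') as tweet_file:
        for tweet in tweets:
            tweet_file.write(tweet + '\n')


def ask_for_score(word, ask=input):
    # the human-computer interaction bit: a person supplies the meaning
    answer = ask('Score for "' + word + '" from 0 (hate) to 1 (love): ')
    return float(answer)
```

Calling ask_for_score('blergh') waits for a human to type a number. Each function does one thing, so any of them can be swapped out later without touching the others.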
The possibilities really are endless and the more modular the code, the easier it is to change your mind or run it different ways and add or delete code.
I’ve said all over this blog that technology is an extension of us, and it really is just a tool. It cannot understand anything without great human effort and input. We need humans to have that all-important dialogue with a computer to give meaning.
Luckily I love to talk computers and #marmite and meaning, and then write about it here.
Huzzah! (01001000 01110101 01111010 01111010 01100001 01101000 00100001)