VALA Tech Camp - Introduction To Python
A workshop intended to introduce Python to people who are new to programming!
Objectives
Concepts
- What is Python and what can be done with it?
- What is special about Python as a programming language?
- What are Python libraries?
Outcomes
- I can use Python
- I can read Python documentation
- I can use Pandas to
- Read and write data
- Select particular data
- Investigate data
- Add simple information (columns) to the data
- Link data*
- Add complex information (from different locations) to the data*
Table of Contents
Resources | Resources available and required | 5m |
Welcome | Welcome! | 5m |
Getting Acquainted | Variables, types, basic functions | 30m |
Getting Cosy | More functions, control flow | 30m |
Pandas | Pandas: Making sense of data | 30m |
Funtimes | Experiment with what you've learned! | 15m |
Thanks | Thanks and goodbye | 5m |
Required Resources
We will be using previously set-up computers in the cloud. You have been given a personal URL to use throughout the session.
At the end of the session we can help you set up Python on your computers.
Other Useful Resources
Python
Anaconda How to get Python on your computer.
Weekly Python Chat Interesting weekly videos of all things python. They go back quite a bit so you can usually find things you are working on.
Talk Python To Me - Podcast Really nice to listen to people discussing topics in Python. Gala's personal fav is 20 python libraries you aren't using but should.
Improve your python skills Nice blog-posts about how to squeeze the python juice.
PyConAU 2017 Python conference in August
PyLadies Meet-up Monthly talks about Python in the city.
Cool Reads
Silverpond Blog Noon works there :) All things Deep Learning and AI
Thick! Great blog, also great newsletter!
FiveThirtyEight Super cool use of math, stats, programming, data-viz opinion articles.
Other
Welcome
Welcome to the Python Workshop!
About Your Instructors
Gala & Noon
About You?
- Interest in Python?
- Goals?
Why Python?
Getting Acquainted
Wordbank
notebook | Value | Variable |
Type | operator | tuple |
list | integer | decimal |
function | Evaluate | |
argument | String | range |
keyword | syntax |
Open the notebook entitled "Fundamentals" on your provided
server.
Variables, Values & Types
- Values represent "things" we want to deal with:
  - Names: "Noon", "Gala", ...
  - Ages: 34, 30, ...
- Variables hold Values and can be passed around
- Values have Types
- Types specify the "world" of potential Values
Variables
In Python you can define variables with the = operator.
Example:
my_location = "VALA Tech Camp"
Example:
number_of_people_present = 22
Example:
# (x,y) refers to the desk position in terms of rows and columns.
# We call this a "tuple".
my_position_in_this_room = (1, 2)
Example:
# This is a "list".
favourite_foods = ["Pizza", "Pancakes", "Toasted Cheese Sandwich"]
Example:
# In meters
distance_to_wall = 1.25
(1) - Define these variables in your notebook, and fill in
your own values.
(2) - Ask your neighbours for their favourite decimal number
and make a list of these. Called `favourite_decimals`.
Types
Q: How do you make a great program?
A: You type it!
The type of a variable informs how we can use it.
For example, we can add and divide numbers, but dividing strings doesn't make a lot of sense.
We can count the number of elements in a list, but it doesn't make sense to "count" anything on a single number.
We can learn the type of a variable by using the type function:
suburb = "Carlton"
print(type(suburb))
We've also introduced functions here: the "print" and the "type"
functions. We will see more of these in the next section.
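As a small illustration of how the type decides what we can do with a value, here is a sketch with two made-up variables (`street_number` and `street_name` are just examples, not part of the exercises):

```python
# Hypothetical example values, not part of the workshop exercises.
street_number = 12
street_name = "Lygon Street"

# Numbers can be divided.
quarter = street_number / 4
print(quarter)

# The two variables hold values of different types.
print(type(street_number))
print(type(street_name))

# Trying street_name / 4 would stop with a TypeError,
# because division is not defined for strings.
```

If you are curious, try the division on the string in your notebook and read the error message Python gives you.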
(3) - Figure out the type of the variables defined in
Exercises 1 and 2.
Bonus (3.1):
Redefine the variable assigned to your distance to the wall and make
it into an integer: the `int` type in Python. (Reminder: Integers
are whole numbers with no decimal places, e.g. 6, 1983, -123, but
not -2.1 or 1/2.)
Bonus (3.2):
Do you expect the type of the "favourite_foods" and "favourite_decimals"
variables to have the same type? Do they?
Group Question 1:
- Are there some types you've heard of but not seen here?
- Can you imagine some other types of values?
Functions and Operators
We've seen how to define variables and assign values. We would now like to perform actions and operations on these values.
Functions are the means by which we can manipulate values. We'll look at some functions that come with Python, and later on we will define our own.
Evaluating Functions
We call a function by writing:
function_name(argument_1, argument_2, ...)
An argument is the input to your function.
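To make the calling pattern concrete, here is a minimal sketch using two functions that come with Python, "max" and "round". The variable names `largest` and `rounded` are only chosen for this example:

```python
# max is the function name; 3 and 7 are its two arguments.
largest = max(3, 7)
print(largest)

# round takes a number and how many decimal places to keep.
rounded = round(3.14159, 2)
print(rounded)
```

Notice that the arguments are separated by commas, and that the result of a function can be stored in a variable.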
Built-In Functions
The following is an example of calling the print function:
print("Hello, World!")
Here, print is the function name, and "Hello, World!" is the first (and only) argument.
(4) - Use the "print", "min", "sum" functions to look at the
list of "favourite_decimals".
(4.1) - What is the sum of your favourite decimals variable?
- What is the function name, and what is the argument?
(4.2) - What is the smallest favourite decimal?
A useful function is the range function. It lets you get a sequence of numbers. To see the values, we need to use the list function on the range:
list(range(5, 10))
Python Built-In function documentation.
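One detail of range that often surprises people is where the sequence stops. The sketch below (variable names are just for illustration) shows what comes back for two different calls:

```python
# The first argument is where the range starts, the second is
# where it stops. The stop value itself is left out.
numbers = list(range(5, 10))
print(numbers)

# With a single argument, the range starts at 0.
first_three = list(range(3))
print(first_three)
```

Keep this in mind when you want a range that includes its last number.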
(5) - Use the "range" function to print out all the numbers
from 0 to 10.
(5.1) - Look up the "range" function in the Python
documentation, and add a new argument so that we only get
the even numbers in this range.
Bonus: What is the sum of these two lists?
Bonus: What are the lengths of these lists? (Hint: Look in the
documentation for the appropriate function).
The Python documentation (which we linked to above) is an
invaluable resource for learning about available functions
and how to use them. We will use it throughout the session,
so we will see more on this later.
Built-In Operators
We've summed up an entire list, but we can also just add two numbers together using the +
operator:
fun = 5 + 6
print(fun)
(6) - What else can be added together? Strings? Lists?
(6.1) - What can't be added together?
Bonus - Can you guess some other operators that might exist? What do
they do on integers? And lists?
Function Reference
Below are the functions we've seen:
type | list | |
sum | range | len |
Keywords and Syntax
In the sections to follow we will see keywords. These are special words that Python uses to determine how the program is structured.
The structure of a program is referred to as "Syntax". It determines what is a comment, what is a function, what is a variable, a value, and indeed every element of the program.
A "comment" in programming is a line of code that the program
doesn't interpret. It can be used to document what is happening,
or the programmer's thoughts at the moment they were typing.
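In Python a comment starts with the # character. Here is a short sketch (the variable `room_temperature` is made up for this example) showing the places a comment can appear:

```python
# This whole line is a comment; Python skips it.
room_temperature = 21.5  # A comment can also follow code on the same line.

# The line below is "commented out", so it never runs:
# room_temperature = 100

print(room_temperature)
```

Commenting out a line is a handy way to switch it off for a moment without deleting it.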
Getting Cosy
Wordbank
logic | def | return |
arguments | function body | block |
indentation | objects | elements |
split | replace | upper |
lower | join | list slicing |
index | control structure | if |
else | for | loop |
flow | logical operator | in |
We've seen how to define variables and assign values, and do a bit of basic manipulation of those variables.
However, in any real-world program we're going to be interested in performing our own transformations on data, and combining transformations together into a larger overall purpose.
We'll see later on that our main task today will be to investigate movie reviews; and this may involve many steps.
Functions are a way of collecting together many steps of logic into a single statement, or "recipe". Much like how we would think of walking to the shops and buying apples as a "single thing", instead of all the individual steps that that would involve.
User-Defined Functions
To define a function we will use some keywords for the first time.
The def keyword (short for "function definition") lets us begin what we call a "function block", and the return keyword lets us set what we refer to as the "return value" of the function.
Let's take a look at an example:
def seven_up(x):
    y = x + 7
    return y
Group Question 2: How many arguments does the function below have?
def greet_person(name, location):
    result = "Hello " + name + "! How's things in " + location + "?"
    return result
The syntax for specifying a function block in Python is that
the function body, i.e. the "steps", are indented.
For consistency, we will say that there should be 4 spaces
for all of the steps in the function body relative to the
position of the "def" keyword.
This rule actually applies to ANY block in Python, such as
an "if" statement, which we will see later.
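Here is a small sketch of the indentation rule in action. The function `area_of_rectangle` is made up for this illustration; the point is which lines belong to the function and which do not:

```python
def area_of_rectangle(width, height):
    # These indented lines are the function body.
    area = width * height
    return area

# This line is not indented, so it is outside the function.
# It calls the function and keeps the return value.
desk_area = area_of_rectangle(3, 4)
print(desk_area)
```

The body only runs when the function is called, which here happens on the unindented line.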
Group Question 3 - What are the steps in the "foo" function
below?
def foo(x):
    w = 1
    y = x + (7 * w)
    print(y)
w = 9
(7) - Evaluate the "seven_up" and "greet_person" functions,
with different values, in your notebook.
(7.1) - Write a new function, "always_five" that always returns
the value 5.
(7.2) - Delete the "return result" line from the "greet_person"
function and try and evaluate it again, like you did in step 7.
What is different?
Objects and Functions on them
We often think of Python as an Object-oriented programming language.
An "Object" is a way of representing something about the world. Examples of things we might like to represent are:
- Houses
- Our friends
- Places we've been
- Words
- Sentences
We won't go into the details of objects (and classes) here (take a look at this and this for a nice introduction), but we will make use of them.
As usual let's investigate by example. Consider the following string:
friends = "Jean Girard,Ricky Bobby,Cal Naughton Jr.,Susan"
We'd like to count the number of friends in this string. One solution would be to convert this string into a list, and count the elements in the list.
The function split can be used here. This is defined on the string itself:
list_of_friends = friends.split(",")
(8) - Run the code above, and count the number of friends
in the list.
(8.1) - Try the "replace" function on a string.
(8.2) - Try the "upper" and "lower" functions on strings.
(8.3) - Challenge: Use the "join" function on a string to go
from the "list_of_friends" variable back to the
string where the friends are separated by commas.
This can be done in one line.
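The general pattern for functions that live on an object is value.function_name(arguments). As a sketch on a different, made-up string, here are "split" and another string function, "count":

```python
sentence = "the quick brown fox"

# split is called on the string, with the separator as its argument.
words = sentence.split(" ")
print(words)
print(len(words))

# count tells us how often a piece of text appears in the string.
number_of_os = sentence.count("o")
print(number_of_os)
```

The dot is what tells Python to look for the function on that particular string.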
Lists
When we think of a list we are generally interested in such questions as:
- What is the first thing in this list?
- What is the last thing?
and we may also be interested in such tasks as:
- Give me the first 3 elements of this list!
Python has a powerful mechanism for these tasks: List Slicing. These techniques allow us to access elements of a list. Let's see by example:
things = ["Plant", "Computer", "Book", "Lamp"]
(9) - Try the following statements with the variable above:
- print(things[0])
- print(things[1:3])
- print(things[-1])
What do you get?
(9.1) - By reading the slicing documentation, can you write a slice
expression that gives back only the first 3 items? Can you do it in
two different ways?
(9.2) - We can use this notation to set elements of a list, just like
we did with variables earlier. Try updating the first element to be
your favourite meal.
(9.3) - Challenge - Can you treat a string as a list? What works?
What doesn't?
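For reference, here is a sketch of the same ideas on a different list. The list `days` and the variable names are made up for this example:

```python
days = ["Mon", "Tue", "Wed", "Thu", "Fri"]

# Positions start counting at 0, so position 1 is the second item.
second_day = days[1]

# A slice [start:stop] includes start but leaves out stop.
midweek = days[1:4]

# Leaving out the stop means "until the end".
from_wednesday = days[2:]

# Negative positions count backwards from the end.
second_last = days[-2]

print(second_day, midweek, from_wednesday, second_last)
```

A single position gives back one element, while a slice always gives back a list.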
Control Structures
So far we've seen how to define functions and variables, but we're yet to make any kind of decision based on their values!
Control structures let us change our program flow based on the values.
Let's see an example of an "if/else" statement, along with the == operator (named "equals").
quest = "..."
if "Learn Python" == quest:
    print("You may pass!")
else:
    print("*You get thrown off the bridge and into the ravine.*")
(10) - Run the code above. What happens? Can you put in a
value that lets you pass?
(10.1) - What is the type of: "Learn Python" == quest
Logical Operators
We can use any of the following operators for making decisions:
Name | Operator |
---|---|
Equal To | == |
Not Equal To | != |
Greater Than | > |
Less Than | < |
Greater Than Or Equal To | >= |
Less Than Or Equal To | <= |
(11) - Try out the above operators on integers and strings.
Which ones can you use to compare an integer to a string?
(11.1) - Are upper case letters smaller than, equal to, or larger
than lower case letters?
(11.2) - Does the order of elements matter when you compare if
two lists are equal? What about tuples?
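These operators are most useful inside an "if" statement. Below is a minimal sketch with made-up numbers (`seats_available` and `people_registered` are just for illustration):

```python
seats_available = 30
people_registered = 22

# A comparison gives back either True or False.
everyone_fits = people_registered <= seats_available
print(everyone_fits)

# The same kind of comparison can decide which block runs.
if people_registered > seats_available:
    message = "We need a bigger room!"
else:
    message = "Everyone gets a seat."
print(message)
```

Try changing the numbers so that the other message is printed.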
Looping
Oftentimes we will want to run a certain step several times, or even indefinitely!
One of the standard tools is the for loop, which lets us perform a step a given number of times. Let's see an example:
for k in range(0, 10):
    print(k)
This introduces two new keywords: for and in.
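A for loop can walk over any list, not only a range. As a sketch, here is a loop that adds up a made-up list of prices one at a time (the names `prices` and `total` are just for this example):

```python
prices = [4.5, 3.0, 12.5]

# Start with nothing, then add each price in turn.
total = 0
for price in prices:
    total = total + price

# This line is not indented, so it runs once, after the loop.
print(total)
```

This does by hand what the "sum" function did for us earlier, which is a nice way to see what the loop is doing at each step.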
(12) - Using what we've learned above, make a for loop over
the numbers from 1 to 10, that:
- prints "hello" if the number is less than 5
- prints "goodbye!" otherwise
(12.1) - Define a list of nearby things, and use a for loop
to print only those things that start with the letter "d".
Hint: You may like to look up functions that are available on
strings, much as we did earlier.
Group Question 4 - Have you encountered a real-life version of
a for loop? What is it?
Take a few minutes to write a "pseudo-code" version of a for loop
that solves this task:
Example:
for movie in all_movies_ever:
    if is_great(movie):
        watch(movie)
        discuss(movie, with=friends)
Pandas!
In this section we'll see our first Python Library. But before we get into that, we first need to take a look at how libraries work!
Wordbank
library | import | as |
ceil | math | alias |
DataFrame | Series | head |
dtypes | multi-index | loc |
csv | sort_values | unique |
Libraries
A Library in Python is a collection of functions that let you do interesting things without writing raw code to do it.
- You import an entire library in order to use it in your code.
  You can give it an alias,
      # Import the pandas library and call it 'pd'
      import pandas as pd
  or just use its name:
      # Import the sys library (as sys)
      import sys
- You can import only the functions you want to use:
      # Import the 'ceil' function from the 'math' library
      from math import ceil
How you import libraries affects the way you use them in your code.
The ceil function in the math library lets you find the value of rounding up a number. Let's take a look at some examples:
import math as ma
ma.ceil(9.7)
import math
math.ceil(9.7)
from math import ceil
ceil(9.7)
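Put side by side, the three import styles all reach the same function; only the name you type in front of it changes. A small sketch (the variable names are just for illustration):

```python
import math
import math as ma
from math import ceil

# Three ways of calling the very same function.
with_full_name = math.ceil(9.7)
with_alias = ma.ceil(9.7)
with_direct_import = ceil(9.7)
print(with_full_name, with_alias, with_direct_import)

# "Rounding up" means towards the larger number,
# which matters for negative values.
negative_example = ceil(-9.7)
print(negative_example)
```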
Python comes with some libraries already installed.
The 'math' library is one of them. These libraries that come
pre-installed are called "The Python Standard Library".
One of the main benefits of Python is that there is a large and
well-supported community of handy libraries that you can install
and use for free!
Today we will be learning about the pandas library. Pandas lets us play with data!
The pandas library
In your notebook environments we have pre-installed pandas
for you.
(13) - Import pandas into your notebook and give it 'pd' as an alias.
Worked Example: Movie Reviews
We found some data about movies. The files are located in the data folder on your instances.
We can read each of these files into what we call a DataFrame and play with the data. This DataFrame is an object, and as such, it has many functions we can use to investigate it and have fun!
(14) - Use the read_csv function from pandas to read the file
"data/movies.csv", assign this file to a variable, "movies".
(14.1) - Look at the first 5 rows of your data using the "head"
function on "movies"
movies.head(5)
What are the columns in this data?
(14.2) Looking at the values, what do you think the "type" of the
data in each column is? Write your guesses down.
On Indices
An index is used by pandas to provide a unique identifier for
each row. If none is specified, Pandas will automatically insert
one based on the row number.
We can set multiple columns to define the index as long as the
combination of values is unique. This is called a multi-index.
Group Question 5 - Can you think of a common multi-index that
you use in your everyday life?
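To see what an index and a multi-index look like in practice, here is a sketch on a tiny made-up DataFrame. The `seats` data is invented for this example and is not one of the workshop files:

```python
import pandas as pd

# Hypothetical data: seat numbers repeat, and so do row letters.
seats = pd.DataFrame({
    "row": ["A", "A", "B", "B"],
    "seat": [1, 2, 1, 2],
    "name": ["Noon", "Gala", "Jean", "Ricky"],
})

# With no index specified, pandas uses the row number: 0, 1, 2, 3.
print(seats.index)

# "row" and "seat" together are unique, so they can form a multi-index.
seats = seats.set_index(["row", "seat"])
print(seats.index.is_unique)

# Look up one person using both parts of the index.
person = seats.loc[("B", 1), "name"]
print(person)
```

Neither "row" nor "seat" would work as an index on its own here, because each one has repeated values.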
We can access attributes of the DataFrame to learn things about it.
For example if I want to know the column names of my dataframe I can do the following:
movies.columns
(15) - Investigate the following attributes:
- shape
- dtypes
We can investigate a full column in our DataFrame. A single column, which keeps the DataFrame's index, is called a "Series".
Let's investigate our data a bit more closely, and look at the "title" column in our DataFrame as follows:
movies['title']
This results in what Pandas calls a Series.
On Series and DataFrames
Pandas makes a differentiation between "Series" and "DataFrames".
This becomes important when thinking about what functions we can
perform on them.
Intuitively, we will think of a Series as a single column in a
DataFrame which is still accessible via the index.
We can locate data in our DataFrame using the attribute loc. To access a particular row, or more precisely, the row indexed by 27, we would write:
movies.loc[27]
We can also access a particular set of rows based on a value in the column:
movies.loc[movies['title']=="Ghostbusters (2016)"]
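It can help to see what happens in two steps when we select rows by a value. The sketch below uses a tiny made-up DataFrame, `books`, as a stand-in; it is not one of the workshop files:

```python
import pandas as pd

# Hypothetical stand-in data.
books = pd.DataFrame({
    "title": ["Dune", "Emma", "Ulysses"],
    "year": [1965, 1815, 1922],
})

# One column on its own is a Series.
titles = books["title"]
print(type(titles))

# loc with an index label gives back that one row.
print(books.loc[1])

# Comparing a column gives one True/False value per row ...
is_old = books["year"] < 1900
print(is_old)

# ... and loc keeps only the rows where that value is True.
old_books = books.loc[is_old]
print(old_books)
```

The comparison inside the square brackets is worked out first, and loc then uses the result as a filter.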
(16) - Read in the file "ratings.csv" in the data folder.
(16.1) - How many ratings are in this file?
(16.2) - Find all the ratings for userId == 14
(16.3) - Assuming 5 is the best score, what is the title of their
best rated movie?
(16.4) - What are the genres of this movie?
(16.5) - Challenge: Use the "sort_values" function on your ratings
DataFrame and sort them according to the "rating" column. Can you
sort them from largest to smallest, i.e. descending?
(Hint: You can find documentation here.)
Pandas makes it really easy for us to add columns to our DataFrame. Suppose we are not fans of this 5-star rating and would rather see a score out of 100, in a column named "score":
ratings['score'] = 100 * ratings['rating']/5
We can also find out how many unique scores we assigned, by using the unique function on the scores series.
(17) - Find the unique scores by doing the following:
- Follow the code above to assign your ratings data a "score"
- Access the "score" column as a series, like we did to access
movie titles (movies['title'])
- Use the "unique" function to get all unique scores.
How many are there?
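The same two ideas, a calculated column and unique values, can be sketched on a small made-up DataFrame. The `quiz` data below is invented for this example:

```python
import pandas as pd

# Hypothetical marks out of 20.
quiz = pd.DataFrame({
    "student": ["Ann", "Bob", "Cat", "Dan"],
    "mark": [20, 15, 20, 10],
})

# Arithmetic on a column is applied to every row at once,
# and assigning to a new name creates a new column.
quiz["percent"] = 100 * quiz["mark"] / 20
print(quiz)

# unique drops the repeats; len counts what is left.
unique_percents = quiz["percent"].unique()
print(unique_percents)
print(len(unique_percents))
```

Two students share the same mark here, so there are fewer unique values than rows.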
(18) - Challenge - Merging DataFrames
As it stands, in the "ratings" DataFrame we only see the movieId, and
not the name of the movie. If we were able to merge in the "movies"
DataFrame, then we could see the ratings together with the movie name
and other data.
Use the "merge" function on a DataFrame to merge "movies" into
"ratings".
(18.1) - Extra Challenge - Can you find a way to only merge the
movie name, and not the other columns?
Exercise - (19) - Joint Exercises
Later in the Tech Camp you will learn how to use APIs. IMDB has an
API that is accessible via the library ImdbPie. This can be used to
get the movie poster for the movies in this data set.
If you feel comfortable with APIs, come back to this exercise and use
the "imdbId" field from the "movies" data set to find the movie poster
for that movie and display it in the notebook!
Hint: Here are two functions that will be handy:
def to_imdb_format(imdbId):
    return "tt" + (("0" * 7) + str(imdbId))[-7:]
def poster_image(imdbId):
    from IPython.display import Image
    id_ = to_imdb_format(imdbId)
    title = imdb.get_title_by_id(id_)
    return Image(url=title.poster_url, width=200, height=200)
(20) - Ambitious - Finding Friends To Watch Movies With
Write some code that, using the "ratings" DataFrame, finds
users that rated the same movie highly (a 5, say) and then
in this way build a new DataFrame, "friends", with the following
columns:
friendId1, friendId2
Funtimes!
- Experiment with what you've learned!
- We can help install Python on your computer
- Questions