VALA Tech Camp - Introduction To Python
A workshop intended to introduce Python to people who are new to programming!
- What is Python and what can be done with it?
- What is special about Python as a programming language?
- What are Python libraries?
- I can use Python
- I can read Python documentation
- I can use Pandas to:
  - Read and write data
  - Select particular data
  - Investigate data
  - Add simple information (columns) to the data
  - Link data*
  - Add complex information (from different locations) to the data*
We will be using computers that have been set up in advance in the cloud. You have been given a personal URL to use throughout the session.
At the end of the session we can help you set up Python on your computers.
Other Useful Resources
Anaconda: how to get Python on your computer.
Weekly Python Chat: interesting weekly videos about all things Python. They go back quite a while, so you can usually find something related to what you are working on.
Improve your Python skills: nice blog posts about how to squeeze the most out of Python.
PyConAU 2017: Python conference in August.
PyLadies Meet-up: monthly talks about Python in the city.
Silverpond Blog: Noon works there :) All things Deep Learning and AI.
Thick!: great blog, also a great newsletter!
FiveThirtyEight: super cool opinion articles that make use of math, stats, programming and data-viz.
Welcome to the Python Workshop!
About Your Instructors
Gala & Noon
- Interest in Python?
Open the notebook entitled "Fundamentals" on your provided server.
Variables, Values & Types
- Values represent the "things" we want to work with.
- Variables hold Values and can be passed around.
- Values have Types.
- Types specify the "world" of possible Values.
In Python you define a variable with the `=` (assignment) operator:
my_location = "VALA Tech Camp"
number_of_people_present = 22
# (x, y) refers to the desk position in terms of rows and columns.
# We call this a "tuple".
my_position_in_this_room = (1, 2)
# This is a "list".
favourite_foods = ["Pizza", "Pancakes", "Toasted Cheese Sandwich"]
# In meters
distance_to_wall = 1.25
(1) - Define these variables in your notebook, and fill in your own values.
(2) - Ask your neighbours for their favourite decimal number and make a list of these called `favourite_decimals`.
Q: How do you make a great program? A: You type it!
The type of a variable informs how we can use it.
For example, we can add and divide numbers, but dividing strings doesn't make a lot of sense.
We can count the number of elements in a list, but it doesn't make sense to "count" anything on a single number.
We can learn the type of a variable by using the `type` function:
suburb = "Carlton"
print(type(suburb))
We've also introduced functions here: the "print" and "type" functions. We will see more of these in the next section.
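To see what `type` reports for each kind of value introduced so far, here is a small sketch. The variable names and values are made up for illustration; they are not the ones from Exercises 1 and 2.

```python
# Illustrative values, one of each type seen so far
suburb = "Carlton"
people = 22
distance = 1.25
position = (1, 2)
foods = ["Pizza", "Pancakes"]

print(type(suburb))    # <class 'str'>   (a string of text)
print(type(people))    # <class 'int'>   (a whole number)
print(type(distance))  # <class 'float'> (a decimal number)
print(type(position))  # <class 'tuple'>
print(type(foods))     # <class 'list'>
```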
(3) - Figure out the type of the variables defined in Exercises 1 and 2.
Bonus (3.1): Redefine the variable holding your distance to the wall and make it an integer: the `int` type in Python. (Reminder: integers are whole numbers with no decimal places, e.g. 6, 1983 and -123, but not -2.1 or 1/2.)
Bonus (3.2): Do you expect the "favourite_foods" and "favourite_decimals" variables to have the same type? Do they?
Group Question 1:
- Are there some types you've heard of but not seen here?
- Can you imagine some other types of values?
Functions and Operators
We've seen how to define variables and assign values. We would now like to perform actions and operations on these values.
Functions are the means by which we can manipulate values. We'll look at some functions that come with Python, and later on we will define our own.
We call a function by writing:
function_name(argument_1, argument_2, ...)
An argument is an input to the function.
The following is an example of calling the print function:
print("Hello World")
"Hello World" is the first (and only) argument.
(4) - Use the "print", "min" and "sum" functions to look at your "favourite_decimals" list.
(4.1) - What is the sum of your favourite decimals? What is the function name, and what is the argument?
(4.2) - What is the smallest favourite decimal?
A useful function is the `range` function. It lets you get a sequence of numbers. To see the values, we need to use the `list` function on the range:
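As a small sketch (the numbers here are chosen only for illustration), printing a range on its own does not show the numbers, while wrapping it in `list` does. Note that the stop value is not included.

```python
numbers = range(0, 5)

print(numbers)        # range(0, 5)  -- the values themselves are not shown
print(list(numbers))  # [0, 1, 2, 3, 4]  -- 5 (the stop value) is left out
```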
(5) - Use the "range" function to print out all the numbers from 0 to 10.
(5.1) - Look up the "range" function in the Python documentation, and add a new argument so that we only get the even numbers in this range.
Bonus: What is the sum of each of these two lists?
Bonus: What are the lengths of these lists? (Hint: look in the documentation for the appropriate function.)
The Python documentation (which we linked to above) is an invaluable resource for learning about available functions and how to use them. We will use it throughout the session, so we will see more of it later.
We've summed up an entire list, but we can also just add two numbers together using the `+` operator:
fun = 5 + 6
print(fun)
(6) - What else can be added together? Strings? Lists?
(6.1) - What can't be added together?
Bonus - Can you guess some other operators that might exist? What do they do on integers? And on lists?
The functions we've seen so far are print, type, min, sum, range and list.
Keywords and Syntax
In the sections to follow we will see keywords. These are special words that Python uses to determine how the program is structured.
The structure of a program is referred to as "Syntax". It determines what is a comment, what is a function, what is a variable, a value, and indeed every element of the program.
A "comment" in programming is text in the code that the program doesn't interpret. It can be used to document what is happening, or the programmer's thoughts at the moment they were typing. In Python, a comment starts with `#` and runs to the end of the line.
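A short illustration of comments: Python skips everything from a `#` to the end of the line, so a commented-out line of code never runs. The variable reuses the distance example from earlier purely for illustration.

```python
# This whole line is a comment: Python ignores it.
distance_to_wall = 1.25  # A comment can also follow code on the same line.

# print("This line is commented out, so it never runs")
print(distance_to_wall)  # prints 1.25
```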
We've seen how to define variables and assign values, and do a bit of basic manipulation of those variables.
However, in any real-world program we're going to be interested in performing our own transformations on data, and combining transformations together into a larger overall purpose.
We'll see later on that our main task today will be to investigate movie reviews, which may involve many steps.
Functions are a way of collecting together many steps of logic into a single statement, or "recipe", much like how we think of walking to the shops and buying apples as a "single thing" rather than all the individual steps involved.
To define a function we will use some keywords for the first time. The `def` keyword (short for "function definition") lets us begin what we call a "function block", and the `return` keyword lets us set what we refer to as the "return value" of the function.
Let's take a look at an example:
def seven_up(x):
    y = x + 7
    return y
Group Question 2: How many arguments does the function below have?
def greet_person(name, location):
    result = "Hello " + name + "! How's things in " + location + "?"
    return result
The syntax for specifying a function block in Python is that the function body, i.e. the "steps", are indented. For consistency, we will say that there should be 4 spaces for all of the steps in the function body relative to the position of the "def" keyword. This rule actually applies to ANY block in Python, such as an "if" statement, which we will see later.
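To make the indentation rule concrete, here is a sketch of a hypothetical function, `describe_distance`, that peeks ahead at an "if" block inside a function body. It also uses the built-in `str` function to turn a number into text so it can be added to a string.

```python
def describe_distance(metres):
    # Every step in the function body is indented 4 spaces past "def".
    message = "You are " + str(metres) + " m from the wall."
    if metres < 1:
        # Steps inside the "if" block are indented a further 4 spaces.
        message = message + " That's close!"
    return message  # back at 4 spaces: still part of the function body


print(describe_distance(1.25))  # You are 1.25 m from the wall.
print(describe_distance(0.5))   # You are 0.5 m from the wall. That's close!
```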
Group Question 3 - What are the steps in the "foo" function below?
def foo(x):
    w = 1
    y = x + (7 * w)
    print(y)
    w = 9
(7) - Evaluate the "seven_up" and "greet_person" functions with different values in your notebook.
(7.1) - Write a new function, "always_five", that always returns the value 5.
(7.2) - Delete the "return result" line from the "greet_person" function and try to evaluate it again, as you did in exercise 7. What is different?
Objects and Functions on them
We often think of Python as an Object-oriented programming language.
An "Object" is a way of representing something about the world. Examples of things we might like to represent are:
- Our friends
- Places we've been
As usual, let's investigate by example. Consider the following string:
friends = "Jean Girard,Ricky Bobby,Cal Naughton Jr.,Susan"
We'd like to count the number of friends in this string. One solution would be to convert this string into a list, and count the elements in the list.
The `split` function can be used here. It is defined on the string itself, so we call it with a dot after the variable name:
list_of_friends = friends.split(",")
(8) - Run the code above, and count the number of friends in the list.
(8.1) - Try the "replace" function on a string.
(8.2) - Try the "upper" and "lower" functions on strings.
(8.3) - Challenge: Use the "join" function on a string to go from the "list_of_friends" variable back to the string where the friends are separated by commas. This can be done in one line.
When we think of a list we are generally interested in such questions as:
- What is the first thing in this list?
- What is the last thing?
and we may also be interested in such tasks as:
- Give me the first 3 elements of this list!
Python has a powerful mechanism for these tasks: List Slicing. These techniques allow us to access elements of a list. Let's see by example:
things = ["Plant", "Computer", "Book", "Lamp"]
(9) - Try the following statements with the variable above:
- print(things)
- print(things[1:3])
- print(things[-1])
What do you get?
(9.1) - By reading the slicing documentation, can you write a slice expression that gives back only the first 3 items? Can you do it in two different ways?
(9.2) - We can use this notation to set elements of a list, just as we did with variables earlier. Try updating the first element to be your favourite meal.
(9.3) - Challenge - Can you treat a string as a list? What works? What doesn't?
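If the slice results above are surprising, this sketch on a different, made-up list shows the two rules behind them: positions start counting at 0, and a slice includes its start position but not its stop position.

```python
days = ["Mon", "Tue", "Wed", "Thu", "Fri"]

print(days[0])    # Mon            (the first element is at position 0)
print(days[-1])   # Fri            (negative positions count from the end)
print(days[1:3])  # ['Tue', 'Wed'] (start 1 is included, stop 3 is not)
print(days[:2])   # ['Mon', 'Tue'] (leaving out the start means "from the beginning")
```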
So far we've seen how to define functions and variables, but we're yet to make any kind of decision based on their values!
Control structures let us change our program flow based on the values.
Let's see an example of an "if/else" statement, along with the `==` operator (named "equals").
quest = "..."
if "Learn Python" == quest:
    print("You may pass!")
else:
    print("*You get thrown off the bridge and into the ravine.*")
(10) - Run the code above. What happens? Can you put in a value that lets you pass?
(10.1) - What is the type of: "Learn Python" == quest
We can use any of the following operators for making decisions:
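| Operator | Meaning |
|---|---|
| == | Equal To |
| != | Not Equal To |
| > | Greater Than |
| < | Less Than |
| >= | Greater Than Or Equal To |
| <= | Less Than Or Equal To |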
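Each of these operators produces either True or False, which is exactly what an "if" statement needs. A few made-up comparisons as a sketch:

```python
not_equal = 3 != 4                 # True
at_least = 10 >= 10                # True: "or equal to" includes 10 itself
alphabetical = "apple" < "banana"  # True: strings compare letter by letter
too_small = 2 > 5                  # False

print(not_equal, at_least, alphabetical, too_small)  # True True True False
```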
(11) - Try out the above operators on integers and strings. Which ones can you use to compare an integer to a string?
(11.1) - Are upper case letters smaller than, equal to, or larger than lower case letters?
(11.2) - Does the order of elements matter when you compare if two lists are equal? What about tuples?
Oftentimes we will want to run a certain step several times, or even indefinitely!
One of the standard tools is the `for` loop, which lets us perform a step a given number of times. Let's see an example:
for k in range(0, 10):
    print(k)
This introduces two new keywords: `for` and `in`.
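A loop becomes more useful when each pass updates something. As a small sketch, this adds up the numbers from 1 to 4 by updating a running total (the variable name `total` is just an illustrative choice):

```python
total = 0
for k in range(1, 5):
    # This indented step runs once for each k: 1, 2, 3 and 4
    total = total + k

print(total)  # 10
```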
(12) - Using what we've learned above, make a for loop over the numbers from 1 to 10 that:
- prints "hello" if the number is less than 5
- prints "goodbye!" otherwise
(12.1) - Define a list of nearby things, and use a for loop to print only those things that start with the letter "d". Hint: You may like to look up functions that are available on strings, much as we did earlier.
Group Question 4 - Have you encountered a real-life version of a for loop? What is it? Take a few minutes to write a "pseudo-code" version of a for loop that solves this task. Example:
for movie in all_movies_ever:
    if is_great(movie):
        watch(movie)
        discuss(movie, with=friends)
In this section we'll see our first Python library. But before we get into that, we first need to take a look at how libraries work!
A library in Python is a collection of functions that let you do interesting things without writing all of the code yourself.
You import an entire library in order to use it in your code. You can give it an alias,
# Import the pandas library and call it 'pd'
import pandas as pd
or just use its name:
# Import the sys library (as sys)
import sys
You can also import only the functions you want to use:
# Import the 'ceil' function from the 'math' library
from math import ceil
How you import libraries affects the way you use them in your code.
The `ceil` function in the `math` library rounds a number up to the nearest whole number. Let's take a look at some examples:
import math as ma
ma.ceil(9.7)

import math
math.ceil(9.7)

from math import ceil
ceil(9.7)
Python comes with some libraries already installed; the 'math' library is one of them. These pre-installed libraries are called "The Python Standard Library". One of the main benefits of Python is that there is a large and well-supported community of handy libraries that you can install and use for free!
Today we will be learning about the pandas library. Pandas lets us play with data!
The pandas library
In your notebook environments we have pre-installed pandas for you.
(13) - Import pandas into your notebook and give it 'pd' as an alias.
Worked Example: Movie Reviews
We found some data about movies; these files are located in the `data` folder on your instances.
We can read each of these files into what we call a `DataFrame` and play with the data. A `DataFrame` is an object, and as such it has many functions we can use to investigate it and have fun!
(14) - Use the read_csv function from pandas to read the file "data/movies.csv", and assign the result to a variable, "movies".
(14.1) - Look at the first 5 rows of your data using the "head" function on "movies":
movies.head(5)
What are the columns in this data?
(14.2) - Looking at the values, what do you think the "type" of the data in each column is? Write your guesses down.
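If you'd like to see read_csv and head at work before touching the real file, here is a sketch that reads a tiny made-up CSV from a string instead of "data/movies.csv". The column names (movieId, title, genres) are an assumption for illustration and may not match the real file exactly.

```python
import io

import pandas as pd

# Made-up stand-in for a CSV file, so the example runs anywhere
csv_text = """movieId,title,genres
1,Movie A (1999),Comedy
2,Movie B (2005),Drama|Romance
3,Movie C (2012),Action
"""

# read_csv accepts a file path or any file-like object, such as StringIO
sample_movies = pd.read_csv(io.StringIO(csv_text))

print(sample_movies.head(2))         # the first 2 rows
print(list(sample_movies.columns))   # ['movieId', 'title', 'genres']
```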
On Indices
An index is used by pandas to label each row, ideally with a unique identifier. If none is specified, pandas automatically creates one based on the row number (0, 1, 2, ...). We can also use several columns together as the index; this is called a multi-index, and it identifies each row uniquely as long as every combination of values is unique.
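A sketch of a multi-index on made-up ratings data: a user can rate many movies and a movie can be rated by many users, so neither column alone identifies a row, but each (userId, movieId) pair does. The column names are assumptions for illustration.

```python
import pandas as pd

sample = pd.DataFrame({
    "userId": [1, 1, 2],
    "movieId": [10, 20, 10],
    "rating": [4.0, 3.5, 5.0],
})

# Without set_index, pandas uses the row number as the index
print(sample.index.tolist())  # [0, 1, 2]

# Use the two columns together as a multi-index
by_user_movie = sample.set_index(["userId", "movieId"])
print(by_user_movie.index.is_unique)  # True

# Look up one row by its (userId, movieId) pair
print(by_user_movie.loc[(2, 10), "rating"])  # 5.0
```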
Group Question 5 - Can you think of a common multi-index that you use in your everyday life?
We can access attributes of the DataFrame to learn things about it.
For example, if I want to know the column names of my DataFrame I can do the following:
movies.columns
(15) - Investigate the following attributes:
- shape
- dtypes
We can also investigate a single column of our DataFrame. A column, together with the index, is called a "Series".
Let's investigate our data a bit more closely, and look at the "title" column in our `DataFrame` as follows:
movies['title']
This results in what pandas calls a Series.
On Series and DataFrames
Pandas distinguishes between "Series" and "DataFrames". This becomes important when thinking about which functions we can perform on them. Intuitively, we can think of a Series as a single column of a DataFrame that is still accessible via the index.
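A sketch of the difference on a tiny made-up table: selecting one column name gives a Series (which keeps the index), while selecting a list of column names gives a DataFrame, even if the list has only one name in it.

```python
import pandas as pd

sample_movies = pd.DataFrame(
    {"title": ["Movie A", "Movie B"], "genres": ["Comedy", "Drama"]},
    index=[27, 28],
)

titles = sample_movies["title"]          # one column name -> a Series
print(type(titles).__name__)             # Series
print(titles[27])                        # Movie A (looked up via the index)

title_table = sample_movies[["title"]]   # a list of column names -> a DataFrame
print(type(title_table).__name__)        # DataFrame
```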
We can locate data in our DataFrame using the `loc` attribute. To access a particular row, or more precisely, the row indexed by 27, we would write:
movies.loc[27]
We can also access a particular set of rows based on a value in the column:
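One common way to do this is a "boolean mask". The sketch below uses made-up ratings data (the column names are assumptions): comparing a column to a value gives True or False for every row, and `loc` then keeps only the rows marked True.

```python
import pandas as pd

# Made-up ratings, used only for illustration
sample_ratings = pd.DataFrame({
    "userId": [7, 7, 9, 9],
    "movieId": [1, 2, 1, 3],
    "rating": [4.0, 2.5, 5.0, 3.0],
})

# Comparing a column to a value gives a Series of True/False, one per row...
is_movie_1 = sample_ratings["movieId"] == 1
print(is_movie_1.tolist())  # [True, False, True, False]

# ...and passing that Series to .loc keeps only the rows where it is True.
movie_1_ratings = sample_ratings.loc[is_movie_1]
print(movie_1_ratings)
```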
(16) - Read in the file "ratings.csv" in the data folder.
(16.1) - How many ratings are in this file?
(16.2) - Find all the ratings for userId == 14.
(16.3) - Assuming 5 is the best score, what is the title of their best-rated movie?
(16.4) - What are the genres of this movie?
(16.5) - Challenge: Use the "sort_values" function on your ratings DataFrame to sort it according to the "rating" column. Can you sort from largest to smallest, i.e. descending?
(Hint: look up "sort_values" in the pandas documentation.)
Pandas makes it really easy to add columns to our DataFrame. Suppose we are not fans of this 5-star rating; instead, we want to see a score out of 100 in a column named "score":
ratings['score'] = 100 * ratings['rating']/5
We can also find out how many different scores were assigned by using the `unique` function on the "score" Series.
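A sketch of the score conversion and `unique` on a few made-up ratings. `unique` returns each distinct value once, in the order it first appears; the related `nunique` function counts them directly.

```python
import pandas as pd

sample_ratings = pd.DataFrame({"rating": [4.0, 3.5, 4.0, 5.0]})

# Same conversion as above: a 5-star rating becomes a score out of 100
sample_ratings["score"] = 100 * sample_ratings["rating"] / 5

distinct_scores = sample_ratings["score"].unique()  # 80, 70 and 100
print(distinct_scores)
print(sample_ratings["score"].nunique())  # 3
```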
(17) - Find the unique scores by doing the following:
- Follow the code above to give your ratings data a "score" column
- Access the "score" column as a Series, as we did to access movie titles (movies['title'])
- Use the "unique" function to get all unique scores. How many are there?
(18) - Challenge - Merging DataFrames
As it stands, in the "ratings" DataFrame we only see the movieId, not the name of the movie. If we merge in the "movies" DataFrame, we can see the ratings together with the movie name and other data. Use the "merge" function on a DataFrame to merge "movies" into "ratings".
(18.1) - Extra Challenge - Can you find a way to merge only the movie name, and not the other columns?
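A sketch of `merge` on two tiny made-up tables that share a movieId column (the column names are assumptions for illustration). Passing `on="movieId"` tells pandas which column to match rows on, and `how="left"` keeps every rating, in its original order, even if a movie were missing from the movie table.

```python
import pandas as pd

sample_movies = pd.DataFrame({
    "movieId": [1, 2],
    "title": ["Movie A", "Movie B"],
    "genres": ["Comedy", "Drama"],
})
sample_ratings = pd.DataFrame({
    "userId": [7, 7, 9],
    "movieId": [2, 1, 2],
    "rating": [4.5, 3.0, 5.0],
})

# Match rows on the shared "movieId" column; how="left" keeps every rating
with_titles = sample_ratings.merge(sample_movies, on="movieId", how="left")
print(with_titles)
```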
Exercise (19) - Joint Exercises
Later in the Tech Camp you will learn how to use APIs. IMDb has an API that is accessible via the ImdbPie library. This can be used to get the poster for the movies in this data set. If you feel comfortable with APIs, come back to this exercise and use the "imdbId" field from the "movies" data set to find the poster for a movie and display it in the notebook!
Hint: Here are two functions that will be handy:
def to_imdb_format(imdbId):
    # Zero-pad the numeric id to 7 digits and prefix "tt", e.g. 114709 -> "tt0114709"
    return "tt" + (("0" * 7) + str(imdbId))[-7:]

def poster_image(imdbId):
    # Assumes an ImdbPie client named `imdb` already exists in the notebook;
    # get_title_by_id comes from older ImdbPie releases, so check your installed version.
    from IPython.display import Image
    id_ = to_imdb_format(imdbId)
    title = imdb.get_title_by_id(id_)
    return Image(url=title.poster_url, width=200, height=200)
(20) - Ambitious - Finding Friends To Watch Movies With
Write some code that, using the "ratings" DataFrame, finds users that rated the same movie highly (a 5, say), and in this way builds a new DataFrame, "friends", with the following columns: friendId1, friendId2
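One possible approach, sketched on made-up data rather than the real ratings file: keep only the 5-star ratings, merge that table with itself on movieId so that every pair of users who gave the same movie 5 stars lands on one row, then keep each pair of different users once. The column names userId, movieId and rating are assumed to match the ratings data; the suffixes are an illustrative choice.

```python
import pandas as pd

sample_ratings = pd.DataFrame({
    "userId": [1, 2, 3, 1, 3],
    "movieId": [10, 10, 10, 20, 20],
    "rating": [5.0, 5.0, 2.0, 5.0, 5.0],
})

# 1. Keep only the high (5-star) ratings
top = sample_ratings.loc[sample_ratings["rating"] == 5, ["userId", "movieId"]]

# 2. Merge the table with itself on movieId: each row pairs two users
#    who both gave that movie 5 stars (including a user paired with themself)
pairs = top.merge(top, on="movieId", suffixes=("_1", "_2"))

# 3. Drop self-pairs, and keep (1, 2) but not its mirror (2, 1)
pairs = pairs.loc[pairs["userId_1"] < pairs["userId_2"]]

friends = (
    pairs.rename(columns={"userId_1": "friendId1", "userId_2": "friendId2"})
    [["friendId1", "friendId2"]]
    .drop_duplicates()
    .reset_index(drop=True)
)
print(friends)
```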
- Experiment with what you've learned!
- We can help install Python on your computer