VALA Tech Camp - Introduction To Python

A workshop intended to introduce Python to people who are new to programming!

Objectives

Concepts

  1. What is Python and what can be done with it?
  2. What is special about Python as a programming language?
  3. What are Python libraries?

Outcomes

  1. I can use Python
  2. I can read Python documentation
  3. I can use Pandas to
    1. Read and write data
    2. Select particular data
    3. Investigate data
    4. Add simple information (columns) to the data
    5. Link data*
    6. Add complex information (from different locations) to the data*

Table of Contents

Section             Description                            Duration
Resources           Resources available and required       5m
Welcome             Welcome!                               5m
Getting Acquainted  Variables, types, basic functions      30m
Getting Cosy        More functions, control flow           30m
Pandas              Pandas: Making sense of data           30m
Funtimes            Experiment with what you've learned!   15m
Thanks              Thanks and goodbye                     5m

Required Resources

We will be using previously set-up computers in the cloud. You have been given a personal URL to use throughout the session.

At the end of the session we can help you set up Python on your computers.

Other Useful Resources

Python

Cool Reads

  • Silverpond Blog Noon works there :) All things Deep Learning and AI

  • Thick! Great blog, also great newsletter!

  • FiveThirtyEight Super cool use of math, stats, programming and data-viz in opinion articles.

Other


Welcome

Welcome to the Python Workshop!

About Your Instructors

Gala & Noon

About You?

  • Interest in Python?
  • Goals?

Why Python?


Getting Acquainted

Wordbank

notebook Value Variable
Type operator tuple
list integer decimal
function print Evaluate
argument String range
keyword syntax
Open the notebook entitled "Fundamentals" on your provided 
server.

Variables, Values & Types

  • Values represent "things" we want to deal with:
    • Names: "Noon", "Gala", ..
    • Ages: 34, 30, ...
  • Variables hold Values and can be passed around
  • Values have Types
  • Types specify the "world" of potential Values

Variables

In Python you can define variables with the = operator.

Example:

my_location = "VALA Tech Camp"

Example:

number_of_people_present = 22

Example:

# (x,y) refers to the desk position in terms of rows and columns.
# We call this a "tuple".
my_position_in_this_room = (1, 2)

Example:

# This is a "list".
favourite_foods = ["Pizza", "Pancakes", "Toasted Cheese Sandwich"]

Example:

# In meters
distance_to_wall = 1.25
(1) - Define these variables in your notebook, and fill in 
your own values.
(2) - Ask your neighbours for their favourite decimal number
and make a list of these, called `favourite_decimals`.

Types

Q: How do you make a great program?
A: You type it!

The type of a variable informs how we can use it.

For example, we can add and divide numbers, but dividing strings doesn't make a lot of sense.

We can count the number of elements in a list, but it doesn't make sense to "count" anything on a single number.

We can learn the type of a variable by using the type function:

suburb = "Carlton"
print(type(suburb))
We've also introduced functions here: the "print" and the "type"
functions. We will see more of these in the next section.
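
As a small sketch of the same idea, the snippet below asks Python for the type of a few more values. The variables `age` and `height_in_metres` are made up for illustration and are not part of the workshop notebook.

```python
# Hypothetical example values, not taken from the workshop notebook.
suburb = "Carlton"
age = 34
height_in_metres = 1.7

# type() reports what kind of value each variable currently holds.
print(type(suburb))
print(type(age))
print(type(height_in_metres))
```

Try changing one of the values (for example, put quotes around the number) and see whether the reported type changes.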
(3) - Figure out the type of the variables defined in 
Exercises 1 and 2.

Bonus (3.1): 

Redefine the variable assigned to your distance to the wall and make
it into an integer: the `int` type in Python. (Reminder: Integers
are whole numbers with no decimal places, e.g. 6, 1983, -123, but 
not -2.1 or 1/2.)

Bonus (3.2): 

Do you expect the "favourite_foods" and "favourite_decimals"
variables to have the same type? Do they?
Group Question 1: 
- Are there some types you've heard of but not seen here? 
- Can you imagine some other types of values?  

Functions and Operators

We've seen how to define variables and assign values. We would now like to perform actions and operations on these values.

Functions are the means by which we can manipulate values. We'll look at some functions that come with Python, and later on we will define our own.

Evaluating Functions

We call a function by writing:

function_name(argument_1, argument_2, ...)

An argument is the input to your function.

Built-In Functions

The following is an example of calling the print function:

print("Hello, World!")

Here, print is the function name, and "Hello, World!" is the first (and only) argument.

(4) - Use the "print", "min", "sum" functions to look at the 
list of "favourite_decimals".

(4.1) - What is the sum of your favourite decimals variable?
      - What is the function name, and what is the argument?

(4.2) - What is the smallest favourite decimal?

A useful function is the range function. It lets you get a sequence of numbers. To see the values, we need to use the list function on the range:

list(range(5, 10))
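
To make the start and stop values a little more concrete, here is a minimal sketch using numbers that do not appear in the exercises. The variable names are only for illustration.

```python
# range(start, stop) counts from start up to, but NOT including, stop.
teens = list(range(13, 20))
print(teens)

# With a single argument, counting starts at 0.
first_three = list(range(3))
print(first_three)
```

Notice that the stop value itself is never part of the result, which is worth keeping in mind for the exercise below.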

Python Built-In function documentation.

(5) - Use the "range" function to print out all the numbers 
from 0 to 10.

(5.1) - Look up the "range" function in the Python 
documentation, and add a new argument so that we only get 
the even numbers in this range.

Bonus: What is the sum of these two lists?

Bonus: What are the lengths of these lists? (Hint: Look in the 
documentation for the appropriate function).
The Python documentation (which we linked to above) is an 
invaluable resource for learning about available functions 
and how to use them. We will use it throughout the session, 
so we will see more on this later.

Built-In Operators

We've summed up an entire list, but we can also just add two numbers together using the + operator:

fun = 5 + 6
print(fun)
(6) - What else can be added together? Strings? Lists?

(6.1) - What can't be added together?

Bonus - Can you guess some other operators that might exist? What do 
they do on integers? And lists?

Function Reference

Below are the functions we've seen:

type list print
sum min range
len

Keywords and Syntax

In the sections to follow we will see keywords. These are special words that Python uses to determine how the program is structured.

The structure of a program is referred to as "Syntax". It determines what is a comment, what is a function, what is a variable, a value, and indeed every element of the program.

A "comment" in programming is a line of code that the program 
doesn't interpret. It can be used to document what is happening, 
or the programmer's thoughts at the moment they were typing.
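
As an illustration, the sketch below mixes comments with ordinary code. The variable names and numbers are invented for this example.

```python
# This whole line is a comment: Python skips it when running the program.
seats_per_row = 6   # A comment can also follow code on the same line.
rows = 4

# Only the code is evaluated; the comments are there for human readers.
total_seats = seats_per_row * rows
print(total_seats)
```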

Getting Cosy

Wordbank

logic def return
arguments function body block
indentation objects elements
split replace upper
lower join list slicing
index control structure if
else for loop
flow logical operator in

We've seen how to define variables and assign values, and do a bit of basic manipulation of those variables.

However, in any real-world program we're going to be interested in performing our own transformations on data, and combining transformations together into a larger overall purpose.

We'll see later on that our main task today will be to investigate movie reviews; and this may involve many steps.

Functions are a way of collecting together many steps of logic into a single statement, or "recipe". Much like how we would think of walking to the shops and buying apples as a "single thing", instead of all the individual steps that that would involve.

User-Defined Functions

To define a function we will use some keywords for the first time.

The def keyword (short for "function definition") lets us begin what we call a "function block", and the return keyword lets us set what we refer to as the "return value" of the function.

Let's take a look at an example:

def seven_up(x):
    y = x + 7
    return y
Group Question 2: How many arguments does the function below have?
def greet_person(name, location):
    result = "Hello " + name + "! How's things in " + location + "?"
    return result
The syntax for specifying a function block in Python is that
the function body, i.e. the "steps", are indented.

For consistency, we will say that there should be 4 spaces
for all of the steps in the function body relative to the
position of the "def" keyword.

This rule actually applies to ANY block in Python, such as
an "if" statement, which we will see later.
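
The indentation rule can be sketched as follows. The function `describe_temperature` is a made-up example, and it borrows the "if" statement and the > operator slightly ahead of where they are introduced.

```python
def describe_temperature(degrees):
    # These lines are indented 4 spaces: they form the function body.
    if degrees > 30:
        # Indented 4 more spaces: this line belongs to the "if" block.
        return "hot"
    return "not hot"

# Back at the left margin: this line is outside the function.
print(describe_temperature(35))
```

Each extra level of indentation marks a block nested inside the one above it.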
Group Question 3 - What are the steps in the "foo" function
below?
def foo(x):
    w = 1
    y = x + (7 * w)
    print(y)
w = 9
(7) - Evaluate the "seven_up" and "greet_person" functions,
with different values, in your notebook.

(7.1) - Write a new function, "always_five" that always returns
the value 5.

(7.2) - Delete the "return result" line from the "greet_person"
function and try and evaluate it again, like you did in step 7.
What is different?

Objects and Functions on them

We often think of Python as an Object-oriented programming language.

An "Object" is a way of representing something about the world. Examples of things we might like to represent are:

  • Houses
  • Our friends
  • Places we've been
  • Words
  • Sentences

We won't go into the details of objects (and classes) here (take a look at this and this for a nice introduction), but we will make use of them.

As usual let's investigate by example. Consider the following string:

friends = "Jean Girard,Ricky Bobby,Cal Naughton Jr.,Susan"

We'd like to count the number of friends in this string. One solution would be to convert this string into a list, and count the elements in the list.

The function split can be used here. This is defined on the string itself:

list_of_friends = friends.split(",")
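
The same pattern, sketched on a separate made-up string so that the exercise below is left for you to try:

```python
# Hypothetical example string, with items separated by semicolons.
colours = "red;green;blue"

# split is called on the string itself, using a dot after the variable name.
list_of_colours = colours.split(";")
print(list_of_colours)
```

The argument to split is the separator to cut on; here it is ";" rather than ",".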
(8) - Run the code above, and count the number of friends
in the list.

(8.1) - Try the "replace" function on a string.

(8.2) - Try the "upper" and "lower" functions on strings.

(8.3) - Challenge: Use the "join" function on a string to go
        from the "list_of_friends" variable back to the
        string where the friends are separated by commas.
        This can be done in one line.

Lists

When we think of a list we are generally interested in such questions as:

  • What is the first thing in this list?
  • What is the last thing?

and we may also be interested in such tasks as:

  • Give me the first 3 elements of this list!

Python has a powerful mechanism for these tasks: List Slicing. These techniques allow us to access elements of a list. Let's see by example:

things = ["Plant", "Computer", "Book", "Lamp"]
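
Before trying the exercise, here is a rough sketch of the same notation on a separate, made-up list of numbers. The variable names are illustrative only.

```python
numbers = [10, 20, 30, 40, 50]  # hypothetical list

second = numbers[1]         # positions are counted from 0, so 1 is the second item
middle = numbers[2:4]       # from position 2 up to, but not including, position 4
second_last = numbers[-2]   # negative positions count back from the end

print(second, middle, second_last)
```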
(9) - Try the following statements with the variable above:

- print(things[0])
- print(things[1:3])
- print(things[-1])

What do you get?

(9.1) - By reading the slicing documentation, can you write a slice
expression that gives back only the first 3 items? Can you do it in
two different ways?

(9.2) - We can use this notation to set elements of a list, just like 
we did with variables earlier. Try updating the first element to be
your favourite meal.

(9.3) - Challenge - Can you treat a string as a list? What works? 
What doesn't?

Control Structures

So far we've seen how to define functions and variables, but we're yet to make any kind of decision based on their values!

Control structures let us change our program flow based on the values.

Let's see an example of an "if/else" statement, along with the == operator (named "equals").

quest = "..."

if "Learn Python" == quest:
    print("You may pass!")
else:
    print("*You get thrown off the bridge and into the ravine.*")
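
Here is another small sketch of the same if/else shape, using invented values. Only one of the two indented blocks runs, depending on whether the comparison holds.

```python
favourite_food = "Pizza"   # hypothetical value; try changing it

if favourite_food == "Pizza":
    order = "One large pizza, please!"
else:
    order = "Just a coffee, thanks."

print(order)
```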
(10) - Run the code above. What happens? Can you put in a 
value that lets you pass?

(10.1) - What is the type of: "Learn Python" == quest

Logical Operators

We can use any of the following operators for making decisions:

Name                       Operator
Equal To                   ==
Not Equal To               !=
Greater Than               >
Less Than                  <
Greater Than Or Equal To   >=
Less Than Or Equal To      <=
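
Each of these operators produces an answer of either True or False. A minimal sketch, using a made-up `age` variable:

```python
age = 34   # hypothetical value

is_thirty_four = age == 34
is_not_thirty = age != 30
is_adult = age >= 18
is_child = age < 18

print(is_thirty_four, is_not_thirty, is_adult, is_child)
```

The result of a comparison can be stored in a variable, just like any other value, and used later in an "if" statement.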
(11) - Try out the above operators on integers and strings.

Which ones can you use to compare an integer to a string?

(11.1) - Are upper case letters smaller than, equal to, or larger
than lower case letters?

(11.2) - Does the order of elements matter when you compare if
two lists are equal? What about tuples?

Looping

Oftentimes we will want to run a certain step several times, or even indefinitely!

One of the standard tools is the for loop, which lets us perform a step a given number of times. Let's see an example:

for k in range(0, 10):
    print(k)

This introduces two new keywords: for and in.
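
A for loop is not limited to ranges; it can also step through the elements of a list. The sketch below shows both, with a running `total` variable that is invented for this example.

```python
favourite_foods = ["Pizza", "Pancakes", "Toasted Cheese Sandwich"]

# The loop variable "food" takes each element of the list in turn.
for food in favourite_foods:
    print("I like", food)

# A loop can also build up a result, one step at a time.
total = 0
for k in range(1, 5):
    total = total + k

print(total)
```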

(12) - Using what we've learned above, make a for loop over 
the numbers from 1 to 10, that:

- prints "hello" if the number is less than 5
- prints "goodbye!" otherwise

(12.1) - Define a list of nearby things, and use a for loop
to print only those things that start with the letter "d".

    Hint: You may like to look up functions that are available on
    strings, much as we did earlier.
Group Question 4 - Have you encountered a real-life version of
a for loop? What is it?

Take a few minutes to write a "pseudo-code" version of a for loop
that solves this task:

Example:

for movie in all_movies_ever:
    if is_great(movie):
        watch(movie)
        discuss(movie, with=friends)

Pandas!

In this section we'll see our first Python Library. But before we get into that, we first need to take a look at how libraries work!

Wordbank

library import as
ceil math alias
DataFrame Series head
dtypes multi-index loc
csv sort_values unique

Libraries

A Library in Python is a collection of functions that let you do interesting things without writing raw code to do it.

  • You import an entire library in order to use it in your code,
    • You can give it an alias,

      # Import the pandas library and call it 'pd'
      import pandas as pd
    • or just use its name

      # Import the sys library (as sys)
      import sys
  • You can import only the functions you want to use

    # Import the 'ceil' function from the 'math' library
    from math import ceil

How you import libraries affects the way you use them in your code.

The ceil function in the math library lets you find the value of rounding up a number. Let's take a look at some examples:

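# Import the library with an alias
import math as ma
ma.ceil(9.7)

# Import the library under its own name
import math
math.ceil(9.7)

# Import only the function we need
from math import ceil
ceil(9.7)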
Python comes with some libraries already installed. 

The 'math' library is one of them. These libraries that come
pre-installed are called "The Python Standard Library".

One of the main benefits of Python is its large, well-supported
community, which maintains many handy libraries that you can install
and use for free! 

The Python Standard Library

Today we will be learning about the pandas library. Pandas lets us play with data!

The pandas library

In your notebook environments we have pre-installed pandas
for you.

(13) - Import pandas into your notebook and give it 'pd' as an alias. 

Worked Example: Movie Reviews

We found some data about movies. These files are located under the folder data on your instances.

We can read each of these files into what we call a DataFrame and play with the data. This DataFrame is an object, and as such, it has many functions we can use to investigate it and have fun!

(14) - Use the read_csv function from pandas to read the file 
"data/movies.csv", and assign the result to a variable, "movies". 

(14.1) - Look at the first 5 rows of your data using the "head" 
function on "movies"

    movies.head(5)

What are the columns in this data?

(14.2) Looking at the values, what do you think the "type" of the
data in each column is? Write your guesses down.
On Indices

An index is used by pandas to provide a unique identifier for 
each row. If none is specified, Pandas will automatically insert 
one based on the row number.

We can set multiple columns to define the index as long as the
combination of values is unique. This is called a multi-index.
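
As a rough sketch of a multi-index, the example below builds a tiny made-up table of desk positions (it is not one of the workshop files). Neither the row number nor the column number is unique on its own, but the pair is.

```python
import pandas as pd

# Hypothetical data: a desk is only identified once we know both row and column.
seats = pd.DataFrame({
    "row": [1, 1, 2, 2],
    "column": [1, 2, 1, 2],
    "person": ["Noon", "Gala", "Ricky", "Susan"],
})

# Use the combination of the two columns as the index (a multi-index).
seats = seats.set_index(["row", "column"])

# Look up who sits in row 1, column 2.
print(seats.loc[(1, 2), "person"])
```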
Group Question 5 - Can you think of a common multi-index that
you use in your everyday life?

We can access attributes of the DataFrame to learn things about it.

For example if I want to know the column names of my dataframe I can do the following:

movies.columns
(15) - Investigate the following attributes:
- shape
- dtypes

We can investigate a full column in our DataFrame. The column, when referenced by the index, is called a "Series".

Let's investigate our data a bit more closely, and look at the "title" column in our DataFrame as follows:

movies['title']

This results in what Pandas calls a Series.

On Series and DataFrames

Pandas makes a differentiation between "Series" and "DataFrames".
This becomes important when thinking about what functions we can
perform on them.

Intuitively, we will think of a Series as a single column in a
DataFrame which is still accessible via the index.
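
The difference can be sketched with a small stand-in table. The `films` DataFrame and its contents are invented for illustration and are not the workshop's movies data.

```python
import pandas as pd

# Hypothetical stand-in for the movies data.
films = pd.DataFrame({
    "title": ["Film A", "Film B"],
    "year": [1999, 2016],
})

# Selecting one column gives back a Series rather than a DataFrame.
titles = films["title"]

print(type(films))
print(type(titles))

# The Series keeps the index of the DataFrame, so we can still look up by it.
print(titles[1])
```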

We can locate data in our DataFrame using the attribute loc. To access a particular row, or more precisely, the row indexed by 27, we would write,

movies.loc[27]

We can also access a particular set of rows based on a value in the column:

movies.loc[movies['title']=="Ghostbusters (2016)"]
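
It may help to see the two steps of this kind of selection separately. The sketch below uses a small invented `films` table: the comparison first produces a True/False value for every row, and loc then keeps only the rows marked True.

```python
import pandas as pd

# Hypothetical stand-in for the movies data.
films = pd.DataFrame({
    "title": ["Film A", "Film B", "Film C"],
    "year": [1999, 2016, 2016],
})

# Step 1: compare every value in the column, giving one True/False per row.
is_from_2016 = films["year"] == 2016
print(is_from_2016)

# Step 2: keep only the rows where the comparison was True.
films_from_2016 = films.loc[is_from_2016]
print(films_from_2016)
```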
(16) - Read in the file "ratings.csv" in the data folder. 

(16.1) - How many ratings are in this file?

(16.2) - Find all the ratings for userId == 14

(16.3) - Assuming 5 is the best score, what is the title of their
 best rated movie?

(16.4) - What are the genres of this movie?

(16.5) - Challenge: Use the "sort_values" function on your ratings 
DataFrame and sort them according to the "rating" column. Can you
sort them from largest to smallest, i.e. descending?

(Hint: You can find documentation here.)

Pandas makes it really easy for us to add columns to our DataFrame. Suppose we are not fans of this 5-star rating. Instead, we want to see a score out of 100, and we want the name of the column to be "score":

ratings['score'] = 100 * ratings['rating']/5
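
To see what this does row by row, here is a minimal sketch on a tiny made-up table. The name `reviews` and its three ratings are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical stand-in for the ratings data.
reviews = pd.DataFrame({"rating": [5.0, 2.5, 4.0]})

# The calculation is applied to every row of the column at once,
# and the results are stored in a new column called "score".
reviews["score"] = 100 * reviews["rating"] / 5

print(reviews)
```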

We can also find out how many unique scores we assigned, by using the unique function on the score Series.

(17) - Find the unique scores by doing the following:

    - Follow the code above to assign your ratings data a "score" column

    - Access the "score" column as a series, like we did to access
      movie titles (movies['title'])

    - Use the "unique" function to get all unique scores.
      How many are there?
(18) - Challenge - Merging DataFrames

As it stands, in the "ratings" DataFrame we only see the movieId, and 
not the name of the movie. If we were able to merge in the "movies" 
DataFrame, then we could see the ratings together with the movie name 
and other data.

Use the "merge" function on a DataFrame to merge "movies" into 
"ratings".

(18.1) - Extra Challenge - Can you find a way to only merge the 
movie name, and not the other columns?
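
If you get stuck, the following sketch shows the general idea of a merge on two small invented tables (`people` and `orders`, which are not part of the workshop data). It assumes both tables share a column with the same name.

```python
import pandas as pd

# Two hypothetical tables that share the "personId" column.
people = pd.DataFrame({"personId": [1, 2], "name": ["Noon", "Gala"]})
orders = pd.DataFrame({
    "personId": [2, 1, 2],
    "food": ["Pizza", "Pancakes", "Toast"],
})

# Match up rows using the column that both tables have in common.
orders_with_names = orders.merge(people, on="personId")

print(orders_with_names)
```

Every order now carries the matching name alongside it. The same approach should carry over to "ratings" and "movies" once you have found the column they share.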
Exercise - (19) - Joint Exercises

Later in the Tech Camp you will learn how to use APIs. IMDB has an API
that is accessible via the library ImdbPie.

This can be used to get the movie poster for the movies in this data 
set.

If you feel comfortable with APIs, come back to this exercise and use
the "imdbId" field from the "movies" data set to find the movie poster
for that movie and display it in the notebook!

Hint: Here are two functions that will be handy:

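# These functions assume that "imdb" is an ImdbPie client you have
# already created (see the ImdbPie documentation for how to set one up).

def to_imdb_format(imdbId):
    # Pad the number with leading zeros to 7 digits and put "tt" in front.
    return "tt" + (("0" * 7) + str(imdbId))[-7:]

def poster_image(imdbId):
    from IPython.display import Image
    id_ = to_imdb_format(imdbId)
    title = imdb.get_title_by_id(id_)
    return Image(url=title.poster_url, width=200, height=200)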
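
The first helper can be tried on its own, without any API access. The sketch below repeats its definition so that it runs by itself; the two id numbers are arbitrary examples.

```python
def to_imdb_format(imdbId):
    # Pad the number with leading zeros to 7 digits and put "tt" in front.
    return "tt" + (("0" * 7) + str(imdbId))[-7:]

print(to_imdb_format(123))      # prints tt0000123
print(to_imdb_format(1234567))  # prints tt1234567
```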
(20) - Ambitious - Finding Friends To Watch Movies With

Write some code that, using the "ratings" DataFrame, finds
users that rated the same movie highly (a 5, say) and then
in this way builds a new DataFrame, "friends", with the following
columns:

    friendId1, friendId2

Funtimes!

  • Experiment with what you've learned!
  • We can help install Python on your computer
  • Questions