This tutorial will cover defining and using your own functions, and reducing duplication in your code.
It is based on:
pynoon_starter_3.ipynb
We can use the def keyword to define our own functions. The lines after the def line that are indented make up the body of the function that is executed when the function is called.
def print_greeting():
    print('Hello World!')
    print('How are you?')
Defining the function does not execute the code inside it, but we can call the function just like any other function:
print_greeting()
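As a small illustration of the point that defining a function does not run its body, the sketch below records what happens before and after a call. The names record_greeting and calls are hypothetical and not part of the starter notebook.

```python
# Hypothetical example: `calls` and `record_greeting` are illustrative names.
calls = []

def record_greeting():
    # This line only runs when the function is called.
    calls.append('Hello World!')

calls_after_definition = list(calls)  # snapshot taken before any call
record_greeting()
calls_after_call = list(calls)  # snapshot taken after one call

print(calls_after_definition)
print(calls_after_call)
```

The first snapshot is empty because the body has not run yet; the second contains the greeting added by the call.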
def print_greeting(name):
    print(f'Hello {name}!')
    print('How are you?')

print_greeting('Cooper')
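A function can send a value back to its caller with the return statement. The function stops executing as soon as it reaches a return statement. If a function does not return a value, it implicitly returns the value None.
def shorten_description(description, max_length):
    if len(description) > max_length:
        short_description = description[:max_length] + '...'
        return short_description
    return description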
short_description = shorten_description('This is a very long description', 10)
short_description
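To make the implicit None concrete, here is a minimal sketch comparing a function that only prints with one that returns. make_greeting is a hypothetical helper introduced only for this comparison.

```python
def print_greeting(name):
    # No return statement, so the function implicitly returns None.
    print(f'Hello {name}!')

def make_greeting(name):
    # Hypothetical helper: returns the greeting instead of printing it.
    return f'Hello {name}!'

printed_result = print_greeting('Cooper')
returned_result = make_greeting('Cooper')

print(printed_result is None)
print(returned_result)
```

Only the value from make_greeting can be stored and reused later; the result of print_greeting is just None.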
Let's check that shorten_description returns the description unchanged if its length is equal to the max_length:
shorten_description('12345', 5) == '12345'
The assert statement will raise an error if the Boolean expression given to it evaluates to False:
assert shorten_description('12345', 5) == '12345'
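The behaviour of assert can be sketched in isolation. The arithmetic checks and the message text below are made up for illustration; the error is caught here only so the example can show what was raised.

```python
# A passing assertion does nothing.
assert 1 + 1 == 2

# A failing assertion raises AssertionError. The optional message after
# the comma is attached to the error.
try:
    assert 1 + 1 == 3, 'arithmetic check failed'
except AssertionError as error:
    error_message = str(error)

print(error_message)
```

In a notebook you would normally leave the assert uncaught, so that a failing check stops the cell and shows the error.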
Now let’s test that the description is limited to the given max_length:
assert shorten_description('123456789', 5) == '12...'
Hmmm, why did that fail?
shorten_description('123456789', 5)
Aha! We need to take into account the length of the ellipsis:
def shorten_description(description, max_length):
    if len(description) > max_length:
        ellipsis = '...'
        return description[:(max_length - len(ellipsis))] + ellipsis
    return description
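With the ellipsis length subtracted, the earlier checks can be re-run. The sketch below repeats the function so it is self-contained, and adds a loop over a few max_length values as an extra, assumed check that is not part of the original notebook.

```python
def shorten_description(description, max_length):
    if len(description) > max_length:
        ellipsis = '...'
        return description[:(max_length - len(ellipsis))] + ellipsis
    return description

# The two checks from the tutorial now both pass.
assert shorten_description('12345', 5) == '12345'
assert shorten_description('123456789', 5) == '12...'

# Extra check: the result never exceeds max_length for these values.
for max_length in range(3, 12):
    shortened = shorten_description('123456789', max_length)
    assert len(shortened) <= max_length
```

The loop starts at 3 because this version assumes max_length is at least the length of the ellipsis; smaller values would produce a negative slice index and a result longer than max_length.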
name = 'Ben'
print(f'Hello {name}')
print(f'Hello {name.upper()}')
import pandas as pd
listings_df = pd.read_csv('https://pynoon.github.io/data/inside_airbnb_listings_nz_2023_09.csv')
listings_df
To transform a listing ID into a URL, we can do the following:
id = 'l11909616'
f'https://www.airbnb.co.nz/rooms/{id[1:]}'
Let’s define a function to transform a listing ID into a URL:
def id_to_url(id):
    return f'https://www.airbnb.co.nz/rooms/{id[1:]}'
id_to_url('l11909616')
Calling .apply(id_to_url)
on a single column Series
passes each item in the Series to the function and returns a new Series
where each value is the corresponding value returned by the function. We
can then assign the resulting Series into a new url
column:
listings_df['url'] = listings_df['id'].apply(id_to_url)
listings_df
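To see what .apply() does on a Series without downloading the full dataset, here is a minimal sketch on a tiny hand-made DataFrame. The three IDs are invented for illustration and are not real listings.

```python
import pandas as pd

def id_to_url(id):
    # Drop the leading 'l' and build the listing URL.
    return f'https://www.airbnb.co.nz/rooms/{id[1:]}'

# Hypothetical IDs, made up for this example.
example_df = pd.DataFrame({'id': ['l100', 'l200', 'l300']})

# Each item in the 'id' Series is passed to id_to_url in turn.
example_df['url'] = example_df['id'].apply(id_to_url)

print(example_df['url'].tolist())
```

Note that the function itself is passed to .apply() without parentheses; pandas calls it once per item.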
We can also use .apply() with axis='columns' on an entire DataFrame to pass an entire row at a time to the function:
def listing_to_description(row):
    room_type = row['room_type']
    host_name = row['host_name']
    return f'{room_type} by {host_name}'
listings_df['description'] = listings_df.apply(listing_to_description, axis='columns')
listings_df
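The row-wise form of .apply() can be tried on a small stand-in DataFrame. The column names match the tutorial, but the two rows below are invented for illustration.

```python
import pandas as pd

def listing_to_description(row):
    # `row` is a Series holding one row's values, indexed by column name.
    room_type = row['room_type']
    host_name = row['host_name']
    return f'{room_type} by {host_name}'

# Hypothetical rows, made up for this example.
example_df = pd.DataFrame({
    'room_type': ['Entire home/apt', 'Private room'],
    'host_name': ['Aroha', 'Sam'],
})

# axis='columns' passes one row at a time to the function.
example_df['description'] = example_df.apply(listing_to_description, axis='columns')

print(example_df['description'].tolist())
```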
akl_listings_df = listings_df[listings_df['region_parent_name'] == 'Auckland']
akl_average_price = akl_listings_df['price_nzd'].median()
akl_above_average_price_df = akl_listings_df[akl_listings_df['price_nzd'] > akl_average_price]
display(akl_above_average_price_df)
wlg_listings_df = listings_df[listings_df['region_parent_name'] == 'Wellington City']
wlg_average_rating = wlg_listings_df['review_scores_rating'].median()
wlg_above_average_rating_df = wlg_listings_df[wlg_listings_df['review_scores_rating'] > wlg_average_rating]
display(wlg_above_average_rating_df)
def get_above_average_listings_df(listings_df, comparison_column):
    """Returns the subset of the given listings_df that is above average
    according to the given comparison_column."""
    average_value = listings_df[comparison_column].median()
    return listings_df[listings_df[comparison_column] > average_value]
akl_above_average_price_df = get_above_average_listings_df(
    listings_df=listings_df[listings_df['region_parent_name'] == 'Auckland'],
    comparison_column='price_nzd',
)
wlg_above_average_rating_df = get_above_average_listings_df(
    listings_df=listings_df[listings_df['region_parent_name'] == 'Wellington City'],
    comparison_column='review_scores_rating',
)
Rather than accepting a parent_region_name argument, we instead accept a listings_df argument.
The listings_df local variable inside the function is separate to the listings_df global variable we have been using outside the function.
Accepting a listings_df is more versatile, because we are not restricted to just filtering by region: we can pass in any filtered (or even unfiltered) DataFrame of listings.
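As a rough sketch of that versatility, the example below calls the same function on an unfiltered DataFrame and on one filtered by price rather than by region. The four rows and the names all_listings_df and cheaper_listings_df are invented for illustration.

```python
import pandas as pd

def get_above_average_listings_df(listings_df, comparison_column):
    """Returns the subset of the given listings_df that is above average
    according to the given comparison_column."""
    average_value = listings_df[comparison_column].median()
    return listings_df[listings_df[comparison_column] > average_value]

# Hypothetical data, made up for this example.
all_listings_df = pd.DataFrame({
    'region_parent_name': ['Auckland', 'Auckland', 'Wellington City', 'Wellington City'],
    'price_nzd': [100, 200, 300, 400],
})

# Unfiltered: the median price is 250, so two rows are above it.
above_average_all_df = get_above_average_listings_df(all_listings_df, 'price_nzd')

# Filtered by something other than region: the median price is now 200.
cheaper_listings_df = all_listings_df[all_listings_df['price_nzd'] < 350]
above_average_cheaper_df = get_above_average_listings_df(cheaper_listings_df, 'price_nzd')

print(above_average_all_df['price_nzd'].tolist())
print(above_average_cheaper_df['price_nzd'].tolist())
print(len(all_listings_df))  # the DataFrame outside the function is unchanged
```

Inside the function, listings_df refers to whichever DataFrame was passed in, and filtering it returns a new DataFrame, so the DataFrame outside the function keeps all of its rows.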