Loops¶
Objective¶
- Explain what for loops are normally used for.
- Trace the execution of a simple (unnested) loop and correctly state the values of variables in each iteration.
- Write for loops that use the Accumulator pattern to aggregate values.
A for loop executes commands once for each value in a collection.¶
- Doing calculations on the values in a list one by one is as painful as working with pressure_001, pressure_002, etc.
- A for loop tells Python to execute some statements once for each value in a list, a character string, or some other collection.
- "for each thing in this group, do these operations"
In [1]:
for number in [2, 3, 5]:
    print(number)
2
3
5
The first line of the for loop must end with a colon, and the body must be indented.
A for loop is made up of a collection, a loop variable, and a body.¶
- The collection, [2, 3, 5], is what the loop is being run on.
- The body, print(number), specifies what to do for each value in the collection.
- The loop variable, number, is what changes for each iteration of the loop.
  - The "current thing".
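As a minimal sketch of these three parts, the loop below runs over a character string instead of a list. The names `letter` and `seen` are hypothetical and chosen only for illustration.

```python
# Collection: the string 'abc'. Loop variable: letter. Body: the two indented lines.
seen = []
for letter in 'abc':
    seen.append(letter)  # record the "current thing"
    print(letter)
```

After the loop finishes, `seen` holds each character in the order the loop visited it.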
Loop variables can be called anything.¶
- As with all variables, loop variables are:
- Created on demand.
- Meaningless: their names can be anything at all.
for kitten in [2, 3, 5]:
    print(kitten)
Use range to iterate over a sequence of numbers.¶
- The built-in function range produces a sequence of numbers.
  - Not a list: the numbers are produced on demand to make looping over large ranges more efficient.
- range(N) is the numbers 0..N-1
  - Exactly the legal indices of a list or character string of length N
In [2]:
print('a range is not a list: range(0, 3)')
for number in range(0, 3):
    print(number)
a range is not a list: range(0, 3)
0
1
2
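The points about range above can be checked with a short sketch, which also shows the Accumulator pattern named in the objectives: start from an initial value and update it once per iteration. The names `indices` and `total` are assumptions made for this example.

```python
# range(N) produces 0..N-1 on demand; list() collects the values so we can inspect them
indices = list(range(3))
print(indices)

# Accumulator pattern: initialise a variable, then update it inside the loop
total = 0
for number in range(1, 11):  # the numbers 1 through 10
    total = total + number
print(total)
```

The same shape (initialise, loop, update) works for sums, counts, and building up lists.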
Exercise¶
- Fill in the blanks in each of the programs below to produce the indicated result.
# Total length of the strings in the list: ["red", "green", "blue"] => 12
total = 0
for word in ["red", "green", "blue"]:
    ____ = ____ + len(word)
print(total)
- Reorder and properly indent the lines of code below so that they print a list with the cumulative sum of data. The result should be [1, 3, 5, 10].
cumulative.append(total)
for number in data:
cumulative = []
total = total + number
total = 0
print(cumulative)
data = [1,2,2,5]
Conditionals¶
Use if statements to control whether or not a block of code is executed.¶
- An if statement (more properly called a conditional statement) controls whether some block of code is executed or not.
- Structure is similar to a for statement:
  - First line opens with if and ends with a colon
  - Body containing one or more statements is indented (usually by 4 spaces)
In [3]:
mass = 3.54
if mass > 3.0:
    print(mass, 'is large')
mass = 2.07
if mass > 3.0:
    print(mass, 'is large')
3.54 is large
Conditionals are often used inside loops.¶
- Not much point using a conditional when we know the value (as above).
- But useful when we have a collection to process.
In [4]:
masses = [3.54, 2.07, 9.22, 1.86, 1.71]
for m in masses:
    if m > 3.0:
        print(m, 'is large')
3.54 is large
9.22 is large
Use else to execute a block of code when an if condition is not true.¶
- else can be used following an if.
- Allows us to specify an alternative to execute when the if branch isn't taken.
In [5]:
masses = [3.54, 2.07, 9.22, 1.86, 1.71]
for m in masses:
    if m > 3.0:
        print(m, 'is large')
    else:
        print(m, 'is small')
3.54 is large
2.07 is small
9.22 is large
1.86 is small
1.71 is small
Use elif to specify additional tests.¶
- May want to provide several alternative choices, each with its own test.
- Use elif (short for "else if") and a condition to specify these.
- Always associated with an if.
- Must come before the else (which is the "catch all").
In [6]:
masses = [3.54, 2.07, 9.22, 1.86, 1.71]
for m in masses:
    if m > 9.0:
        print(m, 'is HUGE')
    elif m > 3.0:
        print(m, 'is large')
    else:
        print(m, 'is small')
3.54 is large
2.07 is small
9.22 is HUGE
1.86 is small
1.71 is small
Conditions are tested once, in order.¶
- Python steps through the branches of the conditional in order, testing each in turn.
- So ordering matters.
In [7]:
grade = 85
if grade >= 90:
    print('grade is A')
elif grade >= 80:
    print('grade is B')
elif grade >= 70:
    print('grade is C')
grade is B
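To see why ordering matters, the sketch below runs the same three tests twice on the same grade: once with the loosest test first and once with the strictest first. The variable names `misordered` and `ordered`, and the 'below C' catch-all label, are assumptions added for illustration.

```python
grade = 85

# Misordered: the loosest test comes first, so it catches every grade >= 70
# and the later branches are never reached for this value.
if grade >= 70:
    misordered = 'C'
elif grade >= 80:
    misordered = 'B'
elif grade >= 90:
    misordered = 'A'
else:
    misordered = 'below C'  # hypothetical catch-all label

# Ordered: the strictest test comes first, so each grade falls through
# to the first threshold it actually meets.
if grade >= 90:
    ordered = 'A'
elif grade >= 80:
    ordered = 'B'
elif grade >= 70:
    ordered = 'C'
else:
    ordered = 'below C'  # hypothetical catch-all label

print(misordered, ordered)
```

Only the first branch whose condition is true runs, so the misordered version stops at its first test even though a later one also holds.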
- Often use conditionals in a loop to "evolve" the values of variables.
In [8]:
velocity = 10.0
for i in range(5): # execute the loop 5 times
    print(i, ':', velocity)
    if velocity > 20.0:
        print('moving too fast')
        velocity = velocity - 5.0
    else:
        print('moving too slow')
        velocity = velocity + 10.0
print('final velocity:', velocity)
0 : 10.0
moving too slow
1 : 20.0
moving too slow
2 : 30.0
moving too fast
3 : 25.0
moving too fast
4 : 20.0
moving too slow
final velocity: 30.0
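One way to trace a loop like the one above is to build a table of variable values, one row per iteration. The sketch below records that table in a list instead of writing it by hand. The names `trace`, `before`, and `after` are hypothetical.

```python
velocity = 10.0
trace = []  # one row per iteration: (i, velocity before, velocity after)
for i in range(5):
    before = velocity
    if velocity > 20.0:
        velocity = velocity - 5.0
    else:
        velocity = velocity + 10.0
    trace.append((i, before, velocity))

# Print the table, one iteration per line
print('i', 'before', 'after')
for i, before, after in trace:
    print(i, before, after)
```

Each row's "after" value is the next row's "before" value, which makes it easy to check a hand-written trace.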
Exercise¶
- What does this program print?
pressure = 71.9
if pressure > 50.0:
    pressure = 25.0
elif pressure <= 50.0:
    pressure = 0.0
print(pressure)
- Modify this program so that it only processes files with fewer than 50 records.
import glob
import pandas as pd
for filename in glob.glob('data/*.csv'):
    contents = pd.read_csv(filename)
    ____:
        print(filename, len(contents))
- Modify this program so that it finds the largest and smallest values in the list no matter what the range of values originally is.
values = [...some test data...]
smallest, largest = None, None
for v in values:
    if ____:
        smallest, largest = v, v
    ____:
        smallest = min(____, v)
        largest = max(____, v)
print(smallest, largest)
Takeaway¶
- Use if statements to control whether or not a block of code is executed.
- Conditionals are often used inside loops.
- Use else to execute a block of code when an if condition is not true.
- Use elif to specify additional tests.
- Conditions are tested once, in order.
- Create a table showing variables' values to trace a program's execution.
Looping over datasets¶
Objective¶
- Be able to read and write globbing expressions that match sets of files.
- Use glob to create lists of files.
- Write for loops to perform operations on files given their names in a list.
Use a for loop to process files given a list of their names.¶
- A filename is a character string.
- And lists can contain character strings.
In [9]:
import pandas as pd
for filename in ['data/gapminder_gdp_africa.csv', 'data/gapminder_gdp_asia.csv']:
    data = pd.read_csv(filename, index_col='country')
    print(filename, data.min())
data/gapminder_gdp_africa.csv gdpPercap_1952    298.846212
gdpPercap_1957    335.997115
gdpPercap_1962    355.203227
gdpPercap_1967    412.977514
gdpPercap_1972    464.099504
gdpPercap_1977    502.319733
gdpPercap_1982    462.211415
gdpPercap_1987    389.876185
gdpPercap_1992    410.896824
gdpPercap_1997    312.188423
gdpPercap_2002    241.165876
gdpPercap_2007    277.551859
dtype: float64
data/gapminder_gdp_asia.csv gdpPercap_1952    331.0
gdpPercap_1957    350.0
gdpPercap_1962    388.0
gdpPercap_1967    349.0
gdpPercap_1972    357.0
gdpPercap_1977    371.0
gdpPercap_1982    424.0
gdpPercap_1987    385.0
gdpPercap_1992    347.0
gdpPercap_1997    415.0
gdpPercap_2002    611.0
gdpPercap_2007    944.0
dtype: float64
Use glob.glob to find sets of files whose names match a pattern.¶
- In Unix, the term "globbing" means "matching a set of files with a pattern".
- The most common patterns are:
  - * meaning "match zero or more characters"
  - ? meaning "match exactly one character"
- Python's standard library contains the glob module to provide pattern matching functionality.
- The glob module contains a function also called glob to match file patterns.
- E.g., glob.glob('*.txt') matches all files in the current directory whose names end with .txt.
- Result is a (possibly empty) list of character strings.
In [11]:
import glob
print('all csv files in data directory:', glob.glob('data/*.csv'))
all csv files in data directory: ['data/gapminder_gdp_americas.csv', 'data/gapminder_gdp_europe.csv', 'data/gapminder_all.csv', 'data/gapminder_gdp_oceania.csv', 'data/gapminder_gdp_africa.csv', 'data/gapminder_gdp_asia.csv']
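The difference between * and ? can be tried without the lesson data. The sketch below creates a few empty files in a temporary directory and matches them. The file names (`run_1.csv` and so on) and variable names are hypothetical, and the results are sorted because glob.glob does not promise any particular order.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Create empty files with made-up names to match against
    for name in ['run_1.csv', 'run_2.csv', 'run_10.csv', 'notes.txt']:
        open(os.path.join(folder, name), 'w').close()

    # * matches zero or more characters, so all three run_ files match
    star = []
    for path in sorted(glob.glob(os.path.join(folder, 'run_*.csv'))):
        star.append(os.path.basename(path))

    # ? matches exactly one character, so run_10.csv is left out
    question = []
    for path in sorted(glob.glob(os.path.join(folder, 'run_?.csv'))):
        question.append(os.path.basename(path))

    # A pattern that matches nothing gives an empty list, not an error
    nothing = glob.glob(os.path.join(folder, '*.json'))

print(star)
print(question)
print(nothing)
```

Because an unmatched pattern returns an empty list, a for loop over the result simply runs zero times.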
Use glob and for to process batches of files.¶
- Helps a lot if the files are named and stored systematically and consistently so that simple patterns will find the right data.
In [12]:
for filename in glob.glob('data/gapminder_*.csv'):
    data = pd.read_csv(filename)
    print(filename, data['gdpPercap_1952'].min())
data/gapminder_gdp_americas.csv 1397.717137
data/gapminder_gdp_europe.csv 973.5331948
data/gapminder_all.csv 298.8462121
data/gapminder_gdp_oceania.csv 10039.59564
data/gapminder_gdp_africa.csv 298.8462121
data/gapminder_gdp_asia.csv 331.0
Exercise¶
- Which of these files is not matched by the expression glob.glob('data/*as*.csv')?
  - data/gapminder_gdp_africa.csv
  - data/gapminder_gdp_americas.csv
  - data/gapminder_gdp_asia.csv
- Write a program that reads in the regional data sets and plots the average GDP per capita for each region over time in a single chart. Pandas will raise an error if it encounters non-numeric columns in a dataframe computation so you may need to either filter out those columns or tell pandas to ignore them.
Solution!¶
import glob
import pandas as pd
import matplotlib.pyplot as plt
fig, ax = plt.subplots(1,1)
for filename in glob.glob('data/gapminder_gdp*.csv'):
    dataframe = pd.read_csv(filename)
    # extract <region> from the filename, expected to be in the format 'data/gapminder_gdp_<region>.csv'.
    # we will split the string using the split method and `_` as our separator,
    # retrieve the last string in the list that split returns (`<region>.csv`),
    # and then remove the `.csv` extension from that string.
    # NOTE: the pathlib module covered in the next callout also offers
    # convenient abstractions for working with filesystem paths and could solve this as well:
    # from pathlib import Path
    # region = Path(filename).stem.split('_')[-1]
    region = filename.split('_')[-1][:-4]
    # extract the years from the columns of the dataframe
    headings = dataframe.columns[1:]
    years = headings.str.split('_').str.get(1)
    # pandas raises errors when it encounters non-numeric columns in a dataframe computation
    # but we can tell pandas to ignore them with the `numeric_only` parameter
    dataframe.mean(numeric_only=True).plot(ax=ax, label=region)
    # NOTE: another way of doing this selects just the columns with gdp in their name using the filter method
    # dataframe.filter(like="gdp").mean().plot(ax=ax, label=region)
# set the title and labels
ax.set_title('GDP Per Capita for Regions Over Time')
ax.set_xticks(range(len(years)))
ax.set_xticklabels(years)
ax.set_xlabel('Year')
plt.tight_layout()
plt.legend()
plt.show()
Dealing with File Paths¶
The pathlib module provides useful abstractions for file and path manipulation, such as
returning the name of a file without its extension. This is very useful when looping over files and
directories. In the example below, we create a Path object and inspect its attributes.
In [13]:
from pathlib import Path
p = Path("data/gapminder_gdp_africa.csv")
print(p.parent)
print(p.stem)
print(p.suffix)
data
gapminder_gdp_africa
.csv
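Building on the attributes shown above, the sketch below uses stem inside a loop to pull the region name out of each file name. It assumes the names follow the 'data/gapminder_gdp_<region>.csv' format used in this lesson. The list `regions` is a hypothetical name, and no files are read, so the paths do not need to exist.

```python
from pathlib import Path

regions = []
for filename in ['data/gapminder_gdp_africa.csv', 'data/gapminder_gdp_asia.csv']:
    p = Path(filename)
    # stem drops the directory and the extension, e.g. 'gapminder_gdp_africa';
    # splitting on '_' and taking the last piece leaves the region
    region = p.stem.split('_')[-1]
    regions.append(region)
    print(p.name, '->', region)
```

Compared with slicing off the last four characters, this does not depend on the extension being exactly `.csv`.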
Takeaway¶
- Use a for loop to process files given a list of their names.
- Use glob.glob to find sets of files whose names match a pattern.
- Use glob and for to process batches of files.