Chapter 7 Files
File Processing
www.py4inf.com/code/mbox-short.txt
open() returns a "file handle" - a variable used to perform operations on the file
The handle is a connection to the file; it doesn't actually hold the file's data
kind of like "File -> Open" in a Word Processor
handle = open(filename, mode) #filename is a string
fhand = open('mbox.txt','r') #'r':reading,'w':writing
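A minimal sketch of the two modes, assuming a scratch file (the name demo.txt and the temporary directory are made up for illustration, so nothing depends on mbox.txt being present). It uses print(...) with a single argument so it runs under Python 3 as well as the Python 2 used in these notes.

```python
import os
import tempfile

# Hypothetical scratch file, not part of the course materials.
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

fout = open(path, 'w')          # 'w': open for writing (creates or truncates the file)
fout.write('Hello\nWorld\n')
fout.close()                    # closing flushes the data to disk

fhand = open(path, 'r')         # 'r': open for reading (also the default mode)
first_line = fhand.readline()   # data is only read when we ask the handle for it
fhand.close()

print(repr(first_line))         # 'Hello\n'
```

The handle itself is just the connection; the text only shows up once a read call such as readline() is made on it.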
stuff = 'Hello\nWorld' # \n means newline, and it counts as one character
print stuff
print len(stuff)
output:
Hello
World
11
Counting Lines in a File
Important! A file handle opened for reading can be treated as a sequence of strings, where each line in the file is one string in the sequence.
xfile = open('/Users/huyifan/documents/mbox.txt')
count = 0
for line in xfile:
    count = count + 1
print 'Line Count:', count
Reading the *Whole* File
Use read() to read the whole file (newlines and all) into a single string.
xfile = open('/Users/huyifan/documents/mbox.txt')
inp = xfile.read()
print len(inp)
print inp[:20]
output:
94625
From stephen.marquar
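Searching Through a File
for line in xfile:
    if line.startswith('Author'):
        print line
output:
Author: stephen.marquard@uct.ac.za

Author: louis@media.berkeley.edu

Author: zqian@umich.edu

Author: rjlowe@iupui.edu

Author: zqian@umich.edu

Author: rjlowe@iupui.edu

Why are there blank lines? Because of \n! Each line read from the file already ends with \n, and print adds another newline.
We can use rstrip() to remove whitespace (including the \n) from the right end of a string; lstrip() does the same for the left end.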
for line in xfile:
    if line.startswith('A'):
        print line.rstrip()
output:
Author: david.horwitz@uct.ac.za
Author: stephen.marquard@uct.ac.za
Author: louis@media.berkeley.edu
Author: louis@media.berkeley.edu
Author: ray@media.berkeley.edu
Author: cwen@iupui.edu
Author: cwen@iupui.edu
Author: cwen@iupui.edu
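To see what rstrip() and lstrip() actually remove, here is a small sketch on hand-written strings (the sample values are made up to look like the file's lines). repr() is used so the \n is visible in the printed result.

```python
line = 'Author: zqian@umich.edu\n'   # a line as it comes out of the file, newline included

right = line.rstrip()                # strips trailing whitespace, including the \n
left = '  padded\n'.lstrip()         # strips leading whitespace only; the \n stays

print(repr(line))    # 'Author: zqian@umich.edu\n'
print(repr(right))   # 'Author: zqian@umich.edu'
print(repr(left))    # 'padded\n'
```

Since the extra blank lines come from the trailing \n, rstrip() is the one that fixes the printed output.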
Skipping with continue
for line in xfile:
    if not line.startswith('Author'): # or: if not '@media.berkeley.edu' in line:
        continue
    # process our 'interesting lines'
    print line.rstrip()
The output will be the same.
Bad File Names
fname = raw_input('Enter the file name:')
try:
    fhand = open(fname)
except:
    print 'File cannot be opened:', fname
    exit()
count = 0
for line in fhand:
    if line.startswith('Author:'):
        count = count + 1
print 'There were', count, 'pieces of important information'
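The same idea can be sketched as a function that can be tried without typing a file name. The helper name count_prefix_lines, the scratch file and its contents are all made up for illustration. It catches IOError specifically, which is what open() raises for a missing file (in Python 3, IOError is an alias of OSError, so the code runs there too).

```python
import os
import tempfile

def count_prefix_lines(fname, prefix):
    # Hypothetical helper (not from the notes): returns None if the file cannot be opened.
    try:
        fhand = open(fname)
    except IOError:                  # raised when the file is missing or unreadable
        return None
    count = 0
    for line in fhand:
        if line.startswith(prefix):
            count = count + 1
    fhand.close()
    return count

# Made-up sample data written to a scratch file.
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
fout = open(path, 'w')
fout.write('Author: a@example.com\nSubject: hi\nAuthor: b@example.com\n')
fout.close()

good = count_prefix_lines(path, 'Author:')
bad = count_prefix_lines(path + '.missing', 'Author:')

print(good)   # 2
print(bad)    # None
```

Returning None instead of calling exit() is only a design choice for this sketch; it lets the caller decide what to do about a bad file name.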
Chapter 8 Lists
8.1 Basic Understanding and Operations
A list is a kind of collection.
A collection allows us to put many values in a single "variable"
like: friends = ['Joe','Joey','Joseph']
A list element can be any Python object - even another list
print [[1,2],3,4]
A list can be empty
print []
Strings are "immutable" - we cannot change the contents of a string - we must make a new string to make any change
Lists are "mutable" - we can change an element of a list using the index operator
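A short sketch of the difference, using made-up values: assigning to an index fails for a string but works for a list.

```python
fruit = 'Banana'
try:
    fruit[0] = 'b'              # strings do not support item assignment
except TypeError as err:
    string_error = str(err)     # keep the message so we can look at it

lowered = fruit.lower()         # string methods return a NEW string; fruit is unchanged

lotto = [2, 14, 26, 41, 63]
lotto[2] = 28                   # lists can be changed in place with the index operator

print(fruit)     # Banana
print(lowered)   # banana
print(lotto)     # [2, 14, 28, 41, 63]
```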
len()
print len([[1,2],3,4])
Using the range function: generates a list
print range(4) # often used as: for i in range(len(friends)): ... friends[i]
output:
[0, 1, 2, 3]
Concatenating lists using +
print range(4) + [2,3]
output:
[0, 1, 2, 3, 2, 3]
Slicing lists: works the same way as slicing strings
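A few slices on a made-up list, as a sketch. As with strings, the second index is "up to but not including".

```python
t = [9, 41, 12, 3, 74, 15]

middle = t[1:3]   # elements at index 1 and 2
first4 = t[:4]    # from the start up to (not including) index 4
rest = t[3:]      # from index 3 to the end
whole = t[:]      # a new list with the same elements (a shallow copy)

print(middle)   # [41, 12]
print(first4)   # [9, 41, 12, 3]
print(rest)     # [3, 74, 15]
print(whole)    # [9, 41, 12, 3, 74, 15]
```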
8.2 Built-in Functions
Building a list from scratch: xxx.append()
stuff = list()
stuff.append('book')
stuff.append(99)
print stuff
output:
['book', 99]
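Is something in a list? Use: xxx in xxxx (or: xxx not in xxxx). The result is True or False, and the list is not modified.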
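A small sketch of the in and not in operators on a made-up list:

```python
some = [1, 9, 21, 10, 16]

has_9 = 9 in some          # 9 is an element of the list
has_15 = 15 in some        # 15 is not
no_20 = 20 not in some     # "not in" is the logical opposite

print(has_9)    # True
print(has_15)   # False
print(no_20)    # True
```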
Sorting a list
friends = ['Joey','Joe','Joseph','Mike','Ellen']
friends.sort()
print friends
output:
['Ellen', 'Joe', 'Joey', 'Joseph', 'Mike']
Important: the list itself has been changed!
The sort method means "sort yourself": unlike string methods, which return a new string, it changes the list in place.
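A sketch contrasting sort() with the built-in sorted(), which is not mentioned in these notes but is a standard Python function: sorted() gives back a new list, while sort() changes the list and returns None.

```python
friends = ['Joey', 'Joe', 'Joseph', 'Mike', 'Ellen']

ordered = sorted(friends)    # new sorted list; friends is still in its original order
before = friends[:]          # copy taken so we can compare afterwards

result = friends.sort()      # sorts friends in place and returns None

print(ordered)   # ['Ellen', 'Joe', 'Joey', 'Joseph', 'Mike']
print(before)    # ['Joey', 'Joe', 'Joseph', 'Mike', 'Ellen']
print(friends)   # ['Ellen', 'Joe', 'Joey', 'Joseph', 'Mike']
print(result)    # None
```

A common slip is writing friends = friends.sort(), which replaces the list with None.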
Other useful built-in functions
len(numbers), max(numbers), min(numbers), sum(numbers)
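These functions applied to a made-up list of numbers, with an average computed from sum() and len() as a sketch. float() is used so the division is not truncated under Python 2.

```python
numbers = [3, 41, 12, 9, 74, 15]

count = len(numbers)             # how many elements
largest = max(numbers)           # biggest value
smallest = min(numbers)          # smallest value
total = sum(numbers)             # all values added together
average = float(total) / count   # float() avoids integer division in Python 2

print(count)      # 6
print(largest)    # 74
print(smallest)   # 3
print(total)      # 154
```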
Strings -> Lists: Use split() / split(';') / ... to break a string into parts and produce a list of strings. The delimiter itself (whitespace, semicolon, ...) is left out of the result. With no argument, split() splits on whitespace.
abc = 'With three words'
stuff = abc.split()
print stuff
print len(stuff)
output:
['With', 'three', 'words']
3
abc = 'With;three;words'
stuff1 = abc.split()
stuff2 = abc.split(';') #important!
print stuff1
print stuff2
print len(stuff1)
print len(stuff2)
output:
['With;three;words']
['With', 'three', 'words']
1
3
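A sketch of splitting twice to pull a piece out of a line. The sample line below is written by hand in the style of an mbox "From" line and is an assumption, not output read from the file. Note that split() with no argument treats the two spaces before the 5 as a single separator.

```python
# Hand-written sample line (assumed format); note the double space before the 5.
line = 'From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008'

words = line.split()          # split on runs of whitespace
email = words[1]              # second word is the address
pieces = email.split('@')     # split the address at the @ sign
domain = pieces[1]

print(words)    # ['From', 'stephen.marquard@uct.ac.za', 'Sat', 'Jan', '5', '09:14:16', '2008']
print(email)    # stephen.marquard@uct.ac.za
print(domain)   # uct.ac.za
```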