Chapter 9. I/O Case Study: A Library for Searching the Filesystem
The find Command
Haskell 中参数位置非常重要,如果把参数放在了错误的位置上就会失去partial application 的gives 了。
给定一个目录的列表,find 命令递归的搜索每一个目录,并打印所有匹配的文件名。
搜索条件可以是名字与“glob pattern”匹配的,entry 是普通文件,最后修改日期等。利用“and”,“or”操作符,这些条件还可以组合使用,形成更复杂的条件。
Starting Simple: Recursively Listing a Directory
递归列出一个目录及其子目录的内容
filter 语句确保一个目录的listing 不含特殊的目录名“.” 或“..”,如果我们忘记过滤这些就会造成无限递归。
{-
ghci> :m +Control.Monad
ghci> :type mapM
mapM :: (Monad m) => (a -> m b) -> [a] -> m [b]
ghci> :type forM
forM :: (Monad m) => [a] -> (a -> m b) -> m [b]
-}
循环体检查当前项是否是一个目录,如果是就递归地调用getRecursiveContents 去list 那个目录。否则,它return 当前entry 的名字。(不要忘记return 的含义,它将东西封装进Monad)
递归打印目录及其子目录的内容
module RecursiveContents (getRecursiveContents) where
import Control.Monad (forM)
import System.Directory (doesDirectoryExist, getDirectoryContents)
import System.FilePath ((</>))
getRecursiveContents :: FilePath -> IO [FilePath]
getRecursiveContents topdir = do
names <- getDirectoryContents topdir
let properNames = filter (`notElem` [".", ".."]) names
paths <- forM properNames $ /name -> do
let path = topdir </> name
isDirectory <- doesDirectoryExist path
if isDirectory
then getRecursiveContents path
else return [path]
return (concat paths)
-- contents <- getRecursiveContents "c:/haskell"
-- >["c:/haskell//hello.lkshw","c:/haskell//hello.lkshs",...]
Revisiting Anonymous and Named Functions
尽管在前面的章节中我们列出了一些不要使用匿名函数的理由,但在这里我们在循环体中使用了匿名函数。
在循环体中使用匿名函数是Haskell 中的匿名函最常见的应用。
forM 和 mapM 接受一个函数作为参数,多数循环体中的代码块在程序中只出现一次。既然只在一个地方使用就没必要给它命名。
Why Provide Both mapM and forM?
如果循环很短,但是data 很长,就用mapM。
如果data很短,但是循环很长,就用forM。
如果都很长,就是let 和where 将其中一个变短。
A Naive Finding Function
takeExtension 函数用于从文件名中提取扩展名
文件查找器
-- SimpleFinder.hs
module SimpleFinder (simpleFind) where
import RecursiveContents (getRecursiveContents)
simpleFind :: (FilePath -> Bool) -> FilePath -> IO [FilePath]
simpleFind p path = do
names <- getRecursiveContents path
return (filter p names)
--------------------------------------------------
-- main.hs
import SimpleFinder
import System.FilePath
files <- simpleFind (/p -> takeExtension p == ".hs") "c:/haskell"
-- > ["c:/haskell//hello//src//SimpleFinder.hs","c:/haskell//hello//src//Setup.hs",...]
这个程序存在很多问题,比如不能区分以".hs" 结尾的目录和文件,文件系统的遍历方式不受控制等等,这些问题我们将会在后面的章节中解决。
Predicates: From Poverty to Riches, While Remaining Pure
使用doesFileExist 和doesDirectoryExist函数来判断一个entry 是文件还是目录。
使用getPermissions 函数来检测一个对文或目录操作是否合法
使用getModificationTime 函数来获取一个entry 的最后修改时间。
System.Posix 和 System.Win32 模块提供了特定操作系统的更多功能。Hackage 上还存在unix-compat 这个库,为Windows 提供了类Unix API的功能。
type Predicate = FilePath -- path to directory entry
-> Permissions -- permissions
-> Maybe Integer -- file size (Nothing if not file)
-> ClockTime -- last modified
-> Bool
Predicate 就是一个接受四参数的函数,这样写是为了节约击键数和空间。
filterM 函数与filter 类似,不同的是filterM 允许其布尔函数执行I/O 。
Sizing a File Safely
获取文件大小(后面有改进版)
{-
simpleFileSize :: FilePath -> IO Integer
simpleFileSize path = do
h <- openFile path ReadMode
size <- hFileSize h
hClose h
return size
-}
获取文件大小(后面有改进版)
{-
saferFileSize :: FilePath -> IO (Maybe Integer)
saferFileSize path = handle (/_ -> return Nothing) $ do
h <- openFile path ReadMode
size <- hFileSize h
hClose h
return (Just size)
-}
The Acquire-Use-Release Cycle
获取文件大小
getFileSize :: FilePath -> IO (Maybe Integer)
getFileSize path = handle (/_ -> return Nothing) $
bracket (openFile path ReadMode) hClose $ /h -> do
size <- hFileSize h
return (Just size)
Control.Exception 模块提供了一个bracket 函数,这个函数以三个action 作为参数。
bracket 函数的第一个参数会产生一个source ,第二个参数会释放那个source,第三个参数会使用那个source 。
A Domain-Specific Language for Predicates
-- 待改进
{-
myTest path _ (Just size) _ =
takeExtension path == ".cpp" && size > 131072
myTest _ _ _ _ = False
-}
type InfoP a = FilePath -- path to directory entry
-> Permissions -- permissions
-> Maybe Integer -- file size (Nothing if not file)
-> ClockTime -- last modified
-> a
pathP :: InfoP FilePath
pathP path _ _ _ = path
sizeP :: InfoP Integer
sizeP _ _ (Just size) _ = size
sizeP _ _ Nothing _ = -1
equalP :: (Eq a) => InfoP a -> a -> InfoP Bool
equalP f k = /w x y z -> f w x y z == k
equalP' :: (Eq a) => InfoP a -> a -> InfoP Bool
equalP' f k w x y z = f w x y z == k
Avoiding Boilerplate with Lifting
liftP :: (a -> b -> c) -> InfoP a -> b -> InfoP c
liftP q f k w x y z = f w x y z `q` k
greaterP, lesserP :: (Ord a) => InfoP a -> a -> InfoP Bool
greaterP = liftP (>)
lesserP = liftP (<)
Haskell 中参数位置非常重要,如果把参数放在了错误的位置上就会失去partial application 的gives 了。
Gluing Predicates Together
Haskell 中我们将那些以其它函数作为参数,并返回新函数的函数称为combinators 。
simpleAndP :: InfoP Bool -> InfoP Bool -> InfoP Bool
simpleAndP f g w x y z = f w x y z && g w x y z
liftP2 :: (a -> b -> c) -> InfoP a -> InfoP b -> InfoP c
liftP2 q f g w x y z = f w x y z `q` g w x y z
andP = liftP2 (&&)
orP = liftP2 (||)
constP :: a -> InfoP a
constP k _ _ _ _ = k
liftP' q f k w x y z = f w x y z `q` constP k w x y z
myTest path _ (Just size) _ =
takeExtension path == ".cpp" && size > 131072
myTest _ _ _ _ = False
liftPath :: (FilePath -> a) -> InfoP a
liftPath f w _ _ _ = f w
myTest2 = (liftPath takeExtension `equalP` ".cpp") `andP`
(sizeP `greaterP` 131072)
Defining and Using New Operators
(==?) = equalP
(&&?) = andP
(>?) = greaterP
myTest3 = (liftPath takeExtension ==? ".cpp") &&? (sizeP >? 131072)
infix 4 ==?
infixr 3 &&?
infix 4 >?
myTest4 = liftPath takeExtension ==? ".cpp" &&? sizeP >? 131072
Controlling Traversal
data Info = Info {
infoPath :: FilePath
, infoPerms :: Maybe Permissions
, infoSize :: Maybe Integer
, infoModTime :: Maybe ClockTime
} deriving (Eq, Ord, Show)
traverse :: ([Info] -> [Info]) -> FilePath -> IO [Info]
traverse order path = do
names <- getUsefulContents path
contents <- mapM getInfo (path : map (path </>) names)
liftM concat $ forM (order contents) $ /info -> do
if isDirectory info && infoPath info /= path
then traverse order (infoPath info)
else return [info]
getUsefulContents :: FilePath -> IO [String]
getUsefulContents path = do
names <- getDirectoryContents path
return (filter (`notElem` [".", ".."]) names)
isDirectory :: Info -> Bool
isDirectory = maybe False searchable . infoPerms
maybeIO :: IO a -> IO (Maybe a)
maybeIO act = handle (/_ -> return Nothing) (Just `liftM` act)
getInfo :: FilePath -> IO Info
getInfo path = do
perms <- maybeIO (getPermissions path)
size <- maybeIO (bracket (openFile path ReadMode) hClose hFileSize)
modified <- maybeIO (getModificationTime path)
return (Info path perms size modified)
infos <- getInfo "c:/haskell//hello.lkshw"
-- >Info {infoPath = "c:/haskell//hello.lkshw", infoPerms = Just (Permissions {readable = True, writable = True, executable = False, searchable = False}), infoSize = Just 364, infoModTime = Just Sun Feb 27 14:23:54 Öйú±ê׼ʱ¼ä 2011}
Density, Readability, and the Learning Process
traverseVerbose order path = do
names <- getDirectoryContents path
let usefulNames = filter (`notElem` [".", ".."]) names
contents <- mapM getEntryName ("" : usefulNames)
recursiveContents <- mapM recurse (order contents)
return (concat recursiveContents)
where getEntryName name = getInfo (path </> name)
isDirectory info = case infoPerms info of
Nothing -> False
Just perms -> searchable perms
recurse info = do
if isDirectory info && infoPath info /= path
then traverseVerbose order (infoPath info)
else return [info]
Another Way of Looking at Traversal
data Iterate seed = Done { unwrap :: seed }
| Skip { unwrap :: seed }
| Continue { unwrap :: seed }
deriving (Show)
type Iterator seed = seed -> Info -> Iterate seed
foldTree :: Iterator a -> a -> FilePath -> IO a
foldTree iter initSeed path = do
endSeed <- fold initSeed path
return (unwrap endSeed)
where
fold seed subpath = getUsefulContents subpath >>= walk seed
walk seed (name:names) = do
let path' = path </> name
info <- getInfo path'
case iter seed info of
done@(Done _) -> return done
Skip seed' -> walk seed' names
Continue seed'
| isDirectory info -> do
next <- fold seed' path'
case next of
done@(Done _) -> return done
seed'' -> walk (unwrap seed'') names
| otherwise -> walk seed' names
walk seed _ = return (Continue seed)
atMostThreePictures :: Iterator [FilePath]
atMostThreePictures paths info
| length paths == 3
= Done paths
| isDirectory info && takeFileName path == ".svn"
= Skip paths
| extension `elem` [".jpg", ".png"]
= Continue (path : paths)
| otherwise
= Continue paths
where extension = map toLower (takeExtension path)
path = infoPath info
countDirectories count info =
Continue (if isDirectory info
then count + 1
else count)
这里传给foldTree 的初始seed 应是0 。
Useful Coding Guidelines
略
Common Layout Styles
略
检测给定的目录下有多少个目录(非递归)
-- foldTree countDirectories 0 "c:/haskell" >>= print -- >1 -- 如果haskell 目录下有两个目录就会输出2
-- getUsefulContents "c:/haskell" >>= print -- ["hello.lkshw","hello.lkshs","hello"]
-- walk 0 ("hello.lkshw":["hello.lkshs","hello"])
-- let path' = "c:/haskell" </> "hello.lkshw" -- "c:/haskell//hello.lkshw"
-- info <- getInfo path' -- Info {infoPath = "c:/haskell//hello.lkshw", infoPerms = Just (Permissions {readable = True, writable = True, executable = False, searchable = False}), infoSize = Just 364, infoModTime = Just Sun Feb 27 14:23:54 Öйú±ê׼ʱ¼ä 2011}
-- countDirectories 0 info -- Continue {unwrap = 0}
-- isDirectory info -- False
-- walk 0 ["hello.lkshs","hello"]
获取给定目录下所有以“.bat” 结尾的文件(递归?)
-- foldTree atMostThreePictures [] "C:/haskell//hello//src" >>= print -- ["C:/haskell//hello//src//RenameFiles.hs","C:/haskell//hello//src//Setup.hs","C:/haskell//hello//src//SimpleFinder.hs"]
-- BetterPredicate
-----------------------------------------------------------------------
module BetterPredicate (betterFind, getInfo, Info(..), countDirectories, isDirectory, getInfo, foldTree, atMostThreePictures, getUsefulContents) where
import Control.Monad (filterM, forM, liftM)
import System.Directory (Permissions(..), getModificationTime, getPermissions, getDirectoryContents)
import System.Time (ClockTime(..))
import System.FilePath --(takeExtension)
import Control.OldException (bracket, handle)
import System.IO (IOMode(..), hClose, hFileSize, openFile)
import Data.Char
-- the function we wrote earlier
import RecursiveContents (getRecursiveContents)
type Predicate = FilePath -- path to directory entry
-> Permissions -- permissions
-> Maybe Integer -- file size (Nothing if not file)
-> ClockTime -- last modified
-> Bool
getFileSize :: FilePath -> IO (Maybe Integer)
getFileSize path = handle (/_ -> return Nothing) $
bracket (openFile path ReadMode) hClose $ /h -> do
size <- hFileSize h
return (Just size)
betterFind :: Predicate -> FilePath -> IO [FilePath]
betterFind p path = getRecursiveContents path >>= filterM check
where check name = do
perms <- getPermissions name
size <- getFileSize name
modified <- getModificationTime name
return (p name perms size modified)
type InfoP a = FilePath -- path to directory entry
-> Permissions -- permissions
-> Maybe Integer -- file size (Nothing if not file)
-> ClockTime -- last modified
-> a
pathP :: InfoP FilePath
pathP path _ _ _ = path
sizeP :: InfoP Integer
sizeP _ _ (Just size) _ = size
sizeP _ _ Nothing _ = -1
equalP :: (Eq a) => InfoP a -> a -> InfoP Bool
equalP f k = /w x y z -> f w x y z == k
equalP' :: (Eq a) => InfoP a -> a -> InfoP Bool
equalP' f k w x y z = f w x y z == k
liftP :: (a -> b -> c) -> InfoP a -> b -> InfoP c
liftP q f k w x y z = f w x y z `q` k
greaterP, lesserP :: (Ord a) => InfoP a -> a -> InfoP Bool
greaterP = liftP (>)
lesserP = liftP (<)
simpleAndP :: InfoP Bool -> InfoP Bool -> InfoP Bool
simpleAndP f g w x y z = f w x y z && g w x y z
liftP2 :: (a -> b -> c) -> InfoP a -> InfoP b -> InfoP c
liftP2 q f g w x y z = f w x y z `q` g w x y z
andP = liftP2 (&&)
orP = liftP2 (||)
constP :: a -> InfoP a
constP k _ _ _ _ = k
liftP' q f k w x y z = f w x y z `q` constP k w x y z
myTest path _ (Just size) _ =
takeExtension path == ".cpp" && size > 131072
myTest _ _ _ _ = False
liftPath :: (FilePath -> a) -> InfoP a
liftPath f w _ _ _ = f w
myTest2 = (liftPath takeExtension `equalP` ".cpp") `andP`
(sizeP `greaterP` 131072)
(==?) = equalP
(&&?) = andP
(>?) = greaterP
myTest3 = (liftPath takeExtension ==? ".cpp") &&? (sizeP >? 131072)
infix 4 ==?
infixr 3 &&?
infix 4 >?
myTest4 = liftPath takeExtension ==? ".cpp" &&? sizeP >? 131072
data Info = Info {
infoPath :: FilePath
, infoPerms :: Maybe Permissions
, infoSize :: Maybe Integer
, infoModTime :: Maybe ClockTime
} deriving (Eq, Ord, Show)
traverse :: ([Info] -> [Info]) -> FilePath -> IO [Info]
traverse order path = do
names <- getUsefulContents path
contents <- mapM getInfo (path : map (path </>) names)
liftM concat $ forM (order contents) $ /info -> do
if isDirectory info && infoPath info /= path
then traverse order (infoPath info)
else return [info]
getUsefulContents :: FilePath -> IO [String]
getUsefulContents path = do
names <- getDirectoryContents path
return (filter (`notElem` [".", ".."]) names)
isDirectory :: Info -> Bool
isDirectory = maybe False searchable . infoPerms
maybeIO :: IO a -> IO (Maybe a)
maybeIO act = handle (/_ -> return Nothing) (Just `liftM` act)
getInfo :: FilePath -> IO Info
getInfo path = do
perms <- maybeIO (getPermissions path)
size <- maybeIO (bracket (openFile path ReadMode) hClose hFileSize)
modified <- maybeIO (getModificationTime path)
return (Info path perms size modified)
traverseVerbose order path = do
names <- getDirectoryContents path
let usefulNames = filter (`notElem` [".", ".."]) names
contents <- mapM getEntryName ("" : usefulNames)
recursiveContents <- mapM recurse (order contents)
return (concat recursiveContents)
where getEntryName name = getInfo (path </> name)
isDirectory info = case infoPerms info of
Nothing -> False
Just perms -> searchable perms
recurse info = do
if isDirectory info && infoPath info /= path
then traverseVerbose order (infoPath info)
else return [info]
data Iterate seed = Done { unwrap :: seed }
| Skip { unwrap :: seed }
| Continue { unwrap :: seed }
deriving (Show)
type Iterator seed = seed -> Info -> Iterate seed
foldTree :: Iterator a -> a -> FilePath -> IO a
foldTree iter initSeed path = do
endSeed <- fold initSeed path
return (unwrap endSeed)
where
fold seed subpath = getUsefulContents subpath >>= walk seed -- 亮点,fold 的作用是帮其它两函数holding 参数
walk seed (name:names) = do
let path' = path </> name
info <- getInfo path'
case iter seed info of
done@(Done _) -> return done
Skip seed' -> walk seed' names
Continue seed'
| isDirectory info -> do
next <- fold seed' path'
case next of
done@(Done _) -> return done
seed'' -> walk (unwrap seed'') names
| otherwise -> walk seed' names
walk seed _ = return (Continue seed)
atMostThreePictures :: Iterator [FilePath]
atMostThreePictures paths info
| length paths == 3 -- 控制深度?
= Done paths
| isDirectory info && takeFileName path == ".svn"
= Skip paths
| extension `elem` [".bat", ".png"] -- 搜索条件
= Continue (path : paths)
| otherwise
= Continue paths
where extension = map toLower (takeExtension path)
path = infoPath info
countDirectories count info =
Continue (if isDirectory info
then count + 1
else count)