Real World Haskell - Chapter 9. I/O Case Study: A Library for Searching the Filesystem

本文介绍了一个用Haskell编写的文件系统搜索库,详细解释了如何递归地搜索目录并筛选文件。通过定义一系列谓词和组合器,实现灵活且纯净的文件搜索功能。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >

 

 

 

 

Chapter 9. I/O Case Study: A Library for Searching the Filesystem

 

The find Command

 

Haskell 中参数位置非常重要,如果把参数放在了错误的位置上就会失去partial application gives 了。

 

给定一个目录的列表,find 命令递归的搜索每一个目录,并打印所有匹配的文件名。

 

搜索条件可以是名字与“glob pattern”匹配的,entry 是普通文件,最后修改日期等。利用“and”,“or”操作符,这些条件还可以组合使用,形成更复杂的条件。

 

Starting Simple: Recursively Listing a Directory

 

递归列出一个目录及其子目录的内容

 

filter 语句确保一个目录的listing 不含特殊的目录名“. 或“..”,如果我们忘记过滤这些就会造成无限递归。

 

{-

ghci> :m +Control.Monad

ghci> :type mapM

mapM :: (Monad m) => (a -> m b) -> [a] -> m [b]

ghci> :type forM

forM :: (Monad m) => [a] -> (a -> m b) -> m [b]

-}

 

循环体检查当前项是否是一个目录,如果是就递归地调用getRecursiveContents list 那个目录。否则,它return 当前entry 的名字。(不要忘记return 的含义,它将东西封装进Monad)


 

递归打印目录及其子目录的内容

 

module RecursiveContents (getRecursiveContents) where

 

import Control.Monad (forM)

import System.Directory (doesDirectoryExist, getDirectoryContents)

import System.FilePath ((</>))

 

getRecursiveContents :: FilePath -> IO [FilePath]

getRecursiveContents topdir = do

    names <- getDirectoryContents topdir

    let properNames = filter (`notElem` [".", ".."]) names

    paths <- forM properNames $ /name -> do

        let path = topdir </> name

        isDirectory <- doesDirectoryExist path

        if isDirectory

            then getRecursiveContents path

            else return [path]

    return (concat paths)

 

-- contents <- getRecursiveContents "c:/haskell" 

-- >["c:/haskell//hello.lkshw","c:/haskell//hello.lkshs",...]


 

Revisiting Anonymous and Named Functions

 

尽管在前面的章节中我们列出了一些不要使用匿名函数的理由,但在这里我们在循环体中使用了匿名函数。

 

在循环体中使用匿名函数是Haskell 中的匿名函最常见的应用。

 

forM mapM 接受一个函数作为参数,多数循环体中的代码块在程序中只出现一次。既然只在一个地方使用就没必要给它命名。

 

Why Provide Both mapM and forM?

 

如果循环很短,但是data 很长,就用mapM

 

如果data很短,但是循环很长,就用forM

 

如果都很长,就是let where 将其中一个变短。


 

A Naive Finding Function

 

takeExtension 函数用于从文件名中提取扩展名

 

文件查找器

 

-- SimpleFinder.hs

module SimpleFinder (simpleFind) where

 

import RecursiveContents (getRecursiveContents)

 

simpleFind :: (FilePath -> Bool) -> FilePath -> IO [FilePath]

simpleFind p path = do

    names <- getRecursiveContents path

    return (filter p names)

 

--------------------------------------------------

-- main.hs

import SimpleFinder

import System.FilePath

 

files <- simpleFind (/p -> takeExtension p == ".hs") "c:/haskell"

-- > ["c:/haskell//hello//src//SimpleFinder.hs","c:/haskell//hello//src//Setup.hs",...]

 

这个程序存在很多问题,比如不能区分以".hs" 结尾的目录和文件,文件系统的遍历方式不受控制等等,这些问题我们将会在后面的章节中解决。


 

Predicates: From Poverty to Riches, While Remaining Pure

 

使用doesFileExist doesDirectoryExist函数来判断一个entry 是文件还是目录。

 

使用getPermissions 函数来检测一个对文或目录操作是否合法

 

使用getModificationTime 函数来获取一个entry 的最后修改时间。

 

System.Posix System.Win32 模块提供了特定操作系统的更多功能。Hackage 上还存在unix-compat 这个库,为Windows 提供了类Unix API的功能。

 

type Predicate = FilePath -- path to directory entry

               -> Permissions -- permissions

               -> Maybe Integer -- file size (Nothing if not file)

               -> ClockTime -- last modified

               -> Bool

 

Predicate 就是一个接受四参数的函数,这样写是为了节约击键数和空间。

 

filterM 函数与filter 类似,不同的是filterM 允许其布尔函数执行I/O


 

Sizing a File Safely

 

获取文件大小(后面有改进版)

{-

simpleFileSize :: FilePath -> IO Integer

simpleFileSize path = do

    h <- openFile path ReadMode

    size <- hFileSize h

    hClose h

    return size

-}

 

获取文件大小(后面有改进版)

{-

saferFileSize :: FilePath -> IO (Maybe Integer)

saferFileSize path = handle (/_ -> return Nothing) $ do

    h <- openFile path ReadMode

    size <- hFileSize h

    hClose h

    return (Just size)

-}

 

The Acquire-Use-Release Cycle

 

获取文件大小

 

getFileSize :: FilePath -> IO (Maybe Integer)

getFileSize path = handle (/_ -> return Nothing) $

    bracket (openFile path ReadMode) hClose $ /h -> do

        size <- hFileSize h

        return (Just size)

 

Control.Exception 模块提供了一个bracket 函数,这个函数以三个action 作为参数。

 

bracket 函数的第一个参数会产生一个source ,第二个参数会释放那个source,第三个参数会使用那个source


 

A Domain-Specific Language for Predicates

 

-- 待改进

{-

myTest path _ (Just size) _ =

    takeExtension path == ".cpp" && size > 131072

myTest _ _ _ _ = False

-}

 

type InfoP a = FilePath -- path to directory entry

             -> Permissions -- permissions

             -> Maybe Integer -- file size (Nothing if not file)

             -> ClockTime -- last modified

             -> a

 

pathP :: InfoP FilePath

pathP path _ _ _ = path

 

sizeP :: InfoP Integer

sizeP _ _ (Just size) _ = size

sizeP _ _ Nothing _ = -1

 

equalP :: (Eq a) => InfoP a -> a -> InfoP Bool

equalP f k = /w x y z -> f w x y z == k

 

equalP' :: (Eq a) => InfoP a -> a -> InfoP Bool

equalP' f k w x y z = f w x y z == k

 

Avoiding Boilerplate with Lifting

 

liftP :: (a -> b -> c) -> InfoP a -> b -> InfoP c

liftP q f k w x y z = f w x y z `q` k

 

greaterP, lesserP :: (Ord a) => InfoP a -> a -> InfoP Bool

greaterP = liftP (>)

lesserP = liftP (<)

 

Haskell 中参数位置非常重要,如果把参数放在了错误的位置上就会失去partial application gives 了。

 

Gluing Predicates Together

 

Haskell 中我们将那些以其它函数作为参数,并返回新函数的函数称为combinators

 

simpleAndP :: InfoP Bool -> InfoP Bool -> InfoP Bool

simpleAndP f g w x y z = f w x y z && g w x y z

 

liftP2 :: (a -> b -> c) -> InfoP a -> InfoP b -> InfoP c

liftP2 q f g w x y z = f w x y z `q` g w x y z

 

andP = liftP2 (&&)

orP = liftP2 (||)

 

constP :: a -> InfoP a

constP k _ _ _ _ = k

 

liftP' q f k w x y z = f w x y z `q` constP k w x y z

 

myTest path _ (Just size) _ =

    takeExtension path == ".cpp" && size > 131072

myTest _ _ _ _ = False

 

liftPath :: (FilePath -> a) -> InfoP a

liftPath f w _ _ _ = f w

 

myTest2 = (liftPath takeExtension `equalP` ".cpp") `andP`

    (sizeP `greaterP` 131072)

 

Defining and Using New Operators

 

(==?) = equalP

(&&?) = andP

(>?) = greaterP

 

myTest3 = (liftPath takeExtension ==? ".cpp") &&? (sizeP >? 131072)

 

infix 4 ==?

infixr 3 &&?

infix 4 >?

myTest4 = liftPath takeExtension ==? ".cpp" &&? sizeP >? 131072


 

Controlling Traversal

 

data Info = Info {

    infoPath :: FilePath

  , infoPerms :: Maybe Permissions

  , infoSize :: Maybe Integer

  , infoModTime :: Maybe ClockTime

  } deriving (Eq, Ord, Show)

 

traverse :: ([Info] -> [Info]) -> FilePath -> IO [Info]

traverse order path = do

    names <- getUsefulContents path

    contents <- mapM getInfo (path : map (path </>) names)

    liftM concat $ forM (order contents) $ /info -> do

    if isDirectory info && infoPath info /= path

       then traverse order (infoPath info)

       else return [info]

 

getUsefulContents :: FilePath -> IO [String]

getUsefulContents path = do

    names <- getDirectoryContents path

    return (filter (`notElem` [".", ".."]) names)

 

isDirectory :: Info -> Bool

isDirectory = maybe False searchable . infoPerms

 

maybeIO :: IO a -> IO (Maybe a)

maybeIO act = handle (/_ -> return Nothing) (Just `liftM` act)

 

getInfo :: FilePath -> IO Info

getInfo path = do

    perms <- maybeIO (getPermissions path)

    size <- maybeIO (bracket (openFile path ReadMode) hClose hFileSize)

    modified <- maybeIO (getModificationTime path)

return (Info path perms size modified)

 

infos <- getInfo "c:/haskell//hello.lkshw" 

    -- >Info {infoPath = "c:/haskell//hello.lkshw", infoPerms = Just (Permissions {readable = True, writable = True, executable = False, searchable = False}), infoSize = Just 364, infoModTime = Just Sun Feb 27 14:23:54 Öйú±ê׼ʱ¼ä 2011}

Density, Readability, and the Learning Process

 

traverseVerbose order path = do

    names <- getDirectoryContents path

    let usefulNames = filter (`notElem` [".", ".."]) names

    contents <- mapM getEntryName ("" : usefulNames)

    recursiveContents <- mapM recurse (order contents)

    return (concat recursiveContents)

  where getEntryName name = getInfo (path </> name)

        isDirectory info = case infoPerms info of

                            Nothing -> False

                            Just perms -> searchable perms

        recurse info = do

            if isDirectory info && infoPath info /= path

               then traverseVerbose order (infoPath info)

               else return [info]

 

Another Way of Looking at Traversal

 

data Iterate seed = Done { unwrap :: seed }

                  | Skip { unwrap :: seed }

                  | Continue { unwrap :: seed }

                    deriving (Show)

                   

type Iterator seed = seed -> Info -> Iterate seed

 

foldTree :: Iterator a -> a -> FilePath -> IO a

foldTree iter initSeed path = do

    endSeed <- fold initSeed path

    return (unwrap endSeed)

  where

    fold seed subpath = getUsefulContents subpath >>= walk seed

   

    walk seed (name:names) = do

        let path' = path </> name

        info <- getInfo path'

        case iter seed info of

            done@(Done _) -> return done

            Skip seed' -> walk seed' names

            Continue seed'

                | isDirectory info -> do

                    next <- fold seed' path'

                    case next of

                        done@(Done _) -> return done

                        seed'' -> walk (unwrap seed'') names

                | otherwise -> walk seed' names

    walk seed _ = return (Continue seed)

 

 

atMostThreePictures :: Iterator [FilePath]

atMostThreePictures paths info

    | length paths == 3

      = Done paths

    | isDirectory info && takeFileName path == ".svn"

      = Skip paths

    | extension `elem` [".jpg", ".png"]

      = Continue (path : paths)

    | otherwise

      = Continue paths

  where extension = map toLower (takeExtension path)

        path = infoPath info

 

countDirectories count info =

    Continue (if isDirectory info

              then count + 1

              else count)

 

这里传给foldTree 的初始seed 应是0

 

Useful Coding Guidelines

 

 

Common Layout Styles

 


 

检测给定的目录下有多少个目录(非递归)

 

-- foldTree countDirectories 0 "c:/haskell" >>= print  -- >1  -- 如果haskell 目录下有两个目录就会输出2

-- getUsefulContents "c:/haskell" >>= print  -- ["hello.lkshw","hello.lkshs","hello"]

-- walk 0 ("hello.lkshw":["hello.lkshs","hello"]) 

-- let path' = "c:/haskell" </> "hello.lkshw"  -- "c:/haskell//hello.lkshw"

-- info <- getInfo path'  -- Info {infoPath = "c:/haskell//hello.lkshw", infoPerms = Just (Permissions {readable = True, writable = True, executable = False, searchable = False}), infoSize = Just 364, infoModTime = Just Sun Feb 27 14:23:54 Öйú±ê׼ʱ¼ä 2011}

-- countDirectories 0 info  -- Continue {unwrap = 0}

-- isDirectory info  -- False

-- walk 0 ["hello.lkshs","hello"]

 

获取给定目录下所有以“.bat 结尾的文件(递归?)

 

-- foldTree atMostThreePictures [] "C:/haskell//hello//src" >>= print -- ["C:/haskell//hello//src//RenameFiles.hs","C:/haskell//hello//src//Setup.hs","C:/haskell//hello//src//SimpleFinder.hs"]

 

-- BetterPredicate

-----------------------------------------------------------------------

module BetterPredicate (betterFind, getInfo, Info(..), countDirectories, isDirectory, getInfo, foldTree, atMostThreePictures, getUsefulContents) where

 

import Control.Monad (filterM, forM, liftM)

import System.Directory (Permissions(..), getModificationTime, getPermissions, getDirectoryContents)

import System.Time (ClockTime(..))

import System.FilePath --(takeExtension)

import Control.OldException (bracket, handle)

import System.IO (IOMode(..), hClose, hFileSize, openFile)

import Data.Char

 

-- the function we wrote earlier

import RecursiveContents (getRecursiveContents)

 

type Predicate = FilePath -- path to directory entry

               -> Permissions -- permissions

               -> Maybe Integer -- file size (Nothing if not file)

               -> ClockTime -- last modified

               -> Bool

 

getFileSize :: FilePath -> IO (Maybe Integer)

getFileSize path = handle (/_ -> return Nothing) $

    bracket (openFile path ReadMode) hClose $ /h -> do

        size <- hFileSize h

        return (Just size)

 

betterFind :: Predicate -> FilePath -> IO [FilePath]

betterFind p path = getRecursiveContents path >>= filterM check

    where check name = do

            perms <- getPermissions name

            size <- getFileSize name

            modified <- getModificationTime name

            return (p name perms size modified)

 

type InfoP a = FilePath -- path to directory entry

             -> Permissions -- permissions

             -> Maybe Integer -- file size (Nothing if not file)

             -> ClockTime -- last modified

             -> a

 

pathP :: InfoP FilePath

pathP path _ _ _ = path

 

sizeP :: InfoP Integer

sizeP _ _ (Just size) _ = size

sizeP _ _ Nothing _ = -1

 

equalP :: (Eq a) => InfoP a -> a -> InfoP Bool

equalP f k = /w x y z -> f w x y z == k

 

equalP' :: (Eq a) => InfoP a -> a -> InfoP Bool

equalP' f k w x y z = f w x y z == k

 

 

 

liftP :: (a -> b -> c) -> InfoP a -> b -> InfoP c

liftP q f k w x y z = f w x y z `q` k

 

greaterP, lesserP :: (Ord a) => InfoP a -> a -> InfoP Bool

greaterP = liftP (>)

lesserP = liftP (<)

 

simpleAndP :: InfoP Bool -> InfoP Bool -> InfoP Bool

simpleAndP f g w x y z = f w x y z && g w x y z

 

liftP2 :: (a -> b -> c) -> InfoP a -> InfoP b -> InfoP c

liftP2 q f g w x y z = f w x y z `q` g w x y z

 

andP = liftP2 (&&)

orP = liftP2 (||)

 

constP :: a -> InfoP a

constP k _ _ _ _ = k

 

liftP' q f k w x y z = f w x y z `q` constP k w x y z

 

myTest path _ (Just size) _ =

    takeExtension path == ".cpp" && size > 131072

myTest _ _ _ _ = False

 

liftPath :: (FilePath -> a) -> InfoP a

liftPath f w _ _ _ = f w

 

myTest2 = (liftPath takeExtension `equalP` ".cpp") `andP`

    (sizeP `greaterP` 131072)

 

 

(==?) = equalP

(&&?) = andP

(>?) = greaterP

 

myTest3 = (liftPath takeExtension ==? ".cpp") &&? (sizeP >? 131072)

 

infix 4 ==?

infixr 3 &&?

infix 4 >?

myTest4 = liftPath takeExtension ==? ".cpp" &&? sizeP >? 131072

 

data Info = Info {

    infoPath :: FilePath

  , infoPerms :: Maybe Permissions

  , infoSize :: Maybe Integer

  , infoModTime :: Maybe ClockTime

  } deriving (Eq, Ord, Show)

 

traverse :: ([Info] -> [Info]) -> FilePath -> IO [Info]

traverse order path = do

    names <- getUsefulContents path

    contents <- mapM getInfo (path : map (path </>) names)

    liftM concat $ forM (order contents) $ /info -> do

    if isDirectory info && infoPath info /= path

       then traverse order (infoPath info)

       else return [info]

 

getUsefulContents :: FilePath -> IO [String]

getUsefulContents path = do

    names <- getDirectoryContents path

    return (filter (`notElem` [".", ".."]) names)

 

isDirectory :: Info -> Bool

isDirectory = maybe False searchable . infoPerms

 

maybeIO :: IO a -> IO (Maybe a)

maybeIO act = handle (/_ -> return Nothing) (Just `liftM` act)

 

getInfo :: FilePath -> IO Info

getInfo path = do

    perms <- maybeIO (getPermissions path)

    size <- maybeIO (bracket (openFile path ReadMode) hClose hFileSize)

    modified <- maybeIO (getModificationTime path)

    return (Info path perms size modified)

 

traverseVerbose order path = do

    names <- getDirectoryContents path

    let usefulNames = filter (`notElem` [".", ".."]) names

    contents <- mapM getEntryName ("" : usefulNames)

    recursiveContents <- mapM recurse (order contents)

    return (concat recursiveContents)

  where getEntryName name = getInfo (path </> name)

        isDirectory info = case infoPerms info of

                            Nothing -> False

                            Just perms -> searchable perms

        recurse info = do

            if isDirectory info && infoPath info /= path

               then traverseVerbose order (infoPath info)

               else return [info]

 

data Iterate seed = Done { unwrap :: seed }

                  | Skip { unwrap :: seed }

                  | Continue { unwrap :: seed }

                    deriving (Show)

                    

type Iterator seed = seed -> Info -> Iterate seed

 

foldTree :: Iterator a -> a -> FilePath -> IO a

foldTree iter initSeed path = do

    endSeed <- fold initSeed path

    return (unwrap endSeed)

  where

    fold seed subpath = getUsefulContents subpath >>= walk seed  -- 亮点,fold 的作用是帮其它两函数holding 参数

   

    walk seed (name:names) = do

        let path' = path </> name

        info <- getInfo path'

        case iter seed info of

            done@(Done _) -> return done

            Skip seed' -> walk seed' names

            Continue seed'

                | isDirectory info -> do

                    next <- fold seed' path'

                    case next of

                        done@(Done _) -> return done

                        seed'' -> walk (unwrap seed'') names

                | otherwise -> walk seed' names

    walk seed _ = return (Continue seed)

 

 

atMostThreePictures :: Iterator [FilePath]

atMostThreePictures paths info

    | length paths == 3  -- 控制深度?

      = Done paths

    | isDirectory info && takeFileName path == ".svn"

      = Skip paths

    | extension `elem` [".bat", ".png"]  -- 搜索条件

      = Continue (path : paths)

    | otherwise

      = Continue paths

  where extension = map toLower (takeExtension path)

        path = infoPath info

 

countDirectories count info =

    Continue (if isDirectory info

              then count + 1

              else count)

 

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值