Real World Haskell - Chapter 7. I/O

License: CC 4.0 BY-SA

Tags:

#haskell #action #string #io #numbers #import

Chapter 7. I/O

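Use <- to get the result of an I/O action; use let to bind the result of pure code.

Pure code always returns the same output for the same input and has no side effects. In Haskell, only I/O actions are exempt from this rule.

Keeping pure and impure code strictly separated helps the compiler optimize and parallelize automatically.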

Classic I/O in Haskell

-- runghc.bat

@echo off

ghci main

main = do
    putStrLn "Greetings! What is your name?"
    inpStr <- getLine
    putStrLn $ "Welcome to Haskell, " ++ inpStr ++ "!"

$ runghc basicio.hs

Greetings! What is your name?

John

Welcome to Haskell, John!

putStrLn writes a String and then a newline.

ghci> let writefoo = putStrLn "foo"

ghci> writefoo

foo

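foo is not the return value of writefoo; it is a side effect of putStrLn (that is what impurity looks like!).

main itself has type IO (). The do keyword introduces a block of actions that run in sequence and may have side effects.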

name2reply :: String -> String
name2reply name =
    "Pleased to meet you, " ++ name ++ ".\n" ++
    "Your name contains " ++ charcount ++ " characters."
    where charcount = show (length name)

main :: IO ()
main = do
    putStrLn "Greetings once again. What is your name?"
    inpStr <- getLine
    let outStr = name2reply inpStr
    putStrLn outStr

Use <- to get the result of an I/O action; use let to bind the result of pure code.

Pure Versus I/O

Pure code always returns the same output for the same input and has no side effects. In Haskell, only I/O actions are exempt from this rule.

Why Purity Matters

(Omitted.)

Working with Files and Handles

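openFile (from System.IO) returns a file Handle. The matching hPutStrLn writes a line to that handle. When you are done, close the handle with hClose.

To begin, let's read and write files in an imperative style. This looks like the while loop you have seen in other languages. It is not the best way to do it in Haskell, and we will improve on it later.

Reading and writing files (loop style)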

import System.IO
import Data.Char (toUpper)

main :: IO ()
main = do
    inh <- openFile "input.txt" ReadMode
    outh <- openFile "output.txt" WriteMode
    mainloop inh outh
    hClose inh
    hClose outh

mainloop :: Handle -> Handle -> IO ()
mainloop inh outh =
    do ineof <- hIsEOF inh
       if ineof
           then return ()
           else do inpStr <- hGetLine inh
                   hPutStrLn outh (map toUpper inpStr)
                   mainloop inh outh

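return means something different here than in C and similar languages: it is the opposite of <-. return takes a pure value and wraps it in IO; it does not exit the enclosing do block.

Every I/O action must have an IO type, so if your result comes from a pure computation you must use return to wrap it in IO.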

More on openFile

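Use openBinaryFile to work with binary files. Some operating systems, such as Windows, treat binary and text files quite differently (for example, by translating line endings in text mode).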

Closing Handles

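Haskell maintains internal buffers for files. Data you have written may not reach the file until the buffer is flushed; hClose guarantees that it is, so always close a handle when you are done with it.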

Seek and Tell

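hTell reports the current position as the number of bytes from the beginning of the file. It starts at 0, becomes 5 after 5 bytes have been read, and so on.

hSeek sets the current position.

hIsSeekable checks whether a Handle supports seeking.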

Standard Input, Output, and Error

How getLine, putStrLn, and print are implemented

getLine = hGetLine stdin

putStrLn = hPutStrLn stdout

print = hPrint stdout

Use the echo command to pipe input into the program's standard input

echo John|runghc callingpure.hs

Temporary Files

The relevant functions are openTempFile, openBinaryTempFile, and System.Directory.getTemporaryDirectory.

Extended Example: Functional I/O and Temporary Files

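If the output of one function is exactly what another function takes as input, the two can be combined into a single composite function.

Adding the dependency in the Leksah IDE

Leksah -> Edit Package -> Dependencies -> Enter -> type "directory" -> Add -> Save

Reading and writing a temporary file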

import System.IO
import System.Directory (getTemporaryDirectory, removeFile)
-- catchIOError replaces System.IO.Error.catch, which newer versions of base no longer export
import System.IO.Error (catchIOError)
import Control.Exception (finally)

main :: IO ()
main = withTempFile "mytemp.txt" myAction

myAction :: FilePath -> Handle -> IO ()
myAction tempname temph =
    do putStrLn "Welcome to tempfile.hs"
       putStrLn $ "I have a temporary file at " ++ tempname

       pos <- hTell temph
       putStrLn $ "My initial position is " ++ show pos

       let tempdata = show [1..10]
       putStrLn $ "Writing one line containing " ++
                  show (length tempdata) ++ " bytes: " ++
                  tempdata
       hPutStrLn temph tempdata

       pos <- hTell temph
       putStrLn $ "After writing, my new position is " ++ show pos

       putStrLn $ "The file content is: "
       hSeek temph AbsoluteSeek 0
       c <- hGetContents temph
       putStrLn c

       putStrLn $ "Which could be expressed as this Haskell literal:"
       print c

withTempFile :: String -> (FilePath -> Handle -> IO a) -> IO a
withTempFile pattern func =
    do -- fall back to the current directory if no temporary directory is available
       tempdir <- catchIOError getTemporaryDirectory (\_ -> return ".")
       -- pattern is "mytemp.txt"; the system inserts digits (e.g. mytemp976.txt) that differ on every run
       (tempfile, temph) <- openTempFile tempdir pattern
       finally (func tempfile temph)
               (do hClose temph
                   removeFile tempfile)

The difference between hPutStrLn and hPutStr is that hPutStrLn appends a newline.

Lazy I/O

Haskell is a lazy language, and with lazy I/O the data is likewise evaluated only when its value must be known.

hGetContents

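Because of laziness, it is possible to treat a 500 GB file as if it were read in one go.

When you call hGetContents, no data is actually read at that moment.

Processing a 500 GB data file with lazy I/O (an improved version follows)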

import System.IO
import Data.Char (toUpper)

main :: IO ()
main = do
    inh <- openFile "input.txt" ReadMode
    outh <- openFile "output.txt" WriteMode
    inpStr <- hGetContents inh
    let result = processData inpStr
    hPutStr outh result
    hClose inh
    hClose outh

processData :: String -> String
processData = map toUpper

Processing a 500 GB data file with lazy I/O (improved version)

import System.IO
import Data.Char (toUpper)

main = do
    inh <- openFile "input.txt" ReadMode
    outh <- openFile "output.txt" WriteMode
    inpStr <- hGetContents inh
    hPutStr outh (map toUpper inpStr)
    hClose inh
    hClose outh

readFile and writeFile

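Haskell programmers often use hGetContents as a filter: they read a file, transform its contents, and write the result somewhere else. readFile and writeFile are a more concise way to implement such a filter.

readFile uses hGetContents internally.

Processing a 500 GB data file with lazy I/O, using readFile and writeFile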

-- ab.bat

runhaskell main

cmd

------------------------------

-- input.txt

hello,world!

-------------------------------

-- main.hs

import Data.Char (toUpper)

main = do
    inpStr <- readFile "input.txt"
    writeFile "output.txt" (map toUpper inpStr)

Running ab.bat produces output.txt with the following content:

HELLO,WORLD!

A Word on Lazy Output

(Omitted.)

interact

Using interact to interact with the user

-- main.hs

import Data.Char(toUpper)

main = interact (map toUpper)

-- First, run the batch file

-- runghc.bat

@echo off

ghci main

*Main> main -- then run the main function

hello,world! -- next, type a string

HELLO,WORLD! -- finally, the program prints the string

-- Version with an input prompt

module Main where

import Data.Char (toUpper)

main = interact (map toUpper . (++) "Your data, in uppercase, is:\n\n")

-- 1. (++) "Your data, in uppercase, is:\n\n" :: [Char] -> [Char]
-- 2. map toUpper :: [Char] -> [Char]
-- The output of 1 is exactly what 2 takes as input.

This code has a small problem: the prompt text is converted to uppercase as well.

module Main where

import Data.Char (toUpper)

main = interact ((++) "Your data, in uppercase, is:\n\n" .
                 map toUpper)

Here the prompt string has been moved outside of map.

Filters with interact

main = interact (unlines . filter (elem 'a') . lines)

-- runghc filter.hs < input.txt

I like Haskell

Haskell is great

The IO Monad

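If you read a line from the keyboard, the I/O function cannot possibly return the same result every time, can it? You can think of I/O as changing the state of the world.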

Actions

Actions resemble functions. They do nothing when they are defined, but perform some task when they are invoked.

A value of type IO () is an action.

ghci> :type putStrLn

putStrLn :: String -> IO ()

ghci> :type getLine

getLine :: IO String

Actions can be stored or passed to pure code.

runall :: [IO ()] -> IO ()
runall [] = return ()
runall (firstelem:remainingelems) =
    do firstelem
       runall remainingelems

abc = putStrLn "abc" -- an action

main = do print "Start of the program"
          do abc
          print $ map show [1..10] -- > ["1","2","3","4","5","6","7","8","9","10"]
          runall $ map (\s -> putStrLn ("Data: " ++ s)) ["1","2","3","4","5","6","7","8","9","10"]
          print "Done!"

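"$" applies the function on its left to everything on its right, as if the rest of the expression were wrapped in parentheses.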
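To make the effect of "$" concrete, here is a small sketch that writes the same function twice, once with "$" and once with explicit parentheses. The names shownLength and shownLength' are made up for this illustration and do not come from the book.

```haskell
-- Hypothetical helper: total number of characters needed to show each number.
-- Written with ($) so that no closing parentheses are needed.
shownLength :: [Int] -> Int
shownLength xs = length $ concat $ map show xs

-- The same function with explicit parentheses instead of ($).
shownLength' :: [Int] -> Int
shownLength' xs = length (concat (map show xs))

main :: IO ()
main = do
    print $ shownLength [1 .. 10]    -- prints 11
    print (shownLength' [1 .. 10])   -- prints 11
```

Both definitions mean the same thing; "$" only saves the parentheses.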

You can think of it this way: apart from let, every statement in a do block must yield an I/O action.

str2action :: String -> IO ()
str2action input = putStrLn ("Data: " ++ input)

list2actions :: [String] -> [IO ()]
list2actions = map str2action

numbers :: [Int]
numbers = [1..10]

strings :: [String]
strings = map show numbers

actions :: [IO ()]
actions = list2actions strings

printitall :: IO ()
printitall = runall actions

-- Take a list of actions, and execute each of them in turn.
runall :: [IO ()] -> IO ()
runall [] = return ()
runall (firstelem:remainingelems) =
    do firstelem
       runall remainingelems

main = do str2action "Start of the program"
          runall $ map (\s -> putStrLn ("Data: " ++ s)) ["1","2","3","4","5","6","7","8","9","10"]
          --printitall
          str2action "Done!"

The plural "s" suffix on a name hints that it refers to a list.

What this code does is: numbers -> strings -> actions

Using mapM_ to perform I/O over a list

str2message :: String -> String
str2message input = "Data: " ++ input

str2action :: String -> IO ()
str2action = putStrLn . str2message

numbers :: [Int]
numbers = [1..10]

main = do str2action "Start of the program"
          mapM_ (str2action . show) numbers
          str2action "Done!"

mapM_ is similar to map: it takes a function that returns an I/O action as its first argument and a list as its second. mapM_ discards the results of those actions.

ghci> :type mapM

mapM :: (Monad m) => (a -> m b) -> [a] -> m [b]

ghci> :type mapM_

mapM_ :: (Monad m) => (a -> m b) -> [a] -> m ()

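mapM and mapM_ actually work in any Monad.

Functions whose names end in an underscore usually discard their results.

The difference between map and mapM is that map merely builds a list of actions, while mapM combines them into a single action that runs each one in turn.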
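The contrast between map, mapM, and mapM_ can be sketched in a few lines. The helper positive is an assumption made for this illustration; it is only there to show the same functions working in the Maybe monad as well as in IO.

```haskell
-- Hypothetical helper: succeeds only for positive numbers.
positive :: Int -> Maybe Int
positive n = if n > 0 then Just n else Nothing

main :: IO ()
main = do
    -- map only builds a list of actions; nothing is printed by this line.
    let actions = map print [1, 2, 3 :: Int]
    putStrLn ("Built " ++ show (length actions) ++ " actions")
    -- mapM runs each action in turn and collects the results.
    results <- mapM (\n -> print n >> return (n * 2)) [1, 2, 3 :: Int]
    print results
    -- mapM_ runs the actions and throws the results away.
    mapM_ print results
    -- Both also work in other monads, such as Maybe.
    print (mapM positive [1, 2, 3])
    print (mapM positive [1, -2, 3])
```

In the Maybe monad, a single Nothing makes the whole mapM result Nothing, which is why the last line prints Nothing.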

Sequencing

A do block is really just shorthand for joining actions together. Two operators can be used in place of a do block.

The operators that can replace do notation: ">>" and ">>="

ghci> :type (>>)

(>>) :: (Monad m) => m a -> m b -> m b

ghci> :type (>>=)

(>>=) :: (Monad m) => m a -> (a -> m b) -> m b

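The ">>" operator sequences two actions: the first action runs, then the second. The result of the whole expression is the result of the second action; the result of the first is discarded.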

putStrLn "line 1" >> putStrLn "line 2"

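The ">>=" operator runs the action on its left and passes that action's result to the function on its right, which returns the next action to run.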

getLine >>= putStrLn

getLine reads a line from standard input, and the result is passed to putStrLn, which prints it.

main =
    putStrLn "Greetings! What is your name?" >>
    getLine >>=
    (\inpStr -> putStrLn $ "Welcome to Haskell, " ++ inpStr ++ "!")
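To see that do notation and ">>=" express the same thing, the following sketch writes one small helper both ways. The names pairDo and pairBind are invented for this illustration; the helper is generic over any Monad so it can be tried in IO and in Maybe.

```haskell
-- Hypothetical helper written with do notation: run two actions, pair the results.
pairDo :: Monad m => m a -> m b -> m (a, b)
pairDo ma mb = do
    a <- ma
    b <- mb
    return (a, b)

-- The same helper written directly with (>>=) and lambdas.
pairBind :: Monad m => m a -> m b -> m (a, b)
pairBind ma mb =
    ma >>= \a ->
    mb >>= \b ->
    return (a, b)

main :: IO ()
main = do
    viaDo <- pairDo (return 1) (return "one") :: IO (Int, String)
    viaBind <- pairBind (return 1) (return "one") :: IO (Int, String)
    print (viaDo == viaBind)   -- prints True
```

Each "x <- action" line in the do version corresponds to "action >>= \x -> ..." in the operator version.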

The True Nature of Return

return wraps a value in a Monad. For I/O, return takes a pure value and puts it into the IO Monad.

main =
    putStrLn "Greetings! What is your name?" >>
    return "guys" >>=
    (\inpStr -> putStrLn $ "Welcome to Haskell, " ++ inpStr ++ "!")

Asking the user "yes" or "no"

import Data.Char (toUpper)

isGreen :: IO Bool
isGreen =
    do putStrLn "Is green your favorite color?"
       inpStr <- return "yes" -- stands in for getLine
       return ((toUpper . head $ inpStr) == 'Y')

Separating pure and impure code

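import Data.Char (toUpper)

-- Pure part: decide whether the answer starts with 'y' or 'Y'.
isYes :: String -> Bool
isYes inpStr = (toUpper . head $ inpStr) == 'Y'

-- Impure part: ask the question and read the answer.
isGreen :: IO Bool
isGreen =
    do putStrLn "Is green your favorite color?"
       inpStr <- getLine
       return (isYes inpStr)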
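One benefit of pulling the pure part out is that it can be checked without any I/O. The isYes above calls head, which fails on an empty string (for example when the user just presses Enter). The following is a sketch of one possible total variant; isYesSafe is a hypothetical name, and treating empty input as "no" is an assumption, not something the book prescribes.

```haskell
import Data.Char (toUpper)

-- Hypothetical total variant of isYes: pattern matching replaces head,
-- and empty input is treated as "no" (an assumption of this sketch).
isYesSafe :: String -> Bool
isYesSafe (c:_) = toUpper c == 'Y'
isYesSafe []    = False

main :: IO ()
main = mapM_ (print . isYesSafe) ["yes", "Yep", "no", ""]
```

Run as is, this prints True, True, False, False, one per line.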

The "<-" operator extracts a pure value from a Monad.

returnTest :: IO ()
returnTest =
    do one <- return 1
       let two = 2
       putStrLn $ show (one + two)

main = returnTest -- > 3

let is used inside a do block to bind the results of pure code (no action is involved).

Is Haskell Really Imperative?

(Omitted.)

Side Effects with Lazy I/O

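When we say Haskell has no side effects, what exactly does that mean? Even pure code can, with a badly behaved loop, exhaust the system's memory and crash.

Pure functions do not modify global variables and do not perform I/O.

hGetContents is a poor fit for these situations: interacting with a user to obtain data, or reading another program's output through a pipe.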

Buffering

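The I/O subsystem is among the slowest parts of a modern computer.

The operating system keeps the most frequently used data in memory.

Programming languages usually do their own buffering as well: they request data from the OS in large chunks, even when the code is processing only one byte at a time.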

Buffering Modes

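There are three kinds of BufferMode: NoBuffering, LineBuffering, and BlockBuffering.

NoBuffering reads from the OS one character at a time and writes one character at a time (each write happens immediately). Performance is poor, so it is not suitable for general-purpose use.

LineBuffering writes the output buffer whenever a newline is written or the buffer grows too large. On input, it reads all the data up to a newline. When reading from a terminal, it returns data as soon as Enter is pressed. It is often the default setting.

BlockBuffering makes Haskell read or write data in fixed-size chunks whenever possible. It gives the best performance when processing large amounts of data in batches. It takes a Maybe argument: Just 4096 sets the buffer size to 4096 bytes, and Nothing uses the default buffer size.

The default buffering mode depends on the operating system and the Haskell implementation.

Use hGetBuffering to query the current mode and hSetBuffering to set it, for example hSetBuffering stdin (BlockBuffering Nothing).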

Flushing The Buffer

Both hFlush and hClose write out the data in the buffer immediately; hClose also closes the handle afterwards.
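As a rough sketch of how the buffering functions fit together, the program below queries the mode of stdout, switches it to block buffering, and then flushes by hand. describeMode is a hypothetical helper written only for this example; hGetBuffering, hSetBuffering, and hFlush are the System.IO functions mentioned above.

```haskell
import System.IO

-- Hypothetical helper: a human-readable description of a BufferMode.
describeMode :: BufferMode -> String
describeMode NoBuffering               = "unbuffered"
describeMode LineBuffering             = "line-buffered"
describeMode (BlockBuffering Nothing)  = "block-buffered (default size)"
describeMode (BlockBuffering (Just n)) = "block-buffered (" ++ show n ++ " bytes)"

main :: IO ()
main = do
    before <- hGetBuffering stdout
    putStrLn ("stdout starts out " ++ describeMode before)
    hSetBuffering stdout (BlockBuffering (Just 4096))
    after <- hGetBuffering stdout
    putStrLn ("stdout is now " ++ describeMode after)
    -- With block buffering the line above may still be sitting in the buffer;
    -- hFlush pushes it out right away instead of waiting for the buffer to fill.
    hFlush stdout
```

The first line of output depends on the platform and on whether stdout is a terminal, in line with the note above about defaults.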

Reading Command-Line Arguments

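System.Environment.getArgs returns IO [String]; each element of the list corresponds to one argument passed on the command line.

System.Environment.getProgName returns the name of the program.

System.Console.GetOpt provides tools for parsing command-line options, which is useful for programs with complex options.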
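A minimal sketch of getArgs and getProgName in use follows. The helper describeInvocation is made up for this example so that the formatting stays in pure code; only getArgs and getProgName come from System.Environment.

```haskell
import System.Environment (getArgs, getProgName)

-- Hypothetical pure helper: format the program name and its arguments.
describeInvocation :: String -> [String] -> String
describeInvocation prog args =
    prog ++ " received " ++ show (length args) ++ " argument(s): " ++ unwords args

main :: IO ()
main = do
    prog <- getProgName   -- the program name, without the arguments
    args <- getArgs       -- one list element per command-line argument
    putStrLn (describeInvocation prog args)
```

Running it as, say, runghc args.hs foo bar (the file name is an assumption) should report two arguments.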

Environment Variables

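System.Environment provides getEnv and getEnvironment. getEnv looks up a specific variable and raises an exception if it does not exist.

getEnvironment returns the whole environment as [(String, String)]; you can then use lookup to find a particular variable.

On Linux you can set environment variables with putEnv or setEnv from the System.Posix.Env module. The book notes that no equivalent existed for Windows at the time; newer versions of base also provide setEnv in System.Environment.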
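The getEnvironment plus lookup approach avoids the exception that getEnv raises for a missing variable. Here is a small sketch under that idea; envOr is a hypothetical helper, and the variable name HOME is just an example that may not be set on every system.

```haskell
import System.Environment (getEnvironment)
import Data.Maybe (fromMaybe)

-- Hypothetical helper: look a variable up in an environment list,
-- falling back to a default instead of raising an error.
envOr :: String -> String -> [(String, String)] -> String
envOr def name env = fromMaybe def (lookup name env)

main :: IO ()
main = do
    env <- getEnvironment   -- the whole environment as [(String, String)]
    putStrLn ("HOME is " ++ envOr "(not set)" "HOME" env)
```

Because envOr is pure, it can be tried on a hand-written list without touching the real environment.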