Rust Exercise Book 81: Grep and File Processing

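On Unix/Linux systems, grep is a powerful text-search tool: it searches files using regular expressions and prints the matching lines. In Exercism's "grep" exercise we implement a simplified grep command that supports several search options. Beyond file handling and regular expressions, the exercise is a good way to dig into error handling, command-line argument parsing, and systems programming in Rust.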

What is the Grep problem?

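Grep (Global Regular Expression Print) is a classic text-search tool on Unix/Linux systems. It can:

  1. Search files for a given pattern
  2. Apply a variety of options, such as case-insensitive matching, line numbers, and inverted matching
  3. Print the matching lines or the names of the files that contain matches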
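To make the core idea concrete, here is a minimal sketch of grep's central loop using only the standard library: it keeps the lines of a text that contain a pattern as a plain substring. The function name `search` is illustrative, not part of the exercise's API, and it deliberately ignores options and files.

```rust
/// Returns every line of `text` that contains `pattern` as a plain substring.
/// (Hypothetical helper: no options, no files, no regular expressions.)
fn search<'a>(pattern: &str, text: &'a str) -> Vec<&'a str> {
    text.lines()
        .filter(|line| line.contains(pattern))
        .collect()
}
```

The returned slices borrow from `text`, so no line is copied until the caller decides to keep it.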

Let's start with the struct and function signatures that the exercise provides:

use anyhow::Error;

/// While using `&[&str]` to handle flags is convenient for exercise purposes,
/// and resembles the output of [`std::env::args`], in real-world projects it is
/// both more convenient and more idiomatic to contain runtime configuration in
/// a dedicated struct. Therefore, we suggest that you do so in this exercise.
///
/// In the real world, it's common to use crates such as [`clap`] or
/// [`structopt`] to handle argument parsing, and of course doing so is
/// permitted in this exercise as well, though it may be somewhat overkill.
///
/// [`clap`]: https://crates.io/crates/clap
/// [`std::env::args`]: https://doc.rust-lang.org/std/env/fn.args.html
/// [`structopt`]: https://crates.io/crates/structopt
#[derive(Debug)]
pub struct Flags;

impl Flags {
    pub fn new(flags: &[&str]) -> Self {
        unimplemented!(
            "Given the flags {:?} implement your own 'Flags' struct to handle flags-related logic",
            flags
        );
    }
}

pub fn grep(pattern: &str, flags: &Flags, files: &[&str]) -> Result<Vec<String>, Error> {
    unimplemented!(
        "Search the files '{:?}' for '{}' pattern and save the matches in a vector. Your search logic should be aware of the given flags '{:?}'",
        files,
        pattern,
        flags
    );
}

Our task is to implement a complete grep tool that supports several search options and handles multiple files.

Design analysis

1. Core components

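  1. Flags struct: stores the command-line options
  2. grep function: runs the search logic
  3. File handling: reads and searches multiple files
  4. Pattern matching: supports both regular expressions and plain string matching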

2. Supported options

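Based on the test cases, we need to support the following options:

  • -n: print line numbers
  • -l: print only the names of files that contain a match
  • -i: case-insensitive matching
  • -x: match the entire line only
  • -v: invert the match (print the lines that do not match)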
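Each option changes either how a single line is tested or what gets printed. The three that affect matching (-i, -x, -v) can be sketched as a single predicate. This is a literal-string sketch using only the standard library; `MatchOptions` and `line_matches` are hypothetical names, and the implementations below use the regex crate instead.

```rust
/// Hypothetical bundle of the three matching-related options.
struct MatchOptions {
    case_insensitive: bool,  // -i
    match_entire_line: bool, // -x
    invert_match: bool,      // -v
}

/// Decides whether `line` should be selected for a literal `pattern`.
fn line_matches(line: &str, pattern: &str, opts: &MatchOptions) -> bool {
    // -i: compare lowercase copies (Unicode-aware, but allocates)
    let (line, pattern) = if opts.case_insensitive {
        (line.to_lowercase(), pattern.to_lowercase())
    } else {
        (line.to_string(), pattern.to_string())
    };
    // -x: the whole line must equal the pattern; otherwise a substring suffices
    let hit = if opts.match_entire_line {
        line == pattern
    } else {
        line.contains(&pattern)
    };
    // -v: `!=` acts as XOR and flips the result when inversion is requested
    hit != opts.invert_match
}
```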

Complete implementation

1. Implementing the Flags struct

use anyhow::Error;
use std::fs;
use regex::Regex;

#[derive(Debug, Clone)]
pub struct Flags {
    print_line_numbers: bool,
    print_file_names_only: bool,
    case_insensitive: bool,
    match_entire_line: bool,
    invert_match: bool,
}

impl Flags {
    pub fn new(flags: &[&str]) -> Self {
        let mut result = Flags {
            print_line_numbers: false,
            print_file_names_only: false,
            case_insensitive: false,
            match_entire_line: false,
            invert_match: false,
        };
        
        for flag in flags {
            match *flag {
                "-n" => result.print_line_numbers = true,
                "-l" => result.print_file_names_only = true,
                "-i" => result.case_insensitive = true,
                "-x" => result.match_entire_line = true,
                "-v" => result.invert_match = true,
                _ => {} // ignore unknown options
            }
        }
        
        result
    }
    
    pub fn print_line_numbers(&self) -> bool {
        self.print_line_numbers
    }
    
    pub fn print_file_names_only(&self) -> bool {
        self.print_file_names_only
    }
    
    pub fn case_insensitive(&self) -> bool {
        self.case_insensitive
    }
    
    pub fn match_entire_line(&self) -> bool {
        self.match_entire_line
    }
    
    pub fn invert_match(&self) -> bool {
        self.invert_match
    }
}
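Since every field starts out as `false`, deriving `Default` can replace the hand-written initializer. A possible variant of `Flags::new`, assuming the same five fields (the accessor methods are omitted here for brevity):

```rust
// Default sets every bool field to false, matching the manual initializer
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Flags {
    print_line_numbers: bool,
    print_file_names_only: bool,
    case_insensitive: bool,
    match_entire_line: bool,
    invert_match: bool,
}

impl Flags {
    pub fn new(flags: &[&str]) -> Self {
        // Start from all-false and switch on each recognized flag
        let mut result = Flags::default();
        for &flag in flags {
            match flag {
                "-n" => result.print_line_numbers = true,
                "-l" => result.print_file_names_only = true,
                "-i" => result.case_insensitive = true,
                "-x" => result.match_entire_line = true,
                "-v" => result.invert_match = true,
                _ => {} // unknown flags are ignored, as in the version above
            }
        }
        result
    }
}
```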

2. Implementing the grep function

pub fn grep(pattern: &str, flags: &Flags, files: &[&str]) -> Result<Vec<String>, Error> {
    let mut results = Vec::new();
    let mut file_names_with_matches = Vec::new();
    
    // Build the regex from the options: -x anchors the pattern to the whole
    // line, -i prepends the inline case-insensitive flag (?i).
    // Note: in this version the pattern is interpreted as a regular expression.
    let mut regex_pattern = if flags.match_entire_line() {
        format!("^(?:{})$", pattern)
    } else {
        pattern.to_string()
    };
    if flags.case_insensitive() {
        regex_pattern = format!("(?i){}", regex_pattern);
    }
    
    let regex = Regex::new(&regex_pattern)?;
    
    // Process each file
    for &file_name in files {
        // A missing or unreadable file is returned as an error via `?`
        let content = fs::read_to_string(file_name)?;
        
        let mut file_has_match = false;
        
        for (line_index, line) in content.lines().enumerate() {
            // Line numbers are 1-based
            let line_number = line_index + 1;
            
            // Check for a match (-x is already handled by the anchors above)
            let is_match = regex.is_match(line);
            
            // -v inverts the match result
            let should_include = if flags.invert_match() {
                !is_match
            } else {
                is_match
            };
            
            if should_include {
                file_has_match = true;
                
                // With -l one matching line is enough: stop scanning this file
                if flags.print_file_names_only() {
                    break;
                }
                
                // Build the output line: file-name prefix only when searching
                // several files, line-number prefix only with -n
                let output_line = if files.len() > 1 {
                    if flags.print_line_numbers() {
                        format!("{}:{}:{}", file_name, line_number, line)
                    } else {
                        format!("{}:{}", file_name, line)
                    }
                } else if flags.print_line_numbers() {
                    format!("{}:{}", line_number, line)
                } else {
                    line.to_string()
                };
                
                results.push(output_line);
            }
        }
        
        // With -l, record the file name if this file had a match
        if flags.print_file_names_only() && file_has_match {
            file_names_with_matches.push(file_name.to_string());
        }
    }
    
    // With -l, return the list of file names instead of lines
    if flags.print_file_names_only() {
        Ok(file_names_with_matches)
    } else {
        Ok(results)
    }
}
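The output-formatting step (a file-name prefix when more than one file is searched, a line-number prefix with -n) can also be expressed as a single `match` on a tuple, which makes the four cases explicit. `format_match` is a hypothetical helper name; it produces the same strings as the `format!` calls above.

```rust
/// Builds one output line from the two independent prefix decisions.
fn format_match(
    file_name: &str,
    line_number: usize,
    line: &str,
    multi_file: bool,   // more than one file searched
    with_numbers: bool, // -n
) -> String {
    match (multi_file, with_numbers) {
        (true, true) => format!("{}:{}:{}", file_name, line_number, line),
        (true, false) => format!("{}:{}", file_name, line),
        (false, true) => format!("{}:{}", line_number, line),
        (false, false) => line.to_string(),
    }
}
```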

3. Dependencies

Both code snippets above rely on two external crates, which must be declared in Cargo.toml:

// Add to Cargo.toml:
// [dependencies]
// anyhow = "1.0"
// regex = "1.0"

Test case analysis

Looking at the test cases helps clarify the requirements:

#[test]
fn test_nonexistent_file_returns_error() {
    let pattern = "Agamemnon";
    let flags = Flags::new(&[]);
    let files = vec!["test_nonexistent_file_returns_error_iliad.txt"];
    assert!(grep(&pattern, &flags, &files).is_err());
}

A nonexistent file should produce an error.

#[test]
fn test_one_file_one_match_no_flags() {
    let pattern = "Agamemnon";
    let flags = Flags::new(&[]);
    let files = vec!["iliad.txt"];
    let expected = vec!["Of Atreus, Agamemnon, King of men."];
    // should return the matching line
    assert_eq!(grep(pattern, &flags, &files).unwrap(), expected);
}

With one file, one match, and no flags, grep should return the matching line.

#[test]
fn test_one_file_one_match_print_line_numbers_flag() {
    let pattern = "Forbidden";
    let flags = Flags::new(&["-n"]);
    let files = vec!["paradise_lost.txt"];
    let expected = vec!["2:Of that Forbidden Tree, whose mortal tast"];
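    // should return the matching line prefixed with its line number
    assert_eq!(grep(pattern, &flags, &files).unwrap(), expected);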
}

With -n, each line is prefixed with its line number.

#[test]
fn test_one_file_one_match_caseinsensitive_flag() {
    let pattern = "FORBIDDEN";
    let flags = Flags::new(&["-i"]);
    let files = vec!["paradise_lost.txt"];
    let expected = vec!["Of that Forbidden Tree, whose mortal tast"];
    // should match regardless of case
    assert_eq!(grep(pattern, &flags, &files).unwrap(), expected);
}

With -i, matching is case-insensitive.

#[test]
fn test_one_file_one_match_print_file_names_flag() {
    let pattern = "Forbidden";
    let flags = Flags::new(&["-l"]);
    let files = vec!["paradise_lost.txt"];
    let expected = vec!["paradise_lost.txt"];
    // should return only the file name
    assert_eq!(grep(pattern, &flags, &files).unwrap(), expected);
}

With -l, only the names of files that contain a match are returned.

#[test]
fn test_one_file_several_matches_inverted_flag() {
    let pattern = "Of";
    let flags = Flags::new(&["-v"]);
    let files = vec!["paradise_lost.txt"];
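    let result = grep(pattern, &flags, &files).unwrap();
    // should return only the lines that do not contain "Of"
    assert!(result.iter().all(|line| !line.contains("Of")));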
}

With -v, the non-matching lines are returned.

#[test]
fn test_multiple_files_one_match_no_flags() {
    let pattern = "Agamemnon";
    let flags = Flags::new(&[]);
    let files = vec!["iliad.txt", "midsummer_night.txt", "paradise_lost.txt"];
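    let result = grep(pattern, &flags, &files).unwrap();
    // with several files, each line carries a file-name prefix
    assert!(result.contains(&"iliad.txt:Of Atreus, Agamemnon, King of men.".to_string()));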
}

When several files are searched, every output line is prefixed with the file name.
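These tests assume that fixture files such as iliad.txt and paradise_lost.txt already exist. A minimal standard-library sketch of how a test could create such a file and check numbered matches against it; `write_fixture` and `matching_lines` are hypothetical helpers, and the file name is illustrative.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Writes `contents` to a file in the OS temp directory and returns its path.
fn write_fixture(name: &str, contents: &str) -> io::Result<PathBuf> {
    let path = std::env::temp_dir().join(name);
    fs::write(&path, contents)?;
    Ok(path)
}

/// Returns (1-based line number, line) for each line containing `pattern`.
/// A missing file surfaces as an `Err`, mirroring the nonexistent-file test.
fn matching_lines(path: &Path, pattern: &str) -> io::Result<Vec<(usize, String)>> {
    let content = fs::read_to_string(path)?;
    Ok(content
        .lines()
        .enumerate()
        .filter(|(_, line)| line.contains(pattern))
        .map(|(i, line)| (i + 1, line.to_string()))
        .collect())
}
```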

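Performance-oriented version

A variant that escapes the pattern (so it is matched as a literal string), compiles the regex once, preallocates the result vector, and stops reading a file at the first match under -l: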

use anyhow::Error;
use std::fs;
use regex::Regex;

#[derive(Debug, Clone)]
pub struct Flags {
    print_line_numbers: bool,
    print_file_names_only: bool,
    case_insensitive: bool,
    match_entire_line: bool,
    invert_match: bool,
}

impl Flags {
    pub fn new(flags: &[&str]) -> Self {
        let mut result = Flags {
            print_line_numbers: false,
            print_file_names_only: false,
            case_insensitive: false,
            match_entire_line: false,
            invert_match: false,
        };
        
        for flag in flags {
            match *flag {
                "-n" => result.print_line_numbers = true,
                "-l" => result.print_file_names_only = true,
                "-i" => result.case_insensitive = true,
                "-x" => result.match_entire_line = true,
                "-v" => result.invert_match = true,
                _ => {} // ignore unknown options
            }
        }
        
        result
    }
}

pub fn grep(pattern: &str, flags: &Flags, files: &[&str]) -> Result<Vec<String>, Error> {
    let mut results = Vec::new();
    
    // Compile the regex once, before the loop. regex::escape makes the pattern
    // match literally; -x anchors it to the whole line, -i adds (?i)
    let mut regex_pattern = regex::escape(pattern);
    if flags.match_entire_line {
        regex_pattern = format!("^{}$", regex_pattern);
    }
    if flags.case_insensitive {
        regex_pattern = format!("(?i){}", regex_pattern);
    }
    let regex = Regex::new(&regex_pattern)?;
    
    let is_multi_file = files.len() > 1;
    
    // Preallocate capacity for the results
    results.reserve(files.len() * 10); // rough estimate
    
    for &file_name in files {
        let content = fs::read_to_string(file_name)?;
        
        if flags.print_file_names_only {
            // -l: stop scanning this file at the first selected line
            for line in content.lines() {
                let is_match = regex.is_match(line);
                let should_include = if flags.invert_match { !is_match } else { is_match };
                
                if should_include {
                    results.push(file_name.to_string());
                    break;
                }
            }
        } else {
            // Process every line
            for (line_index, line) in content.lines().enumerate() {
                let line_number = line_index + 1;
                
                let is_match = regex.is_match(line);
                let should_include = if flags.invert_match { !is_match } else { is_match };
                
                if should_include {
                    let output = if is_multi_file {
                        if flags.print_line_numbers {
                            format!("{}:{}:{}", file_name, line_number, line)
                        } else {
                            format!("{}:{}", file_name, line)
                        }
                    } else if flags.print_line_numbers {
                        format!("{}:{}", line_number, line)
                    } else {
                        line.to_string()
                    };
                    
                    results.push(output);
                }
            }
        }
    }
    
    // With -l each file name is pushed at most once, in the order the files
    // were given, so no sort/dedup is needed (sorting would reorder them)
    Ok(results)
}
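The -l branch and the -v inversion can be written more compactly with iterator combinators. In this sketch (the function name and the closure parameter are assumptions), `Iterator::any` stops at the first line for which the closure returns true, and comparing with `!=` acts as an XOR that flips the result when -v is set:

```rust
/// -l helper: returns true as soon as one line is selected (short-circuits).
/// `is_match` stands in for `|line| regex.is_match(line)`.
fn file_has_match(content: &str, is_match: impl Fn(&str) -> bool, invert: bool) -> bool {
    content.lines().any(|line| is_match(line) != invert)
}
```

Because `any` short-circuits, the remaining lines of a file are never tested once a selected line is found, which is the same early exit as the `break` in the loop above.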

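Error handling and edge cases

An implementation that covers more edge cases (an empty pattern, and error messages that name the offending pattern or file):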

use anyhow::Error;
use std::fs;
use regex::Regex;

#[derive(Debug, Clone)]
pub struct Flags {
    print_line_numbers: bool,
    print_file_names_only: bool,
    case_insensitive: bool,
    match_entire_line: bool,
    invert_match: bool,
}

impl Flags {
    pub fn new(flags: &[&str]) -> Self {
        let mut result = Flags {
            print_line_numbers: false,
            print_file_names_only: false,
            case_insensitive: false,
            match_entire_line: false,
            invert_match: false,
        };
        
        for flag in flags {
            match *flag {
                "-n" => result.print_line_numbers = true,
                "-l" => result.print_file_names_only = true,
                "-i" => result.case_insensitive = true,
                "-x" => result.match_entire_line = true,
                "-v" => result.invert_match = true,
                _ => {} // ignore unknown options
            }
        }
        
        result
    }
}

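// A typed error enum for grep's failure modes. Not used by the function
// below, which builds its errors with anyhow::anyhow! instead.
#[derive(Debug)]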
pub enum GrepError {
    FileNotFound(String),
    InvalidPattern(String),
    EmptyPattern,
}

pub fn grep(pattern: &str, flags: &Flags, files: &[&str]) -> Result<Vec<String>, Error> {
    // Validate input. Rejecting an empty pattern is a choice of this version:
    // real grep accepts an empty pattern and matches every line.
    if pattern.is_empty() {
        return Err(anyhow::anyhow!("Pattern cannot be empty"));
    }
    
    let mut results = Vec::new();
    
    // Compile the regex: literal pattern, anchored under -x, (?i) under -i
    let mut regex_pattern = regex::escape(pattern);
    if flags.match_entire_line {
        regex_pattern = format!("^{}$", regex_pattern);
    }
    if flags.case_insensitive {
        regex_pattern = format!("(?i){}", regex_pattern);
    }
    
    let regex = Regex::new(&regex_pattern)
        .map_err(|e| anyhow::anyhow!("Invalid pattern '{}': {}", pattern, e))?;
    
    let is_multi_file = files.len() > 1;
    
    for &file_name in files {
        // Read the file, attaching its name to any I/O error
        let content = fs::read_to_string(file_name)
            .map_err(|e| anyhow::anyhow!("Cannot read file '{}': {}", file_name, e))?;
        
        if flags.print_file_names_only {
            // -l: stop scanning this file at the first selected line
            let mut file_has_match = false;
            for line in content.lines() {
                let is_match = regex.is_match(line);
                let should_include = if flags.invert_match { !is_match } else { is_match };
                
                if should_include {
                    file_has_match = true;
                    break;
                }
            }
            
            if file_has_match {
                results.push(file_name.to_string());
            }
        } else {
            // Process every line
            for (line_index, line) in content.lines().enumerate() {
                let line_number = line_index + 1;
                
                let is_match = regex.is_match(line);
                let should_include = if flags.invert_match { !is_match } else { is_match };
                
                if should_include {
                    let output = if is_multi_file {
                        if flags.print_line_numbers {
                            format!("{}:{}:{}", file_name, line_number, line)
                        } else {
                            format!("{}:{}", file_name, line)
                        }
                    } else if flags.print_line_numbers {
                        format!("{}:{}", line_number, line)
                    } else {
                        line.to_string()
                    };
                    
                    results.push(output);
                }
            }
        }
    }
    
    // File names are pushed once per file, in input order; no sort/dedup needed
    Ok(results)
}
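The `GrepError` enum above is declared but never returned; errors go through `anyhow::anyhow!` instead. One way to make such a typed error usable is to implement `Display` and `std::error::Error`, after which `?` can convert it into `anyhow::Error` (anyhow provides a `From` conversion for any `std::error::Error + Send + Sync + 'static` type). A sketch, with the message texts and the `read_file` helper as assumptions:

```rust
use std::fmt;

#[derive(Debug)]
pub enum GrepError {
    FileNotFound(String),
    InvalidPattern(String),
    EmptyPattern,
}

// Human-readable messages (wording is illustrative)
impl fmt::Display for GrepError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            GrepError::FileNotFound(name) => write!(f, "cannot read file '{}'", name),
            GrepError::InvalidPattern(p) => write!(f, "invalid pattern '{}'", p),
            GrepError::EmptyPattern => write!(f, "pattern cannot be empty"),
        }
    }
}

// With Display + Debug in place, the marker impl is enough
impl std::error::Error for GrepError {}

/// Hypothetical helper: reads a file, mapping any I/O failure to FileNotFound
/// (this simplification discards the underlying io::Error).
fn read_file(name: &str) -> Result<String, GrepError> {
    std::fs::read_to_string(name).map_err(|_| GrepError::FileNotFound(name.to_string()))
}
```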

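Extended features

Building on the basic implementation, we can add more features, such as a quiet mode (-q) and a cap on the number of selected lines (-m N):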

use anyhow::Error;
use std::fs;
use regex::Regex;

#[derive(Debug, Clone)]
pub struct Flags {
    print_line_numbers: bool,
    print_file_names_only: bool,
    case_insensitive: bool,
    match_entire_line: bool,
    invert_match: bool,
    quiet: bool,                // -q: quiet mode, collect no output lines
    max_matches: Option<usize>, // -m N: stop after N selected lines
}

impl Flags {
    pub fn new(flags: &[&str]) -> Self {
        let mut result = Flags {
            print_line_numbers: false,
            print_file_names_only: false,
            case_insensitive: false,
            match_entire_line: false,
            invert_match: false,
            quiet: false,
            max_matches: None,
        };
        
        let mut i = 0;
        while i < flags.len() {
            match flags[i] {
                "-n" => result.print_line_numbers = true,
                "-l" => result.print_file_names_only = true,
                "-i" => result.case_insensitive = true,
                "-x" => result.match_entire_line = true,
                "-v" => result.invert_match = true,
                "-q" => result.quiet = true,
                "-m" => {
                    if i + 1 < flags.len() {
                        if let Ok(num) = flags[i + 1].parse::<usize>() {
                            result.max_matches = Some(num);
                            i += 1; // skip the numeric argument
                        }
                    }
                }
                _ => {} // ignore unknown options
            }
            i += 1;
        }
        
        result
    }
}
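The index-based loop for -m has to remember to skip the numeric argument by hand. A peekable iterator can express the same rule (consume the next argument only if it parses as a number). A sketch on a hypothetical, trimmed-down `ExtraFlags` struct that holds only the two new options:

```rust
#[derive(Debug, Default, PartialEq)]
struct ExtraFlags {
    quiet: bool,
    max_matches: Option<usize>,
}

fn parse_extra(flags: &[&str]) -> ExtraFlags {
    let mut result = ExtraFlags::default();
    let mut iter = flags.iter().peekable();
    while let Some(&flag) = iter.next() {
        match flag {
            "-q" => result.quiet = true,
            "-m" => {
                // Consume the next argument only if it is a valid number,
                // the same rule as the index-based loop above
                if let Some(num) = iter.peek().and_then(|s| s.parse::<usize>().ok()) {
                    result.max_matches = Some(num);
                    iter.next();
                }
            }
            _ => {} // other flags are ignored in this sketch
        }
    }
    result
}
```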

pub struct GrepEngine;

impl GrepEngine {
    pub fn new() -> Self {
        GrepEngine
    }
    
    pub fn grep(&self, pattern: &str, flags: &Flags, files: &[&str]) -> Result<Vec<String>, Error> {
        if pattern.is_empty() {
            return Err(anyhow::anyhow!("Pattern cannot be empty"));
        }
        
        let mut results = Vec::new();
        // Number of selected lines so far, counted across all files (-m)
        let mut match_count = 0;
        
        // Compile the regex: literal pattern, anchored under -x, (?i) under -i
        let mut regex_pattern = regex::escape(pattern);
        if flags.match_entire_line {
            regex_pattern = format!("^{}$", regex_pattern);
        }
        if flags.case_insensitive {
            regex_pattern = format!("(?i){}", regex_pattern);
        }
        
        let regex = Regex::new(&regex_pattern)
            .map_err(|e| anyhow::anyhow!("Invalid pattern '{}': {}", pattern, e))?;
        
        let is_multi_file = files.len() > 1;
        
        for &file_name in files {
            let content = fs::read_to_string(file_name)
                .map_err(|e| anyhow::anyhow!("Cannot read file '{}': {}", file_name, e))?;
            
            if flags.print_file_names_only {
                let mut file_has_match = false;
                for line in content.lines() {
                    let is_match = regex.is_match(line);
                    let should_include = if flags.invert_match { !is_match } else { is_match };
                    
                    if should_include {
                        file_has_match = true;
                        break;
                    }
                }
                
                if file_has_match {
                    results.push(file_name.to_string());
                }
            } else {
                for (line_index, line) in content.lines().enumerate() {
                    let line_number = line_index + 1;
                    
                    let is_match = regex.is_match(line);
                    let should_include = if flags.invert_match { !is_match } else { is_match };
                    
                    if should_include {
                        match_count += 1;
                        
                        // Stop reading this file once the -m limit is exceeded
                        if let Some(max) = flags.max_matches {
                            if match_count > max {
                                break;
                            }
                        }
                        
                        // Quiet mode (-q) collects no output lines
                        if !flags.quiet {
                            let output = if is_multi_file {
                                if flags.print_line_numbers {
                                    format!("{}:{}:{}", file_name, line_number, line)
                                } else {
                                    format!("{}:{}", file_name, line)
                                }
                            } else if flags.print_line_numbers {
                                format!("{}:{}", line_number, line)
                            } else {
                                line.to_string()
                            };
                            
                            results.push(output);
                        }
                    }
                }
            }
        }
        
        // File names are pushed once per file, in input order; no sort/dedup needed
        Ok(results)
    }
    
    // Return the results together with simple match statistics
    pub fn grep_with_stats(&self, pattern: &str, flags: &Flags, files: &[&str]) 
        -> Result<(Vec<String>, GrepStats), Error> {
        let results = self.grep(pattern, flags, files)?;
        
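        // More detailed statistics could be computed here. Note that
        // matches_found counts returned entries, so it is 0 under -q
        // and counts files rather than lines under -l.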
        let stats = GrepStats {
            files_searched: files.len(),
            matches_found: results.len(),
        };
        
        Ok((results, stats))
    }
}

pub struct GrepStats {
    pub files_searched: usize,
    pub matches_found: usize,
}

// Convenience wrapper with the signature the exercise expects
pub fn grep(pattern: &str, flags: &Flags, files: &[&str]) -> Result<Vec<String>, Error> {
    GrepEngine::new().grep(pattern, flags, files)
}
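Note that `match_count` is shared across all files, so -m here caps the total number of selected lines. GNU grep applies -m NUM per file instead: it stops reading each file after NUM matching lines. A per-file limit falls out naturally from `Iterator::take`, as in this literal-match sketch (the `first_matches` name is hypothetical):

```rust
/// Returns at most `max` matching lines from one file's content
/// (all of them when `max` is None). Call once per file for per-file -m.
fn first_matches<'a>(content: &'a str, pattern: &str, max: Option<usize>) -> Vec<&'a str> {
    let matches = content.lines().filter(|line| line.contains(pattern));
    match max {
        // `take` is lazy: no further lines are read once n matches are found
        Some(n) => matches.take(n).collect(),
        None => matches.collect(),
    }
}
```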

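Real-world use cases

Grep has many applications in day-to-day development:

  1. Log analysis: searching large volumes of log files for specific patterns
  2. Code search: finding particular functions or variables in a codebase
  3. Data processing: extracting specific information from large data files
  4. System administration: finding particular settings in configuration files
  5. Security auditing: scanning system files for suspicious patterns
  6. Text processing: filtering and processing text files in bulk
  7. DevOps tooling: building automation tools and scripts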

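Complexity analysis

  1. Time complexity

    • File reading: O(n), where n is the total size of all files
    • Regex matching: O(m×k), where m is the total number of lines and k the average line length; since m×k ≈ n, this is also linear in the input for a fixed pattern (the regex crate guarantees worst-case search time proportional to the input size multiplied by the size of the compiled pattern)
    • Overall: O(n + m×k) = O(n) for a fixed pattern
  2. Space complexity

    • File contents: O(n) as an upper bound; because each file's content is dropped at the end of its loop iteration, the peak is actually the size of the largest single file
    • Results: O(p×q), where p is the number of selected lines and q their average length
    • Overall: O(n + p×q)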

Comparison with other implementation approaches

// Plain standard-library implementation (no regular expressions)
use anyhow::Error;
use std::fs;

pub fn grep_simple(pattern: &str, flags: &Flags, files: &[&str]) -> Result<Vec<String>, Error> {
    let mut results = Vec::new();
    // Lowercase the pattern once instead of once per line
    let pattern_lower = pattern.to_lowercase();
    
    for &file_name in files {
        let content = fs::read_to_string(file_name)?;
        
        for (line_index, line) in content.lines().enumerate() {
            let line_number = line_index + 1;
            
            // Plain substring matching instead of a regular expression
            // (this simplified variant ignores -x, -l and file-name prefixes)
            let is_match = if flags.case_insensitive {
                line.to_lowercase().contains(&pattern_lower)
            } else {
                line.contains(pattern)
            };
            
            let should_include = if flags.invert_match {
                !is_match
            } else {
                is_match
            };
            
            if should_include {
                let output = if flags.print_line_numbers {
                    format!("{}:{}", line_number, line)
                } else {
                    line.to_string()
                };
                
                results.push(output);
            }
        }
    }
    
    Ok(results)
}

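// High-performance variant using a memory-mapped file.
// Uses the memmap2 crate, the maintained fork of the unmaintained memmap
// crate, with the same Mmap::map API (add memmap2 to Cargo.toml).
use anyhow::Error;
use memmap2::Mmap;
use std::fs::File;

pub fn grep_mmap(pattern: &str, flags: &Flags, files: &[&str]) -> Result<Vec<String>, Error> {
    let mut results = Vec::new();
    
    for &file_name in files {
        let file = File::open(file_name)?;
        // Safety: the mapping is only sound if no other process truncates or
        // modifies the file while it is mapped; that is assumed here
        let mmap = unsafe { Mmap::map(&file)? };
        // Fails with an error if the file is not valid UTF-8
        let content = std::str::from_utf8(&mmap)?;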
        
        for (line_index, line) in content.lines().enumerate() {
            let line_number = line_index + 1;
            
            let is_match = line.contains(pattern);
            let should_include = if flags.invert_match {
                !is_match
            } else {
                is_match
            };
            
            if should_include {
                let output = if flags.print_line_numbers {
                    format!("{}:{}", line_number, line)
                } else {
                    line.to_string()
                };
                
                results.push(output);
            }
        }
    }
    
    Ok(results)
}

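// Streaming implementation for large files: reads one line at a time
// instead of loading the whole file into memory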
use anyhow::Error;
use std::fs::File;
use std::io::{BufRead, BufReader};

pub fn grep_streaming(pattern: &str, flags: &Flags, files: &[&str]) -> Result<Vec<String>, Error> {
    let mut results = Vec::new();
    
    for &file_name in files {
        let file = File::open(file_name)?;
        let reader = BufReader::new(file);
        
        for (line_index, line_result) in reader.lines().enumerate() {
            let line = line_result?;
            let line_number = line_index + 1;
            
            let is_match = line.contains(pattern);
            let should_include = if flags.invert_match {
                !is_match
            } else {
                is_match
            };
            
            if should_include {
                let output = if flags.print_line_numbers {
                    format!("{}:{}", line_number, line)
                } else {
                    line
                };
                
                results.push(output);
            }
        }
    }
    
    Ok(results)
}
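The streaming version only needs something that yields lines, so it can be generalized over any `BufRead` source: a `BufReader<File>`, locked standard input, or an in-memory buffer in tests. A sketch (the `grep_reader` name and the reduced option set are assumptions):

```rust
use std::io::{self, BufRead};

/// Streaming literal search over any buffered reader; `invert` mirrors -v.
fn grep_reader<R: BufRead>(reader: R, pattern: &str, invert: bool) -> io::Result<Vec<String>> {
    let mut results = Vec::new();
    for line in reader.lines() {
        // I/O errors and invalid UTF-8 surface here as io::Error
        let line = line?;
        if line.contains(pattern) != invert {
            results.push(line);
        }
    }
    Ok(results)
}
```

For example, `grep_reader(BufReader::new(File::open(path)?), pattern, false)` searches a file, and `grep_reader(io::stdin().lock(), pattern, false)` searches standard input.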

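Summary

Working through the grep exercise, we learned about:

  1. File handling: the basics of reading and processing file contents
  2. Regular expressions: pattern matching with the regex crate
  3. Command-line argument parsing: how to parse and act on command-line options
  4. Error handling: propagating errors with the anyhow crate
  5. String processing: techniques for searching and manipulating strings
  6. Systems programming: the basic principles behind classic system tools

These skills are useful in everyday development, especially when building command-line tools, log-analysis systems, and text-processing utilities. Although grep is a classic Unix tool, it touches many core concepts (file handling, regular expressions, command-line parsing, and systems programming), which makes it a good starting point for learning systems programming in Rust.

The exercise also shows Rust's strengths for systems programming and how to implement a classic system tool in a way that is both safe and efficient. That combination of safety and performance is much of Rust's appeal.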

Author: 少湖说
