Reading file data with PowerShell

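This article describes how to use PowerShell’s Get-Content function and the StreamReader class to read and parse custom files efficiently. It covers reading an entire file, reading part of a file, skipping specified lines, and reading large files when resources are limited. Examples show how to read line by line with StreamReader and apply string methods to process the data.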

We receive custom files from different providers, and we cannot process them with standard ETL programs without customization. Since we’re expanding our ability to read these custom files with .NET, we’re looking for efficient ways to read files with PowerShell that we can use in SQL Server Agent jobs, Windows Task Scheduler, or our custom program, which can execute PowerShell scripts. We have many tools for parsing data and want to know efficient ways of reading the data for parsing, along with getting specific lines from a file by number, or the first or last line of the file. For reading files efficiently, what are some functions or libraries we can use?

Overview

For reading data from files, we generally want to focus on three major capabilities, each listed below with an example of where it applies in practice:

  1. How to read an entire file, read part of a file, or skip around in a file. We may face a situation where we want to read every line except the first and last.
  2. How to read a file while using few system resources. We may have a 100GB file from which we only want to read 108KB of data.
  3. How to read a file in a manner that lets us easily parse the data we need, or lets us use functions and tools we already use with other data. Since many developers have string parsing tools, moving data to a string format, where possible, allows us to re-use those tools.

The above applies to most situations involved with parsing data from files. We’ll start by looking at a built-in PowerShell function for reading data, then look at a custom way of reading data from files using PowerShell.

PowerShell’s Get-Content function

PowerShell version 5 and many earlier versions come with the Get-Content function, which allows us to quickly read a file’s data. In the script below, we output an entire file’s data on the PowerShell ISE screen, which we’ll use for demonstration purposes throughout this article:

Get-Content "C:\logging\logging.txt"

We can save all of this data into a variable called ourfilesdata. For a file with more than one line, Get-Content returns an array of strings, one element per line:

$ourfilesdata = Get-Content "C:\logging\logging.txt"
$ourfilesdata

We get the same result as above; the only difference is that we’ve saved the entire file into a variable. There is one drawback to saving or outputting an entire file, though: if the file is large, we still read all of it into the variable or write all of it to the screen. This begins to cost us performance as file sizes grow.

We can select part of the file by treating our variable (an object, to use another name) like a SQL query in which we select some of the lines instead of all of them. In the code below, we select the first five lines of the file, rather than the entire file:

$ourfilesdata = Get-Content "C:\logging\logging.txt"
$ourfilesdata | Select-Object -First 5

We can also use the same function to get the last five lines of the file, using a similar syntax:

$ourfilesdata = Get-Content "C:\logging\logging.txt"
$ourfilesdata | Select-Object -Last 5
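
Both examples above still load the whole file into $ourfilesdata before Select-Object trims it. As a hedged sketch, Get-Content’s own -TotalCount and -Tail parameters can ask for only the first or last lines, and Select-Object -Skip can drop leading lines, which covers the "every line except the first and last" case from the overview. The sample file, its contents, and the variable names below are assumptions made purely for illustration; point $samplefile at your own file instead.

```powershell
# Build a small throwaway sample file (hypothetical data, for illustration only).
$samplefile = Join-Path ([System.IO.Path]::GetTempPath()) "logging-sample.txt"
1..20 | ForEach-Object { "Line $_ of the sample log" } | Set-Content -Path $samplefile

# First five lines, requested directly from Get-Content.
$firstfive = Get-Content -Path $samplefile -TotalCount 5

# Last five lines, requested directly from Get-Content.
$lastfive = Get-Content -Path $samplefile -Tail 5

# Every line except the first and last: skip one line, then keep all but the final one.
$alllines = @(Get-Content -Path $samplefile)
$middle = $alllines | Select-Object -Skip 1 | Select-Object -First ($alllines.Count - 2)

$firstfive
$lastfive
$middle.Count

# Clean up the sample file.
Remove-Item -Path $samplefile
```

The middle-lines step still reads the whole file, so it suits small files; for large files the line-by-line approach in the next section is likely the better fit.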

PowerShell’s built-in Get-Content function is a great basic reader for file data. However, if we want to hold very little data in memory on each read while parsing, or if we want to read a file line by line, we may want to use .NET’s StreamReader class, which lets us customize our usage for increased efficiency.

The StreamReader class

In a new PowerShell ISE window, we’ll create a StreamReader object and then dispose of it by executing the PowerShell code below:

$newstreamreader = New-Object System.IO.StreamReader("C:\logging\logging.txt")
### Reading file stuff here
$newstreamreader.Dispose()

In general, any time we create an object that holds a resource such as an open file, it’s a best practice to dispose of that object, as this releases the resources it holds. While it’s true that the .NET garbage collector will eventually clean up the object, the timing is not predictable and the file can stay open until then, so I still recommend calling Dispose() manually. It’s also a good habit to have, as you may work with languages that don’t clean up automatically in the future.
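
One way to make sure Dispose() always runs, even when a read fails partway through, is to wrap the work in a try/finally block. The sketch below is an illustration rather than part of the original scripts; the sample file and its contents are assumptions so that the example runs on its own.

```powershell
# Hypothetical sample file so the sketch runs on its own.
$samplefile = Join-Path ([System.IO.Path]::GetTempPath()) "dispose-sample.txt"
Set-Content -Path $samplefile -Value "first line", "second line"

$newstreamreader = New-Object System.IO.StreamReader($samplefile)
try
{
    # If this read throws, execution still reaches the finally block.
    $firstline = $newstreamreader.ReadLine()
}
finally
{
    # Runs whether or not the read succeeded, releasing the file handle.
    $newstreamreader.Dispose()
}

$firstline

# The reader no longer holds the file open, so it can be deleted right away.
Remove-Item -Path $samplefile
```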

Nothing happens when we execute the above code because we’ve called no method – we’ve only created an object and removed it. The first method we’ll look at is the ReadToEnd() method:

$newstreamreader = New-Object System.IO.StreamReader("C:\logging\logging.txt")
$newstreamreader.ReadToEnd()
$newstreamreader.Dispose()

As we see in the output, the ReadToEnd() method reads all the data from the file, much like Get-Content, although it returns the contents as a single string. How do we read each line of data? The StreamReader class includes the ReadLine() method, and if we call it instead of ReadToEnd(), we get the first line of our file’s data:

$newstreamreader = New-Object System.IO.StreamReader("C:\logging\logging.txt")
$newstreamreader.ReadLine()
$newstreamreader.Dispose()

Since we told the StreamReader to read a line, it read the first line of the file and stopped. The reason is that the ReadLine() method only reads the current line (in this case, line one) and then advances to the next. We must keep reading until we reach the end of the file. How do we know when a file ends? ReadLine() returns null once there are no more lines. In other words, we want the StreamReader to keep reading the file (in a while loop) as long as each call returns something other than null. An empty line comes back as an empty string, not null, so it does not end the loop. To demonstrate this, let’s add a line counter to each line we iterate over so that we can see the logic with numbers and text:

$newstreamreader = New-Object System.IO.StreamReader("C:\logging\logging.txt")
$eachlinenumber = 1
# ReadLine() returns $null once the end of the file is reached
while ($null -ne ($readeachline = $newstreamreader.ReadLine()))
{
    Write-Host "$eachlinenumber  $readeachline"
    $eachlinenumber++
}
$newstreamreader.Dispose()

As StreamReader reads each line, it stores the line’s data in the variable we’ve created, $readeachline. We can apply string methods to this line of data on each pass, such as getting the first ten characters of the line:

$newstreamreader = New-Object System.IO.StreamReader("C:\logging\logging.txt")
while ($null -ne ($readeachline = $newstreamreader.ReadLine()))
{
    # Take at most ten characters so lines shorter than ten don't throw
    $readeachline.Substring(0, [Math]::Min(10, $readeachline.Length))
}
$newstreamreader.Dispose()

We can extend this example and call two other string methods, IndexOf() and Replace(). We call these methods on the first two lines only, by checking that the line number is less than three:

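$newstreamreader = New-Object System.IO.StreamReader("C:\logging\logging.txt")
$eachlinenumber = 1
while ($null -ne ($readeachline = $newstreamreader.ReadLine()))
{
    # Only process lines one and two
    if ($eachlinenumber -lt 3)
    {
        Write-Host "$eachlinenumber"
        # Take at most twelve characters so shorter lines don't throw
        $readeachline.Substring(0, [Math]::Min(12, $readeachline.Length))
        # Position of the first space, or -1 if the line has none
        $readeachline.IndexOf(" ")
        $readeachline.Replace(" ","replace")
    }
    $eachlinenumber++
}
$newstreamreader.Dispose()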

For parsing data, we can use our string methods on each line of the file – or on a specific line of the file – on each iteration of the loop.
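
As a sketch of that kind of per-line parsing, assume a hypothetical pipe-delimited file in which each line holds a date, a level, and a message. The field layout, the sample data, and the names $samplefile, $fields, and $errormessages are all assumptions for illustration, not something taken from the log file used above.

```powershell
# Hypothetical pipe-delimited sample file: date|level|message.
$samplefile = Join-Path ([System.IO.Path]::GetTempPath()) "parse-sample.txt"
Set-Content -Path $samplefile -Value "2017-01-01|INFO|Service started", "2017-01-02|ERROR|Disk is full"

$errormessages = @()
$newstreamreader = New-Object System.IO.StreamReader($samplefile)
try
{
    while ($null -ne ($readeachline = $newstreamreader.ReadLine()))
    {
        # Split the current line into its three assumed fields.
        $fields = $readeachline.Split([char]'|')

        # Keep the message only for well-formed lines marked ERROR.
        if ($fields.Length -eq 3 -and $fields[1] -eq "ERROR")
        {
            $errormessages += $fields[2]
        }
    }
}
finally
{
    $newstreamreader.Dispose()
}

$errormessages

# Clean up the sample file.
Remove-Item -Path $samplefile
```

Only one line is held in memory at a time, so the same loop should scale to files far larger than the sample.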

Finally, we want to be able to get a specific line from the file. We can already get the first and last lines of the file by using Get-Content, so let’s use StreamReader to get a line by its number. We’ll wrap the logic in a reusable function that requires two inputs: the file location and the number of the line we want to return.

Function Get-FileData {
    Param(
        [Parameter(Mandatory=$true)][string]$ourfilelocation
        , [Parameter(Mandatory=$true)][int]$specificline
    )
    Process
    {
        $newstreamreader = New-Object System.IO.StreamReader($ourfilelocation)
        try
        {
            $eachlinenumber = 1
            while ($null -ne ($readeachline = $newstreamreader.ReadLine()))
            {
                if ($eachlinenumber -eq $specificline)
                {
                    # Return the requested line and stop reading the file
                    $readeachline
                    break
                }
                $eachlinenumber++
            }
        }
        finally
        {
            # Release the file handle even if reading fails
            $newstreamreader.Dispose()
        }
    }
}
 
$savelinetovariable = Get-FileData -ourfilelocation "C:\logging\logging.txt" -specificline 17
$savelinetovariable

If we check, line 17 correctly returns “Buffer pool extension is already disabled. No action is necessary.” In addition, the break statement exits the while loop, as there’s no need to continue reading the file once we obtain the line of data we want. Also, the Dispose() method does release the reader, as we can check by calling it from the command line in PowerShell ISE, where it returns nothing (we could also check within the function and get the same result).

Final thoughts

For custom performance, StreamReader offers more potential, as we can read each line of data and apply additional functions as we need them. But we don’t always need something customized, and we may only want to read the first or last few lines of data, in which case Get-Content suits us well. In addition, smaller files work well with Get-Content, as we’re never getting too much data at a time. Just be careful if the files tend to grow over time.

Translated from: https://www.sqlshack.com/reading-file-data-with-powershell/
