strings - Scanner Delimiters


License: CC 4.0 BY-SA


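This post explains that Java's Scanner splits input into tokens on whitespace by default, and that a delimiter can also be specified as a regular expression. Several examples show how to read content from a string. The default whitespace delimiter and the method that restores it are also covered.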

By default, a Scanner splits input tokens along whitespace, but we can also specify our own delimiter pattern in the form of a regular expression.

Example 1:

// strings/ScannerDelimiter.java
// (c)2017 MindView LLC: see Copyright.txt
// We make no guarantees that this code is fit for any purpose.
// Visit http://OnJava8.com for more book information.

import java.util.*;

public class ScannerDelimiter {
  public static void main(String[] args) {
    Scanner scanner = new Scanner("12, 42, 78, 99, 42");
    scanner.useDelimiter("\\s*,\\s*");
    while (scanner.hasNextInt()) {
      System.out.println(scanner.nextInt());
    }
  }
}
/* Output:
12
42
78
99
42
*/
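
The `\\s*` on either side of the comma is what lets `nextInt()` succeed: without it, the space after each comma stays in the token and the token is no longer a valid integer. The following is a minimal sketch of that difference. The class `DelimiterComparison` and the helper `readInts` are hypothetical names introduced here for illustration and are not part of the book's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class DelimiterComparison {
  // Hypothetical helper: reads ints until the next token is not a valid int.
  static List<Integer> readInts(String input, String delimiterRegex) {
    List<Integer> values = new ArrayList<>();
    try (Scanner scanner = new Scanner(input)) {
      scanner.useDelimiter(delimiterRegex);
      while (scanner.hasNextInt()) {
        values.add(scanner.nextInt());
      }
    }
    return values;
  }

  public static void main(String[] args) {
    String input = "12, 42, 78, 99, 42";
    // Delimiter swallows the whitespace around each comma.
    System.out.println(readInts(input, "\\s*,\\s*")); // [12, 42, 78, 99, 42]
    // Bare comma: the second token is " 42" (leading space), so the loop stops.
    System.out.println(readInts(input, ","));         // [12]
  }
}
```

With the bare comma delimiter, calling `nextInt()` directly instead of guarding with `hasNextInt()` would throw an `InputMismatchException` on the second token.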

Example 2:

This example reads several items from a string, using the word "fish" (with any surrounding whitespace) as the delimiter:


     // Requires: import java.util.Scanner;
     String input = "1 fish 2 fish red fish blue fish";
     // Tokens are separated by "fish" plus any whitespace around it.
     Scanner s = new Scanner(input).useDelimiter("\\s*fish\\s*");
     System.out.println(s.nextInt());
     System.out.println(s.nextInt());
     System.out.println(s.next());
     System.out.println(s.next());
     s.close();

prints the following output:


     1
     2
     red
     blue

The same output can be generated with this code, which uses a regular expression to parse all four tokens at once:


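     // Requires: import java.util.Scanner; import java.util.regex.MatchResult;
     String input = "1 fish 2 fish red fish blue fish";
     Scanner s = new Scanner(input);
     // One pattern with four capture groups consumes the whole line.
     s.findInLine("(\\d+) fish (\\d+) fish (\\w+) fish (\\w+)");
     // match() exposes the groups captured by the last scanning operation.
     MatchResult result = s.match();
     for (int i = 1; i <= result.groupCount(); i++) {
         System.out.println(result.group(i));
     }
     s.close();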
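
One detail the snippet above relies on is that the pattern actually matches. `findInLine` returns `null` when it finds nothing, and calling `match()` after a failed scan throws an `IllegalStateException`. The following is a minimal sketch of a guarded version. The class `FindInLineGuard` and the method `parseFish` are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.MatchResult;

class FindInLineGuard {
  // Hypothetical helper: returns the captured groups, or an empty list
  // when the pattern does not occur in the input.
  static List<String> parseFish(String input) {
    List<String> groups = new ArrayList<>();
    try (Scanner s = new Scanner(input)) {
      String found = s.findInLine("(\\d+) fish (\\d+) fish (\\w+) fish (\\w+)");
      if (found == null) {
        return groups; // no match: calling s.match() here would throw
      }
      MatchResult result = s.match();
      for (int i = 1; i <= result.groupCount(); i++) {
        groups.add(result.group(i));
      }
    }
    return groups;
  }

  public static void main(String[] args) {
    System.out.println(parseFish("1 fish 2 fish red fish blue fish")); // [1, 2, red, blue]
    System.out.println(parseFish("no numbers here"));                  // []
  }
}
```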

The default delimiter used by a scanner matches whitespace as recognized by Character.isWhitespace(). The reset() method restores the scanner's delimiter to this default, regardless of whether it was previously changed; it also discards any locale or radix set through useLocale() or useRadix().
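
The effect of `reset()` can be checked with `delimiter()`, which returns the `Pattern` the scanner is currently using. The following is a minimal sketch. The class `ScannerResetDemo` and its methods are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ScannerResetDemo {
  // Sets a comma delimiter, resets, then reads every token.
  static List<String> tokensAfterReset(String input) {
    List<String> tokens = new ArrayList<>();
    try (Scanner scanner = new Scanner(input)) {
      scanner.useDelimiter(",");
      scanner.reset(); // back to the default whitespace delimiter
      while (scanner.hasNext()) {
        tokens.add(scanner.next());
      }
    }
    return tokens;
  }

  // Regex source of the delimiter a fresh Scanner starts with.
  static String defaultDelimiter() {
    try (Scanner scanner = new Scanner("")) {
      return scanner.delimiter().pattern();
    }
  }

  public static void main(String[] args) {
    System.out.println(defaultDelimiter());                     // \p{javaWhitespace}+
    System.out.println(tokensAfterReset("one two,three four")); // [one, two,three, four]
  }
}
```

After the reset the comma is ordinary token content again, which is why `two,three` comes back as a single token.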

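Example 3:

A Scanner can also test and read tokens against a regular expression: hasNext(pattern) and next(pattern) match the next token against the pattern, and match() exposes the captured groups.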

// strings/ThreatAnalyzer.java
// (c)2017 MindView LLC: see Copyright.txt
// We make no guarantees that this code is fit for any purpose.
// Visit http://OnJava8.com for more book information.

import java.util.*;
import java.util.regex.*;

public class ThreatAnalyzer {
  static String threatData =
      "58.27.82.161@08/10/2015\n"
          + "204.45.234.40@08/11/2015\n"
          + "58.27.82.161@08/11/2015\n"
          + "58.27.82.161@08/12/2015\n"
          + "58.27.82.161@08/12/2015\n"
          + "[Next log section with different data format]";

  public static void main(String[] args) {
    Scanner scanner = new Scanner(threatData);
    String pattern = "(\\d+[.]\\d+[.]\\d+[.]\\d+)@" + "(\\d{2}/\\d{2}/\\d{4})";
    while (scanner.hasNext(pattern)) {
      scanner.next(pattern);
      MatchResult match = scanner.match();
      String ip = match.group(1);
      String date = match.group(2);
      System.out.format("Threat on %s from %s%n", date, ip);
    }
  }
}
/* Output:
Threat on 08/10/2015 from 58.27.82.161
Threat on 08/11/2015 from 204.45.234.40
Threat on 08/11/2015 from 58.27.82.161
Threat on 08/12/2015 from 58.27.82.161
Threat on 08/12/2015 from 58.27.82.161
*/
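
A caveat when scanning with patterns this way: `hasNext(pattern)` and `next(pattern)` test only the next token, so a pattern that contains the delimiter can never match. The ThreatAnalyzer pattern works because no entry contains whitespace. The following is a minimal sketch of the failure mode. The class `TokenPatternDemo` and the method `nextTokenMatches` are hypothetical names used for illustration.

```java
import java.util.Scanner;

class TokenPatternDemo {
  // Hypothetical helper: does the first token match the pattern?
  // A null delimiter means "keep the default whitespace delimiter".
  static boolean nextTokenMatches(String input, String delimiter, String pattern) {
    try (Scanner scanner = new Scanner(input)) {
      if (delimiter != null) {
        scanner.useDelimiter(delimiter);
      }
      return scanner.hasNext(pattern);
    }
  }

  public static void main(String[] args) {
    String input = "12 34\n56 78";
    String pairPattern = "\\d+ \\d+"; // contains a space

    // Default delimiter: the first token is "12", so the pattern cannot match.
    System.out.println(nextTokenMatches(input, null, pairPattern)); // false
    // Newline delimiter: the first token is "12 34", which matches.
    System.out.println(nextTokenMatches(input, "\n", pairPattern)); // true
  }
}
```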

Regular expression

| POSIX | Non-standard | Perl/Tcl | Vim | Java | ASCII | Description |
|---|---|---|---|---|---|---|
| | `[:ascii:]` | | | `\p{ASCII}` | `[\x00-\x7F]` | ASCII characters |
| `[:alnum:]` | | | | `\p{Alnum}` | `[A-Za-z0-9]` | Alphanumeric characters |
| | `[:word:]` | `\w` | `\w` | `\w` | `[A-Za-z0-9_]` | Alphanumeric characters plus "_" |
| | | `\W` | `\W` | `\W` | `[^A-Za-z0-9_]` | Non-word characters |
| `[:alpha:]` | | | `\a` | `\p{Alpha}` | `[A-Za-z]` | Alphabetic characters |
| `[:blank:]` | | | `\s` | `\p{Blank}` | `[ \t]` | Space and tab |
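
The Java column of the table can be used directly in Scanner delimiters and in `java.util.regex` patterns. The following is a minimal sketch. The class `PosixClassDemo` and the helper `splitOnBlanks` are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

class PosixClassDemo {
  // Splits on runs of spaces and tabs only (\p{Blank}), not on newlines.
  static List<String> splitOnBlanks(String input) {
    List<String> tokens = new ArrayList<>();
    try (Scanner scanner = new Scanner(input)) {
      scanner.useDelimiter("\\p{Blank}+");
      while (scanner.hasNext()) {
        tokens.add(scanner.next());
      }
    }
    return tokens;
  }

  public static void main(String[] args) {
    System.out.println(Pattern.matches("\\p{Alpha}+", "fish"));  // true
    System.out.println(Pattern.matches("\\p{Alpha}+", "fish2")); // false: digit
    System.out.println(Pattern.matches("\\p{Alnum}+", "fish2")); // true
    System.out.println(Pattern.matches("\\w+", "red_fish"));     // true: \w allows "_"
    System.out.println(splitOnBlanks("12\t42  78"));             // [12, 42, 78]
  }
}
```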