Regular Expressions in Java

For my Data Mining project, I need regular expressions to process a large amount of HTML text.

I have used regular expressions on Linux (grep) before and found them an efficient way to process text, especially when there is a lot of it.

 

Introduction

Regular expressions are a way to describe a set of strings based on common characteristics shared by each string in the set. They can be used to search, edit, or manipulate text and data. You must learn a specific syntax to create regular expressions — one that goes beyond the normal syntax of the Java programming language. Regular expressions vary in complexity, but once you understand the basics of how they're constructed, you'll be able to decipher (or create) any regular expression.

 

The java.util.regex package

It primarily consists of three classes:

Pattern: a compiled representation of a regular expression.

Matcher: interprets the Pattern and performs match operations against an input string.

PatternSyntaxException: an unchecked exception that indicates a syntax error in a regular expression pattern.
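
As a minimal sketch of how the three classes fit together (the class name RegexClassesDemo and the helper methods firstMatch and isValid are made up for illustration, not part of java.util.regex), something like the following could be used:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexClassesDemo {
    // Hypothetical helper: returns the first substring of input matched by regex,
    // or null if nothing matches.
    static String firstMatch(String regex, String input) {
        Pattern pattern = Pattern.compile(regex);  // compiled form, reusable for many inputs
        Matcher matcher = pattern.matcher(input);  // a Matcher works on one input string
        return matcher.find() ? matcher.group() : null;
    }

    // Hypothetical helper: true if the regex compiles, false if it is malformed.
    static boolean isValid(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("[0-9]+", "order 66 shipped")); // 66
        System.out.println(isValid("[0-9"));  // false: the character class is never closed
    }
}
```

Because PatternSyntaxException is unchecked, the compiler does not force you to catch it; catching it explicitly is mainly useful when the pattern comes from user input.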

 

A simple regular expression program

package regexTestHarness;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexTestHarness {
    public static void main(String[] args) {
        // Use one reader for both prompts: a second BufferedReader on System.in
        // could miss input that the first one has already buffered.
        try (BufferedReader br = new BufferedReader(new InputStreamReader(System.in))) {
            // printf (unlike println) expands %n to the platform line separator.
            System.out.printf("%nEnter your regex: ");
            String regex = br.readLine();
            if (regex == null) {
                return; // end of input
            }
            Pattern pattern = Pattern.compile(regex);

            System.out.printf("%nEnter your text: ");
            String text = br.readLine();
            if (text == null) {
                return; // end of input
            }
            Matcher matcher = pattern.matcher(text);

            // Report every match with its start (inclusive) and end (exclusive) index.
            boolean found = false;
            while (matcher.find()) {
                System.out.printf("I found the text \"%s\" starting at index %d and ending at index %d.%n",
                        matcher.group(), matcher.start(), matcher.end());
                found = true;
            }
            if (!found) {
                System.out.println("No match found.");
            }
        } catch (PatternSyntaxException e) {
            System.out.println("Invalid regex: " + e.getDescription());
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

 

Character classes and Predefined classes

Construct        Description
[abc]            a, b, or c (simple class)
[^abc]           Any character except a, b, or c (negation)
[a-zA-Z]         a through z, or A through Z, inclusive (range)
[a-d[m-p]]       a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]]     d, e, or f (intersection)
[a-z&&[^bc]]     a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]]    a through z, and not m through p: [a-lq-z] (subtraction)
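
To check the rows of this table quickly, a small sketch like the one below may help. The class name CharacterClassDemo and the matches helper are only illustrative; the helper wraps Pattern.matches, which tests whether the whole input matches the regex.

```java
import java.util.regex.Pattern;

public class CharacterClassDemo {
    // Illustrative helper: true if the entire input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("[abc]", "b"));          // true  (simple class)
        System.out.println(matches("[^abc]", "b"));         // false (negation)
        System.out.println(matches("[a-d[m-p]]", "n"));     // true  (union)
        System.out.println(matches("[a-z&&[def]]", "e"));   // true  (intersection)
        System.out.println(matches("[a-z&&[^bc]]", "b"));   // false (subtraction)
        System.out.println(matches("[a-z&&[^m-p]]", "q"));  // true  (subtraction)
    }
}
```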

 

Construct    Description
.            Any character (matches line terminators only when the DOTALL flag is set)
\d           A digit: [0-9]
\D           A non-digit: [^0-9]
\s           A whitespace character: [ \t\n\x0B\f\r]
\S           A non-whitespace character: [^\s]
\w           A word character: [a-zA-Z_0-9]
\W           A non-word character: [^\w]
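
One thing that is easy to trip over in Java: inside a string literal the backslash must itself be escaped, so the regex \d is written "\\d". The sketch below illustrates this together with the DOTALL behaviour of the dot. The class name PredefinedClassDemo and the helpers findAll and dotMatchesNewline are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PredefinedClassDemo {
    // Illustrative helper: collects every match of regex in input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    // Illustrative helper: does "." match a newline under the given flags?
    static boolean dotMatchesNewline(int flags) {
        return Pattern.compile(".", flags).matcher("\n").matches();
    }

    public static void main(String[] args) {
        System.out.println(findAll("\\d+", "room 101, floor 3"));     // [101, 3]
        System.out.println(findAll("\\w+", "hello, regex_1!"));       // [hello, regex_1]
        System.out.println(Arrays.asList("a \t b\nc".split("\\s+"))); // [a, b, c]
        System.out.println(dotMatchesNewline(0));                     // false
        System.out.println(dotMatchesNewline(Pattern.DOTALL));        // true
    }
}
```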

 

Quantifiers

Greedy    Reluctant    Possessive    Meaning
X?        X??          X?+           X, once or not at all
X*        X*?          X*+           X, zero or more times
X+        X+?          X++           X, one or more times
X{n}      X{n}?        X{n}+         X, exactly n times
X{n,}     X{n,}?       X{n,}+        X, at least n times
X{n,m}    X{n,m}?      X{n,m}+       X, at least n but not more than m times
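
The three columns differ in how much input they consume and whether they give any back. A greedy quantifier takes as much as possible and backtracks if the rest of the pattern fails; a reluctant one takes as little as possible and grows; a possessive one takes as much as possible and never backtracks. A rough sketch of the difference, using a made-up class QuantifierDemo and helper findAll:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierDemo {
    // Illustrative helper: collects every match of regex in input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        // Greedy: .* swallows everything, then backs off just enough to match "foo".
        System.out.println(findAll(".*foo", input));   // [xfooxxxxxxfoo]
        // Reluctant: .*? expands one character at a time, so two shorter matches appear.
        System.out.println(findAll(".*?foo", input));  // [xfoo, xxxxxxfoo]
        // Possessive: .*+ swallows everything and never gives it back, so "foo" cannot match.
        System.out.println(findAll(".*+foo", input));  // []
        // Bounded repetition: at least 2 but not more than 3 occurrences.
        System.out.println(findAll("a{2,3}", "aaaaa")); // [aaa, aa]
    }
}
```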

Chinese Characters

[\u4e00-\u9fa5]
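
A possible way to use this range from Java is sketched below. The class name ChineseCharDemo and the helper findAll are hypothetical, and the sample text is written with Unicode escapes only so that the source file stays pure ASCII. The \p{IsHan} variant is shown as an alternative that selects characters by Unicode script (Java 7 and later) rather than by a fixed code-point range; whether it suits your data is an assumption to verify.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ChineseCharDemo {
    // Hypothetical helper: collects every match of regex in input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // "Java" + two Chinese characters + " regex " + two more Chinese characters
        String text = "Java\u6b63\u5219 regex \u6559\u7a0b";

        // Range-based class from the text; the regex engine itself resolves the escapes.
        List<String> byRange = findAll("[\\u4e00-\\u9fa5]+", text);
        // Script-based alternative (assumption: Han script is what you want to match).
        List<String> byScript = findAll("\\p{IsHan}+", text);

        System.out.println(byRange.size());           // 2
        System.out.println(byRange.equals(byScript)); // true
    }
}
```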

Reposted from: https://www.cnblogs.com/johnpher/archive/2012/07/02/2573865.html
