hihocoder 1385 A Simple Job: Java String split and string splitting


#1385 : A Simple Job

Time limit: 1000ms
Per-case time limit: 1000ms
Memory limit: 256MB

Description

Institute of Computational Linguistics (ICL), Peking University is an interdisciplinary institute of science and liberal arts, it focuses primarily on the fundamental researches and applications of language information processing. The research of ICL covers a wide range of areas, including Chinese syntax, language parsing, computational lexicography, semantic dictionaries, computational semantics and application systems.

Professor X is working for ICL. His little daughter Jane is 9 years old and has learned something about programming. She is always very interested in her daddy's research. During this summer vacation, she took a free programming and algorithm course for kids provided by the School of EECS, Peking University. When the course was finished, she said to Professor X: "Daddy, I just learned a lot of fancy algorithms. Now I can help you! Please give me something to research on!" Professor X laughed and said: "Ok, let's start from a simple job. I will give you a lot of text, you should tell me which phrase is most frequently used in the text."

Please help Jane to write a program to do the job.

Input

There are no more than 20 test cases.

In each case, there are one or more lines of text ended by a line of "####". The text includes words, spaces, ','s and '.'s. A word consists of only lowercase letters. Two adjacent words make a "phrase". Two words which there are just one or more spaces between them are considered adjacent. No word is split across two lines and two words which belong to different lines can't form a phrase. Two phrases which the only difference between them is the number of spaces, are considered the same.

Please note that the maximum length of a line is 500 characters, and there are at most 50 lines in a test case. It's guaranteed that there are at least 1 phrase in each test case.

Output

For each test case, print the most frequently used phrase and the number of times it appears, separated by a ':' . If there are more than one choice, print the one which has the smallest dictionary order. Please note that if there are more than one spaces between the two words of a phrase, just keep one space.

Sample Input
above,all ,above all good at good at good
at good at above all me this is
####
world hello ok
####
Sample Output
at good:3
hello ok:1
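
To make the phrase rules concrete, here is a minimal sketch of pulling the phrases out of a single line. The class name `PhraseExtractor` and the method `phrasesOf` are made up for illustration and are not part of the original post. It assumes, as the statement says, that a line contains only lowercase words, spaces, ',' and '.'.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only.
class PhraseExtractor {
    // Returns every phrase (two adjacent words joined by one space) in one line.
    static List<String> phrasesOf(String line) {
        List<String> phrases = new ArrayList<>();
        // ',' and '.' break adjacency, so cut the line into punctuation-free segments first.
        for (String segment : line.split("[,.]")) {
            // trim() drops leading and trailing spaces; " +" treats a run of spaces as one separator.
            String[] words = segment.trim().split(" +");
            // A segment with fewer than two words skips this loop and yields no phrase.
            for (int i = 1; i < words.length; i++) {
                phrases.add(words[i - 1] + " " + words[i]);
            }
        }
        return phrases;
    }

    public static void main(String[] args) {
        // prints [above all, all good]
        System.out.println(phrasesOf("above,all ,above all good"));
    }
}
```

Note that "above" and "all" on either side of the comma do not form a phrase; only words separated purely by spaces do.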

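My first instinct was to use split, but the first submission got WA. I read the code twice and could not find anything wrong, so the problem had to be consecutive spaces. Frustrated, I wrote a small test program: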


import java.util.ArrayList;
import java.util.Collections;
import java.util.Scanner;

public class Main {
	public static void main(String[] args) {
		Scanner in = new Scanner(System.in);
		while (in.hasNextLine()) { // stop cleanly at end of input
			String s = in.nextLine();
			String ss[] = s.split(" ");
			System.out.println("ok");
			ArrayList<String> ls = new ArrayList<String>();
			Collections.addAll(ls, ss);
			// Print the token count, then every token followed by '*';
			// an empty token shows up as a bare '*'.
			System.out.println(ls.size());
			for (int i = 0; i < ls.size(); i++) {
				System.out.print(ls.get(i) + "*");
			}
			System.out.println("");

			// Remove the empty tokens in place.
			for (int i = 0; i < ls.size(); i++) {
				String string = ls.get(i);
				if (string.isEmpty()) {
					ls.remove(i);
					i--; // the next element moved into slot i, so check it again
				}
			}
			// Print again and warn if any empty token survived.
			System.out.println(ls.size());
			for (int i = 0; i < ls.size(); i++) {
				String string = ls.get(i);
				if (string.isEmpty()) {
					System.out.print("Warning!");
				}
				System.out.print(string + "*");
			}
			System.out.println("");
		}
		in.close();
	}
}




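That turned up two problems. The first was a weird run of asterisks (******) in the output.

Stepping through in the debugger showed why: when split cuts through consecutive spaces, it produces empty strings. The program has to check for them explicitly and remove them where necessary.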
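
The behavior is easy to reproduce in isolation. The sketch below (the class name `SplitDemo` and the helper `words` are made up for illustration) compares splitting on a single space with splitting on the regex `" +"`, which matches one or more spaces. Trailing empty strings are dropped by `split`, but a leading one is not, hence the `trim()`.

```java
import java.util.Arrays;

// Hypothetical demo class for illustration only.
class SplitDemo {
    // For a string that contains at least one word, returns its words with no empty entries.
    static String[] words(String s) {
        return s.trim().split(" +");
    }

    public static void main(String[] args) {
        String s = "good  at   good"; // two spaces, then three
        // Single-space delimiter: every extra space yields an empty string.
        System.out.println(Arrays.toString(s.split(" ")));      // prints [good, , at, , , good]
        // One-or-more-spaces delimiter: no empty strings in the middle.
        System.out.println(Arrays.toString(s.split(" +")));     // prints [good, at, good]
        // A leading space still yields one empty first element...
        System.out.println(Arrays.toString(" at good".split(" +"))); // prints [, at, good]
        // ...unless the string is trimmed first.
        System.out.println(Arrays.toString(words(" at good"))); // prints [at, good]
    }
}
```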

The second was that al.size() gets smaller after a remove... I had assumed al.size() stayed fixed the whole time...
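
A small sketch of the shrinking list, plus one way to sidestep the index bookkeeping entirely: `removeIf`, available on collections since Java 8. The class name `RemoveEmptyDemo` and the method `withoutEmpty` are made up for illustration; this is an alternative to the manual loop, not what the post's code does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class for illustration only.
class RemoveEmptyDemo {
    // Returns a copy of tokens with the empty strings dropped.
    static List<String> withoutEmpty(String[] tokens) {
        List<String> list = new ArrayList<>(Arrays.asList(tokens));
        list.removeIf(String::isEmpty); // no manual index adjustment needed
        return list;
    }

    public static void main(String[] args) {
        String[] tokens = {"good", "", "at", "", "", "good"};
        List<String> list = new ArrayList<>(Arrays.asList(tokens));
        System.out.println(list.size()); // prints 6
        list.remove(1);                  // drops the "" at index 1; later elements shift left
        System.out.println(list.size()); // prints 5
        System.out.println(withoutEmpty(tokens)); // prints [good, at, good]
    }
}
```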

A few fixes later it passed... I'm still not that fluent in Java...


AC code:

import java.util.*;

public class Main {
	public static void main(String[] args) {
		Scanner in = new Scanner(System.in);
		// TreeMap iterates over its keys in dictionary order
		Map<String, Integer> m = new TreeMap<String, Integer>();
		while (in.hasNextLine()) {
			String str = in.nextLine();
			if (str.startsWith("#")) { // "####" ends the current test case
				String ans = ""; // most frequent phrase, for the final output
				int cnt = -1;    // number of times that phrase appears
				for (Map.Entry<String, Integer> e : m.entrySet()) {
					// strict '>' keeps the smallest phrase when counts tie
					if (e.getValue() > cnt) {
						cnt = e.getValue();
						ans = e.getKey();
					}
				}
				System.out.printf("%s:%d\n", ans, cnt);
				m.clear();
				continue;
			}
			String ss[] = str.split(",|\\."); // split str on "," and "."
			for (int i = 0; i < ss.length; i++) {
				// split again on spaces; note that consecutive spaces produce empty strings
				ArrayList<String> al = new ArrayList<String>();
				Collections.addAll(al, ss[i].split(" ")); // copy into al so the empty strings can be removed
				// compare against al.size() on every pass, not a cached length: al keeps shrinking
				for (int j = 0; j < al.size(); j++) {
					if (al.get(j).isEmpty()) {
						al.remove(j);
						j--; // mind the index shift: the next element has moved into slot j
					}
				}
				// a segment with fewer than two words contains no phrase, so this loop is skipped
				int lenal = al.size();
				for (int j = 1; j < lenal; j++) {
					String t = al.get(j - 1) + " " + al.get(j);
					if (m.get(t) == null) {
						m.put(t, 1);
					} else {
						m.put(t, m.get(t) + 1);
					}
				}
			}
		}
		in.close();
	}
}
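
As a side note, the get-then-put counting above can be written with `Map.merge` (Java 8+). The sketch below is an alternative, not the submitted code; the class name `PhraseTally` and the method `best` are made up for illustration. It keeps the same tie-breaking idea: iterate a `TreeMap` in key order and replace the answer only on a strictly larger count.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration only.
class PhraseTally {
    // Returns "phrase:count" for the most frequent phrase; ties go to the smallest phrase.
    static String best(Iterable<String> phrases) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String p : phrases) {
            counts.merge(p, 1, Integer::sum); // store 1, or add 1 to the existing count
        }
        String ans = "";
        int cnt = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > cnt) { // strict '>' keeps the earlier (smaller) key on ties
                cnt = e.getValue();
                ans = e.getKey();
            }
        }
        return ans + ":" + cnt;
    }

    public static void main(String[] args) {
        // prints at good:2
        System.out.println(best(Arrays.asList("at good", "good at", "at good", "good at")));
    }
}
```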

