IndiaHacks 2016 - Online Edition (Div. 1 + Div. 2) 653B Bear and Compressing (DFS + STL)

B. Bear and Compressing
time limit per test
2 seconds
memory limit per test
256 megabytes
input
standard input
output
standard output

Limak is a little polar bear. Polar bears hate long strings and thus they like to compress them. You should also know that Limak is so young that he knows only first six letters of the English alphabet: 'a', 'b', 'c', 'd', 'e' and 'f'.

You are given a set of q possible operations. Limak can perform them in any order, any operation may be applied any number of times. The i-th operation is described by a string ai of length two and a string bi of length one. No two of q possible operations have the same string ai.

When Limak has a string s he can perform the i-th operation on s if the first two letters of s match a two-letter string ai. Performing the i-th operation removes first two letters of s and inserts there a string bi. See the notes section for further clarification.

You may note that performing an operation decreases the length of a string s exactly by 1. Also, for some sets of operations there may be a string that cannot be compressed any further, because the first two letters don't match any ai.

Limak wants to start with a string of length n and perform n - 1 operations to finally get a one-letter string "a". In how many ways can he choose the starting string to be able to get "a"? Remember that Limak can use only letters he knows.

Input

The first line contains two integers n and q (2 ≤ n ≤ 6, 1 ≤ q ≤ 36) — the length of the initial string and the number of available operations.

The next q lines describe the possible operations. The i-th of them contains two strings ai and bi (|ai| = 2, |bi| = 1). It's guaranteed that ai ≠ aj for i ≠ j and that all ai and bi consist of only first six lowercase English letters.

Output

Print the number of strings of length n that Limak will be able to transform to string "a" by applying only operations given in the input.

Examples
input
3 5
ab a
cc c
ca a
ee c
ff d
output
4
input
2 8
af e
dc d
cc f
bc b
da b
eb a
bb b
ff c
output
1
input
6 2
bb a
ba a
output
0
Note

In the first sample, we count initial strings of length 3 from which Limak can get a required string "a". There are 4 such strings: "abb", "cab", "cca", "eea". The first one Limak can compress using operation 1 two times (changing "ab" to a single "a"). The first operation would change "abb" to "ab" and the second operation would change "ab" to "a".

Other three strings may be compressed as follows:

  • "cab "ab "a"
  • "cca "ca "a"
  • "eea "ca "a"

In the second sample, the only correct initial string is "eb" because it can be immediately compressed to "a".



Problem link: Codeforces 653B.

Given n and q, we have q rules, each saying that a particular pair of characters can be converted into a single character. The question is how many strings of length n over the letters 'a'–'f' can be compressed, two characters at a time from the front, down to the single character 'a'.

With only six letters there are few candidates, so we can simply enumerate every possible string and check whether it compresses to 'a'.
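
As a side note (not part of the author's solution): because no two rules share the same left-hand side ai, the compression of any string is completely deterministic, so instead of enumerating all 6^n strings one can also count them with a small DP over lengths. A minimal sketch of that idea, with illustrative variable names, is:

#include <iostream>
#include <string>
#include <vector>
using namespace std;

int main() {
    int n, q;
    cin >> n >> q;
    vector<string> a(q);   // left-hand sides (two letters)
    vector<char> b(q);     // right-hand sides (one letter)
    for (int i = 0; i < q; ++i) {
        string bi;
        cin >> a[i] >> bi;
        b[i] = bi[0];
    }

    // g[c] = number of strings of the current length whose compression ends
    // in the single letter 'a' + c; every one-letter string is its own result.
    vector<long long> g(6, 1);
    for (int len = 2; len <= n; ++len) {
        vector<long long> ng(6, 0);
        // The last character of a string is only consumed by the final
        // operation, so a string of length len compresses to b_i exactly when
        // its first len-1 characters compress to a_i[0] and it ends in a_i[1].
        for (int i = 0; i < q; ++i)
            ng[b[i] - 'a'] += g[a[i][0] - 'a'];
        g = ng;
    }
    cout << g[0] << endl;   // strings that compress all the way to "a"
    return 0;
}

On the three samples above this prints 4, 1 and 0, matching the expected outputs.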

Accepted (AC) code:

#include "iostream"
#include "cstdio"
#include "cstring"
#include "algorithm"
#include "queue"
#include "stack"
#include "cmath"
#include "utility"
#include "map"
#include "set"
#include "vector"
#include "list"
#include "string"
#include "cstdlib"
using namespace std;
typedef long long ll;
#define X first
#define Y second
#define exp 1e-8
const int MOD = 1e9 + 7;
const int INF = 0x3f3f3f3f;
const int MAXN = 10;
int n, q, sum;
std::map<string, string> mp;
string s[MAXN];
void Dfs(string x, int len)
{
	if(len >= 2 && mp[x.substr(0, 2)] == "") return;
	if(len == n) {
		string ans = x.substr(0, 2);
		if(mp[ans] != "") ans = mp[ans];
		else return;
		for(int i = 2; i < n; ++i) {
			string tmp = ans + x[i];
			if(mp[tmp] != "") ans = mp[tmp];
			else return;
		}
		if(ans == s[0]) sum++;
		return;
	}
	for(int i = 0; i < 6; ++i) 
		Dfs(x + s[i], len + 1);
	return;
}
int main(int argc, char const *argv[])
{
	for(int i = 0; i < 6; ++i)
		s[i] = 'a' + i;
	cin >> n >> q;
	while(q--) {
		string x, y;
		cin >> x >> y;
		mp[x] = y;
	}
	for(int i = 0; i < 6; ++i)
		Dfs(s[i], 1);
	cout << sum << endl;
	return 0;
}
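
Two small remarks on the code above. The search enumerates at most 6^6 = 46656 full-length strings, and each one is checked with O(n) map lookups, so the brute force is comfortably within the 2-second limit. Also, the check mp[key] == "" goes through map::operator[], which silently inserts an empty value whenever the key is missing; that is harmless here, but a lookup that does not mutate the map could look like the following hypothetical helper (not part of the accepted solution):

#include <map>
#include <string>

// Hypothetical helper: look up a two-letter rule without inserting an empty
// entry the way std::map::operator[] does for a missing key.
std::string lookup(const std::map<std::string, std::string>& rules,
                   const std::string& key)
{
	auto it = rules.find(key);
	return it == rules.end() ? std::string() : it->second;
}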

