Number of Atoms

本文介绍了一种解析化学分子式并计算各元素数量的算法。通过使用栈和递归方法,该算法能够正确处理嵌套括号,实现对复杂分子式的精确解析。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >

Given a chemical formula (given as a string), return the count of each atom.

An atomic element always starts with an uppercase character, then zero or more lowercase letters, representing the name.

1 or more digits representing the count of that element may follow if the count is greater than 1. If the count is 1, no digits will follow. For example, H2O and H2O2 are possible, but H1O2 is impossible.

Two formulas concatenated together produce another formula. For example, H2O2He3Mg4 is also a formula.

A formula placed in parentheses, and a count (optionally added) is also a formula. For example, (H2O2) and (H2O2)3 are formulas.

Given a formula, output the count of all elements as a string in the following form: the first name (in sorted order), followed by its count (if that count is more than 1), followed by the second name (in sorted order), followed by its count (if that count is more than 1), and so on.

Example 1:

Input: 
formula = "H2O"
Output: "H2O"
Explanation: 
The count of elements are {'H': 2, 'O': 1}.

 

Example 2:

Input: 
formula = "Mg(OH)2"
Output: "H2MgO2"
Explanation: 
The count of elements are {'H': 2, 'Mg': 1, 'O': 2}.

 

Example 3:

Input: 
formula = "K4(ON(SO3)2)2"
Output: "K4N2O14S4"
Explanation: 
The count of elements are {'K': 4, 'N': 2, 'O': 14, 'S': 4}.

 

Note:

  • All atom names consist of lowercase letters, except for the first character which is uppercase.
  • The length of formula will be in the range [1, 1000].
  • formula will only consist of letters, digits, and round parentheses, and is a valid formula as defined in the problem.

 

题目理解:

给定分子式,计算分子式中每一种元素的数量

解题思路:

这个题目主要的难点在于如何处理括号,因为同一种元素可能出现在括号里面和外面,在计算括号中元素数量时,需要对这两部分的元素分别处理。

我采用的方法是将元素的种类和数量作为一个整体存入栈中,将所有的左括号“(”直接入栈,如果遇到有括号“)”,那么将栈中左括号之前的所有元素取出来,改写其数量,再重新加入栈中,这样就可以解决不同位置的同种元素的数量处理问题,同时也可以处理括号的嵌套

class Solution {
    public String countOfAtoms(String formula) {
        char[] chs = formula.toCharArray();
        int it = 0, len = formula.length();
        Stack<String> st = new Stack<>();
        while(it < len){
            char ch = chs[it];
            if(ch >= 'A' && ch <= 'Z'){
                String atom = "" + ch;
                it++;
                while(it < len && chs[it] >= 'a' && chs[it] <= 'z'){
                    atom += chs[it];
                    it++;
                }
                st.push(atom + ",1");
            }
            else if(ch >= '0' && ch <= '9'){
                String num = "" + ch;
                it++;
                while(it < len && chs[it] >= '0' && chs[it] <= '9'){
                    num += chs[it];
                    it++;
                }
                int count = Integer.valueOf(num);
                if(st.peek().equals(")")){
                    List<String> list = new ArrayList<>();
                    st.pop();
                    while(!st.peek().equals("("))
                        list.add(st.pop());
                    st.pop();
                    for(String atom : list) {
                        int pos = atom.indexOf(',');
                        int base = Integer.valueOf(atom.substring(pos + 1));
                        int cur = base * count;
                        st.push(atom.substring(0, pos) + "," + cur);
                    }
                }
                else{
                    String atom = st.pop();
                    int pos = atom.indexOf(',');
                    int base = Integer.valueOf(atom.substring(pos + 1));
                    int cur = base * count;
                    st.push(atom.substring(0, pos) + "," + cur);
                }
            }
            else if(ch == '(' || ch == ')'){
                st.push(String.valueOf(ch));
                it++;
            }
        }
        TreeMap<String, Integer> map = new TreeMap<>();
        for(String str : st){
            int pos = str.indexOf(',');
            String atom = str.substring(0, pos);
            int num = Integer.valueOf(str.substring(pos + 1));
            map.put(atom, map.getOrDefault(atom, 0) + num);
        }
        StringBuilder sb = new StringBuilder("");
        for(String atom : map.keySet()){
            sb.append(atom);
            int num = map.get(atom);
            if(num > 1)
                sb.append(map.get(atom));
        }
        return sb.toString();
    }
}

 

import numpy as np import matplotlib.pyplot as plt from scipy.spatial import cKDTree from tqdm import tqdm import os import sys import time import argparse import multiprocessing from collections import defaultdict import csv import matplotlib as mpl import warnings import re # 忽略可能的警告 warnings.filterwarnings("ignore", category=UserWarning) # 原子类型映射 (根据Masses部分动态更新) ATOM_TYPE_MAPPING = {} TYPE_SYMBOL_MAPPING = {} # 专业绘图设置 plt.style.use('seaborn-v0_8-whitegrid') mpl.rcParams.update({ 'font.family': 'serif', 'font.serif': ['Times New Roman', 'DejaVu Serif'], 'font.size': 12, 'axes.labelsize': 14, 'axes.titlesize': 16, 'xtick.labelsize': 12, 'ytick.labelsize': 12, 'figure.dpi': 600, 'savefig.dpi': 600, 'figure.figsize': (8, 6), 'lines.linewidth': 2.0, 'legend.fontsize': 10, 'legend.framealpha': 0.8, 'mathtext.default': 'regular', 'axes.linewidth': 1.5, 'xtick.major.width': 1.5, 'ytick.major.width': 1.5, 'xtick.major.size': 5, 'ytick.major.size': 5, }) def wrap_coordinates(coords, box): """将坐标映射到周期性盒子内 [0, box] 区间""" wrapped_coords = coords.copy() for dim in range(3): if box[dim] > 0: # 将坐标调整到 [0, box[dim]) 区间 wrapped_coords[:, dim] = wrapped_coords[:, dim] % box[dim] return wrapped_coords def get_hbond_config(): """返回氢键配置,包含距离和角度阈值""" return [ { "name": "P-O/P=O···Hw", "donor_type": "water_oxygens", "acceptor_type": "P-O/P=O", "h_type": "water_hydrogens", "distance_threshold": 2.375, "angle_threshold": 143.99, "color": "#1f77b4" }, { "name": "P-O/P=O···Hh", "donor_type": "hydronium_oxygens", "acceptor_type": "P-O/P=O", "h_type": "hydronium_hydrogens", "distance_threshold": 2.275, "angle_threshold": 157.82, "color": "#ff7f0e" }, { "name": "P-O/P=O···Hp", "donor_type": "P-OH", "acceptor_type": "P-O/P=O", "h_type": "phosphate_hydrogens", "distance_threshold": 2.175, "angle_threshold": 155.00, "color": "#2ca02c" }, { "name": "P-OH···Ow", "donor_type": "P-OH", "acceptor_type": "water_oxygens", "h_type": "phosphate_hydrogens", "distance_threshold": 2.275, "angle_threshold": 155.13, "color": "#d62728" }, { "name": "Hw···Ow", "donor_type": "water_oxygens", "acceptor_type": "water_oxygens", "h_type": "water_hydrogens", "distance_threshold": 2.375, "angle_threshold": 138.73, "color": "#9467bd" }, { "name": "Hh···Ow", "donor_type": "hydronium_oxygens", "acceptor_type": "water_oxygens", "h_type": "hydronium_hydrogens", "distance_threshold": 2.225, "angle_threshold": 155.31, "color": "#8c564b" }, { "name": "Hw···F", "donor_type": "water_oxygens", "acceptor_type": "fluoride_atoms", "h_type": "water_hydrogens", "distance_threshold": 2.225, "angle_threshold": 137.68, "color": "#e377c2" }, { "name": "Hh···F", "donor_type": "hydronium_oxygens", "acceptor_type": "fluoride_atoms", "h_type": "hydronium_hydrogens", "distance_threshold": 2.175, "angle_threshold": 154.64, "color": "#7f7f7f" }, { "name": "Hp···F", "donor_type": "P-OH", "acceptor_type": "fluoride_atoms", "h_type": "phosphate_hydrogens", "distance_threshold": 2.275, "angle_threshold": 153.71, "color": "#bcbd22" } ] def calculate_angle(coords, lattice, donor_idx, h_idx, acceptor_idx): """计算D-H···A键角 (角度制),使用笛卡尔向量表示并处理周期性""" # 获取分数坐标 frac_coords = coords # 获取氢原子H的分数坐标 h_frac = frac_coords[h_idx] # 计算供体D相对于H的分数坐标差 d_frac = frac_coords[donor_idx] dh_frac = d_frac - h_frac # 计算受体A相对于H的分数坐标差 a_frac = frac_coords[acceptor_idx] ah_frac = a_frac - h_frac # 应用周期性修正 (将分数坐标差限制在[-0.5, 0.5]范围内) dh_frac = np.where(dh_frac > 0.5, dh_frac - 1, dh_frac) dh_frac = np.where(dh_frac < -0.5, dh_frac + 1, dh_frac) ah_frac = np.where(ah_frac > 0.5, ah_frac - 1, ah_frac) ah_frac = np.where(ah_frac < -0.5, ah_frac + 1, ah_frac) # 转换为笛卡尔向量 (H->D 和 H->A) vec_hd = np.dot(dh_frac, lattice) # H->D向量 vec_ha = np.dot(ah_frac, lattice) # H->A向量 # 计算向量点积 dot_product = np.dot(vec_hd, vec_ha) # 计算向量模长 norm_hd = np.linalg.norm(vec_hd) norm_ha = np.linalg.norm(vec_ha) # 避免除以零 if norm_hd < 1e-6 or norm_ha < 1e-6: return None # 计算余弦值 cos_theta = dot_product / (norm_hd * norm_ha) # 处理数值误差 cos_theta = np.clip(cos_theta, -1.0, 1.0) # 计算角度 (弧度转角度) angle_rad = np.arccos(cos_theta) angle_deg = np.degrees(angle_rad) return angle_deg def identify_atom_types(atom_types, coords, lattice, box): """原子类型识别函数""" # 初始化数据结构 atom_categories = { "phosphate_oxygens": {"P-O/P=O": [], "P-OH": []}, "phosphate_hydrogens": [], "water_oxygens": [], "water_hydrogens": [], "hydronium_oxygens": [], "hydronium_hydrogens": [], "fluoride_atoms": [i for i, a_type in enumerate(atom_types) if a_type == "F"], "aluminum_atoms": [i for i, a_type in enumerate(atom_types) if a_type == "Al"] } # 将坐标映射到盒子内 [0, box] 区间 wrapped_coords = wrap_coordinates(coords, box) # 构建全局KDTree kdtree = cKDTree(wrapped_coords, boxsize=box) # 识别磷酸基团 p_atoms = [i for i, a_type in enumerate(atom_types) if a_type == "P"] phosphate_oxygens = [] for p_idx in p_atoms: # 查找P周围的O原子 (距离 < 1.6&Aring;) neighbors = kdtree.query_ball_point(wrapped_coords[p_idx], r=1.6) p_o_indices = [idx for idx in neighbors if idx != p_idx and atom_types[idx] == "O"] phosphate_oxygens.extend(p_o_indices) # 识别所有H原子并确定归属 hydrogen_owners = {} h_atoms = [i for i, a_type in enumerate(atom_types) if a_type == "H"] for h_idx in h_atoms: neighbors = kdtree.query_ball_point(wrapped_coords[h_idx], r=1.2) candidate_os = [idx for idx in neighbors if idx != h_idx and atom_types[idx] == "O"] if not candidate_os: continue min_dist = float('inf') owner_o = None for o_idx in candidate_os: # 使用映射后的坐标计算距离 dist = np.linalg.norm(wrapped_coords[h_idx] - wrapped_coords[o_idx]) if dist < min_dist: min_dist = dist owner_o = o_idx hydrogen_owners[h_idx] = owner_o # 分类磷酸氧:带H的为P-OH,不带H的为P-O/P=O for o_idx in phosphate_oxygens: has_hydrogen = any(owner_o == o_idx for h_idx, owner_o in hydrogen_owners.items()) if has_hydrogen: atom_categories["phosphate_oxygens"]["P-OH"].append(o_idx) else: atom_categories["phosphate_oxygens"]["P-O/P=O"].append(o_idx) # 识别水和水合氢离子 all_o_indices = [i for i, a_type in enumerate(atom_types) if a_type == "O"] non_phosphate_os = [o_idx for o_idx in all_o_indices if o_idx not in phosphate_oxygens] o_h_count = {} for h_idx, owner_o in hydrogen_owners.items(): o_h_count[owner_o] = o_h_count.get(owner_o, 0) + 1 for o_idx in non_phosphate_os: h_count = o_h_count.get(o_idx, 0) attached_hs = [h_idx for h_idx, owner_o in hydrogen_owners.items() if owner_o == o_idx] if h_count == 2: atom_categories["water_oxygens"].append(o_idx) atom_categories["water_hydrogens"].extend(attached_hs) elif h_count == 3: atom_categories["hydronium_oxygens"].append(o_idx) atom_categories["hydronium_hydrogens"].extend(attached_hs) # 识别磷酸基团的H原子 for o_idx in atom_categories["phosphate_oxygens"]["P-OH"]: attached_hs = [h_idx for h_idx, owner_o in hydrogen_owners.items() if owner_o == o_idx] atom_categories["phosphate_hydrogens"].extend(attached_hs) return atom_categories def find_hbonds_in_frame(atom_types, coords, lattice, box, atom_categories, hbond_config): """在当前帧中寻找所有氢键""" hbond_results = {hbond["name"]: [] for hbond in hbond_config} # 将坐标映射到盒子内 [0, box] 区间 wrapped_coords = wrap_coordinates(coords, box) # 构建全局KDTree kdtree = cKDTree(wrapped_coords, boxsize=box) # 处理每一类氢键 for hbond in hbond_config: # 获取供体原子列表 if hbond["donor_type"] == "P-OH": donors = atom_categories["phosphate_oxygens"]["P-OH"] else: donors = atom_categories[hbond["donor_type"]] # 获取受体原子列表 if hbond["acceptor_type"] == "P-O/P=O": acceptors = atom_categories["phosphate_oxygens"]["P-O/P=O"] elif hbond["acceptor_type"] == "P-OH": acceptors = atom_categories["phosphate_oxygens"]["P-OH"] else: acceptors = atom_categories[hbond["acceptor_type"]] # 获取氢原子列表 hydrogens = atom_categories[hbond["h_type"]] # 如果没有氢原子或受体,跳过 if len(hydrogens) == 0 or len(acceptors) == 0: continue # 为受体构建KDTree acceptor_coords = wrapped_coords[acceptors] acceptor_kdtree = cKDTree(acceptor_coords, boxsize=box) # 遍历所有氢原子 for h_idx in hydrogens: h_coords = wrapped_coords[h_idx] # 查找与H成键的供体 (距离 < bond_threshold) donor_neighbors = kdtree.query_ball_point(h_coords, r=1.3) donor_candidates = [idx for idx in donor_neighbors if idx in donors] # 如果没有找到供体,跳过 if not donor_candidates: continue # 选择最近的供体 min_dist = float('inf') donor_idx = None for d_idx in donor_candidates: dist = np.linalg.norm(wrapped_coords[h_idx] - wrapped_coords[d_idx]) if dist < min_dist: min_dist = dist donor_idx = d_idx # 查找在阈值内的受体 acceptor_indices = acceptor_kdtree.query_ball_point(h_coords, r=hbond["distance_threshold"]) for a_idx_offset in acceptor_indices: a_idx = acceptors[a_idx_offset] # 排除供体自身 if a_idx == donor_idx: continue # 计算键角 (使用原始坐标) angle = calculate_angle(coords, lattice, donor_idx, h_idx, a_idx) if angle is not None and angle >= hbond["angle_threshold"]: # 记录氢键 (供体, 氢, 受体) hbond_results[hbond["name"]].append((donor_idx, h_idx, a_idx)) return hbond_results def read_data_file(data_file): """读取LAMMPS data文件,支持Materials Studio格式""" atom_types = [] coords = [] box = np.zeros(3) # 初始化状态机 section = None box_dims = [] masses_parsed = False with open(data_file, 'r') as f: for line in f: stripped = line.strip() # 跳过空行 if not stripped: continue # 检测部分标题 if "xlo xhi" in stripped: parts = stripped.split() if len(parts) >= 2: try: xlo = float(parts[0]) xhi = float(parts[1]) box_dims.append(xhi - xlo) except ValueError: pass section = "box" elif "ylo yhi" in stripped: parts = stripped.split() if len(parts) >= 2: try: ylo = float(parts[0]) yhi = float(parts[1]) box_dims.append(yhi - ylo) except ValueError: pass elif "zlo zhi" in stripped: parts = stripped.split() if len(parts) >= 2: try: zlo = float(parts[0]) zhi = float(parts[1]) box_dims.append(zhi - zlo) except ValueError: pass elif stripped.startswith("Masses"): section = "masses" continue elif stripped.startswith("Atoms"): section = "atoms" # 下一行可能是标题或空行 continue # 解析内容 if section == "masses": # 解析质量行: "1 1.00800000 # H" if stripped and stripped[0].isdigit(): parts = re.split(r'\s+', stripped, 2) if len(parts) >= 2: try: atom_type_num = int(parts[0]) mass = float(parts[1]) # 尝试从注释中提取元素符号 symbol = "Unknown" if len(parts) > 2 and "#" in parts[2]: comment = parts[2].split("#")[1].strip() if comment: symbol = comment # 更新原子类型映射 ATOM_TYPE_MAPPING[atom_type_num] = symbol TYPE_SYMBOL_MAPPING[symbol] = atom_type_num except (ValueError, IndexError): pass elif section == "atoms": # 解析原子行: "1 7 17.470000000000 55.085000000000 14.227000000000" if stripped and stripped[0].isdigit(): parts = re.split(r'\s+', stripped) if len(parts) >= 5: # 至少需要id, type, x, y, z try: # 第一列是原子ID,第二列是原子类型 atom_id = int(parts[0]) atom_type_num = int(parts[1]) x = float(parts[2]) y = float(parts[3]) z = float(parts[4]) # 获取原子符号 atom_type = ATOM_TYPE_MAPPING.get(atom_type_num, "Unknown") atom_types.append(atom_type) coords.append([x, y, z]) except (ValueError, IndexError): print(f"Warning: Skipping invalid atom line: {stripped}") # 设置盒子尺寸 if len(box_dims) == 3: box = np.array(box_dims) else: print("Warning: Could not parse box dimensions. Using default values.") box = np.array([100.0, 100.0, 100.0]) # 默认盒子尺寸 # 确保我们有原子类型映射 if not ATOM_TYPE_MAPPING: # 使用默认映射 default_mapping = {1: "H", 2: "O", 3: "F", 7: "P"} ATOM_TYPE_MAPPING.update(default_mapping) for num, sym in default_mapping.items(): TYPE_SYMBOL_MAPPING[sym] = num return np.array(atom_types), np.array(coords), box def read_lammpstrj_frame(file_handle): """从LAMMPS轨迹文件中读取一帧""" frame_data = {} # 读取时间步 file_handle.readline() # ITEM: TIMESTEP timestep = int(file_handle.readline().strip()) # 读取原子数量 file_handle.readline() # ITEM: NUMBER OF ATOMS num_atoms = int(file_handle.readline().strip()) # 读取盒子尺寸 file_handle.readline() # ITEM: BOX BOUNDS pp pp pp box_x = list(map(float, file_handle.readline().split())) box_y = list(map(float, file_handle.readline().split())) box_z = list(map(float, file_handle.readline().split())) box = np.array([ box_x[1] - box_x[0], box_y[1] - box_y[0], box_z[1] - box_z[0] ]) # 读取原子数据 file_handle.readline() # ITEM: ATOMS id type x y z atom_types = [] coords = [] for _ in range(num_atoms): parts = file_handle.readline().split() if len(parts) < 5: # 确保有足够的字段 continue atom_id = int(parts[0]) atom_type_num = int(parts[1]) x = float(parts[2]) y = float(parts[3]) z = float(parts[4]) # 映射原子类型 atom_type = ATOM_TYPE_MAPPING.get(atom_type_num, "Unknown") atom_types.append(atom_type) coords.append([x, y, z]) frame_data = { "timestep": timestep, "atom_types": np.array(atom_types), "coords": np.array(coords), "box": box } return frame_data def calculate_hbond_lifetime(hbond_events, delta_t, max_tau): """计算氢键寿命相关函数""" # 初始化相关函数数组 corr_func = np.zeros(max_tau) norm_factor = np.zeros(max_tau) # 计算时间原点数量 num_origins = len(hbond_events) - max_tau # 遍历所有可能的时间原点 for t0 in range(0, num_origins, delta_t): # 获取当前时间原点存在的氢键 current_hbonds = set(hbond_events[t0]) # 遍历时间间隔 for tau in range(max_tau): t = t0 + tau if t >= len(hbond_events): break # 计算在时间t仍然存在的氢键数量 surviving = current_hbonds.intersection(hbond_events[t]) corr_func[tau] += len(surviving) norm_factor[tau] += len(current_hbonds) # 归一化相关函数 for tau in range(max_tau): if norm_factor[tau] > 0: corr_func[tau] /= norm_factor[tau] else: corr_func[tau] = 0.0 return corr_func def integrate_correlation_function(corr_func, dt): """积分相关函数计算弛豫时间""" # 使用梯形法积分 integral = np.trapz(corr_func, dx=dt) return integral def process_frame(frame_data): """处理单帧数据""" atom_types = frame_data["atom_types"] coords = frame_data["coords"] box = frame_data["box"] # 创建晶格矩阵 (正交盒子) lattice = np.diag(box) # 识别原子类别 atom_categories = identify_atom_types(atom_types, coords, lattice, box) # 获取氢键配置 hbond_config = get_hbond_config() # 查找氢键 hbond_results = find_hbonds_in_frame(atom_types, coords, lattice, box, atom_categories, hbond_config) return hbond_results def test_first_frame(data_file, traj_file): """测试模式:只处理第一帧,包含详细调试信息""" print("Running in test mode: analyzing first frame only") try: # 读取data文件获取初始原子类型和坐标 atom_types, coords, box = read_data_file(data_file) print(f"Successfully read data file: {data_file}") print(f"Box dimensions: {box}") print(f"Number of atoms: {len(atom_types)}") # 打印原子类型映射 print("\nAtom Type Mapping:") for type_num, symbol in ATOM_TYPE_MAPPING.items(): print(f"Type {type_num} -> {symbol}") # 创建晶格矩阵 (正交盒子) lattice = np.diag(box) # 识别原子类别 atom_categories = identify_atom_types(atom_types, coords, lattice, box) # 打印原子类别统计 print("\nAtom Categories:") for cat, values in atom_categories.items(): if isinstance(values, dict): for sub_cat, indices in values.items(): print(f"{cat}.{sub_cat}: {len(indices)} atoms") else: print(f"{cat}: {len(values)} atoms") # 获取氢键配置 hbond_config = get_hbond_config() # 查找氢键 hbond_results = find_hbonds_in_frame(atom_types, coords, lattice, box, atom_categories, hbond_config) # 打印氢键统计 print("\nHydrogen Bonds Found:") total_hbonds = 0 for hbond_type, bonds in hbond_results.items(): print(f"{hbond_type}: {len(bonds)} bonds") total_hbonds += len(bonds) print(f"\nTotal Hydrogen Bonds: {total_hbonds}") # 保存结果到CSV os.makedirs("hbond_analysis", exist_ok=True) csv_path = os.path.join("hbond_analysis", "first_frame_hbonds.csv") with open(csv_path, 'w', newline='') as csvfile: writer = csv.writer(csvfile) writer.writerow(["Hydrogen Bond Type", "Count"]) for hbond_type, bonds in hbond_results.items(): writer.writerow([hbond_type, len(bonds)]) print(f"\nSaved hydrogen bond counts to: {csv_path}") except Exception as e: print(f"Error during test mode: {str(e)}") import traceback traceback.print_exc() def calculate_hbond_lifetimes(data_file, traj_file, workers=1, delta_t=10, max_tau=1000, dt=0.1): """计算氢键寿命""" print(f"Starting hydrogen bond lifetime analysis with {workers} workers") # 读取data文件获取初始原子类型和坐标 atom_types, _, box = read_data_file(data_file) # 创建晶格矩阵 (正交盒子) lattice = np.diag(box) # 获取氢键配置 hbond_config = get_hbond_config() # 初始化氢键事件记录器 hbond_events = {hbond["name"]: [] for hbond in hbond_config} # 打开轨迹文件 frame_count = 0 with open(traj_file, 'r') as f: # 计算总帧数 total_frames = sum(1 for line in f if "ITEM: TIMESTEP" in line) f.seek(0) # 重置文件指针 # 进度条 pbar = tqdm(total=total_frames, desc="Processing frames") # 读取每一帧 while True: line = f.readline() if not line: break if "ITEM: TIMESTEP" in line: frame_data = read_lammpstrj_frame(f) frame_count += 1 # 处理当前帧 atom_categories = identify_atom_types( frame_data["atom_types"], frame_data["coords"], lattice, frame_data["box"] ) frame_hbonds = find_hbonds_in_frame( frame_data["atom_types"], frame_data["coords"], lattice, frame_data["box"], atom_categories, hbond_config ) # 记录氢键事件 (只记录氢键类型和原子索引元组) for hbond_type, bonds in frame_hbonds.items(): # 使用frozenset存储原子索引元组确保可哈希 hbond_set = set() for bond in bonds: # 使用(donor, acceptor)作为氢键标识符 hbond_set.add((bond[0], bond[2])) hbond_events[hbond_type].append(hbond_set) pbar.update(1) pbar.close() print(f"Processed {frame_count} frames") # 计算每种氢键类型的相关函数 results = {} for hbond_type, events in hbond_events.items(): print(f"\nCalculating lifetime for {hbond_type}...") corr_func = calculate_hbond_lifetime(events, delta_t, max_tau) tau = integrate_correlation_function(corr_func, dt) results[hbond_type] = {"corr_func": corr_func, "tau": tau} # 打印结果 print(f" Relaxation time for {hbond_type}: {tau:.2f} fs") # 绘制相关函数 plot_correlation_functions(results, dt, max_tau) return results def plot_correlation_functions(results, dt, max_tau): """绘制氢键寿命相关函数""" os.makedirs("hbond_lifetimes", exist_ok=True) # 创建时间轴 time_axis = np.arange(0, max_tau * dt, dt) # 创建图形 plt.figure(figsize=(10, 8)) # 绘制每种氢键类型的相关函数 for hbond_type, data in results.items(): corr_func = data["corr_func"] tau = data["tau"] # 截断时间轴以匹配相关函数长度 t_plot = time_axis[:len(corr_func)] plt.plot( t_plot, corr_func, label=f"{hbond_type} (τ={tau:.2f} fs)", linewidth=2 ) # 设置图形属性 plt.title("Hydrogen Bond Lifetime Correlation Functions", fontsize=16) plt.xlabel("Time (fs)", fontsize=14) plt.ylabel("Survival Probability", fontsize=14) plt.yscale("log") plt.grid(True, linestyle='--', alpha=0.7) plt.legend(fontsize=10, framealpha=0.8) # 保存图像 plot_path = os.path.join("hbond_lifetimes", "hbond_lifetimes.tiff") plt.tight_layout() plt.savefig(plot_path, dpi=600) plt.close() print(f"\nSaved correlation function plot to: {plot_path}") # 保存数据到CSV csv_path = os.path.join("hbond_lifetimes", "hbond_lifetimes.csv") with open(csv_path, 'w', newline='') as csvfile: writer = csv.writer(csvfile) header = ["Time (fs)"] for hbond_type in results.keys(): header.append(f"{hbond_type} (C(t))") header.append(f"{hbond_type} (τ)") writer.writerow(header) for i in range(len(time_axis)): if i >= max_tau: break row = [time_axis[i]] for hbond_type, data in results.items(): corr_func = data["corr_func"] tau = data["tau"] if i < len(corr_func): row.append(corr_func[i]) else: row.append("") row.append(tau) writer.writerow(row) print(f"Saved lifetime data to: {csv_path}") def main(): parser = argparse.ArgumentParser(description='Hydrogen Bond Analysis for LAMMPS Trajectories') parser.add_argument('--test', action='store_true', help='Run in test mode (first frame only)') parser.add_argument('--workers', type=int, default=multiprocessing.cpu_count(), help=f'Number of parallel workers (default: {multiprocessing.cpu_count()})') parser.add_argument('--delta_t', type=int, default=10, help='Time origin spacing for correlation function (default: 10 frames)') parser.add_argument('--max_tau', type=int, default=1000, help='Maximum time for correlation function (default: 1000 frames)') parser.add_argument('--dt', type=float, default=0.1, help='Time step (fs) (default: 0.1 fs)') args = parser.parse_args() # 设置默认文件 data_file = "H3PO4-23pure.data" traj_file = "nvt-P2O5-353K-23.lammpstrj" # 检查文件是否存在 if not os.path.exists(data_file): print(f"Error: Data file not found: {data_file}") sys.exit(1) if not os.path.exists(traj_file): print(f"Error: Trajectory file not found: {traj_file}") sys.exit(1) start_time = time.time() if args.test: # 测试模式:只处理第一帧 test_first_frame(data_file, traj_file) else: # 完整模式:计算氢键寿命 calculate_hbond_lifetimes( data_file, traj_file, workers=args.workers, delta_t=args.delta_t, max_tau=args.max_tau, dt=args.dt ) elapsed = time.time() - start_time print(f"\nAnalysis completed in {elapsed:.2f} seconds") if __name__ == "__main__": main() 以上代码对第一帧的识别已经无误,但在计算氢键寿命时候报错,但在读取LAMMPS轨迹文件时候,读取时间步的时候时间读取到了字符串“ITEM:NUMBER OF ATOMS”,在这里我们主要涉及两个文件,一个是将dump后缀改为lammpstrj后缀的文件以及一个data文件,为什么总会在读取的方面报错一些问题呢?理论上来说读取LAMMMPS的数据真的那么难吗?还遇到过stack trace错误堆栈跟踪,可能是在windos使用multiprocessing有关,数据无法被正确序列化
最新发布
08-11
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值