LintCode-570: Find the Missing Number II (a classic DFS problem)

This problem has nothing to do with the simple Find the Missing Number problem (which can be solved with XOR): here each piece we carve off the string can be 1 or 2 characters long, so DFS is the only realistic approach.
My solution uses classic DFS with backtracking.
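For contrast, the simple version hands you the values as an array, so the XOR trick works directly. A minimal sketch (the name findMissingXor is mine, not from LintCode); it does not apply here because we first have to recover the numbers from the string:

#include <vector>

// Classic XOR trick for the *simple* Missing Number problem (input is an array):
// XOR-ing the range 1..n against the array cancels every present value.
int findMissingXor(int n, const std::vector<int>& nums) {
    int x = 0;
    for (int i = 1; i <= n; ++i) x ^= i;   // XOR of 1..n
    for (int v : nums) x ^= v;             // present values cancel out
    return x;                              // only the missing number remains
}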

The code is as follows:

class Solution {
public:
    /**
     * @param n: An integer
     * @param str: a string formed by the numbers from 1 to n in random order, with one number missing
     * @return: An integer
     */
    bool findIt = false;     // set to true once a valid decomposition has been found
    int g_missingNum;        // the missing number of that decomposition

    int findMissing2(int n, string &str) {
        vector<int> visited(n+1, 0);
        helper(n, str, 0, visited);
        return g_missingNum;
    }

    void helper(int n, string &str, int index, vector<int>& visited) {
        if (index == str.size()) {
            if (findIt) return;   // answer already found; skip further checks

            int missingCount = 0;
            int missingNum;
            for (int i = 1; i <= n; ++i) {
                if (!visited[i]) {
                    missingCount++;
                    missingNum = i; 
                }
            }

            if (missingCount == 1) {
                findIt = true;
                g_missingNum = missingNum;
            }
            return;
        }

        for (int i = 1; i <= 2; ++i) {    // try carving off 1 or 2 characters
            if (index + i <= str.size()) {
                int candidateNum = stoi(str.substr(index, i));

                if ((candidateNum <= 0) || (candidateNum > n) ||
                     visited[candidateNum] ||                 // no duplicate numbers
                     (str.substr(index, i)[0] == '0')         // do not consider '0' or '09'
                    ) continue;  // prune

                visited[candidateNum] = 1;
                helper(n, str, index + i, visited);
                visited[candidateNum] = 0;
            }
        }
    }
};
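A quick local sanity check (the main() below is my own scaffolding, not part of the LintCode template, and it assumes the Solution class above is in the same translation unit): with n = 4 and str = "214", the string decomposes as 2, 1, 4, so the expected answer is 3.

#include <iostream>
#include <string>
using namespace std;

int main() {
    Solution sol;
    string s = "214";                        // concatenation of 2, 1, 4 -> 3 is missing
    cout << sol.findMissing2(4, s) << endl;  // expected output: 3
    return 0;
}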

Notes:
1) Pruning is necessary, otherwise the solution times out.
2) Use a visited[] array and take care to avoid duplicate numbers. I originally planned to store each partial result in a set and push all results into a 2D vector; that might work too, but it is not as simple as the visited[] array. Here we don't even need a 2D vector, because finding one solution is enough and all remaining work can be skipped.
Why did the earlier Subsets and Split String problems use a 2D vector to store results? Because those two problems require enumerating all solutions (see the sketch below).
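For comparison, here is a rough sketch of that collect-everything style (helperAll, path and results are hypothetical names, not from the earlier posts): every complete decomposition is appended to a 2D results vector instead of stopping at the first valid one.

#include <string>
#include <vector>
using namespace std;

// Hypothetical variant: collect every valid decomposition of str into distinct
// numbers from 1..n, the way Subsets / Split String collect all their answers.
void helperAll(int n, const string& str, int index,
               vector<int>& path, vector<int>& visited,
               vector<vector<int>>& results) {
    if (index == (int)str.size()) {
        results.push_back(path);     // record one complete decomposition
        return;                      // keep searching instead of stopping here
    }
    for (int len = 1; len <= 2; ++len) {
        if (index + len > (int)str.size() || str[index] == '0') continue;
        int num = stoi(str.substr(index, len));
        if (num <= 0 || num > n || visited[num]) continue;   // prune
        visited[num] = 1;
        path.push_back(num);
        helperAll(n, str, index + len, path, visited, results);
        path.pop_back();             // backtrack
        visited[num] = 0;
    }
}

This is heavier than the visited[] + first-hit approach above, which is exactly why the submitted solution doesn't bother with it.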

Cut or not to cut, it is a question. In Fruit Ninja, comprising three or more fruit in one cut gains extra bonuses. This kind of cuts are called bonus cuts. Also, performing the bonus cuts in a short time are considered continual, iff. when all the bonus cuts are sorted, the time difference between every adjacent cuts is no more than a given period length of W. As a fruit master, you have predicted the times of potential bonus cuts though the whole game. Now, your task is to determine how to cut the fruits in order to gain the most bonuses, namely, the largest number of continual bonus cuts. Obviously, each fruit is allowed to cut at most once. i.e. After previous cut, a fruit will be regarded as invisible and won't be cut any more. In addition, you must cut all the fruit altogether in one potential cut. i.e. If your potential cut contains 6 fruits, 2 of which have been cut previously, the 4 left fruits have to be cut altogether. There are multiple test cases. The first line contains an integer, the number of test cases. In each test case, there are three integer in the first line: N(N<=30), the number of predicted cuts, M(M<=200), the number of fruits, W(W<=100), the time window. N lines follows. In each line, the first integer Ci(Ci<=10) indicates the number of fruits in the i-th cuts. The second integer Ti(Ti<=2000) indicate the time of this cut. It is guaranteed that every time is unique among all the cuts. Then follow Ci numbers, ranging from 0 to M-1, representing the identifier of each fruit. If two identifiers in different cuts are the same, it means they represent the same fruit. For each test case, the first line contains one integer A, the largest number of continual bonus cuts. In the second line, there are A integers, K1, K2, ..., K_A, ranging from 1 to N, indicating the (Ki)-th cuts are included in the answer. The integers are in ascending order and each separated by one space. If there are multiple best solutions, any one is accepted. 输入样例 1 4 10 4 3 1 1 2 3 4 3 3 4 6 5 3 7 7 8 9 3 5 9 5 4 输出样例 3 1 2 3
最新发布
12-27
# # Copyright 2025 The InfiniFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. # import logging import os import random import re import sys import threading from copy import deepcopy from io import BytesIO from timeit import default_timer as timer import numpy as np import pdfplumber import trio import xgboost as xgb from huggingface_hub import snapshot_download from PIL import Image from pypdf import PdfReader as pdf2_read from api import settings from api.utils.file_utils import get_project_base_directory from deepdoc.vision import OCR, LayoutRecognizer, Recognizer, TableStructureRecognizer from rag.app.picture import vision_llm_chunk as picture_vision_llm_chunk from rag.nlp import rag_tokenizer from rag.prompts import vision_llm_describe_prompt from rag.settings import PARALLEL_DEVICES LOCK_KEY_pdfplumber = "global_shared_lock_pdfplumber" if LOCK_KEY_pdfplumber not in sys.modules: sys.modules[LOCK_KEY_pdfplumber] = threading.Lock() class RAGFlowPdfParser: def __init__(self, **kwargs): """ If you have trouble downloading HuggingFace models, -_^ this might help!! For Linux: export HF_ENDPOINT=https://hf-mirror.com For Windows: Good luck ^_- """ self.ocr = OCR() self.parallel_limiter = None if PARALLEL_DEVICES is not None and PARALLEL_DEVICES > 1: self.parallel_limiter = [trio.CapacityLimiter(1) for _ in range(PARALLEL_DEVICES)] if hasattr(self, "model_speciess"): self.layouter = LayoutRecognizer("layout." 
+ self.model_speciess) else: self.layouter = LayoutRecognizer("layout") self.tbl_det = TableStructureRecognizer() self.updown_cnt_mdl = xgb.Booster() if not settings.LIGHTEN: try: import torch.cuda if torch.cuda.is_available(): self.updown_cnt_mdl.set_param({"device": "cuda"}) except Exception: logging.exception("RAGFlowPdfParser __init__") try: model_dir = os.path.join( get_project_base_directory(), "rag/res/deepdoc") self.updown_cnt_mdl.load_model(os.path.join( model_dir, "updown_concat_xgb.model")) except Exception: model_dir = snapshot_download( repo_id="InfiniFlow/text_concat_xgb_v1.0", local_dir=os.path.join(get_project_base_directory(), "rag/res/deepdoc"), local_dir_use_symlinks=False) self.updown_cnt_mdl.load_model(os.path.join( model_dir, "updown_concat_xgb.model")) self.page_from = 0 def __char_width(self, c): return (c["x1"] - c["x0"]) // max(len(c["text"]), 1) def __height(self, c): return c["bottom"] - c["top"] def _x_dis(self, a, b): return min(abs(a["x1"] - b["x0"]), abs(a["x0"] - b["x1"]), abs(a["x0"] + a["x1"] - b["x0"] - b["x1"]) / 2) def _y_dis( self, a, b): return ( b["top"] + b["bottom"] - a["top"] - a["bottom"]) / 2 def _match_proj(self, b): proj_patt = [ r"第[零一二三四五六七八九十百]+章", r"第[零一二三四五六七八九十百]+[条节]", r"[零一二三四五六七八九十百]+[、是  ]", r"[\((][零一二三四五六七八九十百]+[)\)]", r"[\((][0-9]+[)\)]", r"[0-9]+(、|\.[  ]|)|\.[^0-9./a-zA-Z_%><-]{4,})", r"[0-9]+\.[0-9.]+(、|\.[  ])", r"[⚫•➢①② ]", ] return any([re.match(p, b["text"]) for p in proj_patt]) def _updown_concat_features(self, up, down): w = max(self.__char_width(up), self.__char_width(down)) h = max(self.__height(up), self.__height(down)) y_dis = self._y_dis(up, down) LEN = 6 tks_down = rag_tokenizer.tokenize(down["text"][:LEN]).split() tks_up = rag_tokenizer.tokenize(up["text"][-LEN:]).split() tks_all = up["text"][-LEN:].strip() \ + (" " if re.match(r"[a-zA-Z0-9]+", up["text"][-1] + down["text"][0]) else "") \ + down["text"][:LEN].strip() tks_all = rag_tokenizer.tokenize(tks_all).split() fea = [ up.get("R", -1) == down.get("R", -1), y_dis / h, down["page_number"] - up["page_number"], up["layout_type"] == down["layout_type"], up["layout_type"] == "text", down["layout_type"] == "text", up["layout_type"] == "table", down["layout_type"] == "table", True if re.search( r"([。?!;!?;+))]|[a-z]\.)$", up["text"]) else False, True if re.search(r"[,:‘“、0-9(+-]$", up["text"]) else False, True if re.search( r"(^.?[/,?;:\],。;:’”?!》】)-])", down["text"]) else False, True if re.match(r"[\((][^\(\)()]+[)\)]$", up["text"]) else False, True if re.search(r"[,,][^。.]+$", up["text"]) else False, True if re.search(r"[,,][^。.]+$", up["text"]) else False, True if re.search(r"[\((][^\))]+$", up["text"]) and re.search(r"[\))]", down["text"]) else False, self._match_proj(down), True if re.match(r"[A-Z]", down["text"]) else False, True if re.match(r"[A-Z]", up["text"][-1]) else False, True if re.match(r"[a-z0-9]", up["text"][-1]) else False, True if re.match(r"[0-9.%,-]+$", down["text"]) else False, up["text"].strip()[-2:] == down["text"].strip()[-2:] if len(up["text"].strip() ) > 1 and len( down["text"].strip()) > 1 else False, up["x0"] > down["x1"], abs(self.__height(up) - self.__height(down)) / min(self.__height(up), self.__height(down)), self._x_dis(up, down) / max(w, 0.000001), (len(up["text"]) - len(down["text"])) / max(len(up["text"]), len(down["text"])), len(tks_all) - len(tks_up) - len(tks_down), len(tks_down) - len(tks_up), tks_down[-1] == tks_up[-1] if tks_down and tks_up else False, max(down["in_row"], up["in_row"]), abs(down["in_row"] - up["in_row"]), 
len(tks_down) == 1 and rag_tokenizer.tag(tks_down[0]).find("n") >= 0, len(tks_up) == 1 and rag_tokenizer.tag(tks_up[0]).find("n") >= 0 ] return fea @staticmethod def sort_X_by_page(arr, threashold): # sort using y1 first and then x1 arr = sorted(arr, key=lambda r: (r["page_number"], r["x0"], r["top"])) for i in range(len(arr) - 1): for j in range(i, -1, -1): # restore the order using th if abs(arr[j + 1]["x0"] - arr[j]["x0"]) < threashold \ and arr[j + 1]["top"] < arr[j]["top"] \ and arr[j + 1]["page_number"] == arr[j]["page_number"]: tmp = arr[j] arr[j] = arr[j + 1] arr[j + 1] = tmp return arr def _has_color(self, o): if o.get("ncs", "") == "DeviceGray": if o["stroking_color"] and o["stroking_color"][0] == 1 and o["non_stroking_color"] and \ o["non_stroking_color"][0] == 1: if re.match(r"[a-zT_\[\]\(\)-]+", o.get("text", "")): return False return True def _table_transformer_job(self, ZM): logging.debug("Table processing...") imgs, pos = [], [] tbcnt = [0] MARGIN = 10 self.tb_cpns = [] assert len(self.page_layout) == len(self.page_images) for p, tbls in enumerate(self.page_layout): # for page tbls = [f for f in tbls if f["type"] == "table"] tbcnt.append(len(tbls)) if not tbls: continue for tb in tbls: # for table left, top, right, bott = tb["x0"] - MARGIN, tb["top"] - MARGIN, \ tb["x1"] + MARGIN, tb["bottom"] + MARGIN left *= ZM top *= ZM right *= ZM bott *= ZM pos.append((left, top)) imgs.append(self.page_images[p].crop((left, top, right, bott))) assert len(self.page_images) == len(tbcnt) - 1 if not imgs: return recos = self.tbl_det(imgs) tbcnt = np.cumsum(tbcnt) for i in range(len(tbcnt) - 1): # for page pg = [] for j, tb_items in enumerate( recos[tbcnt[i]: tbcnt[i + 1]]): # for table poss = pos[tbcnt[i]: tbcnt[i + 1]] for it in tb_items: # for table components it["x0"] = (it["x0"] + poss[j][0]) it["x1"] = (it["x1"] + poss[j][0]) it["top"] = (it["top"] + poss[j][1]) it["bottom"] = (it["bottom"] + poss[j][1]) for n in ["x0", "x1", "top", "bottom"]: it[n] /= ZM it["top"] += self.page_cum_height[i] it["bottom"] += self.page_cum_height[i] it["pn"] = i it["layoutno"] = j pg.append(it) self.tb_cpns.extend(pg) def gather(kwd, fzy=10, ption=0.6): eles = Recognizer.sort_Y_firstly( [r for r in self.tb_cpns if re.match(kwd, r["label"])], fzy) eles = Recognizer.layouts_cleanup(self.boxes, eles, 5, ption) return Recognizer.sort_Y_firstly(eles, 0) # add R,H,C,SP tag to boxes within table layout headers = gather(r".*header$") rows = gather(r".* (row|header)") spans = gather(r".*spanning") clmns = sorted([r for r in self.tb_cpns if re.match( r"table column$", r["label"])], key=lambda x: (x["pn"], x["layoutno"], x["x0"])) clmns = Recognizer.layouts_cleanup(self.boxes, clmns, 5, 0.5) for b in self.boxes: if b.get("layout_type", "") != "table": continue ii = Recognizer.find_overlapped_with_threashold(b, rows, thr=0.3) if ii is not None: b["R"] = ii b["R_top"] = rows[ii]["top"] b["R_bott"] = rows[ii]["bottom"] ii = Recognizer.find_overlapped_with_threashold( b, headers, thr=0.3) if ii is not None: b["H_top"] = headers[ii]["top"] b["H_bott"] = headers[ii]["bottom"] b["H_left"] = headers[ii]["x0"] b["H_right"] = headers[ii]["x1"] b["H"] = ii ii = Recognizer.find_horizontally_tightest_fit(b, clmns) if ii is not None: b["C"] = ii b["C_left"] = clmns[ii]["x0"] b["C_right"] = clmns[ii]["x1"] ii = Recognizer.find_overlapped_with_threashold(b, spans, thr=0.3) if ii is not None: b["H_top"] = spans[ii]["top"] b["H_bott"] = spans[ii]["bottom"] b["H_left"] = spans[ii]["x0"] b["H_right"] = spans[ii]["x1"] b["SP"] = ii 
def __ocr(self, pagenum, img, chars, ZM=3, device_id: int | None = None): start = timer() bxs = self.ocr.detect(np.array(img), device_id) logging.info(f"__ocr detecting boxes of a image cost ({timer() - start}s)") start = timer() if not bxs: self.boxes.append([]) return bxs = [(line[0], line[1][0]) for line in bxs] bxs = Recognizer.sort_Y_firstly( [{"x0": b[0][0] / ZM, "x1": b[1][0] / ZM, "top": b[0][1] / ZM, "text": "", "txt": t, "bottom": b[-1][1] / ZM, "page_number": pagenum} for b, t in bxs if b[0][0] <= b[1][0] and b[0][1] <= b[-1][1]], self.mean_height[-1] / 3 ) # merge chars in the same rect for c in Recognizer.sort_Y_firstly( chars, self.mean_height[pagenum - 1] // 4): ii = Recognizer.find_overlapped(c, bxs) if ii is None: self.lefted_chars.append(c) continue ch = c["bottom"] - c["top"] bh = bxs[ii]["bottom"] - bxs[ii]["top"] if abs(ch - bh) / max(ch, bh) >= 0.7 and c["text"] != ' ': self.lefted_chars.append(c) continue if c["text"] == " " and bxs[ii]["text"]: if re.match(r"[0-9a-zA--яА-Я,.?;:!%%]", bxs[ii]["text"][-1]): bxs[ii]["text"] += " " else: bxs[ii]["text"] += c["text"] logging.info(f"__ocr sorting {len(chars)} chars cost {timer() - start}s") start = timer() boxes_to_reg = [] img_np = np.array(img) for b in bxs: if not b["text"]: left, right, top, bott = b["x0"] * ZM, b["x1"] * \ ZM, b["top"] * ZM, b["bottom"] * ZM b["box_image"] = self.ocr.get_rotate_crop_image(img_np, np.array([[left, top], [right, top], [right, bott], [left, bott]], dtype=np.float32)) boxes_to_reg.append(b) del b["txt"] texts = self.ocr.recognize_batch([b["box_image"] for b in boxes_to_reg], device_id) for i in range(len(boxes_to_reg)): boxes_to_reg[i]["text"] = texts[i] del boxes_to_reg[i]["box_image"] logging.info(f"__ocr recognize {len(bxs)} boxes cost {timer() - start}s") bxs = [b for b in bxs if b["text"]] if self.mean_height[-1] == 0: self.mean_height[-1] = np.median([b["bottom"] - b["top"] for b in bxs]) self.boxes.append(bxs) def _layouts_rec(self, ZM, drop=True): assert len(self.page_images) == len(self.boxes) self.boxes, self.page_layout = self.layouter( self.page_images, self.boxes, ZM, drop=drop) # cumlative Y for i in range(len(self.boxes)): self.boxes[i]["top"] += \ self.page_cum_height[self.boxes[i]["page_number"] - 1] self.boxes[i]["bottom"] += \ self.page_cum_height[self.boxes[i]["page_number"] - 1] def _text_merge(self): # merge adjusted boxes bxs = self.boxes def end_with(b, txt): txt = txt.strip() tt = b.get("text", "").strip() return tt and tt.find(txt) == len(tt) - len(txt) def start_with(b, txts): tt = b.get("text", "").strip() return tt and any([tt.find(t.strip()) == 0 for t in txts]) # horizontally merge adjacent box with the same layout i = 0 while i < len(bxs) - 1: b = bxs[i] b_ = bxs[i + 1] if b.get("layoutno", "0") != b_.get("layoutno", "1") or b.get("layout_type", "") in ["table", "figure", "equation"]: i += 1 continue if abs(self._y_dis(b, b_) ) < self.mean_height[bxs[i]["page_number"] - 1] / 3: # merge bxs[i]["x1"] = b_["x1"] bxs[i]["top"] = (b["top"] + b_["top"]) / 2 bxs[i]["bottom"] = (b["bottom"] + b_["bottom"]) / 2 bxs[i]["text"] += b_["text"] bxs.pop(i + 1) continue i += 1 continue dis_thr = 1 dis = b["x1"] - b_["x0"] if b.get("layout_type", "") != "text" or b_.get( "layout_type", "") != "text": if end_with(b, ",") or start_with(b_, "(,"): dis_thr = -8 else: i += 1 continue if abs(self._y_dis(b, b_)) < self.mean_height[bxs[i]["page_number"] - 1] / 5 \ and dis >= dis_thr and b["x1"] < b_["x1"]: # merge bxs[i]["x1"] = b_["x1"] bxs[i]["top"] = (b["top"] + b_["top"]) / 2 
bxs[i]["bottom"] = (b["bottom"] + b_["bottom"]) / 2 bxs[i]["text"] += b_["text"] bxs.pop(i + 1) continue i += 1 self.boxes = bxs def _naive_vertical_merge(self): bxs = Recognizer.sort_Y_firstly( self.boxes, np.median( self.mean_height) / 3) i = 0 while i + 1 < len(bxs): b = bxs[i] b_ = bxs[i + 1] if b["page_number"] < b_["page_number"] and re.match( r"[0-9 •一—-]+$", b["text"]): bxs.pop(i) continue if not b["text"].strip(): bxs.pop(i) continue concatting_feats = [ b["text"].strip()[-1] in ",;:'\",、‘“;:-", len(b["text"].strip()) > 1 and b["text"].strip( )[-2] in ",;:'\",‘“、;:", b_["text"].strip() and b_["text"].strip()[0] in "。;?!?”)),,、:", ] # features for not concating feats = [ b.get("layoutno", 0) != b_.get("layoutno", 0), b["text"].strip()[-1] in "。?!?", self.is_english and b["text"].strip()[-1] in ".!?", b["page_number"] == b_["page_number"] and b_["top"] - b["bottom"] > self.mean_height[b["page_number"] - 1] * 1.5, b["page_number"] < b_["page_number"] and abs( b["x0"] - b_["x0"]) > self.mean_width[b["page_number"] - 1] * 4, ] # split features detach_feats = [b["x1"] < b_["x0"], b["x0"] > b_["x1"]] if (any(feats) and not any(concatting_feats)) or any(detach_feats): logging.debug("{} {} {} {}".format( b["text"], b_["text"], any(feats), any(concatting_feats), )) i += 1 continue # merge up and down b["bottom"] = b_["bottom"] b["text"] += b_["text"] b["x0"] = min(b["x0"], b_["x0"]) b["x1"] = max(b["x1"], b_["x1"]) bxs.pop(i + 1) self.boxes = bxs def _concat_downward(self, concat_between_pages=True): # count boxes in the same row as a feature for i in range(len(self.boxes)): mh = self.mean_height[self.boxes[i]["page_number"] - 1] self.boxes[i]["in_row"] = 0 j = max(0, i - 12) while j < min(i + 12, len(self.boxes)): if j == i: j += 1 continue ydis = self._y_dis(self.boxes[i], self.boxes[j]) / mh if abs(ydis) < 1: self.boxes[i]["in_row"] += 1 elif ydis > 0: break j += 1 # concat between rows boxes = deepcopy(self.boxes) blocks = [] while boxes: chunks = [] def dfs(up, dp): chunks.append(up) i = dp while i < min(dp + 12, len(boxes)): ydis = self._y_dis(up, boxes[i]) smpg = up["page_number"] == boxes[i]["page_number"] mh = self.mean_height[up["page_number"] - 1] mw = self.mean_width[up["page_number"] - 1] if smpg and ydis > mh * 4: break if not smpg and ydis > mh * 16: break down = boxes[i] if not concat_between_pages and down["page_number"] > up["page_number"]: break if up.get("R", "") != down.get( "R", "") and up["text"][-1] != ",": i += 1 continue if re.match(r"[0-9]{2,3}/[0-9]{3}$", up["text"]) \ or re.match(r"[0-9]{2,3}/[0-9]{3}$", down["text"]) \ or not down["text"].strip(): i += 1 continue if not down["text"].strip() or not up["text"].strip(): i += 1 continue if up["x1"] < down["x0"] - 10 * \ mw or up["x0"] > down["x1"] + 10 * mw: i += 1 continue if i - dp < 5 and up.get("layout_type") == "text": if up.get("layoutno", "1") == down.get( "layoutno", "2"): dfs(down, i + 1) boxes.pop(i) return i += 1 continue fea = self._updown_concat_features(up, down) if self.updown_cnt_mdl.predict( xgb.DMatrix([fea]))[0] <= 0.5: i += 1 continue dfs(down, i + 1) boxes.pop(i) return dfs(boxes[0], 1) boxes.pop(0) if chunks: blocks.append(chunks) # concat within each block boxes = [] for b in blocks: if len(b) == 1: boxes.append(b[0]) continue t = b[0] for c in b[1:]: t["text"] = t["text"].strip() c["text"] = c["text"].strip() if not c["text"]: continue if t["text"] and re.match( r"[0-9\.a-zA-Z]+$", t["text"][-1] + c["text"][-1]): t["text"] += " " t["text"] += c["text"] t["x0"] = min(t["x0"], c["x0"]) t["x1"] 
= max(t["x1"], c["x1"]) t["page_number"] = min(t["page_number"], c["page_number"]) t["bottom"] = c["bottom"] if not t["layout_type"] \ and c["layout_type"]: t["layout_type"] = c["layout_type"] boxes.append(t) self.boxes = Recognizer.sort_Y_firstly(boxes, 0) def _filter_forpages(self): if not self.boxes: return findit = False i = 0 while i < len(self.boxes): if not re.match(r"(contents|目录|目次|table of contents|致谢|acknowledge)$", re.sub(r"( | |\u3000)+", "", self.boxes[i]["text"].lower())): i += 1 continue findit = True eng = re.match( r"[0-9a-zA-Z :'.-]{5,}", self.boxes[i]["text"].strip()) self.boxes.pop(i) if i >= len(self.boxes): break prefix = self.boxes[i]["text"].strip()[:3] if not eng else " ".join( self.boxes[i]["text"].strip().split()[:2]) while not prefix: self.boxes.pop(i) if i >= len(self.boxes): break prefix = self.boxes[i]["text"].strip()[:3] if not eng else " ".join( self.boxes[i]["text"].strip().split()[:2]) self.boxes.pop(i) if i >= len(self.boxes) or not prefix: break for j in range(i, min(i + 128, len(self.boxes))): if not re.match(prefix, self.boxes[j]["text"]): continue for k in range(i, j): self.boxes.pop(i) break if findit: return page_dirty = [0] * len(self.page_images) for b in self.boxes: if re.search(r"(··|··|··)", b["text"]): page_dirty[b["page_number"] - 1] += 1 page_dirty = set([i + 1 for i, t in enumerate(page_dirty) if t > 3]) if not page_dirty: return i = 0 while i < len(self.boxes): if self.boxes[i]["page_number"] in page_dirty: self.boxes.pop(i) continue i += 1 def _merge_with_same_bullet(self): i = 0 while i + 1 < len(self.boxes): b = self.boxes[i] b_ = self.boxes[i + 1] if not b["text"].strip(): self.boxes.pop(i) continue if not b_["text"].strip(): self.boxes.pop(i + 1) continue if b["text"].strip()[0] != b_["text"].strip()[0] \ or b["text"].strip()[0].lower() in set("qwertyuopasdfghjklzxcvbnm") \ or rag_tokenizer.is_chinese(b["text"].strip()[0]) \ or b["top"] > b_["bottom"]: i += 1 continue b_["text"] = b["text"] + "\n" + b_["text"] b_["x0"] = min(b["x0"], b_["x0"]) b_["x1"] = max(b["x1"], b_["x1"]) b_["top"] = b["top"] self.boxes.pop(i) def _extract_table_figure(self, need_image, ZM, return_html, need_position, separate_tables_figures=False): tables = {} figures = {} # extract figure and table boxes i = 0 lst_lout_no = "" nomerge_lout_no = [] while i < len(self.boxes): if "layoutno" not in self.boxes[i]: i += 1 continue lout_no = str(self.boxes[i]["page_number"]) + \ "-" + str(self.boxes[i]["layoutno"]) if TableStructureRecognizer.is_caption(self.boxes[i]) or self.boxes[i]["layout_type"] in ["table caption", "title", "figure caption", "reference"]: nomerge_lout_no.append(lst_lout_no) if self.boxes[i]["layout_type"] == "table": if re.match(r"(数据|资料|图表)*来源[:: ]", self.boxes[i]["text"]): self.boxes.pop(i) continue if lout_no not in tables: tables[lout_no] = [] tables[lout_no].append(self.boxes[i]) self.boxes.pop(i) lst_lout_no = lout_no continue if need_image and self.boxes[i]["layout_type"] == "figure": if re.match(r"(数据|资料|图表)*来源[:: ]", self.boxes[i]["text"]): self.boxes.pop(i) continue if lout_no not in figures: figures[lout_no] = [] figures[lout_no].append(self.boxes[i]) self.boxes.pop(i) lst_lout_no = lout_no continue i += 1 # merge table on different pages nomerge_lout_no = set(nomerge_lout_no) tbls = sorted([(k, bxs) for k, bxs in tables.items()], key=lambda x: (x[1][0]["top"], x[1][0]["x0"])) i = len(tbls) - 1 while i - 1 >= 0: k0, bxs0 = tbls[i - 1] k, bxs = tbls[i] i -= 1 if k0 in nomerge_lout_no: continue if bxs[0]["page_number"] == 
bxs0[0]["page_number"]: continue if bxs[0]["page_number"] - bxs0[0]["page_number"] > 1: continue mh = self.mean_height[bxs[0]["page_number"] - 1] if self._y_dis(bxs0[-1], bxs[0]) > mh * 23: continue tables[k0].extend(tables[k]) del tables[k] def x_overlapped(a, b): return not any([a["x1"] < b["x0"], a["x0"] > b["x1"]]) # find captions and pop out i = 0 while i < len(self.boxes): c = self.boxes[i] # mh = self.mean_height[c["page_number"]-1] if not TableStructureRecognizer.is_caption(c): i += 1 continue # find the nearest layouts def nearest(tbls): nonlocal c mink = "" minv = 1000000000 for k, bxs in tbls.items(): for b in bxs: if b.get("layout_type", "").find("caption") >= 0: continue y_dis = self._y_dis(c, b) x_dis = self._x_dis( c, b) if not x_overlapped( c, b) else 0 dis = y_dis * y_dis + x_dis * x_dis if dis < minv: mink = k minv = dis return mink, minv tk, tv = nearest(tables) fk, fv = nearest(figures) # if min(tv, fv) > 2000: # i += 1 # continue if tv < fv and tk: tables[tk].insert(0, c) logging.debug( "TABLE:" + self.boxes[i]["text"] + "; Cap: " + tk) elif fk: figures[fk].insert(0, c) logging.debug( "FIGURE:" + self.boxes[i]["text"] + "; Cap: " + tk) self.boxes.pop(i) def cropout(bxs, ltype, poss): nonlocal ZM pn = set([b["page_number"] - 1 for b in bxs]) if len(pn) < 2: pn = list(pn)[0] ht = self.page_cum_height[pn] b = { "x0": np.min([b["x0"] for b in bxs]), "top": np.min([b["top"] for b in bxs]) - ht, "x1": np.max([b["x1"] for b in bxs]), "bottom": np.max([b["bottom"] for b in bxs]) - ht } louts = [layout for layout in self.page_layout[pn] if layout["type"] == ltype] ii = Recognizer.find_overlapped(b, louts, naive=True) if ii is not None: b = louts[ii] else: logging.warning( f"Missing layout match: {pn + 1},%s" % (bxs[0].get( "layoutno", ""))) left, top, right, bott = b["x0"], b["top"], b["x1"], b["bottom"] if right < left: right = left + 1 poss.append((pn + self.page_from, left, right, top, bott)) return self.page_images[pn] \ .crop((left * ZM, top * ZM, right * ZM, bott * ZM)) pn = {} for b in bxs: p = b["page_number"] - 1 if p not in pn: pn[p] = [] pn[p].append(b) pn = sorted(pn.items(), key=lambda x: x[0]) imgs = [cropout(arr, ltype, poss) for p, arr in pn] pic = Image.new("RGB", (int(np.max([i.size[0] for i in imgs])), int(np.sum([m.size[1] for m in imgs]))), (245, 245, 245)) height = 0 for img in imgs: pic.paste(img, (0, int(height))) height += img.size[1] return pic res = [] positions = [] figure_results = [] figure_positions = [] # crop figure out and add caption for k, bxs in figures.items(): txt = "\n".join([b["text"] for b in bxs]) if not txt: continue poss = [] if separate_tables_figures: figure_results.append( (cropout( bxs, "figure", poss), [txt])) figure_positions.append(poss) else: res.append( (cropout( bxs, "figure", poss), [txt])) positions.append(poss) for k, bxs in tables.items(): if not bxs: continue bxs = Recognizer.sort_Y_firstly(bxs, np.mean( [(b["bottom"] - b["top"]) / 2 for b in bxs])) poss = [] res.append((cropout(bxs, "table", poss), self.tbl_det.construct_table(bxs, html=return_html, is_english=self.is_english))) positions.append(poss) if separate_tables_figures: assert len(positions) + len(figure_positions) == len(res) + len(figure_results) if need_position: return list(zip(res, positions)), list(zip(figure_results, figure_positions)) else: return res, figure_results else: assert len(positions) == len(res) if need_position: return list(zip(res, positions)) else: return res def proj_match(self, line): if len(line) <= 2: return if re.match(r"[0-9 
().,%%+/-]+$", line): return False for p, j in [ (r"第[零一二三四五六七八九十百]+章", 1), (r"第[零一二三四五六七八九十百]+[条节]", 2), (r"[零一二三四五六七八九十百]+[、  ]", 3), (r"[\((][零一二三四五六七八九十百]+[)\)]", 4), (r"[0-9]+(、|\.[  ]|\.[^0-9])", 5), (r"[0-9]+\.[0-9]+(、|[.  ]|[^0-9])", 6), (r"[0-9]+\.[0-9]+\.[0-9]+(、|[  ]|[^0-9])", 7), (r"[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+(、|[  ]|[^0-9])", 8), (r".{,48}[::??]$", 9), (r"[0-9]+)", 10), (r"[\((][0-9]+[)\)]", 11), (r"[零一二三四五六七八九十百]+是", 12), (r"[⚫•➢✓]", 12) ]: if re.match(p, line): return j return def _line_tag(self, bx, ZM): pn = [bx["page_number"]] top = bx["top"] - self.page_cum_height[pn[0] - 1] bott = bx["bottom"] - self.page_cum_height[pn[0] - 1] page_images_cnt = len(self.page_images) if pn[-1] - 1 >= page_images_cnt: return "" while bott * ZM > self.page_images[pn[-1] - 1].size[1]: bott -= self.page_images[pn[-1] - 1].size[1] / ZM pn.append(pn[-1] + 1) if pn[-1] - 1 >= page_images_cnt: return "" return "@@{}\t{:.1f}\t{:.1f}\t{:.1f}\t{:.1f}##" \ .format("-".join([str(p) for p in pn]), bx["x0"], bx["x1"], top, bott) def __filterout_scraps(self, boxes, ZM): def width(b): return b["x1"] - b["x0"] def height(b): return b["bottom"] - b["top"] def usefull(b): if b.get("layout_type"): return True if width( b) > self.page_images[b["page_number"] - 1].size[0] / ZM / 3: return True if b["bottom"] - b["top"] > self.mean_height[b["page_number"] - 1]: return True return False res = [] while boxes: lines = [] widths = [] pw = self.page_images[boxes[0]["page_number"] - 1].size[0] / ZM mh = self.mean_height[boxes[0]["page_number"] - 1] mj = self.proj_match( boxes[0]["text"]) or boxes[0].get( "layout_type", "") == "title" def dfs(line, st): nonlocal mh, pw, lines, widths lines.append(line) widths.append(width(line)) mmj = self.proj_match( line["text"]) or line.get( "layout_type", "") == "title" for i in range(st + 1, min(st + 20, len(boxes))): if (boxes[i]["page_number"] - line["page_number"]) > 0: break if not mmj and self._y_dis( line, boxes[i]) >= 3 * mh and height(line) < 1.5 * mh: break if not usefull(boxes[i]): continue if mmj or \ (self._x_dis(boxes[i], line) < pw / 10): \ # and abs(width(boxes[i])-width_mean)/max(width(boxes[i]),width_mean)<0.5): # concat following dfs(boxes[i], i) boxes.pop(i) break try: if usefull(boxes[0]): dfs(boxes[0], 0) else: logging.debug("WASTE: " + boxes[0]["text"]) except Exception: pass boxes.pop(0) mw = np.mean(widths) if mj or mw / pw >= 0.35 or mw > 200: res.append( "\n".join([c["text"] + self._line_tag(c, ZM) for c in lines])) else: logging.debug("REMOVED: " + "<<".join([c["text"] for c in lines])) return "\n\n".join(res) @staticmethod def total_page_number(fnm, binary=None): try: with sys.modules[LOCK_KEY_pdfplumber]: pdf = pdfplumber.open( fnm) if not binary else pdfplumber.open(BytesIO(binary)) total_page = len(pdf.pages) pdf.close() return total_page except Exception: logging.exception("total_page_number") def __images__(self, fnm, zoomin=3, page_from=0, page_to=299, callback=None): self.lefted_chars = [] self.mean_height = [] self.mean_width = [] self.boxes = [] self.garbages = {} self.page_cum_height = [0] self.page_layout = [] self.page_from = page_from start = timer() try: with sys.modules[LOCK_KEY_pdfplumber]: with (pdfplumber.open(fnm) if isinstance(fnm, str) else pdfplumber.open(BytesIO(fnm))) as pdf: self.pdf = pdf self.page_images = [p.to_image(resolution=72 * zoomin).annotated for i, p in enumerate(self.pdf.pages[page_from:page_to])] try: self.page_chars = [[c for c in page.dedupe_chars().chars if self._has_color(c)] for page in 
self.pdf.pages[page_from:page_to]] except Exception as e: logging.warning(f"Failed to extract characters for pages {page_from}-{page_to}: {str(e)}") self.page_chars = [[] for _ in range(page_to - page_from)] # If failed to extract, using empty list instead. self.total_page = len(self.pdf.pages) except Exception: logging.exception("RAGFlowPdfParser __images__") logging.info(f"__images__ dedupe_chars cost {timer() - start}s") self.outlines = [] try: with (pdf2_read(fnm if isinstance(fnm, str) else BytesIO(fnm))) as pdf: self.pdf = pdf outlines = self.pdf.outline def dfs(arr, depth): for a in arr: if isinstance(a, dict): self.outlines.append((a["/Title"], depth)) continue dfs(a, depth + 1) dfs(outlines, 0) except Exception as e: logging.warning(f"Outlines exception: {e}") if not self.outlines: logging.warning("Miss outlines") logging.debug("Images converted.") self.is_english = [re.search(r"[a-zA-Z0-9,/¸;:'\[\]\(\)!@#$%^&*\"?<>._-]{30,}", "".join( random.choices([c["text"] for c in self.page_chars[i]], k=min(100, len(self.page_chars[i]))))) for i in range(len(self.page_chars))] if sum([1 if e else 0 for e in self.is_english]) > len( self.page_images) / 2: self.is_english = True else: self.is_english = False async def __img_ocr(i, id, img, chars, limiter): j = 0 while j + 1 < len(chars): if chars[j]["text"] and chars[j + 1]["text"] \ and re.match(r"[0-9a-zA-Z,.:;!%]+", chars[j]["text"] + chars[j + 1]["text"]) \ and chars[j + 1]["x0"] - chars[j]["x1"] >= min(chars[j + 1]["width"], chars[j]["width"]) / 2: chars[j]["text"] += " " j += 1 if limiter: async with limiter: await trio.to_thread.run_sync(lambda: self.__ocr(i + 1, img, chars, zoomin, id)) else: self.__ocr(i + 1, img, chars, zoomin, id) if callback and i % 6 == 5: callback(prog=(i + 1) * 0.6 / len(self.page_images), msg="") async def __img_ocr_launcher(): def __ocr_preprocess(): chars = self.page_chars[i] if not self.is_english else [] self.mean_height.append( np.median(sorted([c["height"] for c in chars])) if chars else 0 ) self.mean_width.append( np.median(sorted([c["width"] for c in chars])) if chars else 8 ) self.page_cum_height.append(img.size[1] / zoomin) return chars if self.parallel_limiter: async with trio.open_nursery() as nursery: for i, img in enumerate(self.page_images): chars = __ocr_preprocess() nursery.start_soon(__img_ocr, i, i % PARALLEL_DEVICES, img, chars, self.parallel_limiter[i % PARALLEL_DEVICES]) await trio.sleep(0.1) else: for i, img in enumerate(self.page_images): chars = __ocr_preprocess() await __img_ocr(i, 0, img, chars, None) start = timer() trio.run(__img_ocr_launcher) logging.info(f"__images__ {len(self.page_images)} pages cost {timer() - start}s") if not self.is_english and not any( [c for c in self.page_chars]) and self.boxes: bxes = [b for bxs in self.boxes for b in bxs] self.is_english = re.search(r"[\na-zA-Z0-9,/¸;:'\[\]\(\)!@#$%^&*\"?<>._-]{30,}", "".join([b["text"] for b in random.choices(bxes, k=min(30, len(bxes)))])) logging.debug("Is it English:", self.is_english) self.page_cum_height = np.cumsum(self.page_cum_height) assert len(self.page_cum_height) == len(self.page_images) + 1 if len(self.boxes) == 0 and zoomin < 9: self.__images__(fnm, zoomin * 3, page_from, page_to, callback) def __call__(self, fnm, need_image=True, zoomin=3, return_html=False): self.__images__(fnm, zoomin) self._layouts_rec(zoomin) self._table_transformer_job(zoomin) self._text_merge() self._concat_downward() self._filter_forpages() tbls = self._extract_table_figure( need_image, zoomin, return_html, False) return 
self.__filterout_scraps(deepcopy(self.boxes), zoomin), tbls def remove_tag(self, txt): return re.sub(r"@@[\t0-9.-]+?##", "", txt) def crop(self, text, ZM=3, need_position=False): imgs = [] poss = [] for tag in re.findall(r"@@[0-9-]+\t[0-9.\t]+##", text): pn, left, right, top, bottom = tag.strip( "#").strip("@").split("\t") left, right, top, bottom = float(left), float( right), float(top), float(bottom) poss.append(([int(p) - 1 for p in pn.split("-")], left, right, top, bottom)) if not poss: if need_position: return None, None return max_width = max( np.max([right - left for (_, left, right, _, _) in poss]), 6) GAP = 6 pos = poss[0] poss.insert(0, ([pos[0][0]], pos[1], pos[2], max( 0, pos[3] - 120), max(pos[3] - GAP, 0))) pos = poss[-1] poss.append(([pos[0][-1]], pos[1], pos[2], min(self.page_images[pos[0][-1]].size[1] / ZM, pos[4] + GAP), min(self.page_images[pos[0][-1]].size[1] / ZM, pos[4] + 120))) positions = [] for ii, (pns, left, right, top, bottom) in enumerate(poss): right = left + max_width bottom *= ZM for pn in pns[1:]: bottom += self.page_images[pn - 1].size[1] imgs.append( self.page_images[pns[0]].crop((left * ZM, top * ZM, right * ZM, min( bottom, self.page_images[pns[0]].size[1]) )) ) if 0 < ii < len(poss) - 1: positions.append((pns[0] + self.page_from, left, right, top, min( bottom, self.page_images[pns[0]].size[1]) / ZM)) bottom -= self.page_images[pns[0]].size[1] for pn in pns[1:]: imgs.append( self.page_images[pn].crop((left * ZM, 0, right * ZM, min(bottom, self.page_images[pn].size[1]) )) ) if 0 < ii < len(poss) - 1: positions.append((pn + self.page_from, left, right, 0, min( bottom, self.page_images[pn].size[1]) / ZM)) bottom -= self.page_images[pn].size[1] if not imgs: if need_position: return None, None return height = 0 for img in imgs: height += img.size[1] + GAP height = int(height) width = int(np.max([i.size[0] for i in imgs])) pic = Image.new("RGB", (width, height), (245, 245, 245)) height = 0 for ii, img in enumerate(imgs): if ii == 0 or ii + 1 == len(imgs): img = img.convert('RGBA') overlay = Image.new('RGBA', img.size, (0, 0, 0, 0)) overlay.putalpha(128) img = Image.alpha_composite(img, overlay).convert("RGB") pic.paste(img, (0, int(height))) height += img.size[1] + GAP if need_position: return pic, positions return pic def get_position(self, bx, ZM): poss = [] pn = bx["page_number"] top = bx["top"] - self.page_cum_height[pn - 1] bott = bx["bottom"] - self.page_cum_height[pn - 1] poss.append((pn, bx["x0"], bx["x1"], top, min( bott, self.page_images[pn - 1].size[1] / ZM))) while bott * ZM > self.page_images[pn - 1].size[1]: bott -= self.page_images[pn - 1].size[1] / ZM top = 0 pn += 1 poss.append((pn, bx["x0"], bx["x1"], top, min( bott, self.page_images[pn - 1].size[1] / ZM))) return poss class PlainParser: def __call__(self, filename, from_page=0, to_page=100000, **kwargs): self.outlines = [] lines = [] try: self.pdf = pdf2_read( filename if isinstance( filename, str) else BytesIO(filename)) for page in self.pdf.pages[from_page:to_page]: lines.extend([t for t in page.extract_text().split("\n")]) outlines = self.pdf.outline def dfs(arr, depth): for a in arr: if isinstance(a, dict): self.outlines.append((a["/Title"], depth)) continue dfs(a, depth + 1) dfs(outlines, 0) except Exception: logging.exception("Outlines exception") if not self.outlines: logging.warning("Miss outlines") return [(line, "") for line in lines], [] def crop(self, ck, need_position): raise NotImplementedError @staticmethod def remove_tag(txt): raise NotImplementedError class 
VisionParser(RAGFlowPdfParser): def __init__(self, vision_model, *args, **kwargs): super().__init__(*args, **kwargs) self.vision_model = vision_model def __images__(self, fnm, zoomin=3, page_from=0, page_to=299, callback=None): try: with sys.modules[LOCK_KEY_pdfplumber]: self.pdf = pdfplumber.open(fnm) if isinstance( fnm, str) else pdfplumber.open(BytesIO(fnm)) self.page_images = [p.to_image(resolution=72 * zoomin).annotated for i, p in enumerate(self.pdf.pages[page_from:page_to])] self.total_page = len(self.pdf.pages) except Exception: self.page_images = None self.total_page = 0 logging.exception("VisionParser __images__") def __call__(self, filename, from_page=0, to_page=100000, **kwargs): callback = kwargs.get("callback", lambda prog, msg: None) self.__images__(fnm=filename, zoomin=3, page_from=from_page, page_to=to_page, **kwargs) total_pdf_pages = self.total_page start_page = max(0, from_page) end_page = min(to_page, total_pdf_pages) all_docs = [] for idx, img_binary in enumerate(self.page_images or []): pdf_page_num = idx # 0-based if pdf_page_num < start_page or pdf_page_num >= end_page: continue docs = picture_vision_llm_chunk( binary=img_binary, vision_model=self.vision_model, prompt=vision_llm_describe_prompt(page=pdf_page_num+1), callback=callback, ) if docs: all_docs.append(docs) return [(doc, "") for doc in all_docs], [] if __name__ == "__main__": pass 改为import fitz # PyMuPDF def extract_vector_images(pdf_path): doc = fitz.open(pdf_path) image_data = [] for page_num in range(len(doc)): page = doc.load_page(page_num) for img_index, img in enumerate(page.get_images(full=True)): xref = img[0] base_image = doc.extract_image(xref) if base_image["ext"] == "svg": # 矢量图识别 svg_data = base_image["image"] image_data.append({ "page": page_num, "type": "vector", "data": svg_data }) return image_data 这个方式
09-29
开发者提供了一个脚本generate_ptm_coordinates.py,请问我要怎么运行? import numpy as np import pandas as pd from ExonPTMapper import mapping from ExonPTMapper import config as mapper_config from ptm_pose import pose_config from Bio import SeqIO import codecs import gzip import os import datetime import pyliftover from tqdm import tqdm def check_constitutive(ptm, nonconstitutive_list): """ For a given list of ptms, check if any of the ptms are found in the list of nonconstitutive ptms, meaning that they have previously been found to be missing from isoforms in Ensembl Parameters ---------- ptm : list List of PTMs to check (each ptm should be in the form of "UniProtID_ResiduePosition" (e.g. "P12345-1_Y100")) nonconstitutive_list : list List of PTMs that have been found to be nonconstitutive (based on data from ptm_info object generated by ExonPTMapper) """ if ptm in nonconstitutive_list: return False else: return True def extract_ptm_position_of_primary_isoform(row): """ For a given row in the ptm_coordinates dataframe, extract the position of the PTM in the primary isoform (canonical if available, else first isoform in list) """ pass def get_unique_ptm_info(ptm_coordinates): """ For a given row in the ptm_coordinates dataframe, isolate PTM entries corresponding to the canonical isoform ID, if available. If not, use the first isoform ID in the list. Further, indicate whether the entry corresponds to a canonical or alternative isoform. Parameters ---------- ptm_coordinates : pd.DataFrame DataFrame containing PTM coordinates, generated by ExonPTMapper package Returns ------- pd.DataFrame DataFrame containing PTM coordinates with additional columns indicating the UniProtKB accession number, the isoform ID, the type of isoform (canonical or alternative), the residue of the PTM, the position of the PTM in the isoform, and any alternative entries that also contain the PTM """ accession_list = [] #list of UniProtKB/Swiss-Prot accessions residue_list = [] position_list = [] isoform_list = [] isoform_type = [] for i,row in ptm_coordinates.iterrows(): ptm_entries = row['Source of PTM'].split(';') residue = ptm_entries[0].split('_')[1][0] #iterate through each PTM entry and extract any that are associated with a canonical isoform. This will usually only be one, but in rare cases a PTM may be associated with multiple genes found_in_canonical = False positions = [] accessions = [] isoform_entries = [] for ptm in ptm_entries: if ptm.split('_')[0] in mapper_config.canonical_isoIDs.values(): #check if the uniprot isoform ID is a canonical isoform. if so, add the position to the list of positions positions.append(ptm.split('_')[1][1:]) accessions.append(ptm.split('-')[0]) #uniprot accession number isoform_entries.append(ptm.split('_')[0]) #uniprot isoform ID found_in_canonical = True #indicate that there is a PTM found in the canonical isoform #check if position in canonical was found. If so, join the positions into a single string. 
If not, use the position associated with the first listed isoform if found_in_canonical: positions = ';'.join(positions) accessions = ';'.join(accessions) isoform_entries = ';'.join(isoform_entries) isoform_type.append('Canonical') else: positions = ptm_entries[0].split('_')[1][1:] accessions = ptm_entries[0].split('-')[0] isoform_entries = ptm_entries[0].split('_')[0] isoform_type.append('Alternative') accession_list.append(accessions) residue_list.append(residue) position_list.append(positions) isoform_list.append(isoform_entries) ptm_coordinates['UniProtKB Accession'] = accession_list ptm_coordinates['Isoform ID'] = isoform_list ptm_coordinates['Isoform Type'] = isoform_type ptm_coordinates['Residue'] = residue_list ptm_coordinates['PTM Position in Isoform'] = position_list return ptm_coordinates def convert_coordinates(ptm_coordinates, from_coord = 'hg38', to_coord = 'hg19'): """ Given the ptm_coordinates dataframe, convert the genomic location of the PTMs from one coordinate system to another (e.g. hg38 to hg19, hg19 to hg38, etc.) Parameters ---------- ptm_coordinates : pd.DataFrame DataFrame containing PTM coordinates from_coord : str Coordinate system to convert from (e.g. 'hg38', 'hg19', 'hg18') to_coord : str Coordinate system to convert to (e.g. 'hg38', 'hg19', 'hg18') """ # convert coordinates to hg19 and hg38 new_coords = [] liftover_object = pyliftover.LiftOver(f'{from_coord}',f'{to_coord}') for i, row in tqdm(ptm_coordinates.iterrows(), total = ptm_coordinates.shape[0], desc = f'Converting from {from_coord} to {to_coord} coordinates'): new_coords.append(mapping.convert_genomic_coordinates(row[f'Gene Location ({from_coord})'], row['Chromosome/scaffold name'], row['Strand'], from_type = f'{from_coord}', to_type = f'{to_coord}', liftover_object = liftover_object)) return new_coords residue_dict = {'R': 'arginine', 'H':'histidine', 'K':'lysine', 'D':'aspartic acid', 'E':'glutamic acid', 'S': 'serine', 'T':'threonine', 'N':'asparagine', 'Q':'glutamine', 'C':'cysteine', 'U':'selenocystein', 'G':'glycine', 'P':'proline', 'A':'alanine', 'V':'valine', 'I':'isoleucine', 'L':'leucine', 'M':'methionine', 'F':'phenylalanine', 'Y':'tyrosine', 'W':'tryptophan'} def convert_modification_shorthand(mod, res, mod_group_type = 'fine'): if mod_group_type == 'fine': if mod == 'p': mod_name = f'Phospho{residue_dict[res]}' elif mod == 'ub': mod_name = 'Ubiquitination' elif mod == 'sm': mod_name = 'Sumoylation' elif mod == 'ga': if res == 'N': mod_name = 'N-Glycosylation' else: #if you want a more general classification, this and T can be either just O-GalNAc or O-glycosylation mod_name = 'O-GalNAc ' + residue_dict[res].capitalize() elif mod == 'gl': mod_name = 'O-GlcNAc ' + residue_dict[res].capitalize() elif mod == 'm1' or mod == 'me': mod_name = 'Methylation' elif mod == 'm2': mod_name = 'Dimethylation' elif mod == 'm3': mod_name = 'Trimethylation' elif mod == 'ac': mod_name = 'Acetylation' else: raise ValueError("ERROR: don't recognize PTM type %s"%(type)) elif mod_group_type == 'coarse': if mod == 'p': mod_name = 'Phosphorylation' elif mod == 'ub': mod_name = 'Ubiquitination' elif mod == 'sm': mod_name = 'Sumoylation' elif mod == 'ga' or mod == 'gl': mod_name = 'Glycosylation' elif mod == 'm1' or mod == 'me' or mod == 'm2' or mod == 'm3': mod_name = 'Methylation' elif mod == 'ac': mod_name = 'Acetylation' return mod_name def process_PSP_df(file, compressed = False, organism = None, include_flank = False, include_domain = False, include_MS = False, include_LTP = False, mod_group_type = 'fine'): 
""" Process the PhosphositePlus file into a dataframe with the Uniprot ID, residue, position, and modification, and any extra columns that are requested Parameters ---------- file : str The file to read in compressed : bool If the file is compressed or not organism : str The organism to filter the data by include_flank : bool If True, include the flanking amino acids include_domain : bool If True, include the domain include_MS : bool If True, include the MS_LIT and MS_CST columns include_LTP : bool If True, include the LT column (low throughput) mod_group_type : str The type of modification grouping to use (fine or coarse) """ if compressed: df = pd.read_csv(file, sep='\t', skiprows=3, compression='gzip') else: df = pd.read_csv(file, sep='\t', skiprows=3) df['Residue'] = df['MOD_RSD'].apply(lambda x: x.split('-')[0][0]) df['Position'] = df['MOD_RSD'].apply(lambda x: int(x.split('-')[0][1:])) df['Modification'] = df['MOD_RSD'].str.split('-').str[1] df['Modification'] = df.apply(lambda x: convert_modification_shorthand(x['Modification'], x['Residue'], mod_group_type=mod_group_type), axis=1) if organism is not None: df = df[df['ORGANISM'] == organism] extra_cols = [] if organism is not None: df = df[df['ORGANISM'] == organism] else: extra_cols += ['ORGANISM'] if include_flank: extra_cols += ['SITE_+/-7_AA'] if include_domain: extra_cols += ['DOMAIN'] if include_MS: extra_cols += ['MS_LIT'] extra_cols += ['MS_CST'] if include_LTP: extra_cols += ['LT_LIT'] df = df[['ACC_ID', 'Residue', 'Position', 'Modification'] + extra_cols] return df def combine_PSP_dfs(phosphositeplus_dir, compressed = True, organism = None, include_flank = False, include_domain = False, include_MS = False, include_LTP = False, mod_group_type = 'fine'): extension = '.gz' if compressed else '' phospho = process_PSP_df(phosphositeplus_dir+'Phosphorylation_site_dataset'+extension, compressed = compressed, organism = organism, include_flank = include_flank, include_domain = include_domain, include_MS = include_MS,include_LTP=include_LTP, mod_group_type=mod_group_type) ubiq = process_PSP_df(phosphositeplus_dir+'Ubiquitination_site_dataset'+extension, compressed = compressed, organism = organism, include_flank = include_flank, include_domain = include_domain, include_MS = include_MS, include_LTP=include_LTP, mod_group_type=mod_group_type) sumo = process_PSP_df(phosphositeplus_dir+'Sumoylation_site_dataset'+extension, compressed = compressed, organism = organism, include_flank = include_flank, include_domain = include_domain, include_MS = include_MS, include_LTP=include_LTP, mod_group_type=mod_group_type) galnac = process_PSP_df(phosphositeplus_dir+'O-GalNAc_site_dataset'+extension, compressed = compressed, organism = organism, include_flank = include_flank, include_domain = include_domain, include_MS = include_MS, include_LTP=include_LTP, mod_group_type=mod_group_type) glcnac = process_PSP_df(phosphositeplus_dir+'O-GlcNAc_site_dataset'+extension, compressed = compressed, organism = organism, include_flank = include_flank, include_domain = include_domain, include_MS = include_MS, include_LTP=include_LTP, mod_group_type=mod_group_type) meth = process_PSP_df(phosphositeplus_dir+'Methylation_site_dataset'+extension, compressed = compressed, organism = organism, include_flank = include_flank, include_domain = include_domain, include_MS = include_MS, include_LTP=include_LTP, mod_group_type=mod_group_type) acetyl = process_PSP_df(phosphositeplus_dir+'Acetylation_site_dataset'+extension, compressed = compressed, organism = organism, 
include_flank = include_flank, include_domain = include_domain, include_MS = include_MS, include_LTP=include_LTP, mod_group_type=mod_group_type) df = pd.concat([phospho, ubiq, sumo, galnac, glcnac, meth, acetyl]) return df def extract_num_studies(psp_dir, pscout_data = None): psp_df = combine_PSP_dfs(psp_dir, compressed = True, organism = 'human', include_MS = True, include_LTP = True, mod_group_type = 'coarse') #get canonical isoform IDs canonical_ids = mapper_config.translator[mapper_config.translator['UniProt Isoform Type'] == 'Canonical'][['UniProtKB/Swiss-Prot ID', 'UniProtKB isoform ID']].set_index('UniProtKB isoform ID').squeeze().to_dict() #couple rare cases where the canonical isoform is listed with its isoform ID, which conflicts with how I processed the data. convert these ideas to base uniprot ID psp_df_isoids = psp_df[psp_df['ACC_ID'].isin(canonical_ids.keys())].copy() psp_correctids = psp_df[~psp_df['ACC_ID'].isin(canonical_ids.keys())].copy() psp_df_isoids['ACC_ID'] = psp_df_isoids['ACC_ID'].map(canonical_ids) psp_df = pd.concat([psp_df_isoids, psp_correctids]) #groupby number of experiments by PTM site and modification, taking the max value (some rare cases where entries are basically identical but have different numbers of experiments) psp_df['PTM'] = psp_df['ACC_ID'] + '_' + psp_df['Residue'] + psp_df['Position'].astype(str) psp_df = psp_df.groupby(['PTM', 'Modification'], as_index = False)[['MS_LIT', 'MS_CST', 'LT_LIT']].max() psp_df = psp_df.rename(columns = {'Modification': 'Modification Class'}) if pscout_data is not None: pscout_experiments = pscout_data[pscout_data['Number of Experiments'] > 0] pscout_experiments = pscout_experiments.dropna(subset = 'Modification Class') psp_df = psp_df.merge(pscout_experiments[['PTM', 'Number of Experiments']], on = 'PTM', how = 'outer') psp_df = psp_df.fillna(0) psp_df['MS_LIT'] = psp_df['MS_LIT'] + psp_df['Number of Experiments'] psp_df = psp_df.drop(columns = 'Number of Experiments') return psp_df def append_num_studies(ptm_coordinates, psp_df): #remove ms columns if present ptm_coordinates = ptm_coordinates.drop(columns = [i for i in ['MS_LIT', 'MS_CST', 'LT_LIT'] if i in ptm_coordinates.columns]) #add PTM column to PTM coordinates, allowing for alternative isoform IDs ptm_coordinates['PTM'] = ptm_coordinates.apply(lambda x: x['Isoform ID'] + '_'+ x['Residue'] + str(int(x['PTM Position in Isoform'])) if x['Isoform Type'] == 'Alternative' else x['UniProtKB Accession'] + '_'+ x['Residue'] + str(int(x['PTM Position in Isoform'])), axis = 1) #combine psp MS info with coordinate data, check to make sure size doesn't change original_shape = ptm_coordinates.shape[0] ptm_coordinates = ptm_coordinates.merge(psp_df[['PTM', 'Modification Class', 'MS_LIT', 'MS_CST', 'LT_LIT']], on = ['PTM', 'Modification Class'], how = 'left') if original_shape != ptm_coordinates.shape[0]: raise ValueError('Size of dataframe changed after merging PhosphoSitePlus data. 
Please check for duplicates.')

    #go through entries without info and make sure it is not due to isoform ID issues
    missing_db_info = ptm_coordinates[ptm_coordinates[['MS_LIT', 'MS_CST', 'LT_LIT']].isna().all(axis = 1)]
    ptm_coordinates = ptm_coordinates[~ptm_coordinates[['MS_LIT', 'MS_CST', 'LT_LIT']].isna().all(axis = 1)]
    for i, row in missing_db_info.iterrows():
        mod = row['Modification Class']
        sources = [s for s in row['Source of PTM'].split(';') if row['Isoform ID'] not in s]
        for s in sources:
            tmp_data = psp_df[(psp_df['PTM'] == s) & (psp_df['Modification Class'] == mod)]
            if tmp_data.shape[0] > 0:
                tmp_data = tmp_data.squeeze()
                missing_db_info.loc[i, 'MS_LIT'] = tmp_data['MS_LIT']
                missing_db_info.loc[i, 'MS_CST'] = tmp_data['MS_CST']
                missing_db_info.loc[i, 'LT_LIT'] = tmp_data['LT_LIT']
                break
    ptm_coordinates = pd.concat([ptm_coordinates, missing_db_info])
    return ptm_coordinates


def construct_mod_conversion_dict():
    """
    Reformat the modification conversion dataframe into a dictionary for easy conversion
    between a specific modification and its broad modification class.
    """
    modification_conversion = pose_config.modification_conversion[['Modification', 'Modification Class']].set_index('Modification')
    modification_conversion = modification_conversion.replace('Dimethylation', 'Methylation')
    modification_conversion = modification_conversion.replace('Trimethylation', 'Methylation')
    modification_conversion = modification_conversion.squeeze().to_dict()
    return modification_conversion


def extract_number_compendia(ps_data_file):
    """
    Given a ProteomeScout data file, extract the number of different compendia that support a PTM site,
    in a format that works with the ptm_coordinates file. The following experiment IDs are used to check
    for database evidence:
        1395: HPRD
        1790: PhosphoSitePlus
        1323: Phospho.ELM
        1688, 1803: UniProt
        1344: O-GlycBase
        1575: dbPTM

    Parameters
    ----------
    ps_data_file: str
        Path to ProteomeScout data file

    Returns
    -------
    ps_data: pd.DataFrame
        DataFrame with PTM site, modification class, and number of compendia that support the PTM site
    """
    #load proteomescout data and restrict to human sites, with one modification/evidence pair per row
    ps_data = pd.read_csv(ps_data_file, sep = '\t')
    ps_data = ps_data[ps_data['species'] == 'homo sapiens']
    ps_data['modifications'] = ps_data['modifications'].str.split(';')
    ps_data['evidence'] = ps_data['evidence'].str.split(';')
    ps_data = ps_data.explode(['modifications', 'evidence'])

    #extract site numbers and modification types into unique columns, then convert modification to broad class to match ptm coordinates dataframe
    ps_data['site'] = ps_data['modifications'].str.split('-').str[0]
    ps_data['Modification'] = ps_data['modifications'].apply(lambda x: '-'.join(x.split('-')[1:]))
    modification_conversion = construct_mod_conversion_dict()
    ps_data['Modification Class'] = ps_data['Modification'].map(modification_conversion)
    ps_data = ps_data.drop(columns = ['Modification'])

    #split evidence into a list, then check which entries correspond to a database rather than a literature study
    database_ids = {'1395', '1790', '1323', '1688', '1803', '1344', '1575'}
    ps_data['evidence'] = ps_data['evidence'].apply(lambda x: [i.strip(' ') for i in x.split(',')])
    ps_data['database evidence'] = ps_data['evidence'].apply(lambda x: set(x).intersection(database_ids))
    ps_data['experimental evidence'] = ps_data['evidence'].apply(lambda x: list(set(x).difference(database_ids)))

    #add specific compendia associated with the site
    compendia_dict = {'1395': 'HPRD', '1790': 'PhosphoSitePlus', '1323': 'Phospho.ELM', '1688': 'UniProt', '1803': 'UniProt', '1344': 'O-GlycBase', '1575': 'dbPTM'}
    ps_data['Compendia'] = ps_data['database evidence'].apply(lambda x: ['ProteomeScout'] + [compendia_dict[i] for i in x])

    #separate all accessions into separate rows for merge
    ps_data['accessions'] = ps_data['accessions'].str.split(';')
    ps_data = ps_data.explode('accessions')

    #convert isoform IDs from having . to a - for consistency (regex = False forces a literal '.' replacement)
    ps_data['accessions'] = ps_data['accessions'].str.replace('.', '-', regex = False).str.strip(' ')

    #create PTM column
    ps_data['PTM'] = ps_data['accessions'] + '_' + ps_data['site'].str.strip(' ')
    ps_data = ps_data.dropna(subset = ['PTM'])

    #some ptms may have multiple entries (due to multiple similar modification types), so combine them
    ps_data = ps_data.groupby(['PTM', 'Modification Class'], as_index = False)[['Compendia', 'experimental evidence']].agg(sum)
    ps_data['Number of Compendia'] = ps_data['Compendia'].apply(lambda x: len(set(x)))
    ps_data['Compendia'] = ps_data['Compendia'].apply(lambda x: ';'.join(set(x)))
    ps_data['Number of Experiments'] = ps_data['experimental evidence'].apply(lambda x: len(set(x)))
    ps_data['experimental evidence'] = ps_data['experimental evidence'].apply(lambda x: ';'.join(set(x)))
    return ps_data


def append_num_compendia(ptm_coordinates, pscout_data):
    """
    Given a PTM coordinates dataframe and processed ProteomeScout data generated by `extract_number_compendia()`,
    append the number of compendia that support a PTM site to the PTM coordinates dataframe.
    Check all potential isoform IDs for compendia information.

    Parameters
    ----------
    ptm_coordinates: pd.DataFrame
        DataFrame with PTM site information
    pscout_data: pd.DataFrame
        DataFrame with PTM site, modification class, and number of compendia that support the PTM site

    Returns
    -------
    ptm_coordinates: pd.DataFrame
        Input dataframe with 'Compendia' and 'Number of Compendia' columns appended
    """
    #remove existing columns if present
    ptm_coordinates = ptm_coordinates.drop(columns = [i for i in ['Compendia', 'Number of Compendia'] if i in ptm_coordinates.columns])
    if 'PTM' not in ptm_coordinates.columns:
        ptm_coordinates['PTM'] = ptm_coordinates.apply(lambda x: x['Isoform ID'] + '_' + x['Residue'] + str(int(x['PTM Position in Isoform'])) if x['Isoform Type'] == 'Alternative' else x['UniProtKB Accession'] + '_' + x['Residue'] + str(int(x['PTM Position in Isoform'])), axis = 1)

    #merge the data
    ptm_coordinates = ptm_coordinates.merge(pscout_data, on = ['Modification Class', 'PTM'], how = 'left')

    #go through any missing entries and see if they can be filled in with other isoform IDs
    missing_db_info = ptm_coordinates[ptm_coordinates['Number of Compendia'].isna()]
    ptm_coordinates = ptm_coordinates[~ptm_coordinates['Compendia'].isna()]
    for i, row in tqdm(missing_db_info.iterrows(), total = missing_db_info.shape[0], desc = 'Going through missing compendia data, making sure not due to isoform ID issues'):
        mod = row['Modification Class']
        sources = [s for s in row['Source of PTM'].split(';') if row['Isoform ID'] not in s]
        for s in sources:
            tmp_data = pscout_data[(pscout_data['PTM'] == s) & (pscout_data['Modification Class'] == mod)]
            if tmp_data.shape[0] > 0:
                tmp_data = tmp_data.squeeze()
                missing_db_info.loc[i, 'Compendia'] = tmp_data['Compendia']
                missing_db_info.loc[i, 'Number of Compendia'] = tmp_data['Number of Compendia']
                break
    ptm_coordinates = pd.concat([ptm_coordinates, missing_db_info])

    #fill in remainder entries as PhosphoSitePlus
    ptm_coordinates['Compendia'] = ptm_coordinates['Compendia'].fillna('PhosphoSitePlus')
    ptm_coordinates['Number of Compendia'] = ptm_coordinates['Number of Compendia'].fillna(1)
    return ptm_coordinates


def separate_by_modification(ptm_coordinates, mod_mapper):
    """Separate out modification class (important for functional analysis)."""
    ptm_coordinates['Modification Class'] = ptm_coordinates['Modification Class'].str.split(';')
    ptm_coordinates = ptm_coordinates.explode('Modification Class')

    #reduce modification columns to only those relevant to modification class
    mod_mapper = mod_mapper.squeeze()
    mod_mapper = mod_mapper.str.split(';')
    mod_mapper = mod_mapper.to_dict()

    def filter_mods(x, mod_class):
        """
        Go through the modification list and only keep entries relevant to the modification class
        (removes mismatches introduced by the exploded dataframe).
        """
        x_list = x.split(';')
        return ';'.join([i for i in x_list if i in mod_mapper[mod_class]])

    ptm_coordinates['Modification'] = ptm_coordinates.apply(lambda x: filter_mods(x['Modification'], x['Modification Class']), axis = 1)
    return ptm_coordinates


def generate_ptm_coordinates(pscout_data_file, phosphositeplus_filepath, mod_mapper, remap_PTMs = False, output_dir = None, ptm_info = None):
    """Build the full ptm_coordinates dataframe, including genomic coordinates, constitutive status, and evidence columns."""
    mapper = mapping.PTM_mapper()
    if remap_PTMs:
        mapper.find_ptms_all(phosphositeplus_file = phosphositeplus_filepath)
        mapper.mapPTMs_all()

    #get coordinate info
    if mapper.ptm_coordinates is None:
        raise ValueError('No PTM coordinates found. Please set remap_PTMs to True to generate PTM coordinates. If you want to include PhosphoSitePlus data, please also include location of phosphositeplus data file in phosphositeplus_filepath argument.')

    #copy ptm_coordinates file to be edited for easier use with PTM-POSE
    ptm_coordinates = mapper.ptm_coordinates.copy()
    ptm_coordinates = get_unique_ptm_info(ptm_coordinates)

    #convert hg38 coordinates to hg19, then hg19 to hg18
    ptm_coordinates['Gene Location (hg19)'] = convert_coordinates(ptm_coordinates, from_coord = 'hg38', to_coord = 'hg19')
    ptm_coordinates['Gene Location (hg18)'] = convert_coordinates(ptm_coordinates, from_coord = 'hg19', to_coord = 'hg18')

    #separate out PTMs that are found in multiple UniProt accessions/isoforms into individual rows
    ptm_coordinates['PTM Position in Isoform'] = ptm_coordinates['PTM Position in Isoform'].str.split(';')
    ptm_coordinates['UniProtKB Accession'] = ptm_coordinates['UniProtKB Accession'].str.split(';')
    ptm_coordinates['Isoform ID'] = ptm_coordinates['Isoform ID'].str.split(';')
    ptm_coordinates = ptm_coordinates.explode(['UniProtKB Accession', 'Isoform ID', 'PTM Position in Isoform']).reset_index()

    #construct PTM identifier for further analysis
    ptm_coordinates['PTM'] = ptm_coordinates['Isoform ID'] + '_' + ptm_coordinates['Residue'] + ptm_coordinates['PTM Position in Isoform']

    if ptm_info is None:
        ptm_info = mapper.ptm_info.copy()

    #add whether or not the PTM is constitutive
    if 'PTM Conservation Score' not in ptm_info.columns:
        #warn rather than raise so the rest of the table can still be built; the Constitutive column is simply skipped
        #(standard warnings module assumed imported at the top of this file)
        warnings.warn('No PTM conservation score found in ptm_info object. Will not add column indicating whether the PTM is considered to be constitutive or not.')
    else:
        #grab nonconstitutive list from ptm_info object
        nonconstitutive_list = set(ptm_info[ptm_info['PTM Conservation Score'] != 1].index.values)
        const_list = []
        for i, row in tqdm(ptm_coordinates.iterrows(), total = ptm_coordinates.shape[0], desc = 'Checking for constitutive PTMs'):
            const_list.append(check_constitutive(row['PTM'], nonconstitutive_list))
        ptm_coordinates['Constitutive'] = const_list

    #add flanking sequence information
    ptm_coordinates['Canonical Flanking Sequence'] = ptm_coordinates['PTM'].apply(lambda x: ptm_info.loc[x, 'Flanking Sequence'] if x == x else np.nan)

    #drop unnecessary columns and save
    #ptm_coordinates = ptm_coordinates.drop(columns = ['PTM Position in Canonical Isoform', 'PTM'], axis = 1)
    keep_cols = ['Gene name', 'UniProtKB Accession', 'Isoform ID', 'Isoform Type', 'Residue', 'PTM Position in Isoform',
                 'Modification', 'Modification Class', 'Chromosome/scaffold name', 'Strand', 'Gene Location (hg38)',
                 'Gene Location (hg19)', 'Gene Location (hg18)', 'Constitutive', 'Canonical Flanking Sequence', 'Source of PTM']
    ptm_coordinates = ptm_coordinates[[c for c in keep_cols if c in ptm_coordinates.columns]]
    ptm_coordinates = ptm_coordinates.reset_index()  #there will be some non-unique genomic coordinates, so reset index to ensure unique index values

    #separate out modification class (important for functional analysis)
    ptm_coordinates = separate_by_modification(ptm_coordinates, mod_mapper)

    #add filtering criteria
    ## add number of compendia
    pscout_data = extract_number_compendia(pscout_data_file)
    ptm_coordinates = append_num_compendia(ptm_coordinates, pscout_data)

    ## number of MS experiments (from PhosphoSitePlus, fill in any remainder with ProteomeScout experiments)
    ## extract_num_studies()/append_num_studies() are module-level helpers defined earlier in this file
    psp_df = extract_num_studies(phosphositeplus_filepath, pscout_data = pscout_data)
    ptm_coordinates = append_num_studies(ptm_coordinates, psp_df)

    #do some final custom editing for outlier examples that still lack any database or literature evidence
    missing_db_info = ptm_coordinates[ptm_coordinates[['MS_LIT', 'MS_CST', 'LT_LIT', 'Compendia']].isna().all(axis = 1)]
    ptm_coordinates = ptm_coordinates[~ptm_coordinates[['MS_LIT', 'MS_CST', 'LT_LIT', 'Compendia']].isna().all(axis = 1)]

    #custom fixes
    ms_lit = {'Q13765_K108': 1, 'Q13765_K100': 2}
    ms_cst = {'Q13765_K108': np.nan, 'Q13765_K100': np.nan}
    lt_lit = {'Q13765_K108': np.nan, 'Q13765_K100': np.nan}
    custom_compendia = {'Q13765_K108': 'PhosphoSitePlus;ProteomeScout', 'Q13765_K100': 'PhosphoSitePlus;ProteomeScout'}
    custom_num_compendia = {'Q13765_K108': 2, 'Q13765_K100': 2}
    for i, row in missing_db_info.iterrows():
        ptm = row['PTM']
        missing_db_info.loc[i, 'MS_LIT'] = ms_lit[ptm]
        missing_db_info.loc[i, 'MS_CST'] = ms_cst[ptm]
        missing_db_info.loc[i, 'LT_LIT'] = lt_lit[ptm]
        missing_db_info.loc[i, 'Compendia'] = custom_compendia[ptm]
        missing_db_info.loc[i, 'Number of Compendia'] = custom_num_compendia[ptm]
    ptm_coordinates = pd.concat([ptm_coordinates, missing_db_info])

    ptm_coordinates[['Number of Compendia', 'Number of Experiments', 'MS_LIT', 'MS_CST', 'LT_LIT']] = ptm_coordinates[['Number of Compendia', 'Number of Experiments', 'MS_LIT', 'MS_CST', 'LT_LIT']].fillna(0).astype(int)

    if output_dir is not None:
        ptm_coordinates.to_csv(output_dir + 'ptm_coordinates.csv', index = False)

        #write to text file indicating when the data was last updated
        with open(output_dir + 'last_updated.txt', 'w') as f:
            f.write(datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"))

    return ptm_coordinates
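For orientation, here is a minimal usage sketch of generate_ptm_coordinates() as defined above. The file paths and the mod_mapper table are hypothetical placeholders (point them at your own ProteomeScout/PhosphoSitePlus downloads and modification-class mapping), and the call assumes it runs inside, or imports from, the module containing the functions above together with its PTM-POSE dependencies (mapping, pose_config, and the earlier helper functions):

import pandas as pd

# hypothetical mapping of each broad modification class to its ';'-joined specific modifications,
# shaped to match how separate_by_modification() consumes mod_mapper (squeeze -> str.split(';') -> to_dict)
mod_mapper = pd.DataFrame(
    {'Modifications': ['Phosphoserine;Phosphothreonine;Phosphotyrosine',
                       'Methylation;Dimethylation;Trimethylation']},
    index = ['Phosphorylation', 'Methylation']
)

# placeholder file locations -- replace with your own ProteomeScout and PhosphoSitePlus downloads
pscout_file = 'data/proteomescout_data.tsv'
psp_file = 'data/phosphositeplus/'

ptm_coordinates = generate_ptm_coordinates(
    pscout_data_file = pscout_file,
    phosphositeplus_filepath = psp_file,
    mod_mapper = mod_mapper,
    remap_PTMs = True,        # rerun the PTM-to-genome mapping before building the table
    output_dir = 'output/',   # also writes ptm_coordinates.csv and last_updated.txt here
)

The returned dataframe is the same object that gets written to ptm_coordinates.csv, so it can be inspected or filtered further in memory before saving.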