Architecture - sinat_34080511's Blog - CSDN Blog

Architecture

Articles: 10 · Article views: 6150 · Article favorites: 0

Author: sinat_34080511


Articles in this column

Installing pyspider on Python 3.7

Two changes are needed. 1. Errors caused by the async keyword: replace the async keyword in the following files: python3.7/site-packages/pyspider/run.py python3.7/site-packages/pyspider/fetcher/tornado_fetcher.py python3.7/site-packages/pyspider/webui/app.py 2. Wrong werkzeug version: pip uninstall werkzeug pip install werkzeu...

Original · 2021-12-20 15:08:17 · 648 views · 0 comments
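
The first fix in the pyspider entry above (replacing the async keyword) could be sketched roughly as follows. This is only an illustration, not the author's script: the replacement name async_mode and the helper rename_async are assumptions, and a plain text substitution like this would also touch strings, comments, and any genuine async def statements, so the result should be reviewed by hand before overwriting the installed files.

```python
import re

# Hypothetical replacement name; any identifier that is not a keyword would do.
NEW_NAME = "async_mode"

# Match "async" only as a whole word, so names such as "asyncio" are left alone.
_ASYNC_WORD = re.compile(r"\basync\b")


def rename_async(source: str, new_name: str = NEW_NAME) -> str:
    """Return source text with every standalone `async` identifier renamed."""
    return _ASYNC_WORD.sub(new_name, source)


if __name__ == "__main__":
    # Made-up snippet in the style of code that breaks on Python 3.7.
    sample = "def fetch(self, task, async=True):\n    if async:\n        import asyncio\n"
    print(rename_async(sample))
```

To apply it, one would read each of the three files listed in the entry, pass the text through rename_async, and write the result back after keeping a backup.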
Upgrading gcc

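Repost. Run yum list | grep gcc to check whether dependency packages such as devtoolset-7-gcc and devtoolset-7-gcc-c++.x86_64 are available. If they are not, go to step 2; otherwise go to step 3. You may need to switch mirror sources, or uninstall the existing yum and replace it with one that is not bundled with CentOS. I recommend the latter, because if you only switch mirrors, yum still cannot install gcc, g++ and similar tools, which is inconvenient. Here I recommend following the blog post at https://blog.youkuaiyun.com/jianm_liu/article/details/78316690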

Original · 2021-12-18 21:45:00 · 2800 views · 0 comments
Go interface type conversion

target is an interface value holding a [{}] structure (a slice of maps). fmt.Println("type:", reflect.TypeOf(tar)) for _, i := range tar.([]interface{}) { v, ok := i.(map[string]interface{}) if ok { text := v["text"].(string) ...

Original · 2020-03-19 11:15:23 · 411 views · 0 comments
wrk, an HTTP load-testing tool

https://github.com/wg/wrk git clone https://github.com/giltene/wrk2.git Build with make. ./wrk -t2 --latency -c100 -d100s -R10000 "http=encode(Chinese text)" Two threads (-t2) are usually used.

Original · 2019-08-30 10:13:09 · 308 views · 0 comments
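
The wrk entry above passes a URL whose Chinese text has been encoded. A minimal sketch of producing such a URL and the matching command line is shown below. The base URL, the query parameter name q, and the helper build_wrk_command are hypothetical; only the flags (-t2 --latency -c100 -d100s -R10000) come from the entry itself.

```python
from urllib.parse import quote


def build_wrk_command(base_url, query, threads=2, connections=100,
                      duration="100s", rate=10000):
    """Build the argument list for a wrk run against a percent-encoded URL."""
    # quote() percent-encodes the UTF-8 bytes of non-ASCII text.
    encoded = quote(query)
    # "q" is an assumed parameter name; use whatever the service under test expects.
    url = f"{base_url}?q={encoded}"
    return [
        "./wrk",
        f"-t{threads}",
        "--latency",
        f"-c{connections}",
        f"-d{duration}",
        f"-R{rate}",
        url,
    ]


if __name__ == "__main__":
    # Hypothetical endpoint, for illustration only.
    print(" ".join(build_wrk_command("http://localhost:8080/search", "中文")))
```

The returned list can be handed to subprocess.run directly, which avoids shell quoting problems with the encoded URL.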
lucene tfidf

Getting the Lucene TF-IDF score. idf: indexReader.docFreq(new Term(FIELD, "中国")) and indexReader.maxDoc(). tf: Terms terms = indexReader.getTermVector(docID, TEXT_FIELD); TermsEnum termsEnum = terms.iter...

Original · 2019-03-20 15:42:34 · 481 views · 0 comments
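
Once the three numbers in the entry above are available (the document frequency from docFreq, the document count from maxDoc, and the term frequency read from the term vector), they can be combined into a score. The sketch below assumes one classic formulation, tf = sqrt(freq) and idf = 1 + ln(maxDoc / (docFreq + 1)). The similarity class in a given Lucene version may compute these differently, so treat the formulas and function names as assumptions rather than as Lucene's API.

```python
import math


def idf(doc_freq: int, max_doc: int) -> float:
    """Inverse document frequency: 1 + ln(max_doc / (doc_freq + 1))."""
    return 1.0 + math.log(max_doc / (doc_freq + 1))


def tf(term_freq: int) -> float:
    """Term frequency damped with a square root."""
    return math.sqrt(term_freq)


def tf_idf(term_freq: int, doc_freq: int, max_doc: int) -> float:
    """Combine the two factors into a single per-term, per-document weight."""
    return tf(term_freq) * idf(doc_freq, max_doc)


if __name__ == "__main__":
    # Made-up numbers: the term occurs 4 times in this document,
    # appears in 9 documents, and the index holds 10 documents.
    print(tf_idf(term_freq=4, doc_freq=9, max_doc=10))
```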
Learning Lucene

Document: Documents are the unit of indexing and search. A Document is a set of fields. http://lucene.apache.org/core/2_9_4/api/all/org/apache/lucene/document/Document.html#add(org.apache.lucene.docume...

Original · 2019-02-20 20:31:47 · 238 views · 0 comments
The accumulated size of entities is "50,000,001" that exceeded the "50,000,000" limit set by "FEATUR

https://stackoverflow.com/questions/42991043/error-xml-sax-saxparseexception-while-parsing-a-xml-file-using-wikixmlj

Original · 2019-02-13 14:44:25 · 472 views · 0 comments
Parsing XML in Java

package xml.learn; import java.io.File; import javax.xml.parsers.SAXParser; import javax.xml.parsers.SAXParserFactory; import org.xml.sax.Attributes; import org.xml.sax.SAXException; import org.xml...

Original · 2019-02-22 10:46:44 · 100 views · 0 comments
Handling JSON in Python

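Printing: print(json.dumps(data,ensure_ascii=False)). Writing to a text file: with UTF-8 as the uniform encoding, after reading the file, fout=open("test","w"), fout.write(json.dumps(data,ensure_ascii=False)+"\n"). Written this way, the Chinese text in the output file does not display correctly. The fix is to open the output file with an explicitly specified encoding: import codecs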

Repost · 2017-06-07 23:08:49 · 259 views · 0 comments
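
The fix described in the JSON entry above (open the output file with an explicit encoding) might look like the sketch below. The entry's excerpt stops at import codecs. On Python 3 the built-in open accepts an encoding argument and serves the same purpose, so that is what is assumed here. The helper name write_json_line and the sample data are illustrative only.

```python
import json
import os
import tempfile


def write_json_line(path, data):
    """Write data as one line of JSON, keeping non-ASCII characters readable."""
    # An explicit encoding avoids depending on the platform default.
    with open(path, "w", encoding="utf-8") as fout:
        # ensure_ascii=False keeps characters such as Chinese text as-is
        # instead of escaping them as \uXXXX sequences.
        fout.write(json.dumps(data, ensure_ascii=False) + "\n")


if __name__ == "__main__":
    # Write to a temporary directory so the demo leaves nothing behind.
    with tempfile.TemporaryDirectory() as tmp:
        out_path = os.path.join(tmp, "test")
        write_json_line(out_path, {"text": "中文"})
        with open(out_path, encoding="utf-8") as fin:
            print(fin.read(), end="")
```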
pyspider

Learning pyspider

Original · 2017-01-23 19:41:14 · 433 views · 0 comments
