Semi-automated batch merging of the PLA Daily (解放军报) PDF edition

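Batch downloading and merging PDFs
This article describes a method for batch downloading the PDF edition of the PLA Daily and merging the files: analyzing the URL structure, generating the download URLs with Excel, downloading in bulk with third-party software, and merging the PDF files with a Python script.
For many civil service exams, public institution recruitment exams, military civilian staff exams, and military transfer and resettlement exams, newspaper front-page headlines largely decide the score on the current-affairs portion.
Here is a brief description of how to batch-merge the PDF edition of the newspaper.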
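1. Open the PLA Daily electronic edition website.
2. Find the PDF download address of the front page (page 1).
3. Analyze the structure of that URL.
4. Note that the URLs follow a regular pattern based on the publication date.
5. In Excel, define a date field in a form such as yyyymmdd, yyyy/mm/dd, or yyyy-mm/dd, and use it to generate the download URLs in bulk.
6. Download the files in bulk with third-party software such as Xunlei (迅雷).
7. After downloading, use a cmd command to get the file names (see the other pages of this blog for details).
8. Start PyCharm 2020.2 x64.
9. Enter the program below. Your file names will probably differ, so you may need to adapt the code; Excel is handy for regenerating any lines that repeat with only the file name changed.
10. Given my limited skills, I have not yet managed to make the collection fully automatic.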
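Steps 3 to 5 can also be done without Excel. The following is a minimal sketch in Python, not the author's method: it builds one URL per calendar day from a template. The post does not show the real URL pattern, so `URL_TEMPLATE` (including the `example.com` host and the `yyyymmdd01_pdf.pdf` file name, which is borrowed from the local file names used later) is a placeholder assumption, and `daily_urls` is a hypothetical helper name.

```python
from datetime import date, timedelta

# Hypothetical placeholder: the post does not show the real URL pattern,
# so substitute the structure you observed in step 3.
URL_TEMPLATE = "https://example.com/{d:%Y-%m/%d}/{d:%Y%m%d}01_pdf.pdf"


def daily_urls(first_day, last_day, template=URL_TEMPLATE):
    """Return one URL per calendar day from first_day to last_day inclusive."""
    urls = []
    day = first_day
    while day <= last_day:
        # str.format passes the text after the colon to date.strftime.
        urls.append(template.format(d=day))
        day += timedelta(days=1)
    return urls


if __name__ == "__main__":
    for url in daily_urls(date(2019, 8, 31), date(2019, 9, 2)):
        print(url)
```

The printed list can then be pasted into a download manager, as in step 6. Days on which no issue was published would simply produce URLs that fail to download.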




from datetime import date, timedelta

from PyPDF2 import PdfFileMerger

merger = PdfFileMerger()

# Folder holding the downloaded front pages (a raw string, so the
# backslashes need no escaping).
folder = r"C:\Users\laiwu\Desktop\jfjb"

# One file per day, named yyyymmdd01_pdf.pdf, from 2019-08-31 to 2019-12-24.
first_day = date(2019, 8, 31)
last_day = date(2019, 12, 24)

# inputs[0] corresponds to input1 (2019083101_pdf.pdf) and
# inputs[115] to input116 (2019122401_pdf.pdf).
inputs = []
day = first_day
while day <= last_day:
    inputs.append(open(folder + "\\" + day.strftime("%Y%m%d") + "01_pdf.pdf", "rb"))
    day += timedelta(days=1)

# ... (the original listing is cut off here; the remaining inputs and the
# merge step are not shown)
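Because the listing stops before the merge step, here is a hedged sketch of how steps 7 and 9 might be completed. It swaps in a different technique from the post: instead of a hand-written list of file names, it lists the download folder and sorts the names, which works because names starting with yyyymmdd sort chronologically. `front_page_files`, `merge_pdfs`, and the output name `merged.pdf` are made up for illustration. It also assumes a PyPDF2 release that still provides `PdfFileMerger`, the class the post imports; later releases renamed it.

```python
from pathlib import Path


def front_page_files(folder, pattern="*_pdf.pdf"):
    """Return the matching PDF paths in folder, sorted by file name.

    Names that start with yyyymmdd sort chronologically as plain text.
    """
    return sorted(Path(folder).glob(pattern), key=lambda p: p.name)


def merge_pdfs(paths, output_path):
    """Append every PDF in paths, in order, and write one combined file."""
    # Imported here so the listing helper above works without PyPDF2.
    # Assumes a PyPDF2 release that still ships PdfFileMerger.
    from PyPDF2 import PdfFileMerger

    merger = PdfFileMerger()
    try:
        for path in paths:
            merger.append(str(path))
        merger.write(str(output_path))
    finally:
        # Releases the input files that the merger opened.
        merger.close()


if __name__ == "__main__":
    folder = Path(r"C:\Users\laiwu\Desktop\jfjb")
    files = front_page_files(folder)
    if files:
        # "merged.pdf" is a made-up output name; it does not match the
        # "*_pdf.pdf" pattern, so a later run will not pick it up as input.
        merge_pdfs(files, folder / "merged.pdf")
```

Listing the folder means a day with no downloaded file is skipped silently rather than raising an error when the file is opened, so it is worth checking the number of files found against the number of days expected.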