urllib2 and urllib
https://www.pythonforbeginners.com/python-on-the-web/how-to-use-urllib2-in-python/
import urllib2
response = urllib2.urlopen('https://www.pythonforbeginners.com/')
print response.info()
html = response.read()
# do something
response.close() # best practice to close the file
Note: you can also use a URL starting with "ftp:", "file:", etc.
The remote server accepts the incoming values and formats a plain text response
to send back.
The return value from urlopen() gives access to the headers from the HTTP server
through the info() method, and the data for the remote resource via methods like
read() and readlines().
Additionally, the file object that is returned by urlopen() is iterable.
Simple urllib2 script
import urllib2
response = urllib2.urlopen('https://www.baidu.com')  # the URL used for the sample output below
print "Response:", response
# Get the URL. This gets the real URL.
print "The URL is: ", response.geturl()
# Getting the code
print "This gets the code: ", response.code
# Get the Headers.
# This returns a dictionary-like object that describes the page fetched,
# particularly the headers sent by the server
print "The Headers are: ", response.info()
# Get the date part of the header
print "The Date is: ", response.info()['date']
# Get the server part of the header
print "The Server is: ", response.info()['server']
# Get all data
html = response.read()
print "Get all data: ", html
# Get only the length
print "Get the length :", len(html)
# Showing that the file object is iterable
for line in response:
    print line.rstrip()
# Note that the rstrip strips the trailing newlines and carriage returns before
# printing the output.
Output:
Response: <addinfourl at 48879368L whose fp = <socket._fileobject object at 0x0000000002A64228>>
The URL is: https://www.baidu.com
This gets the code: 200
The Headers are: Accept-Ranges: bytes
Cache-Control: no-cache
Content-Length: 227
Content-Type: text/html
Date: Thu, 25 Apr 2019 11:55:23 GMT
Etag: "5cac0cfb-e3"
Last-Modified: Tue, 09 Apr 2019 03:09:47 GMT
P3p: CP=" OTI DSP COR IVA OUR IND COM "
Pragma: no-cache
Server: BWS/1.1
Set-Cookie: BD_NOT_HTTPS=1; path=/; Max-Age=300
Set-Cookie: BIDUPSID=51E1E36DF6EFA75957C8FF15C55C3A4E; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com
Set-Cookie: PSTM=1556193323; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com
Strict-Transport-Security: max-age=0
X-Ua-Compatible: IE=Edge,chrome=1
Connection: close
The Date is: Thu, 25 Apr 2019 11:55:23 GMT
The Server is: BWS/1.1
Get all data: <html>
<head>
<script>
location.replace(location.href.replace("https://","http://"));
</script>
</head>
<body>
<noscript><meta http-equiv="refresh" content="0;url=http://www.baidu.com/"></noscript>
</body>
</html>
Get the length : 227
Urllib2 Requests
The Request object represents the HTTP request you are making.
In its simplest form you create a request object that specifies the URL you want
to fetch.
Calling urlopen with this Request object returns a response object for the URL
requested.
The Request constructor in the urllib2 module accepts both a url and a data parameter.
When you don't include the data (and only pass the url), the request being made
is actually a GET request.
When you do include the data, the request being made is a POST request; the url is
your POST URL and the data is the HTTP POST content (a POST sketch follows the GET example below).
1) Send a Request object. If the request carries no data, it is a GET request; if it carries data, it is a POST request.
2) The server returns a response object.
import urllib2
import urllib
# Specify the url
url = 'https://www.pythonforbeginners.com'
# This packages the request (it doesn't make it)
request = urllib2.Request(url)
# Sends the request and catches the response
response = urllib2.urlopen(request)
# Extracts the response
html = response.read()
# Print it out
print html
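The GET example above has a natural POST counterpart. Below is a minimal sketch of the POST case described earlier; the target URL (http://httpbin.org/post) and the form fields are illustrative assumptions, not part of the original example. Passing an encoded data string to Request is what turns the request into a POST.
import urllib
import urllib2
# Hypothetical POST target and form fields, for illustration only.
url = 'http://httpbin.org/post'
values = {'name': 'Michael', 'language': 'Python'}
# Encoding the dict and attaching it as data makes this a POST request.
data = urllib.urlencode(values)
request = urllib2.Request(url, data)
response = urllib2.urlopen(request)
print response.read()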
MinIO Client
https://docs.min.io/docs/python-client-quickstart-guide.html
You need four items in order to connect to a MinIO object storage server.
| Params | Description |
|---|---|
| endpoint | URL to object storage service. |
| access_key | Access key is like user ID that uniquely identifies your account. |
| secret_key | Secret key is the password to your account. |
| secure | Set this value to 'True' to enable secure (HTTPS) access. |
The endpoint is the address (IP/host) of the data-storage server you want to connect to.
from minio import Minio
from minio.error import ResponseError
minioClient = Minio('play.min.io:9000',
                    access_key='Q3AM3UQ867SPQQA43P2F',
                    secret_key='zuf+tfteSlswRu7BJ86wekitnifILbZam1KYY3TG',
                    secure=True)
Quick Start Example - File Uploader
This example program connects to a MinIO object storage server, makes a bucket on the server and then uploads a file to the bucket.
We will use the MinIO server running at https://play.min.io:9000 in this example. Feel free to use this service for testing and development. Access credentials shown in this example are open to the public.
file-uploader.py
# Import MinIO library.
from minio import Minio
from minio.error import (ResponseError, BucketAlreadyOwnedByYou,
                         BucketAlreadyExists)
# Initialize minioClient with an endpoint and access/secret keys.
minioClient = Minio('play.min.io:9000',
                    access_key='Q3AM3UQ867SPQQA43P2F',
                    secret_key='zuf+tfteSlswRu7BJ86wekitnifILbZam1KYY3TG',
                    secure=True)
# Make a bucket with the make_bucket API call.
try:
    minioClient.make_bucket("maylogs", location="us-east-1")
except BucketAlreadyOwnedByYou as err:
    pass
except BucketAlreadyExists as err:
    pass
except ResponseError as err:
    raise
else:
    # Put an object 'pumaserver_debug.log' with contents from '/tmp/pumaserver_debug.log'.
    try:
        minioClient.fput_object('maylogs', 'pumaserver_debug.log', '/tmp/pumaserver_debug.log')
    except ResponseError as err:
        print(err)
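To read the data back, here is a minimal sketch using the same client: fget_object downloads an object to a local file and list_objects iterates over the bucket contents. The local path '/tmp/downloaded.log' is an arbitrary assumption.
# Download the uploaded object to a local file and list the bucket contents.
try:
    minioClient.fget_object('maylogs', 'pumaserver_debug.log', '/tmp/downloaded.log')
    for obj in minioClient.list_objects('maylogs', recursive=True):
        print(obj.object_name, obj.size)
except ResponseError as err:
    print(err)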
Running Python in the background on Linux with nohup
https://blog.youkuaiyun.com/qq_31821675/article/details/78246808
Be careful when running jobs in the background: do not put commands that require user interaction in the background, because the machine will just sit there waiting. Also, a job running in the background still writes its output to the screen, which gets in the way of your work. If a background job produces a lot of output, it is best to redirect its output to a file using the following form:
command >out.file 2>&1 &
In the example above, 2>&1 means that both standard output and standard error are redirected to a file called out.file. Once the process has been submitted successfully, a process ID is printed; you can use it to monitor the process or kill it.
shutil.rmtree()
https://blog.youkuaiyun.com/HappyRocking/article/details/79806808
shutil.rmtree() recursively deletes a directory together with all of its subdirectories and files.
So, to delete a certain folder on drive E, you can use:
shutil.rmtree('E:\\myPython\\image-filter\\test', ignore_errors=True)
This removes everything inside the test folder (including test itself) and ignores any errors.
hashlib
Introduction to digest algorithms -- hashlib
Python's hashlib provides common digest algorithms such as MD5 and SHA1.
What is a digest algorithm? A digest algorithm is also known as a hash algorithm. It uses a function to map data of arbitrary length to a fixed-length string (usually represented in hexadecimal).
For example, suppose you wrote an article whose content is the string 'how to use python hashlib - by Michael' and attached its digest '2d73d4f15c0db7f5ecb321b6a65e5d6d'. If someone tampered with your article and published it as 'how to use python hashlib - by Bob', you could immediately point out that Bob tampered with it, because the digest computed from 'how to use python hashlib - by Bob' differs from the digest of the original article.
So a digest algorithm computes a fixed-length digest from data of arbitrary length via a digest function f(), in order to detect whether the original data has been tampered with.
A digest algorithm can reveal tampering because the digest function is a one-way function: computing f(data) is easy, but recovering data from the digest is extremely hard. Moreover, changing even a single bit of the original data produces a completely different digest.
Let's take the common digest algorithm MD5 as an example and compute the MD5 of a string:
import hashlib
md5 = hashlib.md5()
md5.update('how to use md5 in python hashlib?')
print md5.hexdigest()
The result is:
d26a53750bc40b38b65a520292f69306
If the data is large, you can split it into chunks and call update() multiple times; the final result is the same:
import hashlib
md5 = hashlib.md5()
md5.update("dxt00...")
print md5.hexdigest()
md5_ = hashlib.md5()
md5_.update("dxt00")
md5_.update("...")
print md5_.hexdigest()
Output:
88d619bc8acd34781edfd01c9bfa26eb
88d619bc8acd34781edfd01c9bfa26eb
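Building on the chunked update() idea, here is a minimal sketch of hashing a file block by block (the file name 'data.bin' is a hypothetical placeholder); this keeps memory usage constant no matter how large the file is:
import hashlib
md5 = hashlib.md5()
# Feed the file to update() in fixed-size blocks instead of reading it all at once.
with open('data.bin', 'rb') as f:
    while True:
        chunk = f.read(8192)
        if not chunk:
            break
        md5.update(chunk)
print md5.hexdigest()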
Usage of the os.path module in Python
https://blog.youkuaiyun.com/ziyuzhao123/article/details/8811496
1. dirname() strips the file name and returns the directory part of the path
For example:
>>> import os
>>> os.path.dirname('d:\\library\\book.txt')
'd:\\library'
2. basename() strips the directory part and returns only the file name
For example:
>>> import os
>>> os.path.basename('d:\\library\\book.txt')
'book.txt'
3. join() combines separate parts into a single path
For example:
>>> import os
>>> os.path.join('d:\\library','book.txt')
'd:\\library\\book.txt'
4. split() returns a tuple of (directory path, file name)
For example:
>>> import os
>>> os.path.split('d:\\library\\book.txt')
('d:\\library', 'book.txt')
5. splitdrive() returns a tuple of (drive letter, rest of the path)
>>> import os
>>> os.path.splitdrive('d:\\library\\book.txt')
('d:', '\\library\\book.txt')
6. splitext() returns a tuple of (path without extension, extension)
For example:
>>> os.path.splitext('d:\\library\\book.txt')
('d:\\library\\book', '.txt')
>>> os.path.splitext('book.txt')
('book', '.txt')
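As a quick sanity check (a minimal sketch using the same example path as above), split() and join() round-trip a path:
>>> import os
>>> head, tail = os.path.split('d:\\library\\book.txt')
>>> os.path.join(head, tail)
'd:\\library\\book.txt'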
urllib --- urlencode and urldecode
https://blog.youkuaiyun.com/haoni123321/article/details/15814111
urlencode turns a dict's key-value pairs into the form key1=value1&key2=value2...
#-*-coding:utf-8-*-
import urllib
data = {
    'filename': 'replay_process',
    'cloudurl': 'jkdsjkj'
}
print urllib.urlencode(data)
Output:
cloudurl=jkdsjkj&filename=replay_process
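For the decode direction, a minimal sketch (Python 2): urlparse.parse_qs parses a query string back into a dict of lists, and urllib.unquote_plus reverses percent-encoding and the '+' used for spaces. The query string is the output shown above; the second string is just an illustrative encoded value.
#-*-coding:utf-8-*-
import urllib
import urlparse
qs = 'cloudurl=jkdsjkj&filename=replay_process'
# A dict of lists, e.g. {'cloudurl': ['jkdsjkj'], 'filename': ['replay_process']}
print urlparse.parse_qs(qs)
# 'a b/c? d'
print urllib.unquote_plus('a%20b%2Fc%3F+d')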
Special URL characters and their hexadecimal encodings
| Character | Meaning in a URL | Encoding |
|---|---|---|
| + | represents a space | %2B |
| (space) | a space can be written as + or percent-encoded | %20 |
| / | separates directories and subdirectories | %2F |
| ? | separates the actual URL from the parameters | %3F |
| % | introduces an escaped special character | %25 |
| # | marks a bookmark (fragment) | %23 |
| & | separator between parameters in a URL | %26 |
| = | assigns a value to a parameter in a URL | %3D |
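To see these encodings produced in practice, a minimal sketch (Python 2): urllib.quote percent-encodes special characters (safe='' so that '/' is encoded too), and urllib.quote_plus additionally turns spaces into '+'.
import urllib
print urllib.quote('a b/c?&=#%', safe='')   # a%20b%2Fc%3F%26%3D%23%25
print urllib.quote_plus('a b')              # a+b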
sys.path.insert()
https://blog.youkuaiyun.com/qq_27923041/article/details/72878635
Note: sys.path lets you modify the module search path dynamically.
A module can only be imported if it lives in a directory on Python's search path, but we don't want to maintain one huge permanent directory, because every other Python script and application would then pay a performance cost on import. The code in this recipe dynamically adds a directory to the search path, provided the directory exists and is not already in sys.path.
sys.path is a list, so appending a directory at the end is easy with sys.path.append. Once the append has run, the new directory takes effect immediately and every subsequent import may check it. As the solution shows, you can instead use sys.path.insert(0, ...) so the newly added directory is checked before the others.
Even if sys.path contains duplicates, or a nonexistent directory is accidentally added, it is no big deal: Python's import statement is smart enough to cope. However, each import would then pay a small performance price (repeated unsuccessful searches, OS errors that need further handling). To avoid this needless overhead, the recipe is careful never to add nonexistent or duplicate directories to sys.path. Directories a program adds to sys.path stay in effect only for the lifetime of that program, as do all other dynamic changes to sys.path.
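The recipe's original code is not reproduced in this note, so here is a minimal sketch of the idea described above (the directory '/tmp/my_modules' is hypothetical):
import os
import sys
def add_path(new_dir):
    # Normalize the path, then put it at the front of sys.path only if the
    # directory exists and is not already on the search path.
    new_dir = os.path.abspath(new_dir)
    if os.path.isdir(new_dir) and new_dir not in sys.path:
        sys.path.insert(0, new_dir)
add_path('/tmp/my_modules')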
This note covers the use of the urllib2 library in Python, including sending GET and POST requests and handling response headers and data. It also gives a quick start for the MinIO client, showing how to connect to an object storage server, create a bucket, and upload a file. In addition, it explains the os.path module, the shutil.rmtree() function, and the hashlib library, as well as how to run a Python program in the background on Linux with nohup.