Python Notes (2)

This article covers Python's urllib2 library in detail, including sending GET and POST requests and handling response headers and data. It also gives a quick start with the MinIO client, showing how to connect to an object storage server, create a bucket, and upload a file. In addition, it covers the os.path module, shutil.rmtree(), and the hashlib library, as well as how to run a Python program in the background on Linux with nohup.

urllib2和urllib

https://www.pythonforbeginners.com/python-on-the-web/how-to-use-urllib2-in-python/

import urllib2
response = urllib2.urlopen('https://www.pythonforbeginners.com/')
print response.info()
html = response.read()
# do something
response.close()  # best practice to close the file

Note: you can also use a URL starting with "ftp:", "file:", etc.
The remote server accepts the incoming values and formats a plain text response
to send back. 

The return value from urlopen() gives access to the headers from the HTTP server
through the info() method, and the data for the remote resource via methods like
read() and readlines(). 

Additionally, the file object that is returned by urlopen() is iterable. 

Simple urllib2 script

import urllib2
response = urllib2.urlopen('http://python.org/')
print "Response:", response

# Get the URL. This gets the real URL. 
print "The URL is: ", response.geturl()

# Getting the code
print "This gets the code: ", response.code

# Get the Headers. 
# This returns a dictionary-like object that describes the page fetched, 
# particularly the headers sent by the server
print "The Headers are: ", response.info()

# Get the date part of the header
print "The Date is: ", response.info()['date']

# Get the server part of the header
print "The Server is: ", response.info()['server']

# Get all data
html = response.read()
print "Get all data: ", html

# Get only the length
print "Get the length :", len(html)

# Showing that the file object is iterable
for line in response:
    print line.rstrip()

# Note that the rstrip strips the trailing newlines and carriage returns before
# printing the output.

Output (note: the sample output below was captured against https://www.baidu.com, not against the python.org URL in the script):

Response: <addinfourl at 48879368L whose fp = <socket._fileobject object at 0x0000000002A64228>>
The URL is:  https://www.baidu.com
This gets the code:  200
The Headers are:  Accept-Ranges: bytes
Cache-Control: no-cache
Content-Length: 227
Content-Type: text/html
Date: Thu, 25 Apr 2019 11:55:23 GMT
Etag: "5cac0cfb-e3"
Last-Modified: Tue, 09 Apr 2019 03:09:47 GMT
P3p: CP=" OTI DSP COR IVA OUR IND COM "
Pragma: no-cache
Server: BWS/1.1
Set-Cookie: BD_NOT_HTTPS=1; path=/; Max-Age=300
Set-Cookie: BIDUPSID=51E1E36DF6EFA75957C8FF15C55C3A4E; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com
Set-Cookie: PSTM=1556193323; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com
Strict-Transport-Security: max-age=0
X-Ua-Compatible: IE=Edge,chrome=1
Connection: close

The Date is:  Thu, 25 Apr 2019 11:55:23 GMT
The Server is:  BWS/1.1
Get all data:  <html>
<head>
	<script>
		location.replace(location.href.replace("https://","http://"));
	</script>
</head>
<body>
	<noscript><meta http-equiv="refresh" content="0;url=http://www.baidu.com/"></noscript>
</body>
</html>
Get the length : 227

 

Urllib2 Requests

The Request object represents the HTTP request you are making.

In its simplest form you create a request object that specifies the URL you want
to fetch. 

Calling urlopen with this Request object returns a response object for the URL
requested. 

The Request constructor in the urllib2 module accepts both a URL and an optional data parameter.

When you don't include the data (and only pass the URL), the request being made is actually a GET request.

When you do include the data, the request being made is a POST request: the URL is your POST URL and the data parameter is the HTTP POST body.

1) Send a Request object. If the request carries no data, it is a GET request; if it carries data, it is a POST request.

2) The server returns a response object.

import urllib2
import urllib

# Specify the url
url = 'https://www.pythonforbeginners.com'

# This packages the request (it doesn't make it) 
request = urllib2.Request(url)

# Sends the request and catches the response
response = urllib2.urlopen(request)

# Extracts the response
html = response.read()

# Print it out
print html 
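
The example above only issues a GET request. As a complement, here is a minimal sketch of the POST case described above; the endpoint URL and form fields are placeholders, not part of the original article:

import urllib
import urllib2

# Hypothetical POST endpoint and form fields -- replace with your own.
url = 'https://www.example.com/login'
values = {'username': 'alice', 'password': 'secret'}

# urlencode turns the dict into 'username=alice&password=secret'
data = urllib.urlencode(values)

# Passing data to Request makes urlopen issue a POST instead of a GET.
request = urllib2.Request(url, data)
response = urllib2.urlopen(request)
print response.read()
response.close()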

MinIO Client

https://docs.min.io/docs/python-client-quickstart-guide.html

You need four items in order to connect to a MinIO object storage server.

Params        Description
endpoint      URL to object storage service.
access_key    Access key is like user ID that uniquely identifies your account.
secret_key    Secret key is the password to your account.
secure        Set this value to 'True' to enable secure (HTTPS) access.

endpoint is the address of the server that stores your data, i.e. the server you want to connect to.

from minio import Minio
from minio.error import ResponseError

minioClient = Minio('play.min.io:9000',
                  access_key='Q3AM3UQ867SPQQA43P2F',
                  secret_key='zuf+tfteSlswRu7BJ86wekitnifILbZam1KYY3TG',
                  secure=True)

Quick Start Example - File Uploader

This example program connects to a MinIO object storage server, makes a bucket on the server and then uploads a file to the bucket.

We will use the MinIO server running at https://play.min.io:9000 in this example. Feel free to use this service for testing and development. Access credentials shown in this example are open to the public.

file-uploader.py

# Import MinIO library.
from minio import Minio
from minio.error import (ResponseError, BucketAlreadyOwnedByYou,
                         BucketAlreadyExists)

# Initialize minioClient with an endpoint and access/secret keys.
minioClient = Minio('play.min.io:9000',
                    access_key='Q3AM3UQ867SPQQA43P2F',
                    secret_key='zuf+tfteSlswRu7BJ86wekitnifILbZam1KYY3TG',
                    secure=True)

# Make a bucket with the make_bucket API call.
try:
    minioClient.make_bucket("maylogs", location="us-east-1")
except BucketAlreadyOwnedByYou as err:
    pass
except BucketAlreadyExists as err:
    pass
except ResponseError as err:
    raise
else:
    # Put an object 'pumaserver_debug.log' with contents from '/tmp/pumaserver_debug.log'.
    try:
        minioClient.fput_object('maylogs', 'pumaserver_debug.log', '/tmp/pumaserver_debug.log')
    except ResponseError as err:
        print(err)
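
To round out the example, the object can be downloaded back with fget_object. This is a minimal sketch assuming the upload above succeeded; the local target path is a placeholder:

# Download the object back to a local file (sketch; assumes the
# 'maylogs' bucket and object from the upload example exist).
try:
    minioClient.fget_object('maylogs', 'pumaserver_debug.log', '/tmp/pumaserver_debug_copy.log')
except ResponseError as err:
    print(err)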

 

Running a Python program in the background on Linux with nohup

https://blog.youkuaiyun.com/qq_31821675/article/details/78246808

Be careful when running jobs in the background: do not put commands that require user interaction in the background, because the machine will just sit there waiting. Also, a job running in the background still prints its output to the screen, which gets in the way of your work. If a background job produces a lot of output, it is best to redirect its output to a file like this:
command >out.file 2>&1 &
In the example above, 2>&1 means that both standard output and standard error are redirected to a file called out.file. Once the process has been submitted successfully, a process ID is displayed; you can use it to monitor the process or to kill it.
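
For example, to keep a Python script running after you log out (the script name here is just a placeholder):

nohup python my_script.py > out.file 2>&1 &

You can then follow the log with tail -f out.file.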
 


 

shutil.rmtree() 

https://blog.youkuaiyun.com/HappyRocking/article/details/79806808

shutil.rmtree() recursively deletes a directory together with all of its subdirectories and files.

So, to delete a directory on the E: drive, you can use

shutil.rmtree('E:\\myPython\\image-filter\\test', ignore_errors=True)

This removes everything inside the test directory (including test itself) and ignores any errors.
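
A minimal sketch of the same idea with an existence check first; the path is just a placeholder:

import os
import shutil

path = 'E:\\myPython\\image-filter\\test'  # placeholder path

# Only attempt removal if the directory actually exists;
# ignore_errors=True suppresses errors such as permission problems.
if os.path.isdir(path):
    shutil.rmtree(path, ignore_errors=True)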


hashlib

Introduction to digest algorithms -- hashlib

https://www.liaoxuefeng.com/wiki/001374738125095c955c1e6d8bb493182103fac9270762a000/0013868328251266d86585fc9514536a638f06b41908d44000

Python's hashlib provides common digest algorithms such as MD5 and SHA1.

What is a digest algorithm? A digest algorithm is also called a hash algorithm. It uses a function to map data of arbitrary length to a fixed-length string (usually represented as a hexadecimal string).

For example, say you wrote an article whose content is the string 'how to use python hashlib - by Michael' and published its digest '2d73d4f15c0db7f5ecb321b6a65e5d6d'. If someone tampers with your article and publishes it as 'how to use python hashlib - by Bob', you can point out at once that Bob tampered with it, because the digest computed from 'how to use python hashlib - by Bob' differs from the digest of the original article.

So a digest algorithm uses a digest function f() to compute a fixed-length digest from data of arbitrary length; its purpose is to detect whether the original data has been tampered with.

A digest algorithm can detect tampering because the digest function is a one-way function: computing f(data) is easy, but recovering data from the digest is extremely hard. Moreover, changing even a single bit of the original data produces a completely different digest.

Taking the common digest algorithm MD5 as an example, let's compute the MD5 of a string:

import hashlib

md5 = hashlib.md5()
md5.update('how to use md5 in python hashlib?')
print md5.hexdigest()

The result is:

d26a53750bc40b38b65a520292f69306

If the data is large, you can call update() multiple times on chunks; the final result is the same:

import hashlib

md5 = hashlib.md5()
md5.update("dxt00...")

print md5.hexdigest()

md5_ = hashlib.md5()
md5_.update("dxt00")
md5_.update("...")

print md5_.hexdigest()

Output:

88d619bc8acd34781edfd01c9bfa26eb
88d619bc8acd34781edfd01c9bfa26eb
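
The same chunked-update idea applies to large files: read the file block by block and feed each block to update(), so the whole file never has to fit in memory. A minimal sketch (the file name is a placeholder):

import hashlib

md5 = hashlib.md5()
# Read the file in 4 KB blocks and feed each block to update().
with open('big_file.bin', 'rb') as f:
    while True:
        block = f.read(4096)
        if not block:
            break
        md5.update(block)
print md5.hexdigest()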

Using the os.path module in Python

https://blog.youkuaiyun.com/ziyuzhao123/article/details/8811496

1. dirname()   strips the file name and returns the directory path

For example:

>>> import os

>>> os.path.dirname('d:\\library\\book.txt')
'd:\\library'

2. basename()   strips the directory path and returns only the file name

For example:

>>> import os

>>> os.path.basename('d:\\library\\book.txt')
'book.txt'

3. join()   combines the separate parts into a single path name

For example:

>>> import os

>>> os.path.join('d:\\library','book.txt')
'd:\\library\\book.txt'

4. split()   returns a tuple of (directory path, file name)

For example:

>>> import os

>>> os.path.split('d:\\library\\book.txt')
('d:\\library', 'book.txt')

5. splitdrive()   returns a tuple of (drive letter, remaining path)

>>> import os

>>> os.path.splitdrive('d:\\library\\book.txt')
('d:', '\\library\\book.txt')

6. splitext()   returns a tuple of (file name, extension)

For example:

>>> os.path.splitext('d:\\library\\book.txt')
('d:\\library\\book', '.txt')

>>> os.path.splitext('book.txt')
('book', '.txt')
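
Note that the backslash examples behave as shown on Windows; on Linux/macOS, os.path uses '/' as the separator. A small sketch combining these functions (the paths are placeholders):

import os

path = 'd:\\library\\book.txt'  # placeholder Windows-style path

# On Windows these calls give the values shown above.
directory = os.path.dirname(path)
filename = os.path.basename(path)
name, ext = os.path.splitext(filename)

# Rebuild a sibling path with a different extension, e.g. 'd:\\library\\book.md5'
new_path = os.path.join(directory, name + '.md5')
print new_path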


urllib --- urlencode and urldecode

https://blog.youkuaiyun.com/haoni123321/article/details/15814111

urlencode converts a dict's key-value pairs into the form key1=value1&key2=value2...

#-*-coding:utf-8-*-
import urllib

data = {
	'filename':'replay_process',
	'cloudurl':'jkdsjkj'
}
print urllib.urlencode(data)

Output:

cloudurl=jkdsjkj&filename=replay_process
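
The reverse direction (urldecode) in Python 2 is done with urllib.unquote / urllib.unquote_plus, and a whole query string can be parsed back into a dict with urlparse.parse_qs. A minimal sketch with placeholder values:

#-*-coding:utf-8-*-
import urllib
import urlparse

encoded = urllib.urlencode({'filename': 'replay process', 'cloudurl': 'a/b?c'})
print encoded                              # e.g. cloudurl=a%2Fb%3Fc&filename=replay+process

# unquote_plus turns %XX escapes back into characters and '+' back into spaces
print urllib.unquote_plus('replay+process')   # replay process

# parse_qs parses a query string back into a dict of lists
print urlparse.parse_qs(encoded)           # {'cloudurl': ['a/b?c'], 'filename': ['replay process']}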

 


 

URL special characters and their hex-encoded values

 +       In a URL, + stands for a space                     %2B
 space   A space in a URL can be written as + or encoded    %20
 /       Separates a directory from its subdirectories      %2F
 ?       Separates the URL proper from its parameters       %3F
 %       Introduces an escaped (special) character          %25
 #       Marks a bookmark (fragment)                        %23
 &       Separator between parameters in a URL              %26
 =       Assigns a value to a parameter in a URL            %3D

 sys.path.insert()

https://blog.youkuaiyun.com/qq_27923041/article/details/72878635 

Note: modifying sys.path changes Python's module search path at runtime.

A module can only be imported if it lives in a directory on Python's search path, but we don't want to maintain one huge permanent directory, because every other Python script and application would then pay an import-performance penalty. The code in this section adds a directory to the path dynamically, provided the directory exists and is not already in sys.path.

sys.path is a list, so appending a directory to the end is easy: just use sys.path.append. Once the append has executed, the new directory takes effect immediately and every subsequent import may check that directory. As the solution shows, you can instead use sys.path.insert(0, ...), so the newly added directory is checked by import before the other entries.

Even if sys.path contains duplicates, or a non-existent directory is accidentally added, it is not a big deal: Python's import statement is smart enough to cope. However, if such mistakes show up on every import (for example, repeated failed searches, or OS errors that need extra handling), we pay a small performance cost. To avoid this needless overhead, the code in this section is careful when adding to sys.path: it never adds a non-existent directory or a duplicate. Directories a program adds to sys.path are only effective for the lifetime of that program, as is every other dynamic change to sys.path.
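
The code the passage refers to is not reproduced in the original; here is a minimal sketch of the same idea, with a placeholder directory:

import os
import sys

new_dir = '/opt/my_project/libs'  # placeholder directory

# Only add the directory if it really exists and is not already on the path,
# and put it at the front so import checks it before the other entries.
if os.path.isdir(new_dir) and new_dir not in sys.path:
    sys.path.insert(0, new_dir)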
 
