python学习之 requests库

最新推荐文章于 2024-10-28 11:50:28 发布

转载最新推荐文章于 2024-10-28 11:50:28 发布 · 642 阅读

机器学习专栏收录该内容

32 篇文章

订阅专栏

本文介绍如何使用Python的Requests库简化HTTP请求操作，包括GET、POST等请求方式，并讲解了如何处理请求参数、响应内容、自定义头部信息及管理Cookies。

摘要生成于 C知道，由 DeepSeek-R1 满血版支持，前往体验 >

官方文档http://docs.python-requests.org/en/master/user/quickstart/#make-a-request

requests是urllib的一个封装，能够用简单的html方法来实现我们需要的操作，同时在cookie，headers等方面都能够实现自我定制

Make a Request

Making a request with Requests is very simple.

Begin by importing the Requests module:

 
   >>> import requests

Now, let's try to get a webpage. For this example, let's get GitHub's public timeline:

 
   >>> r = requests.get('https://api.github.com/events')

Now, we have a Response object called r. We can get all the information we need from this object.

Requests' simple API means that all forms of HTTP request are as obvious. For example, this is how you make an HTTP POST request:

 
   >>> r = requests.post('http://httpbin.org/post', data = {'key':'value'})

Nice, right? What about the other HTTP request types: PUT, DELETE, HEAD and OPTIONS? These are all just as simple:

 
   >>> r = requests.put('http://httpbin.org/put', data = {'key':'value'})
>>> r = requests.delete('http://httpbin.org/delete')
>>> r = requests.head('http://httpbin.org/get')
>>> r = requests.options('http://httpbin.org/get')

That's all well and good, but it's also only the start of what Requests can do.

Passing Parameters In URLs

You often want to send some sort of data in the URL's query string. If you were constructing the URL by hand, this data would be given as key/value pairs in the URL after a question mark, e.g.httpbin.org/get?key=val. Requests allows you to provide these arguments as a dictionary, using theparams keyword argument. As an example, if you wanted to pass key1=value1 and key2=value2 tohttpbin.org/get, you would use the following code:

 
     >>> payload = {'key1': 'value1', 'key2': 'value2'}
>>> r = requests.get('http://httpbin.org/get', params=payload)

You can see that the URL has been correctly encoded by printing the URL:

 
     >>> print(r.url)
http://httpbin.org/get?key2=value2&key1=value1

Note that any dictionary key whose value is None will not be added to the URL's query string.

You can also pass a list of items as a value:

>>> payload = {'key1': 'value1', 'key2': ['value2', 'value3']}

>>> r = requests.get('http://httpbin.org/get', params=payload)
>>> print(r.url)
http://httpbin.org/get?key1=value1&key2=value2&key2=value3

Response Content

We can read the content of the server's response. Consider the GitHub timeline again:

 
    >>> import requests

>>> r = requests.get('https://api.github.com/events')
>>> r.text
u'[{"repository":{"open_issues":0,"url":"https://github.com/...

Requests will automatically decode content from the server. Most unicode charsets are seamlessly decoded.

When you make a request, Requests makes educated guesses about the encoding of the response based on the HTTP headers. The text encoding guessed by Requests is used when you access r.text. You can find out what encoding Requests is using, and change it, using the r.encoding property:

 
    >>> r.encoding
'utf-8'
>>> r.encoding = 'ISO-8859-1'

If you change the encoding, Requests will use the new value of r.encoding whenever you call r.text. You might want to do this in any situation where you can apply special logic to work out what the encoding of the content will be. For example, HTTP and XML have the ability to specify their encoding in their body. In situations like this, you should use r.content to find the encoding, and then set r.encoding. This will let you use r.text with the correct encoding.

Requests will also use custom encodings in the event that you need them. If you have created your own encoding and registered it with the codecs module, you can simply use the codec name as the value ofr.encoding and Requests will handle the decoding for you.

JSON Response Content

There's also a builtin JSON decoder, in case you're dealing with JSON data:

 
   >>> import requests

>>> r = requests.get('https://api.github.com/events')
>>> r.json()
[{u'repository': {u'open_issues': 0, u'url': 'https://github.com/...

In case the JSON decoding fails, r.json() raises an exception. For example, if the response gets a 204 (No Content), or if the response contains invalid JSON, attempting r.json() raises ValueError: No JSONobject could be decoded.

It should be noted that the success of the call to r.json() does not indicate the success of the response. Some servers may return a JSON object in a failed response (e.g. error details with HTTP 500). Such JSON will be decoded and returned. To check that a request is successful, use r.raise_for_status() or checkr.status_code is what you expect.

Raw Response Content

In the rare case that you'd like to get the raw socket response from the server, you can access r.raw. If you want to do this, make sure you set stream=True in your initial request. Once you do, you can do this:

 
   >>> r = requests.get('https://api.github.com/events', stream=True)

>>> r.raw
<requests.packages.urllib3.response.HTTPResponse object at 0x101194810>

>>> r.raw.read(10)

Custom Headers

If you'd like to add HTTP headers to a request, simply pass in a dict to the headers parameter.

For example, we didn't specify our user-agent in the previous example:

 
  >>> url = 'https://api.github.com/some/endpoint'
>>> headers = {'user-agent': 'my-app/0.0.1'}

>>> r = requests.get(url, headers=headers)

Note: Custom headers are given less precedence than more specific sources of information. For instance:

Authorization headers set with headers= will be overridden if credentials are specified in .netrc, which in turn will be overridden by the auth= parameter.
Authorization headers will be removed if you get redirected off-host.
Proxy-Authorization headers will be overridden by proxy credentials provided in the URL.
Content-Length headers will be overridden when we can determine the length of the content.

Furthermore, Requests does not change its behavior at all based on which custom headers are specified. The headers are simply passed on into the final request.

Note: All header values must be a string, bytestring, or unicode. While permitted, it's advised to avoid passing unicode header values.

Cookies

If a response contains some Cookies, you can quickly access them:

 
   >>> url = 'http://example.com/some/cookie/setting/url'
>>> r = requests.get(url)

>>> r.cookies['example_cookie_name']
'example_cookie_value'

To send your own cookies to the server, you can use the cookies parameter:

 
   >>> url = 'http://httpbin.org/cookies'
>>> cookies = dict(cookies_are='working')

>>> r = requests.get(url, cookies=cookies)
>>> r.text
'{"cookies": {"cookies_are": "working"}}'

Cookies are returned in a RequestsCookieJar, which acts like a dict but also offers a more complete interface, suitable for use over multiple domains or paths. Cookie jars can also be passed in to requests:

>>> jar = requests.cookies.RequestsCookieJar()
>>> jar.set('tasty_cookie', 'yum', site='httpbin.org', path='/cookies')
>>> jar.set('gross_cookie', 'blech', site='httpbin.org', path='/elsewhere')
>>> url = 'http://httpbin.org/cookies'
>>> r = requests.get(url, cookies=jar)
>>> r.text
'{"cookies": {"tasty_cookie": "yum"}}'