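How to Download Files with Python
When scraping with Python, you often need to pull the URL of an image or file out of HTML and download it to local disk. This post lists the three most commonly used modules for doing that: urllib, urllib2, and requests. The code below is Python 2 (it uses urllib2 and the print statement):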
import urllib
import urllib2
import requests

url = 'http://www.test.com/wp-content/uploads/2012/06/wxDbViewer.zip'

# Option 1: urllib fetches the URL and writes it to a file in one call
print "downloading with urllib"
urllib.urlretrieve(url, "code.zip")

# Option 2: urllib2 returns a file-like response; read it and write it out
print "downloading with urllib2"
f = urllib2.urlopen(url)
data = f.read()
with open("code2.zip", "wb") as code:
    code.write(data)

# Option 3: requests exposes the raw response body as r.content
print "downloading with requests"
r = requests.get(url)
with open("code3.zip", "wb") as code:
    code.write(r.content)
urllib looks like the simplest of the three, since a single statement does the job. The urllib2 version can of course be shortened to:
f = urllib2.urlopen(url)
with open("code2.zip", "wb") as code:
    code.write(f.read())
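On Python 3, urllib2 no longer exists and its functionality lives in urllib.request, so the two standard-library downloads above might look like the sketch below. The helper names download_simple and download_streamed are made up for illustration. Copying the response in chunks rather than calling f.read() once is an addition that is not in the original code; it keeps a large file from being held in memory all at once.

```python
import shutil
import urllib.request


def download_simple(url, dest):
    """One-call download, the Python 3 counterpart of urllib.urlretrieve."""
    urllib.request.urlretrieve(url, dest)
    return dest


def download_streamed(url, dest, chunk_size=64 * 1024):
    """Counterpart of the urllib2 version, copying the body chunk by chunk."""
    # Hypothetical helper: chunk_size is an arbitrary illustrative default.
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)
    return dest
```

With the url from the first example, download_simple(url, "code.zip") and download_streamed(url, "code2.zip") would stand in for the urllib and urllib2 calls respectively.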
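You can also set a timeout on these calls so that a stalled download does not block the scraper indefinitely: urllib2.urlopen and requests.get both accept a timeout argument (urllib.urlretrieve has no such parameter). Besides the modules shown so far, you can also download files with the pycurl module: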
import pycurl
import StringIO

# url is the download address defined in the first example

##### init the env ###########
c = pycurl.Curl()
c.setopt(pycurl.COOKIEFILE, "cookie_file_name")  # read cookies from this file
c.setopt(pycurl.COOKIEJAR, "cookie_file_name")   # save cookies to this file on close
c.setopt(pycurl.FOLLOWLOCATION, 1)  # follow HTTP redirects
c.setopt(pycurl.MAXREDIRS, 5)       # but no more than 5 of them
# Proxy settings: uncomment and fill in suitable values if needed
#c.setopt(pycurl.PROXY, 'http://11.11.11.11:8080')
#c.setopt(pycurl.PROXYUSERPWD, 'aaa:aaa')

########### get the data && save to file ###########
head = ['Accept:*/*','User-Agent:Mozilla/5.0 (Windows NT 6.1; WOW64; rv:32.0) Gecko/20100101 Firefox/32.0']
buf = StringIO.StringIO()
c.setopt(pycurl.WRITEFUNCTION, buf.write)  # collect the response body in buf
c.setopt(pycurl.URL, url)
c.setopt(pycurl.HTTPHEADER, head)
c.perform()
c.close()
the_page = buf.getvalue()
buf.close()
f = open("./%s" % ("img_filename",), 'wb')
f.write(the_page)
f.close()
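The timeout mentioned earlier can be sketched as follows, here with Python 3's urllib.request. The helper name fetch_with_timeout and the 10-second default are assumptions chosen for illustration. Returning False on failure instead of raising is just one possible design.

```python
import socket
import urllib.error
import urllib.request


def fetch_with_timeout(url, dest, timeout=10):
    """Download url to dest; return True on success, False on failure."""
    try:
        # timeout (in seconds) bounds blocking socket operations
        with urllib.request.urlopen(url, timeout=timeout) as response:
            data = response.read()
    except (urllib.error.URLError, socket.timeout) as exc:
        # Connection problems surface as URLError; a stalled read can
        # raise socket.timeout directly, so both are caught here.
        print("download failed: %s" % exc)
        return False
    with open(dest, "wb") as out:
        out.write(data)
    return True
```

The other libraries take the same keyword: urllib2.urlopen(url, timeout=10) in Python 2, and requests.get(url, timeout=10) with requests.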
- Author: shisekong
- Link: https://blog.361way.com/python-download-url-file/3921.html
- License: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. Kindly fulfill the requirements of the aforementioned License when adapting or creating a derivative of this work.