Methods for Downloading Files in Python

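When scraping with Python, you often need to extract the URLs of images or files from HTML and download them to local disk. Listed here are the three most commonly used modules for doing so: urllib, urllib2, and requests. The examples are written for Python 2. The code is as follows: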

import urllib
import urllib2
import requests

url = 'http://www.test.com/wp-content/uploads/2012/06/wxDbViewer.zip'

# urllib: fetch the URL and save it to a local file in one call
print "downloading with urllib"
urllib.urlretrieve(url, "code.zip")

# urllib2: open the URL, read the whole response, then write it out
print "downloading with urllib2"
f = urllib2.urlopen(url)
data = f.read()
with open("code2.zip", "wb") as code:
    code.write(data)

# requests: r.content holds the response body as bytes
print "downloading with requests"
r = requests.get(url)
with open("code3.zip", "wb") as code:
    code.write(r.content)

urllib looks the simplest, since a single statement does the job. Of course, the urllib2 version can be shortened to:

f = urllib2.urlopen(url)
with open("code2.zip", "wb") as code:
    code.write(f.read())
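
Note that the shortened form still loads the whole file into memory through f.read(). For large files, copying the response to disk in chunks may be preferable. The following is a minimal sketch of that idea, written for Python 3, where urllib2.urlopen is available as urllib.request.urlopen. The function name download_streaming and the chunk_size parameter are illustrative assumptions and do not come from the original code.

```python
import shutil
import urllib.request


def download_streaming(url, dest_path, chunk_size=64 * 1024):
    """Copy the response body to dest_path in chunks of chunk_size bytes.

    Hypothetical helper for illustration; returns dest_path when done.
    """
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out_file:
        # copyfileobj reads and writes chunk_size bytes at a time,
        # so the whole file is never held in memory at once.
        shutil.copyfileobj(response, out_file, chunk_size)
    return dest_path
```

A call such as download_streaming(url, "code2.zip") would then play the same role as the urllib2 snippet above.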

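Both urllib2.urlopen and requests.get also accept a timeout argument, which keeps the scraper from blocking indefinitely; urllib.urlretrieve has no such argument. Besides the methods above, the pycurl module can also be used to download files: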

import pycurl
import StringIO

url = 'http://www.test.com/wp-content/uploads/2012/06/wxDbViewer.zip'

##### init the env ###########
c = pycurl.Curl()
c.setopt(pycurl.COOKIEFILE, "cookie_file_name")  # read cookies from this file
c.setopt(pycurl.COOKIEJAR, "cookie_file_name")   # save cookies to this file on close
c.setopt(pycurl.FOLLOWLOCATION, 1)  # follow HTTP redirects
c.setopt(pycurl.MAXREDIRS, 5)       # follow at most 5 redirects
# Proxy settings: uncomment and adjust the values if needed
#c.setopt(pycurl.PROXY, 'http://11.11.11.11:8080')
#c.setopt(pycurl.PROXYUSERPWD, 'aaa:aaa')

########### get the data && save to file ###########
head = ['Accept:*/*', 'User-Agent:Mozilla/5.0 (Windows NT 6.1; WOW64; rv:32.0) Gecko/20100101 Firefox/32.0']
buf = StringIO.StringIO()
c.setopt(pycurl.WRITEFUNCTION, buf.write)  # collect the response body in buf
c.setopt(pycurl.URL, url)
c.setopt(pycurl.HTTPHEADER, head)
c.perform()
c.close()  # release the handle; the cookie jar is written at this point
the_page = buf.getvalue()
buf.close()
with open("./%s" % ("img_filename",), 'wb') as f:
    f.write(the_page)
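
As noted earlier, a timeout keeps a download from blocking forever. The sketch below shows one way this might look with the standard library in Python 3 (urllib.request.urlopen, the successor of urllib2.urlopen). The function name download_with_timeout, its return convention, and the default of 10 seconds are assumptions made for illustration only.

```python
import socket
import urllib.error
import urllib.request


def download_with_timeout(url, dest_path, timeout=10.0):
    """Download url to dest_path, giving up after `timeout` seconds of blocking.

    Hypothetical helper: returns True on success and False if the
    request times out or fails with a network error.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            data = response.read()
    except (urllib.error.URLError, socket.timeout):
        # Connection failures surface as URLError; a stalled read can
        # surface as socket.timeout directly.
        return False
    # Write only after the full body arrived, so no partial file is left behind.
    with open(dest_path, "wb") as out_file:
        out_file.write(data)
    return True
```

With requests, the equivalent would be passing timeout to requests.get, for example requests.get(url, timeout=10).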
