Quantitative Stock Analysis (1): Getting the A-Share List

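The 2015 stock market is a hot topic right now. A friend of a colleague built a simple single-page site for chatting and griping about stocks, and it reached 300,000+ page views in a single day, roughly what this blog gets in a whole year. So, from a technical angle, I am writing a few articles on the technical side of stocks. This first one starts with how to get the list of A shares.

Goal: get the full list of A shares currently listed on the Shanghai Stock Exchange (SSE) and the Shenzhen Stock Exchange (SZSE).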

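I. Getting the list from the official sites

There are two official sites:

1. The SSE official site

2. The SZSE official site

The difference is that the SZSE offers a direct Excel export,

while the SSE is more of a pain: it has no download page, so the data has to be scraped from its pages. Analyzing the page shows that the full list of securities is tucked away in JS files, as follows: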

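http://www.sse.com.cn/js/common/ssesuggestdata.js (A shares + B shares)

http://www.sse.com.cn/js/common/ssesuggestEbonddata.js (convertible bonds)

Since only A shares matter here, we keep just the entries in the first JS file whose code starts with 60. The file can be fetched with curl or wget and reduced to a plain list with a little shell processing: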

# Data format inside the JS file
function get_data(){
var _t = new Array();
_t.push({val:"600000",val2:"浦发银行",val3:"pfyx"});
_t.push({val:"600004",val2:"白云机场",val3:"byjc"});
_t.push({val:"600005",val2:"武钢股份",val3:"wggf"});
_t.push({val:"600006",val2:"东风汽车",val3:"dfqc"});
…………………………
# Output format after shell processing
# by 运维之路 (www.361way.com)
[root@361way ~]# wget http://www.sse.com.cn/js/common/ssesuggestdata.js
# Capture the three quoted fields (code, name, pinyin) and print them space-separated
[root@361way ~]# grep push ssesuggestdata.js | sed -E 's/.*val:"([^"]*)",val2:"([^"]*)",val3:"([^"]*)".*/\1 \2 \3/' | grep '^60'
600000 浦发银行 pfyx
600004 白云机场 byjc
600005 武钢股份 wggf
600006 东风汽车 dfqc
……………………

So this approach is fairly simple and quick. Alternatively, selenium + python can drive a browser to scrape the page; that is covered separately below.
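The same extraction can also be done in Python rather than shell. The following is a minimal sketch (Python 3 syntax), assuming the JS file keeps the `_t.push({val:...,val2:...,val3:...})` layout shown above. The names `fetch_js` and `parse_a_shares`, the `utf-8` default encoding, and the made-up non-A-share sample row are illustrative assumptions, not anything documented by the SSE.

```python
import re
from urllib.request import urlopen

# Matches one entry such as: _t.push({val:"600000",val2:"浦发银行",val3:"pfyx"});
ENTRY_RE = re.compile(r'val:"([^"]*)",val2:"([^"]*)",val3:"([^"]*)"')


def fetch_js(url, encoding="utf-8"):
    """Download the JS file and decode it (the encoding is an assumption)."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode(encoding, errors="replace")


def parse_a_shares(js_text, prefix="60"):
    """Return (code, name, pinyin) tuples whose code starts with prefix."""
    return [
        m.groups()
        for m in ENTRY_RE.finditer(js_text)
        if m.group(1).startswith(prefix)
    ]


# Two rows copied from the listing above plus one made-up row that is filtered out.
sample = '''
_t.push({val:"600000",val2:"浦发银行",val3:"pfyx"});
_t.push({val:"600004",val2:"白云机场",val3:"byjc"});
_t.push({val:"900999",val2:"made-up row",val3:"xxxx"});
'''
a_shares = parse_a_shares(sample)
for code, name, pinyin in a_shares:
    print(code, name, pinyin)
```

To run it against the live file, pass the result of `fetch_js("http://www.sse.com.cn/js/common/ssesuggestdata.js")` to `parse_a_shares`; whether that URL still serves the same format is not guaranteed.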

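II. Getting the list from third-party sites

With the official-site approach, data has to be pulled from the two exchanges separately. Many third-party sites, on the other hand, pay the two exchanges their "protection money", so they can serve the data directly through an API and combine the Shanghai and Shenzhen A-share data in one place. The four better ones in China are:

1. Tencent Securities: http://stockapp.finance.qq.com/mstats/#mod=list

2. Sina Finance: http://finance.sina.com.cn/data/#stock-schq-hsgs

3. ifeng Finance: http://app.finance.ifeng.com/list/stock.php?t=ha

4. East Money: http://quote.eastmoney.com/center/list.html#33

Of these four, Tencent (the penguin) is the most user-friendly: besides all kinds of sorting, it also supports Excel export, with the A shares of both the Shanghai and Shenzhen markets in a single file. I have never been fond of that fat penguin, but to be fair, it has done a good job here. The other three require scraping the pages.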
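For the official-site route, where the two exchanges' lists arrive separately, combining them by hand might look like the sketch below. It is only an illustration: `merge_lists`, the `"SH"`/`"SZ"` tags, and the sample rows are assumptions made here, and the input is taken to be plain (code, name) pairs already extracted from each source.

```python
def merge_lists(sh_rows, sz_rows):
    """Merge per-exchange (code, name) rows into one dict keyed by stock code."""
    merged = {}
    for exchange, rows in (("SH", sh_rows), ("SZ", sz_rows)):
        for code, name in rows:
            merged[code] = {"name": name, "exchange": exchange}
    return merged


# Illustrative sample rows, one list per exchange.
sh = [("600000", "浦发银行"), ("600004", "白云机场")]
sz = [("000001", "平安银行")]

merged = merge_lists(sh, sz)
for code in sorted(merged):
    print(code, merged[code]["exchange"], merged[code]["name"])
```

Keying the result by code keeps lookups cheap and makes accidental duplicates within one source collapse into a single entry.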

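III. Scraping data with selenium + python

Compared with the two approaches above, this is the clumsiest and also the slowest way to get the data. It is not recommended unless there is no alternative (use modules such as requests or urllib2 whenever possible). Still, selenium is a seriously powerful module, used mostly for automated testing and for especially awkward scraping targets, so treat this as a learning exercise. The code first: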

[root@localhost stock]# cat get_sh.py
# -*- encoding: utf-8 -*-
# by 运维之路 (361way.com)
import sys
import cPickle
#import pickle
from pyvirtualdisplay import Display
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait         # available since 2.4.0
#   from selenium.common.exceptions import TimeoutException
#   from selenium.webdriver.support import expected_conditions as EC    # available since 2.26.0

# Start a virtual display so Firefox can run on a machine without X
display = Display(visible=0, size=(1024, 768))
display.start()

def wait_condition_01(driver):
    # Wait condition: the pager input of the reloaded page is present
    return driver.find_element_by_id('dateList_container_pageid')

def extract_table(driver, stocklist):
    # Append the text rows of the current page's table, skipping the header row
    tag_table = driver.find_element_by_class_name("tablestyle")
    tabletext = tag_table.text
    stocklist.extend(tabletext.split('\n')[1:])

driver = webdriver.Firefox()
driver.get("http://www.sse.com.cn/assortment/stock/list/name/")
stocklist = []
extract_table(driver=driver, stocklist=stocklist)
tag_meta = driver.find_element_by_id("staticPagination")
attr_total = int(tag_meta.get_attribute("total"))
attr_pageCount = int(tag_meta.get_attribute("pageCount"))

# Extract the content page by page
for pagenr in range(2, attr_pageCount+1):
    # The first page uses different element ids for the pager than later pages
    id_input = 'dateList_container_pageid' if pagenr > 2 else 'xsgf_pageid'
    id_button = 'dateList_container_togo' if pagenr > 2 else 'xsgf_togo'
    tag_input = driver.find_element_by_id(id_input)
    tag_button = driver.find_element_by_id(id_button)
    tag_input.send_keys(str(pagenr))
    tag_button.click()
    WebDriverWait(driver, 10).until(wait_condition_01)
    extract_table(driver=driver, stocklist=stocklist)

# Send the result to the calling process
# Keys: '个股总数' = total number of stocks, '个股列表' = list of stocks
data = {
    '个股总数': attr_total,
    '个股列表': stocklist,
}
driver.quit()
display.stop()
#pdata = pickle.dumps(data, protocol=2)
pdata = cPickle.dumps(data, protocol=2)
sys.stdout.write(pdata + b'\n')

The following problems may come up along the way:

Problem 1: running selenium + python directly raises an error

The error looks like this:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.6/site-packages/selenium/webdriver/firefox/webdriver.py", line 64, in __init__
    self.binary, timeout),
  File "/usr/lib/python2.6/site-packages/selenium/webdriver/firefox/extension_connection.py", line 51, in __init__
    self.binary.launch_browser(self.profile)
  File "/usr/lib/python2.6/site-packages/selenium/webdriver/firefox/firefox_binary.py", line 70, in launch_browser
    self._wait_until_connectable()
  File "/usr/lib/python2.6/site-packages/selenium/webdriver/firefox/firefox_binary.py", line 100, in _wait_until_connectable
    raise WebDriverException("The browser appears to have exited "
selenium.common.exceptions.WebDriverException: Message: The browser appears to have exited before we could connect. If you specified a log_file in the FirefoxBinary constructor, check it for details.

The fix is to add the pyvirtualdisplay module and call it as follows:

#!/usr/bin/env python
from pyvirtualdisplay import Display
from selenium import webdriver
display = Display(visible=0, size=(1024, 768))
display.start()
browser = webdriver.Firefox()
browser.get('http://www.ubuntu.com/')
print browser.page_source
browser.close()
display.stop()

Problem 2: selenium + python + pyvirtualdisplay raises an error

The error output is:

>>> from pyvirtualdisplay import Display
>>> from selenium import webdriver
>>> display = Display(visible=0, size=(1024, 768))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.6/site-packages/pyvirtualdisplay/display.py", line 33, in __init__
    self._obj = self.display_class(
  File "/usr/lib/python2.6/site-packages/pyvirtualdisplay/display.py", line 51, in display_class
    cls.check_installed()
  File "/usr/lib/python2.6/site-packages/pyvirtualdisplay/xvfb.py", line 38, in check_installed
    ubuntu_package=PACKAGE).check_installed()
  File "/usr/lib/python2.6/site-packages/easyprocess/__init__.py", line 209, in check_installed
    raise EasyProcessCheckInstalledError(self)
easyprocess.EasyProcessCheckInstalledError: cmd=['Xvfb', '-help']
OSError=[Errno 2] No such file or directory
Program install error!

According to the PyPI page, pyvirtualdisplay needs one of Xvfb, Xephyr, or Xvnc as its backend. I picked the first one and installed it as follows:

# on CentOS
yum -y install xorg-x11-server-Xvfb
# on Ubuntu
sudo apt-get install xvfb
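To catch a missing backend before pyvirtualdisplay raises the error above, a script can check the PATH first. This is a small sketch in Python 3 syntax using only the standard library; `find_backend` and the `BACKENDS` tuple are names made up for this example, and the check only looks for the three binaries named above rather than calling into pyvirtualdisplay itself.

```python
import shutil

# Backend binaries named on the pyvirtualdisplay PyPI page, in order of preference.
BACKENDS = ("Xvfb", "Xephyr", "Xvnc")


def find_backend(which=shutil.which):
    """Return the name of the first backend binary found on PATH, or None."""
    for name in BACKENDS:
        if which(name):
            return name
    return None


backend = find_backend()
if backend is None:
    print("No X backend found; install Xvfb, Xephyr, or Xvnc first.")
else:
    print("Using backend:", backend)
```

The `which` parameter exists only so the lookup can be swapped out in tests; normal callers can leave it at its default.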

After that, python get_sh.py retrieves the data normally. The resulting list is not directly usable, though, and needs further processing.
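What that further processing looks like depends on the columns of the scraped table, which are not shown here. As a hedged sketch (Python 3 syntax), assume each row is a line of whitespace-separated text whose first field is a six-digit stock code and whose second field is the short name; `parse_rows` and the sample rows are hypothetical.

```python
import re


def parse_rows(rows):
    """Split raw table rows into (code, name) pairs, skipping malformed rows."""
    parsed = []
    for row in rows:
        fields = row.split()
        # Keep only rows that start with a six-digit code and have a name field.
        if len(fields) >= 2 and re.fullmatch(r"\d{6}", fields[0]):
            parsed.append((fields[0], fields[1]))
    return parsed


# Hypothetical rows: two data rows, a stray header row, and an empty line.
raw = ["600000 浦发银行 extra-column", "证券代码 证券简称", "", "600004 白云机场"]
stocks = parse_rows(raw)
for code, name in stocks:
    print(code, name)
```

Names that themselves contain spaces would be cut short by this whitespace split, so the real table layout should be checked before relying on it.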

