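Scraping AWS Partner Information
I once wrote a Python script to collect AWS partner information, but later lost track of where I put it. With some free time today I wrote a new one, shown below: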
# code from blog.361way.com
import argparse

import requests
import xlsxwriter


def increment(start, end, step):
    """
    Build an increasing sequence.

    :param start: first value
    :param end: last value (inclusive)
    :param step: step size
    :return: list of values from start to end
    """
    return list(range(start, end + 1, step))


def crawl(num, country):
    """Fetch one page of 10 partners starting at offset num and return the rows."""
    url = 'https://api.finder.partners.aws.a2z.com/search'
    params = {
        'locale': 'en',
        'highlight': 'on',
        'sourceFilter': 'searchPage',
        'size': 10,
        'location': country,
        'from': num,
    }
    response = requests.get(url, params=params, timeout=30)
    response.raise_for_status()
    pdata = response.json()["message"]["results"]

    rows = []
    for dcompany in pdata:
        source = dcompany['_source']
        rows.append([
            dcompany['_id'],
            source['name'],
            country,
            source['brief_description'],
            source['current_program_status'],
            # The original read 'current_program_status' a second time here, which
            # looks like a copy/paste slip; 'customer_type' is an assumed key name.
            source.get('customer_type', ''),
            source['description'],
            source['website'],
        ])
    return rows


def main(country, number):
    # Collect every page of results before writing the workbook
    data_arry = []
    for num in increment(0, number, 10):
        data_arry.extend(crawl(num, country))

    workbook = xlsxwriter.Workbook(country + '.xlsx')
    worksheet = workbook.add_worksheet('Sheet1')
    bold = workbook.add_format({'bold': 1})
    headings = ['ID', 'Name', 'Country', 'Brief_description', 'Current_program_status', 'Customer_type', 'Description', 'Website']
    worksheet.write_row('A1', headings, bold)
    # Row 0 holds the headings, so data starts at row 1
    for row, linev in enumerate(data_arry, start=1):
        worksheet.write_row(row, 0, linev)
    workbook.close()


if __name__ == "__main__":
    # Create the ArgumentParser object
    parser = argparse.ArgumentParser(description='Given a country name and a partner count, export AWS partner information to an .xlsx file')

    # Add the command-line arguments
    parser.add_argument('country', type=str, help='AWS partner country name')
    parser.add_argument('number', type=int, help='number of partners to crawl')

    # Parse the command-line arguments
    args = parser.parse_args()

    # Call main with the parsed arguments
    main(args.country, args.number)
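The script indexes each result directly, so a single partner record with a missing field would stop the whole run. Below is a minimal sketch of a more tolerant extraction step, run against a hand-made sample rather than the live API. The helper names `to_cell` and `extract_row`, the sample values, and the `customer_type` key are all assumptions for illustration; only the `message` / `results` / `_id` / `_source` shape is taken from the script.

```python
# Keys read from each result's '_source', in spreadsheet column order.
# 'customer_type' is an assumed key name, not confirmed against the API.
SOURCE_KEYS = ['brief_description', 'current_program_status',
               'customer_type', 'description', 'website']


def to_cell(value):
    """Flatten a value into something that fits a single spreadsheet cell."""
    if value is None:
        return ''
    if isinstance(value, (list, tuple)):
        return ', '.join(str(item) for item in value)
    return value


def extract_row(result, country):
    """Hypothetical helper: build one row, using '' for any missing field."""
    source = result.get('_source', {})
    row = [result.get('_id', ''), to_cell(source.get('name')), country]
    row += [to_cell(source.get(key)) for key in SOURCE_KEYS]
    return row


# Hand-made sample that mimics the response shape; the values are invented.
sample = {
    'message': {'results': [
        {'_id': 'p-001',
         '_source': {'name': 'Example Co',
                     'website': 'https://example.com',
                     'current_program_status': ['Select', 'Advanced']}},
    ]}
}

rows = [extract_row(r, 'Brazil') for r in sample['message']['results']]
print(rows[0])
```

Whether any field actually comes back as a list is not something the original post states; the join in `to_cell` is only a precaution for that case.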
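The script takes two arguments: the country and the number of entries to collect. The entry count is shown on the AWS partner search page. For example, opening https://partners.amazonaws.com/search/partners/?loc=Brazil shows 274 entries, so you can run python aws_partner.py Brazil 274 to fetch them. The results are saved to Brazil.xlsx, since the file name is taken from the country argument exactly as typed.
Note: this code is for technical research only; please do not use it for unlawful data collection.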
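The entry count matters because each request returns 10 results and the `from` parameter is the offset of the first one. The following sketch shows that arithmetic and how the query string could be assembled with the standard library. `page_offsets` and `build_url` are hypothetical names that do not appear in the script; the endpoint and parameter names are copied from it.

```python
from urllib.parse import urlencode

BASE_URL = 'https://api.finder.partners.aws.a2z.com/search'


def page_offsets(total, page_size=10):
    """Offsets for the 'from' parameter needed to cover `total` entries."""
    return list(range(0, total, page_size))


def build_url(country, offset, size=10):
    """Assemble the search URL; urlencode escapes spaces in the country name."""
    query = {
        'locale': 'en',
        'highlight': 'on',
        'sourceFilter': 'searchPage',
        'size': size,
        'location': country,
        'from': offset,
    }
    return BASE_URL + '?' + urlencode(query)


offsets = page_offsets(274)
print(len(offsets), offsets[0], offsets[-1])  # 28 0 270
print(build_url('Brazil', offsets[-1]))
```

For 274 entries this gives 28 requests, with offsets 0 through 270. The script's own `increment(0, number, 10)` yields the same offsets except when the count is an exact multiple of 10, where it issues one extra request past the last entry.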
- Author: shisekong
- Link: https://blog.361way.com/2023/12/crawl-aws-partner.html
- License: This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. Kindly fulfill the requirements of the aforementioned License when adapting or creating a derivative of this work.