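Scraping Azure partner information
I once wrote a Python script to collect Azure partner information, but later lost track of where I put it. Since I looked into the AWS partner directory yesterday, today I did the same for Azure. The query page is https://appsource.microsoft.com/en-us/marketplace/partner-dir?filter=products%3DAzure . Inspecting it shows that the real data comes from the JSON endpoint https://main.prod.marketplacepartnerdirectory.azure.com/api/partners?filter= , with the filter conditions appended after `filter=`.
The filter strings follow a pattern like this: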
    products=Azure;sort=1;pageSize=18;pageOffset=18;onlyThisCountry=true;country=BR;radius=100;locname=Brazil;locationNotRequired=true
    products=Azure;sort=1;pageSize=18;pageOffset=36;onlyThisCountry=true;country=BR;radius=100;locname=Brazil;locationNotRequired=true
    products=Azure;sort=1;pageSize=18;pageOffset=54;onlyThisCountry=true;country=BR;radius=100;locname=Brazil;locationNotRequired=true
    products=Azure;sort=1;pageSize=18;pageOffset=72;onlyThisCountry=true;country=BR;radius=100;locname=Brazil;locationNotRequired=true
    products=Azure;sort=1;pageSize=18;pageOffset=90;onlyThisCountry=true;country=BR;radius=100;locname=Brazil;locationNotRequired=true
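- sort=1 or sort=0: the two values select different sort orders. I have not yet worked out how they differ, but either way the results are for the selected region.
- radius=100: restricts results to within 100 km of the location. It can be omitted.
- onlyThisCountry=true;country=BR;radius=100;locname=Brazil: I recommend setting both country and locname. For example, if you search for "brazil" on the page, one of the matches is a place in the United States, so specifying both is more precise.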
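The filter pattern described above can be sketched as a small helper that builds the filter string and the encoded URL for a given page. This is only a minimal sketch: the names `build_filter` and `build_url` are hypothetical, the parameters are the ones observed above, and `radius` is left out because it is optional.

```python
import urllib.parse

API_BASE = 'https://main.prod.marketplacepartnerdirectory.azure.com/api/partners?filter='
PAGE_SIZE = 18  # page size seen in the observed requests


def build_filter(shortname, country, page, sort=1):
    """Return the raw filter string for a zero-based page index (hypothetical helper)."""
    parts = [
        'products=Azure',
        'sort=%d' % sort,
        'pageSize=%d' % PAGE_SIZE,
        'pageOffset=%d' % (page * PAGE_SIZE),  # offset grows by 18 per page
        'onlyThisCountry=true',
        'country=' + shortname,
        'locname=' + country,
        'locationNotRequired=true',
    ]
    return ';'.join(parts)


def build_url(shortname, country, page):
    """Percent-encode the filter string and append it to the endpoint."""
    return API_BASE + urllib.parse.quote(build_filter(shortname, country, page))


print(build_filter('BR', 'Brazil', 1))
print(build_url('BR', 'Brazil', 0))
```

`urllib.parse.quote` encodes `=` and `;` as `%3D` and `%3B`, which matches the `products%3DAzure` form seen in the query page URL.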
With the general pattern understood, the script looks like this:
    # code from blog.361way.com
    import argparse
    import json
    import urllib.parse

    import requests
    import xlsxwriter

    # Rows collected by crawl() and written out by main()
    data_arry = []

    def crawl(shortname, country):
        num = 0  # pageOffset, advanced by 18 per request
        while True:
            # e.g. 'products=Azure;sort=1;pageSize=18;pageOffset=0;onlyThisCountry=true;country=BR;locname=Brazil;locationNotRequired=true'
            params = 'products=Azure;sort=1;pageSize=18;pageOffset=' + str(num) + ';onlyThisCountry=true;country=' + shortname + ';locname=' + country + ';locationNotRequired=true'
            url = 'https://main.prod.marketplacepartnerdirectory.azure.com/api/partners?filter=' + urllib.parse.quote(params)
            num = num + 18
            response = requests.get(url)
            data = json.loads(response.text)
            pdata = data['matchingPartners']['items']
            for partner in pdata:
                partnerId = partner['partnerId']
                name = partner['name']
                description = partner['description']
                # List fields are joined with newlines so each fits in one cell
                product = '\n'.join(partner['product'])
                solutions = '\n'.join(partner['solutions'])
                serviceType = '\n'.join(partner['serviceType'])
                address = str(partner['location']['address'])
                linkedIn = partner['linkedInOrganizationProfile']
                print(partnerId)
                data_arry.append([partnerId, name, country, description, product, solutions, serviceType, address, linkedIn])

            # Fewer than 18 items on a page means it was the last page
            if len(pdata) < 18:
                break

    def main(shortname, country):
        crawl(shortname, country)
        # One workbook per country, named after the country
        workbook = xlsxwriter.Workbook(country + '.xlsx')
        worksheet = workbook.add_worksheet('Sheet1')
        bold = workbook.add_format({'bold': 1})
        headings = ['PartnerId', 'Name', 'Country', 'Description', 'Product', 'Solutions', 'ServiceType', 'Address', 'LinkedIn']
        worksheet.write_row('A1', headings, bold)
        row = 1
        col = 0
        for linev in data_arry:
            worksheet.write_row(row, col, linev)
            row += 1
        workbook.close()

    if __name__ == "__main__":
        # Create the ArgumentParser object
        parser = argparse.ArgumentParser(description='Input the country short code and country name to fetch Azure partner information')

        # Add the command-line arguments
        parser.add_argument('shortname', type=str, help='Country short code, for example: Brazil is BR')
        parser.add_argument('country', type=str, help='Azure partner country name, for example: Brazil')

        # Parse the command-line arguments
        args = parser.parse_args()

        # Call main with the parsed arguments
        main(args.shortname, args.country)
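The script takes two arguments: the country short code and the country name. For Mexico, for example, the short code is MX and the name is Mexico; passing these two values collects the matching partner data. Because each JSON request returns 18 records, the loop exits as soon as a request returns fewer than 18 entries, since that must be the last page.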
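The stop condition described above can be tried out without touching the network. The sketch below is illustrative only: `collect_all` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for the real API by serving 40 fake records 18 at a time.

```python
def collect_all(fetch_page, page_size=18):
    """Collect items page by page until a short page signals the end.

    fetch_page is a hypothetical callable that takes an offset and
    returns the list of items for that page.
    """
    items = []
    offset = 0
    while True:
        page = fetch_page(offset)
        items.extend(page)
        if len(page) < page_size:  # short (or empty) page: last page reached
            break
        offset += page_size
    return items


# Stand-in for the real API: 40 fake partners served 18 at a time.
fake_partners = list(range(40))


def fake_fetch(offset):
    return fake_partners[offset:offset + 18]


result = collect_all(fake_fetch)
print(len(result))  # 40
```

When the total happens to be an exact multiple of 18, this approach makes one extra request that returns an empty page, and the loop ends there.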
To scrape several countries, put the commands in a single bash script:
    python azure_partner.py MX Mexico
    python azure_partner.py CL Chile
    python azure_partner.py AR Argentina
    python azure_partner.py CO Colombia
    python azure_partner.py CR 'Costa Rica'
    python azure_partner.py DO 'Dominican Republic'
    python azure_partner.py GT Guatemala
    python azure_partner.py HN Honduras
    python azure_partner.py PA 'Panama Canal, Panama'
    python azure_partner.py PE Peru
    python azure_partner.py EC Ecuador
Some country names contain spaces, so wrap those in single quotes.
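If the country list grows, the command lines can also be generated instead of typed by hand. The following is a minimal sketch, assuming Python 3.8+ for `shlex.join`; the `build_command` helper and the shortened country list are illustrative, not part of the original script.

```python
import shlex

# A few (short code, country name) pairs taken from the list above
countries = [
    ('MX', 'Mexico'),
    ('CR', 'Costa Rica'),
    ('PA', 'Panama Canal, Panama'),
]


def build_command(shortname, country):
    """Return a shell-safe command line; names containing spaces get quoted."""
    return shlex.join(['python', 'azure_partner.py', shortname, country])


for shortname, country in countries:
    print(build_command(shortname, country))
```

`shlex.join` leaves plain words such as `Mexico` untouched and wraps names such as `Costa Rica` in single quotes, which is the same quoting used in the bash script.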
Note: this code is for technical research only. Please do not use it for illegal data collection.
- Author: shisekong
- Link: https://blog.361way.com/2023/12/crawl-azure-partner.html
- License: This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. Kindly fulfill the requirements of the aforementioned License when adapting or creating a derivative of this work.