嵌套json下的pandas操作
在进行json处理的时候,经常会遇到需要提取不同级别的字段内容并合并成表格,见下图所示。本篇就总结下遇到这种情况下如何进行文本内容的提取。在知乎上看到一篇《骚操作!嵌套 JSON 秒变 Dataframe》似乎和我遇到的情况类似,不过在实际处理的时候又有不同。
先看我要处理的原始数据:
1{
2 "code": 0,
3 "error": "",
4 "message": "",
5 "data": {
6 "page": 1,
7 "page_size": 3,
8 "total": 1026,
9 "list": [
10 [
11 {
12 "_object_id": "App",
13 "instanceId": "5b655ab36b9c3",
14 "name": "nginx"
15 },
16 {
17 "_object_id": "VIRTUAL_MACHINE",
18 "instanceId": "5b793e7f8f932",
19 "name": "CSSCU_VDU-18"
20 },
21 {
22 "_object_id": "PHYSICAL_MACHINE",
23 "instanceId": "5b471103effff",
24 "name": "IDC-0D12-S-SRV-41"
25 },
26 {
27 "_object_id": "SWITCH",
28 "instanceId": "5b4714cb11685",
29 "name": "IDC-0D11-M-TOR-01"
30 },
31 {
32 "_object_id": "SWITCH",
33 "instanceId": "5b4714ca14f5d",
34 "name": "IDC-0G06-M-EOR-02"
35 }
36 ],
37 [
38 {
39 "_object_id": "App",
40 "instanceId": "5b655ab36b9c3",
41 "name": "nginx"
42 },
43 {
44 "_object_id": "VIRTUAL_MACHINE",
45 "instanceId": "5b793e7f8f932",
46 "name": "CSSCU_VDU-18"
47 },
48 {
49 "_object_id": "PHYSICAL_MACHINE",
50 "instanceId": "5b471103effff",
51 "name": "IDC-0D12-S-SRV-41"
52 },
53 {
54 "_object_id": "SWITCH",
55 "instanceId": "5b4714cb11685",
56 "name": "IDC-0D11-M-TOR-01"
57 },
58 {
59 "_object_id": "SWITCH",
60 "instanceId": "5b4714c98cbd1",
61 "name": "IDC-0F06-M-EOR-01"
62 }
63 ],
64 [
65 {
66 "_object_id": "App",
67 "instanceId": "5b655ab36b9c3",
68 "name": "nginx"
69 },
70 {
71 "_object_id": "VIRTUAL_MACHINE",
72 "instanceId": "5b793e7f8f932",
73 "name": "CSSCU_VDU-18"
74 },
75 {
76 "_object_id": "PHYSICAL_MACHINE",
77 "instanceId": "5b471103effff",
78 "name": "IDC-0D12-S-SRV-41"
79 },
80 {
81 "_object_id": "SWITCH",
82 "instanceId": "5b4714ca966ae",
83 "name": "IDC-0D12-M-TOR-02"
84 },
85 {
86 "_object_id": "SWITCH",
87 "instanceId": "5b4714ca14f5d",
88 "name": "IDC-0G06-M-EOR-02"
89 }
90 ]
91 ]
92 }
93}
上面是一条CMDB路径处理下的三条数据,对应的路径是 APP1(nginx) — 对应的虚拟机—虚拟机所在的物理机 — TOR交换 — EOR交换机。直接通过pandas进行读取截取时,可以获取到如下信息:
1import json
2import pandas as pd
3f = open("vhost.json")
4results = json.load(f)
5print(results["data"]["list"])
6df = pd.DataFrame(results["data"]["list"],columns = ["APP","VM" , "PY", "TOR", "EOR"])
此时处理过以后,虽然可以正常返回对应的五列数据,不过每列的数据里又有json。
参考w3resource 上关于json的处理方法,多次折腾后,发现还是不对。所以就换笨方法,for循环解决。代码如下:
1for col in ["APP","VM" , "PY", "TOR", "EOR"]:
2 df[col] = df.apply(lambda df: df[col]["name"], axis=1)
处理完可以正常返回每个单元格里的json里的name字段。由于返回的这里是有重复值的,比如APP这里都是nginx,所以还会涉及到汇总去重统计,可以直接调用df.nunique()函数搞定。需要拿到每列的去重具体结果,可以使用DataFrame.drop_duplicates方法搞定。
捐赠本站(Donate)
如您感觉文章有用,可扫码捐赠本站!(If the article useful, you can scan the QR code to donate))
- Author: shisekong
- Link: https://blog.361way.com/pandas-nested-json/6592.html
- License: This work is under a 知识共享署名-非商业性使用-禁止演绎 4.0 国际许可协议. Kindly fulfill the requirements of the aforementioned License when adapting or creating a derivative of this work.