Implementing strip_tags in Python
Sometimes it is necessary to remove all (or some subset of) XML-style tags (e.g. <b>) from a string. If you're familiar with PHP, you probably already know the strip_tags() function. Here is a simple equivalent of strip_tags() written in Python. It relies on regular expressions rather than a real parser, so it is suited to simple, well-formed markup.
import re

## Remove xml style tags from an input string.
#
# @param string The input string.
# @param allowed_tags A string to specify tags which should not be removed,
#                     e.g. '<b>,<hr>'.
def strip_tags(string, allowed_tags=''):
    if allowed_tags != '':
        # Get a list of all allowed tag names, e.g. '<b>,<hr>' -> ['b', 'hr'].
        allowed_tags_list = re.sub(r'[\/<> ]+', '', allowed_tags).split(',')
        allowed_pattern = ''
        for s in allowed_tags_list:
            if s == '':
                continue
            # Add all possible patterns for this tag to the regex:
            # tag with attributes (or self-closing), bare opening tag, closing tag.
            if allowed_pattern != '':
                allowed_pattern += '|'
            allowed_pattern += '<' + s + ' [^><]*>$|<' + s + '>|</' + s + '>'
        # Get all tags included in the string.
        all_tags = re.findall(r'</?[^><]+>', string, re.I)
        for tag in all_tags:
            # If not allowed, replace it.
            if not re.match(allowed_pattern, tag, re.I):
                string = string.replace(tag, '')
    else:
        # If no allowed tags, remove all.
        string = re.sub(r'<[^>]*?>', '', string)
    return string
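The pattern-building loop in strip_tags() can be pulled out and inspected on its own. The sketch below is only an illustration and not part of the original function: build_allowed_pattern is a hypothetical helper name, it assumes the third alternative is meant to match the closing tag (such as </b>), and it adds re.escape() as an extra precaution for tag names that contain regex metacharacters.

```python
import re

def build_allowed_pattern(allowed_tags):
    """Hypothetical helper: turn '<b>,<hr>' into a regex matching those tags."""
    # Strip slashes, angle brackets and spaces, then split on commas.
    names = re.sub(r'[\/<> ]+', '', allowed_tags).split(',')
    alternatives = []
    for name in names:
        if name == '':
            continue
        name = re.escape(name)  # assumption: not done in the original code
        alternatives.append('<' + name + r' [^><]*>$')  # with attributes, or self-closing
        alternatives.append('<' + name + '>')            # bare opening tag
        alternatives.append('</' + name + '>')           # closing tag
    return '|'.join(alternatives)

pattern = build_allowed_pattern('<b>,<hr>')
for tag in ['<b>', '</b>', '<hr />', '<i>', '<br>']:
    # re.match anchors at the start of the tag, so '<br>' is not mistaken for '<b>'.
    print(tag, bool(re.match(pattern, tag, re.I)))
```

Note that an input with no usable tag names yields an empty pattern, and an empty pattern matches every tag, so a caller would probably want to treat that case as "no allowed tags".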
Sample output
>>> strip_tags('<b>Hello</b> <i>World!</i> <hr />')
'Hello World! '
>>> strip_tags('<b>Hello</b> <i>World!</i> <hr />', '<b>')
'<b>Hello</b> World! '
>>> strip_tags('<b>Hello</b> <i>World!</i> <hr />', '<b>,<hr>')
'<b>Hello</b> World! <hr />'
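For comparison, the same three cases can be handled in a single pass by letting re.sub() call a function for each tag it finds, instead of collecting tags first and calling str.replace() for each one. This is a different technique from the function above, offered as a rough sketch: strip_tags_single_pass, TAG_RE and keep_or_drop are hypothetical names, and unlike the original, this variant only touches things that look like element tags (it leaves comments such as <!-- ... --> alone).

```python
import re

# An opening, closing or self-closing tag; group 1 captures the tag name.
TAG_RE = re.compile(r'</?([A-Za-z][A-Za-z0-9]*)[^><]*>')

def strip_tags_single_pass(text, allowed_tags=''):
    """Hypothetical variant: decide per tag inside one re.sub() call."""
    # '<b>,<hr>' -> {'b', 'hr'}
    allowed = {name.lower()
               for name in re.findall(r'[A-Za-z][A-Za-z0-9]*', allowed_tags)}

    def keep_or_drop(match):
        # Keep the tag text unchanged if its name is allowed, else drop it.
        return match.group(0) if match.group(1).lower() in allowed else ''

    return TAG_RE.sub(keep_or_drop, text)

sample = '<b>Hello</b> <i>World!</i> <hr />'
print(repr(strip_tags_single_pass(sample)))
print(repr(strip_tags_single_pass(sample, '<b>')))
print(repr(strip_tags_single_pass(sample, '<b>,<hr>')))
```

Because the tag name is compared against a set rather than matched with a generated regex, no escaping of the allowed names is needed here.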
- Author: shisekong
- Link: https://blog.361way.com/python-strip_tags/3937.html
- License: This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. Kindly fulfill the requirements of the aforementioned License when adapting or creating a derivative of this work.