Chapter 5. Text Filtering

1. Regular Expressions

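A regular expression is a special syntax for describing text patterns. It is made up of ordinary characters and special characters (metacharacters).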

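^    ---- matches only at the start of a line
$    ---- matches only at the end of a line
*    ---- matches zero or more occurrences of the preceding single character
[]   ---- matches any one of the characters inside []; use - for a range, e.g. [1-5]
\    ---- escapes a metacharacter so that it loses its special meaning
.    ---- matches any single character
pattern\{n\}    matches exactly n occurrences of the preceding pattern
pattern\{n,\}   matches at least n occurrences of the preceding pattern
pattern\{n,m\}  matches between n and m occurrences of the preceding pattern
eg:
A\{3\}B    AAAB
A\{3,\}B   AAAB AAAAB ...
A\{3,5\}B  AAAB AAAAB AAAAAB
(The \{ \} form is basic regular expression syntax, as used by grep and sed; extended regular expressions, such as grep -E, write A{3}B.)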
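A small sketch, assuming a POSIX shell with grep and mktemp available, that counts which of some made-up sample lines each interval expression matches. It also shows why anchors matter: without ^ and $, A\{3\}B matches AAAAB too, because that line contains the substring AAAB.

```shell
#!/bin/sh
# Sketch: the interval expressions above, tried with grep (basic regular expression syntax).
f=$(mktemp)
printf '%s\n' AB AAB AAAB AAAAB AAAAAB AAAAAAB > "$f"   # invented sample lines

exact=$(grep -c '^A\{3\}B$' "$f")      # exactly three A's: AAAB
atleast=$(grep -c '^A\{3,\}B$' "$f")   # three or more A's
between=$(grep -c '^A\{3,5\}B$' "$f")  # three to five A's
unanchored=$(grep -c 'A\{3\}B' "$f")   # no anchors: any line that *contains* AAAB

echo "exact=$exact atleast=$atleast between=$between unanchored=$unanchored"
rm -f "$f"
```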

2. The find Command

find searches for files and directories.

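find pathname -options [-print -exec -ok]
pathname  ---- the directory to search: . means the current directory, / means the root directory
-print    print the paths of the matching files
-exec     run the given shell command on each matching file, in the form command {} \;  (note the space between {} and \;)
-ok       same as -exec, but asks for confirmation before each command is run

options:
-name     match by file name
-perm     match by permission bits
-user     match by file owner
-group    match by file group
-mtime -n +n (-atime, -ctime)   modified less than n / more than n days ago (likewise for access time and status-change time)
-size n[c]   size of n blocks (512 bytes each), or n bytes with the c suffix
-type     match a particular file type (f regular file, d directory, l symbolic link, ...)
eg.
[test@szbirdora 1]$ find ./ -mtime +5
./helloworld.sh
./nohup.out

Lists the files under ./ (the current directory) that were last modified more than 5 days ago.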
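The options above can be tried without touching real files. This sketch, assuming a POSIX shell plus mktemp, builds a throwaway tree (all file names are hypothetical) and backdates one file with the POSIX touch -t option so that -mtime +5 has something to find.

```shell
#!/bin/sh
# Sketch: build a throwaway directory tree, then query it with find.
dir=$(mktemp -d)
mkdir "$dir/sub"
touch "$dir/new.sh" "$dir/sub/notes.txt"
touch -t 202001010000 "$dir/old.sh"    # -t [[CC]YY]MMDDhhmm: set mtime to 2020-01-01 00:00

sh_files=$(find "$dir" -type f -name '*.sh' | sort)   # quote the pattern so the shell does not expand it
old_files=$(find "$dir" -type f -mtime +5)           # modified more than 5 days ago: only old.sh
txt_sizes=$(find "$dir" -type f -name '*.txt' -exec wc -c {} \;)   # {} is each path, \; ends the command

printf '%s\n' "$sh_files" "$old_files" "$txt_sizes"
rm -rf "$dir"
```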

3. Introduction to grep

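grep -c  print only a count of the matching lines
grep -i  ignore case
grep -h  do not print file names when searching multiple files
grep -H  print the file name with each match
grep -l  when searching multiple files, print only the names of files that contain a match
grep -n  print the matching lines together with their line numbers
grep -s  suppress error messages about nonexistent or unreadable files
grep -v  print all lines that do NOT match (i.e. filter the matching text out)
eg.
[test@szbirdora 1]$ grep -n 's.a' myfile
2:/dev/sda1              20G 3.3G   16G 18% /
4:/dev/sda2              79G   18G   58G 23% /u01
5:/dev/sda4              28G 3.9G   22G 15% /u02
[test@szbirdora 1]$ grep -n '2$' myfile
5:/dev/sda4              28G 3.9G   22G 15% /u02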

General form: grep -options 'regular expression' filename
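A sketch of several of the options listed above, run against two small files with invented contents (assuming a POSIX shell and mktemp). Each result is kept in a variable so it can be inspected.

```shell
#!/bin/sh
# Sketch: grep -c, -i, -v, -l and -n on two throwaway files.
dir=$(mktemp -d)
printf 'Alpha\nbeta\nGamma\nalpha beta\n' > "$dir/f1"
printf 'delta\nepsilon\n' > "$dir/f2"

count=$(grep -c 'alpha' "$dir/f1")             # case-sensitive: only "alpha beta" -> 1
count_i=$(grep -ci 'alpha' "$dir/f1")          # -i: "Alpha" and "alpha beta" -> 2
not_beta=$(grep -v 'beta' "$dir/f1")           # lines without "beta": Alpha, Gamma
with_hits=$(grep -l 'eps' "$dir/f1" "$dir/f2") # only f2 contains "eps"
numbered=$(grep -n '^G' "$dir/f1")             # "3:Gamma"

printf '%s\n' "$count" "$count_i" "$not_beta" "$with_hits" "$numbered"
rm -rf "$dir"
```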

4. Introduction to sed

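sed does not modify the original input file; it works on a copy, and unless the changes are redirected to a file, the result is written to the screen (standard output).
sed is an important text-filtering tool. It can be used as a one-line command, or combined with grep and awk in a pipeline.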

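Invoking sed:
1. Command line: sed [options] '[address]sedcommand' input-files
2. Script file:  sed [options] -f sedscript input-files
How sed locates text in the input:
   - by line number, either a single number or a range of line numbers
   - by regular expression
x      ---- line number x
x,y    ---- the range of lines from x to y
x,y!   ---- every line except lines x to y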
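sed options:
-n   do not print automatically; only lines printed explicitly (for example with p) are output
-e   the next argument is an editing command (use it to give several edits)
-f   read the commands from a sed script file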
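Basic sed commands:
p    print the matching lines
=    print the line number
a    append new text after the addressed line
i    insert new text before the addressed line
d    delete the addressed lines
c    replace the addressed lines with new text
s    replace text matching a pattern with replacement text
r    read text from another file and output it after the addressed line
w    write the addressed lines to a file
q    quit after the first matching line has been processed
l    print the line with control characters shown as their octal ASCII codes
{}   run a group of commands on the addressed lines
n    print the current line (unless -n is given) and replace it with the next input line
g    copy the hold space into the pattern space, replacing its contents
y    transliterate characters, e.g. y/abc/ABC/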
eg.
[test@szbirdora 1]$ sed -n '2p' myfile
c
Prints line 2 of myfile.
[test@szbirdora 1]$ sed -n '2,4p' myfile
c
f
b
Prints lines 2 through 4.
[test@szbirdora 1]$ sed -n '/a/p' myfile
a
Prints the lines that match a.
[test@szbirdora 1]$ sed -n '2,/2/p' myfile
c
f
b
1
2
Prints from line 2 through the first following line that matches '2'.
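A sketch of the addressing forms described above (a line number, a range, a negated range with !, a regex-to-end range, and the = command), assuming a POSIX shell and mktemp; the five-line file a..e is made up for illustration.

```shell
#!/bin/sh
# Sketch: sed line addressing on a throwaway five-line file (a..e).
f=$(mktemp)
printf '%s\n' a b c d e > "$f"

line2=$(sed -n '2p' "$f")          # b
range=$(sed -n '2,4p' "$f")        # b c d
not_range=$(sed -n '2,4!p' "$f")   # a e  (! negates the address)
by_regex=$(sed -n '/c/,$p' "$f")   # from the line matching c to the last line: c d e
numbers=$(sed -n '/d/=' "$f")      # = prints the line number of the match: 4

printf '%s\n' "$line2" "$range" "$not_range" "$by_regex" "$numbers"
rm -f "$f"
```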
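
s command (substitution)
[test@szbirdora 1]$ sed 's/b/a/p' myfile
a
a
a
c
d
e
Replaces b with a. Because -n is not used, the p flag prints each changed line a second time, so a appears three times: the original a, then the changed line twice.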
Multiple edits: -e
eg. (myfile contains the lines a to e)
[test@szbirdora 1]$ sed -e '2d' -e 's/c/d/' myfile 11
a
d
d
e
Line 2 (b) is deleted and c is changed to d.
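A sketch of how the s command behaves with and without the g flag, how & stands for the matched text in the replacement, and how several -e edits combine. The input is generated with printf, so no file is needed (POSIX shell assumed).

```shell
#!/bin/sh
# Sketch: s/// flags and several -e edits, fed from printf instead of a file.
input='one two one
three one'

first_only=$(printf '%s\n' "$input" | sed 's/one/1/')          # only the first match on each line
global=$(printf '%s\n' "$input" | sed 's/one/1/g')             # g: every match on each line
wrapped=$(printf '%s\n' "$input" | sed 's/two/[&]/')           # & stands for the matched text
multi=$(printf '%s\n' "$input" | sed -e '1d' -e 's/three/3/')  # delete line 1, then substitute

printf '%s\n' "$first_only" "$global" "$wrapped" "$multi"
```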
sed command r ---- reads the contents of a file and outputs them after each matching line
eg.
[test@szbirdora 1]$ cat 11
*******************Alaska***************
[test@szbirdora 1]$ sed '/a/r 11' myfile
a
*******************Alaska***************
b
c
d
e
Write command: w   writes the matching lines of the input to the specified file (any previous contents of that file are replaced)
eg.
[test@szbirdora 1]$ cat 11
b
[test@szbirdora 1]$ sed -n '/a/w 11' myfile
[test@szbirdora 1]$ cat 11
a
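A self-contained version of the r and w examples, assuming a POSIX shell and mktemp; the file names input, banner and picked are hypothetical. Note that for both commands the file name runs to the end of the command, which is why each sed script here is a single command.

```shell
#!/bin/sh
# Sketch: r (read a file in after matching lines) and w (write matching lines out).
dir=$(mktemp -d)
printf '%s\n' a b c > "$dir/input"
printf '%s\n' '--- inserted ---' > "$dir/banner"

# r: output the banner file after every line matching b
with_banner=$(sed "/b/r $dir/banner" "$dir/input")

# w: copy the lines matching [ac] into a new file; -n keeps stdout quiet
sed -n "/[ac]/w $dir/picked" "$dir/input"
picked=$(cat "$dir/picked")

printf '%s\n' "$with_banner" "$picked"
rm -rf "$dir"
```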
Append: a   appends text after the matching line. sed expects a backslash (\) after a; if the text spans several lines, end every line except the last with \.
eg.
[test@szbirdora 1]$ sed '/b/a\****************hello*************-------------china---------' myfile
a
b
****************hello*************-------------china---------
c
d
e
Insert: i   inserts text before the matching line. As with a, put a backslash (\) after i, and end every line of multi-line text except the last with \.
eg.
[test@szbirdora 1]$ sed '/b/i\
> THE CHARACTER B IS BEST\
> *******************************' myfile
a
THE CHARACTER B IS BEST
*******************************
b
c
d
e
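A sketch of a, i and the related c command written in the portable multi-line form (text on the line after the backslash, every continued text line ending in another backslash). It assumes a POSIX shell; the text being added is invented.

```shell
#!/bin/sh
# Sketch: a\, i\ and c\ in portable form. Inside single quotes the
# backslash-newline is passed to sed unchanged.
input=$(printf '%s\n' a b c)

appended=$(printf '%s\n' "$input" | sed '/b/a\
after b 1\
after b 2')

inserted=$(printf '%s\n' "$input" | sed '/b/i\
before b')

changed=$(printf '%s\n' "$input" | sed '/b/c\
B replaced')

printf '%s\n' "$appended" "$inserted" "$changed"
```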
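Next: n   prints the current line (unless -n is in effect) and reads the next input line in its place, so the commands that follow operate on that next line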
Quit command: q   quits after the addressed line has been printed, e.g. 3q prints the first three lines and exits
eg.
[test@szbirdora 1]$ sed '3q' myfile
a alert
b best
c cook
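A short sketch of n and q on the numbers 1 to 6, generated with printf (POSIX shell assumed). With -n, the script n;p skips one line and prints the next, so every second line comes out.

```shell
#!/bin/sh
# Sketch: n and q on the numbers 1..6.
nums=$(printf '%s\n' 1 2 3 4 5 6)

# n replaces the current line with the next one; with -n, 'n;p' prints every 2nd line
even=$(printf '%s\n' "$nums" | sed -n 'n;p')
# 3q prints lines 1-3 and then exits without reading further
head3=$(printf '%s\n' "$nums" | sed '3q')

printf '%s\n' "$even" "$head3"
```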
sed script file:
sed -f scriptfile myfile
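A sketch of keeping several commands in a script file and running them with sed -f. The script name edits.sed and the data are hypothetical (POSIX shell and mktemp assumed). Lines starting with # are comments in a sed script, though a first line of exactly #n would act like -n, so the comment here avoids that.

```shell
#!/bin/sh
# Sketch: several sed commands kept in a script file and run with sed -f.
dir=$(mktemp -d)
cat > "$dir/edits.sed" <<'EOF'
# delete blank lines
/^$/d
# transliterate a, b, c to upper case
y/abc/ABC/
# print the input line number of the line that now reads CAB
/CAB/=
EOF
printf 'abc\n\ncab\nxyz\n' > "$dir/data"
result=$(sed -f "$dir/edits.sed" "$dir/data")
printf '%s\n' "$result"
rm -rf "$dir"
```

The = command reports 3 for the cab line because sed counts input lines, including the blank line that was deleted.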

5. awk

awk browses files or strings and extracts information from them according to the rules you specify. awk can be invoked in three ways:

1. Command line

awk [-F field-separator] 'pattern {action}' input-files
awk [-F field-separator] 'command' input-files

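2. awk script
Put all the awk commands in a file, make the file executable, and name the awk interpreter on the first line of the script (for example #!/usr/bin/awk -f, adjusted to wherever awk is installed), so that the script can be run simply by typing its name.
3. awk commands in a separate file, run with -f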

awk -f awk-script-file input-files
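A sketch showing that the same pattern-action program behaves identically whether it is given on the command line or kept in a file and run with -f. The file names people and older.awk and their contents are invented (POSIX shell and mktemp assumed).

```shell
#!/bin/sh
# Sketch: one awk program, run from the command line and from a file with -f.
dir=$(mktemp -d)
printf 'alice 30\nbob 25\ncarol 41\n' > "$dir/people"

# Command line: pattern $2 > 26, action {print $1}
cmdline=$(awk '$2 > 26 {print $1}' "$dir/people")

# Separate program file, run with -f
cat > "$dir/older.awk" <<'EOF'
# print the name of everyone older than 26
$2 > 26 { print $1 }
EOF
fromfile=$(awk -f "$dir/older.awk" "$dir/people")

printf '%s\n' "$cmdline" "$fromfile"
rm -rf "$dir"
```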

An awk script consists of patterns and actions. The basic concepts are the field separator, fields and records.

Note that $1, $2 here are fields, which are different from the shell positional parameters $1, $2. $0 is the entire current record (the whole input line).
eg:
awk '{print $0}' myfile
awk 'BEGIN {print "IP DATE ----"} {print $1"\t"$4} END {print "end-of-report"}' myfile
[test@szbirdora 1]$ df |awk '$1!~"dev"'|grep -v Filesystem
none                   1992400         0   1992400   0% /dev/shm
[test@szbirdora 1]$ df |awk '{if ($1=="/dev/sda1") print $0}'
/dev/sda1             20641788   3367972 16225176 18% /
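A sketch of records, fields and the BEGIN/END blocks, in the spirit of the IP/DATE report above. The log-like input lines are invented; NR (records read) and NF (fields in the current record) are standard awk built-in variables, so $NF is the last field.

```shell
#!/bin/sh
# Sketch: records ($0), fields ($1..$NF), and the BEGIN/END blocks.
data='10.0.0.1 - - 2008-03-10 GET
10.0.0.2 - - 2008-03-11 POST'

report=$(printf '%s\n' "$data" | awk '
BEGIN { print "IP DATE ----" }                       # runs once, before any input is read
      { print $1 "\t" $4 }                           # runs for every record: fields 1 and 4
END   { print NR " records"; print "end-of-report" } # runs once, after the last record
')
last_field=$(printf '%s\n' "$data" | awk 'NR == 1 { print $NF }')   # $NF is the last field

printf '%s\n' "$report" "$last_field"
```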
[test@szbirdora shelltest]$ cat employee
Tom Jones       4424    5/12/66 543354
Mary Adams      5346    11/4/63 28765
Sally Chang     1654    7/22/54 650000
Billy Black     1683    9/23/44 336500
[test@szbirdora shelltest]$ awk '/[Aa]dams/' employee
Mary Adams      5346    11/4/63 28765
[test@szbirdora shelltest]$ sed -n '/[Aa]dams/p' employee
Mary Adams      5346    11/4/63 28765
[test@szbirdora shelltest]$ grep '[Aa]dams' employee
Mary Adams      5346    11/4/63 28765
The same pattern-matching query, done with each of the three commands.
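[test@szbirdora shelltest]$ awk '{print $1}' employee
Tom
Mary
Sally
Billy
Prints the first column of the file.
[test@szbirdora shelltest]$ awk '/Sally/{print $1"\t"$2}' employee
Sally   Chang
Prints the first and second columns of the lines that match Sally.
[test@szbirdora shelltest]$ df |awk '$4>20884623'
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sda2             82567220 17488436 60884616 23% /u01
/dev/sda4             28494620   4589172 22457992 17% /u02
Prints the lines of the df output whose fourth column is greater than 20884623. The header line also appears: its fourth field, Available, is not numeric, so awk compares it as a string, and "Available" sorts after "20884623".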
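A sketch of selecting fields with a non-default separator and filtering on a numeric field. The colon-separated records only imitate the donors file used later; the names and amounts are invented.

```shell
#!/bin/sh
# Sketch: choosing a field separator with -F and filtering on a numeric field.
data='Mike:(510) 548-1278:250
Jody:(206) 548-1278:15
Dan:(406) 298-7744:450'

names=$(printf '%s\n' "$data" | awk -F: '{ print $1 }')
big=$(printf '%s\n' "$data" | awk -F: '$3 > 100 { print $1, $3 }')   # the comma inserts OFS (a space)

printf '%s\n' "$names" "$big"
```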
Formatted output:
The print function
[test@szbirdora shelltest]$ date
Mon Mar 10 15:15:47 CST 2008
[test@szbirdora shelltest]$ date |awk '{print "Month:" $2"\nYear:" $6}'
Month:Mar
Year:2008
[test@szbirdora shelltest]$ awk '/Sally/{print "\t\tHave a nice day,"$1"\t"$2}' employee
                Have a nice day,Sally   Chang
The printf function
[test@szbirdora shelltest]$ echo "LINUX"|awk '{printf "|%-10s|\n",$1}'
|LINUX     |
[test@szbirdora shelltest]$ echo "LINUX"|awk '{printf "|%10s|\n",$1}'
|     LINUX|
%-10s left-justifies the string in a 10-character field; %10s right-justifies it.
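A sketch of a few more printf conversions beyond %s: field width and justification, a fixed number of decimal places, and zero padding. The input values are arbitrary.

```shell
#!/bin/sh
# Sketch: common printf conversions in awk (width, justification, precision).
out=$(echo "LINUX 3.14159 42" | awk '{
    printf "|%-8s|%8s|\n", $1, $1           # left- and right-justified in 8 columns
    printf "|%.2f|%5d|%05d|\n", $2, $3, $3  # 2 decimals; width 5; zero-padded to width 5
}')
printf '%s\n' "$out"
```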
The ~ match operator
[test@szbirdora shelltest]$ awk '$1~/Tom/{print $1,$2}' employee
Tom Jones
Expressions in awk
Relational operators:
<     less than
>     greater than
==    equal to
!=    not equal to
>=    greater than or equal to
<=    less than or equal to
~     matches a regular expression
!~    does not match a regular expression
eg.
[test@szbirdora shelltest]$ cat employee
Tom Jones       4424    5/12/66 543354
Mary Adams      5346    11/4/63 28765
Sally Chang     1654    7/22/54 650000
Billy Black     1683    9/23/44 336500
[test@szbirdora shelltest]$ awk '$2~/Adams/' employee
Mary Adams      5346    11/4/63 28765
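Conditional expression:
expression1 ? expression2 : expression3   (if expression1 is true the value is expression2, otherwise expression3)
eg.
awk '{max=($1>$2) ? $1:$2;print max}' filename

Operators: + - * / % (remainder) ^ (exponentiation) && (logical AND) || (logical OR) ! (logical NOT)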
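A sketch combining the conditional expression with the arithmetic and logical operators just listed, on two invented input lines. A true comparison or logical expression evaluates to 1 and a false one to 0.

```shell
#!/bin/sh
# Sketch: the ?: conditional expression plus arithmetic and logical operators.
out=$(printf '3 8\n9 6\n' | awk '{
    max = ($1 > $2) ? $1 : $2                 # conditional expression
    print max, $1 % $2, $1 ^ 2, ($1 > 5 && $2 > 5)
}')
printf '%s\n' "$out"
```

The comparison inside print is wrapped in parentheses; unparenthesized, a > in a print statement would be taken as output redirection.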
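
[test@szbirdora shelltest]$ cat /etc/passwd |awk -F: '
NF!=7{
printf("line %d does not have 7 fields:%s\n",NR,$0)}
$1!~/[A-Za-z0-9]/{printf("line %d,nonalphanumeric user id:%s\n",NR,$0)}
$2=="*"{printf("line %d,no password:%s\n",NR,$0)}'
This checks /etc/passwd, reporting lines that do not have 7 colon-separated fields, user IDs that contain no letters or digits, and entries whose password field is *.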
awk programming
Increment operators: x++, ++x
Decrement operators: x--, --x
The BEGIN block
BEGIN is followed by an action block that runs before any input is read. It is typically used to change the values of built-in variables such as FS, RS and OFS, to initialize variables, and to print report headings.
[test@szbirdora shelltest]$ awk 'BEGIN{print "HELLO WORLD"}'
HELLO WORLD
[test@szbirdora shelltest]$ awk 'BEGIN{print "---------LIST---------"}{print}END{print "------END--------"}' donors
---------LIST---------
Mike Harrington:(510) 548-1278:250:100:175
Christian Dobbins:(408) 538-2358:155:90:201
Susan Dalsass:(206) 654-6279:250:60:50
Archie McNichol:(206) 548-1348:250:100:175
Jody Savage:(206) 548-1278:15:188:150
Guy Quigley:(916) 343-6410:250:100:175
Dan Savage:(406) 298-7744:450:300:275
Nancy McNeil:(206) 548-1278:250:80:75
John Goldenrod:(916) 348-4278:250:100:175
Chet Main:(510) 548-5258:50:95:135
Tom Savage:(408) 926-3456:250:168:200
Elizabeth Stachelin:(916) 440-1763:175:75:300
------END--------
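A sketch of the typical BEGIN tasks named above: setting FS and OFS before the first record is read and printing a heading, with END printing a running total. The two donors-style records are invented.

```shell
#!/bin/sh
# Sketch: BEGIN sets FS and OFS and prints a heading; END prints a total.
data='Mike Harrington:250:100
Jody Savage:15:188'

out=$(printf '%s\n' "$data" | awk '
BEGIN { FS = ":"; OFS = " | "; print "NAME | FIRST | SECOND" }
      { print $1, $2, $3; total += $2 + $3 }   # total starts out as 0 (uninitialized)
END   { print "total", total }
')
printf '%s\n' "$out"
```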
Redirection and pipes
Output redirection
To redirect awk output to a file, use the output redirection operator >, and enclose the output file name in double quotes. The file is truncated only when awk first opens it, so later prints in the same run are added to it; use >> to append to an existing file.
[test@szbirdora shelltest]$ awk -F: '{print $1,$2>"note"}' donors
[test@szbirdora shelltest]$ cat note
Mike Harrington (510) 548-1278
Christian Dobbins (408) 538-2358
Susan Dalsass (206) 654-6279
Archie McNichol (206) 548-1348
Jody Savage (206) 548-1278
Guy Quigley (916) 343-6410
Dan Savage (406) 298-7744
Nancy McNeil (206) 548-1278
John Goldenrod (916) 348-4278
Chet Main (510) 548-5258
Tom Savage (408) 926-3456
Elizabeth Stachelin (916) 440-1763
Input redirection
The getline function
[test@szbirdora shelltest]$ awk 'BEGIN{"date +%Y"|getline d;print d}'
2008
[test@szbirdora shelltest]$ awk -F"[ :]" 'BEGIN{printf "What is your name?";
getline name<"/dev/tty"}
$1~ name{print "Found\t" name "\ton line",NR"."}
END{print "see ya," name "."}' donors
What is your name?Jody
Found   Jody    on line 5.
see ya,Jody.
[test@szbirdora shelltest]$ awk 'BEGIN{while((getline<"/etc/passwd")>0)lc++;print lc}'
36
When reading from a file, getline returns 1 if it read a record, 0 at end of file, and -1 on error (for example, when the file cannot be opened).
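A sketch of the getline forms just described, all in one BEGIN block: reading a line from a command, counting the records of a file, and getting -1 back for a file that cannot be opened. The file lines.txt and its contents are hypothetical (POSIX shell and mktemp assumed).

```shell
#!/bin/sh
# Sketch: three getline forms in one BEGIN block.
dir=$(mktemp -d)
printf 'one\ntwo\nthree\n' > "$dir/lines.txt"

out=$(awk -v f="$dir/lines.txt" 'BEGIN {
    "echo hello" | getline greeting       # command | getline var: read one line of output
    close("echo hello")
    n = 0
    while ((getline line < f) > 0)        # getline var < file: 1 per record, 0 at end of file
        n++
    close(f)
    status = (getline line < (f ".missing"))   # -1: the file cannot be opened
    print greeting, n, status
}')
printf '%s\n' "$out"
rm -rf "$dir"
```

The comparison is written as (getline ... < f) > 0 with explicit parentheses, because an unparenthesized getline < file > 0 is ambiguous to read.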
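Pipes:
Output can be piped into a command with |. The pipe stays open until it is closed with close(), called with exactly the same command string that appears to the right of |. Some older awk implementations allow only one pipe to be open at a time, so the previous pipe must be closed before the next one is opened.
[test@szbirdora shelltest]$ awk '{print $1,$2|"sort -r +1 -2 +0 -1"}' names
tony tram
john smith
dan savage
john oldenrod
barbara nguyen
elizabeth lone
susan goldberg
george goldberg
eliza goldberg
alice cheba
Close the pipe opened after | with close("command"). The sort options +1 -2 +0 -1 use the obsolete key syntax (sort on the second field, then on the first); modern sort writes this as sort -r -k2,2 -k1,1.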
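A sketch of printing into a pipe and then calling close() in END, so that the sorted output is complete before awk prints its own trailer line. The names are invented.

```shell
#!/bin/sh
# Sketch: print into a pipe, then close() it so sort finishes before the trailer.
out=$(printf 'john smith\nalice cheba\ntony tram\n' | awk '
{ print $2, $1 | "sort" }          # every print goes to the same "sort" process
END {
    close("sort")                  # flush and wait for sort; must match the command string exactly
    print "-- done --"
}')
printf '%s\n' "$out"
```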
The system function
system("Linux command")
system("cat " $1)     (note the space after cat: awk concatenates the two strings into one command)
system("clear")
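Conditional statements
1. if (condition) {statements}
2. if (condition) {statements}
   else {statements}
3. if (condition) {statements}
   else if (condition) {statements}
   else if (condition) {statements}
   else {statements}
[test@szbirdora shelltest]$ awk -F: '{if ($3>250){printf "%-2s%13s\n",$1,"-----------good partman"}else{print $1}}' donors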
Loop statements
[test@szbirdora shelltest]$ awk -F: '{i=1;while(i<=NF){print NF,$i;i++}}' donors
For every record, prints the number of fields NF followed by each field in turn.
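A sketch that puts an if / else if / else chain, a while loop and a for loop into one awk program. The name:amount records are invented; int() truncates toward zero and substr(s, j, 1) extracts one character.

```shell
#!/bin/sh
# Sketch: if / else if / else, a while loop and a for loop in one awk program.
out=$(printf 'Dan:450\nJody:15\nGuy:250\n' | awk -F: '{
    if ($2 > 300)      grade = "high"
    else if ($2 > 100) grade = "medium"
    else               grade = "low"

    n = $2; digits = 0
    while (n > 0) { digits++; n = int(n / 10) }                    # count the digits of $2

    rev = ""
    for (j = length($1); j >= 1; j--) rev = rev substr($1, j, 1)   # reverse the name

    print $1, grade, rev, digits
}')
printf '%s\n' "$out"
```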
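Loop control statements: break and continue
Program control statements
next reads the next line of input and starts executing the awk script again from the top
{if($1~/Peter/){next}
else{print}}
exit stops awk from processing any further input, but the END block is still executed.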
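A sketch of next, exit, continue and break together, on invented input. The STOP record triggers exit, so the Zed record is never read, yet END still runs and reports how many records were read.

```shell
#!/bin/sh
# Sketch: next, exit (END still runs), continue and break.
out=$(printf 'Peter 1\nAnn 2\nBob 3\nSTOP 0\nZed 9\n' | awk '
$1 ~ /Peter/ { next }             # skip this record; the rules below are not tried
$1 == "STOP" { exit }             # stop reading input; Zed is never seen
{
    line = $1 ":"
    for (i = 1; i <= 5; i++) {
        if (i == 2) continue      # skip i == 2
        if (i > $2 + 1) break     # leave the loop early
        line = line i
    }
    print line
}
END { print "records read:", NR }')
printf '%s\n' "$out"
```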
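Arrays:
awk '{name[x++]=$1} END{for(i=0;i<NR;i++){print i,name[i]}}' donors
Stores the first field of every record in the array name (indexes 0, 1, 2, ...) and prints the array once all input has been read.
(P177)---2008.3.11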
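awk arrays are associative, so the index can be any string, not only 0, 1, 2, .... A sketch that counts word occurrences in invented input; for (k in arr) visits keys in an unspecified order, so the output is piped through sort.

```shell
#!/bin/sh
# Sketch: an associative array used as a counter.
out=$(printf 'apple\npear\napple\nplum\napple\npear\n' | awk '
{ count[$1]++ }
END {
    for (fruit in count) print fruit, count[fruit]
    if ("kiwi" in count) print "kiwi seen"    # "in" tests membership without creating a key
}' | sort)
printf '%s\n' "$out"
```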
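awk built-in functions
sub(regular expression, replacement string[, $n]) ---- replaces the first string in field n (or in $0 if no field is given) that matches the regular expression.
[test@szbirdora shelltest]$ awk '{sub(/Tom/,"Jack",$1);print}' employee
Jack Jones 4424 5/12/66 543354
Mary Adams      5346    11/4/63 28765
Sally Chang     1654    7/22/54 650000
Billy Black     1683    9/23/44 336500
Jack He 3000 8/22/44 320000
In the records where a substitution happened, changing $1 makes awk rebuild the line with OFS (a single space), which is why their original spacing is lost.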
index function: index(string, substring) returns the position of substring within string (0 if it does not occur)
[test@szbirdora shelltest]$ awk 'BEGIN{a=index("hello","llo");print a}'
3
length function: length(string) returns the length of the string
[test@szbirdora shelltest]$ awk 'BEGIN{a=length("hello world");print a}'
11
substr function: substr(string, start position[, substring length]); without a length, the rest of the string is returned
[test@szbirdora shelltest]$ awk 'BEGIN{a=substr("hello world",7);print a}'
world
[test@szbirdora shelltest]$ awk 'BEGIN{a=substr("hello world",7,3);print a}'
wor
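match(string, regular expression) returns the position of the first match of the regular expression in the string (0 if there is none). It also sets the built-in variables RSTART, the start position of the match, and RLENGTH, the length of the matched text.
[test@szbirdora shelltest]$ awk '{a=match($0,/Jon/);if (a!=0){print NR,a}}' employee
1 5
[test@szbirdora shelltest]$ awk '{a=match($0,/Jon/);if (a!=0){print NR,a,RSTART,RLENGTH}}' employee
1 5 5 3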
toupper and tolower functions
[test@szbirdora shelltest]$ awk 'BEGIN{a=toupper("hello");print a}'
HELLO
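split function: split(string, array, fieldseparator) splits string into the elements array[1], array[2], ... and returns the number of elements; if the separator is omitted, FS is used
[test@szbirdora shelltest]$ awk 'BEGIN{"date"|getline d;split(d,date);print date[2]}'
Mar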
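A sketch applying the string functions above to one fixed string, so each result can be checked by hand. gsub, the global counterpart of sub, is included for comparison; everything used here is in POSIX awk.

```shell
#!/bin/sh
# Sketch: the awk string functions described above, applied to one string.
out=$(awk 'BEGIN {
    s = "hello world"
    print index(s, "wor")                         # position of "wor": 7
    print length(s)                               # 11
    print substr(s, 1, 5)                         # hello
    if (match(s, /o w/)) print RSTART, RLENGTH    # match starts at 5 and is 3 chars long
    t = s; sub(/o/, "0", t); print t              # first "o" only: hell0 world
    u = s; gsub(/o/, "0", u); print u             # every "o": hell0 w0rld
    print toupper(substr(s, 7))                   # WORLD
    n = split("2008-03-12", part, "-")            # third argument is the separator
    print n, part[1], part[3]                     # 3 2008 12
}')
printf '%s\n' "$out"
```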
Time functions (these are GNU awk extensions)
systime() ---- returns the number of seconds elapsed since 00:00:00 UTC on January 1, 1970
strftime(format, timestamp)
[test@szbirdora shelltest]$ awk 'BEGIN{d=strftime("%T",systime());print d}'
13:08:09
[test@szbirdora shelltest]$ awk 'BEGIN{d=strftime("%D",systime());print d}'
03/12/08
[test@szbirdora shelltest]$ awk 'BEGIN{d=strftime("%Y",systime());print d}'
2008

6. Introduction to sort

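sort:
     -c  check whether the file is already sorted
     -m  merge files that are already sorted
     -u  remove duplicate lines
     -o  name of the file to write the sorted output to
     -t  field separator; use it when fields are separated by something other than spaces or tabs
     +n  n is a field number; sorting starts from this field (note that 0 is the first field). This syntax is obsolete: modern sort uses -k, where fields are numbered from 1 (+1 corresponds to -k2)
     -r  sort in reverse order
     -n  sort the field numerically
[test@szbirdora 1]$ df -lh|grep -v 'Filesystem'|sort +1
none                  2.0G     0 2.0G   0% /dev/shm
/dev/sda1              20G 3.3G   16G 18% /
/dev/sda4              28G 3.9G   22G 15% /u02
/dev/sda2              79G   17G   59G 23% /u01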
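A sketch of -t, -k, -n, -r, -u and -c using the modern -k key syntax rather than the obsolete +n form. The colon-separated data is invented and includes one duplicated line.

```shell
#!/bin/sh
# Sketch: sort with -t, -k, -n, -r, -u and -c on invented colon-separated data.
data='dan:450
jody:15
guy:250
dan:450'

by_name=$(printf '%s\n' "$data" | sort -u)                    # -u drops the repeated dan:450
by_amount=$(printf '%s\n' "$data" | sort -t: -k2,2n -u)       # numeric sort on field 2 only
by_amount_desc=$(printf '%s\n' "$data" | sort -t: -k2,2nr -u) # same key, reversed
if printf '%s\n' "$by_name" | sort -c 2>/dev/null; then is_sorted=yes; else is_sorted=no; fi

printf '%s\n' "$by_name" "$by_amount" "$by_amount_desc" "$is_sorted"
```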

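uniq [option] files   removes or suppresses repeated lines in a text file. uniq only compares adjacent lines, so its input is normally sorted first.
-u  print only the lines that are not repeated
-d  print only the repeated lines, one copy of each
-c  prefix each line with the number of times it occurs
-f n  ignore the first n fields when comparing lines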
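A sketch showing why uniq is usually preceded by sort: in the invented word list below no duplicates are adjacent, so plain uniq removes nothing. The last pipeline finds the most frequent word.

```shell
#!/bin/sh
# Sketch: uniq only collapses *adjacent* duplicates, hence the sort first.
words='b
a
b
c
a
b'
raw=$(printf '%s\n' "$words" | uniq | wc -l)           # nothing adjacent: still 6 lines
uniq_lines=$(printf '%s\n' "$words" | sort | uniq)     # a b c
dups=$(printf '%s\n' "$words" | sort | uniq -d)        # a b (each appears more than once)
singles=$(printf '%s\n' "$words" | sort | uniq -u)     # c
top=$(printf '%s\n' "$words" | sort | uniq -c | sort -rn | head -n 1 | awk '{print $2, $1}')   # b 3
printf '%s\n' "$raw" "$uniq_lines" "$dups" "$singles" "$top"
```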

7. split, cut, join: Commands for Splitting and Joining Files

[test@szbirdora 1]$ split -l 2 myfile split   (splits myfile into files of two lines each, named with the prefix split followed by aa, ab, ac, ...)
[test@szbirdora 1]$ ls
case.sh df.out helloworld.sh iftest.sh myfile nohup.out nullfile.txt parm.sh splitaa splitab splitac splitad splitae
[test@szbirdora 1]$ cat splitaa
Filesystem            Size Used Avail Use% Mounted on
/dev/sda1              20G 3.3G   16G 18% /
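The heading also names cut and join, which the notes do not demonstrate. A sketch of all three, in a throwaway directory with hypothetical file names and invented data: split by line count, cut selected fields, and join two files on their first field (join expects both inputs sorted on that field).

```shell
#!/bin/sh
# Sketch: split into 2-line pieces, cut out columns, join two files on field 1.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf '%s\n' l1 l2 l3 l4 l5 > myfile
split -l 2 myfile part            # creates partaa (l1 l2), partab (l3 l4), partac (l5)
pieces=$(echo part*)

printf 'tom:42:dev\nann:37:ops\n' > staff
names=$(cut -d: -f1,3 staff)      # fields 1 and 3, still joined by ':'

printf 'ann 37\ntom 42\n' > ages  # both inputs sorted on the join field
printf 'ann ops\ntom dev\n' > teams
joined=$(join ages teams)         # ann 37 ops / tom 42 dev

printf '%s\n' "$pieces" "$names" "$joined"
cd / && rm -rf "$dir"
```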
