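This link-collecting function is extracted from Snoopy, and it is a handy one. It treats a scraped link according to whether the URL is relative or absolute: a relative link is combined with the site's base URL and returned as an absolute link, and base URLs on a non-default port are supported as well.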

<?php
/*===================================================================*
    Function:   _expandlinks
    Purpose:    expand each link into a fully qualified URL
    Input:      $links          the links to qualify
                $URI            the full URI to get the base from
    Output:     $expandedLinks  the expanded links
*===================================================================*/
function _expandlinks($links, $URI)
{
    $URI_PARTS = parse_url($URI);
    $host = $URI_PARTS["host"];
    // Base directory: drop the query string, a trailing "file.ext", and a trailing slash.
    preg_match("/^[^?]+/", $URI, $match);
    $match = preg_replace("|/[^/.]+\.[^/.]+$|", "", $match[0]);
    $match = preg_replace("|/$|", "", $match);
    // Site root: scheme and host, plus the port when the URI carries one.
    $match_root = $URI_PARTS["scheme"]."://".$host
        .(isset($URI_PARTS["port"]) ? ":".$URI_PARTS["port"] : "");
    // The pairs below are applied in order, each on the result of the previous one.
    $search = array(    "|^".preg_quote($match_root, "|")."(?=/)|i", // same-site absolute link: strip the root
                        "|^(/)|i",                                   // root-relative link
                        "|^(?!https?://)(?!mailto:)|i",              // any other relative link
                        "|/\./|",                                    // collapse "/./"
                        "|/[^/]+/\.\./|"                             // collapse "/dir/../"
                    );
    $replace = array(   "",
                        $match_root."/",
                        $match."/",
                        "/",
                        "/"
                    );
    $expandedLinks = preg_replace($search, $replace, $links);
    return $expandedLinks;
}
// Test cases
$r = _expandlinks('asd/asd.html', 'https://blog.361way.com/');
echo $r;
// output https://blog.361way.com/asd/asd.html
echo '<br />';
$r = _expandlinks('https://blog.361way.com/asd.html', 'https://blog.361way.com/');
echo $r;
// output https://blog.361way.com/asd.html
echo '<br />';
$r = _expandlinks('asd.html', 'https://blog.361way.com:8080/');
echo $r;
// output https://blog.361way.com:8080/asd.html
?>
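
The function leans on the fact that `preg_replace` accepts arrays of patterns and replacements and applies the pairs in order, each pair working on the output of the previous one. The snippet below is a minimal sketch of that behavior on its own. The sample links and the `http://www.test.com/` root are made up for illustration, and only three patterns in the style of the listing are used (with the dots escaped).

```php
<?php
// Illustrative data only: root-relative links containing dot segments.
$links = array("/a/./b.html", "/a/../c.html");

// Pairs are applied in order: prepend the root, then collapse "/./" and "/dir/../".
$search  = array("|^(/)|", "|/\./|", "|/[^/]+/\.\./|");
$replace = array("http://www.test.com/", "/", "/");

// With an array subject, preg_replace returns an array of results.
$out = preg_replace($search, $replace, $links);

print_r($out);
// $out[0] is "http://www.test.com/a/b.html"
// $out[1] is "http://www.test.com/c.html"
```

Because the subject may be an array, a whole list of scraped links can be expanded in one call.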

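Testing shows how the function behaves: the first parameter, $links, is the URL of the link. For example, if a link scraped from a site is [test](asd.html) and the site's base URL is http://www.test.com/, the function resolves the relative path and returns the absolute URL http://www.test.com/asd.html .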
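
For comparison, the same resolution rule can be written without the chained regular expressions. The following is only a rough sketch under stated assumptions: `resolve_link` is a hypothetical helper (it is not part of Snoopy), it assumes the base URL has a scheme and a host, and it does not handle protocol-relative links (`//host/path`) or `..` segments.

```php
<?php
// Hypothetical helper, not part of Snoopy: resolve $link against $base.
function resolve_link($link, $base)
{
    // Links that already carry a scheme (http:, https:, mailto:, ...) are returned unchanged.
    if (preg_match('|^[a-z][a-z0-9+.-]*:|i', $link)) {
        return $link;
    }
    $p = parse_url($base);
    // Site root: scheme and host, plus the port when the base URL has one.
    $root = $p['scheme'] . '://' . $p['host'] . (isset($p['port']) ? ':' . $p['port'] : '');
    if (strpos($link, '/') === 0) {
        // Root-relative link: attach it directly to the site root.
        return $root . $link;
    }
    // Any other relative link: resolve it against the directory of the base path.
    $path = isset($p['path']) ? $p['path'] : '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1);
    return $root . $dir . $link;
}

echo resolve_link('asd.html', 'http://www.test.com/'), "\n";
// http://www.test.com/asd.html
echo resolve_link('/img/a.png', 'http://www.test.com:8080/blog/post.html'), "\n";
// http://www.test.com:8080/img/a.png
```

Splitting the base URL with `parse_url` keeps the port handling explicit, at the cost of handling fewer link forms than a full URL resolver would.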