Hand-Writing a URL Parsing Utility Function

IT枫斗者

Published 2024-11-06 05:15:00

License: CC 4.0 BY-SA

Article link: https://blog.youkuaiyun.com/Andrew_Chenwq/article/details/143471333

Background

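In day-to-day development you often need to parse route parameters. It is one of those tasks that looks trivial when you read it and falls apart when you write it, so let's implement a parsing function.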

Approach

First, lay out the parts that make up a complete URL:

protocol, e.g. http, https, ws, wss
host, e.g. localhost, localhost:3000
port, e.g. 3000
pathname, e.g. /test/list
query, e.g. ?id=18298
hash, e.g. #comment
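
To make the list concrete, here is a small sketch that writes out the parts of one sample URL by hand. The sample URL is an assumption assembled from the examples in the list, and the variable names are only for illustration. The point is that the parts, joined in order, give back the original string.

```javascript
// Hypothetical sample URL built from the examples in the list above.
const sample = "http://localhost:3000/test/list?id=18298#comment";

// The parts, written out by hand in the shape this article uses:
// protocol without "://", host including the port.
const parts = {
  protocol: "http",
  host: "localhost:3000",
  port: "3000",
  pathname: "/test/list",
  query: "?id=18298",
  hash: "#comment",
};

// Joining the parts in order reproduces the original string
// (port is already contained in host, so it is not added again).
const rebuilt =
  parts.protocol + "://" + parts.host + parts.pathname + parts.query + parts.hash;

console.log(rebuilt === sample); // true
```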

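A pattern is already visible: each part has its own distinctive leading marker. For example, query starts with a question mark and hash starts with a pound sign. This may not be obvious yet, so here are the test cases used in this article: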

const a = "http://baidu.com?query=edu&id=12897#comments";
const b = "http://baidu.com/search?query=edu&id=12897#comments";
const c = "http://baidu.com/search/list?query=edu&id=12897#comments";
const d = "http://baidu.com:8080/search/list?query=edu&id=12897#comments";
const e = "http://baidu.com#comments";
const f = "http://baidu.com?query=edu#comments";
const g = "baidu.com?query=edu#comments";
const arr = [a, b, c, d, e, f, g];

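Because some parts may be absent (port, query, pathname, hash), the initial approach is to parse from front to back and strip each part from the string once it has been parsed.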
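
The "parse, then strip" idea can be sketched as a small helper that cuts a string at the first of several delimiters. The name takeUntil and its return shape are assumptions made for this illustration and are not part of the article's final code.

```javascript
// Hypothetical helper: split `str` at the first occurrence of any delimiter.
// Returns [head, rest]; the delimiter stays at the start of `rest`.
const takeUntil = (str, delimiters) => {
  let end = str.length; // default: no delimiter found, take everything
  for (const d of delimiters) {
    const i = str.indexOf(d);
    if (i !== -1 && i < end) end = i;
  }
  return [str.slice(0, end), str.slice(end)];
};

// Consume a protocol-less URL from front to back.
let rest = "baidu.com/search?query=edu#comments";
let host, pathname, query;
[host, rest] = takeUntil(rest, ["/", "?", "#"]);
[pathname, rest] = takeUntil(rest, ["?", "#"]);
[query, rest] = takeUntil(rest, ["#"]);

// Whatever is left at the end is the hash.
console.log({ host, pathname, query, hash: rest });
```

Each call returns the parsed piece together with the remainder, so a later step never sees text that an earlier step has already consumed.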

Implementation

Start with a basic skeleton:

const analysisUrl = (url) => {
  const res = {
    protocol: "",
    host: "",
    port: "",
    pathname: "",
    query: "",
    hash: "",
  };
  // ...
  return res;
};

The first step is parsing the protocol, which is fairly simple: find the "://" separator, slice the url, and assign the result:

  const protocolIndex = url.indexOf("://");
  if (protocolIndex > -1) {
    res.protocol = url.slice(0, protocolIndex).toLowerCase();
    url = url.slice(protocolIndex + 3);
  }
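
One case this check does not cover is a "://" that appears later in a URL without a protocol, for example inside a query value. A minimal sketch of a stricter check follows. It assumes the text before "://" only counts as a protocol when it looks like a scheme name (a letter followed by letters, digits, "+", "-" or "."). The helper name splitProtocol is hypothetical.

```javascript
// Hypothetical helper: split off the protocol only when the text before "://"
// looks like a scheme name (a letter, then letters, digits, "+", "-" or ".").
const splitProtocol = (url) => {
  const index = url.indexOf("://");
  if (index > -1 && /^[a-z][a-z0-9+.-]*$/i.test(url.slice(0, index))) {
    return {
      protocol: url.slice(0, index).toLowerCase(),
      rest: url.slice(index + 3),
    };
  }
  // No valid protocol prefix: leave the input untouched.
  return { protocol: "", rest: url };
};

// Protocol is "http", rest is "baidu.com/search".
console.log(splitProtocol("HTTP://baidu.com/search"));

// The "://" sits inside the query, so protocol stays empty.
console.log(splitProtocol("baidu.com?redirect=http://example.com"));
```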

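Next comes the trickier part: splitting out host, port, pathname, query, and hash. Some of these are optional, so the delimiter that follows host varies. The overall approach is still front to back, so the first task is to cut out host, which requires computing the index where it ends. Since the protocol has already been removed, host starts at index 0 and nothing needs to be computed there. The code that finds the end of host is as follows: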

  const slashIndex = url.indexOf("/");
  const queryIndex = url.indexOf("?");
  const hashIndex = url.indexOf("#");
  const hostEnd = Math.min(
    slashIndex === -1 ? Infinity : slashIndex,
    queryIndex === -1 ? Infinity : queryIndex,
    hashIndex === -1 ? Infinity : hashIndex
  );
  // Parse host
  if (hostEnd !== Infinity) {
    res.host = url.slice(0, hostEnd);
    url = url.slice(hostEnd);
  } else {
    res.host = url;
    url = "";
  }

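How should this be read? In the test cases above, the first thing that can follow host is pathname (port is split off separately later), then query, and finally hash. That leads to the Math.min call in the snippet: take the smallest of the three delimiter indexes, treating a missing delimiter as Infinity. Whichever parts are present, the result is always the index where host ends, so host can be cut out first and port derived from it along the way: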

  const portIndex = res.host.indexOf(":");
  if (portIndex > -1) {
    res.port = res.host.slice(portIndex + 1);
  }
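
The snippet above keeps the port inside host as well. If a separate hostname is needed, the split could look like the following sketch. The helper name splitHostPort and the digits-only rule for the port are assumptions, and IPv6 literals are not considered here.

```javascript
// Hypothetical helper: split "hostname:port" into its two parts.
// The text after the last ":" is accepted as a port only if it is
// empty or made of digits; IPv6 literals are not considered here.
const splitHostPort = (host) => {
  const index = host.lastIndexOf(":");
  if (index > -1 && /^\d*$/.test(host.slice(index + 1))) {
    return { hostname: host.slice(0, index), port: host.slice(index + 1) };
  }
  return { hostname: host, port: "" };
};

console.log(splitHostPort("baidu.com:8080")); // hostname "baidu.com", port "8080"
console.log(splitHostPort("baidu.com"));      // hostname "baidu.com", port ""
```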

Next is pathname, following the same pattern. This time the delimiters to check are those of query and hash, that is, the question mark and the pound sign:

  if (url.startsWith("/")) {
    const queryIndex = url.indexOf("?");
    const hashIndex = url.indexOf("#");
    const pathEnd = Math.min(
      queryIndex === -1 ? Infinity : queryIndex,
      hashIndex === -1 ? Infinity : hashIndex
    );
    if (pathEnd !== Infinity) {
      res.pathname = url.slice(0, pathEnd);
      url = url.slice(pathEnd);
    } else {
      res.pathname = url;
      url = "";
    }
  }

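After each part is parsed, remember to update url so that it does not interfere with the later steps. Next is query. Only hash can follow query, so this time it is enough to check whether the current url contains a pound sign: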

  if (url.startsWith("?")) {
    const hashIndex = url.indexOf("#");
    const queryEnd = hashIndex !== -1 ? hashIndex : url.length;
    res.query = url.slice(0, queryEnd);
    url = url.slice(queryEnd);
  }
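
The parser stops at the raw query string. Since the original motivation is reading route parameters, here is a minimal sketch of turning that string into key/value pairs with the built-in URLSearchParams class, which is available in browsers and Node.js. The input value is what the parser would produce for test case a; the variable names are illustrative.

```javascript
// res.query as the parser would produce it for test case `a`.
const query = "?query=edu&id=12897";

// URLSearchParams ignores a leading "?" and decodes percent-encoding.
const params = new URLSearchParams(query);
console.log(params.get("id")); // "12897"

// Convert to a plain object; if a key repeats, the last value wins here.
const asObject = Object.fromEntries(params);
console.log(asObject.query); // "edu"
```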

Finally, hash. If execution reaches this point and url is still not empty, whatever remains is the hash:

  if (url.startsWith("#")) {
    res.hash = url;
  }

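That completes a utility function containing the core parsing logic. For production use you would still need to add validation and handling for special cases. The complete code is as follows: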

const analysisUrl = (url) => {
  const res = {
    protocol: "",
    host: "",
    port: "",
    pathname: "",
    query: "",
    hash: "",
  };
  const protocolIndex = url.indexOf("://");
  if (protocolIndex > -1) {
    res.protocol = url.slice(0, protocolIndex).toLowerCase();
    url = url.slice(protocolIndex + 3);
  }
  const slashIndex = url.indexOf("/");
  const queryIndex = url.indexOf("?");
  const hashIndex = url.indexOf("#");
  const hostEnd = Math.min(
    slashIndex === -1 ? Infinity : slashIndex,
    queryIndex === -1 ? Infinity : queryIndex,
    hashIndex === -1 ? Infinity : hashIndex
  );
  // Parse host
  if (hostEnd !== Infinity) {
    res.host = url.slice(0, hostEnd);
    url = url.slice(hostEnd);
  } else {
    res.host = url;
    url = "";
  }

  // Extract the port from host
  const portIndex = res.host.indexOf(":");
  if (portIndex > -1) {
    res.port = res.host.slice(portIndex + 1);
  }

  // Parse pathname
  if (url.startsWith("/")) {
    const queryIndex = url.indexOf("?");
    const hashIndex = url.indexOf("#");
    const pathEnd = Math.min(
      queryIndex === -1 ? Infinity : queryIndex,
      hashIndex === -1 ? Infinity : hashIndex
    );
    if (pathEnd !== Infinity) {
      res.pathname = url.slice(0, pathEnd);
      url = url.slice(pathEnd);
    } else {
      res.pathname = url;
      url = "";
    }
  }

  // Parse query
  if (url.startsWith("?")) {
    const hashIndex = url.indexOf("#");
    const queryEnd = hashIndex !== -1 ? hashIndex : url.length;
    res.query = url.slice(0, queryEnd);
    url = url.slice(queryEnd);
  }

  // Parse hash
  if (url.startsWith("#")) {
    res.hash = url;
  }
  return res;
};
const a = "http://baidu.com?query=edu&id=12897#comments";
const b = "http://baidu.com/search?query=edu&id=12897#comments";
const c = "http://baidu.com/search/list?query=edu&id=12897#comments";
const d = "http://baidu.com:8080/search/list?query=edu&id=12897#comments";
const e = "http://baidu.com#comments";
const f = "http://baidu.com?query=edu#comments";
const g = "baidu.com?query=edu#comments";
const arr = [a, b, c, d, e, f, g];
console.log(arr.map(analysisUrl));

Other approaches

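This is of course not the only way to parse a url. If you want the most compact code, you can use a regular expression. In an interview you may not get it right on the first try, but it does demonstrate your regex skills: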

function parseURL(url) {
  // Regular expression matching each part of the URL
  const regex = /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;
  const matches = regex.exec(url);

  if (!matches) {
    throw new Error('Invalid URL');
  }

  const result = {
    protocol: matches[2] ? matches[2].toLowerCase() : null, // protocol
    host: matches[4] || null,                               // host and port
    hostname: matches[4] ? matches[4].split(':')[0] : null, // hostname
    port: matches[4] ? (matches[4].split(':')[1] || null) : null, // port
    pathname: matches[5] || '/',                            // path
    search: matches[6] || '',                               // query string
    hash: matches[8] || '',                                 // fragment identifier
    origin: matches[2] && matches[4] ? `${matches[2]}://${matches[4]}` : null // origin
  };

  return result;
}
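
The numbered groups in that pattern are easy to mix up. As a sketch, the same pattern can be written with named capture groups (ES2018 and later), so each part is read by name. The group names and the wrapper parseWithNamedGroups are assumptions for illustration. Every group is optional, so the pattern matches any input string, and a URL without a protocol ends up entirely in pathname.

```javascript
// Same pattern as above, rewritten with named groups (ES2018+).
const URL_PATTERN =
  /^(?:(?<protocol>[^:\/?#]+):)?(?:\/\/(?<host>[^\/?#]*))?(?<pathname>[^?#]*)(?<search>\?[^#]*)?(?<hash>#.*)?/;

// Hypothetical wrapper; unmatched optional groups come back as undefined.
const parseWithNamedGroups = (url) => {
  const { protocol, host, pathname, search, hash } = URL_PATTERN.exec(url).groups;
  return {
    protocol: protocol ? protocol.toLowerCase() : null,
    host: host || null,
    pathname: pathname || "/",
    search: search || "",
    hash: hash || "",
  };
};

console.log(
  parseWithNamedGroups("http://baidu.com:8080/search/list?query=edu&id=12897#comments")
);
```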

Another approach relies on an a element. It works, but only in environments that provide the DOM:

function parseURL(url) {
    const parser = document.createElement('a');
    parser.href = url;

    const result = {
        protocol: parser.protocol,       // protocol, e.g. "http:"
        host: parser.host,               // host and port, e.g. "zhaowa.com:9000"
        hostname: parser.hostname,       // hostname, e.g. "zhaowa.com"
        port: parser.port,               // port, e.g. "9000"
        pathname: parser.pathname,       // path, e.g. "/search/index"
        search: parser.search,           // query string, e.g. "?query=edu"
        hash: parser.hash,               // fragment identifier, e.g. "#comment"
        origin: parser.origin            // origin, e.g. "http://zhaowa.com:9000"
    };

    return result;
}
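
Where no DOM is available, for example in Node.js, the standard URL class exposes the same property names. The sketch below wraps it; the wrapper name parseWithURL and the choice to treat protocol-less input as http are assumptions, since the URL constructor throws on input such as "baidu.com?query=edu#comments" when no base is given.

```javascript
// Hypothetical wrapper around the built-in URL class.
// Assumption: inputs without a protocol should be treated as http.
const parseWithURL = (input) => {
  const hasScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(input);
  const u = new URL(hasScheme ? input : `http://${input}`);
  return {
    protocol: u.protocol, // includes the trailing colon, e.g. "http:"
    host: u.host,
    hostname: u.hostname,
    port: u.port, // empty string when the port is the protocol's default
    pathname: u.pathname, // "/" when the input has no path
    search: u.search,
    hash: u.hash,
    origin: u.origin,
  };
};

console.log(parseWithURL("baidu.com?query=edu#comments"));
console.log(parseWithURL("http://baidu.com:8080/search/list?query=edu&id=12897#comments"));
```

Note that the results differ slightly from the hand-written parser: protocol carries a trailing colon and an empty path is normalized to "/".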