Scraping the Bing Home Page Daily Image with Java Regular Expressions

License: CC 4.0 BY-SA

Tags: #regex #java #bing #image #network

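This article describes how to use Java and a regular expression to grab the daily featured image from the Bing home page. It first analyzes the page structure and finds that the image link is hidden inside a JS script. A regular expression matching the `g_img={url: "..."` fragment extracts the link, and the prefix `http://cn.bing.com` is prepended to form the full URL. It then applies basic Java file operations (checking whether a file exists, creating a file, and writing binary content) to save the image locally. The author also notes that Python's string search methods can achieve the same result more concisely.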


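My Java studies had just reached regular expressions, and I wanted to build something useful with them. I had been meaning to change my desktop wallpaper for a couple of days, and the daily image on the Bing home page looked nice, so I decided to grab it.
The first step is to analyze the structure of the home page. The Bing home page image cannot be picked with the element-selection arrow in the browser developer tools, but it does show up under the Img filter of the Network panel:
[Figure: finding the image]
Copy the link and search for it in the Elements panel. It turns out the link sits inside a JS script, which makes it fairly clear how to proceed.
[Figure: where the link is located]
Pulled out on its own, the fragment that contains the link looks like this: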

g_img={url: "/az/hprichbg/rb/LoxodontaAfricana_ZH-CN10434704249_1920x1080.jpg"}
Extract the link from this fragment and prepend http://cn.bing.com to get the image URL we need. Written as a Java string literal, the regular expression is:

"g_img=\\{url: \"([\\w_\\-/]+?\\.jpg)\""
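The pattern can be tried on its own before touching the network. The following is a minimal sketch, assuming the page still contains a fragment shaped like the one shown above; the class name `BingLinkRegexDemo`, the helper `extractImagePath`, and the surrounding `<script>` text in the sample are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BingLinkRegexDemo {
    // Same pattern as in the article: capture the relative .jpg path after g_img={url: "
    private static final Pattern IMG_PATTERN =
            Pattern.compile("g_img=\\{url: \"([\\w_\\-/]+?\\.jpg)\"");

    // Hypothetical helper: returns the captured relative path, or null if nothing matches.
    static String extractImagePath(String html) {
        Matcher matcher = IMG_PATTERN.matcher(html);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        // The g_img fragment is the one from the article; the script wrapper is invented.
        String sample = "<script>var a=1;g_img={url: "
                + "\"/az/hprichbg/rb/LoxodontaAfricana_ZH-CN10434704249_1920x1080.jpg\"};</script>";
        String path = extractImagePath(sample);
        // Prepend the site prefix to form the full image URL
        System.out.println("http://cn.bing.com" + path);
    }
}
```

Group 1 holds only the relative path, so the quotes and the `g_img={url: ` prefix never end up in the result.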

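At first I included the trailing "}" in the pattern, but then no link was found, so I settled for this version. Once the link is found, we can fetch the binary content of the image and write it to a file from Java. This was also a chance to learn a bit about Java file handling: how to check whether a file already exists, how to create a new file, and how to write binary data.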
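Those three file operations can be sketched in isolation. This is only an illustration under my own assumptions: the class `BinaryFileDemo` and the method `writeNewFile` are hypothetical names, and a temporary file stands in for the downloaded picture.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

class BinaryFileDemo {
    // Hypothetical helper: writes data to a new file and returns true,
    // or returns false and leaves the file untouched if it already exists.
    static boolean writeNewFile(File file, byte[] data) throws IOException {
        // createNewFile() creates the file and returns false when it is already there
        if (!file.createNewFile()) {
            return false;
        }
        // try-with-resources closes the stream, which also flushes it to disk
        try (FileOutputStream out = new FileOutputStream(file)) {
            out.write(data);
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("bing-demo", ".bin");
        file.delete(); // start from a path that does not exist yet
        byte[] data = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF}; // the bytes a JPEG starts with
        System.out.println(writeNewFile(file, data)); // first call creates and writes the file
        System.out.println(writeNewFile(file, data)); // second call sees the existing file
        file.delete();
    }
}
```

Using the return value of `createNewFile()` combines the existence check and the creation in one step, instead of calling `exists()` first.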
The full code is below. It is not long, just a few dozen lines.

import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GetBingPicture {
    public static void main(String[] args) throws Exception {
        GetBingPicture getBingPicture = new GetBingPicture();
        String home = "http://cn.bing.com";
        // Get the relative image link from the home page
        String url = getBingPicture.getUrl(home);
        // Save the picture
        getBingPicture.savePicture(home + url);
    }

    private String getUrl(String homeUrl) throws Exception {
        ByteArrayOutputStream page = new ByteArrayOutputStream();
        // Read the whole page as bytes; try-with-resources closes the stream
        try (InputStream is = new URL(homeUrl).openStream()) {
            byte[] buff = new byte[1024];
            int len;
            while ((len = is.read(buff)) != -1) {
                // Append only the bytes that were actually read
                page.write(buff, 0, len);
            }
        }
        // Decode once at the end so multi-byte characters are not split across buffers
        String html = new String(page.toByteArray(), StandardCharsets.UTF_8);
        // Run the regex match
        Matcher matcher = Pattern.compile("g_img=\\{url: \"([\\w_\\-/]+?\\.jpg)\"").matcher(html);
        // Look for the link
        if (matcher.find()) {
            System.out.println("Found the url: " + matcher.group(1));
            return matcher.group(1);
        } else {
            throw new Exception("Image url not found");
        }
    }

    // Download the image and save it in the working directory
    private void savePicture(String url) throws IOException {
        // Derive the name from the link: the text between the last "/" and the next "_"
        int start = url.lastIndexOf("/") + 1;
        int end = url.indexOf("_", start);
        if (end < 0) {
            // No underscore in the file name: cut at the extension instead
            end = url.lastIndexOf(".");
        }
        // substring is start-inclusive, end-exclusive
        String name = url.substring(start, end) + ".jpg";
        File file = new File(name);
        // Open the link first so a failed connection leaves no empty file behind
        try (InputStream is = new URL(url).openStream()) {
            // createNewFile creates the file and returns false if it already exists
            if (!file.createNewFile()) {
                throw new FileAlreadyExistsException(name + " already exists");
            }
            // Closing the output stream (done automatically here) flushes the data to disk
            try (FileOutputStream fileOutputStream = new FileOutputStream(file)) {
                byte[] buff = new byte[1024];
                int len;
                while ((len = is.read(buff)) != -1) {
                    fileOutputStream.write(buff, 0, len);
                }
            }
        }
        System.out.println(name + " was downloaded successfully");
    }
}
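As a possible alternative to the manual read/write loop in the saving step, `java.nio.file.Files.copy` can stream an `InputStream` straight into a file and throws `FileAlreadyExistsException` by itself when the target exists. The sketch below is not the article's method; `NioSaveDemo`, `fileNameFromUrl`, and `save` are hypothetical names, and a small in-memory byte array stands in for the real download so that it runs offline.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

class NioSaveDemo {
    // Hypothetical helper: ".../Name_ZH-CN..._1920x1080.jpg" becomes "Name.jpg".
    static String fileNameFromUrl(String url) {
        String last = url.substring(url.lastIndexOf('/') + 1);
        int underscore = last.indexOf('_');
        // Without an underscore, keep the last path segment unchanged
        return underscore >= 0 ? last.substring(0, underscore) + ".jpg" : last;
    }

    // Copies the stream into target and returns the number of bytes written.
    // Without REPLACE_EXISTING, Files.copy refuses to overwrite an existing file.
    static long save(InputStream in, Path target) throws IOException {
        return Files.copy(in, target);
    }

    public static void main(String[] args) throws IOException {
        String url = "http://cn.bing.com/az/hprichbg/rb/"
                + "LoxodontaAfricana_ZH-CN10434704249_1920x1080.jpg";
        Path dir = Files.createTempDirectory("bing-demo");
        Path target = dir.resolve(fileNameFromUrl(url));
        // Stand-in bytes instead of new URL(url).openStream(), so no network is needed
        try (InputStream in = new ByteArrayInputStream(new byte[]{1, 2, 3})) {
            System.out.println(save(in, target) + " bytes written to " + target.getFileName());
        }
    }
}
```

In a real run, the `ByteArrayInputStream` would be replaced by the stream opened from the image URL; the rest stays the same.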