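I. Approach
Use a browser driver to load the page and take a screenshot, obtain the image as a byte stream, upload it to the image storage server to get back an image URL, and finally save that URL to the database.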
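The flow described above can be sketched as three pluggable steps. This is only an illustration: the class name `ScreenshotPipeline` and the three function parameters are assumptions made for this example, not part of the project's code, and the real capture, upload, and database calls would be passed in by the caller.

```java
import java.util.function.BiConsumer;
import java.util.function.Function;

final class ScreenshotPipeline {
    /**
     * Runs capture -> upload -> save for one page URL and returns the stored
     * image address, or null when the capture step produced no data.
     */
    static String run(String pageUrl,
                      Function<String, byte[]> capture,   // page URL -> screenshot bytes
                      Function<byte[], String> upload,    // screenshot bytes -> image address
                      BiConsumer<String, String> save) {  // (page URL, image address) -> persisted
        byte[] image = capture.apply(pageUrl);
        if (image == null || image.length == 0) {
            // Nothing was captured, so there is nothing to upload or store.
            return null;
        }
        String imageUrl = upload.apply(image);
        save.accept(pageUrl, imageUrl);
        return imageUrl;
    }
}
```

Keeping the three steps separate makes it easy to test the flow with stand-ins (for example a `HashMap::put` in place of the database write) before wiring in the browser and the upload service.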
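II. Choosing the browser driver
For local testing, pick the driver version that matches the Chrome version installed on your machine.
Then download the browser driver from:
https://registry.npmmirror.com/binary.html?path=chromedriver/
Download the driver package for your operating system into a directory of your choice, and the preparation is done.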
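To make the version match explicit, the major version can be pulled out of the version strings that Chrome and the driver report. The sketch below assumes the usual four-part version format (for example `109.0.5414.119`) and treats an equal major version as the rule of thumb for compatibility; the class name `ChromeVersion` is made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ChromeVersion {
    // Matches a four-part version such as 109.0.5414.119 and captures the first part.
    private static final Pattern VERSION = Pattern.compile("(\\d+)\\.\\d+\\.\\d+\\.\\d+");

    /** Returns the major version found in the text, or -1 if no four-part version is present. */
    static int major(String text) {
        Matcher m = VERSION.matcher(text);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    /** True when both strings contain a version and the major versions are equal. */
    static boolean sameMajor(String browserVersion, String driverVersion) {
        int browser = major(browserVersion);
        return browser != -1 && browser == major(driverVersion);
    }
}
```

The inputs would typically come from the browser's "About Chrome" page or from running the browser and driver binaries with `--version`.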
III. Implementation
Dependencies
<dependency>
<groupId>org.seleniumhq.selenium</groupId>
<artifactId>selenium-java</artifactId>
<version>3.141.59</version>
</dependency>
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>27.0-jre</version>
</dependency>
import org.openqa.selenium.Dimension;
import org.openqa.selenium.OutputType;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.util.concurrent.TimeUnit;
/**
 * @Version: 1.0
 * @Date: 2023/2/8
 */
public class CutPicture {
    private static final Logger LOGGER = LoggerFactory.getLogger(CutPicture.class);

    // Simulates scrolling in the browser so that lazy-loaded content is rendered before the screenshot is taken
    public static byte[] guge(String url) {
        // Set the path of the downloaded driver here: chromedriver.exe on Windows, chromedriver on Linux;
        // the actual path depends on where you put the driver
        System.setProperty("webdriver.chrome.driver", "path to the downloaded driver");
        ChromeOptions options = new ChromeOptions();
        // Accept SSL certificates
        options.setCapability("acceptSslCerts", true);
        // Enable screenshots
        options.setCapability("takesScreenshot", true);
        // Enable CSS selector lookups
        options.setCapability("cssSelectorsEnabled", true);
        // Browser launch arguments
        options.addArguments("--headless");
        options.addArguments("--no-sandbox");
        options.addArguments("--disable-gpu");
        options.addArguments("--disable-dev-shm-usage");
        options.setHeadless(true);
        ChromeDriver driver = new ChromeDriver(options);
        // Set timeouts so that slow-loading content does not cause the screenshot to miss it
        driver.manage().timeouts().pageLoadTimeout(1, TimeUnit.MINUTES);
        driver.manage().timeouts().implicitlyWait(1, TimeUnit.MINUTES);
        driver.manage().timeouts().setScriptTimeout(1, TimeUnit.MINUTES);
        try {
            // Open the target address
            driver.get(url);
            // Read the width and height only after loading the URL; otherwise the real page size is not available
            Long width = (Long) driver.executeScript("return document.documentElement.scrollWidth");
            Long height = (Long) driver.executeScript("return document.documentElement.scrollHeight");
            LOGGER.info("Got screenshot height and width============================");
            // Simulate scrolling, because some content is only loaded while the page is being scrolled
            long temp_height = 0;
            while (true) {
                // Scroll 500 pixels at a time and wait 2 s per step for lazy loading;
                // tune the wait to your own use case
                Thread.sleep(2000);
                driver.executeScript("window.scrollBy(0,500)");
                temp_height += 500;
                if (temp_height >= height) {
                    break;
                }
            }
            // Resize the window to the full page size; only then does the screenshot cover the whole page
            driver.manage().window().setSize(new Dimension(width.intValue(), height.intValue()));
            // A local path for the screenshot file could be set here; in this case the bytes are uploaded to the server instead
            // String screenshotPath = "E:/work/imgGG.png";
            byte[] srcBytes = driver.getScreenshotAs(OutputType.BYTES);
            // FileUtils.copyFile(srcFile, new File(screenshotPath));
            LOGGER.info("Screenshot succeeded============================");
            return srcBytes;
        } catch (Exception e) {
            LOGGER.error("Screenshot failed", e);
        } finally {
            driver.quit();
        }
        return null;
    }
}
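One limitation of a fixed scroll target is that lazy loading can make the page taller while it is being scrolled. A possible variation, sketched below under that assumption, re-reads the page height after every step and stops after a maximum number of steps so that endlessly growing pages cannot loop forever. The class name `LazyScroll` and its parameters are hypothetical; the browser is abstracted behind two callbacks so the loop itself can be tried without Selenium.

```java
import java.util.function.LongConsumer;
import java.util.function.LongSupplier;

final class LazyScroll {
    /**
     * Scrolls down in fixed steps until the scrolled distance reaches the page height,
     * re-reading the height after each step. Returns the last height that was read.
     */
    static long scrollToBottom(LongSupplier pageHeight, LongConsumer scrollBy,
                               long stepPx, long pauseMillis, int maxSteps) {
        long scrolled = 0;
        long height = pageHeight.getAsLong();
        for (int i = 0; i < maxSteps && scrolled < height; i++) {
            scrollBy.accept(stepPx);
            scrolled += stepPx;
            try {
                // Give lazy-loaded content time to arrive before measuring again.
                Thread.sleep(pauseMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            height = pageHeight.getAsLong();
        }
        return height;
    }
}
```

With the driver from the class above, the callbacks could be `() -> (Long) driver.executeScript("return document.documentElement.scrollHeight")` and `px -> driver.executeScript("window.scrollBy(0," + px + ")")`, and the returned height could then be used when resizing the window.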
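Next, convert the byte array into a file and upload it. The project is built on the RuoYi framework, so RuoYi's upload method is used directly, but it requires the byte array to be wrapped as a MultipartFile first.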
//
// Source code recreated from a .class file by IntelliJ IDEA
// (powered by FernFlower decompiler)
//
package com.ruoyi.web.controller.tool;

import org.springframework.lang.Nullable;
import org.springframework.util.Assert;
import org.springframework.util.FileCopyUtils;
import org.springframework.web.multipart.MultipartFile;
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;

public class MockMultipartFile implements MultipartFile {
    private final String name;
    private String originalFilename;
    @Nullable
    private String contentType;
    private final byte[] content;

    public MockMultipartFile(String name, @Nullable byte[] content) {
        this(name, "", (String)null, (byte[])content);
    }

    public MockMultipartFile(String name, InputStream contentStream) throws IOException {
        this(name, "", (String)null, (byte[])FileCopyUtils.copyToByteArray(contentStream));
    }

    public MockMultipartFile(String name, @Nullable String originalFilename, @Nullable String contentType, @Nullable byte[] content) {
        Assert.hasLength(name, "Name must not be null");
        this.name = name;
        this.originalFilename = originalFilename != null ? originalFilename : "";
        this.contentType = contentType;
        this.content = content != null ? content : new byte[0];
    }

    public MockMultipartFile(String name, @Nullable String originalFilename, @Nullable String contentType, InputStream contentStream) throws IOException {
        this(name, originalFilename, contentType, FileCopyUtils.copyToByteArray(contentStream));
    }

    public String getName() {
        return this.name;
    }

    public String getOriginalFilename() {
        return this.originalFilename;
    }

    @Nullable
    public String getContentType() {
        return this.contentType;
    }

    public boolean isEmpty() {
        return this.content.length == 0;
    }

    public long getSize() {
        return (long)this.content.length;
    }

    public byte[] getBytes() throws IOException {
        return this.content;
    }

    public InputStream getInputStream() throws IOException {
        return new ByteArrayInputStream(this.content);
    }

    public void transferTo(File dest) throws IOException, IllegalStateException {
        FileCopyUtils.copy(this.content, dest);
    }
}
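Since the screenshot method returns null when something goes wrong, it may be worth checking the bytes before wrapping them for upload. WebDriver screenshots are PNG images, so one simple check is to look for the PNG file signature. The helper below is a sketch with a made-up name (`PngCheck`), not part of RuoYi or Selenium.

```java
final class PngCheck {
    // The fixed 8-byte signature that every PNG file starts with.
    private static final byte[] SIGNATURE =
            {(byte) 0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'};

    /** Returns true if the data is non-null and begins with the PNG signature. */
    static boolean isPng(byte[] data) {
        if (data == null || data.length < SIGNATURE.length) {
            return false;
        }
        for (int i = 0; i < SIGNATURE.length; i++) {
            if (data[i] != SIGNATURE[i]) {
                return false;
            }
        }
        return true;
    }
}
```

If the check passes, the bytes can be wrapped with the four-argument constructor shown above, for example `new MockMultipartFile("file", "screenshot.png", "image/png", bytes)`, where the field name and file name are placeholders chosen for this example.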
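Once local testing is done, move on to the server.
Running on the server
Prepare the Linux environment by installing the Chrome browser and its dependencies: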
# Chrome itself
yum install https://dl.google.com/linux/direct/google-chrome-stable_current_x86_64.rpm
# Libraries Chrome depends on
yum install pango.x86_64 libXcomposite.x86_64 libXcursor.x86_64 libXdamage.x86_64 libXext.x86_64 libXi.x86_64 libXtst.x86_64 cups-libs.x86_64 libXScrnSaver.x86_64 libXrandr.x86_64 GConf2.x86_64 alsa-lib.x86_64 atk.x86_64 gtk3.x86_64 -y
The Linux build of the driver also has to be downloaded and placed in a directory of your choice.
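Because the driver file is named differently on Windows and Linux, the path passed to `webdriver.chrome.driver` can be derived from the operating system instead of being hard-coded. This is a minimal sketch; the class name `DriverPath` and the idea of a single configurable driver directory are assumptions for illustration.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

final class DriverPath {
    /** Picks the driver file name for the given value of the os.name system property. */
    static String fileName(String osName) {
        boolean windows = osName.toLowerCase(Locale.ROOT).startsWith("windows");
        return windows ? "chromedriver.exe" : "chromedriver";
    }

    /** Joins the directory that holds the driver with the OS-specific file name. */
    static Path resolve(String driverDir, String osName) {
        return Paths.get(driverDir, fileName(osName));
    }
}
```

It could be used as `System.setProperty("webdriver.chrome.driver", DriverPath.resolve(dir, System.getProperty("os.name")).toString())`, with `dir` read from configuration so that local and server environments can differ.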
If you get an error like the following:
java.lang.IllegalStateException: The driver is not executable: /home/soft/chrome/chromedriver
at com.google.common.base.Preconditions.checkState(Preconditions.java:588)
at org.openqa.selenium.remote.service.DriverService.checkExecutable(DriverService.java:150)
at org.openqa.selenium.remote.service.DriverService.findExecutable(DriverService.java:141)
at org.openqa.selenium.chrome.ChromeDriverService.access$000(ChromeDriverService.java:35)
at org.openqa.selenium.chrome.ChromeDriverService$Builder.findDefaultExecutable(ChromeDriverService.java:159)
at org.openqa.selenium.remote.service.DriverService$Builder.build(DriverService.java:355)
at org.openqa.selenium.chrome.ChromeDriverService.createDefaultService(ChromeDriverService.java:94)
at org.openqa.selenium.chrome.ChromeDriver.<init>(ChromeDriver.java:157)
at com.bmsoft.evidence.service.impl.ArticleServiceImpl.getFilePathByUrl(ArticleServiceImpl.java:734)
at com.bmsoft.evidence.service.impl.ArticleServiceImpl.screenshot(ArticleServiceImpl.java:287)
at com.bmsoft.evidence.service.impl.ArticleServiceImpl$FastClassBySpringCGLIB$dafff6e9.invoke(<generated>)
at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218)
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:752)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:295)
at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:98)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:691)
at com.bmsoft.evidence.service.impl.ArticleServiceImpl$EnhancerBySpringCGLIB$52855f88.screenshot(<generated>)
at com.bmsoft.evidence.service.impl.AsyncServiceImpl.saveWebSiteDetail(AsyncServiceImpl.java:139)
at com.bmsoft.evidence.service.impl.AsyncServiceImpl.saveWebSite(AsyncServiceImpl.java:124)
at com.bmsoft.evidence.service.impl.AsyncServiceImpl$FastClassBySpringCGLIB$d0ae7c23.invoke(<generated>)
at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218)
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:752)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
at org.springframework.aop.interceptor.AsyncExecutionInterceptor.lambda$invoke$0(AsyncExecutionInterceptor.java:115)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
go into the directory that contains the driver and run:
chmod a+x chromedriver
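The same condition can also be checked from Java before the driver is started, which turns the long stack trace into an early, readable failure. The helper below is a sketch with a hypothetical name (`DriverCheck`); it tries to set the execute bit itself, which only works when the application's user is allowed to change the file's permissions.

```java
import java.nio.file.Files;
import java.nio.file.Path;

final class DriverCheck {
    /**
     * Returns true if the driver file exists and is executable,
     * attempting to make it executable for all users when it is not.
     */
    static boolean ensureExecutable(Path driver) {
        if (!Files.isRegularFile(driver)) {
            return false;
        }
        if (Files.isExecutable(driver)) {
            return true;
        }
        // Similar in effect to chmod a+x; returns false if the permission change is refused.
        return driver.toFile().setExecutable(true, false) && Files.isExecutable(driver);
    }
}
```

A caller could log a clear message and skip the screenshot when this returns false, rather than letting `new ChromeDriver(options)` throw.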
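Another possible error is caused by a driver version that does not match the installed Chrome; replacing it with the matching version fixes it.
Finally, text in the screenshot may be rendered as empty boxes:
This turned out to be because the system has no Chinese fonts, so the next step is to install them: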
yum install ipa-gothic-fonts xorg-x11-fonts-100dpi xorg-x11-fonts-75dpi xorg-x11-utils xorg-x11-fonts-cyrillic xorg-x11-fonts-Type1 xorg-x11-fonts-misc -y
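After installing the fonts above, some characters were still rendered as boxes, so I copied a few Chinese fonts from Windows over to the Linux system (look up where Windows stores its fonts if you are unsure).
# 1. Create a chinese directory under /usr/share/fonts to hold the Chinese fonts
mkdir -p /usr/share/fonts/chinese/
# 2. After copying the font files into that directory, rebuild the font cache so the new fonts take effect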
fc-cache -fv
# 3. List the fonts on the system and check that a Song typeface (songti) is included
fc-list
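The font check can also be done from the application, for example at startup, so that a missing font shows up in the logs rather than in a screenshot. The sketch below assumes the `fc-list` command from fontconfig is on the PATH and uses its `:lang=zh` pattern to list only fonts that cover Chinese; the class name `FontProbe` is made up for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

final class FontProbe {
    /** Counts the non-blank lines of fc-list output (one line per listed font). */
    static int countFontLines(String fcListOutput) {
        int count = 0;
        for (String line : fcListOutput.split("\n")) {
            if (!line.trim().isEmpty()) {
                count++;
            }
        }
        return count;
    }

    /** Runs "fc-list :lang=zh" and returns the number of fonts listed, or -1 if it cannot be run. */
    static int chineseFontCount() {
        try {
            Process p = new ProcessBuilder("fc-list", ":lang=zh")
                    .redirectError(ProcessBuilder.Redirect.INHERIT)
                    .start();
            StringBuilder out = new StringBuilder();
            try (BufferedReader r = new BufferedReader(
                    new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = r.readLine()) != null) {
                    out.append(line).append('\n');
                }
            }
            return p.waitFor() == 0 ? countFontLines(out.toString()) : -1;
        } catch (IOException e) {
            // fc-list is not installed or could not be started.
            return -1;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        }
    }
}
```

A result of 0 suggests that Chinese text is likely to be drawn as boxes, while -1 only means the check itself could not be performed.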
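With that, the screenshots are generated correctly.
This article draws on https://blog.youkuaiyun.com/Mli_Mi/article/details/116259669?spm=1001.2014.3001.5506
Consider following the original author, and thanks to them for sharing.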