A while back, a technical discussion group I'm in was exploring a complete solution for reliably fetching a Taobao product's main image, price, title, and SKU data. The challenge caught my interest.
I have since load-tested my approach: it sustains a high QPS, triggers the slider CAPTCHA very rarely, and the API as a whole is stable enough to meet the performance requirements of typical business scenarios.
In addition, a few readers have messaged me about this from the comments section before.
So here on CSDN I'd like to share the technical approach, with a Java demo to walk through it.
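As a rough illustration of how such a load test can be run (a minimal sketch; the endpoint URL, thread count, and request count below are placeholders, not my actual benchmark setup):

import java.net.HttpURLConnection;
import java.net.URL;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class LoadTest {
    public static void main(String[] args) throws Exception {
        String endpoint = "https://example.com/item-api"; // placeholder endpoint
        int threads = 20;                                  // placeholder concurrency
        int totalRequests = 1000;                          // placeholder request count
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger success = new AtomicInteger();
        CountDownLatch latch = new CountDownLatch(totalRequests);
        long start = System.currentTimeMillis();
        for (int i = 0; i < totalRequests; i++) {
            pool.submit(() -> {
                try {
                    // Fire a plain GET and count HTTP 200 responses
                    HttpURLConnection conn = (HttpURLConnection) new URL(endpoint).openConnection();
                    conn.setConnectTimeout(5000);
                    conn.setReadTimeout(10000);
                    if (conn.getResponseCode() == 200) {
                        success.incrementAndGet();
                    }
                    conn.disconnect();
                } catch (Exception ignored) {
                } finally {
                    latch.countDown();
                }
            });
        }
        latch.await();
        long elapsedMs = System.currentTimeMillis() - start;
        pool.shutdown();
        System.out.printf("success=%d, elapsed=%dms, QPS=%.1f%n",
                success.get(), elapsedMs, totalRequests * 1000.0 / elapsedMs);
    }
}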
The code below covers both the HTTP request and the browser simulation:
// Required imports: java.io.BufferedReader, java.io.IOException, java.io.InputStream,
// java.io.InputStreamReader, java.net.HttpURLConnection, java.net.MalformedURLException, java.net.URL
// The method name is illustrative; it issues a GET request and returns the response body as a String.
public static String doGet() {
    // Obtain the item URL by contacting the platform
    String itemUrl = "{obtained from the platform}";
    // Build the GET request
    HttpURLConnection connection = null;
    InputStream is = null;
    BufferedReader br = null;
    String result = null; // response body as a string
    try {
        // Create the remote URL object
        URL url = new URL(itemUrl);
        // Open a connection and cast it to HttpURLConnection
        connection = (HttpURLConnection) url.openConnection();
        // Use the GET method
        connection.setRequestMethod("GET");
        // Connect timeout: 15000 ms
        connection.setConnectTimeout(15000);
        // Read timeout: 60000 ms
        connection.setReadTimeout(60000);
        // Send the request
        connection.connect();
        // Read the response body only on HTTP 200
        if (connection.getResponseCode() == 200) {
            is = connection.getInputStream();
            // Wrap the input stream and specify the character set explicitly
            br = new BufferedReader(new InputStreamReader(is, "UTF-8"));
            // Accumulate the response line by line
            StringBuilder sbf = new StringBuilder();
            String temp = null;
            while ((temp = br.readLine()) != null) {
                sbf.append(temp);
                sbf.append("\r\n");
            }
            result = sbf.toString();
        }
    } catch (MalformedURLException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        // Release resources
        if (null != br) {
            try {
                br.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        if (null != is) {
            try {
                is.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        if (null != connection) {
            connection.disconnect(); // close the remote connection
        }
    }
    return result;
}
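Calling the helper is straightforward; a hypothetical caller might look like this (doGet and its hard-coded URL come from the sketch above, so there is nothing to pass in):

public static void main(String[] args) {
    // Fetch the raw response and do a quick sanity check before parsing it
    String body = doGet();
    if (body == null) {
        System.out.println("Request failed or returned a non-200 status");
    } else {
        System.out.println("Received " + body.length() + " characters");
    }
}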
webDriver.manage().timeouts().implicitlyWait(10, TimeUnit.SECONDS);
// 1. Open Taobao first so the session cookies below can be attached to the right domain
webDriver.get(testUrl);
webDriver.manage().addCookie(new Cookie("thw","cn"));
webDriver.manage().addCookie(new Cookie("_l_g_","Ug%3D%3D"));
webDriver.manage().addCookie(new Cookie("lgc","%5Cu6731%5Cu5FD7%5Cu677E88"));
webDriver.manage().addCookie(new Cookie("cookie1","UoNoTo%2FTdEXMCnhnlgHclN7PZN284TnOPEj92rBNYTE%3D"));
webDriver.manage().addCookie(new Cookie("existShop","MTYyMTU4NzgzOQ%3D%3D"));
webDriver.manage().addCookie(new Cookie("cookie2","14f49530bf8330d6b22eaf3acdc24251"));
webDriver.manage().addCookie(new Cookie("sg","837"));
webDriver.manage().addCookie(new Cookie("cna","e2UuGXr8AFICAXFFqm2pMDQL"));
webDriver.manage().addCookie(new Cookie("skt","694d43f333700ce1"));
webDriver.manage().addCookie(new Cookie("_tb_token_","e5d98e35b7ee3"));
webDriver.manage().addCookie(new Cookie("xlly_s","1"));
webDriver.manage().addCookie(new Cookie("dnk","%5Cu6731%5Cu5FD7%5Cu677E88"));
webDriver.manage().addCookie(new Cookie("uc1","existShop=true&cookie14=Uoe2zEJWu0%2B7Iw%3D%3D&pas=0&cookie16=WqG3DMC9UpAPBHGz5QBErFxlCA%3D%3D&cookie15=VFC%2FuZ9ayeYq2g%3D%3D&cookie21=VT5L2FSpdeCjwGS%2FFqZpWg%3D%3D"));
webDriver.manage().addCookie(new Cookie("uc3","nk2=tacDJDHV1%2Fc%3D&id2=UoYcAK2oRM6BeA%3D%3D&lg2=U%2BGCWk%2F75gdr5Q%3D%3D&vt3=F8dCuw%2B%2BUgXqLx3riVk%3D"));
webDriver.manage().addCookie(new Cookie("tracknick","%5Cu6731%5Cu5FD7%5Cu677E88"));
webDriver.manage().addCookie(new Cookie("mt","ci=5_1"));
webDriver.manage().addCookie(new Cookie("uc4","id4=0%40UO6VjxMTD4dlqn3KIVPnTkBcXgrQ&nk4=0%40txMIDHit%2BSCJ5W5%2F1fajRpQ%2Fcw%3D%3D"));
webDriver.manage().addCookie(new Cookie("unb","1723573803"));
webDriver.manage().addCookie(new Cookie("tfstk","c7ccBn0NvusQjnwodINjkQ3aAYAdwIQzDcojafJQwn-Fa_f0ETWU6G9uqmcOC"));
webDriver.manage().addCookie(new Cookie("_samesite_flag_","true"));
webDriver.manage().addCookie(new Cookie("l","eBrffORnj6qP19W9BOfanurza77OSIRYYuPzaNbMiOCPOBfB5AkeX6skHXL6C3GVh6SDR3uh7KIMBeYBc7Vonxv9w8VMULkmn"));
webDriver.manage().addCookie(new Cookie("_cc_","V32FPkk%2Fhw%3D%3D"));
webDriver.manage().addCookie(new Cookie("cookie17","UoYcAK2oRM6BeA%3D%3D"));
webDriver.manage().addCookie(new Cookie("_nk_","%5Cu6731%5Cu5FD7%5Cu677E88"));
webDriver.manage().addCookie(new Cookie("sgcookie","E100zgp6%2FfkWrLApPdO9bSq5bShP0y6SrjiCUVn%2BGELKNlOjwwYSdcKaWxVHSu1XYUmxE%2BklKp86woHeFrq0qC65Tw%3D%3D"));
webDriver.manage().addCookie(new Cookie("t","f875ad8be099868f7620a96050fc4fb7"));
webDriver.manage().addCookie(new Cookie("csg","fd0548e9"));
webDriver.manage().addCookie(new Cookie("isg","BP7-BeR2Eg7z8kYqrwyo6sArTxJAP8K5FobJFagHaME8S5wlH8zXyIu5xReH6LrR"));
// 2. Reload the page with the cookies applied and give it time to render
webDriver.get(testUrl);
Thread.sleep(2000);
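The snippet assumes that webDriver and testUrl already exist; a minimal sketch of how they might be initialized (the chromedriver path and the item URL here are placeholders, not part of the original setup):

import java.util.concurrent.TimeUnit;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

public class TaobaoDriverDemo {
    public static void main(String[] args) throws InterruptedException {
        // Placeholder path; point this at a chromedriver that matches the local Chrome version
        System.setProperty("webdriver.chrome.driver", "/path/to/chromedriver");
        WebDriver webDriver = new ChromeDriver();
        // Placeholder product URL; substitute the page you actually want to load
        String testUrl = "https://item.taobao.com/item.htm?id=xxxxx";

        // ... the implicit-wait, cookie-injection, and reload snippet above runs here ...

        System.out.println(webDriver.getTitle()); // quick sanity check that the page rendered
        webDriver.quit();
    }
}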
The returned data fields are shown below; I have annotated each field in the response JSON.
{
    "msg": "获取成功",              // "success" message
    "code": 0,
    "data": {
        "productId": "65272193xxx",  // product ID
        "shopLogo": null,
        "productImg": "xxxxx",       // product main image
        "shopName": "xxxxx",         // shop the product belongs to
        "rootCategoryId": null,
        "productTitle": "",          // product title
        "defPrice": "949",           // product price
        "sellerId": "890482188",     // seller ID
        "brandId": null,
        "shopId": "71955116",        // shop ID
        "shopType": "B",
        "shopWw": "xxxxx",
        "categoryId": null
    },
    "state": true
}
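To turn that response into usable fields on the Java side, one option is to map it onto a small DTO with a JSON library. Here is a minimal sketch using Jackson (my own choice of library; the field names mirror the response above, but the mapping itself is only an illustration):

import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
import com.fasterxml.jackson.databind.ObjectMapper;

public class ItemResponseDemo {

    // Only the fields discussed in this post; anything else in the payload is ignored
    @JsonIgnoreProperties(ignoreUnknown = true)
    public static class ItemData {
        public String productId;
        public String productImg;
        public String productTitle;
        public String defPrice;
        public String shopName;
        public String shopId;
        public String sellerId;
    }

    @JsonIgnoreProperties(ignoreUnknown = true)
    public static class ItemResponse {
        public String msg;
        public int code;
        public boolean state;
        public ItemData data;
    }

    public static void main(String[] args) throws Exception {
        // In practice this string would be the body returned by the GET request above
        String json = "{\"msg\":\"获取成功\",\"code\":0,\"state\":true,"
                + "\"data\":{\"productId\":\"65272193xxx\",\"productTitle\":\"\","
                + "\"defPrice\":\"949\",\"productImg\":\"xxxxx\"}}";
        ObjectMapper mapper = new ObjectMapper();
        ItemResponse resp = mapper.readValue(json, ItemResponse.class);
        if (resp.state && resp.code == 0) {
            System.out.println("title: " + resp.data.productTitle);
            System.out.println("price: " + resp.data.defPrice);
            System.out.println("image: " + resp.data.productImg);
        }
    }
}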
If you have questions, or are simply interested in web scraping, feel free to discuss in the comments.