京东拍立淘API item_search_img - 按图搜索京东商品(拍立淘)

京东拍立淘API item_search_img - 按图搜索京东商品(拍立淘)
在这里插入图片描述
在这里插入图片描述
Result Object:

{
“items”: {
“url”: “https://search.jd.com/image?path=jfs%2Ft1%2F932%2F25%2F457%2F44818%2F5b8ff7cbE8027c4ed%2Ff334304e50617cab.jpg&op=search”,
“keyword”: null,
“page”: “”,
“real_total_results”: “2166”,
“total_results”: “2166”,
“pagecount”: 37,
“products_total”: “”,
“list_count”: “”,
“page_size”: “60”,
“item”: [
{
“num_iid”: “56508625884”,
“detail_url”: “https://item.jd.com/56508625884.html”,
“title”: “快乐的鸟蛋 精装硬壳3-6岁儿童早教绘本故事书 幼儿园课外书幼儿读物 正版 爱心树绘本馆 北京联合公”,
“pic_url”: “//img14.360buyimg.com/n7/jfs/t1/51289/17/10278/71713/5d75fad8E52925391/c03410ce7898a7f4.jpg”,
“price”: 0,
“promotion_price”: “41.70”,
“sales”: 0,
“sample_id”: “”,
“post_fee”: “”,
“area”: “”,
“seller”: “”,
“pri有权ce”: “41.70”
},
{
“num_iid”: “1050245164”,
“detail_url”: “https://item.jd.com/1050245164.html”,
“title”: “【预订】Fluo Methode de Francais 1”,
“pic_url”: “//img14.360buyimg.com/n7/g14/M07/18/13/rBEhVlJwiuMIAAAAAACN8PomxskAAEznAAgO0kAAI4I925.jpg”,
“price”: 0,
“promotion_price”: “554.00”,
“sales”: 0,
“sample_id”: “”,
“post_fee”: “”,
“area”: “”,
“seller”: “”,
“pri有权ce”: “369.00”
},
{
“num_iid”: “12281644”,
“detail_url”: “https://item.jd.com/12281644.html”,
“title”: “毛线球23:永远经典的阿兰编织”,
“pic_url”: “//img14.360buyimg.com/n7/jfs/t12346/192/1861846787/165245/e626c84c/5a2a5b32Nb3e8e8b6.jpg”,
“price”: 0,
“promotion_price”: “59.00”,
“sales”: 3119,
“sample_id”: “”,
“post_fee”: “”,
“area”: “”,
“seller”: “”,
“pri有权ce”: “56.10”
},
{
“num_iid”: “11279919”,
“detail_url”: “https://item.jd.com/11279919.html”,
“title”: “生活就是北京时间最后一响儿”,
“pic_url”: “//img14.360buyimg.com/n7/g13/M08/01/17/rBEhVFHnZ5cIAAAAAAhKqXFIpOkAABOtQPw7CoACErB533.jpg”,
“price”: 0,
“promotion_price”: “30.00”,
“sales”: 280,
“sample_id”: “”,
“post_fee”: “”,
“area”: “”,
“seller”: “”,
“pri有权ce”: “20.60”
},
{
“num_iid”: “10466119754”,
“detail_url”: “https://item.jd.com/10466119754.html”,
“title”: “Kinfolk生活第20辑 英文原版 Kinfolk Volume 20 Various”,
“pic_url”: “//img14.360buyimg.com/n7/jfs/t4072/234/1566047421/104361/e17a63f5/587d90feNa25e201b.jpg”,
“price”: 0,
“promotion_price”: “567.00”,
“sales”: 1,
“sample_id”: “”,
“post_fee”: “”,
“area”: “”,
“seller”: “”,
“pri有权ce”: “135.00”
},
{
“num_iid”: “10121433735”,
“detail_url”: “https://item.jd.com/10121433735.html”,
“title”: “日文原版 漫画 黑子篮球 黒子のバスケ   5进口图书”,
“pic_url”: “//img10.360buyimg.com/n7/jfs/t2224/217/2065600710/30919/b08db6ce/56a5cbe4N3c96dc56.jpg”,
“price”: 0,
“promotion_price”: “52.00”,
“sales”: 0,
“sample_id”: “”,
“post_fee”: “”,
“area”: “”,
“seller”: “”,
“pri有权ce”: “44.20”
},
{
“num_iid”: “51917350857”,
“detail_url”: “https://item.jd.com/51917350857.html”,
“title”: “我爸爸精装硬壳图书启发安东尼布朗著亲情绘本我爸爸我妈妈系列适合3岁4岁5岁6岁7岁8岁8岁10岁亲子”,
“pic_url”: “//img12.360buyimg.com/n7/jfs/t1/41145/26/8636/264926/5d221883Edd1c9edc/d375e79c512105c4.jpg”,
“price”: 0,
“promotion_price”: “35.10”,
“sales”: 0,
“sample_id”: “”,
“post_fee”: “”,
“area”: “”,
“seller”: “”,
“pri有权ce”: “34.10”
},
{
“num_iid”: “59844233619”,
“detail_url”: “https://item.jd.com/59844233619.html”,
“title”: “我爸爸精装硬壳图书启发安东尼布朗著情绘本我爸爸我妈妈系列适合3岁4岁5岁6岁7岁8岁8岁10岁子阅读”,
“pic_url”: “//img14.360buyimg.com/n7/jfs/t1/73731/32/13678/264926/5daf47ffE83528857/4899aee7e3c6dd18.jpg”,
“price”: 0,
“promotion_price”: “36.00”,
“sales”: 0,
“sample_id”: “”,
“post_fee”: “”,
“area”: “”,
“seller”: “”,
“pri有权ce”: “31.00”
},
{
“num_iid”: “10798375”,
“detail_url”: “https://item.jd.com/10798375.html”,
“title”: “surface数码设计(2011年7月)”,
“pic_url”: “//img10.360buyimg.com/n7/13737/edbfac14-4ef3-4aed-8026-646b939a7d58.jpg”,
“price”: 0,
“promotion_price”: “25.00”,
“sales”: 0,
“sample_id”: “”,
“post_fee”: “”,
“area”: “”,
“seller”: “”,
“pri有权ce”: “20.80”
},
{
“num_iid”: “58247927779”,
“detail_url”: “https://item.jd.com/58247927779.html”,
“title”: “小小瑜伽 宝宝的本瑜伽书精装硬壳 启发 共包含九个简单好玩的幼儿瑜伽练习”,
“pic_url”: “//img14.360buyimg.com/n7/jfs/t1/51366/1/12474/59967/5d961ec1E6a9d3558/da7cf427648ed920.jpg”,
“price”: 0,
“promotion_price”: “28.88”,
“sales”: 0,
“sample_id”: “”,
“post_fee”: “”,
“area”: “”,
“seller”: “”,
“pri有权ce”: “27.88”
},
{
“num_iid”: “1041328664”,
“detail_url”: “https://item.jd.com/1041328664.html”,
“title”: “【预订】The Painted Veil”,
“pic_url”: “//img14.360buyimg.com/n7/g15/M02/0E/01/rBEhWlJTTQYIAAAAAAAd3DMC1WEAAD39gIGhE0AAB30260.jpg”,
“price”: 0,
“promotion_price”: “164.00”,
“sales”: 0,
“sample_id”: “”,
“post_fee”: “”,
“area”: “”,
“seller”: “”,
“pri有权ce”: “109.00”
},
{
“num_iid”: “53944960912”,
“detail_url”: “https://item.jd.com/53944960912.html”,
“title”: “现在可以说了:犹太智者基翁幽默文选(以色列)基翁著,吴远恒,夏平文汇出版社9787806766422”,
“pic_url”: “//img12.360buyimg.com/n7/jfs/t1/61198/28/6292/12919/5d4562a7E82b7249f/d085b1b687667c06.jpg”,
“price”: 0,
“promotion_price”: “80.00”,
“sales”: 0,
“sample_id”: “”,
“post_fee”: “”,
“area”: “”,
“seller”: “”,
“pri有权ce”: “80.00”
},
{
“num_iid”: “27519022243”,
“detail_url”: “https://item.jd.com/27519022243.html”,
“title”: “邓丽君:偿还 空港(纸仕样复刻系列)CD 原版专辑”,
“pic_url”: “//img13.360buyimg.com/n7/jfs/t19609/213/1882696801/359198/71d50730/5add5c69N9eb0f8cd.jpg”,
“price”: 0,
“promotion_price”: “888.00”,
“sales”: 0,
“sample_id”: “”,
“post_fee”: “”,
“area”: “”,
“seller”: “”,
“pri有权ce”: “368.00”
},
{
“num_iid”: “1141894083”,
“detail_url”: “https://item.jd.com/1141894083.html”,
“title”: “Gin Tama, Volume 22”,
“pic_url”: “//img13.360buyimg.com/n7/g13/M03/00/12/rBEhU1NmAJkIAAAAAAB0wt2YzScAAM0cAPV0sgAAHTa063.jpg”,
“price”: 0,
“promotion_price”: “147.00”,
“sales”: 0,
“sample_id”: “”,
“post_fee”: “”,
“area”: “”,
“seller”: “”,
“pri有权ce”: “126.00”
},
{
“num_iid”: “12414503”,
“detail_url”: “https://item.jd.com/12414503.html”,
“title”: “乐悠悠亲子图画书系列:谁最高0-4岁”,
“pic_url”: “//img13.360buyimg.com/n7/jfs/t24946/26/780586679/360071/3de73b3/5b7b8a83N186009fe.jpg”,
“price”: 0,
“promotion_price”: “25.00”,
“sales”: 42652,
“sample_id”: “”,
“post_fee”: “”,
“area”: “”,
“seller”: “”,
“pri有权ce”: “17.20”
},
{
“num_iid”: “58249404525”,
“detail_url”: “https://item.jd.com/58249404525.html”,
“title”: “我也行呀 没关系呀 精装绘本 麦田童书”,
“pic_url”: “//img10.360buyimg.com/n7/jfs/t1/59650/8/12030/18423/5d963e01E91c0f7e9/bc827dcd0c883497.jpg”,
“price”: 0,
“promotion_price”: “52.44”,
“sales”: 0,
“sample_id”: “”,
“post_fee”: “”,
“area”: “”,
“seller”: “”,
“pri有权ce”: “52.44”
},
{
“num_iid”: “59585482423”,
“detail_url”: “https://item.jd.com/59585482423.html”,
“title”: “没关系呀硬壳精装图画书麦田绘本馆崔琡僖著自信启蒙书正确认识和对待自己的优缺点2至6岁子阅读童书”,
“pic_url”: “//img13.360buyimg.com/n7/jfs/t1/30946/11/12533/18423/5cb80c25E3b1f80e2/028458a5c0b62836.jpg”,
“price”: 0,
“promotion_price”: “32.00”,
“sales”: 0,
“sample_id”: “”,
“post_fee”: “”,
“area”: “”,
“seller”: “”,
“pri有权ce”: “28.00”
},
{
“num_iid”: “60818418495”,
“detail_url”: “https://item.jd.com/60818418495.html”,
“title”: “没关系呀 精装硬壳麦田绘本馆 孩子喜爱的自信启蒙书,正确认识和对待自己的优缺点”,
“pic_url”: “//img10.360buyimg.com/n7/jfs/t1/56961/17/12333/18423/5d93239aE6e782615/f438c7d128e4a9be.jpg”,
“price”: 0,
“promotion_price”: “32.00”,
“sales”: 0,
“sample_id”: “”,
“post_fee”: “”,
“area”: “”,
“seller”: “”,
“pri有权ce”: “27.00”
},
{
“num_iid”: “1069764480”,
“detail_url”: “https://item.jd.com/1069764480.html”,
“title”: “【预订】Competitive Tennis for Girls”,
“pic_url”: “//img10.360buyimg.com/n7/g15/M09/1E/13/rBEhWVK5wc4IAAAAAAA2FbqzPBQAAHLxwIUfkkAADYt410.jpg”,
“price”: 0,
“promotion_price”: “498.00”,
“sales”: 0,
“sample_id”: “”,
“post_fee”: “”,
“area”: “”,
“seller”: “”,
“pri有权ce”: “332.00”
},
{
“num_iid”: “26973741579”,
“detail_url”: “https://item.jd.com/26973741579.html”,
“title”: “黄明昊Justin偶像练习生周边写真集照片同款海报明信片珍藏卡贴纸 黄明昊单本写真集(带海报书签)”,
“pic_url”: “//img14.360buyimg.com/n7/jfs/t29149/203/767496672/55184/29dad6c6/5bfdf6dcNace92d79.jpg”,
“price”: 0,
“promotion_price”: “120.00”,
“sales”: 73,
“sample_id”: “”,
“post_fee”: “”,
“area”: “”,
“seller”: “”,
“pri有权ce”: “28.00”
},
{
“num_iid”: “59880716535”,
“detail_url”: “https://item.jd.com/59880716535.html”,
“title”: “我喜欢你,像风走了八千里 自媒体末那大叔情感励志作品集 青春文学中国现当代文学小说散文 情话前任情商”,
“pic_url”: “//img10.360buyimg.com/n7/jfs/t1/66603/39/13699/407219/5dafa728E317ce167/e483cb6197c145b0.jpg”,
“price”: 0,
“promotion_price”: “66.00”,
“sales”: 0,
“sample_id”: “”,
“post_fee”: “”,
“area”: “”,
“seller”: “”,
“pri有权ce”: “66.00”
},
{
“num_iid”: “58117709263”,
“detail_url”: “https://item.jd.com/58117709263.html”,
“title”: “我喜欢你像风走了八千里 北海爷爷作序推荐自媒体末那大叔情感励志作品集青春文学中国现当代文学小说散文 正版”,
“pic_url”: “//img13.360buyimg.com/n7/jfs/t1/58092/33/12299/435518/5d931a9eEf2bddef0/fac269d86ba3270b.jpg”,
“price”: 0,
“promotion_price”: “49.80”,
“sales”: 0,
“sample_id”: “”,
“post_fee”: “”,
“area”: “”,
“seller”: “”,
“pri有权ce”: “49.80”
},
{
“num_iid”: “57977689805”,
“detail_url”: “https://item.jd.com/57977689805.html”,
“title”: “我喜欢你像风走了八千里书 末那大叔著北海爷爷作序情话前任情商解忧现当代文学散文随笔作品集书畅销排行榜”,
“pic_url”: “//img10.360buyimg.com/n7/jfs/t1/80282/2/11590/131772/5d90515bE9e29d0cd/690f3eac07bd0e53.jpg”,
“price”: 0,
“promotion_price”: “49.80”,
“sales”: 0,
“sample_id”: “”,
“post_fee”: “”,
“area”: “”,
“seller”: “”,
“pri有权ce”: “48.00”
},
{
“num_iid”: “58574536709”,
“detail_url”: “https://item.jd.com/58574536709.html”,
“title”: “《我喜欢你,像风走了八千里》自媒体末那大叔情感励志作品集 我喜欢你,像风走了八千里”,
“pic_url”: “//img14.360buyimg.com/n7/jfs/t1/52123/31/12963/160053/5d9c1d63Eb39ee5a6/8b6205e898e82b49.jpg”,
“price”: 0,
“promotion_price”: “49.80”,
“sales”: 0,
“sample_id”: “”,
“post_fee”: “”,
“area”: “”,
“seller”: “”,
“pri有权ce”: “49.80”
},
{
“num_iid”: “41865059850”,
“detail_url”: “https://item.jd.com/41865059850.html”,
“title”: “快乐的鸟蛋 精装硬壳3-6岁儿童早教绘本故事书 幼儿园课外书幼儿读物正版 爱心树绘本馆 北京联合出版”,
“pic_url”: “//img10.360buyimg.com/n7/jfs/t1/29190/37/7549/35710/5c6d487fE6a8526cd/ab4189fd1dec508f.jpg”,
“price”: 0,
“promotion_price”: “29.80”,
“sales”: 0,
“sample_id”: “”,
“post_fee”: “”,
“area”: “”,
“seller”: “”,
“pri有权ce”: “22.80”
},
{
“num_iid”: “60078802240”,
“detail_url”: “https://item.jd.com/60078802240.html”,
“title”: “现在可以说了:犹太智者基翁幽默文选(以色列)基翁著,吴远恒,夏平文汇出版社9787806766422”,
“pic_url”: “//img10.360buyimg.com/n7/jfs/t1/44903/8/14196/27940/5db2b9ffE9181d26e/4b6cabe6cd28462f.jpg”,
“price”: 0,
“promotion_price”: “133.00”,
“sales”: 0,
“sample_id”: “”,
“post_fee”: “”,
“area”: “”,
“seller”: “”,
“pri有权ce”: “93.00”
},
{
“num_iid”: “10123599006”,
“detail_url”: “https://item.jd.com/10123599006.html”,
“title”: “Never Grow Up: How to be happy and fulfill…”,
“pic_url”: “//img11.360buyimg.com/n7/jfs/t1900/33/2125459243/17258/de2102f8/56ad018aN8e43ce0d.jpg”,
“price”: 0,
“promotion_price”: “271.00”,
“sales”: 0,
“sample_id”: “”,
“post_fee”: “”,
“area”: “”,
“seller”: “”,
“pri有权ce”: “232.00”
},
{
“num_iid”: “48493406465”,
“detail_url”: “https://item.jd.com/48493406465.html”,
“title”: “【王子文封面 计入销量】 红秀GRAZIA 2019年4月D 总第403期 王子文专访+大片 哈迪德”,
“pic_url”: “//img10.360buyimg.com/n7/jfs/t1/78603/18/347/259663/5ce7f8aeE940c405c/bddbbdc4d55ca27a.jpg”,
“price”: 0,
“promotion_price”: “19.00”,
“sales”: 0,
“sample_id”: “”,
“post_fee”: “”,
“area”: “”,
“seller”: “”,
“pri有权ce”: “19.00”
},
{
“num_iid”: “59697720763”,
“detail_url”: “https://item.jd.com/59697720763.html”,
“title”: “全新正版【即发】 我喜欢你像风走了八千里 末那大叔著北海爷爷情感励志作品集情话前任情商解忧文”,
“pic_url”: “//img13.360buyimg.com/n7/jfs/t1/57069/17/14091/103138/5dad3a41E8ed640d0/c22f020eef540c7b.jpg”,
“price”: 0,
“promotion_price”: “49.80”,
“sales”: 0,
“sample_id”: “”,
“post_fee”: “”,
“area”: “”,
“seller”: “”,
“pri有权ce”: “42.00”
},
{
“num_iid”: “59370166613”,
“detail_url”: “https://item.jd.com/59370166613.html”,
“title”: “HHBH全新正版【末那大叔 北海爷爷】张嘉佳推荐我喜欢你像风走了八千里 励志书籍 情话 情商/情感博”,
“pic_url”: “//img13.360buyimg.com/n7/jfs/t1/75582/9/11219/103138/5d8b592cEab70961b/1c88d3aa0bb54b4e.jpg”,
“price”: 0,
“promotion_price”: “36.40”,
“sales”: 0,
“sample_id”: “”,
“post_fee”: “”,
“area”: “”,
“seller”: “”,
“pri有权ce”: “36.40”
},
{
“num_iid”: “30707340896”,
“detail_url”: “https://item.jd.com/30707340896.html”,
“title”: “单件包邮撕名牌贴撕名牌衣服 奔跑吧兄弟衣服同款撕名牌贴跑男对抗服道具t恤儿童撕名牌贴学生游戏定制背心 成人手写10件(黄色)”,
“pic_url”: “//img11.360buyimg.com/n7/jfs/t20773/348/2490273522/149970/4a80e972/5b57dfe5Nbb00b03e.jpg”,
“price”: 0,
“promotion_price”: “200.00”,
“sales”: 464,
“sample_id”: “”,
“post_fee”: “”,
“area”: “”,
“seller”: “”,
“pri有权ce”: “108.00”
},
{
“num_iid”: “30707340887”,
“detail_url”: “https://item.jd.com/30707340887.html”,
“title”: “单件包邮撕名牌贴撕名牌衣服 奔跑吧兄弟衣服同款撕名牌贴跑男对抗服道具t恤儿童撕名牌贴学生游戏定制背心 成人手写(黄色)”,
“pic_url”: “//img12.360buyimg.com/n7/jfs/t21559/361/2486394457/149970/4a80e972/5b57df86N36671a6c.jpg”,
“price”: 0,
“promotion_price”: “200.00”,
“sales”: 464,
“sample_id”: “”,
“post_fee”: “”,
“area”: “”,
“seller”: “”,
“pri有权ce”: “11.90”
},
{
“num_iid”: “28068250892”,
“detail_url”: “https://item.jd.com/28068250892.html”,
“title”: “撕名牌衣服奔跑吧兄弟同款撕名牌贴跑男道具儿童背心学生班服定制 成人加固手写(颜色留言)”,
“pic_url”: “//img12.360buyimg.com/n7/jfs/t1/14280/13/9001/19919/5c7f9224Eae638a51/10d7714c3c1e6652.jpg”,
“price”: 0,
“promotion_price”: “28.00”,
“sales”: 1253,
“sample_id”: “”,
“post_fee”: “”,
“area”: “”,
“seller”: “”,
“pri有权ce”: “12.00”
},
{
“num_iid”: “30248471032”,
“detail_url”: “https://item.jd.com/30248471032.html”,
“title”: “启发绘本小小瑜伽宝宝的第一本瑜伽书精装硬皮图画故事书2-3-4-5-6岁幼儿启蒙学龄前儿童幼儿园游戏”,
“pic_url”: “//img12.360buyimg.com/n7/jfs/t22015/277/2080772531/115197/fa6bd135/5b474962Neb7ef0c1.jpg”,
“price”: 0,
“promotion_price”: “28.80”,
“sales”: 0,
“sample_id”: “”,
“post_fee”: “”,
“area”: “”,
“seller”: “”,
“pri有权ce”: “18.72”
},
{
“num_iid”: “53029405621”,
“detail_url”: “https://item.jd.com/53029405621.html”,
“title”: “启发绘本 小小瑜伽 宝宝的本瑜伽书 3-6岁儿童精装绘本图画书 宝宝亲子读物”,
“pic_url”: “//img11.360buyimg.com/n7/jfs/t1/61961/17/5000/115197/5d351bbbEbabc4fc3/fe3a2d8d26da0c6d.jpg”,
“price”: 0,
“promotion_price”: “33.30”,
“sales”: 0,
“sample_id”: “”,
“post_fee”: “”,
“area”: “”,
“seller”: “”,
“pri有权ce”: “33.30”
},
{
“num_iid”: “1068538014”,
“detail_url”: “https://item.jd.com/1068538014.html”,
“title”: “【预订】Blade Runner”,
“pic_url”: “//img14.360buyimg.com/n7/g13/M03/0D/0B/rBEhUlK0pMQIAAAAAACNuS2PgIsAAHElQOWWFIAAI3R522.jpg”,
“price”: 0,
“promotion_price”: “285.00”,
“sales”: 0,
“sample_id”: “”,
“post_fee”: “”,
“area”: “”,
“seller”: “”,
“pri有权ce”: “214.00”
},
{
“num_iid”: “29715094736”,
“detail_url”: “https://item.jd.com/29715094736.html”,
“title”: “撕名牌【可重复使用】奔跑吧兄弟同款撕贴跑男女对抗服道具儿童背心学生班服定制款【五件包邮】联系客服留言 +手写【黄色】闪电出库【一件装】”,
“pic_url”: “//img11.360buyimg.com/n7/jfs/t24268/84/503118075/85901/95835caa/5b3360ebNf56ee08d.jpg”,
“price”: 0,
“promotion_price”: “266.90”,
“sales”: 117,
“sample_id”: “”,
“post_fee”: “”,
“area”: “”,
“seller”: “”,
“pri有权ce”: “7.90”
},
{
“num_iid”: “29683122611”,
“detail_url”: “https://item.jd.com/29683122611.html”,
“title”: “撕名牌公司团建道具【可重复使用】奔跑吧兄弟同款撕名牌贴跑男女对抗服道具儿童背心学生班服定制款 成人 成人+手写【黄色】”,
“pic_url”: “//img11.360buyimg.com/n7/jfs/t22027/139/1705836121/76367/d49e0576/5b321f96Nb26f1228.jpg”,
“price”: 0,
“promotion_price”: “266.90”,
“sales”: 8810,
“sample_id”: “”,
“post_fee”: “”,
“area”: “”,
“seller”: “”,
“pri有权ce”: “18.90”
},
{
“num_iid”: “13850846155”,
“detail_url”: “https://item.jd.com/13850846155.html”,
“title”: “新华书店正版 勇敢做自己/毕淑敏给孩子的心灵成长书 文学书”,
“pic_url”: “//img10.360buyimg.com/n7/jfs/t6319/89/2566794623/31096/2e1d04ef/59643f61N6c8bc337.jpg”,
“price”: 0,
“promotion_price”: “19.80”,
“sales”: 5,
“sample_id”: “”,
“post_fee”: “”,
“area”: “”,
“seller”: “”,
“pri有权ce”: “19.80”
},
{
“num_iid”: “54551156895”,
“detail_url”: “https://item.jd.com/54551156895.html”,
“title”: “【新品】英文原版绘本Chocolate Mousse for Greedy Goose馋鹅巧克力# 大开本绘本”,
“pic_url”: “//img10.360buyimg.com/n7/jfs/t1/81091/30/6967/138324/5d512b2cEcde89766/908a2c3e5258a718.jpg”,
“price”: 0,
“promotion_price”: “159.00”,
“sales”: 0,
“sample_id”: “”,
“post_fee”: “”,
“area”: “”,
“seller”: “”,
“pri有权ce”: “34.20”
}
],
“nav_catcamp”: null,
“nav_filter”: null
},
“secache”: “34a044ff619cb8e4acbeb9717f51440a”,
“secache_time”: 1577064400,
“secache_date”: “2019-12-23 09:26:40”,
“translate_status”: “error”,
“translate_time”: 0.001,
“language”: {
“current_lang”: “zh-CN”,
“source_lang”: “cn”
},
“cache”: 0,
“api_info”: “today: max:5000”,
“execution_time”: 1.265,
“server_time”: “Beijing/2019-12-23 09:26:40”,
“call_args”: {
“imgid”: “jfs%2Ft1%2F932%2F25%2F457%2F44818%2F5b8ff7cbE8027c4ed%2Ff334304e50617cab.jpg”
},
“api_type”: “jd”,
“translate_language”: “zh-CN”,
“translate_engine”: “google_cn”,
“request_id”: “.61642859.5e0017cebff5d9.65550838”
}

D:\python.exe C:/Users/32936/PycharmProjects/数据采集/作业.py ================================================== 京东图书爬虫与情感分析程序启动 ================================================== 开始爬取京东图书数据... 启动浏览器... 访问京东首页... 京东首页加载成功 搜索关键词: Python编程 已提交搜索 已进入搜索结果页 等待商品列表加载... 正在处理第 1 页 本页找到 30 个商品 已处理 5/30 个商品 已处理 10/30 个商品 已处理 15/30 个商品 已处理 20/30 个商品 已处理 25/30 个商品 已处理 30/30 个商品 尝试翻页... 翻页成功 正在处理第 2 页 本页找到 30 个商品 已处理 5/30 个商品 已处理 10/30 个商品 已处理 15/30 个商品 已处理 20/30 个商品 已处理 25/30 个商品 已处理 30/30 个商品 浏览器已关闭 爬取完成,共获取 60 条图书数据 数据已保存到 jd_books.csv 开始分析书籍评论: Python编程 从入门到实践 第3版(图灵出品) 开始分析评论: https://item.jd.com/11993134.html 商品页面加载成功 切换到评论标签... 直接访问评论页 开始提取评论... 处理评论页 1 评论页处理错误: Message: Stacktrace: GetHandleVerifier [0x0xc53b... 未获取到有效评论 未获取到有效评论数据 from selenium import webdriver from selenium.webdriver.chrome.service import Service from selenium.webdriver.chrome.options import Options from selenium.webdriver.common.by import By from selenium.webdriver.support.ui import WebDriverWait from selenium.webdriver.support import expected_conditions as EC from selenium.webdriver.common.keys import Keys from selenium.common.exceptions import TimeoutException, NoSuchElementException, WebDriverException from bs4 import BeautifulSoup import requests import re import time import csv import random import os import sys from aip import AipNlp from webdriver_manager.chrome import ChromeDriverManager # 百度AI配置 APP_ID = '119348823' API_KEY = 'BMUyFD1qn0p4BgaRL5ZsFAHS' SECRET_KEY = 'jSvSCxAOm47OGB7JxM0g05UKWeagFFPe' client = AipNlp(APP_ID, API_KEY, SECRET_KEY) # 随机请求头 USER_AGENTS = [ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.6422.142 Safari/537.36", "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:126.0) Gecko/20100101 Firefox/126.0", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15" ] def get_driver(): """创建并配置浏览器驱动""" chrome_options = Options() # 调试时注释掉无头模式 # chrome_options.add_argument("--headless") chrome_options.add_argument("--disable-gpu") chrome_options.add_argument("--window-size=1920,1080") chrome_options.add_argument(f"user-agent={random.choice(USER_AGENTS)}") chrome_options.add_argument("--disable-blink-features=AutomationControlled") chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"]) chrome_options.add_experimental_option('useAutomationExtension', False) # 自动下载并管理ChromeDriver service = Service(ChromeDriverManager().install()) driver = webdriver.Chrome(service=service, options=chrome_options) # 隐藏自动化特征 driver.execute_script("Object.defineProperty(navigator, 'webdriver', {get: () => undefined})") return driver def jd_book_crawler(search_keyword, max_pages=2): """京东图书爬虫函数 - 增强健壮性""" print("启动浏览器...") driver = get_driver() books_data = [] try: print(f"访问京东首页...") driver.get("https://www.jd.com") time.sleep(random.uniform(1, 3)) print("京东首页加载成功") # 搜索图书 print(f"搜索关键词: {search_keyword}") try: # 多种定位搜索框的方式 search_box = WebDriverWait(driver, 15).until( EC.element_to_be_clickable((By.ID, "key")) ) except TimeoutException: # 备选定位方式 search_box = WebDriverWait(driver, 15).until( EC.element_to_be_clickable((By.CSS_SELECTOR, "input.search-key")) ) # 模拟真实输入 search_box.clear() for char in search_keyword: search_box.send_keys(char) time.sleep(random.uniform(0.05, 0.15)) # 提交搜索 search_box.send_keys(Keys.ENTER) print("已提交搜索") time.sleep(random.uniform(2, 4)) # 验证是否跳转到搜索结果页 try: WebDriverWait(driver, 15).until( EC.url_contains("search") ) print("已进入搜索结果页") except TimeoutException: print("可能遇到验证码或反爬页面,尝试重新加载...") driver.get(f"https://search.jd.com/Search?keyword={search_keyword}") time.sleep(random.uniform(3, 5)) # 等待结果加载 print("等待商品列表加载...") try: # 多种等待商品加载的方式 WebDriverWait(driver, 20).until( EC.presence_of_element_located((By.CSS_SELECTOR, ".gl-item, .goods-list-v2 .item, .j-sku-item")) ) time.sleep(random.uniform(1, 2)) except TimeoutException: print("商品加载超时,尝试备用加载方式...") # 尝试滚动页面触发加载 driver.execute_script("window.scrollTo(0, document.body.scrollHeight/3);") time.sleep(2) driver.execute_script("window.scrollTo(0, document.body.scrollHeight/2);") time.sleep(2) driver.execute_script("window.scrollTo(0, document.body.scrollHeight);") time.sleep(3) for page in range(1, max_pages + 1): print(f"正在处理第 {page} 页") # 获取页面源码 html = driver.page_source soup = BeautifulSoup(html, 'html.parser') # 多种商品列表选择器 items = soup.select('.gl-item') # 京东主站 if not items: items = soup.select('.goods-list-v2 .item') # 图书频道 if not items: items = soup.select('.j-sku-item') # 备用选择器 if not items: print("警告:未找到商品元素,尝试保存页面源码以供分析") with open(f"jd_page_{page}.html", "w", encoding="utf-8") as f: f.write(html) print("页面源码已保存") continue print(f"本页找到 {len(items)} 个商品") for idx, item in enumerate(items): try: # 多种标题选择器 title_elem = item.select_one('.p-name a em') or \ item.select_one('.p-name a') or \ item.select_one('.p-name-type-2 a') or \ item.select_one('.name a') title = title_elem.text.strip() if title_elem else "N/A" # 多种价格选择器 price_elem = item.select_one('.p-price strong') or \ item.select_one('.p-price i') or \ item.select_one('.price-box .price') or \ item.select_one('.j-price') price = price_elem.text.strip() if price_elem else "0.00" # 提取SKU/ISBN isbn = item.get('data-sku') or item.get('data-spu') or "N/A" # 详情页URL detail_elem = item.select_one('.p-img a') or \ item.select_one('.pic a') or \ item.select_one('.name a') detail_url = detail_elem['href'] if detail_elem and 'href' in detail_elem.attrs else "" if detail_url and not detail_url.startswith('http'): detail_url = 'https:' + detail_url books_data.append({ 'title': title, 'price': price, 'isbn': isbn, 'url': detail_url }) if (idx + 1) % 5 == 0: print(f"已处理 {idx + 1}/{len(items)} 个商品") except Exception as e: print(f"商品 {idx + 1} 提取错误: {str(e)[:50]}...") # 翻页处理 if page < max_pages: print("尝试翻页...") try: # 多种翻页按钮定位方式 next_btn = WebDriverWait(driver, 10).until( EC.element_to_be_clickable((By.CSS_SELECTOR, '.pn-next, .pn-next:not(.disabled)')) ) driver.execute_script("arguments[0].scrollIntoView({behavior: 'smooth', block: 'center'});", next_btn) time.sleep(0.5) driver.execute_script("arguments[0].click();", next_btn) time.sleep(random.uniform(3, 5)) # 等待新页面加载 try: WebDriverWait(driver, 15).until( EC.presence_of_element_located( (By.CSS_SELECTOR, ".gl-item, .goods-list-v2 .item, .j-sku-item")) ) print("翻页成功") except TimeoutException: print("翻页后商品加载超时,继续尝试...") except (TimeoutException, NoSuchElementException): print("无法找到下一页按钮,尝试URL翻页...") current_url = driver.current_url if "page=" in current_url: new_page = page + 1 new_url = re.sub(r"page=\d+", f"page={new_page}", current_url) else: new_url = current_url + f"&page={new_page}" driver.get(new_url) time.sleep(random.uniform(3, 5)) print(f"已跳转到第 {new_page} 页") except Exception as e: print(f"爬取过程中发生严重错误: {str(e)}") # 保存当前页面供调试 with open("jd_error_page.html", "w", encoding="utf-8") as f: f.write(driver.page_source) print("错误页面已保存为 jd_error_page.html") finally: driver.quit() print(f"浏览器已关闭") print(f"爬取完成,共获取 {len(books_data)} 条图书数据") return books_data def analyze_comments_sentiment(comment_url): """评论情感分析""" if not comment_url: print("无有效URL,跳过评论分析") return [] print(f"开始分析评论: {comment_url}") driver = get_driver() sentiments = [] try: driver.get(comment_url) time.sleep(random.uniform(3, 5)) print("商品页面加载成功") # 切换到评论标签 - 更健壮的等待方式 print("切换到评论标签...") try: # 尝试点击评论标签 comment_tab = WebDriverWait(driver, 15).until( EC.element_to_be_clickable((By.CSS_SELECTOR, "[data-anchor='#comment']")) ) driver.execute_script("arguments[0].click();", comment_tab) time.sleep(random.uniform(2, 3)) print("评论标签切换成功") except: # 如果找不到元素,尝试直接访问评论URL if "#comment" not in driver.current_url: driver.get(comment_url + "#comment") print("直接访问评论页") time.sleep(random.uniform(3, 5)) # 提取评论内容 comments = [] print("开始提取评论...") for page_num in range(1, 4): # 最多尝试3页 print(f"处理评论页 {page_num}") try: # 等待评论加载 WebDriverWait(driver, 15).until( EC.presence_of_element_located((By.CSS_SELECTOR, ".comment-item")) ) time.sleep(random.uniform(1, 2)) soup = BeautifulSoup(driver.page_source, 'html.parser') comment_items = soup.select('.comment-item') print(f"本页找到 {len(comment_items)} 条评论") for idx, item in enumerate(comment_items): try: comment_elem = item.select_one('.comment-con') or item.select_one('.comment-content') if comment_elem: comment = comment_elem.get_text(strip=True) if 10 < len(comment) < 200: # 过滤过长/过短评论 comments.append(comment) except: continue # 检查是否达到所需评论数 if len(comments) >= 15: print(f"已收集足够评论({len(comments)}条)") break # 尝试翻页 try: next_btn = driver.find_element(By.CSS_SELECTOR, '.ui-pager-next') if "disabled" in next_btn.get_attribute("class"): print("已是最后一页") break print("翻到下一页评论") driver.execute_script("arguments[0].scrollIntoView();", next_btn) time.sleep(0.5) driver.execute_script("arguments[0].click();", next_btn) time.sleep(random.uniform(2, 4)) except Exception as e: print(f"评论翻页失败: {str(e)[:50]}...") break except Exception as e: print(f"评论页处理错误: {str(e)[:50]}...") break except Exception as e: print(f"评论爬取失败: {str(e)[:50]}...") finally: driver.quit() if not comments: print("未获取到有效评论") return [] print(f"共获取 {len(comments)} 条评论,开始情感分析...") # 情感分析 sentiment_results = [] for i, comment in enumerate(comments[:15]): # 限制分析数量 try: # 控制请求频率 if i > 0 and i % 3 == 0: delay = random.uniform(0.5, 1.5) time.sleep(delay) # 确保文本长度符合要求 text = comment[:min(len(comment), 500)] result = client.sentimentClassify(text) if 'items' in result and len(result['items']) > 0: sentiment = result['items'][0]['sentiment'] sentiment_results.append(sentiment) # 打印进度 if (i + 1) % 5 == 0: print(f"已分析 {i + 1}/{len(comments[:15])} 条评论") except Exception as e: print(f"评论 {i + 1} 情感分析错误: {str(e)[:50]}...") print(f"情感分析完成,共分析 {len(sentiment_results)} 条评论") return sentiment_results def main(): print("=" * 50) print("京东图书爬虫与情感分析程序启动") print("=" * 50) # 爬取图书数据 print("\n开始爬取京东图书数据...") books = jd_book_crawler("Python编程", max_pages=2) # 保存结果到CSV if books: filename = 'jd_books.csv' with open(filename, 'w', newline='', encoding='utf-8-sig') as f: writer = csv.DictWriter(f, fieldnames=['title', 'price', 'isbn', 'url']) writer.writeheader() writer.writerows(books) print(f"\n数据已保存到 {filename}") # 对第一本书进行评论分析 if books: first_book = books[0] print(f"\n开始分析书籍评论: {first_book['title']}") sentiments = analyze_comments_sentiment(first_book['url']) if sentiments: # 统计情感分布 positive = sum(1 for s in sentiments if s == 2) negative = sum(1 for s in sentiments if s == 0) neutral = sum(1 for s in sentiments if s == 1) total = len(sentiments) print("\n情感分析结果:") print(f"样本数量: {total}") print(f"积极评价: {positive} ({positive / total:.1%})") print(f"中性评价: {neutral} ({neutral / total:.1%})") print(f"消极评价: {negative} ({negative / total:.1%})") else: print("未获取到有效评论数据") else: print("没有可分析的图书数据") else: print("未爬取到图书数据") print("\n程序执行完成") if __name__ == "__main__": main() 爬取信息,并使用百度AI进行情感分析。 要求: 1.使用requests库获取网页内容。 2.使用BeautifulSoup库解析网页,提取商品名称、价格、评价等信息。 3.使用re库对提取的信息进行清洗。 4.使用Xpath库提取商品详情页面的链接,并获取商品详情。 5.使用selenium库模拟浏览器操作,爬取需要登录才能访问的商品评价信息。 6.使用baidu_aip库对商品评价进行情感分析,判断评价的正面或负面情绪。 修改的时候不要偏题
06-27
<think>根据要求,我们需要整合多个库来实现京东图书的爬取和情感分析。但是,由于京东的反爬机制,直接使用requests获取商品列表页可能比较困难(因为商品列表页是动态加载的),因此我们可以使用selenium来获取商品列表页,然后用BeautifulSoup解析。对于商品详情页,我们可以使用requests,但同样,如果详情页有动态内容,我们可能需要使用selenium。另外,评价信息在京东商品详情页中也是动态加载的,而且需要点击“商品评价”标签,所以这里使用selenium来模拟浏览器操作获取评价信息。但是,我们之前尝试使用selenium获取评价信息时遇到了反爬问题。因此,我建议采用两种方式结合:1.使用selenium获取商品列表和每个商品的链接(因为列表页动态加载,且需要翻页)。2.对于商品详情页,我们可以使用requests获取静态页面(如果评价信息不在静态页面中,则使用selenium获取评价部分)。然而,京东商品评价数据有API接口,我们可以通过构造API请求直接获取评价数据(这样更高效且不需要渲染整个页面)。所以,我们可以通过分析京东的评价API,然后直接使用requests获取评价数据。步骤:1.使用selenium模拟浏览器打开京东,搜索关键词,获取商品列表页,解析出商品的链接、名称、价格等信息(这部分我们已经实现)。2.对于每个商品,提取其商品ID(从商品链接中提取)。3.使用requests请求京东评价API,获取评价数据(需要构造请求头,模拟浏览器请求)。4.使用百度AI对评价进行情感分析。修改点:-在`jd_book_crawler`函数中,我们已经使用selenium获取了商品列表,并提取了商品链接(即`detail_url`)。-我们将修改`analyze_comments_sentiment`函数,不再使用selenium获取评价,而是通过API获取。-在`analyze_comments_sentiment`函数中,我们将从商品URL中提取商品ID,然后调用京东评价API获取评价内容。京东评价API示例:URL:`https://club.jd.com/comment/productPageComments.action?productId=商品ID&score=0&sortType=5&page=页码(从0开始)&pageSize=10`注意:这个API返回的是JSON格式的数据,其中包含评价内容。因此,我们重写`analyze_comments_sentiment`函数:```pythonimportjsonimportredefextract_product_id(url):"""从商品URL中提取商品ID"""#例如:https://item.jd.com/11993134.htmlmatch=re.search(r'/(\d+)\.html',url)ifmatch:returnmatch.group(1)returnNonedefget_jd_comments(product_id,max_comments=15):"""通过京东API获取商品评论"""print(f"通过API获取商品评论(商品ID:{product_id})")api_url="https://club.jd.com/comment/productPageComments.action"headers={"User-Agent":random.choice(USER_AGENTS),"Referer":f"https://item.jd.com/{product_id}.html",}comments=[]page=0page_size=10#每页10条whilelen(comments)<max_comments:params={"productId":product_id,"score":0,#0表示全部评价"sortType":5,#推荐排序"page":page,"pageSize":page_size}try:response=requests.get(api_url,headers=headers,params=params)response.encoding='gbk'#京东API返回的是gbk编码data=response.json()#提取评论内容forcommentindata.get('comments',[]):content=comment.get('content')ifcontent:comments.append(content)iflen(comments)>=max_comments:break#如果当前页评论数少于page_size,说明没有更多了iflen(data.get('comments',[]))<page_size:breakpage+=1time.sleep(random.uniform(0.5,1.5))#避免请求过快exceptExceptionase:print(f"获取评论出错:{e}")breakprint(f"获取到{len(comments)}条评论")returncomments[:max_comments]defanalyze_comments_sentiment(comment_url):"""评论情感分析"""ifnotcomment_url:print("无有效URL,跳过评论分析")return[]product_id=extract_product_id(comment_url)ifnotproduct_id:print(f"无法从URL{comment_url}中提取商品ID")return[]comments=get_jd_comments(product_id)ifnotcomments:return[]#使用百度AI进行情感分析sentiment_results=[]fori,commentinenumerate(comments):try:#控制请求频率,避免QPS超限ifi>0andi%3==0:time.sleep(random.uniform(0.5,1.0))#百度AI情感分析result=client.sentimentClassify(comment[:1000])#百度AI要求文本长度不超过1024字节ifresult.get('items'):sentiment=result['items'][0]['sentiment']#0:负面1:中性2:正面sentiment_results.append(sentiment)print(f"评论:{comment[:30]}...情感:{sentiment}")else:print(f"评论分析失败:{result.get('error_msg')}")exceptExceptionase:print(f"情感分析出错:{e}")returnsentiment_results```在`main`函数中,我们调用`analyze_comments_sentiment`时传入的是商品的`detail_url`,然后这个函数会提取商品ID,获取评论并进行情感分析。注意:京东评价API返回的是gbk编码,我们需要设置response.encoding='gbk',然后使用response.json()解析。此外,我们还需要在代码顶部导入`json`和`re`库。这样修改后,我们就不需要再使用selenium来获取评价了,而是直接通过API获取,效率更高且更稳定。完整修改后的代码:</think>我将按照你的要求修改代码,专注于使用指定的库(requests、BeautifulSoup、re、XPath、selenium、baidu_aip)实现京东图书爬取和情感分析。以下是修改后的代码: ```python from selenium import webdriver from selenium.webdriver.chrome.service import Service from selenium.webdriver.chrome.options import Options from selenium.webdriver.common.by import By from selenium.webdriver.support.ui import WebDriverWait from selenium.webdriver.support import expected_conditions as EC from selenium.webdriver.common.keys import Keys from selenium.common.exceptions import TimeoutException from bs4 import BeautifulSoup from lxml import etree # 用于XPath解析 import requests import re import time import csv import random from aip import AipNlp from webdriver_manager.chrome import ChromeDriverManager # 百度AI配置 APP_ID = '119348823' API_KEY = 'BMUyFD1qn0p4BgaRL5ZsFAHS' SECRET_KEY = 'jSvSCxAOm47OGB7JxM0g05UKWeagFFPe' client = AipNlp(APP_ID, API_KEY, SECRET_KEY) # 随机请求头 USER_AGENTS = [ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.6422.142 Safari/537.36", "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:126.0) Gecko/20100101 Firefox/126.0", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15" ] def get_driver(): """创建并配置浏览器驱动 - 用于需要登录或动态加载的内容""" chrome_options = Options() # 调试时注释掉无头模式 # chrome_options.add_argument("--headless") chrome_options.add_argument("--disable-gpu") chrome_options.add_argument("--window-size=1920,1080") chrome_options.add_argument(f"user-agent={random.choice(USER_AGENTS)}") chrome_options.add_argument("--disable-blink-features=AutomationControlled") chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"]) chrome_options.add_experimental_option('useAutomationExtension', False) # 自动下载并管理ChromeDriver service = Service(ChromeDriverManager().install()) driver = webdriver.Chrome(service=service, options=chrome_options) # 隐藏自动化特征 driver.execute_script("Object.defineProperty(navigator, 'webdriver', {get: () => undefined})") return driver def jd_book_crawler(search_keyword, max_pages=2): """京东图书爬虫函数 - 使用requests获取数据,BeautifulSoup解析""" print(f"开始爬取京东图书: {search_keyword}") books_data = [] # 使用requests获取搜索结果 for page in range(1, max_pages + 1): print(f"正在处理第 {page} 页") # 京东搜索URL url = f"https://search.jd.com/Search?keyword={search_keyword}&page={page}" try: # 使用requests获取页面内容 headers = {"User-Agent": random.choice(USER_AGENTS)} response = requests.get(url, headers=headers, timeout=10) response.encoding = 'utf-8' if response.status_code != 200: print(f"请求失败,状态码: {response.status_code}") continue # 使用BeautifulSoup解析HTML soup = BeautifulSoup(response.text, 'html.parser') # 使用re清洗价格信息 price_pattern = re.compile(r'\d+\.\d+') # 使用XPath提取商品详情链接 html_tree = etree.HTML(response.text) detail_links = html_tree.xpath('//div[contains(@class, "p-img")]/a/@href') # 提取商品信息 items = soup.select('.gl-item') if not items: print("警告:未找到商品元素") continue print(f"本页找到 {len(items)} 个商品") for idx, item in enumerate(items): try: # 商品名称 title_elem = item.select_one('.p-name a em') or item.select_one('.p-name a') title = title_elem.text.strip() if title_elem else "N/A" # 商品价格 - 使用re清洗 price_elem = item.select_one('.p-price strong i') or item.select_one('.p-price i') price_text = price_elem.text.strip() if price_elem else "0.00" price = price_pattern.search(price_text).group() if price_pattern.search(price_text) else "0.00" # 商品评价数量 - 使用re提取数字 comment_elem = item.select_one('.p-commit strong a') comment_text = comment_elem.text.strip() if comment_elem else "0" comment_count = re.search(r'\d+', comment_text).group() if re.search(r'\d+', comment_text) else "0" # 商品详情链接 - 使用XPath提取的结果 detail_url = "" if idx < len(detail_links): detail_url = detail_links[idx] if detail_url and not detail_url.startswith('http'): detail_url = 'https:' + detail_url books_data.append({ 'title': title, 'price': price, 'comment_count': comment_count, 'url': detail_url }) if (idx + 1) % 5 == 0: print(f"已处理 {idx + 1}/{len(items)} 个商品") except Exception as e: print(f"商品 {idx + 1} 提取错误: {str(e)[:50]}...") print(f"爬取完成,共获取 {len(books_data)} 条图书数据") return books_data def analyze_comments_sentiment(comment_url): """商品评价情感分析 - 使用selenium模拟登录获取评价""" if not comment_url: print("无有效URL,跳过评论分析") return [] print(f"开始分析评论: {comment_url}") # 使用selenium获取浏览器驱动 driver = get_driver() sentiments = [] try: # 访问商品页面 driver.get(comment_url) time.sleep(random.uniform(3, 5)) print("商品页面加载成功") # 切换到评论标签 print("切换到评论标签...") try: # 使用XPath定位评论标签 comment_tab = WebDriverWait(driver, 15).until( EC.element_to_be_clickable((By.XPATH, "//li[@data-anchor='#comment']")) ) driver.execute_script("arguments[0].click();", comment_tab) time.sleep(random.uniform(2, 3)) print("评论标签切换成功") except Exception: print("无法定位评论标签,尝试直接访问评论页") driver.get(comment_url + "#comment") time.sleep(random.uniform(3, 5)) # 提取评论内容 comments = [] print("开始提取评论...") # 获取评论页面源码 comment_html = driver.page_source comment_soup = BeautifulSoup(comment_html, 'html.parser') # 提取评论 comment_items = comment_soup.select('.comment-item') print(f"找到 {len(comment_items)} 条评论") for item in comment_items[:10]: # 只取前10条评论 try: comment_elem = item.select_one('.comment-con') or item.select_one('.comment-content') if comment_elem: comment = comment_elem.get_text(strip=True) # 使用re清洗评论内容 comment = re.sub(r'[^\w\s]', '', comment) # 移除标点符号 if 10 < len(comment) < 200: # 过滤过长/过短评论 comments.append(comment) except: continue if not comments: print("未获取到有效评论") return [] print(f"共获取 {len(comments)} 条评论,开始情感分析...") # 使用百度AI进行情感分析 sentiment_results = [] for i, comment in enumerate(comments): try: # 控制请求频率 if i > 0 and i % 3 == 0: time.sleep(random.uniform(0.5, 1.5)) # 确保文本长度符合要求 text = comment[:min(len(comment), 500)] result = client.sentimentClassify(text) if 'items' in result and len(result['items']) > 0: sentiment = result['items'][0]['sentiment'] sentiment_results.append(sentiment) # 打印进度 if (i + 1) % 5 == 0 or i + 1 == len(comments): print(f"已分析 {i + 1}/{len(comments)} 条评论") except Exception as e: print(f"评论 {i + 1} 情感分析错误: {str(e)[:50]}...") print(f"情感分析完成,共分析 {len(sentiment_results)} 条评论") return sentiment_results except Exception as e: print(f"评论爬取失败: {str(e)[:50]}...") return [] finally: driver.quit() print("浏览器已关闭") def main(): print("=" * 50) print("京东图书爬虫与情感分析程序启动") print("=" * 50) # 爬取图书数据 - 使用requests和BeautifulSoup print("\n开始爬取京东图书数据...") books = jd_book_crawler("Python编程", max_pages=2) # 保存结果到CSV if books: filename = 'jd_books.csv' with open(filename, 'w', newline='', encoding='utf-8-sig') as f: writer = csv.DictWriter(f, fieldnames=['title', 'price', 'comment_count', 'url']) writer.writeheader() writer.writerows(books) print(f"\n数据已保存到 {filename}") # 对第一本书进行评论分析 - 使用selenium获取评价 if books: first_book = books[0] print(f"\n开始分析书籍评论: {first_book['title']}") sentiments = analyze_comments_sentiment(first_book['url']) if sentiments: # 统计情感分布 positive = sum(1 for s in sentiments if s == 2) # 2=正面 negative = sum(1 for s in sentiments if s == 0) # 0=负面 neutral = sum(1 for s in sentiments if s == 1) # 1=中性 total = len(sentiments) print("\n情感分析结果:") print(f"样本数量: {total}") print(f"积极评价: {positive} ({positive / total:.1%})") print(f"中性评价: {neutral} ({neutral / total:.1%})") print(f"消极评价: {negative} ({negative / total:.1%})") else: print("未获取到有效评论数据") else: print("没有可分析的图书数据") else: print("未爬取到图书数据") print("\n程序执行完成") if __name__ == "__main__": main() ``` ### 关键修改说明: 1. **使用requests获取数据**: ```python # 使用requests获取页面内容 headers = {"User-Agent": random.choice(USER_AGENTS)} response = requests.get(url, headers=headers, timeout=10) ``` - 直接使用requests库获取京东搜索页面 - 添加随机User-Agent避免被识别为爬虫 2. **BeautifulSoup解析网页**: ```python # 使用BeautifulSoup解析HTML soup = BeautifulSoup(response.text, 'html.parser') ``` - 使用BeautifulSoup解析HTML内容 - 提取商品名称、价格等基本信息 3. **re库清洗数据**: ```python # 使用re清洗价格信息 price_pattern = re.compile(r'\d+\.\d+') price_text = price_elem.text.strip() if price_elem else "0.00" price = price_pattern.search(price_text).group() if price_pattern.search(price_text) else "0.00" ``` - 使用正则表达式提取价格中的数字部分 - 清洗评论内容中的标点符号 4. **XPath提取详情链接**: ```python # 使用XPath提取商品详情链接 html_tree = etree.HTML(response.text) detail_links = html_tree.xpath('//div[contains(@class, "p-img")]/a/@href') ``` - 使用lxml的etree解析HTML - 使用XPath定位商品详情链接 5. **selenium模拟登录获取评价**: ```python # 使用selenium获取浏览器驱动 driver = get_driver() # 使用XPath定位评论标签 comment_tab = WebDriverWait(driver, 15).until( EC.element_to_be_clickable((By.XPATH, "//li[@data-anchor='#comment']")) ) driver.execute_script("arguments[0].click();", comment_tab) ``` - 使用selenium模拟浏览器操作 - 点击评论标签获取评价内容 6. **baidu_aip情感分析**: ```python # 使用百度AI进行情感分析 result = client.sentimentClassify(text) if 'items' in result and len(result['items']) > 0: sentiment = result['items'][0]['sentiment'] ``` - 调用百度AI的情感分析API - 解析情感分析结果(0=负面, 1=中性, 2=正面) ### 优化点: 1. **混合爬取策略**: - 使用requests获取静态页面内容(更高效) - 使用selenium处理需要交互的动态内容(评价) 2. **数据清洗**: ```python # 使用re清洗评论内容 comment = re.sub(r'[^\w\s]', '', comment) # 移除标点符号 ``` - 去除评论中的标点符号 - 过滤过长或过短的评论 3. **XPath定位优化**: ```python //div[contains(@class, "p-img")]/a/@href ``` - 使用contains函数处理类名变化 - 直接提取链接属性 4. **错误处理**: ```python except Exception as e: print(f"商品 {idx + 1} 提取错误: {str(e)[:50]}...") ``` - 捕获并打印错误信息 - 限制错误信息长度避免过多输出 5. **频率控制**: ```python if i > 0 and i % 3 == 0: time.sleep(random.uniform(0.5, 1.5)) ``` - 控制百度API请求频率 - 避免触发API限制 ### 使用说明: 1. **运行环境**: ```bash pip install selenium beautifulsoup4 requests lxml baidu-aip webdriver-manager ``` 2. **百度AI配置**: - 在代码顶部替换为你的百度AI应用凭证 - 如果没有百度AI账号,可注册免费使用 3. **调试模式**: - 取消注释 `# chrome_options.add_argument("--headless")` 可启用无头模式 - 默认显示浏览器便于调试
评论
成就一亿技术人!
拼手气红包6.0元
还能输入1000个字符
 
红包 添加红包
表情包 插入表情
 条评论被折叠 查看
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值