Scraping JD.com province, city, and county data with Java

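This article presents a Java utility class for scraping and parsing region data. It sends HTTP requests to fetch regions at each level (province, city, district/county) and parses the JSON data returned.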


Most systems need region data, and entering every region in the country by hand would drive anyone crazy. The program below may help:

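import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// NetTool and JsonUtils are the author's own utility classes (see the note at the end).
public class AreaUtils {

	// Province IDs mapped to province names; the IDs are used as parentId when fetching cities
	private final static Map<Integer,String> provinces=new HashMap<Integer,String>();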
	
	static{
		provinces.put(1, "北京");
		provinces.put(2, "上海");
		provinces.put(3, "天津");
		provinces.put(4, "重庆");
		provinces.put(5, "河北");
		provinces.put(6, "山西");
		provinces.put(7, "河南");
		provinces.put(8, "辽宁");
		provinces.put(9, "吉林");
		provinces.put(10, "黑龙江");
		provinces.put(11, "内蒙古");
		provinces.put(12, "江苏");
		provinces.put(13, "山东");
		provinces.put(14, "安徽");
		provinces.put(15, "浙江");
		provinces.put(16, "福建");
		provinces.put(17, "湖北");
		provinces.put(18, "湖南");
		provinces.put(19, "广东");
		provinces.put(20, "广西");
		provinces.put(21, "江西");
		provinces.put(22, "四川");
		provinces.put(23, "海南");
		provinces.put(24, "贵州");
		provinces.put(25, "云南");
		provinces.put(26, "西藏");
		provinces.put(27, "陕西");
		provinces.put(28, "甘肃");
		provinces.put(29, "青海");
		provinces.put(30, "宁夏");
		provinces.put(31, "新疆");
		provinces.put(32, "台湾");
		provinces.put(42, "香港");
		provinces.put(43, "澳门");
		provinces.put(84, "钓鱼岛");
	}
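	// Non-greedy match for the first [...] array in the response body
	private static final String area_pattern="\\[.+?\\]";
	// URL template; [level] and [parentId] are placeholders filled in by getAreas
	public static String areaUrl="http://passport.jd.com/emReg/AjaxService.aspx?action=GetAreas&level=[level]&parentId=[parentId]";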
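	/**
	 * Fetches the child areas of the given parent area.
	 *
	 * @author YLPan
	 * @date 2013-5-15
	 * @param level 1 to fetch cities, 2 to fetch districts/counties
	 * @param parentId ID of the parent area (a province ID for level 1, a city ID for level 2)
	 * @return the list of areas, each a map with "Id" and "Name" keys, or null if the response contains no array
	 * @throws Exception if the request or the JSON parsing fails
	 */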
	public static List<Map<String,Object>> getAreas(Integer level,Integer parentId) throws Exception{
		// Fill in the URL template; String.replace treats the placeholders literally, so no regex escaping is needed
		String cityUrl=areaUrl.replace("[level]",String.valueOf(level)).replace("[parentId]", String.valueOf(parentId));
		System.out.println("cityUrl:"+cityUrl);
		// The response body is decoded as GBK
		String cityJson=NetTool.getTextContent(cityUrl, "gbk");
		// Extract the JSON array from the wrapper, e.g. ({"Areas":[...]})
		Pattern pattern = Pattern.compile(area_pattern);
		Matcher matcher = pattern.matcher(cityJson);
		if(matcher.find()){
			cityJson=matcher.group();
			return JsonUtils.readJson2ListMap(cityJson);
		}
		// No array found; callers must check for null
		return null;
	}
	public static void areaInit() throws Exception{
		for(Entry<Integer,String> entry : provinces.entrySet()){
			System.out.println("province:"+entry.getValue());
			// Level 1: cities of this province
			List<Map<String,Object>> cityList=getAreas(1,entry.getKey());
			if(cityList==null)continue;
			for(Map<String,Object> citymap : cityList){
				Integer cityId=(Integer)citymap.get("Id");
				String cityName=(String)citymap.get("Name");
				System.out.println("--cityName:"+cityName);
				// Level 2: districts/counties of this city
				List<Map<String,Object>> countyList=getAreas(2,cityId);
				if(countyList==null)continue;
				for(Map<String,Object> countyMap : countyList){
					Integer countyId=(Integer)countyMap.get("Id");
					String countyName=(String)countyMap.get("Name");
					System.out.println("----countyName:"+countyName);
				}
			}
		}
	}
	public static void main(String[] args) {
		try {
			areaInit();
		} catch (Exception e) {
			e.printStackTrace();
		}
	}
}

Output:

province:北京
cityUrl:http://passport.jd.com/emReg/AjaxService.aspx?action=GetAreas&level=1&parentId=1
--cityName:朝阳区
cityUrl:http://passport.jd.com/emReg/AjaxService.aspx?action=GetAreas&level=2&parentId=72
----countyName:三环以内
----countyName:三环到四环之间
----countyName:四环到五环之间
----countyName:五环到六环之间
----countyName:管庄
----countyName:北苑
----countyName:定福庄
--cityName:海淀区
cityUrl:http://passport.jd.com/emReg/AjaxService.aspx?action=GetAreas&level=2&parentId=2800
----countyName:三环以内
----countyName:三环到四环之间
----countyName:四环到五环之间
----countyName:五环到六环之间
----countyName:六环以外
----countyName:上地
----countyName:西三旗
----countyName:清河
----countyName:圆明园西路
----countyName:农业大学西校区
----countyName:西二旗
........................................

Visiting http://passport.jd.com/emReg/AjaxService.aspx?action=GetAreas&level=1&parentId=1 in a browser returns data in the following format:

({"Areas":[{"Id":72,"Name":"朝阳区"},{"Id":2800,"Name":"海淀区"},{"Id":2801,"Name":"西城区"},{"Id":2802,"Name":"东城区"},{"Id":2803,"Name":"崇文区"},{"Id":2804,"Name":"宣武区"},{"Id":2805,"Name":"丰台区"},{"Id":2806,"Name":"石景山区"},{"Id":2807,"Name":"门头沟"},{"Id":2808,"Name":"房山区"},{"Id":2809,"Name":"通州区"},{"Id":2810,"Name":"大兴区"},{"Id":2812,"Name":"顺义区"},{"Id":2814,"Name":"怀柔区"},{"Id":2816,"Name":"密云区"},{"Id":2901,"Name":"昌平区"},{"Id":2953,"Name":"平谷区"},{"Id":3065,"Name":"延庆县"}]})
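The implementation of JsonUtils.readJson2ListMap is not shown in this post. As a rough sketch of what the parsing step has to do, the class below pulls the Id/Name pairs out of a response shaped like the one above using only the JDK. The class name AreaResponseParser and its parse method are hypothetical, and the regex assumes flat objects containing exactly an "Id" and a "Name" field with no escaped quotes; a real JSON library would be the safer choice for anything else.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the parsing step; not the author's JsonUtils.
public class AreaResponseParser {

    // Matches one flat object of the form {"Id":<digits>,"Name":"<text>"}.
    private static final Pattern ENTRY =
            Pattern.compile("\\{\"Id\":(\\d+),\"Name\":\"([^\"]*)\"\\}");

    /** Extracts every Id/Name pair from a raw response, in response order. */
    public static List<Map<String, Object>> parse(String response) {
        List<Map<String, Object>> areas = new ArrayList<>();
        Matcher m = ENTRY.matcher(response);
        while (m.find()) {
            Map<String, Object> area = new LinkedHashMap<>();
            area.put("Id", Integer.valueOf(m.group(1)));
            area.put("Name", m.group(2));
            areas.add(area);
        }
        return areas;
    }

    public static void main(String[] args) {
        // First two entries of the response shown above; the names are
        // written as Unicode escapes so the file compiles under any encoding.
        String sample = "({\"Areas\":[{\"Id\":72,\"Name\":\"\u671D\u9633\u533A\"},"
                + "{\"Id\":2800,\"Name\":\"\u6D77\u6DC0\u533A\"}]})";
        for (Map<String, Object> area : parse(sample)) {
            System.out.println(area.get("Id") + " " + area.get("Name"));
        }
    }
}
```

Because the maps use the same "Id" (Integer) and "Name" (String) keys that areaInit reads, a list produced this way could be consumed by the same casts.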

Note: NetTool and JsonUtils are prepackaged utility classes; they have been uploaded.
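For readers who do not have the uploaded classes, here is a minimal sketch of what a helper like NetTool.getTextContent(url, charset) might look like using only the JDK. The class name SimpleNetTool, the 5-second timeouts, and the buffer size are assumptions; the author's actual NetTool may behave differently.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.Charset;

// Hypothetical stand-in for the author's NetTool; not the original class.
public class SimpleNetTool {

    /** Downloads the resource at url and decodes its bytes with the given charset. */
    public static String getTextContent(String url, String charsetName) throws IOException {
        URLConnection conn = URI.create(url).toURL().openConnection();
        conn.setConnectTimeout(5000); // assumed timeout values
        conn.setReadTimeout(5000);
        try (InputStream in = conn.getInputStream();
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buf = new byte[4096];
            int n;
            // Copy the whole body into memory before decoding, so multi-byte
            // characters are never split across buffer boundaries.
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), Charset.forName(charsetName));
        }
    }
}
```

With this sketch, the call in getAreas would read SimpleNetTool.getTextContent(cityUrl, "gbk"), provided the running JDK includes the GBK charset.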
