Using SolrJ Tokenization from Java

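This post shows how to use SolrJ to run tokenization from Java: given a Solr environment that is already set up, it looks at how to pull the tokenization results into a Java class.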


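I've been looking into Solr tokenization lately. The environment is up and running, but I got stuck on how to bring the computed tokens over to the Java side.

How can the tokenization result shown further down be pulled into a Java class? The method below sends the text to the field analysis handler and joins the token texts into a single string.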

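import java.util.List;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.AnalysisParams;
import org.apache.solr.common.params.CommonParams;
import org.apache.solr.common.util.NamedList;

// Wrapper class added so the method compiles on its own; the name is arbitrary.
public class SolrTokenizeDemo {

	// Sends the text to Solr's field analysis handler and returns the
	// tokenizer output as a space-separated string (SolrJ 4.x client API).
	@SuppressWarnings("unchecked")
	public static String testSolrLocal2() throws SolrServerException {
		HttpSolrServer solr = new HttpSolrServer("http://localhost:8888/solr/collection1");
		solr.setConnectionTimeout(1000);
		solr.setDefaultMaxConnectionsPerHost(100);
		solr.setMaxTotalConnections(100);
		try {
			SolrQuery query = new SolrQuery();
			query.add(CommonParams.QT, "/analysis/field"); // request handler path
			query.add(AnalysisParams.FIELD_VALUE, "杜淳,我爱你"); // text to analyze
			query.add(AnalysisParams.FIELD_TYPE, "text_it"); // field type whose analyzer is applied
			QueryResponse response = solr.query(query);

			// Walk down the response: analysis -> field_types -> text_it -> index
			NamedList<Object> analysis = (NamedList<Object>) response.getResponse().get("analysis");
			NamedList<Object> fieldTypes = (NamedList<Object>) analysis.get("field_types");
			NamedList<Object> textIt = (NamedList<Object>) fieldTypes.get("text_it");
			NamedList<Object> index = (NamedList<Object>) textIt.get("index");

			// Output of the tokenizer (first stage of the chain), one entry per token
			List<NamedList<Object>> tokens = (List<NamedList<Object>>) index
					.get("org.apache.lucene.analysis.standard.StandardTokenizer");

			// Collect the "text" value of every token
			StringBuilder nextQuery = new StringBuilder();
			for (NamedList<Object> token : tokens) {
				nextQuery.append(token.get("text")).append(' ');
			}
			return nextQuery.toString().trim();
		} finally {
			solr.shutdown(); // release the underlying HTTP client
		}
	}
}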
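
The method above returns only a joined string. If the goal is an actual Java class per token, one option is to copy each entry into a small value object. The following is a minimal sketch rather than SolrJ code: `AnalysisTokens`, `Token`, `fromEntries`, and `sampleEntries` are hypothetical names, and plain `Map` objects stand in for the token entries of the response so the example runs without a Solr server. It also assumes that `start`, `end`, and `position` arrive as numbers.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AnalysisTokens {

    /** Plain Java holder for one token entry of the analysis output. */
    public static final class Token {
        public final String text;
        public final int start;     // offset of the first character in the input
        public final int end;       // offset just past the last character
        public final int position;  // token position reported by the analyzer

        public Token(String text, int start, int end, int position) {
            this.text = text;
            this.start = start;
            this.end = end;
            this.position = position;
        }

        /** Cuts the token's span back out of the original input string. */
        public String slice(String input) {
            return input.substring(start, end);
        }

        @Override
        public String toString() {
            return text + " [" + start + "," + end + ") pos=" + position;
        }
    }

    /** Copies each raw entry (one map per token) into a Token object. */
    public static List<Token> fromEntries(List<Map<String, Object>> entries) {
        List<Token> tokens = new ArrayList<>();
        for (Map<String, Object> e : entries) {
            tokens.add(new Token(
                    String.valueOf(e.get("text")),
                    ((Number) e.get("start")).intValue(),
                    ((Number) e.get("end")).intValue(),
                    ((Number) e.get("position")).intValue()));
        }
        return tokens;
    }

    /** Builds one stand-in entry; a real entry would come from the response. */
    private static Map<String, Object> entry(String text, int start, int end, int position) {
        Map<String, Object> m = new LinkedHashMap<>();
        m.put("text", text);
        m.put("start", start);
        m.put("end", end);
        m.put("position", position);
        return m;
    }

    /** Three entries copied from the StandardTokenizer output in this post. */
    public static List<Map<String, Object>> sampleEntries() {
        List<Map<String, Object>> entries = new ArrayList<>();
        entries.add(entry("杜", 0, 1, 1));
        entries.add(entry("淳", 1, 2, 2));
        entries.add(entry("我", 3, 4, 3));
        return entries;
    }

    public static void main(String[] args) {
        String input = "杜淳,我爱你";
        for (Token t : fromEntries(sampleEntries())) {
            System.out.println(t + " -> " + t.slice(input));
        }
    }
}
```

The `slice` helper illustrates what `start` and `end` mean: they are character offsets into the original text, which is why the third token starts at 3 (the comma at offset 2 produces no token). With a real response, the same copying loop could read from each token's `NamedList` via `get("text")`, `get("start")`, and so on.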

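The QueryResponse object holds a deeply nested structure. Printed out, it looks like this (it resembles JSON, but it is the key=value form that NamedList produces, not valid JSON):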

   analysis={
        field_types={
            text_it={
                index={
                    org.apache.lucene.analysis.standard.StandardTokenizer=[
                        {
                            text=杜,
                            raw_bytes=[
                                e69d9c
                            ],
                            start=0,
                            end=1,
                            org.apache.lucene.analysis.tokenattributes.PositionLengthAttribute#positionLength=1,
                            type=<IDEOGRAPHIC>,
                            position=1,
                            positionHistory=[
                                1
                            ]
                        },
                        {
                            text=淳,
                            raw_bytes=[
                                e6b7b3
                            ],
                            start=1,
                            end=2,
                            org.apache.lucene.analysis.tokenattributes.PositionLengthAttribute#positionLength=1,
                            type=<IDEOGRAPHIC>,
                            position=2,
                            positionHistory=[
                                2
                            ]
                        },
                        {
                            text=我,
                            raw_bytes=[
                                e68891
                            ],
                            start=3,
                            end=4,
                            org.apache.lucene.analysis.tokenattributes.PositionLengthAttribute#positionLength=1,
                            type=<IDEOGRAPHIC>,
                            position=3,
                            positionHistory=[
                                3
                            ]
                        },
                        {
                            text=爱,
                            raw_bytes=[
                                e788b1
                            ],
                            start=4,
                            end=5,
                            org.apache.lucene.analysis.tokenattributes.PositionLengthAttribute#positionLength=1,
                            type=<IDEOGRAPHIC>,
                            position=4,
                            positionHistory=[
                                4
                            ]
                        },
                        {
                            text=你,
                            raw_bytes=[
                                e4bda0
                            ],
                            start=5,
                            end=6,
                            org.apache.lucene.analysis.tokenattributes.PositionLengthAttribute#positionLength=1,
                            type=<IDEOGRAPHIC>,
                            position=5,
                            positionHistory=[
                                5
                            ]
                        }
                    ],
                    org.apache.lucene.analysis.util.ElisionFilter=[
                        {
                            text=杜,
                            raw_bytes=[
                                e69d9c
                            ],
                            start=0,
                            end=1,
                            org.apache.lucene.analysis.tokenattributes.PositionLengthAttribute#positionLength=1,
                            type=<IDEOGRAPHIC>,
                            position=1,
                            positionHistory=[
                                1,
                                1
                            ]
                        },
                        {
                            text=淳,
                            raw_bytes=[
                                e6b7b3
                            ],
                            start=1,
                            end=2,
                            org.apache.lucene.analysis.tokenattributes.PositionLengthAttribute#positionLength=1,
                            type=<IDEOGRAPHIC>,
                            position=2,
                            positionHistory=[
                                2,
                                2
                            ]
                        },
                        {
                            text=我,
                            raw_bytes=[
                                e68891
                            ],
                            start=3,
                            end=4,
                            org.apache.lucene.analysis.tokenattributes.PositionLengthAttribute#positionLength=1,
                            type=<IDEOGRAPHIC>,
                            position=3,
                            positionHistory=[
                                3,
                                3
                            ]
                        },
                        {
                            text=爱,
                            raw_bytes=[
                                e788b1
                            ],
                            start=4,
                            end=5,
                            org.apache.lucene.analysis.tokenattributes.PositionLengthAttribute#positionLength=1,
                            type=<IDEOGRAPHIC>,
                            position=4,
                            positionHistory=[
                                4,
                                4
                            ]
                        },
                        {
                            text=你,
                            raw_bytes=[
                                e4bda0
                            ],
                            start=5,
                            end=6,
                            org.apache.lucene.analysis.tokenattributes.PositionLengthAttribute#positionLength=1,
                            type=<IDEOGRAPHIC>,
                            position=5,
                            positionHistory=[
                                5,
                                5
                            ]
                        }
                    ],
                    org.apache.lucene.analysis.core.LowerCaseFilter=[
                        {
                            text=杜,
                            raw_bytes=[
                                e69d9c
                            ],
                            start=0,
                            end=1,
                            org.apache.lucene.analysis.tokenattributes.PositionLengthAttribute#positionLength=1,
                            type=<IDEOGRAPHIC>,
                            position=1,
                            positionHistory=[
                                1,
                                1,
                                1
                            ]
                        },
                        {
                            text=淳,
                            raw_bytes=[
                                e6b7b3
                            ],
                            start=1,
                            end=2,
                            org.apache.lucene.analysis.tokenattributes.PositionLengthAttribute#positionLength=1,
                            type=<IDEOGRAPHIC>,
                            position=2,
                            positionHistory=[
                                2,
                                2,
                                2
                            ]
                        },
                        {
                            text=我,
                            raw_bytes=[
                                e68891
                            ],
                            start=3,
                            end=4,
                            org.apache.lucene.analysis.tokenattributes.PositionLengthAttribute#positionLength=1,
                            type=<IDEOGRAPHIC>,
                            position=3,
                            positionHistory=[
                                3,
                                3,
                                3
                            ]
                        },
                        {
                            text=爱,
                            raw_bytes=[
                                e788b1
                            ],
                            start=4,
                            end=5,
                            org.apache.lucene.analysis.tokenattributes.PositionLengthAttribute#positionLength=1,
                            type=<IDEOGRAPHIC>,
                            position=4,
                            positionHistory=[
                                4,
                                4,
                                4
                            ]
                        },
                        {
                            text=你,
                            raw_bytes=[
                                e4bda0
                            ],
                            start=5,
                            end=6,
                            org.apache.lucene.analysis.tokenattributes.PositionLengthAttribute#positionLength=1,
                            type=<IDEOGRAPHIC>,
                            position=5,
                            positionHistory=[
                                5,
                                5,
                                5
                            ]
                        }
                    ],
                    org.apache.lucene.analysis.core.StopFilter=[
                        {
                            text=杜,
                            raw_bytes=[
                                e69d9c
                            ],
                            start=0,
                            end=1,
                            org.apache.lucene.analysis.tokenattributes.PositionLengthAttribute#positionLength=1,
                            type=<IDEOGRAPHIC>,
                            position=1,
                            positionHistory=[
                                1,
                                1,
                                1,
                                1
                            ]
                        },
                        {
                            text=淳,
                            raw_bytes=[
                                e6b7b3
                            ],
                            start=1,
                            end=2,
                            org.apache.lucene.analysis.tokenattributes.PositionLengthAttribute#positionLength=1,
                            type=<IDEOGRAPHIC>,
                            position=2,
                            positionHistory=[
                                2,
                                2,
                                2,
                                2
                            ]
                        },
                        {
                            text=我,
                            raw_bytes=[
                                e68891
                            ],
                            start=3,
                            end=4,
                            org.apache.lucene.analysis.tokenattributes.PositionLengthAttribute#positionLength=1,
                            type=<IDEOGRAPHIC>,
                            position=3,
                            positionHistory=[
                                3,
                                3,
                                3,
                                3
                            ]
                        },
                        {
                            text=爱,
                            raw_bytes=[
                                e788b1
                            ],
                            start=4,
                            end=5,
                            org.apache.lucene.analysis.tokenattributes.PositionLengthAttribute#positionLength=1,
                            type=<IDEOGRAPHIC>,
                            position=4,
                            positionHistory=[
                                4,
                                4,
                                4,
                                4
                            ]
                        },
                        {
                            text=你,
                            raw_bytes=[
                                e4bda0
                            ],
                            start=5,
                            end=6,
                            org.apache.lucene.analysis.tokenattributes.PositionLengthAttribute#positionLength=1,
                            type=<IDEOGRAPHIC>,
                            position=5,
                            positionHistory=[
                                5,
                                5,
                                5,
                                5
                            ]
                        }
                    ],
                    org.apache.lucene.analysis.it.ItalianLightStemFilter=[
                        {
                            text=杜,
                            raw_bytes=[
                                e69d9c
                            ],
                            start=0,
                            end=1,
                            org.apache.lucene.analysis.tokenattributes.PositionLengthAttribute#positionLength=1,
                            type=<IDEOGRAPHIC>,
                            org.apache.lucene.analysis.tokenattributes.KeywordAttribute#keyword=false,
                            position=1,
                            positionHistory=[
                                1,
                                1,
                                1,
                                1,
                                1
                            ]
                        },
                        {
                            text=淳,
                            raw_bytes=[
                                e6b7b3
                            ],
                            start=1,
                            end=2,
                            org.apache.lucene.analysis.tokenattributes.PositionLengthAttribute#positionLength=1,
                            type=<IDEOGRAPHIC>,
                            org.apache.lucene.analysis.tokenattributes.KeywordAttribute#keyword=false,
                            position=2,
                            positionHistory=[
                                2,
                                2,
                                2,
                                2,
                                2
                            ]
                        },
                        {
                            text=我,
                            raw_bytes=[
                                e68891
                            ],
                            start=3,
                            end=4,
                            org.apache.lucene.analysis.tokenattributes.PositionLengthAttribute#positionLength=1,
                            type=<IDEOGRAPHIC>,
                            org.apache.lucene.analysis.tokenattributes.KeywordAttribute#keyword=false,
                            position=3,
                            positionHistory=[
                                3,
                                3,
                                3,
                                3,
                                3
                            ]
                        },
                        {
                            text=爱,
                            raw_bytes=[
                                e788b1
                            ],
                            start=4,
                            end=5,
                            org.apache.lucene.analysis.tokenattributes.PositionLengthAttribute#positionLength=1,
                            type=<IDEOGRAPHIC>,
                            org.apache.lucene.analysis.tokenattributes.KeywordAttribute#keyword=false,
                            position=4,
                            positionHistory=[
                                4,
                                4,
                                4,
                                4,
                                4
                            ]
                        },
                        {
                            text=你,
                            raw_bytes=[
                                e4bda0
                            ],
                            start=5,
                            end=6,
                            org.apache.lucene.analysis.tokenattributes.PositionLengthAttribute#positionLength=1,
                            type=<IDEOGRAPHIC>,
                            org.apache.lucene.analysis.tokenattributes.KeywordAttribute#keyword=false,
                            position=5,
                            positionHistory=[
                                5,
                                5,
                                5,
                                5,
                                5
                            ]
                        }
                    ]
                }
            }
        },
        field_names={
            
        }
    }
}
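
The dump lists five stages under `index`, and each one repeats the token list as it looks after that stage. Reading only the `StandardTokenizer` key gives the output of the first stage; what actually gets indexed is the output of the last one. Here all stages happen to agree, but that will not hold for every field type. The sketch below picks the last stage without hard-coding its class name. It is an illustration under assumptions: `AnalysisStages` and its methods are hypothetical names, an insertion-ordered `LinkedHashMap` stands in for the `index` node, and the entries are assumed to appear in chain order, as they do in the dump.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AnalysisStages {

    /** Returns the key that was inserted last, or null for an empty map. */
    public static String finalStageName(Map<String, List<String>> stages) {
        String last = null;
        for (String name : stages.keySet()) {
            last = name; // keeps overwriting until only the final stage remains
        }
        return last;
    }

    /** Returns the token texts of the final stage, or an empty list. */
    public static List<String> finalStageTokens(Map<String, List<String>> stages) {
        String name = finalStageName(stages);
        return name == null ? Collections.<String>emptyList() : stages.get(name);
    }

    /** Stage names and token texts copied from the dump above. */
    public static Map<String, List<String>> sampleStages() {
        List<String> tokens = Arrays.asList("杜", "淳", "我", "爱", "你");
        Map<String, List<String>> stages = new LinkedHashMap<>();
        stages.put("org.apache.lucene.analysis.standard.StandardTokenizer", tokens);
        stages.put("org.apache.lucene.analysis.util.ElisionFilter", tokens);
        stages.put("org.apache.lucene.analysis.core.LowerCaseFilter", tokens);
        stages.put("org.apache.lucene.analysis.core.StopFilter", tokens);
        stages.put("org.apache.lucene.analysis.it.ItalianLightStemFilter", tokens);
        return stages;
    }

    public static void main(String[] args) {
        Map<String, List<String>> stages = sampleStages();
        System.out.println(finalStageName(stages));
        System.out.println(String.join(" ", finalStageTokens(stages)));
    }
}
```

With the real response, `NamedList` keeps its entries in order as well, so `index.getVal(index.size() - 1)` should play the same role as the loop here.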


