A Look at Spring AI Alibaba's BilibiliDocumentReader

This article takes a look at the BilibiliDocumentReader in Spring AI Alibaba.

BilibiliDocumentReader

community/document-readers/spring-ai-alibaba-starter-document-reader-bilibili/src/main/java/com/alibaba/cloud/ai/reader/bilibili/BilibiliDocumentReader.java

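package com.alibaba.cloud.ai.reader.bilibili;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import org.springframework.ai.document.Document;
import org.springframework.ai.document.DocumentReader;
import org.springframework.http.HttpHeaders;
import org.springframework.http.MediaType;
import org.springframework.util.Assert;
import org.springframework.web.reactive.function.client.WebClient;

public class BilibiliDocumentReader implements DocumentReader {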

	private static final Logger logger = LoggerFactory.getLogger(BilibiliDocumentReader.class);

	private static final String API_BASE_URL = "https://api.bilibili.com/x/web-interface/view?bvid=";

	private final String resourcePath;

	private final ObjectMapper objectMapper;

	private static final int MEMORY_SIZE = 5;

	private static final int BYTE_SIZE = 1024;

	private static final int MAX_MEMORY_SIZE = MEMORY_SIZE * BYTE_SIZE * BYTE_SIZE;

	private static final WebClient WEB_CLIENT = WebClient.builder()
		.defaultHeader(HttpHeaders.ACCEPT, MediaType.APPLICATION_JSON_VALUE)
		.codecs(configurer -> configurer.defaultCodecs().maxInMemorySize(MAX_MEMORY_SIZE))
		.build();

	public BilibiliDocumentReader(String resourcePath) {
		Assert.hasText(resourcePath, "Query string must not be empty");
		this.resourcePath = resourcePath;
		this.objectMapper = new ObjectMapper();
	}

	@Override
	public List<Document> get() {
		List<Document> documents = new ArrayList<>();
		try {
			String bvid = extractBvid(resourcePath);
			String videoInfoResponse = fetchVideoInfo(bvid);
			JsonNode videoData = parseJson(videoInfoResponse).path("data");
			String title = videoData.path("title").asText();
			String description = videoData.path("desc").asText();
			Document infoDoc = new Document("Video information", Map.of("title", title, "description", description));
			documents.add(infoDoc);
			String documentContent = fetchAndProcessSubtitles(videoData, title, description);
			documents.add(new Document(documentContent));
		}
		catch (IllegalArgumentException e) {
			logger.error("Invalid input: {}", e.getMessage());
			documents.add(new Document("Error: Invalid input"));
		}
		catch (IOException e) {
			logger.error("Error parsing JSON: {}", e.getMessage(), e);
			documents.add(new Document("Error parsing JSON: " + e.getMessage()));
		}
		catch (Exception e) {
			logger.error("Unexpected error: {}", e.getMessage(), e);
			documents.add(new Document("Unexpected error: " + e.getMessage()));
		}
		return documents;
	}

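	// The greedy leading ".*" makes the group capture the last "BV..." run in the string;
	// if no BV id is present, replaceAll returns the input unchanged
	private String extractBvid(String resourcePath) {
		return resourcePath.replaceAll(".*(BV\\w+).*", "$1");
	}

	// block() waits for the response, so the request is performed synchronously
	private String fetchVideoInfo(String bvid) {
		return WEB_CLIENT.get().uri(API_BASE_URL + bvid).retrieve().bodyToMono(String.class).block();
	}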

	private JsonNode parseJson(String jsonResponse) throws IOException {
		return objectMapper.readTree(jsonResponse);
	}

	private String fetchAndProcessSubtitles(JsonNode videoData, String title, String description) throws IOException {
		JsonNode subtitleList = videoData.path("subtitle").path("list");
		if (subtitleList.isArray() && subtitleList.size() > 0) {
			// Only the first subtitle track listed in data.subtitle.list is fetched
			String subtitleUrl = subtitleList.get(0).path("subtitle_url").asText();
			String subtitleResponse = WEB_CLIENT.get().uri(subtitleUrl).retrieve().bodyToMono(String.class).block();

			JsonNode subtitleJson = parseJson(subtitleResponse);
			StringBuilder rawTranscript = new StringBuilder();
			subtitleJson.path("body").forEach(node -> rawTranscript.append(node.path("content").asText()).append(" "));

			return String.format("Video Title: %s, Description: %s\nTranscript: %s", title, description,
					rawTranscript.toString().trim());
		}
		else {
			return String.format("No subtitles found for video: %s. Returning an empty transcript.", resourcePath);
		}
	}

}
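The bvid extraction above is a single replaceAll call, so its edge cases are easy to miss. The JDK-only sketch below copies that expression and adds a hypothetical stricter variant, extractBvidStrict (not part of Spring AI Alibaba), that returns Optional.empty() instead of passing a non-matching input through to the API. It is an illustration under those assumptions, not the library's code.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BvidExtractionSketch {

    // Same expression as BilibiliDocumentReader.extractBvid: the greedy leading ".*"
    // makes the capture group pick the last "BV..." run; without a match the input is returned as-is.
    static String extractBvid(String resourcePath) {
        return resourcePath.replaceAll(".*(BV\\w+).*", "$1");
    }

    private static final Pattern BVID = Pattern.compile("BV\\w+");

    // Hypothetical stricter variant: reports "not found" instead of falling through.
    // find() returns the FIRST match, whereas the greedy regex above returns the LAST one.
    static Optional<String> extractBvidStrict(String resourcePath) {
        Matcher matcher = BVID.matcher(resourcePath);
        return matcher.find() ? Optional.of(matcher.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        String url = "https://www.bilibili.com/video/BV1KMwgeKECx/?t=7&vd_source=3069f51b168ac07a9e3c4ba94ae26af5";
        System.out.println(extractBvid(url));                                    // BV1KMwgeKECx
        System.out.println(extractBvid("https://example.com/video/123"));        // https://example.com/video/123 (unchanged)
        System.out.println(extractBvidStrict("https://example.com/video/123"));  // Optional.empty
    }
}
```

The two methods can disagree only when a string contains more than one BV id: the greedy pattern keeps the last one, while find() returns the first.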

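BilibiliDocumentReader calls Bilibili's web API through a shared WebClient. get() first extracts the bvid from the video URL with a regular expression, then requests API_BASE_URL (https://api.bilibili.com/x/web-interface/view?bvid=) with that bvid appended and reads title and desc from the data node of the JSON response. These two values become the metadata of a first Document whose text is simply "Video information". fetchAndProcessSubtitles then checks data.subtitle.list: if it has entries, it requests the subtitle_url of the first one, joins the content field of every element in the subtitle JSON's body array with spaces, and returns a string containing the title, description, and transcript, which becomes the text of a second Document. If the video has no subtitles, the second Document instead states that no subtitles were found. Both HTTP calls use block(), so get() is synchronous, and it does not propagate exceptions: any exception is logged and turned into an extra Document carrying the error message.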
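The transcript assembly in fetchAndProcessSubtitles can be reproduced without any HTTP or JSON dependencies. In this sketch, a hypothetical Optional<List<String>> stands in for the fetched subtitle track: empty means data.subtitle.list had no entries, and a present list holds the content field of each element of the subtitle JSON's body array. The format strings are copied from the reader; the class name, method name, and sample values are made up for illustration.

```java
import java.util.List;
import java.util.Optional;

class TranscriptSketch {

    // Hypothetical helper mirroring fetchAndProcessSubtitles once the subtitle JSON is parsed.
    // firstTrackLines: empty = no subtitle track listed; present = "content" of each body element.
    static String buildDocumentContent(String title, String description,
            Optional<List<String>> firstTrackLines, String resourcePath) {
        if (firstTrackLines.isEmpty()) {
            return String.format("No subtitles found for video: %s. Returning an empty transcript.", resourcePath);
        }
        // Each line is followed by a space, then the trailing space is trimmed
        StringBuilder rawTranscript = new StringBuilder();
        firstTrackLines.get().forEach(content -> rawTranscript.append(content).append(" "));
        return String.format("Video Title: %s, Description: %s\nTranscript: %s", title, description,
                rawTranscript.toString().trim());
    }

    public static void main(String[] args) {
        // Prints "Video Title: Demo, Description: A short clip" then "Transcript: Hello world"
        System.out.println(buildDocumentContent("Demo", "A short clip",
                Optional.of(List.of("Hello", "world")), "https://www.bilibili.com/video/BV1KMwgeKECx/"));
    }
}
```

Note that a subtitle track with an empty body still takes the first branch in the reader and yields "Transcript: " with nothing after it, which is different from the "No subtitles found" message.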

Example

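import java.util.List;

import org.junit.jupiter.api.Test;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.ai.document.Document;

import com.alibaba.cloud.ai.reader.bilibili.BilibiliDocumentReader;

public class BilibiliDocumentReaderTest {

	private static final Logger logger = LoggerFactory.getLogger(BilibiliDocumentReaderTest.class);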

	@Test
	void bilibiliDocumentReaderTest() {
		BilibiliDocumentReader bilibiliDocumentReader = new BilibiliDocumentReader(
				"https://www.bilibili.com/video/BV1KMwgeKECx/?t=7&vd_source=3069f51b168ac07a9e3c4ba94ae26af5");
		List<Document> documents = bilibiliDocumentReader.get();
		logger.info("documents: {}", documents);
	}

}
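Because get() catches every Exception and appends an error Document instead of rethrowing, the test above passes even when the Bilibili request fails; it only logs whatever comes back. A caller that needs to detect failures has to inspect the returned documents. The sketch below mirrors that control flow with a hypothetical Doc record standing in for Spring AI's Document and a hypothetical hasError helper that checks for the reader's error-message prefixes. It models the point after the video-info request has succeeded and is simplified to the generic catch branch, so a subtitle failure leaves two documents: the info document and an error document.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

class ErrorHandlingSketch {

    // Hypothetical stand-in for Spring AI's Document: text plus metadata
    record Doc(String text, Map<String, Object> metadata) {
        Doc(String text) {
            this(text, Map.of());
        }
    }

    // Mirrors the shape of BilibiliDocumentReader.get(): the info document is added first,
    // so a later failure leaves it in the list followed by an error document
    static List<Doc> read(String title, String description, Supplier<String> transcriptFetcher) {
        List<Doc> documents = new ArrayList<>();
        try {
            documents.add(new Doc("Video information", Map.of("title", title, "description", description)));
            documents.add(new Doc(transcriptFetcher.get()));
        }
        catch (Exception e) {
            documents.add(new Doc("Unexpected error: " + e.getMessage()));
        }
        return documents;
    }

    // Hypothetical caller-side check, since read() never throws;
    // the prefixes match the reader's three error messages
    static boolean hasError(List<Doc> documents) {
        return documents.stream()
            .anyMatch(d -> d.text().startsWith("Error") || d.text().startsWith("Unexpected error"));
    }

    public static void main(String[] args) {
        List<Doc> ok = read("T", "D", () -> "Video Title: T, Description: D\nTranscript: hi");
        List<Doc> failed = read("T", "D", () -> { throw new IllegalStateException("subtitle request failed"); });
        System.out.println(ok.size() + " " + hasError(ok));          // 2 false
        System.out.println(failed.size() + " " + hasError(failed));  // 2 true
    }
}
```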

Summary

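spring-ai-alibaba-starter-document-reader-bilibili provides BilibiliDocumentReader, which turns a Bilibili video URL into Documents. It makes at most two requests: one to the video-info API for the title and description, and, when the video lists subtitles, one to the first subtitle URL to fetch the transcript.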

