Microsoft Project Oxford: Text Analytics

晗时

License: CC 4.0 BY-SA

Article link: https://blog.youkuaiyun.com/hz371071798/article/details/72675725

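This article introduces the features and services of the Microsoft Text Analytics API, including language detection, key phrase extraction, and sentiment analysis, and provides concrete calling examples written in Java.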

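Overview

Microsoft launched Project Oxford to give developers a comprehensive set of API services covering vision, speech, language, knowledge, search, and labs. By calling these APIs, developers can bring these capabilities into their own applications, which greatly reduces the workload of small development teams.

In this post I picked the Text Analytics API from Microsoft's APIs and looked into how to call each of its features.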

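Language Detection

This API identifies which language a piece of text is written in. It supports 120 languages in total, but the current preview version supports only 16 of them, and Chinese is not among those. Microsoft will presumably add it to the supported languages later.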

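How to call it

Before calling any Microsoft API, you need to subscribe to the corresponding service. The subscription page is https://azure.microsoft.com/zh-cn/try/cognitive-services/?productId=%2Fproducts%2F56ea5ba3778daf0194250600. After subscribing you receive your own subscription key. Note that Microsoft only offers a trial for a limited period; once it expires, the service has to be paid for.

Once subscribed, write the calling code. I used Java here: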

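import java.net.URI;
import java.nio.charset.StandardCharsets;

import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.client.utils.URIBuilder;
import org.apache.http.entity.StringEntity;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

// Sends the JSON request body to the language detection endpoint and prints the raw response.
private static void post(String paras) {
    HttpClient httpclient = HttpClients.createDefault();
    try {
        URIBuilder builder = new URIBuilder("https://westus.api.cognitive.microsoft.com/text/analytics/v2.0/languages");

        // Optional: how many languages to detect per document
        // builder.setParameter("numberOfLanguagesToDetect", "2");

        URI uri = builder.build();
        HttpPost request = new HttpPost(uri);
        request.setHeader("Content-Type", "application/json");
        request.setHeader("Ocp-Apim-Subscription-Key", "your-subscription-key");

        // Request body, encoded as UTF-8 so that non-ASCII text is not mangled
        StringEntity reqEntity = new StringEntity(paras, StandardCharsets.UTF_8);
        request.setEntity(reqEntity);

        HttpResponse response = httpclient.execute(request);
        HttpEntity entity = response.getEntity();

        if (entity != null) {
            System.out.println(EntityUtils.toString(entity, StandardCharsets.UTF_8));
        }
    } catch (Exception e) {
        System.out.println(e.getMessage());
    }
}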

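This code uses Apache HttpClient to send the POST request; readers will need to find and download the required libraries themselves.

The method takes a String parameter, which is generated with JSONObject: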

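import java.util.Map;

import org.json.JSONArray;
import org.json.JSONObject;

// Converts an id -> text map into the {"documents":[...]} request body.
private static String convertParas(Map<String, String> paras) {
    JSONObject res = new JSONObject();
    JSONArray arr = new JSONArray();
    for (Map.Entry<String, String> entry : paras.entrySet()) {
        JSONObject obj = new JSONObject();
        obj.put("id", entry.getKey());
        obj.put("text", entry.getValue());
        arr.put(obj);
    }
    res.put("documents", arr);
    return res.toString();
}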

In the paras map, each key is the document identifier and each value is the text to analyze.
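
To make the shape of the request body concrete, here is a minimal sketch that builds the same {"documents":[...]} structure using only the JDK, without a JSON library. The class and method names (RequestBodySketch, escapeJson, buildBody) are hypothetical and not part of the original post; for real use, a JSON library such as the JSONObject shown above is the safer choice.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not from the original post: builds the request body by hand.
final class RequestBodySketch {

    // Escapes the characters that must not appear raw inside a JSON string.
    static String escapeJson(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters use the 4-digit hex escape form.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // Emits one {"id":...,"text":...} object per map entry, in iteration order.
    static String buildBody(Map<String, String> docs) {
        StringBuilder sb = new StringBuilder("{\"documents\":[");
        boolean first = true;
        for (Map.Entry<String, String> e : docs.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            first = false;
            sb.append("{\"id\":\"").append(escapeJson(e.getKey()))
              .append("\",\"text\":\"").append(escapeJson(e.getValue()))
              .append("\"}");
        }
        return sb.append("]}").toString();
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("1", "An API is usually related to a software library.");
        docs.put("2", "He said \"hello\".");
        System.out.println(buildBody(docs));
    }
}
```

A LinkedHashMap is used in the example so that the documents appear in insertion order; with a plain HashMap the order of the array elements is not guaranteed, which is harmless because each document carries its own id.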

Results

The text to be identified is:

An API is usually related to a software library. The API describes and prescribes the expected behavior (a specification) while the library is an actual implementation of this set of rules. A single API can have multiple implementations (or none, being abstract) in the form of different libraries that share the same programming interface.

The output is:

{"documents":[{"id":"1","detectedLanguages":[{"name":"English","iso6391Name":"en","score":1.0}]}],"errors":[]}
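
As an illustration of how such a response might be consumed, the sketch below pulls the name, ISO code, and score out of each detected-language entry with a regular expression from the JDK. It assumes the three fields appear in exactly the order shown in the sample output; the class names (LanguageResponseSketch, Detected) are hypothetical. A proper JSON parser is more robust and would be the better choice outside a quick experiment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not from the original post.
final class LanguageResponseSketch {

    // One entry of the "detectedLanguages" array.
    static final class Detected {
        final String name;
        final String iso;
        final double score;

        Detected(String name, String iso, double score) {
            this.name = name;
            this.iso = iso;
            this.score = score;
        }
    }

    // Assumes the field order name, iso6391Name, score, as in the sample output.
    private static final Pattern LANG = Pattern.compile(
        "\"name\"\\s*:\\s*\"([^\"]*)\"\\s*,\\s*"
        + "\"iso6391Name\"\\s*:\\s*\"([^\"]*)\"\\s*,\\s*"
        + "\"score\"\\s*:\\s*([0-9.]+)");

    // Returns every detected-language entry found in the response text.
    static List<Detected> parse(String json) {
        List<Detected> result = new ArrayList<>();
        Matcher m = LANG.matcher(json);
        while (m.find()) {
            result.add(new Detected(m.group(1), m.group(2), Double.parseDouble(m.group(3))));
        }
        return result;
    }

    public static void main(String[] args) {
        String response = "{\"documents\":[{\"id\":\"1\",\"detectedLanguages\":"
            + "[{\"name\":\"English\",\"iso6391Name\":\"en\",\"score\":1.0}]}],\"errors\":[]}";
        for (Detected d : parse(response)) {
            System.out.println(d.name + " (" + d.iso + ") score=" + d.score);
        }
    }
}
```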

Key Phrases

How to call it

The call is similar to the one above, so it is not repeated here.
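
For readers who want to see what "similar" could look like, here is a rough sketch that reuses one POST routine for both operations by changing only the last path segment. It uses the JDK's HttpURLConnection rather than Apache HttpClient so that it needs no extra libraries. The keyPhrases path segment is an assumption based on the v2.0 endpoint naming and does not appear in the original post; the class and method names are hypothetical as well, and the subscription key is a placeholder.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not from the original post.
final class KeyPhrasesCallSketch {

    // Base URL taken from the language detection example in this article.
    static final String BASE =
        "https://westus.api.cognitive.microsoft.com/text/analytics/v2.0/";

    // "languages" is from the article; "keyPhrases" is an assumed segment.
    static String endpointFor(String operation) {
        return BASE + operation;
    }

    // Posts a JSON body to the given operation and returns the raw response text.
    static String post(String operation, String subscriptionKey, String jsonBody)
            throws IOException {
        HttpURLConnection conn = (HttpURLConnection)
            URI.create(endpointFor(operation)).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setRequestProperty("Ocp-Apim-Subscription-Key", subscriptionKey);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(jsonBody.getBytes(StandardCharsets.UTF_8));
        }
        // Error responses are delivered on the error stream, which may be null.
        InputStream in = conn.getResponseCode() < 400
            ? conn.getInputStream()
            : conn.getErrorStream();
        if (in == null) {
            return "";
        }
        try (InputStream body = in) {
            return new String(body.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) {
        // Only prints the target URL; a real call needs a valid subscription key:
        // post("keyPhrases", "your-subscription-key", "{\"documents\":[...]}")
        System.out.println(endpointFor("keyPhrases"));
    }
}
```

InputStream.readAllBytes requires Java 9 or later. The request body has the same documents structure as for language detection.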

Results

Using the same text about the definition of an API as above, the output is:

{"documents":[{"keyPhrases":["single API","actual implementation","form of different libraries","multiple implementations","specification","software library","expected behavior","set of rules","programming interface"],"id":"1"}],"errors":[]}

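As the output shows, the extracted key phrases center on API, implementation, and specification. Together they cover the main idea of the passage, which is a definition of an API.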