Aggregator

From Wikipedia, the free encyclopedia


An aggregator or news aggregator is a type of software that retrieves syndicated Web content supplied in the form of a web feed (RSS, Atom, and other XML formats) and published by weblogs, podcasts, vlogs, and mainstream mass-media websites.

What do aggregators do?

Aggregators reduce the time and effort needed to regularly check websites of interest for updates, creating a unique information space or "personal newspaper." An aggregator is able to subscribe to a feed, check for new content at user-determined intervals, and retrieve the content. The content is sometimes described as being "pulled" to the subscriber, as opposed to "pushed" with email or IM. Unlike recipients of some "pushed" information, the aggregator user can easily unsubscribe from a feed.
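The "pull" model described above can be sketched in a few lines; here `fetch` is a hypothetical callable standing in for an HTTP request to the feed, and items are assumed to carry a unique `id` (as RSS `guid` or Atom `id` elements do):

```python
def check_feed(fetch, seen_ids):
    """Pull model: the subscriber asks for the feed and keeps only unseen items."""
    new_items = [item for item in fetch() if item["id"] not in seen_ids]
    seen_ids.update(item["id"] for item in new_items)
    return new_items

# Simulated feed source (a real aggregator would fetch and parse XML over HTTP).
feed = [{"id": "a", "title": "First post"}]
seen = set()
print(check_feed(lambda: feed, seen))   # the first post is new
feed.append({"id": "b", "title": "Second post"})
print(check_feed(lambda: feed, seen))   # only the newly added post is returned
```

Running `check_feed` at user-determined intervals, and letting the user drop a feed by simply no longer polling it, is what makes unsubscribing trivially easy compared with "pushed" channels such as e-mail.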

Aggregator features are gradually being built into portal sites such as My Yahoo! and Google, Web browsers such as Mozilla Firefox, Safari, and Opera, e-mail programs like Microsoft Outlook, and other applications, including Apple's iTunes, which serves as a podcast aggregator.

The aggregator provides a consolidated view of the content in a single browser display or desktop application. Such applications are also referred to as RSS readers, feed readers, feed aggregators or news readers, although in Internet communication, the latter term was first used for programs that read Usenet newsgroups.

A website may incorporate aggregator features by republishing syndicated content on one or more of its pages. Aggregator features also may be incorporated in other client software, including Web browsers, e-mail clients, weblog creation programs, or media player programs. Devices such as mobile phones or TiVo video recorders (already aggregating television programs) may incorporate XML aggregators.

The syndicated content an aggregator will retrieve and interpret is usually supplied in the form of RSS or other XML-based data, such as RDF or Atom formats.
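As a minimal illustration of how such XML-based data is interpreted, the sketch below parses an RSS 2.0 document with Python's standard library. The sample feed and the function name are invented for this example; a real aggregator would also handle Atom and RDF variants, character encodings, and HTTP details such as conditional requests:

```python
import xml.etree.ElementTree as ET

SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Weblog</title>
    <item><title>First post</title><link>http://example.com/1</link></item>
    <item><title>Second post</title><link>http://example.com/2</link></item>
  </channel>
</rss>"""

def parse_rss_items(xml_text):
    """Return (title, link) pairs for each <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        items.append((title, link))
    return items

print(parse_rss_items(SAMPLE_RSS))
```

The consolidated view an aggregator presents is, in essence, the result of running such a parse over every subscribed feed and merging the item lists.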

Clouds

Some news aggregators have the ability to register to clouds, centralized folksonomic services that monitor and track many syndicated content sources online. An aggregator using a cloud will receive notifications from the cloud server only when there are updates, thus eliminating the need for periodic polling. This approach attempts to produce a more efficient use of bandwidth, though the overhead associated with registering a cloud can mean no net saving. It also introduces issues of scalability and a single point of failure among others. In the time since the cloud concept was introduced in 2000, very few sources have implemented it.
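The cloud registration model can be sketched as a simple publish/subscribe hub: sources notify the hub, and the hub forwards notifications to registered aggregators, so no polling occurs. The class and method names below are illustrative only, not any particular cloud protocol:

```python
class Cloud:
    """Central hub: content sources notify it; it pushes to registered aggregators."""

    def __init__(self):
        self.subscribers = {}  # feed_url -> list of callbacks

    def register(self, feed_url, callback):
        """An aggregator registers interest in a feed (the 'registration' overhead)."""
        self.subscribers.setdefault(feed_url, []).append(callback)

    def notify(self, feed_url):
        """A source announces an update; every registered aggregator is pinged."""
        for callback in self.subscribers.get(feed_url, []):
            callback(feed_url)

received = []
cloud = Cloud()
cloud.register("http://example.com/feed", received.append)
cloud.notify("http://example.com/feed")
print(received)  # the aggregator learned of the update without polling
```

The scalability and single-point-of-failure concerns follow directly from this shape: every source and every aggregator depends on the one `Cloud` instance.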

Types of aggregators

Desktop

Desktop aggregators are software applications dedicated to managing a user's subscriptions and to monitoring and retrieving syndicated content. Many aggregators display content in a window or list view similar to an e-mail program.

Other desktop aggregators have browser-based interfaces that look and operate like a Web-based aggregator, but are typically run on a local system and administered by the user. The interface may be served through an integrated HTTP server that can be accessed from anywhere once the user's network is properly configured.

Some desktop applications may have aggregator functionality in addition to their primary function, such as a web browser, email client, music player or weblog editor.

Web-based

An online aggregator is a website service offering aggregator functionality, typically hosted by a service provider or portal site. Feeds are checked for updates by the service, thus reducing the bandwidth that multiple desktop aggregators would consume polling feeds individually. Since they are remotely hosted, online aggregators are accessible from anywhere, but are only as reliable as the service provider. These aggregators can be free or commercial. Web-based aggregators of blogs which are related to some project or group are often called "planets", named after the program Planet used to generate them.
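The bandwidth saving described above comes from the service fetching each feed once per refresh, however many users subscribe to it. A rough sketch of that shared-fetch design, with hypothetical names and structure:

```python
class WebAggregator:
    """Server-side aggregator: one fetch per feed is shared by all its subscribers."""

    def __init__(self, fetch):
        self.fetch = fetch          # function: feed_url -> list of items
        self.subscriptions = {}     # feed_url -> set of user ids
        self.cache = {}             # feed_url -> most recently fetched items

    def subscribe(self, user, feed_url):
        self.subscriptions.setdefault(feed_url, set()).add(user)

    def refresh(self):
        # Each feed is polled exactly once, regardless of subscriber count.
        for feed_url in self.subscriptions:
            self.cache[feed_url] = self.fetch(feed_url)

    def view(self, user):
        """The user's consolidated 'personal newspaper' from the shared cache."""
        return {url: self.cache.get(url, [])
                for url, users in self.subscriptions.items() if user in users}

agg = WebAggregator(fetch=lambda url: [f"item from {url}"])
agg.subscribe("alice", "http://example.com/feed")
agg.subscribe("bob", "http://example.com/feed")
agg.refresh()   # a single fetch serves both subscribers
print(agg.view("alice"))
```

The flip side is also visible here: if the service (and its cache) goes down, every subscriber loses access at once, which is why such aggregators are only as reliable as the provider.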

OEM/Meta news feeds

These are providers of aggregation services to news portals and search engines, not necessarily directly to end users.

Aggregator tasks in data processing

In data processing and task management, an aggregator task typically refers to an operation that consolidates multiple data sources or task streams into a unified result. Such tasks are common in distributed systems, data pipelines, and real-time analytics platforms, and their implementation usually relies on efficient data-processing strategies and a scalable task-scheduling mechanism.

Data aggregation can involve cleaning, transformation, merging, and statistical operations, and is applied in scenarios such as log analysis, user-behavior tracking, and real-time monitoring. Processing schemes typically employ asynchronous, batch, or stream processing, using frameworks such as Apache Kafka, Apache Flink, or Spark Streaming[^1].

For scheduling, systems usually adopt either a centralized scheduler (such as Kubernetes' scheduling mechanism) or decentralized task dispatch (such as message-queue-based architectures). The scheduler must account for task priority, resource allocation, and fault tolerance to ensure that aggregator tasks execute efficiently and that data remains consistent[^2].

Common technical approaches to the data processing in aggregator tasks include:

- **MapReduce model**: splits a task into Map and Reduce phases and suits offline batch processing; the Hadoop platform makes extensive use of this model for large-scale data aggregation.
- **Stream-processing model**: suits real-time aggregation, e.g. Flink's DataStream API or Kafka Streams, which continuously process a data stream and emit results in real time.
- **Unified batch/stream architecture**: combines batch and stream processing in a single pipeline, improving flexibility and scalability.

Code example: a simple aggregation task in Python that counts access events across multiple log files:

```python
from collections import defaultdict
import os

def aggregate_log_data(log_dir):
    """Count "access" lines per user across every .log file in a directory."""
    access_count = defaultdict(int)
    for filename in os.listdir(log_dir):
        if filename.endswith(".log"):
            with open(os.path.join(log_dir, filename), 'r') as file:
                for line in file:
                    if "access" in line:
                        parts = line.split()
                        if len(parts) > 1:  # guard against malformed lines
                            user = parts[1]
                            access_count[user] += 1
    return access_count

# Example call
log_directory = "/path/to/logs"
result = aggregate_log_data(log_directory)
print(result)
```

This example shows how to extract access records from multiple log files and count accesses per user. In a real system, such an aggregation task might be distributed across several nodes, with a coordination service (such as ZooKeeper) ensuring consistency[^3].