python newspaper — can I use Python's newspaper library to scrape news articles into a list?

The author asks the Stack Overflow community how to consolidate article links scraped from the CNN RSS feed, using Python's newspaper library, from many separate lists into one single list. The goal is to collect all links and store them as one list or dictionary.


Dear Stackoverflow community!

I would like to scrape news articles from the CNN RSS feed and get the link for each scraped article. This works very well with the Python newspaper library, but unfortunately I am unable to get the output in a usable format, i.e. a list or a dictionary.

I want to add the scraped links into one SINGLE list, instead of many separated lists.

import feedparser as fp
import newspaper
from newspaper import Article

website = {"cnn": {"link": "http://edition.cnn.com/", "rss": "http://rss.cnn.com/rss/cnn_topstories.rss"}}

for source, value in website.items():
    if 'rss' in value:
        # if there is an RSS value for a company, it will be extracted into d
        d = fp.parse(value['rss'])
        for entry in d.entries:
            if hasattr(entry, 'published'):
                article = {}
                article['link'] = entry.link
                print(article['link'])

The output is as follows:

http://rss.cnn.com/~r/rss/cnn_topstories/~3/5aHaFHz2VtI/index.html

http://rss.cnn.com/~r/rss/cnn_topstories/~3/_O8rud1qEXA/joe-walsh-trump-gop-voters-sot-crn-vpx.cnn

http://rss.cnn.com/~r/rss/cnn_topstories/~3/xj-0PnZ_LwU/index.html

.......

I would like to have ONE list with all the links in it i.e:

list =[http://rss.cnn.com/~r/rss/cnn_topstories/~3/5aHaFHz2VtI/index.html , http://rss.cnn.com/~r/rss/cnn_topstories/~3/_O8rud1qEXA/joe-walsh-trump-gop-voters-sot-crn-vpx.cnn , http://rss.cnn.com/~r/rss/cnn_topstories/~3/xj-0PnZ_LwU/index.html ,... ]

I tried appending the content via a for loop as follows:

for i in article['link']:
    article_list = []
    article_list.append(i)
    print(article_list)

But then the output is like this:

['h']

['t']

['t']

['p']

[':']

['/']

['/']

['r']

['s']

...
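For reference, this happens because `article['link']` is a single string, and iterating over a string yields one character at a time; re-creating `article_list = []` inside the loop also discards every previous append. A minimal sketch with a hypothetical URL illustrates the effect:

```python
url = "http://example.com"  # hypothetical URL for illustration

chars = []
for c in url:           # iterating a string yields single characters
    chars.append(c)

print(chars[:4])        # ['h', 't', 't', 'p']
```

Appending the whole string (or looping over a list of strings) avoids this.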

Does anyone know an alternative method, how to get the content into one list?

Or alternatively a dictionary as following:

dict = {'links': [link1, link2, link3]}

Thank you VERY much in advance for your help!!

Solution

Try modifying your code like this and see if it works:

article_list = []  # create the list ONCE, before the loop

for entry in d.entries:
    if hasattr(entry, 'published'):
        article = {}
        article['link'] = entry.link
        article_list.append(article['link'])  # collect each link into the single list
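If you prefer the dictionary shape mentioned in the question, the same loop can fill a 'links' key instead. A sketch of the pattern, using hypothetical sample entries in place of a live feedparser result (plain dicts here, so membership is checked with `in` rather than `hasattr`):

```python
# hypothetical sample data shaped like feed entries
entries = [
    {"published": "Mon, 01 Jan 2024", "link": "http://example.com/a"},
    {"link": "http://example.com/b"},  # no 'published' key -> skipped
]

articles = {"links": []}
for entry in entries:
    if "published" in entry:                 # keep only dated entries
        articles["links"].append(entry["link"])

print(articles)  # {'links': ['http://example.com/a']}
```

With real feedparser entries you would keep `hasattr(entry, 'published')` and `entry.link` as in the code above.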
