Trying out scholarly

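Searching for literature by hand is tedious, so I tried using scholarly to scrape publications from Google Scholar. Unfortunately, Google's anti-scraping system detected the script after it had retrieved fewer than 900 results. I'll go back to Publish or Perish instead.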
 

import os
import time
import csv
import random

from scholarly import scholarly


def search_scholar(query, filename):
    count = 0

    with open(filename, 'w', newline='', encoding='utf-8') as csvfile:
        fieldnames = ['title', 'author', 'year', 'cites', 'url']
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        
        # Write the CSV header row
        writer.writeheader()

        # Pass the query variable, not the literal string 'query'
        search_query = scholarly.search_pubs(query)

        while True:
            try:
                pub = next(search_query)
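                # Some results lack fields such as pub_year; fall back to
                # an empty value instead of dropping the whole row.
                bib = pub.get('bib', {})
                result = {
                    'title': bib.get('title', ''),
                    'author': bib.get('author', ''),
                    'year': bib.get('pub_year', ''),
                    'cites': pub.get('num_citations', ''),
                    'url': pub.get('url_scholarbib', '')
                }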
                writer.writerow(result)
                count += 1
                if count%100 == 0:
                    print('fetched: ', count)
                    sleep_time = random.uniform(15, 45)
                    time.sleep(sleep_time)
            except StopIteration:
                break
            except Exception as e:
                # Log the exception and move on to the next result
                print(f"Error occurred: {e}")
                continue
    
    return count


if __name__ == '__main__':
    # Create the output directory if it does not exist yet
    os.makedirs('./results/', exist_ok=True)
    os.chdir('./results/')

    filename = 'googleresults.csv'
    query = "your search terms"
    publications = search_scholar(query, filename)
    print(publications)
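
One weakness of the loop in `search_scholar` is that `while True` combined with `continue` may never exit if the result iterator keeps raising once Google starts blocking requests. Below is a minimal sketch of a wrapper that gives up after a number of consecutive failures and keeps the periodic random pause. The name `iterate_with_limits` and its parameters are hypothetical, not part of scholarly; it only assumes that `results` supports `next()`.

```python
import random
import time


def iterate_with_limits(results, max_errors=5, pause_every=100,
                        pause_range=(15, 45), sleep=time.sleep):
    """Yield items from `results`, pausing periodically and stopping
    after `max_errors` consecutive failures (hypothetical helper)."""
    fetched = 0
    errors = 0
    while True:
        try:
            item = next(results)
        except StopIteration:
            # The iterator is exhausted: end normally.
            return
        except Exception as exc:
            errors += 1
            print(f"Error occurred ({errors}/{max_errors}): {exc}")
            if errors >= max_errors:
                # Repeated failures usually mean we are blocked: give up.
                return
            continue
        errors = 0  # a successful fetch resets the failure counter
        fetched += 1
        yield item
        if fetched % pause_every == 0:
            # Random pause so that requests are not evenly spaced.
            sleep(random.uniform(*pause_range))
```

In the script above, the loop body could then become `for pub in iterate_with_limits(scholarly.search_pubs(query)):`, with the row-writing code left unchanged. The `sleep` parameter exists only so the pause can be replaced in tests.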

### Scholarly Python Library Usage and Documentation

In everyday usage, **scholarly** refers to academic research or scholarly articles. As a Python library, it is a tool for accessing and processing data about academic publications.

#### Overview of the `scholarly` Python Library

`scholarly` is a Python package for retrieving Google Scholar data programmatically. It lets users extract information such as citations, co-authors, interests, and publication details from Google Scholar pages without navigating them manually[^2]. The example below shows how the library can be used:

```python
from scholarly import scholarly

# Search for a specific author by name and take the first match
search_query = scholarly.search_author('Steven A Cholewa')
first_author = next(search_query)
print(first_author)

# Retrieve detailed information about the found author
author_info = scholarly.fill(first_author)
print(author_info)

# Access the individual papers published by the author
for paper in author_info['publications']:
    print(paper['bib']['title'])
```

This script demonstrates the basic functionality: searching for an author by name, retrieving the full profile, and listing the associated works.

#### Key Features Supported by `scholarly`

- Searching for both researchers and institutions within Google Scholar.
- Extracting metadata such as titles, abstracts, and citation counts per year for each article listed on a scholar's page.
- Handling paginated results automatically when querying large result sets[^3].

When using external APIs or web scrapers, always respect the terms of service of the site being accessed. Excessive automated requests can lead to IP bans or other penalties, depending on the site's policy on automated access.