Crawling Instagram posts and users

Instagram-Crawl

A tool to crawl post, profile, and hashtag information from Instagram.

1. Requirements

1.1 ChromeDriver

Download it from the ChromeDriver download page and put it in the bin/ directory. Choose the version that is compatible with your operating system (Mac, Windows, Linux).

Then, on macOS, let the system trust your chromedriver by removing its quarantine attribute:

  • Open a terminal.
  • Navigate to the directory where your chromedriver file is located.
  • Run xattr -d com.apple.quarantine chromedriver.

Example:

$ cd $(Pkg_Path)/Crawl/bin 
$ xattr -d com.apple.quarantine chromedriver
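
Before launching the crawler, it can help to confirm that the driver sits where the tool expects it. The following is a minimal Python sketch, assuming the binary is named chromedriver (chromedriver.exe on Windows). The helper name chromedriver_ready is hypothetical and is not part of the repository.

```python
import os
from pathlib import Path


def chromedriver_ready(bin_dir):
    """Return True if a chromedriver binary exists in bin_dir and is executable.

    Assumption: the file is called "chromedriver" ("chromedriver.exe" on Windows).
    """
    name = "chromedriver.exe" if os.name == "nt" else "chromedriver"
    driver = Path(bin_dir) / name
    # is_file() rules out directories; os.access checks the execute permission.
    return driver.is_file() and os.access(driver, os.X_OK)


if __name__ == "__main__":
    # "Crawl/bin" is the directory used in the example above, relative to the package path.
    print("chromedriver ready:", chromedriver_ready("Crawl/bin"))
```

If this reports False on Linux or macOS even though the file is present, the execute bit is probably missing and chmod +x chromedriver may be needed.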

1.2 Login information

Because of Instagram's access limitations, you have to put your login information (username and password) in the Crawl/UserInfo file.

1.3 Others

The remaining package requirements are listed in the requirement.txt file.

2. How to use?

This program can be used to get Instagram post, profile, and hashtag data without using the Instagram API.

main.py accepts several arguments. The mode is declared as follows:

parser.add_argument(
    "--mode", type=str, default="hashtag", help="options: [posts, posts_full, profile, profile_script, hashtag]"
)

  • posts: gets the URL, caption, and first photo of each post.
  • posts_full: gets the URL, caption, all photos, time, comments, and number of likes and views of each post.
  • profile: gets user information, including the number of posts, followers, and following.
  • profile_script: gets more user information than profile mode.
  • hashtag: gets information on all posts under a tag. Fetching takes much longer once the number of posts exceeds about 1000, because Instagram rate-limits data requests.
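
To show how these modes could be validated on the command line, here is a minimal argparse sketch. It follows the --mode declaration quoted above and adds choices so that a misspelled mode is rejected. The short flags -u, -n, -t, and -o are taken from the usage examples in this post; the long names --username, --number, and --tag, and the function name build_parser, are assumptions for illustration. Whether the real main.py reads the mode as an option or as a positional argument is not confirmed here.

```python
import argparse

# The five modes listed above.
MODES = ["posts", "posts_full", "profile", "profile_script", "hashtag"]


def build_parser():
    """Build an illustrative parser; not the repository's actual main.py."""
    parser = argparse.ArgumentParser(description="Instagram-Crawl (sketch)")
    parser.add_argument(
        "--mode", type=str, default="hashtag", choices=MODES,
        help="options: [posts, posts_full, profile, profile_script, hashtag]",
    )
    # Long option names below are assumed; only the short flags appear in the examples.
    parser.add_argument("-u", "--username", help="Instagram user to crawl")
    parser.add_argument("-n", "--number", type=int, help="number of posts to fetch")
    parser.add_argument("-t", "--tag", help="hashtag to crawl, without '#'")
    parser.add_argument("-o", "--output", help="output path; omit to print to the console")
    return parser


# Parse an explicit argument list so the sketch runs without real command-line input.
args = build_parser().parse_args(["--mode", "posts", "-u", "cal_foodie", "-n", "100"])
print(args.mode, args.username, args.number, args.output)
```

With choices set, argparse exits with a usage error on an unknown mode instead of passing it on to the crawler.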

Example

python main.py posts -u cal_foodie -n 100 -o ./output
python main.py posts_full -u cal_foodie -n 100 -o ./output
python main.py profile -u cal_foodie -o ./output
python main.py profile_script -u cal_foodie -o ./output
python main.py hashtag -t taiwan -o ./output

The default number of posts fetched via hashtag is 100.

If no output path is specified with -o, --output, the result is printed to the console.
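
The output rule above can be sketched in a few lines of Python. This is an illustration only: it assumes the crawled data is serialized as JSON, and the function name write_result is hypothetical rather than taken from the repository.

```python
import json


def write_result(data, filepath=None):
    """Write data as JSON to filepath, or print it when no path is given."""
    # ensure_ascii=False keeps non-ASCII captions and comments readable.
    out = json.dumps(data, ensure_ascii=False, indent=2)
    if filepath:
        with open(filepath, "w", encoding="utf-8") as f:
            f.write(out)
    else:
        print(out)


# No output path given, so the result goes to the console.
write_result([{"key": "example", "caption": "hello"}])
```

Passing a path as the second argument writes the same JSON to that file instead.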

That covers how to use the tool. The code is in the repository: https://github.com/Billy1900/Instagram-Crawl
If you find it useful, please give it a star 🙏
