Wget Examples

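This article describes the basic usage of the wget command, including downloading files, setting the number of retries, and using FTP. It also covers more advanced features such as recursive retrieval, directory-based limits, URL format, customizing the progress display, and guru-level usage, with many examples.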

The examples are divided into three sections for clarity. The first section is a tutorial for beginners. The second section explains some of the more complex program features. The third section contains advice for mirror administrators, as well as even more complex features (that some would call perverted).

Simple Usage

  • Say you want to download a URL. Just type:
    wget http://fly.cc.fer.hr/
    
    The response will be something like:
    --13:30:45--  http://fly.cc.fer.hr:80/
               => `index.html'
    Connecting to fly.cc.fer.hr:80... connected!
    HTTP request sent, fetching headers... done.
    Length: 1,749 [text/html]
    
        0K -> .
    
    13:30:46 (68.32K/s) - `index.html' saved [1749/1749]
    
  • But what will happen if the connection is slow, and the file is lengthy? The connection will probably fail, more than once, before the whole file is retrieved. In this case, Wget will try getting the file until it either gets the whole of it, or exceeds the default number of retries (this being 20). It is easy to change the number of tries to 45, to ensure that the whole file will arrive safely:
    wget --tries=45 http://fly.cc.fer.hr/jpg/flyweb.jpg
    
  • Now let's leave Wget to work in the background, and write its progress to log file `log'. It is tiring to type `--tries', so we shall use `-t'.
    wget -t 45 -o log http://fly.cc.fer.hr/jpg/flyweb.jpg &
    
    The ampersand at the end of the line makes sure that Wget works in the background. To allow an unlimited number of retries, use `-t inf'.
  • Using FTP is just as simple. Wget will take care of the login and password.
    $ wget ftp://gnjilux.cc.fer.hr/welcome.msg
    --23:35:55--  ftp://gnjilux.cc.fer.hr:21/welcome.msg
               => `welcome.msg'
    Connecting to gnjilux.cc.fer.hr:21... connected!
    Logging in as anonymous ... Logged in!
    ==> TYPE I ... done.  ==> CWD not needed.
    ==> PORT ... done.    ==> RETR welcome.msg ... done.
    Length: 1,340 (unauthoritative)
     
        0K -> .
     
    23:35:56 (37.39K/s) - `welcome.msg' saved [1340]
    
  • If you specify a directory, Wget will retrieve the directory listing, parse it and convert it to HTML. Try:
    wget ftp://prep.ai.mit.edu/pub/gnu/
    lynx index.html
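
The background download above writes its progress to `log', so a script can inspect that file once Wget has exited. The following is a minimal sketch, assuming the log contains a "saved [...]" line like the one in the sample output shown earlier. The helper name log_reports_saved is hypothetical, and other Wget versions may word the message differently.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given Wget log contains a "saved [" line.
# Assumption: the log format matches the sample output in this section.
log_reports_saved() {
    grep -q 'saved \[' "$1"
}

# Example use (commented out because it needs network access):
#   wget -t 45 -o log http://fly.cc.fer.hr/jpg/flyweb.jpg &
#   wait "$!"
#   log_reports_saved log && echo "download complete"
```

Checking the log rather than the exit status is only one option; it is used here because the log file is what the example above produces.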
    

Advanced Usage

  • You would like to read the list of URLs from a file? That is not a problem:
    wget -i file
    
    If you specify `-' as the file name, the URLs will be read from standard input.
  • Create a mirror image of the GNU WWW site (with the same directory structure the original has) with only one try per document, saving the log of the activities to `gnulog':
    wget -r -t1 http://www.gnu.ai.mit.edu/ -o gnulog
    
  • Retrieve the first layer of yahoo links:
    wget -r -l1 http://www.yahoo.com/
    
  • Retrieve the index.html of `www.lycos.com', showing the original server headers:
    wget -S http://www.lycos.com/
    
  • Save the server headers with the file:
    wget --save-headers http://www.lycos.com/
    more index.html
    
  • Retrieve the first two levels of `wuarchive.wustl.edu', saving them to /tmp.
    wget -r -l2 -P/tmp ftp://wuarchive.wustl.edu/
    
  • You want to download all the GIFs from an HTTP directory. `wget http://host/dir/*.gif' doesn't work, since HTTP retrieval does not support globbing. In that case, use:
    wget -r -l1 --no-parent -A.gif http://host/dir/
    
    It is a bit of a kludge, but it works. `-r -l1' means to retrieve recursively (See section Recursive Retrieval), with maximum depth of 1. `--no-parent' means that references to the parent directory are ignored (See section Directory-Based Limits), and `-A.gif' means to download only the GIF files. `-A "*.gif"' would have worked too.
  • Suppose you were in the middle of downloading, when Wget was interrupted. Now you do not want to clobber the files already present. It would be:
    wget -nc -r http://www.gnu.ai.mit.edu/
    
  • If you want to encode your own username and password to HTTP or FTP, use the appropriate URL syntax (See section URL Format).
    wget ftp://hniksic:mypassword@jagor.srce.hr/.emacs
    
  • If you do not like the default retrieval visualization (1K dots with 10 dots per cluster and 50 dots per line), you can customize it through dot settings (See section Wgetrc Commands). For example, many people like the "binary" style of retrieval, with 8K dots and 512K lines:
    wget --dot-style=binary ftp://prep.ai.mit.edu/pub/gnu/README
    
    You can experiment with other styles, like:
    wget --dot-style=mega ftp://ftp.xemacs.org/pub/xemacs/xemacs-20.4/xemacs-20.4.tar.gz
    wget --dot-style=micro http://fly.cc.fer.hr/
    
    To make these settings permanent, put them in your `.wgetrc', as described before (See section Sample Wgetrc).
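
The GIF example in this section notes that `-A.gif' and `-A "*.gif"' both work. According to the Wget manual, an accept-list entry containing a wildcard character is treated as a pattern, and any other entry is treated as a file name suffix. The rough shell sketch below illustrates that distinction; the function name accepts is hypothetical, and this is an illustration rather than Wget's actual code.

```shell
#!/bin/sh
# Hypothetical illustration of accept-list matching; not Wget's own logic.
# Usage: accepts FILENAME RULE  (exit status 0 means "accepted")
accepts() {
    name=$1
    rule=$2
    case $rule in
        *\**|*\?*|*\[*|*\]*)
            # Rule contains a wildcard character: treat it as a shell pattern.
            case $name in
                $rule) return 0 ;;
            esac
            ;;
        *)
            # No wildcard: treat the rule as a file name suffix.
            case $name in
                *"$rule") return 0 ;;
            esac
            ;;
    esac
    return 1
}

# Example: both spellings accept a GIF and reject an HTML page.
for rule in '.gif' '*.gif'; do
    accepts photo.gif "$rule" && echo "photo.gif accepted by $rule"
    accepts index.html "$rule" || echo "index.html rejected by $rule"
done
```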

Guru Usage

  • If you wish Wget to keep a mirror of a page (or FTP subdirectories), use `--mirror' (`-m'), which is the shorthand for `-r -N'. You can put Wget in the crontab file asking it to recheck a site each Sunday:
    crontab
    0 0 * * 0 wget --mirror ftp://ftp.xemacs.org/pub/xemacs/ -o /home/me/weeklog
    
  • You may wish to do the same with someone's home page. But you do not want to download all those images; you're only interested in HTML.
    wget --mirror -A.html http://www.w3.org/
    
  • But what about mirroring the hosts networkologically close to you? It seems so awfully slow because of all that DNS resolving. Just use `-D' (See section Domain Acceptance).
    wget -rN -Dsrce.hr http://www.srce.hr/
    
    Now Wget will correctly find out that `regoc.srce.hr' is the same as `www.srce.hr', but will not even take into consideration the link to `www.mit.edu'.
  • You have a presentation and would like the dumb absolute links to be converted to relative? Use `-k':
    wget -k -r URL
    
  • You would like the output documents to go to standard output instead of to files? OK, but Wget will automatically shut up (turn on `--quiet') to prevent mixing of Wget output and the retrieved documents.
    wget -O - http://jagor.srce.hr/ http://www.srce.hr/
    
    You can also combine the two options and make weird pipelines to retrieve the documents from remote hotlists:
    wget -O - http://cool.list.com/ | wget --force-html -i -
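
The `-D' example in this section keeps `regoc.srce.hr' and skips `www.mit.edu'. A minimal sketch of that kind of domain check is shown below. The function name host_in_domain is hypothetical, and matching on a dot boundary is an assumption of this illustration, not a description of Wget's internals.

```shell
#!/bin/sh
# Hypothetical illustration of a domain acceptance check; not Wget's own code.
# Usage: host_in_domain HOST DOMAIN  (exit status 0 means "host is accepted")
host_in_domain() {
    host=$1
    domain=$2
    case $host in
        # Accept the domain itself or any host name ending in ".DOMAIN".
        "$domain"|*."$domain") return 0 ;;
    esac
    return 1
}

# Example with the hosts named in the text above.
for h in www.srce.hr regoc.srce.hr www.mit.edu; do
    if host_in_domain "$h" srce.hr; then
        echo "$h: follow"
    else
        echo "$h: skip"
    fi
done
```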
    

http://www.editcorp.com/personal/lars_appel/wget/v1/wget_7.html

