Linux command study notes: dos2unix - unix2dos

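Introduction:

dos2unix is a utility that converts text files from Windows (DOS) format to Unix/Linux format. Windows text files end each line with \r\n (CR LF), whereas Unix and Linux text files end each line with \n (LF). In effect, dos2unix replaces every \r\n in a file with \n.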
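
The \r\n to \n replacement described above can be sketched on an in-memory buffer. This is only a minimal illustration, not the dos2unix implementation: the function name crlf_to_lf and its signature are hypothetical, and it assumes that a CR not followed by LF should be left alone.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of dos2unix): copy `len` bytes from `in`
 * to `out`, dropping each CR (0x0d) that is immediately followed by LF (0x0a).
 * `out` must have room for `len` bytes. Returns the number of bytes written. */
size_t crlf_to_lf(const char *in, size_t len, char *out)
{
    size_t i, n = 0;

    assert(in != NULL && out != NULL);
    for (i = 0; i < len; i++) {
        if (in[i] == '\r' && i + 1 < len && in[i + 1] == '\n')
            continue;           /* skip the CR of a CR-LF pair */
        out[n++] = in[i];       /* every other byte is copied unchanged */
    }
    return n;
}
```

Because the output is never longer than the input, the caller can size the output buffer from the input length alone.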

unix2dos is its twin: it converts files from Unix/Linux format to Windows format.
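
The opposite direction can be sketched the same way. Again this is a hedged illustration rather than the unix2dos code: lf_to_crlf is a hypothetical name, and the choice to leave an LF untouched when a CR already precedes it is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of unix2dos): copy `len` bytes from `in`
 * to `out`, writing CR-LF for every LF that is not already preceded by a CR.
 * `out` must have room for 2 * len bytes. Returns the number of bytes written. */
size_t lf_to_crlf(const char *in, size_t len, char *out)
{
    size_t i, n = 0;

    assert(in != NULL && out != NULL);
    for (i = 0; i < len; i++) {
        if (in[i] == '\n' && (i == 0 || in[i - 1] != '\r'))
            out[n++] = '\r';    /* add the missing CR */
        out[n++] = in[i];
    }
    return n;
}
```

In the worst case every input byte is an LF, which is why the output buffer needs twice the input length.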

Command syntax:

dos2unix [options] [-c convmode] [-o file ...] [-n infile outfile ...]

unix2dos [options] [-c convmode] [-o file ...] [-n infile outfile ...]

Command options:

The options listed here are those of dos2unix on Red Hat Enterprise Linux Server release 5.7. The options offered by dos2unix on other Linux releases may differ.

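Option  Long option  Description
-h      --help       Show the dos2unix help text.
-k      --keepdate   Keep the file's timestamp unchanged.
-q      --quiet      Quiet mode: do not print conversion messages.
-V      --version    Show version information.
-c      --convmode   Set the conversion mode.
-o      --oldfile    Convert the file in place (old file mode). This is the default.
-n      --newfile    Keep the original file and write the converted text to a new file (new file mode).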

Usage examples:

1: View the help for the dos2unix command

[root@DB-Server myscript]# man dos2unix
 
[root@DB-Server myscript]# dos2unix -h
dos2unix Copyright (c) 1994-1995 Benjamin Lin
         Copyright (c) 1998      Bernd Johannes Wuebben (Version 3.0)
         Copyright (c) 1998      Christian Wurll (Version 3.1)
Usage: dos2unix [-hkqV] [-c convmode] [-o file ...] [-n infile outfile ...]
 -h --help        give this help
 -k --keepdate    keep output file date
 -q --quiet       quiet mode, suppress all warnings
                  always on in stdin->stdout mode
 -V --version     display version number
 -c --convmode    conversion mode
 convmode         ASCII, 7bit, ISO, Mac, default to ASCII
 -l --newline     add additional newline in all but Mac convmode
 -o --oldfile     write to old file
 file ...         files to convert in old file mode
 -n --newfile     write to new file
 infile           original file in new file mode
 outfile          output file in new file mode

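2: dos2unix filename converts a Windows-format text file to Unix/Linux format. In the cat -v output below, each ^M is a carriage return (\r) at the end of a line.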

   [root@DB-Server myscript]# cat -v test.sh 
   . /home/oracle/.bash_profile^M
   echo ' '^M
   date^M
   echo ' '^M
   ^M
   sqlplus test/test @/home/oracle/scripts/test.sql^M
   ^M
   echo ' '^M
   date^M
   echo ' '^M
   [root@DB-Server myscript]# dos2unix test.sh 
   dos2unix: converting file test.sh to UNIX format ...
   [root@DB-Server myscript]# cat -v test.sh 
   . /home/oracle/.bash_profile
   echo ' '
   date
   echo ' '
    
   sqlplus test/test @/home/oracle/scripts/test.sql
    
   echo ' '
   date
   echo ' '
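
The check done by eye with cat -v above can also be done programmatically by counting CR-LF pairs. The following is a small sketch under stated assumptions: count_crlf is a hypothetical helper, and the stream is assumed to be opened in binary mode so that no line-ending translation happens on read.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not part of dos2unix): count the CR-LF pairs in a
 * stream. A result of 0 suggests the file has no DOS line endings. */
long count_crlf(FILE *fp)
{
    long count = 0;
    int prev = EOF;     /* previously read byte, EOF before the first read */
    int c;

    assert(fp != NULL);
    while ((c = fgetc(fp)) != EOF) {
        if (prev == '\r' && c == '\n')
            count++;
        prev = c;
    }
    return count;
}
```

A caller would typically open the file with fopen(path, "rb"), call count_crlf, and run the conversion only when the result is greater than zero.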

3: dos2unix can convert several files in one invocation

dos2unix filename1 filename2 filename3

4: By default the conversion is done in place on the source file. To keep the source file, use the -n option: dos2unix -n oldfilename newfilename

    [root@DB-Server myscript]# dos2unix -n dosfile linuxfile
    dos2unix: converting file dosfile to file linuxfile in UNIX format ...
    [root@DB-Server myscript]# cat -v dosfile 
    it is a windows dos file^M
    you should convert to unix&linux format^M
    [root@DB-Server myscript]# cat -v linuxfile 
    it is a windows dos file
    you should convert to unix&linux format
    [root@DB-Server myscript]# 
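
New file mode amounts to reading one stream and writing another. The sketch below shows that idea with a one-byte lookahead using fgetc and ungetc. It is an illustration only: convert_stream is a hypothetical function, and it omits the binary-file check, code page tables and -l handling that the real program performs.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not the dos2unix implementation: copy `in` to `out`,
 * dropping the CR of every CR-LF pair and leaving a lone CR untouched.
 * Returns 0 on success, -1 on an I/O error. */
int convert_stream(FILE *in, FILE *out)
{
    int c, next;

    assert(in != NULL && out != NULL);
    while ((c = fgetc(in)) != EOF) {
        if (c == '\r') {
            next = fgetc(in);                 /* peek at the following byte */
            if (next == '\n') {
                c = '\n';                     /* CR-LF collapses to LF */
            } else if (next != EOF) {
                if (ungetc(next, in) == EOF)  /* lone CR: push the peeked byte back */
                    return -1;
            }
        }
        if (fputc(c, out) == EOF)
            return -1;
    }
    return ferror(in) ? -1 : 0;
}
```

Since the input stream is only read and the output goes to a separate stream, the original file stays intact, which is the point of the -n option.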


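5: Keeping the file timestamp unchanged. Without -k the modification time changes to the time of the conversion (11:46 becomes 11:58 below); with -k it stays as it was.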

   [root@DB-Server myscript]# ls -lrt dosfile 
   -rw-r--r-- 1 root root 67 Dec 26 11:46 dosfile
   [root@DB-Server myscript]# dos2unix dosfile 
   dos2unix: converting file dosfile to UNIX format ...
   [root@DB-Server myscript]# ls -lrt dosfile 
   -rw-r--r-- 1 root root 65 Dec 26 11:58 dosfile
   [root@DB-Server myscript]# dos2unix -k dosfile 
   dos2unix: converting file dosfile to UNIX format ...
   [root@DB-Server myscript]# ls -lrt dosfile 
   -rw-r--r-- 1 root root 65 Dec 26 11:58 dosfile

6: Converting a file in quiet mode (shown here with unix2dos)

   [root@DB-Server myscript]# unix2dos -q dosfile 
    
   [root@DB-Server myscript]# 

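dos2unix can be downloaded from http://sourceforge.net/projects/dos2unix/ , where the latest versions of dos2unix, unix2dos and related tools are available together with their documentation. The source code of dos2unix is shown below. Note that this listing carries copyright years up to 2015, so it is newer than the version 3.1 binary used in the examples above.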

/*
 *  Name: dos2unix
 *  Documentation:
 *    Remove cr ('\x0d') characters from a file.
 *
 *  The dos2unix package is distributed under FreeBSD style license.
 *  See also http://www.freebsd.org/copyright/freebsd-license.html
 *  --------
 *
 *  Copyright (C) 2009-2015 Erwin Waterlander
 *  Copyright (C) 1998 Christian Wurll
 *  Copyright (C) 1998 Bernd Johannes Wuebben
 *  Copyright (C) 1994-1995 Benjamin Lin.
 *  All rights reserved.
 *
 *  Redistribution and use in source and binary forms, with or without
 *  modification, are permitted provided that the following conditions
 *  are met:
 *  1. Redistributions of source code must retain the above copyright
 *     notice, this list of conditions and the following disclaimer.
 *  2. Redistributions in binary form must reproduce the above copyright
 *     notice in the documentation and/or other materials provided with
 *     the distribution.
 *
 *  THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY
 *  EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 *  IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
 *  PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR BE LIABLE
 *  FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
 *  CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT
 *  OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
 *  BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
 *  WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
 *  OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN
 *  IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 *
 *  == 1.0 == 1989.10.04 == John Birchfield (jb@koko.csustan.edu)
 *  == 1.1 == 1994.12.20 == Benjamin Lin (blin@socs.uts.edu.au)
 *     Cleaned up for Borland C/C++ 4.02
 *  == 1.2 == 1995.03.16 == Benjamin Lin (blin@socs.uts.edu.au)
 *     Modified to more conform to UNIX style.
 *  == 2.0 == 1995.03.19 == Benjamin Lin (blin@socs.uts.edu.au)
 *     Rewritten from scratch.
 *  == 2.1 == 1995.03.29 == Benjamin Lin (blin@socs.uts.edu.au)
 *     Conversion to SunOS charset implemented.
 *  == 2.2 == 1995.03.30 == Benjamin Lin (blin@socs.uts.edu.au)
 *     Fixed a bug in 2.1 where in new-file mode, if outfile already exists
 *     conversion can not be completed properly.
 *
 * Added Mac text file translation, i.e. \r to \n conversion
 * Bernd Johannes Wuebben, wuebben@kde.org
 * Wed Feb  4 19:12:58 EST 1998
 *
 * Added extra newline if ^M occurs
 * Christian Wurll, wurll@ira.uka.de
 * Thu Nov 19 1998
 *
 *  See ChangeLog.txt for complete version history.
 *
 */
 
 
 
/* #define DEBUG 1 */
#define __DOS2UNIX_C

#include "common.h"
#include "dos2unix.h"
# if (defined(_WIN32) && !defined(__CYGWIN__))
#include <windows.h>
#endif
#ifdef D2U_UNICODE
#if !defined(__MSDOS__) && !defined(_WIN32) && !defined(__OS2__)  /* Unix, Cygwin */
# include <langinfo.h>
#endif
#endif

void PrintLicense(void)
{
  printf("%s", _("\
Copyright (C) 2009-2015 Erwin Waterlander\n\
Copyright (C) 1998      Christian Wurll (Version 3.1)\n\
Copyright (C) 1998      Bernd Johannes Wuebben (Version 3.0)\n\
Copyright (C) 1994-1995 Benjamin Lin\n\
All rights reserved.\n\n"));
  PrintBSDLicense();
}
 
 
#ifdef D2U_UNICODE
wint_t StripDelimiterW(FILE* ipInF, FILE* ipOutF, CFlag *ipFlag, wint_t CurChar, unsigned int *converted, const char *progname)
{
  wint_t TempNextChar;
  /* CurChar is always CR (x0d) */
  /* In normal dos2unix mode put nothing (skip CR). */
  /* Don't modify Mac files when in dos2unix mode. */
  if ( (TempNextChar = d2u_getwc(ipInF, ipFlag->bomtype)) != WEOF) {
    if (d2u_ungetwc( TempNextChar, ipInF, ipFlag->bomtype) == WEOF) {  /* put back peek char */
      d2u_getc_error(ipFlag,progname);
      return WEOF;
    }
    if ( TempNextChar != 0x0a ) {
      if (d2u_putwc(CurChar, ipOutF, ipFlag, progname) == WEOF) {  /* Mac line, put CR */
        d2u_putwc_error(ipFlag,progname);
        return WEOF;
      }
    } else {
      (*converted)++;
      if (ipFlag->NewLine) {  /* add additional LF? */
        if (d2u_putwc(0x0a, ipOutF, ipFlag, progname) == WEOF) {
          d2u_putwc_error(ipFlag,progname);
          return WEOF;
        }
      }
    }
  } else {
    if (ferror(ipInF)) {
      d2u_getc_error(ipFlag,progname);
      return WEOF;
    }
    if ( CurChar == 0x0d ) {  /* EOF: last Mac line delimiter (CR)? */
      if (d2u_putwc(CurChar, ipOutF, ipFlag, progname) == WEOF) {
        d2u_putwc_error(ipFlag,progname);
        return WEOF;
      }
    }
  }
  return CurChar;
}
#endif
 
 
/* CUR        NEXT
   0xd(CR)    0xa(LF)  => put LF if option -l was used
   0xd(CR)  ! 0xa(LF)  => put CR
   0xd(CR)    EOF      => put CR
 */
int StripDelimiter(FILE* ipInF, FILE* ipOutF, CFlag *ipFlag, int CurChar, unsigned int *converted, const char *progname)
{
  int TempNextChar;
  /* CurChar is always CR (x0d) */
  /* In normal dos2unix mode put nothing (skip CR). */
  /* Don't modify Mac files when in dos2unix mode. */
  if ( (TempNextChar = fgetc(ipInF)) != EOF) {
    if (ungetc( TempNextChar, ipInF ) == EOF) { /* put back peek char */
      d2u_getc_error(ipFlag,progname);
      return EOF;
    }
    if ( TempNextChar != '\x0a' ) {
      if (fputc( CurChar, ipOutF ) == EOF) { /* Mac line, put CR */
        d2u_putc_error(ipFlag,progname);
        return EOF;
      }
    } else {
      (*converted)++;
      if (ipFlag->NewLine) {  /* add additional LF? */
        if (fputc('\x0a', ipOutF) == EOF) {
          d2u_putc_error(ipFlag,progname);
          return EOF;
        }
      }
    }
  } else {
    if (ferror(ipInF)) {
      d2u_getc_error(ipFlag,progname);
      return EOF;
    }
    if ( CurChar == '\x0d' ) {  /* EOF: last Mac line delimiter (CR)? */
      if (fputc( CurChar, ipOutF ) == EOF) {
        d2u_putc_error(ipFlag,progname);
        return EOF;
      }
    }
  }
  return CurChar;
}
 
 
/* converts stream ipInF to UNIX format text and write to stream ipOutF
 * RetVal: 0  if success
 *         -1  otherwise
 */
#ifdef D2U_UNICODE
int ConvertDosToUnixW(FILE* ipInF, FILE* ipOutF, CFlag *ipFlag, const char *progname)
{
  int RetVal = 0;
  wint_t TempChar;
  wint_t TempNextChar;
  unsigned int line_nr = 1;
  unsigned int converted = 0;

  ipFlag->status = 0;

  /* CR-LF -> LF */
  /* LF    -> LF, in case the input file is a Unix text file */
  /* CR    -> CR, in dos2unix mode (don't modify Mac file) */
  /* CR    -> LF, in Mac mode */
  /* \x0a = Newline/Line Feed (LF) */
  /* \x0d = Carriage Return (CR) */

  switch (ipFlag->FromToMode)
  {
    case FROMTO_DOS2UNIX: /* dos2unix */
      while ((TempChar = d2u_getwc(ipInF, ipFlag->bomtype)) != WEOF) {  /* get character */
        if ((ipFlag->Force == 0) &&
            (TempChar < 32) &&
            (TempChar != 0x0a) &&  /* Not an LF */
            (TempChar != 0x0d) &&  /* Not a CR */
            (TempChar != 0x09) &&  /* Not a TAB */
            (TempChar != 0x0c)) {  /* Not a form feed */
          RetVal = -1;
          ipFlag->status |= BINARY_FILE ;
          if (ipFlag->verbose) {
            if ((ipFlag->stdio_mode) && (!ipFlag->error)) ipFlag->error = 1;
            d2u_fprintf(stderr, "%s: ", progname);
            d2u_fprintf(stderr, _("Binary symbol 0x00%02X found at line %u\n"),TempChar, line_nr);
          }
          break;
        }
        if (TempChar != 0x0d) {
          if (TempChar == 0x0a) /* Count all DOS and Unix line breaks */
            ++line_nr;
          if (d2u_putwc(TempChar, ipOutF, ipFlag, progname) == WEOF) {
            RetVal = -1;
            d2u_putwc_error(ipFlag,progname);
            break;
          }
        } else {
          if (StripDelimiterW( ipInF, ipOutF, ipFlag, TempChar, &converted, progname) == WEOF) {
            RetVal = -1;
            break;
          }
        }
      }
      if ((TempChar == WEOF) && ferror(ipInF)) {
        RetVal = -1;
        d2u_getc_error(ipFlag,progname);
      }
      break;
    case FROMTO_MAC2UNIX: /* mac2unix */
      while ((TempChar = d2u_getwc(ipInF, ipFlag->bomtype)) != WEOF) {
        if ((ipFlag->Force == 0) &&
            (TempChar < 32) &&
            (TempChar != 0x0a) &&  /* Not an LF */
            (TempChar != 0x0d) &&  /* Not a CR */
            (TempChar != 0x09) &&  /* Not a TAB */
            (TempChar != 0x0c)) {  /* Not a form feed */
          RetVal = -1;
          ipFlag->status |= BINARY_FILE ;
          if (ipFlag->verbose) {
            if ((ipFlag->stdio_mode) && (!ipFlag->error)) ipFlag->error = 1;
            d2u_fprintf(stderr, "%s: ", progname);
            d2u_fprintf(stderr, _("Binary symbol 0x00%02X found at line %u\n"), TempChar, line_nr);
          }
          break;
        }
        if ((TempChar != 0x0d)) {
          if (TempChar == 0x0a) /* Count all DOS and Unix line breaks */
            ++line_nr;
          if(d2u_putwc(TempChar, ipOutF, ipFlag, progname) == WEOF) {
            RetVal = -1;
            d2u_putwc_error(ipFlag,progname);
            break;
          }
        }
        else{
          /* TempChar is a CR */
          if ( (TempNextChar = d2u_getwc(ipInF, ipFlag->bomtype)) != WEOF) {
            if (d2u_ungetwc( TempNextChar, ipInF, ipFlag->bomtype) == WEOF) {  /* put back peek char */
              d2u_getc_error(ipFlag,progname);
              RetVal = -1;
              break;
            }
            /* Don't touch this delimiter if it's a CR,LF pair. */
            if ( TempNextChar == 0x0a ) {
              if (d2u_putwc(0x0d, ipOutF, ipFlag, progname) == WEOF) { /* put CR, part of DOS CR-LF */
                d2u_putwc_error(ipFlag,progname);
                RetVal = -1;
                break;
              }
              continue;
            }
          }
          if (d2u_putwc(0x0a, ipOutF, ipFlag, progname) == WEOF) { /* MAC line end (CR). Put LF */
            RetVal = -1;
            d2u_putwc_error(ipFlag,progname);
            break;
          }
          converted++;
          line_nr++; /* Count all Mac line breaks */
          if (ipFlag->NewLine) {  /* add additional LF? */
            if (d2u_putwc(0x0a, ipOutF, ipFlag, progname) == WEOF) {
              RetVal = -1;
              d2u_putwc_error(ipFlag,progname);
              break;
            }
          }
        }
      }
      if ((TempChar == WEOF) && ferror(ipInF)) {
        RetVal = -1;
        d2u_getc_error(ipFlag,progname);
      }
      break;
    default: /* unknown FromToMode */
      ;
#if DEBUG
      d2u_fprintf(stderr, "%s: ", progname);
      d2u_fprintf(stderr, _("program error, invalid conversion mode %d\n"),ipFlag->FromToMode);
      exit(1);
#endif
  }
  if (ipFlag->status & UNICODE_CONVERSION_ERROR)
    ipFlag->line_nr = line_nr;
  if ((RetVal == 0) && (ipFlag->verbose > 1)) {
    d2u_fprintf(stderr, "%s: ", progname);
    d2u_fprintf(stderr, _("Converted %u out of %u line breaks.\n"), converted, line_nr -1);
  }
  return RetVal;
}
#endif
 
 
/* converts stream ipInF to UNIX format text and write to stream ipOutF
 * RetVal: 0  if success
 *         -1  otherwise
 */
int ConvertDosToUnix(FILE* ipInF, FILE* ipOutF, CFlag *ipFlag, const char *progname)
{
  int RetVal = 0;
  int TempChar;
  int TempNextChar;
  int *ConvTable;
  unsigned int line_nr = 1;
  unsigned int converted = 0;

  ipFlag->status = 0;

  switch (ipFlag->ConvMode) {
    case CONVMODE_ASCII: /* ascii */
    case CONVMODE_UTF16LE: /* Assume UTF-16LE, bomtype = FILE_UTF8 or GB18030 */
    case CONVMODE_UTF16BE: /* Assume UTF-16BE, bomtype = FILE_UTF8 or GB18030 */
      ConvTable = D2UAsciiTable;
      break;
    case CONVMODE_7BIT: /* 7bit */
      ConvTable = D2U7BitTable;
      break;
    case CONVMODE_437: /* iso */
      ConvTable = D2UIso437Table;
      break;
    case CONVMODE_850: /* iso */
      ConvTable = D2UIso850Table;
      break;
    case CONVMODE_860: /* iso */
      ConvTable = D2UIso860Table;
      break;
    case CONVMODE_863: /* iso */
      ConvTable = D2UIso863Table;
      break;
    case CONVMODE_865: /* iso */
      ConvTable = D2UIso865Table;
      break;
    case CONVMODE_1252: /* iso */
      ConvTable = D2UIso1252Table;
      break;
    default: /* unknown convmode */
      ipFlag->status |= WRONG_CODEPAGE ;
      return(-1);
  }
  /* Turn off ISO and 7-bit conversion for Unicode text files */
  if (ipFlag->bomtype > 0)
    ConvTable = D2UAsciiTable;

  if ((ipFlag->ConvMode > CONVMODE_7BIT) && (ipFlag->verbose)) { /* not ascii or 7bit */
    d2u_fprintf(stderr, "%s: ", progname);
    d2u_fprintf(stderr, _("using code page %d.\n"), ipFlag->ConvMode);
  }

  /* CR-LF -> LF */
  /* LF    -> LF, in case the input file is a Unix text file */
  /* CR    -> CR, in dos2unix mode (don't modify Mac file) */
  /* CR    -> LF, in Mac mode */
  /* \x0a = Newline/Line Feed (LF) */
  /* \x0d = Carriage Return (CR) */

  switch (ipFlag->FromToMode) {
    case FROMTO_DOS2UNIX: /* dos2unix */
      while ((TempChar = fgetc(ipInF)) != EOF) {  /* get character */
        if ((ipFlag->Force == 0) &&
            (TempChar < 32) &&
            (TempChar != '\x0a') &&  /* Not an LF */
            (TempChar != '\x0d') &&  /* Not a CR */
            (TempChar != '\x09') &&  /* Not a TAB */
            (TempChar != '\x0c')) {  /* Not a form feed */
          RetVal = -1;
          ipFlag->status |= BINARY_FILE ;
          if (ipFlag->verbose) {
            if ((ipFlag->stdio_mode) && (!ipFlag->error)) ipFlag->error = 1;
            d2u_fprintf(stderr, "%s: ", progname);
            d2u_fprintf(stderr, _("Binary symbol 0x%02X found at line %u\n"),TempChar, line_nr);
          }
          break;
        }
        if (TempChar != '\x0d') {
          if (TempChar == '\x0a') /* Count all DOS and Unix line breaks */
            ++line_nr;
          if (fputc(ConvTable[TempChar], ipOutF) == EOF) {
            RetVal = -1;
            d2u_putc_error(ipFlag,progname);
            break;
          }
        } else {
          if (StripDelimiter( ipInF, ipOutF, ipFlag, TempChar, &converted, progname) == EOF) {
            RetVal = -1;
            break;
          }
        }
      }
      if ((TempChar == EOF) && ferror(ipInF)) {
        RetVal = -1;
        d2u_getc_error(ipFlag,progname);
      }
      break;
    case FROMTO_MAC2UNIX: /* mac2unix */
      while ((TempChar = fgetc(ipInF)) != EOF) {
        if ((ipFlag->Force == 0) &&
            (TempChar < 32) &&
            (TempChar != '\x0a') &&  /* Not an LF */
            (TempChar != '\x0d') &&  /* Not a CR */
            (TempChar != '\x09') &&  /* Not a TAB */
            (TempChar != '\x0c')) {  /* Not a form feed */
          RetVal = -1;
          ipFlag->status |= BINARY_FILE ;
          if (ipFlag->verbose) {
            if ((ipFlag->stdio_mode) && (!ipFlag->error)) ipFlag->error = 1;
            d2u_fprintf(stderr, "%s: ", progname);
            d2u_fprintf(stderr, _("Binary symbol 0x%02X found at line %u\n"),TempChar, line_nr);
          }
          break;
        }
        if ((TempChar != '\x0d')) {
          if (TempChar == '\x0a') /* Count all DOS and Unix line breaks */
            ++line_nr;
          if(fputc(ConvTable[TempChar], ipOutF) == EOF) {
            RetVal = -1;
            d2u_putc_error(ipFlag,progname);
            break;
          }
        }
        else{
          /* TempChar is a CR */
          if ( (TempNextChar = fgetc(ipInF)) != EOF) {
            if (ungetc( TempNextChar, ipInF ) == EOF) {  /* put back peek char */
              d2u_getc_error(ipFlag,progname);
              RetVal = -1;
              break;
            }
            /* Don't touch this delimiter if it's a CR,LF pair. */
            if ( TempNextChar == '\x0a' ) {
              if (fputc('\x0d', ipOutF) == EOF) { /* put CR, part of DOS CR-LF */
                RetVal = -1;
                d2u_putc_error(ipFlag,progname);
                break;
              }
              continue;
            }
          }
          if (fputc('\x0a', ipOutF) == EOF) { /* MAC line end (CR). Put LF */
            RetVal = -1;
            d2u_putc_error(ipFlag,progname);
            break;
          }
          converted++;
          line_nr++; /* Count all Mac line breaks */
          if (ipFlag->NewLine) {  /* add additional LF? */
            if (fputc('\x0a', ipOutF) == EOF) {
              RetVal = -1;
              d2u_putc_error(ipFlag,progname);
              break;
            }
          }
        }
      }
      if ((TempChar == EOF) && ferror(ipInF)) {
        RetVal = -1;
        d2u_getc_error(ipFlag,progname);
      }
      break;
    default: /* unknown FromToMode */
      ;
#if DEBUG
      d2u_fprintf(stderr, "%s: ", progname);
      d2u_fprintf(stderr, _("program error, invalid conversion mode %d\n"),ipFlag->FromToMode);
      exit(1);
#endif
  }
  if ((RetVal == 0) && (ipFlag->verbose > 1)) {
    d2u_fprintf(stderr, "%s: ", progname);
    d2u_fprintf(stderr, _("Converted %u out of %u line breaks.\n"),converted, line_nr -1);
  }
  return RetVal;
}
 
 
 
int main (int argc, char *argv[])
{
  /* variable declarations */
  char progname[9];
  CFlag *pFlag;
  char *ptr;
  char localedir[1024];
# ifdef __MINGW64__
  int _dowildcard = -1; /* enable wildcard expansion for Win64 */
# endif
  int  argc_new;
  char **argv_new;

  progname[8] = '\0';
  strcpy(progname,"dos2unix");

#ifdef ENABLE_NLS
  ptr = getenv("DOS2UNIX_LOCALEDIR");
  if (ptr == NULL)
    strcpy(localedir,LOCALEDIR);
  else {
    if (strlen(ptr) < sizeof(localedir))
      strcpy(localedir,ptr);
    else {
      d2u_fprintf(stderr,"%s: ",progname);
      d2u_ansi_fprintf(stderr, "%s", _("error: Value of environment variable DOS2UNIX_LOCALEDIR is too long.\n"));
      strcpy(localedir,LOCALEDIR);
    }
  }
#endif

#if defined(ENABLE_NLS) || (defined(D2U_UNICODE) && !defined(__MSDOS__) && !defined(_WIN32) && !defined(__OS2__))
  /* setlocale() is also needed for nl_langinfo() */
  setlocale (LC_ALL, "");
#endif

#ifdef ENABLE_NLS
  bindtextdomain (PACKAGE, localedir);
  textdomain (PACKAGE);
#endif

  /* variable initialisations */
  pFlag = (CFlag*)malloc(sizeof(CFlag));
  if (pFlag == NULL) {
    d2u_fprintf(stderr, "dos2unix:");
    d2u_ansi_fprintf(stderr, " %s\n", strerror(errno));
    return errno;
  }
  pFlag->FromToMode = FROMTO_DOS2UNIX;  /* default dos2unix */
  pFlag->keep_bom = 0;

  if ( ((ptr=strrchr(argv[0],'/')) == NULL) && ((ptr=strrchr(argv[0],'\\')) == NULL) )
    ptr = argv[0];
  else
    ptr++;

  if ((strcmpi("mac2unix", ptr) == 0) || (strcmpi("mac2unix.exe", ptr) == 0)) {
    pFlag->FromToMode = FROMTO_MAC2UNIX;
    strcpy(progname,"mac2unix");
  }

#ifdef D2U_UNIFILE
  /* Get arguments in wide Unicode format in the Windows Command Prompt */
  wchar_t **wargv;
  char ***argv_glob;

  /* This does not support wildcard expansion (globbing) */
  wargv = CommandLineToArgvW(GetCommandLineW(), &argc);

  argv_glob = (char ***)malloc(sizeof(char***));
  if (argv_glob == NULL) {
    d2u_fprintf(stderr, "%s:", progname);
    d2u_ansi_fprintf(stderr, " %s\n", strerror(errno));
    return errno;
  }
  /* Glob the arguments and convert them to UTF-8 */
  argc_new = glob_warg(argc, wargv, argv_glob, pFlag, progname);
  argv_new = *argv_glob;
#else
  argc_new = argc;
  argv_new = argv;
#endif

#ifdef D2U_UNICODE
  return parse_options(argc_new, argv_new, pFlag, localedir, progname, PrintLicense, ConvertDosToUnix, ConvertDosToUnixW);
#else
  return parse_options(argc_new, argv_new, pFlag, localedir, progname, PrintLicense, ConvertDosToUnix);
#endif
}
 
