How to enable packet forwarding for IPv4 and IPv6

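This article explains in detail how to tune network parameters on Linux through the /proc/sys/net/ directory, covering the core network settings and the IPv4 configuration, in order to improve network performance and security.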

Keywords: /etc/sysctl.conf, /sbin/sysctl, /proc/sys, Linuxrc/Documentation/networking/ip-sysctl.txt


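Method 1
Write the values directly to the relevant files under /proc/sys:
echo 1 > /proc/sys/net/ipv6/conf/default/forwarding   # IPv6
echo 1 > /proc/sys/net/ipv4/ip_forward   # IPv4
Note that conf/default/forwarding only sets the default for interfaces created afterwards; to turn IPv6 forwarding on globally, write 1 to /proc/sys/net/ipv6/conf/all/forwarding.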

Method 2
Edit /etc/sysctl.conf:
# Uncomment the next line to enable packet forwarding for IPv4
#net.ipv4.ip_forward=1
# Uncomment the next line to enable packet forwarding for IPv6
#net.ipv6.conf.all.forwarding=1
Then either reboot, or reload the file with:
sysctl -p /etc/sysctl.conf
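To confirm that either method took effect, the switches can be read back from /proc/sys. The following is a minimal, read-only sketch; the function name forwarding_state and the PROC_SYS override are made up for illustration and are not part of any standard tool.

```shell
# Read-only check of the forwarding switches; nothing is changed.
# PROC_SYS is a hypothetical override so the function can be pointed
# at a scratch directory instead of the real /proc/sys.
PROC_SYS="${PROC_SYS:-/proc/sys}"

forwarding_state() {
    # $1: path relative to $PROC_SYS, e.g. net/ipv4/ip_forward
    f="$PROC_SYS/$1"
    if [ -r "$f" ]; then
        case "$(cat "$f")" in
            0) echo "disabled" ;;
            *) echo "enabled" ;;
        esac
    else
        echo "unavailable"
    fi
}

echo "IPv4: $(forwarding_state net/ipv4/ip_forward)"
echo "IPv6: $(forwarding_state net/ipv6/conf/all/forwarding)"
```

On a machine without IPv6 support the second line reports "unavailable" rather than failing.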

 

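The following description of the /proc/sys/net/ directory is reposted from http://hi.baidu.com/hjt713/blog/item/3265e012d13e2128dc54012f.html
Linux has developed rapidly in recent years, and the number of distributions can be dizzying. Every distribution, however, has a /proc/ directory, also known as the proc file system. Network administrators need to know it well.

This directory contains special files that not only reflect the current state of the kernel and expose hardware information; some of them can also be modified by the user to adjust how the running kernel behaves, for example the files under the /proc/sys/ subdirectory.

Unlike the other directories under /proc/, the files under /proc/sys/ not only provide information about the system but also let the user switch certain kernel features on or off immediately. The /proc/sys/net/ subdirectory deals specifically with networking: some of its files enable special networking features, and others help protect the network. A Linux network administrator therefore needs a good understanding of what the files under /proc/sys/net/ do and how to set them.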

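I. The /proc/sys/net/ directory

/proc/sys/net/ contains subdirectories for various networking topics, for example appletalk/, ethernet/, ipv4/, ipx/ and ipv6/. By changing the files in these directories, a network administrator can adjust the corresponding network parameters while the system is running. Linux offers many other ways to configure networking, but familiarity with this directory is very useful in practice.

Two directories under /proc/sys/net/ are closely tied to IPv4 networking, and adjusting some of the files in them can noticeably change how the network behaves: /proc/sys/net/core/ and /proc/sys/net/ipv4/. The important files in each are described below.

1. The /proc/sys/net/core/ directory

The settings in this directory control how the Linux kernel interacts with the network layer, that is, how the kernel reacts to network events.

Its important files include: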

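(1) message_burst: together with message_cost, rate-limits the warning messages that the networking code writes to the kernel log, which helps against denial-of-service (DoS) attacks; the default given here is 50;

(2) message_cost: sets the cost charged for each warning message; a higher cost means fewer messages are logged. The default is 5;

(3) netdev_max_backlog: sets the maximum number of packets that may be queued when an interface receives packets faster than the kernel can process them; the default given here is 300 (the value may differ on current kernels);

(4) optmem_max: sets the maximum ancillary buffer size allowed per socket;

(5) rmem_default: sets the default receive buffer size of a socket (in bytes);

(6) rmem_max: sets the maximum receive buffer size of a socket (in bytes);

(7) wmem_default: sets the default send buffer size of a socket (in bytes);

(8) wmem_max: sets the maximum send buffer size of a socket (in bytes).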
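Because the defaults quoted above depend on the kernel version, it is worth looking at the values on the machine at hand. A small read-only sketch follows; the function name show_core_params and the CORE_DIR override are made up for illustration.

```shell
# Print the current value of each file from the list above.
# CORE_DIR is a hypothetical override used only for experimenting.
core_dir="${CORE_DIR:-/proc/sys/net/core}"

show_core_params() {
    for name in message_burst message_cost netdev_max_backlog optmem_max \
                rmem_default rmem_max wmem_default wmem_max; do
        if [ -r "$core_dir/$name" ]; then
            printf '%s = %s\n' "$name" "$(cat "$core_dir/$name")"
        else
            printf '%s = (not available)\n' "$name"
        fi
    done
}

show_core_params
```

Files that a given kernel does not provide are reported as "(not available)" instead of aborting the loop.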

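2. The /proc/sys/net/ipv4/ directory

The files in this directory hold additional network settings; many of them can be used to fend off attacks on the system or to configure it as a router.

Its important files include: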

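(1) icmp_destunreach_rate, icmp_echoreply_rate, icmp_paramprob_rate, icmp_timeexceed_rate: set the maximum rate at which the corresponding ICMP packets are sent; it is best not to set them to 0;

(2) icmp_echo_ignore_all and icmp_echo_ignore_broadcasts: make the kernel ignore ICMP echo requests from every host, or only those sent to broadcast addresses; 0 allows replies, 1 suppresses them;

(3) ip_default_ttl: sets the default time to live (TTL) of IP packets, which limits the number of hops a packet may traverse before it is discarded;

(4) ip_forward: controls whether packets may be forwarded between interfaces; the default is 0, and setting it to 1 enables packet forwarding;

(5) ip_local_port_range: specifies the range of ports TCP or UDP may use when a local port is needed; the first number is the lowest port and the second the highest;

(6) tcp_syn_retries: limits how many times the SYN packet is retransmitted while trying to establish a connection;

(7) tcp_retries1: sets the number of retransmissions permitted when answering an incoming connection; the default is 3;

(8) tcp_retries2: sets the number of permitted retransmissions of TCP packets; the default is 15.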
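Most of these files hold a single number, but ip_local_port_range holds two. The sketch below shows one way to split them in a script; the function name port_range is made up for illustration.

```shell
# Split the two numbers stored in ip_local_port_range.
# $1: file content, low port then high port, separated by whitespace
port_range() {
    set -- $1            # unquoted on purpose: split on spaces or tabs
    echo "low=$1 high=$2"
}

range_file=/proc/sys/net/ipv4/ip_local_port_range
if [ -r "$range_file" ]; then
    port_range "$(cat "$range_file")"
fi
```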

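II. How to set the files under /proc/sys/net/

Now that the meaning and purpose of the important files in /proc/sys/net/core/ and /proc/sys/net/ipv4/ are clear, let us look at how to set them.

On Linux, the behaviour of a service or device is usually changed in one of two ways: by running a command or by editing its configuration file. Both approaches also work for the files in these two directories.

Before changing anything, make sure that the command syntax and the value you enter are correct. A wrong setting can make the kernel unstable, and if that happens you will have to reboot the system. The points that need particular care are called out below.

First, the command approach. The files in these two directories can be modified with either sysctl or echo; each is described in turn.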

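1. The sysctl command exists for exactly this purpose and is installed in /sbin/ by default. It displays and sets the contents of the files under /proc/sys/net/. For example, /sbin/sysctl -a displays all current settings, and /sbin/sysctl -w changes the value of a given variable: /sbin/sysctl -w net.ipv4.ip_forward="1" enables IP packet forwarding. The remaining options are listed by /sbin/sysctl -h and are not repeated here. Changing values requires administrator privileges; if you are not logged in as root, obtain them with the su command first.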
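The names sysctl uses map onto the files under /proc/sys/ by swapping dots for slashes. A minimal sketch of that mapping follows; the function names key_to_path and path_to_key are made up for illustration, and the sketch ignores the corner case of interface names that themselves contain a dot.

```shell
# Convert a sysctl key such as net.ipv4.ip_forward into its /proc/sys path.
key_to_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Convert a /proc/sys path back into the dotted sysctl key.
path_to_key() {
    printf '%s\n' "${1#/proc/sys/}" | tr '/' '.'
}

key_to_path net.ipv4.ip_forward
path_to_key /proc/sys/net/ipv4/icmp_echo_ignore_all
```

This is only string manipulation, so it can be run without any privileges.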

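2. The files under /proc/sys/net/ can also be changed with echo. For example, echo 1 > /proc/sys/net/ipv4/ip_forward enables IP packet forwarding, and echo 1 > /proc/sys/net/ipv4/icmp_echo_ignore_all stops the system from answering ICMP echo requests. Pay attention to the spacing: put a space between echo and the value, between the value and the greater-than sign (>), and between the greater-than sign and the path of the file. The space before > matters most, because the shell would read 1> as a redirection of file descriptor 1 and echo would then write only an empty line. Some files in these two directories take more than one value; to pass several values at once, separate them with spaces as well.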
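The spacing rule is easy to try out on a scratch file instead of a real kernel parameter. The sketch below does exactly that; the two port numbers are only illustrative values.

```shell
scratch=$(mktemp)

echo 1 > "$scratch"             # value, space, redirection: writes "1"
good=$(cat "$scratch")

echo 1>"$scratch"               # "1>" redirects file descriptor 1: writes an empty line
bad=$(cat "$scratch")

echo 32768 60999 > "$scratch"   # several values, separated by spaces
range=$(cat "$scratch")

printf 'good=[%s] bad=[%s] range=[%s]\n' "$good" "$bad" "$range"
rm -f "$scratch"
```

The second form silently writes nothing useful, which is why the space before the greater-than sign should never be dropped.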

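Note also that values changed this way revert to their defaults when the system restarts. To make a setting permanent, you can add the command to /etc/rc.d/rc.local; this is the path on Red Hat Linux, and other distributions may use a different one. If there are many commands, put them in a script, make it executable, and call the script from that file, so that they run automatically at boot according to /etc/rc.d/rc.local. If you would rather not modify /etc/rc.d/rc.local, I recommend the /sbin/sysctl command together with /etc/sysctl.conf, described below.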

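Setting the files under /proc/sys/net/ with commands is quick and easy, but some readers prefer to edit a configuration file directly because it is more transparent; this approach suits users who know the system fairly well.

Unlike other services or devices, Linux provides a single configuration file for /proc/sys/net/: /etc/sysctl.conf. By editing it, you can change or add values for the variables that correspond to files under /proc/sys/net/, and the system reads the file at boot and applies the settings. The file is easy to edit with vi and its format is clear and readable. For instance, it may contain the entry net.ipv4.ip_forward=0; changing the value to 1 turns on IP packet forwarding. Changing a value with /sbin/sysctl -w and editing /etc/sysctl.conf set the same kernel variable; the difference is that the command takes effect immediately but is lost at reboot, while the file is applied at boot or when reloaded with sysctl -p. For safety, it is advisable to try a setting with /sbin/sysctl first.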
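Before reloading the file it can help to list only the entries that are actually active, leaving out comments. The following is a rough sketch under the assumption that comment lines start with # or ; and that each entry sits on one line; the function name active_settings is made up for illustration.

```shell
# Print the active "key=value" entries of a sysctl.conf-style file.
# $1: path to the file
active_settings() {
    # Drop comment lines (starting with # or ;) and blank lines,
    # then trim surrounding whitespace and the spaces around "=".
    grep -v -e '^[[:space:]]*[#;]' -e '^[[:space:]]*$' "$1" |
        sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
            -e 's/[[:space:]]*=[[:space:]]*/=/'
}

# Try it on a scratch copy rather than the real /etc/sysctl.conf.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# Uncomment the next line to enable packet forwarding for IPv4
#net.ipv4.ip_forward=1
net.ipv4.icmp_echo_ignore_broadcasts = 1
EOF
active_settings "$sample"
rm -f "$sample"
```

With the sample above, only the icmp_echo_ignore_broadcasts entry is printed, because the ip_forward line is still commented out.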

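By now you should have a reasonable understanding of /proc/sys/net/core/ and /proc/sys/net/ipv4/. I singled these two out to highlight the role /proc/ plays in IPv4 networking. The /proc/ directory contains many other files that cannot be set the way the files in these two directories can, but that still reveal detailed information about the system and its current state; detailed descriptions of them are easy to find online. That concludes this introduction to the /proc/ directory.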

 

This article comes from the CSDN blog; please credit the source when reposting: http://blog.youkuaiyun.com/outblue/archive/2010/01/11/5171192.aspx

10-29