A bit of fun: fun with bits

Introduction
Most of the optimizations that go into TopCoder contests are high-level; that is, they affect the algorithm rather than the implementation. However, one of the most useful and effective low-level optimizations is bit manipulation, or using the bits of an integer to represent a set. Not only does it produce an order-of-magnitude improvement in both speed and size, it can often simplify code at the same time.

I'll start by briefly recapping the basics, before going on to cover more advanced techniques.

The basics
At the heart of bit manipulation are the bit-wise operators & (and), | (or), ~ (not) and ^ (xor). The first three you should already be familiar with in their boolean forms (&&, || and !). As a reminder, here are the truth tables:

A  B  !A  A && B  A || B  A ^ B
0  0  1   0       0       0
0  1  1   0       1       1
1  0  0   0       1       1
1  1  0   1       1       0


The bit-wise versions of the operations are the same, except that instead of interpreting their arguments as true or false, they operate on each bit of the arguments. Thus, if A is 1010 and B is 1100, then
  • A & B = 1000
  • A | B = 1110
  • A ^ B = 0110
  • ~A = 11110101 (the number of 1's depends on the type of A).
The other two operators we will need are the shift operators a << b and a >> b. The former shifts all the bits in a to the left by b positions; the latter does the same but shifts right. For non-negative values (which are the only ones we're interested in), the newly exposed bits are filled with zeros. You can think of left-shifting by b as multiplication by 2^b and right-shifting as integer division by 2^b. The most common use for shifting is to access a particular bit: for example, 1 << x is a binary number with bit x set and the others clear (bits are almost always counted from the right-most/least-significant bit, which is numbered 0).
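As a quick check of the examples above, here is a C++ sketch (C++14 or later, for the binary literals) that verifies them at compile time. It assumes a 32-bit unsigned int, which is why ~A is truncated to 8 bits to match the 11110101 shown in the text.

```cpp
#include <cstdint>

// The operands from the text: A = 1010 and B = 1100 in binary.
constexpr unsigned A = 0b1010, B = 0b1100;

static_assert((A & B) == 0b1000, "AND keeps bits set in both");
static_assert((A | B) == 0b1110, "OR keeps bits set in either");
static_assert((A ^ B) == 0b0110, "XOR keeps bits set in exactly one");

// ~A flips every bit of the (promoted) type; keep the low 8 bits to match the text.
static_assert(static_cast<std::uint8_t>(~A) == 0b11110101, "NOT");

// Shifts: 1 << x has only bit x set; for non-negative values a << b is
// a * 2^b and a >> b is a / 2^b, rounded down.
static_assert((1u << 3) == 0b1000, "single bit");
static_assert((5u << 2) == 20 && (23u >> 2) == 5, "shift as multiply/divide");
```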

In general, we will use an integer to represent a set on a domain of up to 32 values (or 64, using a 64-bit integer), with a 1 bit representing a member that is present and a 0 bit one that is absent. Then the following operations are quite straightforward, where ALL_BITS is a number with 1's for all bits corresponding to the elements of the domain:
  • Set union: A | B
  • Set intersection: A & B
  • Set subtraction: A & ~B
  • Set negation: ALL_BITS ^ A
  • Set bit: A |= 1 << bit
  • Clear bit: A &= ~(1 << bit)
  • Test bit: (A & 1 << bit) != 0
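The operations in this list might be wrapped in small C++ helpers like the ones below. The names (Set, with_bit, and so on) are hypothetical, and the sketch assumes a 32-element domain stored in a std::uint32_t.

```cpp
#include <cstdint>

// Hypothetical helpers for a set over {0, ..., 31}, one bit per element.
// For a smaller domain of n elements, ALL_BITS would be (1u << n) - 1.
using Set = std::uint32_t;
constexpr Set ALL_BITS = 0xFFFFFFFFu;

Set set_union(Set a, Set b)        { return a | b; }
Set set_intersection(Set a, Set b) { return a & b; }
Set set_subtraction(Set a, Set b)  { return a & ~b; }  // elements of a not in b
Set set_negation(Set a)            { return ALL_BITS ^ a; }

// Same effect as A |= 1 << bit and A &= ~(1 << bit), but returning the new set.
Set with_bit(Set a, int bit)    { return a | (Set{1} << bit); }
Set without_bit(Set a, int bit) { return a & ~(Set{1} << bit); }
bool has_bit(Set a, int bit)    { return (a & (Set{1} << bit)) != 0; }
```

Writing Set{1} << bit rather than 1 << bit keeps the shift in an unsigned type, so using bit 31 is well defined.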
Extracting every last bit
In this section I'll consider the problems of finding the highest and lowest 1 bit in a number. These are basic operations for splitting a set into its elements.

Finding the lowest set bit turns out to be surprisingly easy, with the right combination of bitwise and arithmetic operators. Suppose we wish to find the lowest set bit of x (which is known to be non-zero). If we subtract 1 from x, this bit is cleared and all the zero bits below it become set, while every higher bit is unchanged. The lowest set bit of x is therefore the only position where both x and ~(x - 1) are 1, so x & ~(x - 1) consists of only the lowest set bit of x. However, this only tells us the bit value, not the index of the bit.
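A minimal C++ sketch of the lowest-set-bit extraction, assuming a 32-bit unsigned argument:

```cpp
#include <cstdint>

// Returns a value containing only the lowest set bit of x (x should be non-zero).
// x - 1 clears that bit and sets every bit below it, so ~(x - 1) has a 1 there,
// 0s below it, and the complement of x's higher bits above it.
std::uint32_t lowest_set_bit(std::uint32_t x) {
    return x & ~(x - 1);
}
```

For unsigned x, x & -x is an equivalent and commonly seen spelling, since -x equals ~(x - 1) in modulo-2^32 arithmetic.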

If we want the index of the highest or lowest bit, the obvious approach is simply to loop through the bits (upwards or downwards) until we find one that is set. At first glance this sounds slow, since it does not take advantage of the bit-packing at all. However, if all 2^N subsets of the N-element domain are equally likely, then the loop will take only two iterations on average, and this is actually the fastest method.
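The looping approach might look like this in C++ (a sketch; both functions assume a non-zero 32-bit argument, since a zero would make the loops shift past the end of the word):

```cpp
#include <cstdint>

// Index of the lowest set bit, testing bits upwards from bit 0.
// x must be non-zero.
int lowest_index_loop(std::uint32_t x) {
    int i = 0;
    while (!(x & (std::uint32_t{1} << i))) ++i;
    return i;
}

// Index of the highest set bit, testing bits downwards from bit 31.
// x must be non-zero.
int highest_index_loop(std::uint32_t x) {
    int i = 31;
    while (!(x & (std::uint32_t{1} << i))) --i;
    return i;
}
```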

The 386 introduced CPU instructions for bit scanning: BSF (bit scan forward) and BSR (bit scan reverse). GCC exposes these instructions through the built-in functions __builtin_ctz (count trailing zeros) and __builtin_clz (count leading zeros). These are the most convenient way to find bit indices for C++ programmers in TopCoder. Be warned though: the return value is undefined for an argument of zero.
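A sketch of wrapping the GCC built-ins (also available in Clang) so that a zero argument is handled explicitly; returning -1 for zero is an arbitrary convention chosen here. The 64-bit variants are __builtin_ctzll and __builtin_clzll.

```cpp
#include <limits>

// Both built-ins are undefined for x == 0, so that case is checked first.
int lowest_index(unsigned x) {
    return x ? __builtin_ctz(x) : -1;  // trailing zeros = index of lowest set bit
}

int highest_index(unsigned x) {
    const int top = std::numeric_limits<unsigned>::digits - 1;  // 31 for 32-bit unsigned
    return x ? top - __builtin_clz(x) : -1;  // skip the leading zeros
}
```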

Finally, there is a portable method that performs well in cases where the looping solution would require many iterations. Use each byte of the 4- or 8-byte integer to index a precomputed 256-entry table that stores the index of the highest (lowest) set bit in that byte. The highest (lowest) set bit of the integer then comes from the highest (lowest) non-zero byte: its table entry plus 8 times the byte's position. This method is only mentioned for completeness, and the performance gain is unlikely to justify its use in a TopCoder match.
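A portable C++ sketch of the table method for the lowest set bit of a 32-bit value (the highest-bit version is symmetric, scanning bytes from the top). The table type and function names are hypothetical.

```cpp
#include <cstdint>

// lowest_in_byte[b] is the index (0-7) of the lowest set bit of byte b,
// or -1 for b == 0. Built once, the first time it is needed.
struct LowestBitTable {
    int lowest_in_byte[256];
    LowestBitTable() {
        lowest_in_byte[0] = -1;
        for (int b = 1; b < 256; ++b) {
            int i = 0;
            while (!(b & (1 << i))) ++i;
            lowest_in_byte[b] = i;
        }
    }
};

// Index of the lowest set bit of x, or -1 if x == 0: the first non-zero
// byte decides, offset by 8 times its position.
int lowest_index_table(std::uint32_t x) {
    static const LowestBitTable table;
    for (int byte = 0; byte < 4; ++byte) {
        unsigned b = (x >> (8 * byte)) & 0xFFu;
        if (b != 0) return 8 * byte + table.lowest_in_byte[b];
    }
    return -1;
}
```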

Counting out the bits
One can easily check if a number is a power of 2: clear the lowest 1 bit with x & (x - 1) and check whether the result is 0 (treating x = 0 separately, since it has no 1 bits at all). However, sometimes it is necessary to know how many bits are set, and this is more difficult.
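The power-of-two test as a one-line C++ function (a sketch, assuming a 32-bit unsigned argument):

```cpp
#include <cstdint>

// x & (x - 1) clears the lowest set bit; a power of two has exactly one
// set bit, so the result is zero. Zero itself is excluded explicitly.
bool is_power_of_two(std::uint32_t x) {
    return x != 0 && (x & (x - 1)) == 0;
}
```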

GCC has a function called __builtin_popcount which does precisely this. However, unlike __builtin_ctz, it does not translate into a hardware instruction on the x86 processors this was written for. Instead, it uses a table-based method similar to the one described above for bit searches. It is nevertheless quite efficient and also extremely convenient, and current versions of GCC emit the x86 POPCNT instruction when the target supports it (for example, with -mpopcnt).

Users of other languages may not have such a built-in (Java does provide Integer.bitCount), although they could re-implement it. If a number is expected to have very few 1 bits, an alternative is to repeatedly extract the lowest 1 bit and clear it.
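A sketch of the clear-the-lowest-bit counting loop in C++; it runs once per set bit, so it is cheap when few bits are expected. With GCC or Clang, __builtin_popcount(x) should return the same count.

```cpp
#include <cstdint>

// Counts set bits by clearing the lowest one on each iteration.
int popcount_sparse(std::uint32_t x) {
    int count = 0;
    while (x != 0) {
        x &= x - 1;  // clear the lowest set bit
        ++count;
    }
    return count;
}
```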

All the subsets
A big advantage of bit manipulation is that it is trivial to iterate over all the subsets of an N-element set: every N-bit value represents some subset, so a loop from 0 to 2^N - 1 visits each one exactly once. Even better, if A is a proper subset of B then the number representing A is less than that representing B, which is convenient for some dynamic programming solutions.
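As a small illustration of the ordering property, this C++ sketch computes the sum of every subset of a hypothetical values array, filling sum[mask] from a smaller mask that has already been computed. It assumes the number of elements is small (2^n entries must fit in memory) and uses the GCC/Clang built-in __builtin_ctz.

```cpp
#include <vector>

// sum[mask] = total of values[i] over the elements i in subset mask.
std::vector<int> subset_sums(const std::vector<int> &values) {
    int n = static_cast<int>(values.size());
    std::vector<int> sum(1u << n, 0);
    for (unsigned mask = 1; mask < (1u << n); ++mask) {
        unsigned low = mask & ~(mask - 1);  // lowest element of the subset
        int index = __builtin_ctz(low);     // its position
        // mask ^ low is a proper subset of mask, hence smaller and already filled in.
        sum[mask] = sum[mask ^ low] + values[index];
    }
    return sum;
}
```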

It is also possible to iterate over all the subsets of a particular subset (represented by a bit pattern), provided that you don't mind visiting them in reverse order (if this is problematic, put them in a list as they're generated, then walk the list backwards). The trick is similar to that for finding the lowest bit in a number. If we subtract 1 from a subset, then the lowest set element is cleared, and every lower element is set. However, we only want to set those lower elements that are in the superset. So the iteration step is just i = (i - 1) & superset. Start with i equal to the superset and stop once the empty set (i = 0) has been processed; otherwise (0 - 1) & superset wraps around to the superset again.
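A C++ sketch of the subset-of-a-subset loop described above, collecting the subsets so they can be inspected; the function name is hypothetical.

```cpp
#include <vector>

// Lists every subset of superset (including itself and the empty set)
// in decreasing numeric order, using the step i = (i - 1) & superset.
std::vector<unsigned> subsets_of(unsigned superset) {
    std::vector<unsigned> result;
    unsigned i = superset;
    while (true) {
        result.push_back(i);
        if (i == 0) break;  // (0 - 1) & superset would wrap back to superset
        i = (i - 1) & superset;
    }
    return result;
}
```

For superset = 1010 (binary) it produces 1010, 1000, 0010 and 0000, in that order.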

Even a bit wrong scores zero
There are a few mistakes that are very easy to make when performing bit manipulations. Watch out for them in your code.
  1. When executing shift instructions for a << b, the x86 architecture uses only the bottom 5 bits of b (6 for 64-bit integers). This means that shifting left (or right) by 32 does nothing, rather than clearing all the bits. This behaviour is also specified by the Java and C# language standards; C99 says that shifting by at least the size of the value gives an undefined result. Historical trivia: the 8086 used the full shift register, and the change in behaviour was often used to detect newer processors.
  2. The &, ^ and | operators have lower precedence than comparison operators. That means that x & 3 == 1 is interpreted as x & (3 == 1), which is probably not what you want; write (x & 3) == 1 instead.
  3. If you want to write completely portable C/C++ code, be sure to use unsigned types, particularly if you plan to use the top-most bit. C99 says that left-shifting a negative value is undefined, and that right-shifting one gives an implementation-defined result. Java only has signed types: >> will sign-extend values (which is probably not what you want), but the Java-specific operator >>> will shift in zeros.
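The three pitfalls above can be avoided with a little care. This C++ sketch shows a guarded shift (the helper name bit_or_zero is hypothetical), the parenthesised mask test, and an unsigned constant for the top-most bit.

```cpp
#include <cstdint>

// Shifting a 32-bit value by 32 or more is undefined in C and C++ (and x86
// masks the count to 5 bits), so this helper checks the count first.
std::uint32_t bit_or_zero(unsigned n) {
    return n < 32 ? (std::uint32_t{1} << n) : 0u;
}

// & binds more loosely than ==, so the parentheses are required:
// x & 3 == 1 would mean x & (3 == 1), i.e. x & 0.
bool low_two_bits_are_01(std::uint32_t x) {
    return (x & 3u) == 1u;
}

// An unsigned constant keeps the top-most bit usable; in C99, 1 << 31 on a
// 32-bit int is undefined behaviour.
constexpr std::uint32_t TOP_BIT = 1u << 31;
```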
Cute tricks
There are a few other tricks that can be done with bit manipulation. They're good for amazing your friends, but generally not worth the effort to use in practice.
Reversing the bits in an integer
// x is a 32-bit unsigned integer: swap adjacent bits, then 2-bit pairs, nibbles, bytes and 16-bit halves
x = ((x & 0xaaaaaaaa) >> 1) | ((x & 0x55555555) << 1);
x = ((x & 0xcccccccc) >> 2) | ((x & 0x33333333) << 2);
x = ((x & 0xf0f0f0f0) >> 4) | ((x & 0x0f0f0f0f) << 4);
x = ((x & 0xff00ff00) >> 8) | ((x & 0x00ff00ff) << 8);
x = ((x & 0xffff0000) >> 16) | ((x & 0x0000ffff) << 16);
As an exercise, see if you can adapt this to count the number of bits in a word.
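One possible answer to the exercise, as a C++ sketch for 32-bit unsigned values: instead of swapping neighbouring fields, add them, so that after the last step the whole word holds the count.

```cpp
#include <cstdint>

// Each step adds pairs of neighbouring fields (1, 2, 4, 8, then 16 bits wide),
// so every wider field ends up holding the number of ones it originally contained.
std::uint32_t popcount_parallel(std::uint32_t x) {
    x = (x & 0x55555555u) + ((x >> 1) & 0x55555555u);
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    x = (x & 0x0f0f0f0fu) + ((x >> 4) & 0x0f0f0f0fu);
    x = (x & 0x00ff00ffu) + ((x >> 8) & 0x00ff00ffu);
    x = (x & 0x0000ffffu) + ((x >> 16) & 0x0000ffffu);
    return x;
}
```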
Iterate through all k-element subsets of {0, 1, … N-1}
int s = (1 << k) - 1;          // assumes 1 <= k <= N <= 30
while (!(s & 1 << N))
{
    // do stuff with s
    int lo = s & ~(s - 1);     // lowest one bit
    int lz = (s + lo) & ~s;    // lowest zero bit above lo
    s |= lz;                   // add lz to the set
    s &= ~(lz - 1);            // reset bits below lz
    s |= (lz / lo / 2) - 1;    // put back right number of bits at end
}
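In C, the last line can be written as s |= (lz >> ffs(lo)) - 1 to avoid the division, using the POSIX function ffs (declared in <strings.h>), which returns the 1-based index of the lowest set bit.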
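The loop above can be packaged as a self-contained C++ function that collects the subsets in a vector (a sketch; the function name is hypothetical and it keeps the same 1 <= k <= N <= 30 assumption, since k = 0 would divide by zero).

```cpp
#include <vector>

// Every k-element subset of {0, ..., n-1}, in increasing numeric order.
std::vector<int> k_subsets(int n, int k) {
    std::vector<int> result;
    int s = (1 << k) - 1;            // start with the k lowest elements
    while (!(s & (1 << n))) {
        result.push_back(s);
        int lo = s & ~(s - 1);       // lowest one bit
        int lz = (s + lo) & ~s;      // lowest zero bit above lo
        s |= lz;                     // add lz to the set
        s &= ~(lz - 1);              // reset bits below lz
        s |= (lz / lo / 2) - 1;      // put the remaining ones back at the bottom
    }
    return result;
}
```

For n = 4 and k = 2 it yields 3, 5, 6, 9, 10 and 12, that is 0011, 0101, 0110, 1001, 1010 and 1100 in binary.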
Evaluate x ? y : -y, where x is 0 or 1
(-x ^ y) + x
This works on a two's-complement architecture (which is almost any machine you find today), where negation is done by inverting all the bits and then adding 1. Note that on i686 and above, the original expression can be evaluated just as efficiently (i.e., without branches) thanks to the CMOVcc (conditional move) instructions.
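The same trick as a C++ function, with comments explaining why it works (a sketch assuming two's-complement int, which C++20 guarantees):

```cpp
// Branch-free x ? y : -y for x in {0, 1}.
// -x is either 0 (all zeros) or -1 (all ones), so -x ^ y is y or ~y,
// and adding x turns ~y into ~y + 1, which is -y.
// y must not be INT_MIN, whose negation overflows.
int conditional_negate(int x, int y) {
    return (-x ^ y) + x;
}
```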


Sample problems
TCCC 2006, Round 1B Medium
For each city, keep a bit-set of the neighbouring cities. Once the part-building factories have been chosen (recursively), ANDing together these bit-sets will give a bit-set which describes the possible locations of the part-assembly factories. If this bit-set has k bits, then there are C(k, m) ways to allocate the part-assembly factories.
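A hypothetical C++ sketch of the counting step only. The names neighbours, chosen and m are assumptions (m being the number of part-assembly factories to place, with chosen assumed non-empty), and the recursive choice of part-building factories is not shown.

```cpp
#include <vector>

// neighbours[c]: bit-set of the cities adjacent to city c.
// chosen: the cities picked for part-building factories (non-empty).
// Returns C(k, m), where k is the number of cities in the AND of their bit-sets.
long long assembly_choices(const std::vector<unsigned> &neighbours,
                           const std::vector<int> &chosen, int m) {
    unsigned common = ~0u;                  // start from "every city"
    for (int c : chosen) common &= neighbours[c];
    int k = __builtin_popcount(common);     // GCC/Clang built-in
    if (m < 0 || m > k) return 0;
    long long ways = 1;                     // after step i, ways == C(k - m + i, i)
    for (int i = 1; i <= m; ++i) ways = ways * (k - m + i) / i;
    return ways;
}
```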

TCO 2006, Round 1 Easy
The small number of nodes strongly suggests that this is done by considering all possible subsets. For every possible subset we consider two possibilities: either the smallest-numbered node does not communicate at all, in which case we refer back to the subset that excludes it, or it communicates with some node, in which case we refer back to the subset that excludes both of these nodes. The resulting code is extremely short:
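#include <algorithm>
#include <string>
#include <vector>
using namespace std;

class SeparateConnections {
public:
    int howMany(vector<string> mat);
};

// dp[i] = largest number of nodes that can be paired up within subset i (N <= 18)
static int dp[1 << 18];

int SeparateConnections::howMany(vector<string> mat)
{
    int N = mat.size();
    int N2 = 1 << N;
    dp[0] = 0;
    for (int i = 1; i < N2; i++)
    {
        int bot = i & ~(i - 1);          // lowest-numbered node in subset i
        int use = __builtin_ctz(bot);    // its index
        dp[i] = dp[i ^ bot];             // case 1: it does not communicate
        for (int j = use + 1; j < N; j++)
            if ((i & (1 << j)) && mat[use][j] == 'Y')  // case 2: it pairs with j
                dp[i] = max(dp[i], dp[i ^ bot ^ (1 << j)] + 2);
    }
    return dp[N2 - 1];
}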
SRM 308, Division 1 Medium
The board contains 36 squares and the draughts are indistinguishable, so the possible positions can be encoded into 64-bit integers. The first step is to enumerate all the legal moves. Any legal move can be encoded using three bit-fields: a before state, an after state and a mask, which defines which parts of the before state are significant. The move can be made from the current state if (current & mask) == before; if it is made, the new state is (current & ~mask) | after.
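A sketch of the move encoding in C++; the Move struct and its member names are hypothetical, and the squares of the board are assumed to be mapped to bits 0 to 35 of a std::uint64_t.

```cpp
#include <cstdint>

// One legal move: only the squares in mask matter, before is what they must
// contain for the move to be possible, and after is what they contain afterwards.
struct Move {
    std::uint64_t before, after, mask;

    bool can_apply(std::uint64_t current) const {
        return (current & mask) == before;
    }
    std::uint64_t apply(std::uint64_t current) const {
        return (current & ~mask) | after;  // keep other squares, overwrite masked ones
    }
};
```

For example, a toy move with before = 01, after = 10 and mask = 11 (binary) slides a piece from square 0 to square 1: it applies to the state 101, giving 110, but not to 111.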

SRM 320, Division 1 Hard
The constraints tell us that there are at most 8 columns (if there are more, we can swap rows and columns), so it is feasible to consider every possible way to lay out a row. Once we have this information, we can solve the remainder of the problem (refer to the match editorial for details). We thus need a list of all n-bit integers which do not have two adjacent 1 bits, and we also need to know how many 1 bits there are in each such row. Here is my code for this:
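// pg: every n-bit row with no two adjacent 1 bits; pgb: the number of 1 bits in each
for (int i = 0; i < (1 << n); i++)
{
    if (i & (i << 1)) continue;      // some bit and its left neighbour are both set
    pg.push_back(i);
    pgb.push_back(__builtin_popcount(i));
}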
By  bmerry 
