If protected-mode virtual interrupts are not enabled, STI sets the interrupt flag (IF) in the EFLAGS register. After the IF flag is set, the processor begins responding to external, maskable interrupts after the next instruction is executed. The delayed effect of this instruction is provided to allow interrupts to be enabled just before returning from a procedure (or subroutine).
Source: Intel Developer's Manual, Vol 2B.
Starting with the Pentium Pro, Intel processors have supported out-of-order execution, where instructions are not necessarily performed in the order they appear in the executable. This can cause RDTSC to be executed later than expected, producing a misleading cycle count. This problem can be solved by executing a serializing instruction, such as CPUID, to force every preceding instruction to complete before allowing the program to continue or by using RDTSCP instruction, which is a serializing variant of the RDTSC instruction (starting from Core i7).
Source: Wikipedia: Time Stamp Counter
The LOCK prefix can be prepended only to the following instructions and only to those forms of the instructions where the destination operand is a memory operand: ADD, ADC, AND, BTC, BTR, BTS, CMPXCHG, CMPXCH8B, DEC, INC, NEG, NOT, OR, SBB, SUB, XOR, XADD, and XCHG. The XCHG instruction always asserts the LOCK# signal regardless of the presence or absence of the LOCK prefix.
Source: Intel Developer's Manual, Vol 2A.