Vector.X = InVec.X;
Vector.Y = InVec.Y;
Vector.Z = InVec.Z;
The problem is that the JIT simply can't track registers properly, so it has to keep reloading the object's base address, which ends up making the code roughly twice the size.
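For comparison, here is a minimal sketch in C (not the author's code) of the same three-field copy written through pointers. `Vec3`, `copy_vec3_fields` and `copy_vec3_whole` are hypothetical names chosen for illustration. An optimizing C compiler can hold both base addresses in registers for the whole function, so each field should cost one load and one store relative to them. The exact instructions still depend on the compiler and flags.

```c
/* Hypothetical 3-component vector matching the X/Y/Z fields in the snippet. */
typedef struct {
    float X, Y, Z;
} Vec3;

/* Field-by-field copy, the same shape as the snippet. Every access is
   relative to dst or src, so the compiler is free to load each base
   address into a register once and reuse it for all three fields. */
static void copy_vec3_fields(Vec3 *dst, const Vec3 *src)
{
    dst->X = src->X;
    dst->Y = src->Y;
    dst->Z = src->Z;
}

/* Whole-struct assignment expresses the same copy as a single operation
   and leaves the choice of load/store sequence entirely to the compiler. */
static void copy_vec3_whole(Vec3 *dst, const Vec3 *src)
{
    *dst = *src;
}
```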
It also insists on loading and storing the values as floats, even though it's just transferring bits. You can obviously move the data through integer registers instead, and that could pipeline much better than a series of FPU loads and stores. It's frustrating, because if this were C++ I could just drop to ASM and do it myself, but in managed code you're at the mercy of the JIT.
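A minimal sketch in C of moving the bits through integer variables instead of float ones. It assumes `float` is 32 bits, which the `_Static_assert` checks. `Vec3` and `copy_vec3_bits` are hypothetical names. `memcpy` is the standard-conforming way in C to reinterpret an object's bytes, and compilers usually reduce each fixed-size call here to a single integer load or store. Whether that actually happens depends on the compiler and optimization level.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: float and uint32_t have the same size. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Hypothetical 3-component vector matching the X/Y/Z fields in the snippet. */
typedef struct {
    float X, Y, Z;
} Vec3;

/* Copy each field's raw 32-bit pattern through a uint32_t, so the values
   are handled as integers rather than as floating-point numbers. */
static void copy_vec3_bits(Vec3 *dst, const Vec3 *src)
{
    uint32_t x, y, z;

    /* Read the three bit patterns. */
    memcpy(&x, &src->X, sizeof x);
    memcpy(&y, &src->Y, sizeof y);
    memcpy(&z, &src->Z, sizeof z);

    /* Write them back unchanged into the destination fields. */
    memcpy(&dst->X, &x, sizeof x);
    memcpy(&dst->Y, &y, sizeof y);
    memcpy(&dst->Z, &z, sizeof z);
}
```

Because the copy only moves bit patterns, the result is bit-for-bit identical to the source, including NaN values with arbitrary payloads.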
Now, the JIT gets better with each iteration, but sometimes it doesn't appear to improve where it counts. For normal code this simply doesn't matter; you lose around 5-10% speed at most. But for code that needs to be highly optimized, it can be a real issue.