Performance fix
Aug. 3rd, 2016 01:02 pm
Here is a simple patch I made at work last week (the code below is not real):
int insert_packet_hdr_to_ring_buffer(ring_buffer *buf, packet_hdr *pkt)
{
// complex logic for the lock-free ring buffer goes here;
// at the end the ring buffer pointers are updated and pkt_ptr points
// to the slot where the packet header is stored
- *pkt_ptr = *pkt;
+ memcpy(pkt_ptr, pkt, packet_len);
return 0;
}
This change gave a ~5% speedup on the benchmark. It replaces an element-by-element copy of a structure that spans 2 cache lines with the SIMD copy from libc.
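To make the two variants concrete, here is a minimal standalone sketch. The 128-byte `packet_hdr` layout (two cache lines, assuming 64-byte lines) and the helper names `copy_by_assignment` and `copy_by_memcpy` are made up for illustration; they are not the real code from work.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header layout: 16 * 8 = 128 bytes,
 * i.e. two cache lines if a line is 64 bytes. */
typedef struct packet_hdr {
    uint64_t fields[16];
} packet_hdr;

/* Variant 1: plain structure assignment.
 * The compiler decides how the copy is emitted. */
void copy_by_assignment(packet_hdr *dst, const packet_hdr *src)
{
    *dst = *src;
}

/* Variant 2: explicit memcpy with a length passed at run time,
 * mirroring packet_len in the patch. The copy goes through the
 * libc routine unless the compiler can see the length and inline it. */
void copy_by_memcpy(packet_hdr *dst, const packet_hdr *src, size_t len)
{
    assert(len <= sizeof *dst); /* never write past the destination slot */
    memcpy(dst, src, len);
}
```

Both functions leave the same bytes in the destination; only the generated code differs. Which one is faster depends on the compiler, the optimization flags and the libc, so the ~5% above should be read as a result for that one benchmark and build, and it is worth measuring before applying the same change elsewhere.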