Optimise `memcmp` for speed

I saw that other parts of the `string` module iterate over `usize` words
rather than single bytes to speed up their loops.  This patch applies the same
logic to `memcmp`.  On my machine (i7-7500U) I measured a 7x speedup for
`memcmp` when comparing two ~1MB buffers with identical content, but I did not
do any real-world benchmarking for the change.  The speedup comes at the cost
of both increased complexity and larger generated assembly code for the
function.
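
A rough way to reproduce a measurement like the one above is sketched here.  `cmp_bytes`, `cmp_words`, `load_word` and `time_it` are hypothetical helpers written for this illustration, not the functions from the patch, and the slice-based word loop is a simplified stand-in for the raw-pointer version.  Timings depend heavily on the machine and optimisation level, so no particular speedup is implied.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

const WORD: usize = std::mem::size_of::<usize>();

/// Byte-at-a-time comparison over the common prefix (mirrors the old loop).
fn cmp_bytes(a: &[u8], b: &[u8]) -> i32 {
    for (x, y) in a.iter().zip(b.iter()) {
        if x != y {
            return *x as i32 - *y as i32;
        }
    }
    0
}

/// Copies one WORD-sized chunk into a native-endian usize.
fn load_word(chunk: &[u8]) -> usize {
    let mut w = [0u8; WORD];
    w.copy_from_slice(chunk);
    usize::from_ne_bytes(w)
}

/// Word-at-a-time comparison: whole words first, bytes only on a mismatch.
fn cmp_words(a: &[u8], b: &[u8]) -> i32 {
    let n = a.len().min(b.len());
    let mut wa = a[..n].chunks_exact(WORD);
    let mut wb = b[..n].chunks_exact(WORD);
    for (x, y) in wa.by_ref().zip(wb.by_ref()) {
        if load_word(x) != load_word(y) {
            return cmp_bytes(x, y);
        }
    }
    // Fewer than WORD bytes remain; fall back to the byte loop.
    cmp_bytes(wa.remainder(), wb.remainder())
}

/// Runs `f` on the same pair of buffers `iters` times and returns the total time.
/// `black_box` keeps the optimiser from discarding the calls.
fn time_it(f: fn(&[u8], &[u8]) -> i32, a: &[u8], b: &[u8], iters: u32) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        black_box(f(black_box(a), black_box(b)));
    }
    start.elapsed()
}
```

Calling `time_it(cmp_bytes, &a, &b, 100)` and `time_it(cmp_words, &a, &b, 100)` on two identical 1 MiB vectors in a release build gives a pair of durations that can be compared directly.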

I tested the correctness of the implementation by generating two randomly filled
buffers and comparing the `memcmp` result of the old implementation against this
new one.
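
A differential test along these lines could look like the sketch below.  Everything in it is an assumption made for illustration: the `XorShift` generator, the `reference` and `candidate` functions, and `differential_test` are hypothetical names, and `candidate` is only a stand-in to be replaced by the implementation under test.  Two independently random buffers almost always differ in their first byte, so this sketch instead copies one buffer and flips a single byte at a random position, which also exercises long equal prefixes and the trailing bytes.  Only the sign of the result is compared.

```rust
use std::cmp::Ordering;

type Cmp = fn(&[u8], &[u8]) -> i32;

/// Small xorshift64 PRNG so the sketch needs no external crates.
/// The seed must be non-zero.
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Byte-at-a-time reference, equivalent to the old loop.
fn reference(a: &[u8], b: &[u8]) -> i32 {
    for (x, y) in a.iter().zip(b.iter()) {
        if x != y {
            return *x as i32 - *y as i32;
        }
    }
    0
}

/// Stand-in for the implementation under test (an assumption, not the patch).
fn candidate(a: &[u8], b: &[u8]) -> i32 {
    match a.cmp(b) {
        Ordering::Less => -1,
        Ordering::Equal => 0,
        Ordering::Greater => 1,
    }
}

/// Returns Err(round) for the first round in which the two results disagree in sign.
fn differential_test(old: Cmp, new: Cmp, rounds: usize, seed: u64) -> Result<(), usize> {
    let mut rng = XorShift(seed);
    for round in 0..rounds {
        let len = (rng.next_u64() % 257) as usize;
        let a: Vec<u8> = (0..len).map(|_| rng.next_u64() as u8).collect();
        let mut b = a.clone();
        // In roughly half of the rounds, change exactly one byte of the copy.
        if len > 0 && rng.next_u64() % 2 == 0 {
            let pos = (rng.next_u64() % len as u64) as usize;
            b[pos] ^= 0x01 | (rng.next_u64() as u8); // mask is never zero
        }
        if old(&a, &b).signum() != new(&a, &b).signum() {
            return Err(round);
        }
    }
    Ok(())
}
```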

I ran the tests and currently three of them fail:
  - netdb (fails to run)
  - stdio/rename (fails to verify)
  - unistd/pipe (fails to verify)

They fail regardless of this change, so I don't think the failures are related.

@@ -73,14 +73,32 @@ pub unsafe extern "C" fn memchr(s: *const c_void, c: c_int, n: usize) -> *mut c_
 pub unsafe extern "C" fn memcmp(s1: *const c_void, s2: *const c_void, n: usize) -> c_int {
-    let mut i = 0;
-    while i < n {
-        let a = *(s1 as *const u8).offset(i as isize);
-        let b = *(s2 as *const u8).offset(i as isize);
-        if a != b {
+    let (div, rem) = (n / mem::size_of::<usize>(), n % mem::size_of::<usize>());
+    let mut a = s1 as *const usize;
+    let mut b = s2 as *const usize;
+    for _ in 0..div {
+        if *a != *b {
+            for i in 0..mem::size_of::<usize>() {
+                let c = *(a as *const u8).offset(i as isize);
+                let d = *(b as *const u8).offset(i as isize);
+                if c != d {
+                    return c as i32 - d as i32;
+                }
+            }
+        }
+        a = a.offset(1);
+        b = b.offset(1);
+    }
+    let mut a = a as *const u8;
+    let mut b = b as *const u8;
+    for _ in 0..rem {
+        if *a != *b {
             return a as i32 - b as i32;
         }
-        i += 1;
+        a = a.offset(1);
+        b = b.offset(1);
     }
     0
 }
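
Two details of the listing above may deserve a second look.  In the trailing loop, `a` and `b` are pointers by that point, so `return a as i32 - b as i32` appears to subtract addresses rather than the bytes they point to.  The word loop also dereferences `*const usize` pointers directly, which assumes that `s1` and `s2` are word-aligned, something `memcmp` callers do not guarantee.  The following is a minimal sketch of one possible variant that addresses both points, not the code that was merged; the name `memcmp_words` is hypothetical and the use of `read_unaligned` is a swapped-in technique.

```rust
use std::mem;
use std::os::raw::{c_int, c_void};

/// Hypothetical word-at-a-time comparison with the same shape as the patch.
///
/// Safety: `s1` and `s2` must each be valid for reads of `n` bytes.
pub unsafe fn memcmp_words(s1: *const c_void, s2: *const c_void, n: usize) -> c_int {
    const WORD: usize = mem::size_of::<usize>();
    let (div, rem) = (n / WORD, n % WORD);
    let mut a = s1 as *const u8;
    let mut b = s2 as *const u8;

    for _ in 0..div {
        // read_unaligned loads a full word without requiring alignment.
        if (a as *const usize).read_unaligned() != (b as *const usize).read_unaligned() {
            // The words differ, so one of these bytes must differ too.
            for i in 0..WORD {
                let (c, d) = (*a.add(i), *b.add(i));
                if c != d {
                    return c as c_int - d as c_int;
                }
            }
        }
        a = a.add(WORD);
        b = b.add(WORD);
    }

    // Trailing bytes: compare the pointed-to values, not the pointers.
    for i in 0..rem {
        let (c, d) = (*a.add(i), *b.add(i));
        if c != d {
            return c as c_int - d as c_int;
        }
    }
    0
}
```

The byte-wise scan inside the word loop keeps the result independent of endianness, at the cost of a second pass over at most one word.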
