The Shift-To-Middle Array is a dynamic array designed to optimize insertions and deletions at both ends, offering a high-performance alternative to std::deque, std::vector, and linked lists. It achieves this while maintaining contiguous memory storage, improving cache locality and enabling efficient parallel processing.
✅ Amortized O(1) insertions & deletions at both ends
✅ Fast random access (O(1))
✅ Better cache locality than linked lists
✅ Supports SIMD & parallel optimizations
✅ Efficient memory usage compared to std::deque
Unlike std::deque, which uses a fragmented block structure, the Shift-To-Middle Array dynamically redistributes free space to avoid costly shifts. When resizing, elements are moved toward the middle of the new buffer, ensuring efficient insertions at both ends without excessive copying.
The following table compares the time complexity of Shift-To-Middle Array operations with other common data structures:
| Operation | ArrayList (std::vector) | Linked List | Shift-To-Middle Array |
|---|---|---|---|
| Access (by index) | O(1) | O(n) | O(1) |
| Insertion at head | O(n) | O(1) | O(1) amortized |
| Insertion at tail | O(1) amortized | O(1) | O(1) amortized |
| Insertion in middle | O(n) | O(n) | O(n) |
| Deletion at head | O(n) | O(1) | O(1) amortized |
| Deletion at tail | O(1) | O(1) | O(1) amortized |
| Deletion in middle | O(n) | O(n) | O(n) |
| Cache locality | Excellent | Poor | Excellent |
Benchmarks comparing the Shift-To-Middle Array against std::deque, ExpandingRingBuffer, and std::queue demonstrate that performance improvements depend on CPU and GPU capabilities, such as multi-core parallelism, SIMD optimizations, and cache efficiency. The benchmarks were compiled using GCC with the -O3 optimization flag.
18 Comments
AttilaT
I recently developed a new data structure called the Shift-To-Middle Array, designed as an alternative to std::deque, std::vector, and linked lists. My goal was to optimize insertion and deletion at both ends, while also improving cache locality and performance compared to traditional implementations.
What is the Shift-To-Middle Array?
Unlike std::deque, which uses a fragmented block-based structure, the Shift-To-Middle Array maintains a contiguous memory layout. Instead of shifting elements inefficiently (like std::vector), it dynamically redistributes free space toward the middle, reducing unnecessary data movement.
Key Features:
Fast insertions & deletions at both ends (amortized O(1))
Efficient cache utilization (better than linked lists)
Supports fast random access (O(1))
No pointer chasing (unlike linked lists)
Parallelization & SIMD optimizations possible
Performance Benchmarks
I benchmarked Shift-To-Middle Array vs. std::deque vs. ExpandingRingBuffer vs. std::queue across different workloads. Some highlights:
Push-heavy workload → Shift-To-Middle Array showed improved insertion performance over std::deque.
Pop-heavy workload → Showed improvements in memory access and removal operations.
Random insert/remove workloads → Demonstrated better cache efficiency compared to linked lists.
(Full benchmarks and source code available below.)
When Should You Use It?
High-performance queue-like structures
Game engines (handling real-time events efficiently)
Networking applications (handling packet buffers)
Dynamic sequences (e.g., computational geometry, physics sims)
Would love to hear thoughts and feedback from the community! Have you encountered similar performance bottlenecks with std::deque or other dynamic structures?
orlp
I made something similar to this ~10 years ago: https://github.com/orlp/devector. I never finished it (writing proper containers in C++ is a nightmare [1] [2] [3]), although I did start a similar project in Rust a year or two ago… which I also haven't finished yet (the repo is still private). The double-ended vector is very similar to a regular vector, it can just have free space on both ends.
In the Rust crate I store 1 pointer and three lengths: len, space_front, space_back for a total size of 32 bytes compared to the usual 24 bytes of Vec.
—
I don't think you always want to shift to the middle. Rather, I propose the following strategy (which I do in the Rust crate, unsure if I did the same in C++ implementation):
1. When a request is made for more free space on one side, check if there is already enough free space, and if not,
2. Compute an amortized growing capacity (e.g. double the current capacity), and take the maximum of that with the requested capacity. While doing this, ensure you only take into account the capacity of the side you want more space on (e.g. cap_back when growing the back),
3. Check if halving the free space on the other side is sufficient to satisfy the amortized request, if yes, do not reallocate and just shift the values internally, otherwise,
4. Allocate a new buffer with the computed capacity, plus the same amount of free space on the other side and copy over the values.
The above strategy ensures you will not exceed 3N space (with doubling space on grow) even when the double-ended vector is used in a LIFO pattern. For example a regular Vec which doubles its size has a 2N total space worst-case.
[1] https://stackoverflow.com/questions/26902006/may-the-element…
[2] https://stackoverflow.com/questions/27453230/is-there-any-wa…
[3] https://stackoverflow.com/questions/26744589/what-is-a-prope…
taco9999
What benefits does this have over a standard VecDeque?
boguscoder
Is it just me, or is the benchmarks report link dead? There's no way to see the comparison.
kragen
Interesting! This is the kind of thing I like.
I'm having a hard time understanding the description.
If I understand right, it's kind of like an inside-out gap buffer, or a hybrid of a gap buffer and a ring buffer? Is the free space in the array always contiguous? If not, is the non-free space? How is it different from ExpandingRingBuffer?
ksherlock
A couple of notes from looking at the C++ implementation:
– this is going to have problems with non-trivial types (think about destructors or move constructors, e.g. std::unique_ptr). If you don't want to deal with them, at least add a static_assert(std::is_trivially_copyable<T>::value == true);
– front() doesn't return a reference, and it doesn't even return the front
– adding iterators (begin()/end()) will let it play nice with for( : ) loops, <algorithm>, etc.
jeffzha
How does this compare to boost::devector?
https://www.boost.org/doc/libs/develop/doc/html/container/no…
rwbt
AFAIK, Apple's CoreFoundation CFArray also works similarly[0].
NSMutableArray works a little differently (using a circular buffer). From the always excellent Ciechanowski[1].
[0] – https://github.com/opensource-apple/CF/blob/master/CFArray.c
[1] – https://ciechanow.ski/exposing-nsmutablearray/
pcwalton
Interesting alternative idea I thought of just now: a data structure that works like VecDeque (a circular buffer) but uses mmap to map two views onto the same pages right after one another. That would ensure that the entire array can be accessed in a consecutive fashion, no matter where it gets split, without any copying. The downside is that reallocation would be really slow, involving multiple syscalls, and the minimum size of the array would be 4kB or more, depending on the page size.
pezezin
You should call it Middle-Out Array for maximum Internet points /jk
kistaro
Isn’t this already implemented by NSMutableArray in Apple SDKs/libraries?
delifue
If I keep removing one element from the front and adding one at the back, a normal ring-buffer deque involves no copying, but this structure will keep copying elements into the empty space, so its performance could be much worse than a deque when the queue is large.
dzaima
The ExpandingRingBuffer.h is rather bad for a representation of a ring buffer – it uses modulo for the index calculation, which is pretty damn bad and should really at the very least be masking by a power-of-two.
(it might very-slightly-but-not-really be excusable if the reason was memory utilization.. ..but the resize does "size_t new_capacity = capacity * 2;" so it does doubling anyway. Also, see reply noting that you don't even need power-of-two sizes for fast wrapping, which I managed to completely forget about)
nurettin
Try to turn everything into arrays. Maps and hashmaps are convenient, but if possible, sort your data and use parallel arrays. Deques of fixed size turn into ring buffers. At work we have invented several data structures over the years, with weird names, and they all use some trick to shave off memory-allocation time when working with time series.
alextingle
I love how the descriptive text specifically calls it an alternative to a deque, but the complexity comparison pointedly does not compare it to a deque.
bogdan-lab
The main benefit a list gives you compared to a vector is stable memory: references to elements stay valid. This is often an important property. But lists are slow at element access; that problem is solved by deque.
Therefore, I argue that an alternative to deque has to have the stable-memory property. Otherwise, you can just use a vector.
This implementation tries to do so, btw, but for some reason it operates on raw memory under the hood instead of just holding a vector and rotating it here and there. Such an approach is unnecessarily complicated and error-prone.
TylerGlaiel
This implementation grows indefinitely if you repeatedly push to the head and remove from the tail, even if the max number of elements in the array is small
hoseja
I really hate how MSVC made std::deque toxic for cross-platform use, it's such a cool concept.