The Shift-To-Middle Array is a dynamic array designed to optimize insertions and deletions at both ends, offering a high-performance alternative to std::deque, std::vector, and linked lists. It achieves this while maintaining contiguous memory storage, improving cache locality and enabling efficient parallel processing.
✅ Amortized O(1) insertions & deletions at both ends
✅ Fast random access (O(1))
✅ Better cache locality than linked lists
✅ Supports SIMD & parallel optimizations
✅ Efficient memory usage compared to std::deque
Unlike std::deque, which uses a fragmented block structure, the Shift-To-Middle Array dynamically redistributes free space to avoid costly shifts. When resizing, elements are moved toward the middle of the new buffer, ensuring efficient insertions at both ends without excessive copying.
The following table compares the time complexity of Shift-To-Middle Array operations with other common data structures:
| Operation | ArrayList (std::vector) | Linked List | Shift-To-Middle Array |
|---|---|---|---|
| Access (by index) | O(1) | O(n) | O(1) |
| Insertion at head | O(n) | O(1) | O(1) amortized |
| Insertion at tail | O(1) amortized | O(1) | O(1) amortized |
| Insertion in middle | O(n) | O(n) | O(n) |
| Deletion at head | O(n) | O(1) | O(1) amortized |
| Deletion at tail | O(1) | O(1) | O(1) amortized |
| Deletion in middle | O(n) | O(n) | O(n) |
| Cache locality | Excellent | Poor | Excellent |
Benchmarks comparing the Shift-To-Middle Array against std::deque, ExpandingRingBuffer, and std::queue demonstrate that performance improvements depend on CPU and GPU capabilities, such as multi-core parallelism, SIMD optimizations, and cache efficiency. The benchmarks were compiled using GCC with the -O3 optimization flag.
18 Comments
AttilaT
I recently developed a new data structure called the Shift-To-Middle Array, designed as an alternative to std::deque, std::vector, and linked lists. My goal was to optimize insertion and deletion at both ends, while also improving cache locality and performance compared to traditional implementations.
What is the Shift-To-Middle Array?
Unlike std::deque, which uses a fragmented block-based structure, the Shift-To-Middle Array maintains a contiguous memory layout. Instead of shifting elements inefficiently (like std::vector), it dynamically redistributes free space toward the middle, reducing unnecessary data movement.
Key Features:
Fast insertions & deletions at both ends (amortized O(1))
Efficient cache utilization (better than linked lists)
Supports fast random access (O(1))
No pointer chasing (unlike linked lists)
Parallelization & SIMD optimizations possible
Performance Benchmarks
I benchmarked Shift-To-Middle Array vs. std::deque vs. ExpandingRingBuffer vs. std::queue across different workloads. Some highlights:
Push-heavy workload → Shift-To-Middle Array showed improved insertion performance over std::deque.
Pop-heavy workload → Showed improvements in memory access and removal operations.
Random insert/remove workloads → Demonstrated better cache efficiency compared to linked lists.
(Full benchmarks and source code available below.)
When Should You Use It?
High-performance queue-like structures
Game engines (handling real-time events efficiently)
Networking applications (handling packet buffers)
Dynamic sequences (e.g., computational geometry, physics sims)
Would love to hear thoughts and feedback from the community! Have you encountered similar performance bottlenecks with std::deque or other dynamic structures?
orlp
I made something similar to this ~10 years ago: https://github.com/orlp/devector. I never finished it (writing proper containers in C++ is a nightmare [1] [2] [3]), although I did start a similar project in Rust a year or two ago… which I also haven't finished yet (the repo is still private). The double-ended vector is very similar to a regular vector, it can just have free space on both ends.
In the Rust crate I store 1 pointer and three lengths: len, space_front, space_back for a total size of 32 bytes compared to the usual 24 bytes of Vec.
—
I don't think you always want to shift to the middle. Rather, I propose the following strategy (which I do in the Rust crate, unsure if I did the same in C++ implementation):
1. When a request is made for more free space on one side, check if there is already enough free space, and if not,
2. Compute an amortized growing capacity (e.g. double the current capacity), and take the maximum of that with the requested capacity. While doing this, ensure you only take into account the capacity of the side you want more space on (e.g. cap_back when growing the back),
3. Check if halving the free space on the other side is sufficient to satisfy the amortized request, if yes, do not reallocate and just shift the values internally, otherwise,
4. Allocate a new buffer with the computed capacity, plus the same amount of free space on the other side and copy over the values.
The above strategy ensures you will not exceed 3N space (with doubling space on grow) even when the double-ended vector is used in a LIFO pattern. For example a regular Vec which doubles its size has a 2N total space worst-case.
[1] https://stackoverflow.com/questions/26902006/may-the-element…
[2] https://stackoverflow.com/questions/27453230/is-there-any-wa…
[3] https://stackoverflow.com/questions/26744589/what-is-a-prope…
taco9999
What benefits does this have over a standard VecDeque?
boguscoder
Is it just me, or is the benchmarks report link dead? There's no way to see the comparison.
kragen
Interesting! This is the kind of thing I like.
I'm having a hard time understanding the description.
If I understand right, it's kind of like an inside-out gap buffer, or a hybrid of a gap buffer and a ring buffer? Is the free space in the array always contiguous? If not, is the non-free space? How is it different from ExpandingRingBuffer?
ksherlock
A couple of notes from looking at the C++ implementation:
– this is going to have problems with non-trivial types (think about destructors or move constructors, e.g. std::unique_ptr). If you don't want to deal with them, at least add a static_assert(std::is_trivially_copyable<T>::value == true);
– front() doesn't return a reference, and it doesn't even return the front
– adding iterators (begin()/end()) will let it play nice with for( : ) loops, <algorithm>, etc.
jeffzha
How does this compare to boost::devector?
https://www.boost.org/doc/libs/develop/doc/html/container/no…
rwbt
AFAIK, Apple's CoreFoundation CFArray also works similarly[0].
NSMutableArray works a little differently (using a circular buffer). From the always excellent Ciechanowski[1].
[0] – https://github.com/opensource-apple/CF/blob/master/CFArray.c
[1] – https://ciechanow.ski/exposing-nsmutablearray/
pcwalton
Interesting alternative idea I thought of just now: a data structure that works like VecDeque (a circular buffer) but uses mmap to map two views onto the same pages right after one another. That would ensure that the entire array can be accessed in a consecutive fashion, no matter where it gets split, without any copying. The downside is that reallocation would be really slow, involving multiple syscalls, and the minimum size of the array would be 4kB or more, depending on the page size.
pezezin
You should call it Middle-Out Array for maximum Internet points /jk
kistaro
Isn’t this already implemented by NSMutableArray in Apple SDKs/libraries?
delifue
If I keep removing one element from the front and adding one at the back, a normal ring-buffer deque involves no copying, but this structure will keep copying elements into the empty space, so its performance could be much worse than a deque when the queue is large.
dzaima
The ExpandingRingBuffer.h is rather bad for a representation of a ring buffer – it uses modulo for the index calculation, which is pretty damn bad and should really at the very least be masking by a power-of-two.
(it might very-slightly-but-not-really be excusable if the reason was memory utilization.. ..but the resize does "size_t new_capacity = capacity * 2;" so it does doubling anyway. Also, see reply noting that you don't even need power-of-two sizes for fast wrapping, which I managed to completely forget about)
nurettin
Try to turn everything into arrays. Maps and hashmaps are convenient, but if possible, sort your data and use parallel arrays. Deques of fixed size turn into ring buffers. At work we have invented several data structures over the years, with weird names, and they all use some trick to shave off memory-allocation time when working with time series.
alextingle
I love how the descriptive text specifically calls it an alternative to a deque, but the complexity comparison pointedly does not compare it to a deque.
bogdan-lab
The main benefit a list gives you compared to a vector is stable memory: references to elements stay valid. This is often an important property. But lists are slow at element access; that problem is solved by deque.
Therefore, I argue that an alternative to deque has to have the stable-memory property. Otherwise, you can just use a vector.
This implementation tries to do so, btw, but for some reason it operates on raw memory under the hood instead of just holding a vector and rotating it here and there. Such an approach is unnecessarily complicated and error-prone.
TylerGlaiel
This implementation grows indefinitely if you repeatedly push to the head and remove from the tail, even if the max number of elements in the array is small
hoseja
I really hate how MSVC made std::deque toxic for cross-platform use, it's such a cool concept.