
Shift-to-Middle Array: A Faster Alternative to std::deque? by AttilaT

18 Comments

  • Post Author
    AttilaT
    Posted March 23, 2025 at 11:20 pm

    I recently developed a new data structure called the Shift-To-Middle Array, designed as an alternative to std::deque, std::vector, and linked lists. My goal was to optimize insertion and deletion at both ends, while also improving cache locality and performance compared to traditional implementations.

    What is the Shift-To-Middle Array?
    Unlike std::deque, which uses a fragmented block-based structure, the Shift-To-Middle Array maintains a contiguous memory layout. Instead of shifting elements inefficiently (like std::vector), it dynamically redistributes free space toward the middle, reducing unnecessary data movement.

    Key Features:
    Fast insertions & deletions at both ends (amortized O(1))
    Efficient cache utilization (better than linked lists)
    Supports fast random access (O(1))
    No pointer chasing (unlike linked lists)
    Parallelization & SIMD optimizations possible
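    To make the idea above concrete, here is a minimal sketch of how such a structure could look — hypothetical names, not the author's actual implementation: elements live in the middle of one contiguous buffer, and when either end runs out of headroom, the elements are re-centered (or the buffer is grown) so both ends regain free space.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical sketch of the shift-to-middle idea: one contiguous buffer
// with free space on both ends; when one end runs out, elements are
// re-centered (and the buffer possibly grown) so both ends have headroom.
template <typename T>
class ShiftToMiddleArray {
    std::vector<T> buf;   // contiguous storage
    size_t head, tail;    // live elements occupy [head, tail)

    void recenter(size_t new_cap) {
        std::vector<T> nb(new_cap);
        size_t n  = tail - head;
        size_t nh = (new_cap - n) / 2;  // center the elements in the new buffer
        for (size_t i = 0; i < n; ++i) nb[nh + i] = buf[head + i];
        buf  = std::move(nb);
        head = nh;
        tail = nh + n;
    }
public:
    ShiftToMiddleArray() : buf(8), head(4), tail(4) {}
    size_t size() const { return tail - head; }
    T& operator[](size_t i) { return buf[head + i]; }  // O(1) random access
    void push_back(const T& v) {
        if (tail == buf.size()) recenter(std::max<size_t>(8, 2 * buf.size()));
        buf[tail++] = v;
    }
    void push_front(const T& v) {
        if (head == 0) recenter(std::max<size_t>(8, 2 * buf.size()));
        buf[--head] = v;
    }
    void pop_front() { ++head; }
    void pop_back()  { --tail; }
};
```

    Both push paths stay amortized O(1) as long as re-centering leaves free space proportional to the element count on each side.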

    Performance Benchmarks
    I benchmarked Shift-To-Middle Array vs. std::deque vs. ExpandingRingBuffer vs. std::queue across different workloads. Some highlights:

    Push-heavy workload → Shift-To-Middle Array showed improved insertion performance over std::deque.

    Pop-heavy workload → Showed improvements in memory access and removal operations.

    Random insert/remove workloads → Demonstrated better cache efficiency compared to linked lists.

    (Full benchmarks and source code available below.)

    When Should You Use It?
    High-performance queue-like structures

    Game engines (handling real-time events efficiently)

    Networking applications (handling packet buffers)

    Dynamic sequences (e.g., computational geometry, physics sims)

    Would love to hear thoughts and feedback from the community! Have you encountered similar performance bottlenecks with std::deque or other dynamic structures?

  • Post Author
    orlp
    Posted March 23, 2025 at 11:37 pm

    I made something similar to this ~10 years ago: https://github.com/orlp/devector. I never finished it (writing proper containers in C++ is a nightmare [1] [2] [3]), although I did start a similar project in Rust a year or two ago… which I also haven't finished yet (the repo is still private). The double-ended vector is very similar to a regular vector, it can just have free space on both ends:

        <------------ cap_front ------------>
                          <------------ cap_back ------------>
        <-----------------  total_capacity  ----------------->
                          <-----  len  ----->
        <-- space_front -->                 <-- space_back -->
        [                 [    elements     ]                ]
                          ^
                          +--- ptr
    

    In the Rust crate I store 1 pointer and three lengths: len, space_front, space_back for a total size of 32 bytes compared to the usual 24 bytes of Vec.
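    The layout in the diagram could be written down as a C++ struct like this (hypothetical names, not orlp's actual code) — on a typical 64-bit platform it comes out to the 32 bytes mentioned above:

```cpp
#include <cstddef>

// Sketch of the double-ended vector layout from the diagram above:
// one allocation, free space on both sides of the live elements.
template <typename T>
struct DevectorLayout {
    T*     ptr;          // points at the first live element
    size_t len;          // number of live elements
    size_t space_front;  // free slots before ptr
    size_t space_back;   // free slots after ptr + len

    size_t cap_front() const      { return space_front + len; }  // diagram: cap_front
    size_t cap_back()  const      { return len + space_back; }   // diagram: cap_back
    size_t total_capacity() const { return space_front + len + space_back; }
};
```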

    I don't think you always want to shift to the middle. Rather, I propose the following strategy (which I do in the Rust crate; unsure if I did the same in the C++ implementation):

    1. When a request is made for more free space on one side, check if there is already enough free space, and if not,

    2. Compute an amortized growing capacity (e.g. double the current capacity), and take the maximum of that with the requested capacity. While doing this ensure you only take into account the capacity of the side you want more space on (e.g. cap_back in the above picture when growing the back),

    3. Check if halving the free space on the other side is sufficient to satisfy the amortized request, if yes, do not reallocate and just shift the values internally, otherwise,

    4. Allocate a new buffer with the computed capacity, plus the same amount of free space on the other side and copy over the values.

    The above strategy ensures you will not exceed 3N space (with doubling space on grow) even when the double-ended vector is used in a LIFO pattern. For example a regular Vec which doubles its size has a 2N total space worst-case.
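    The four steps above can be sketched as a planning function (hypothetical names, not orlp's actual code; shown for growing the back, with the front case symmetric):

```cpp
#include <algorithm>
#include <cstddef>

// What to do when `request` more free slots are needed at the back.
struct Plan {
    bool   shift;            // true: shift in place, no reallocation
    size_t new_cap_back;     // resulting cap_back
    size_t new_space_front;  // resulting free space at the front
};

Plan grow_back(size_t len, size_t space_front, size_t space_back, size_t request) {
    // 1. Already enough free space on this side? Nothing to do.
    if (space_back >= request)
        return {false, len + space_back, space_front};

    // 2. Amortized target for this side only: double cap_back,
    //    but at least enough for the request.
    size_t cap_back = len + space_back;
    size_t target   = std::max(2 * cap_back, len + request);

    // 3. If donating half the other side's free space satisfies the
    //    target, just shift the elements toward the front.
    size_t donate = space_front / 2;
    if (len + space_back + donate >= target)
        return {true, len + space_back + donate, space_front - donate};

    // 4. Otherwise reallocate with the target capacity for this side,
    //    keeping the same amount of free space on the other side.
    return {false, target, space_front};
}
```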

    [1] https://stackoverflow.com/questions/26902006/may-the-element…
    [2] https://stackoverflow.com/questions/27453230/is-there-any-wa…
    [3] https://stackoverflow.com/questions/26744589/what-is-a-prope…

  • Post Author
    taco9999
    Posted March 23, 2025 at 11:44 pm

    What benefits does this have over a standard VecDeque?

  • Post Author
    boguscoder
    Posted March 23, 2025 at 11:54 pm

    Is it just me, or is the benchmark report link dead? There's no way to see the comparison.

  • Post Author
    kragen
    Posted March 23, 2025 at 11:58 pm

    Interesting! This is the kind of thing I like.

    I'm having a hard time understanding the description.
    If I understand right, it's kind of like an inside-out gap buffer, or a hybrid of a gap buffer and a ring buffer? Is the free space in the array always contiguous? If not, is the non-free space? How is it different from ExpandingRingBuffer?

  • Post Author
    ksherlock
    Posted March 23, 2025 at 11:58 pm

    A couple notes looking at the c++ implementation

    – this is going to have problems with non-trivial types. (Think about destructors or move constructors like std::unique_ptr). If you don't want to deal with them, at least add a static_assert(std::is_trivially_copyable<T>::value == true);

    – front() doesn't return a reference and it doesn't even return the front

    – adding iterators (begin()/end()) will let it play nice with for( : ) loops, <algorithm>, etc.

  • Post Author
    jeffzha
    Posted March 23, 2025 at 11:59 pm

    How does this compare to boost::devector?

    https://www.boost.org/doc/libs/develop/doc/html/container/no…

  • Post Author
    rwbt
    Posted March 24, 2025 at 12:07 am

    AFAIK, Apple's CoreFoundation CFArray also works similarly[0].
    NSMutableArray works a little differently (using a circular buffer). From the always excellent Ciechanowski[1].

    [0] – https://github.com/opensource-apple/CF/blob/master/CFArray.c
    [1] – https://ciechanow.ski/exposing-nsmutablearray/

  • Post Author
    pcwalton
    Posted March 24, 2025 at 12:18 am

    Interesting alternative idea I thought of just now: a data structure that works like VecDeque (a circular buffer) but uses mmap to map two views onto the same pages right after one another. That would ensure that the entire array can be accessed in a consecutive fashion, no matter where it gets split, without any copying. The downside is that reallocation would be really slow, involving multiple syscalls, and the minimum size of the array would be 4kB or more, depending on the page size.
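    This is sometimes called a "magic" or mirrored ring buffer. A Linux-only sketch of the trick (using memfd_create; names and error handling are illustrative, not a production implementation): the same physical pages are mapped twice, back to back, so any window of `size` bytes is contiguous regardless of where the ring wraps.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>

// Map `size` bytes (page-aligned) of shared memory twice, back to back.
// A write at offset i is visible at offset i + size, and vice versa.
char* make_mirrored_buffer(size_t size) {
    int fd = memfd_create("ring", 0);             // anonymous in-memory file
    if (fd < 0 || ftruncate(fd, size) != 0) return nullptr;
    // Reserve 2*size of contiguous address space...
    char* base = (char*)mmap(nullptr, 2 * size, PROT_NONE,
                             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) return nullptr;
    // ...then map the same file into both halves of the reservation.
    mmap(base,        size, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED, fd, 0);
    mmap(base + size, size, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED, fd, 0);
    close(fd);  // the mappings keep the pages alive
    return base;
}
```

    A reader can then take a contiguous `[head, head + n)` slice even when `head + n` runs past `size`, which is exactly the property a split circular buffer lacks.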

  • Post Author
    pezezin
    Posted March 24, 2025 at 12:31 am

    You should call it Middle-Out Array for maximum Internet points /jk

  • Post Author
    kistaro
    Posted March 24, 2025 at 12:49 am

    Isn’t this already implemented by NSMutableArray in Apple SDKs/libraries?

  • Post Author
    delifue
    Posted March 24, 2025 at 2:43 am

    If I keep removing one element from the front and adding one at the back, a normal ring-buffer deque involves no copying, but this structure keeps copying elements into the empty space, so its performance could be much worse than deque's when the queue is large.

  • Post Author
    dzaima
    Posted March 24, 2025 at 3:01 am

    The ExpandingRingBuffer.h is rather bad for a representation of a ring buffer – it uses modulo for the index calculation, which is pretty damn bad and should really at the very least be masking by a power-of-two.

    (it might very-slightly-but-not-really be excusable if the reason was memory utilization.. ..but the resize does "size_t new_capacity = capacity * 2;" so it does doubling anyway. Also, see reply noting that you don't even need power-of-two sizes for fast wrapping, which I managed to completely forget about)
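    For reference, here are the two cheap wrapping schemes mentioned (a sketch, not the code from ExpandingRingBuffer.h): masking requires a power-of-two capacity, while the conditional subtract works for any capacity provided the pre-wrap index is below 2*cap, which is always the case when advancing one step at a time.

```cpp
#include <cstddef>

// Wrap a ring-buffer index without the division that `%` implies.
inline size_t wrap_mask(size_t i, size_t cap_pow2) {
    return i & (cap_pow2 - 1);      // cap must be a power of two
}
inline size_t wrap_sub(size_t i, size_t cap) {
    return i >= cap ? i - cap : i;  // any cap, as long as i < 2 * cap
}
```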

  • Post Author
    nurettin
    Posted March 24, 2025 at 5:26 am

    Try to turn everything into arrays. Maps, hashmaps are convenient, but if possible, sort your data and use parallel arrays. Deques of fixed size turn into ring buffers. At work we have invented several data structures over the years with weird names and they all make use of some trick to shave off memory allocation times when working with time series.

  • Post Author
    alextingle
    Posted March 24, 2025 at 5:32 am

    I love how the descriptive text specifically calls it an alternative to a deque, but the complexity comparison pointedly does not compare it to a deque.

  • Post Author
    bogdan-lab
    Posted March 24, 2025 at 7:13 am

    The main benefit a list gives you compared to a vector is stable memory addresses. This is often an important property. But lists are slow at element access, and that problem is solved by deque.

    Therefore, I argue that an alternative to deque has to have the stable-memory property. Otherwise, you can just use a vector.

    This implementation is trying to do so, btw, but for some reason it operates on raw memory under the hood instead of just holding a vector and rotating it here and there. Such an approach is unnecessarily complicated and error-prone.

  • Post Author
    TylerGlaiel
    Posted March 24, 2025 at 7:45 am

    This implementation grows indefinitely if you repeatedly push to the head and remove from the tail, even if the max number of elements in the array is small

  • Post Author
    hoseja
    Posted March 24, 2025 at 8:31 am

    I really hate how MSVC made std::deque toxic for cross-platform use, it's such a cool concept.


© 2025 HackTech.info. All Rights Reserved.
