Tools

Tools: Doubly: Why Arrays Aren't Always Enough

2026-01-18 0 views admin

Tools: Doubly: Why Arrays Aren't Always Enough

Source: Dev.to

Cursor Based Navigation vs Indexed Access ## From Hash Nodes to C Registries ## Doubly::Linked::PP — Pure Perl Simplicity ## Doubly::Linked — XS with Perl Structures ## Doubly::Pointer — Raw C Pointers ## Then: Enter Claude ## Doubly — The Thread Safe Registry ## When to Use Doubly Lists Over Arrays ## Choose Doubly Lists When: As most of you reading this will know, Perl has three core data types: scalars, arrays, and hashes. They're relatively fast, they're familiar, and for most tasks, they just work. But sometimes you need something more. A while back, I started at a new company and had a conversation with a colleague. They'd been working on a certain part of the codebase where the bottleneck was a pure Perl implementation of a doubly linked list. I knew of the concept and would usually just use an array for this task. They were adamant this could not work with an array though, so I inspected the code to understand the use case. My initial gut instinct and others on IRC after I had published the first version, was why not just use an array, well.. if you take this example Every insertion is O(n) because Perl has to shift elements. Your index can become stale if you're not careful. And if you're working with threads or nested data? Good luck keeping that index synchronised. Also just wrapping this in a module does not work the moment you add insertion. This launched me on a journey that would eventually produce four different doubly linked list implementations, each one teaching me something new about Perl, XS, C, and the fascinating trade offs between simplicity, speed, and safety. This journey spanned several months and it was 'Claude' opus who helped me come to the final solution which is thread safe and has no memory leakage. Before diving into the advantages, let's establish what a doubly linked list actually is. Unlike an array where elements sit in contiguous memory slots accessed by index, a doubly linked list is a chain of nodes. Each node contains three things: your data, a pointer to the next node, and a pointer to the previous node. This bidirectional linking is what puts the "doubly" in doubly linked list. Beyond the basic node structure, a well designed doubly linked list maintains a current position—an internal cursor that tracks where you are in the list. This changes everything about how you interact with your data. Navigation methods like next(), prev(), start(), and end() move this internal cursor, and data() operates on whatever node you're currently pointing at. Here's what that looks like in practice: Compare this to the array approach where you're juggling data and position separately: The array version works, but you're managing two things instead of one. The index is disconnected from the data. Pass that array to a function and the position doesn't follow, you'd need to pass both. But here's where it gets really interesting: nested lists with shared position tracking. When you store a Doubly list inside another Doubly list, something magical happens. The inner list maintains its own cursor, and every time you access it, you get the same object with its position preserved: This shared position tracking is incredibly powerful for hierarchical data. Think of a file browser where each directory remembers which file you last selected, or a document editor where each section remembers your scroll position. The cursor based model also makes certain algorithms beautifully clean. Need to process items around your current position? No index arithmetic required: With arrays, insert_after at position n means splice(@arr, $n + 1, 0, $item)—and that's an O(n) operation that shifts every subsequent element. The doubly linked list just adjusts a few pointers. Also with a doubly linked list when you attain a reference to the data, that reference remains the same. If you were to use a plain perl object to track your state that reference will either change on iteration or become stale if you clone. This cursor based navigation is what makes doubly linked lists shine for the use cases my colleague was struggling with. It's not just about performance—it's about expressing your intent clearly and letting the data structure handle the bookkeeping. Now let's look under the hood at how each implementation actually works. The pure Perl implementation is the most transparent. Each node is a blessed hash reference with three keys: The list object itself tracks head, tail, current. Every operation is pure Perl hash manipulation: This approach has huge advantages: no compilation required, fully debuggable with Data::Dumper, and runs anywhere Perl runs. You can literally inspect the entire structure at any time: The downside? At 769 ops/sec, it's not winning any races. Every method call goes through Perl's method dispatch. Every hash access is a hash lookup. And there's a subtler problem: circular references. Node A's next points to Node B, and Node B's prev points back to Node A. Perl's reference counting garbage collector can't automatically clean these up—you need to manually break the cycle or use weak references, otherwise you're leaking memory. The overhead adds up fast. The next step was Doubly::Linked, which uses XS (Perl's interface to C) but still constructs Perl hash structures. The C code builds the same {data, next, prev} hashes, just faster: This gives you compiled C speed for node creation and manipulation, whilst keeping the familiar Perl hash structure. You still get that Data::Dumper visibility. Performance jumped to about 1,020 ops/sec in non threaded Perl roughly 1.3x faster than pure Perl. Respectable, but I knew we could do better. The problem is that we're still paying for Perl's hash overhead on every single operation. Here's where things get fast. Doubly::Pointer abandons Perl structures entirely and uses pure C: The Perl object is just a thin wrapper around a C pointer. No hashes, no Perl data structure overhead just raw memory and pointer arithmetic. The result? 12,987 ops/sec. That's nearly 17x faster than pure Perl! Navigation is just pointer dereferencing. Insertion is just reassigning a couple of pointers. This is as fast as it gets in Perl land. I released these three in quick succession and was happy with the perfromance boost of the Doubly implmentation but it had some flaws. Mainly it was leaking memory, I asked on IRC for help but none was provided so I left the module for a while.. After letting the module sit for several months with its memory leaks, as we all know llm has gradually been getting better at code generation and debugging so I decided to revisit it with the help of Opus. What followed was an intensive collaboration that transformed my understanding of the problem. Claude helped me trace through the reference counting issues, understand where my SvREFCNT_inc and SvREFCNT_dec calls were mismatched, and ultimately architect the registry based solution that would become Doubly. The breakthrough wasn't just fixing the leaks, it was realising that the entire pointer in Perl object approach was fundamentally flawed for threaded environments. So the problem is that raw pointers becomes ticking time bombs in threaded code. When Perl clones an object across threads, it copies the pointer value. But that pointer address is only valid in the original thread's memory space. Access it from another thread and you get crashes, corruption, or worse... silent data mangling. For single threaded applications, the old Doubly now Doubly::Pointer is fine. But I wanted something that could handle threads safely. So with claudes teachings, this is where it all came together. Doubly achieves thread safety through an architectural shift: instead of storing C pointers in Perl objects, it uses a global registry with integer IDs. Every list gets a unique integer ID. The Perl object stores only this ID—not a pointer. When you call any method, the C code: When Perl clones the object across threads, it copies the integer ID which is perfectly safe. The actual data lives in the shared registry, protected by mutex locks. For storing Perl references (hashes, arrays, objects), which is not simple for the same reasons as the pointers, Doubly uses threads::shared::shared_clone to create thread safe copies: We could not figure out how to do this in directly in the C/XS, so had to implement light callback hooks in the perl namespace, but it works perfectly. The performance impact? In non threaded Perl, Doubly runs at 14,085 ops/sec, about 8% faster than Doubly::Pointer's 12,987 ops/sec. In threaded Perl, it performs similarly at 14,286 ops/sec versus 14,184 ops/sec. The registry architecture optimises surprisingly well. The integer to pointer lookup is O(1). You get nearly all the speed of raw pointers with none of the thread safety nightmares. So after all this, four implementations, countless hours of debugging segfaults, and deep dives into XS and mutex locks. When should you actually reach for a doubly linked list instead of a trusty array? You need O(1) insertions and deletions in the middle of your data. Arrays make you pay O(n) for every splice. If you're building an editor buffer, a task queue with priorities, or anything where items frequently enter and leave from arbitrary positions, doubly linked lists win decisively. Your algorithm thinks in terms of "current position." Undo/redo systems, playlist navigators, browser history, paginated views. These all have an inherent notion of "where am I?" that maps perfectly to cursor based navigation. Fighting against this with index tracking is unnecessary complexity. You need bidirectional traversal. Moving forward and backward through data is natural with next() and prev(). With arrays, you're doing index arithmetic and bounds checking manually. Thread safety matters. If there's any chance your code will run in a threaded environment, Doubly's registry architecture handles it transparently. No locks to manage, no race conditions to debug. You're storing nested structures with independent navigation state. This is Doubly's secret weapon, inner lists maintain their own cursor position. Try implementing that cleanly with arrays. Give doubly linked lists a try the next time you catch yourself wrestling with array indices and splice operations. You might be surprised how naturally some problems fit the cursor based model. And if you find yourself needing threads? You'll be glad you chose Doubly from the start. Happy coding and may your pointers always be valid! https://metacpan.org/pod/Doubly Templates let you quickly answer FAQs or store snippets for re-use. Are you sure you want to hide this comment? It will become hidden in your post, but will still be visible via the comment's permalink. Hide child comments as well For further actions, you may consider blocking this person and/or reporting abuse COMMAND_BLOCK: my @playlist = ('song_a', 'song_b', 'song_c'); my $current = 1; # pointing at 'song_b' # Move forward $current++ if $current < $#playlist; # Insert after current position? Now things get messy. splice(@playlist, $current + 1, 0, 'new_song'); Enter fullscreen mode Exit fullscreen mode COMMAND_BLOCK: my @playlist = ('song_a', 'song_b', 'song_c'); my $current = 1; # pointing at 'song_b' # Move forward $current++ if $current < $#playlist; # Insert after current position? Now things get messy. splice(@playlist, $current + 1, 0, 'new_song'); COMMAND_BLOCK: my @playlist = ('song_a', 'song_b', 'song_c'); my $current = 1; # pointing at 'song_b' # Move forward $current++ if $current < $#playlist; # Insert after current position? Now things get messy. splice(@playlist, $current + 1, 0, 'new_song'); CODE_BLOCK: ┌──────────┐ ┌──────────┐ ┌──────────┐ │ prev ←─┼────┼─ prev ←──┼────┼─ prev │ │ 'intro' │ │ 'verse' │ │ 'chorus' │ │ next ──┼────┼→ next ───┼────┼→ next │ └──────────┘ └──────────┘ └──────────┘ Enter fullscreen mode Exit fullscreen mode CODE_BLOCK: ┌──────────┐ ┌──────────┐ ┌──────────┐ │ prev ←─┼────┼─ prev ←──┼────┼─ prev │ │ 'intro' │ │ 'verse' │ │ 'chorus' │ │ next ──┼────┼→ next ───┼────┼→ next │ └──────────┘ └──────────┘ └──────────┘ CODE_BLOCK: ┌──────────┐ ┌──────────┐ ┌──────────┐ │ prev ←─┼────┼─ prev ←──┼────┼─ prev │ │ 'intro' │ │ 'verse' │ │ 'chorus' │ │ next ──┼────┼→ next ───┼────┼→ next │ └──────────┘ └──────────┘ └──────────┘ COMMAND_BLOCK: use Doubly; my $playlist = Doubly->new(); $playlist->add('intro.mp3') ->add('verse.mp3') ->add('chorus.mp3') ->add('outro.mp3'); # Navigate with the cursor $playlist->start(); # cursor at 'intro.mp3' $playlist->next(); # cursor at 'verse.mp3' print $playlist->data(); # prints 'verse.mp3' $playlist->next()->next(); # method chaining! cursor at 'outro.mp3' $playlist->prev(); # back to 'chorus.mp3' Enter fullscreen mode Exit fullscreen mode COMMAND_BLOCK: use Doubly; my $playlist = Doubly->new(); $playlist->add('intro.mp3') ->add('verse.mp3') ->add('chorus.mp3') ->add('outro.mp3'); # Navigate with the cursor $playlist->start(); # cursor at 'intro.mp3' $playlist->next(); # cursor at 'verse.mp3' print $playlist->data(); # prints 'verse.mp3' $playlist->next()->next(); # method chaining! cursor at 'outro.mp3' $playlist->prev(); # back to 'chorus.mp3' COMMAND_BLOCK: use Doubly; my $playlist = Doubly->new(); $playlist->add('intro.mp3') ->add('verse.mp3') ->add('chorus.mp3') ->add('outro.mp3'); # Navigate with the cursor $playlist->start(); # cursor at 'intro.mp3' $playlist->next(); # cursor at 'verse.mp3' print $playlist->data(); # prints 'verse.mp3' $playlist->next()->next(); # method chaining! cursor at 'outro.mp3' $playlist->prev(); # back to 'chorus.mp3' COMMAND_BLOCK: my @playlist = ('intro.mp3', 'verse.mp3', 'chorus.mp3', 'outro.mp3'); my $pos = 0; $pos++; print $playlist[$pos]; # 'verse.mp3' $pos += 2; $pos--; Enter fullscreen mode Exit fullscreen mode COMMAND_BLOCK: my @playlist = ('intro.mp3', 'verse.mp3', 'chorus.mp3', 'outro.mp3'); my $pos = 0; $pos++; print $playlist[$pos]; # 'verse.mp3' $pos += 2; $pos--; COMMAND_BLOCK: my @playlist = ('intro.mp3', 'verse.mp3', 'chorus.mp3', 'outro.mp3'); my $pos = 0; $pos++; print $playlist[$pos]; # 'verse.mp3' $pos += 2; $pos--; COMMAND_BLOCK: my $outer = Doubly->new(); my $inner = Doubly->new(); $inner->add('a')->add('b')->add('c'); $outer->add($inner); $outer->start(); my $nested = $outer->data(); # get the inner list $nested->start()->next(); # move inner cursor to 'b' # Later... my $same_nested = $outer->data(); # same object! print $same_nested->data(); # still at 'b'! Enter fullscreen mode Exit fullscreen mode COMMAND_BLOCK: my $outer = Doubly->new(); my $inner = Doubly->new(); $inner->add('a')->add('b')->add('c'); $outer->add($inner); $outer->start(); my $nested = $outer->data(); # get the inner list $nested->start()->next(); # move inner cursor to 'b' # Later... my $same_nested = $outer->data(); # same object! print $same_nested->data(); # still at 'b'! COMMAND_BLOCK: my $outer = Doubly->new(); my $inner = Doubly->new(); $inner->add('a')->add('b')->add('c'); $outer->add($inner); $outer->start(); my $nested = $outer->data(); # get the inner list $nested->start()->next(); # move inner cursor to 'b' # Later... my $same_nested = $outer->data(); # same object! print $same_nested->data(); # still at 'b'! COMMAND_BLOCK: # Insert after current position - O(1)! $list->insert_after('new_item'); # Remove current and move to next $list->remove(); # Check boundaries naturally while (!$list->is_end()) { process($list->data()); $list->next(); } Enter fullscreen mode Exit fullscreen mode COMMAND_BLOCK: # Insert after current position - O(1)! $list->insert_after('new_item'); # Remove current and move to next $list->remove(); # Check boundaries naturally while (!$list->is_end()) { process($list->data()); $list->next(); } COMMAND_BLOCK: # Insert after current position - O(1)! $list->insert_after('new_item'); # Remove current and move to next $list->remove(); # Check boundaries naturally while (!$list->is_end()) { process($list->data()); $list->next(); } COMMAND_BLOCK: my $node = bless { data => 'my_value', next => $next_node, # reference to next node (or undef) prev => $prev_node, # reference to previous node (or undef) }, 'Doubly::Linked::PP'; Enter fullscreen mode Exit fullscreen mode COMMAND_BLOCK: my $node = bless { data => 'my_value', next => $next_node, # reference to next node (or undef) prev => $prev_node, # reference to previous node (or undef) }, 'Doubly::Linked::PP'; COMMAND_BLOCK: my $node = bless { data => 'my_value', next => $next_node, # reference to next node (or undef) prev => $prev_node, # reference to previous node (or undef) }, 'Doubly::Linked::PP'; CODE_BLOCK: sub next { my ($self) = @_; if ($self->{current} && $self->{current}->{next}) { $self->{current} = $self->{current}->{next}; } return $self; } Enter fullscreen mode Exit fullscreen mode CODE_BLOCK: sub next { my ($self) = @_; if ($self->{current} && $self->{current}->{next}) { $self->{current} = $self->{current}->{next}; } return $self; } CODE_BLOCK: sub next { my ($self) = @_; if ($self->{current} && $self->{current}->{next}) { $self->{current} = $self->{current}->{next}; } return $self; } COMMAND_BLOCK: use Data::Dumper; print Dumper($list); # see everything! Enter fullscreen mode Exit fullscreen mode COMMAND_BLOCK: use Data::Dumper; print Dumper($list); # see everything! COMMAND_BLOCK: use Data::Dumper; print Dumper($list); # see everything! CODE_BLOCK: // Simplified from Doubly::Linked XS HV* node = newHV(); hv_store(node, "data", 4, newSVpv(data, 0), 0); hv_store(node, "next", 4, next_sv, 0); hv_store(node, "prev", 4, prev_sv, 0); Enter fullscreen mode Exit fullscreen mode CODE_BLOCK: // Simplified from Doubly::Linked XS HV* node = newHV(); hv_store(node, "data", 4, newSVpv(data, 0), 0); hv_store(node, "next", 4, next_sv, 0); hv_store(node, "prev", 4, prev_sv, 0); CODE_BLOCK: // Simplified from Doubly::Linked XS HV* node = newHV(); hv_store(node, "data", 4, newSVpv(data, 0), 0); hv_store(node, "next", 4, next_sv, 0); hv_store(node, "prev", 4, prev_sv, 0); CODE_BLOCK: typedef struct Node { SV* data; // Perl scalar value struct Node* next; // C pointer struct Node* prev; // C pointer } Node; typedef struct { Node* head; Node* tail; Node* current; int length; } DoublyList; Enter fullscreen mode Exit fullscreen mode CODE_BLOCK: typedef struct Node { SV* data; // Perl scalar value struct Node* next; // C pointer struct Node* prev; // C pointer } Node; typedef struct { Node* head; Node* tail; Node* current; int length; } DoublyList; CODE_BLOCK: typedef struct Node { SV* data; // Perl scalar value struct Node* next; // C pointer struct Node* prev; // C pointer } Node; typedef struct { Node* head; Node* tail; Node* current; int length; } DoublyList; CODE_BLOCK: #define MAX_LISTS 65536 static DoublyList* registry[MAX_LISTS]; static int registry_ids[MAX_LISTS]; static perl_mutex registry_mutex; Enter fullscreen mode Exit fullscreen mode CODE_BLOCK: #define MAX_LISTS 65536 static DoublyList* registry[MAX_LISTS]; static int registry_ids[MAX_LISTS]; static perl_mutex registry_mutex; CODE_BLOCK: #define MAX_LISTS 65536 static DoublyList* registry[MAX_LISTS]; static int registry_ids[MAX_LISTS]; static perl_mutex registry_mutex; CODE_BLOCK: DoublyList* get_list(int id) { DoublyList* list = NULL; MUTEX_LOCK(&registry_mutex); if (id >= 0 && id < MAX_LISTS) { list = registry[id]; } MUTEX_UNLOCK(&registry_mutex); return list; } Enter fullscreen mode Exit fullscreen mode CODE_BLOCK: DoublyList* get_list(int id) { DoublyList* list = NULL; MUTEX_LOCK(&registry_mutex); if (id >= 0 && id < MAX_LISTS) { list = registry[id]; } MUTEX_UNLOCK(&registry_mutex); return list; } CODE_BLOCK: DoublyList* get_list(int id) { DoublyList* list = NULL; MUTEX_LOCK(&registry_mutex); if (id >= 0 && id < MAX_LISTS) { list = registry[id]; } MUTEX_UNLOCK(&registry_mutex); return list; } COMMAND_BLOCK: # When you do this in threaded Perl: $list->add({ name => 'test', values => [1, 2, 3] }); # Doubly internally does: my $shared = threads::shared::shared_clone($data); # Then stores a reference ID to retrieve it later Enter fullscreen mode Exit fullscreen mode COMMAND_BLOCK: # When you do this in threaded Perl: $list->add({ name => 'test', values => [1, 2, 3] }); # Doubly internally does: my $shared = threads::shared::shared_clone($data); # Then stores a reference ID to retrieve it later COMMAND_BLOCK: # When you do this in threaded Perl: $list->add({ name => 'test', values => [1, 2, 3] }); # Doubly internally does: my $shared = threads::shared::shared_clone($data); # Then stores a reference ID to retrieve it later - Locks the mutex - Looks up the list by ID in the registry - Performs the operation - Unlocks the mutex

🏷️ Tags

how-totutorialguidedev.toaillmnode