ae just used a gap buffer, same as GNU Emacs. If you move your insertion/deletion point from one end of a 12MB file to the other, you have to copy 12 MB across the gap, which takes about 4.1 milliseconds on this laptop, which is not ideal but plenty fast enough for keystroke response. Not terribly difficult to do with Numpy, I think. But with gap buffers you have to store your undo log in the form of an edit list or something.
(If you just keep your buffer in a Python string, you incur the 4.1-millisecond cost every time you insert a character near the beginning of the 12-MB string, possibly twice, and you need enough memory for multiple copies of the buffer. The gap buffer saves you this time cost except when you're moving the insertion point or you run out of gap, and keeps your space cost limited to slightly more than enough to hold the text itself.)
In a garbage-collected language on a modern machine (meaning, lots of RAM) I'd be tempted to use ropes, since they make undo trivial. Here's an implementation of ropes I wrote in 45 minutes and 45 lines of code just now, which seems to more or less work, though I'm sure I'm overlooking a few things:
#!/usr/bin/python3
from collections import namedtuple
from functools import cached_property # 3.8 or later
def as_rope(obj):
return obj if isinstance(obj, Rope) else Leaf(obj)
class Rope:
def __add__(self, other):
return Concat(self, as_rope(other))
def __radd__(self, other):
return as_rope(other) + self
def __str__(self):
return ''.join(self.walk())
class Leaf(Rope, namedtuple("Leaf", ('s',))):
def __getitem__(self, slice):
return Leaf(self.s[slice])
def __len__(self):
return len(self.s)
def walk(self):
yield self.s
class Concat(Rope, namedtuple("Concat", ('a', 'b'))):
def __len__(self):
return self.len
@cached_property
def len(self):
return len(self.a) + len(self.b)
def walk(self):
yield from self.a.walk()
yield from self.b.walk()
def __getitem__(self, sl):
if sl.start is None:
sl = slice(0, sl.stop)
if sl.stop is None:
sl = slice(sl.start, len(self))
if sl.start < 0:
sl = slice(len(self) + sl.start, sl.stop)
if sl.stop < 0:
sl = slice(sl.start, len(self) + sl.stop)
# Important special case to stop recursion:
if sl.start == 0 and sl.stop == len(self):
return self
a_len = len(self.a)
if sl.start >= a_len:
return self.b[sl.start - a_len : sl.stop - a_len]
if sl.stop <= a_len:
return self.a[sl]
# At this point we know we need part of a and part of b.
# Since slicing a leaf creates a Rope, we can blithely just do this:
result = self.a[sl.start:] + self.b[:sl.stop - a_len]
# Avoid making Concat nodes for lots of tiny leaves:
return Leaf(str(result)) if len(result) < 32 else result
So then inserting a character, for example, is buf[:point] + c + buf[point:], which just creates a couple of Concat nodes and a Leaf to hold c (plus slicing existing nodes if necessary); this takes about 30 microseconds, which is 150 times faster than the brute-force string-copying solution for a 12-megabyte buffer. You need slightly more code than this for a robust editor because your trees get unbalanced, a problem with many well-known solutions.
Syntax highlighting, markers that move when text is inserted, etc., do imply more effort, but, again, there are well-known solutions to these problems. And an adequate redisplay is no longer the major research effort that it was 37 years ago.
So, while it's true that the data structures you need for a reliably responsive text editor aren't trivially available in Python, or Lua, or JS, I think you can just write them. I wrote the above code at about 6 words per minute because there were a lot of things about ropes (and about Python) that I wasn't clear about and had to figure out in the process; surely I could have written it 2-5 times as fast if I knew what I were doing.
If you're interested in learning about this stuff, Finseth's book is the classic, and Raph Levien has blogged a bunch of really nice up-to-date stuff about his rope-based editor Xi, which he wrote in Rust.