I have never thought much about writing a text editor; it sounds like one of those horribly messy things I'd rather not deal with. On one project where we needed such a thing we all volunteered another guy to do it.
But I do recall reading something by a guy who claimed to have had an epiphany regarding building an editor when the following idea occurred to him:
1) Keep all the text in some region of RAM reserved for file editing. We'll ignore the problem of editing text that is bigger than will fit in RAM for now.
2) Keep all the text before the cursor position/insertion point at the bottom of this RAM space.
3) Keep all the text after the cursor position as high as possible in the RAM region. You have a hole in the middle.
This means you are always adding to or removing from the end of a string, the "low text", which is easy.
If the user positions the cursor earlier in the line or on previous lines just move the text they moved over up to the "high text" and continue.
Similarly if the user moves the cursor forward, just move text from the high end to the low end and continue.
Possibly this way of managing things is much simpler and takes up less code and data space than managing doubly linked lists or whatever.
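For anyone who wants to play with the idea, here is a minimal sketch of that layout (usually called a gap buffer). The class name `GapBuffer` and its member names are made up for illustration, and it assumes a fixed-size region and one character per cursor step.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical illustration of the "low text / hole / high text" layout.
class GapBuffer {
public:
    explicit GapBuffer(std::size_t capacity)
        : buf_(capacity), gapStart_(0), gapEnd_(capacity) {}

    // Insert at the cursor: append to the low text. Fails once the hole is used up.
    bool insert(char c) {
        if (gapStart_ == gapEnd_) return false;
        buf_[gapStart_++] = c;
        return true;
    }

    // Backspace: drop the last character of the low text.
    bool erase() {
        if (gapStart_ == 0) return false;
        --gapStart_;
        return true;
    }

    // Cursor left: move one character from the low text to the high text.
    bool left() {
        if (gapStart_ == 0) return false;
        buf_[--gapEnd_] = buf_[--gapStart_];
        return true;
    }

    // Cursor right: move one character from the high text to the low text.
    bool right() {
        if (gapEnd_ == buf_.size()) return false;
        buf_[gapStart_++] = buf_[gapEnd_++];
        return true;
    }

    // Whole document: low text followed by high text.
    std::string text() const {
        std::string out(buf_.begin(), buf_.begin() + gapStart_);
        out.append(buf_.begin() + gapEnd_, buf_.end());
        return out;
    }

private:
    std::vector<char> buf_;  // the reserved RAM region
    std::size_t gapStart_;   // one past the end of the low text (the cursor)
    std::size_t gapEnd_;     // first character of the high text
};
```

Typing "helo", stepping left once and inserting 'l' should leave `text()` equal to "hello". Moving the cursor a long way copies one character per step, which is the cost the scheme trades for cheap insertion.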
Very interesting idea. Doesn't make it very easy to display the file though. That is one thing I like very much about organizing it by line in RAM: displaying the content is now very easy. The above plan would make insertion wonderfully simple... until the cursor is moved. It's also highly efficient in terms of RAM usage. But... I'm not convinced (yet) that it is simpler overall.
For now I'll ignore the problems of lines that are longer than the screen width, wrapping lines etc...
Your cursor is at some position on the screen. On that screen line you display the last line in the low-RAM area. That gets you up to the cursor position. Then you display whatever is at the start of the high-RAM area, up to the line terminator. You can find the start of the last line in low RAM by simply scanning backwards from the top of the low text.
Then you continue scanning backwards in low RAM and displaying what you find in the lines above the cursor on the screen. Until the screen is filled to the top.
Also continue scanning forwards in the high-ram and displaying what you find below the cursor on the screen. Until the screen is filled to the bottom.
Of course I may have missed some complications here, and have not thought about long lines and line wrapping etc.
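A rough sketch of that scan, in case it helps. It assumes the low and high text are available as two strings and that lines are terminated by '\n' inside the buffer; the function name `visibleLines` and its parameters are invented for this example, and wrapping is ignored just as in the description above.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns the lines to paint, top to bottom: up to `above` lines before the
// cursor line, the cursor line itself, and up to `below` lines after it.
std::vector<std::string> visibleLines(const std::string& low, const std::string& high,
                                      std::size_t above, std::size_t below) {
    // Scan backwards from the top of the low text to find the first visible line.
    std::size_t start = low.size();
    std::size_t newlines = 0;
    while (start > 0) {
        if (low[start - 1] == '\n' && newlines++ == above) break;
        --start;
    }
    // Scan forwards in the high text to find the end of the last visible line.
    std::size_t end = 0;
    newlines = 0;
    while (end < high.size()) {
        if (high[end] == '\n' && newlines++ == below) break;
        ++end;
    }
    // Split the window low[start..) + high[0..end) at the line terminators.
    std::vector<std::string> lines(1);
    const std::string window = low.substr(start) + high.substr(0, end);
    for (char c : window) {
        if (c == '\n') lines.emplace_back();
        else lines.back() += c;
    }
    return lines;
}
```

The cursor line comes out as one string because the tail of the low text and the head of the high text are simply joined before splitting.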
This is exactly the problem I was imagining. Hard to handle files with lines longer than the screen width. I think that's a rather critical aspect of a text editor, and not worth ignoring, even temporarily.
Hmm... presumably as you scan backwards in the low RAM you can find where the previous line starts and how many characters long it is. Knowing its length, you know how many screen lines it will take up. That tells you where to paint it onto the screen. Keep scanning backwards till you run off the top of the screen.
Displaying the lines after the insertion point just means scanning chars forwards in the high-ram and painting them to the screen until you run off the bottom.
Sounds a bit fiddly to get right but should not be very big in terms of actual code.
This all assumes you are redrawing the screen after every insertion and deletion I guess. Is that a problem?
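The "knowing its length you know how many screen lines it takes" step is just a ceiling division. A tiny sketch, assuming plain character wrapping at the right edge (no word wrap, no tabs); `rowsForLine` is a name made up here.

```cpp
#include <cassert>
#include <cstddef>

// Number of screen rows a line of `length` characters occupies on a display
// that is `width` columns wide. An empty line still takes one row.
std::size_t rowsForLine(std::size_t length, std::size_t width) {
    assert(width > 0);  // a zero-width display makes no sense
    if (length == 0) return 1;
    return (length + width - 1) / width;  // ceiling division
}
```

On a 20-column display, a 21-character line needs 2 rows and a 45-character line needs 3, so while scanning backwards you subtract that many rows from the current screen row before painting.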
PWEdit is definitely going along the lines of vim, with separate insert/normal modes. I'd like some way to distinguish between them which does not take up an entire row of the display. The larger the display, the longer redraws take, so it pays to keep the display small. Also, I've been successfully testing PWEdit on my little 4x20 HD44780 screen and it works! How cool! But it sure wouldn't work very well if 25% of the screen was used up by a status line like in vim.
So: who has an idea for signaling that the user is in "insert" mode without using the bottom row of the terminal?
Usually in text mode, the cursor shape is used to distinguish insert from overwrite mode: a full-character cursor is overwrite, an underline cursor (a bit thicker than a single line) is insert. I think you can define the shape for that kind of display, right?
Alternatively, a timed indication that overlays the last line for a few seconds may be used to indicate the current state when the mode is toggled: something that displays 'INSERT' or 'OVERWRITE' and then disappears, so you don't waste space.
Line endings: any thoughts on how to handle them? Hardcoding \n would be the easiest/smallest code. A boolean option to the constructor for whether or not to insert \r would also be an efficient option. Auto-detecting could be reasonably easy, depending on how "smart" it was. I think if I were going to auto-detect, I would just look at the first line and nothing else. If the file didn't have any newlines, what would the default be?
The way I usually handle text files on input is to assume a \n at the end, and remove both \r and \n for internal storage relying only on the NULL terminator. On output, a \r could be added as an option just before the \n if you want to create a DOS style text file.
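The input half of that approach might look something like the sketch below. `stripLineEnding` is a hypothetical helper name, and it assumes each line has already been read into a `std::string` with its terminator still attached.

```cpp
#include <string>

// Remove a trailing "\n", "\r\n" or "\r" so that only the text itself is stored.
std::string stripLineEnding(std::string line) {
    while (!line.empty() && (line.back() == '\n' || line.back() == '\r')) {
        line.pop_back();
    }
    return line;
}
```

Because both characters are stripped, the same code accepts DOS-style and Unix-style files without knowing in advance which one it was given.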
That's exactly what I'm doing. Glad to know I came up with the same solution as you.
To allow you to edit bigger files, what if you had a swapfile on an SD card or in EEPROM into which you offload edited but unsaved parts of the file? Untouched parts of the file stay in the source file, edited but unsaved parts that don't fit in RAM go in the swapfile, and RAM holds whatever you're currently working on plus whatever else fits.
That guy (the author of PuTTY and a lot of other cool things) wrote a really neat hex editor using the data structure described on that page. It can insert, delete, cut, and paste virtually instantaneously on gigabyte-sized binary files. Saving the file still takes O(n) time, obviously, but everything other than saving and searching takes O(log(n)) time. The only reason it isn't the perfect hex editor is because it has an Emacs interface.
As a basic summary, his format is a B-tree, where each branch node knows how many bytes are under it, to allow efficient seeking. Leaf nodes of the tree are either fixed-size buffers that fill from the bottom up and may be only partly used, or placeholders indicating unloaded parts of the file (these can be bigger than one block). The buffers are in a doubly linked list, for quickly finding neighbors. If you try inserting into a full buffer, it splits the buffer into two around the insertion point first.
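To show why the per-branch byte counts make seeking cheap, here is a heavily simplified sketch. It is not the hex editor's actual code: the `Node` struct, `makeLeaf`, `makeBranch` and `byteAt` are all invented for this post, and it leaves out balancing, the placeholder leaves, buffer splitting and the linked list.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Simplified, hypothetical node: a branch has children, a leaf has a buffer.
struct Node {
    std::size_t bytes = 0;                        // total bytes stored under this node
    std::vector<std::unique_ptr<Node>> children;  // empty for a leaf
    std::string buffer;                           // used only by leaves
};

std::unique_ptr<Node> makeLeaf(std::string data) {
    auto n = std::make_unique<Node>();
    n->bytes = data.size();
    n->buffer = std::move(data);
    return n;
}

std::unique_ptr<Node> makeBranch(std::vector<std::unique_ptr<Node>> kids) {
    auto n = std::make_unique<Node>();
    for (const auto& k : kids) n->bytes += k->bytes;
    n->children = std::move(kids);
    return n;
}

// Find the byte at `offset` by skipping whole subtrees using their byte counts.
char byteAt(const Node& root, std::size_t offset) {
    assert(offset < root.bytes);
    const Node* n = &root;
    while (!n->children.empty()) {
        for (const auto& child : n->children) {
            if (offset < child->bytes) { n = child.get(); break; }
            offset -= child->bytes;  // skip this subtree entirely
        }
    }
    return n->buffer[offset];
}
```

Each level of the tree only looks at the counts of one node's children, so the work is proportional to the tree depth rather than to the file size.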
That option is what I'm asking about though. How should I determine whether or not to add the \r before \n?
The DOS format is \r\n. I don't think anybody uses the reverse order, though there are probably some oddball cases of it. Personally, I wouldn't bother with supporting \n\r.
@Electrodude, I started working on an editor that used a swap file about a year ago, but I never finished it. It uses an 8K cache in RAM, and keeps the rest of the text file in the swap file. When a file is opened the contents of the file are converted to a doubly-linked list and written to the swap file. Only the portion of the file that is currently being edited is kept in RAM. The file is saved by converting the doubly-linked list swap file back to a normal text file terminated with newlines.
Sorry, I said that in a poor way. I simply meant whether or not I should add \r at all. The \n will be hardcoded in as the last character. So I'd have something like:
if (this->insertCarriageReturn) {
    line << '\r';
}
line << '\n';
But the question is how should I set the value "this->insertCarriageReturn"?
I don't understand the problem. You can just use \n without \r. Windows users may have some complaints reading the file with Notepad, but every other editor should be fine with that. Or add a boolean to the class constructor, or to the function that saves the file. I don't see the problem.
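As a sketch of the "boolean on the function that saves the file" variant: the function name `serialize` is made up, and it assumes the lines are stored without terminators. It builds a string only so it is easy to try out; a real save routine would write to the file instead.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Join stored lines into file contents, ending every line with "\n" and
// optionally putting "\r" in front of it.
std::string serialize(const std::vector<std::string>& lines, bool insertCarriageReturn) {
    std::ostringstream out;
    for (const std::string& line : lines) {
        out << line;
        if (insertCarriageReturn) {
            out << '\r';
        }
        out << '\n';
    }
    return out.str();
}
```

Passing the flag per call rather than fixing it in the constructor keeps the choice adjustable at runtime at no extra cost.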
It's not a "problem" per se. Just a question what folks would like. I know I personally do everything without \r, but I also live on Linux where that is never a problem. I also know a lot of (most?) users on this forum are Windows users. So your vote goes to adding a boolean to the constructor? This would make it non-adjustable at runtime. That's probably okay - but I'd love to hear other opinions on here (because obviously we can't all agree, that'd be against forum policy or something)
Been thinking more about this and I like the idea of the last line displaying the current mode whenever the mode is switched and holding that display until the first key is pressed after the mode switch, at which point the last line redraws. Eventually, I can add a shortcut such as "ctrl m" to display the current mode again until the next key press.
I can't think of any good way to use a timer for this that doesn't have one of the following drawbacks: RTC required, extra cog required, 53-second rollover. I'm not okay with any of those significant drawbacks for such a simple requirement.
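The hold-until-next-key behaviour needs no timer at all, only a flag. A possible sketch, with the `ModeBanner` class, its method names and the banner strings all invented for illustration:

```cpp
#include <string>

enum class Mode { Normal, Insert };

// Hypothetical status-line helper: the banner appears on a mode switch and is
// cleared by the next key press, so no timer is involved.
class ModeBanner {
public:
    void onModeSwitch(Mode m) { mode_ = m; visible_ = true; }
    void onKeyPress() { visible_ = false; }
    void showAgain() { visible_ = true; }  // e.g. bound to a shortcut key

    // Text to overlay on the last row, or empty to draw the normal row.
    std::string overlay() const {
        if (!visible_) return "";
        return mode_ == Mode::Insert ? "-- INSERT --" : "-- NORMAL --";
    }

private:
    Mode mode_ = Mode::Normal;
    bool visible_ = false;
};
```

The key that switches modes is itself a key press, so the caller would call `onKeyPress()` first and `onModeSwitch()` afterwards; otherwise the banner would be cleared the moment it appears.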
I see, in that case maybe a key sequence to toggle the line termination, with a default/initial setting in the constructor. Back in the BBS era the full screen editors used a sequence like ^k + <other key> to access advanced options, maybe ^k + ^l toggles the line termination.
You can also try to auto-detect the terminator when the file is loaded and use it as the default setting, if the file is not empty of course.
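A first-line-only detection could be as small as the sketch below. `detectCarriageReturn` is a hypothetical name, and the `fallback` argument stands in for whatever default the constructor was given, which also answers what to do with a file that has no newline at all.

```cpp
#include <cstddef>
#include <string>

// Decide whether to write "\r\n" by looking only at the first line terminator.
// `fallback` is returned when the text contains no '\n' at all.
bool detectCarriageReturn(const std::string& text, bool fallback) {
    const std::size_t pos = text.find('\n');
    if (pos == std::string::npos) return fallback;
    return pos > 0 && text[pos - 1] == '\r';
}
```

Only the first terminator is inspected, so a file with mixed endings silently takes on the style of its first line.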
Life is slowly starting to go back to normal again. I'm back at work on this text editor a little every night. I decided to try mocking some of the PropWare classes and then running PWEdit locally where I'd have the luxury of a faster development cycle and a real debugger. That failed horribly. The terminal used by CLion is not accepting backspace (0x08), but rather just showing a nonsense character. It's also ignoring my attempt to disable echo on std::cin - though at least that part works successfully in Ubuntu's standard terminal emulator. And my "cursor" (a # that is moved around) doesn't actually move when I press the asdw keys, despite PWEdit registering the keys perfectly well and doing the right thing behind the scenes. I'll have to put a bit more work into this later... the idea of desktop development sure is appealing.
Visual Studio checks line endings while loading a file. If they are consistent it just switches to that mode; if not, the user gets asked whether to adjust the line endings to be consistent.
The options are
CRLF or \r\n (Windows)
LF or \n (Linux)
CR or \r (old Mac format)
In the save dialog you can also decide how you like to save the file.
But basically an editor should NEVER force any specific line endings on a user's file. This should definitely be a user decision.
Say I edit .ini files. Without CRLF I will trash my system. The same goes for manually editing email files (.eml) or just creating HTTP requests and responses: you MUST adhere to the RFCs, and there you definitely need CRLF.
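A consistency check in that spirit can be done in one pass over the file. This is only a sketch of the idea, not how Visual Studio does it; the `LineEnding` enum and `classify` function are names made up here.

```cpp
#include <cstddef>
#include <string>

enum class LineEnding { None, LF, CRLF, CR, Mixed };

// Count each kind of terminator and report whether the file is consistent.
LineEnding classify(const std::string& text) {
    std::size_t lf = 0, crlf = 0, cr = 0;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '\r') {
            if (i + 1 < text.size() && text[i + 1] == '\n') {
                ++crlf;
                ++i;  // the '\n' belongs to this CRLF pair
            } else {
                ++cr;
            }
        } else if (text[i] == '\n') {
            ++lf;
        }
    }
    const int kinds = (lf > 0) + (crlf > 0) + (cr > 0);
    if (kinds == 0) return LineEnding::None;
    if (kinds > 1) return LineEnding::Mixed;
    if (lf > 0) return LineEnding::LF;
    return crlf > 0 ? LineEnding::CRLF : LineEnding::CR;
}
```

An editor could keep the detected style when the result is consistent and ask the user only for `Mixed` or `None`, which leaves the decision with the user as argued above.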
Comments
What have I missed here?
I think this makes it pretty easy to display and move around to different parts of the file.
Change the cursor somehow.
You should also read this:
http://www.chiark.greenend.org.uk/~sgtatham/tweak/btree.html
Booooo
Enjoy!
Mike
CR should position the cursor on the first char of the current line and LF should move to a new line.