We plan to move the schematic information *out* of sdspiqasm.spin specifically because Parallax chose to use UTF-16
and that doesn't play well with others. So we'll have schematic.spin (that does nothing) and all our source files will be
ASCII clean.
In the meantime, you're on your own.
By the way, we've adopted mercurial for the fsrw project; the thing I like about mercurial (and its command line tool hg)
is that it is really simple (and it's distributed, which means full history and all your changes are on your box, no matter
what sort of network connection you have). But yes, the other tools should do the job too.
Jazzed: "Non-ASCII files like spin could be committed as binary type"
Ouch, that's not nice. Half the point of a version control system is to be able to compare source of one version against a different version to see what you broke (err.. changed) last. Committing as binary throws that possibility out the window. Better to convert the Spins to ASCII.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
rokicki said...
specifically because Parallax chose to use UTF-16
Actually, I'm not sure it was a deliberate design choice on the part of Parallax. Microsoft chose UTF-16 for Windows unicode handling. My mother tells me if I don't have anything nice to say I should simply not say anything at all.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Missed it by ->" "<- that much!
heater said...
Jazzed: "Non-ASCII files like spin could be committed as binary type"
Ouch, that's not nice. Half the point of a version control system is to be able to compare source of one version against a different version to see what you broke (err.. changed) last. Committing as binary throws that possibility out the window. Better to convert the Spins to ASCII.
Absolutely. Fair argument for just using ASCII. Although you could still use "diff" on the files ... but that would be inconvenient.
rokicki said...
specifically because Parallax chose to use UTF-16
Actually, I'm not sure it was a deliberate design choice on the part of Parallax. Microsoft chose UTF-16 for Windows unicode handling. My mother tells me if I don't have anything nice to say I should simply not say anything at all.
Microsoft chose UTF-16 as an *internal* format for Unicode.
Most Microsoft tools, when serializing Unicode text, write it out in UTF-8 by default.
Many developers think just because the in-memory format should be UTF-16, that the external
format should also be, but that's fallacious.
rokicki said...
specifically because Parallax chose to use UTF-16
Actually, I'm not sure it was a deliberate design choice on the part of Parallax. Microsoft chose UTF-16 for Windows unicode handling. My mother tells me if I don't have anything nice to say I should simply not say anything at all.
Microsoft chose UTF-16 as an *internal* format for Unicode.
I still don't have anything nice to say about them! ;)
rokicki said...
Most Microsoft tools, when serializing Unicode text, write it out in UTF-8 by default.
On that point I can't comment. I must admit to having *never* used a Microsoft development tool outside of GWBasic, and I don't recall that supporting Unicode.
rokicki said...
Many developers think just because the in-memory format should be UTF-16, that the external
format should also be, but that's fallacious.
Agree completely. And it's not as if translating is difficult (although heater has pointed out I appear to have a bug in my UTF-8 writer, so it's obviously not quite as easy as I thought!). I suspect (and this is pure speculation) it's more that the editor component that was used in the IDE has an inbuilt stream reader/writer that generates/reads UTF-16.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Missed it by ->" "<- that much!
I used iconv to convert UTF-16 into UTF-8.
Diff (and hence the RCSs) seemed to work OK with UTF-8.
To force it to ASCII, use -t ASCII//IGNORE
I still maintain UTF-8 is good enough. You will note a diff actually shows the change properly after the conversion.
When I did this I found my UTF-8 was just from the dumb box characters around the file header/footer.
By converting to UTF-8 you are not actually changing anything, so getting an update from the author is easy to merge/diff.
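For anyone without iconv to hand, here is a rough Python sketch of the same two conversions. It is only an illustration: the function names and the sample line are made up, and it assumes the input really is UTF-16 with a BOM at the front. The "ignore" error handler is my stand-in for what -t ASCII//IGNORE appears to do, not a claim about how iconv works internally.

```python
def utf16_to_utf8(data: bytes) -> bytes:
    """Re-encode UTF-16 bytes (BOM expected) as UTF-8. No characters are lost."""
    return data.decode("utf-16").encode("utf-8")


def utf16_to_ascii(data: bytes) -> bytes:
    """Re-encode UTF-16 bytes as ASCII, silently dropping every character
    that has no ASCII form (box-drawing characters, schematic symbols, ...)."""
    return data.decode("utf-16").encode("ascii", errors="ignore")


# Made-up sample: two box-drawing characters followed by plain text.
sample = "\u2500\u2500 R1 10k\n".encode("utf-16")

print(utf16_to_utf8(sample))   # box characters survive as multi-byte UTF-8
print(utf16_to_ascii(sample))  # box characters are dropped
```

To run it on a real file you would read the bytes with open(name, "rb").read() and write the result back out in binary mode. The UTF-8 version round-trips cleanly; the ASCII version throws the schematic characters away for good.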
@mtab:
I only just discovered iconv and have been playing with it a bit.
The first problem I have is with git which I am looking at for both hobby and real work use.
As you say, doing a normal diff on UTF-8 files seems to work reasonably well. However, asking git to do a diff between versions does not work. "git diff" outputs a diff showing all lines of the file removed and then all lines added back. Somewhere in the middle of all that is the one-line change I made, which happened to, say, move a resistor symbol. Hence the desire to convert to ASCII only.
The second problem is that BST can be set to always save files as UTF-8, but when using iconv on the resulting files to convert them to ASCII it fails with an "incorrect encoding" error on input. BradC thinks he has a bug in the UTF-8 output of BST.
So unless there is a magic switch in git to make it work with UTF-8 I'm a bit stuck for now.
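One way to narrow down an "incorrect encoding" complaint like that is to find the first byte that is not valid UTF-8. A minimal Python sketch follows; the helper name is hypothetical and it only reports where decoding fails, not why the file was written that way.

```python
def first_invalid_utf8(data: bytes):
    """Return the byte offset of the first invalid UTF-8 sequence,
    or None if the whole buffer decodes cleanly."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start  # offset where the decoder gave up
    return None


# Made-up samples: one clean, one with a stray 0xFF byte.
good = "R1 \u2500\u2500 10k".encode("utf-8")
bad = b"abc\xffdef"

print(first_invalid_utf8(good))
print(first_invalid_utf8(bad))
```

Feeding it the bytes of a saved file (open(name, "rb").read()) gives an offset you can look up in a hex dump.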
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
bzr diff is fine with UTF-8, it just uses the stock diff.
man git-diff shows there is --color-words, sounds like what you want.
I tried that and it works great, even when my change was converting UTF-8 to ASCII.
I think I could get to like git!
Rambling from before I set up git to try it out...
Apart from git, most command-line RCS tools present their diffs on a line-by-line basis. Character-by-character diffs are usually left for the GUI client to implement.
Pretty sure Meld does character by character. Gvim definitely does a character-by-character diff. There are probably gvim plugins for git to do a diff on the current file. Gvim also understands unified diff output.
Back in the day, using CVS, I would generate a unified diff of the entire tree and display it in gvim. From there you could split open any file of interest and then diff it against the RCS. You got a character-by-character diff that way, one that you can edit or revert. You probably still can't get a tool that is more efficient for patch reviews.
Think I'll be giving git a good try out.
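For what it's worth, the character-by-character idea is easy to try outside of any RCS. Below is a small Python sketch using the standard difflib module. The function name and the sample lines are made up, and it is not how git, Meld or gvim do it internally; it just shows a diff at character granularity instead of whole lines.

```python
import difflib


def char_diff(old: str, new: str):
    """Return (tag, old_piece, new_piece) for every span that differs
    between the two strings, comparing character by character."""
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    return [
        (tag, old[i1:i2], new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # keep only the changed spans
    ]


# Made-up example: one component value changed on an otherwise identical line.
print(char_diff("R1 10k", "R1 22k"))  # [('replace', '10', '22')]
```

A line-based diff would flag the whole line here; the character-level version isolates the two characters that actually changed.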
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
--Steve
Propalyzer: Propeller PC Logic Analyzer
http://forums.parallax.com/showthread.php?p=788230