forum issue - In posted code special characters such as < & > etc are changed to
twm47099
Posts: 867
codes such as & lt ;, & amp ;, & gt ; etc. (without the spaces between the & and the letters and the letters and the ; )
This link has C code I posted:
http://forums.parallax.com/discussion/comment/1337700/#Comment_1337700
I notice that now if I enter new code in a code block it works ok (below), but anything that was migrated from the old forums and converted into the <pre> type of code blocks and then back to the current [ code ] type has the problem.
a > b
I've found the same in other posted code. It seems to have changed after the last forum update, in which code blocks were reinstated.
I've tried copying the code into SimpleIDE, but the original operator does not reappear.
Looking further, single quotes are also changed to that format. Note the declaration of "volatile char btstring[] ..." in the link above.
I hope that this can be fixed, because there is a lot of old code in the forum that will no longer work.
Tom
Comments
For example, this reply used to have readable code:
http://forums.parallax.com/discussion/comment/1339056/#Comment_1339056
It is now totally mangled.
Symbols "<", ">", "'" have been turned into their HTML entities "<", ">", "'"
Edit: Bloody hell, see what I mean, even that sentence above is mangled. It should read:
"...HTML entities "& l t ;", "& g t ;", "& # 39"
Without the spaces of course. And without being translated to anything else.
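For anyone who needs to recover a mangled snippet by hand in the meantime, here is a minimal sketch using Python's standard `html.unescape`. The sample line is made up for illustration, it is not taken from any of the linked posts.

```python
import html

# A made-up line as it might appear in a mangled post:
# entity codes where the original symbols used to be.
mangled = "if (a &lt; b &amp;&amp; c &gt; d) ch = &#39;x&#39;;"

# html.unescape turns named and numeric character references
# back into the characters they stand for.
restored = html.unescape(mangled)
print(restored)
```

This only helps if the text was escaped exactly once. If a post went through the conversion twice, it would need to be unescaped twice as well.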
I think I saw elsewhere that "@" was translated to entity or other as well. Can't find it now.
Sadly we see this all over the net nowadays. With such a mess of syntaxes recursively nested inside each other, nobody can ever get the escaping right.
Think about it. The page is written in PHP, it generates HTML syntax as output, and it also has embedded SQL statements. That's 3 different syntaxes in the same document straight off the bat. But now we mix in BB code, which includes code snippets. Up to five syntaxes in one document. Of course many more, because who knows what languages those snippets are in.
Somewhere along the line it is required to filter out or escape dangerous stuff from user input, like malicious SQL query syntax input by the user that could find its way to the database and do untold damage. Or like those <, > in code snippets that could end up being interpreted as HTML, buggering up the page rendering or being used to inject script tags that compromise the user.
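On the SQL side, the usual way to avoid touching the user's text at all is a parameterized query. A small sketch with Python's built-in `sqlite3` follows; the `posts` table and the sample input are invented for the example, and I have no idea how the forum software actually does it.

```python
import sqlite3

# In-memory database with a hypothetical "posts" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)")

# User input that would be dangerous if pasted into the SQL string itself.
user_text = "x'); DROP TABLE posts; --"

# The ? placeholder hands the text over as data, so it is never parsed as SQL
# and does not need to be filtered or rewritten first.
conn.execute("INSERT INTO posts (body) VALUES (?)", (user_text,))

stored = conn.execute("SELECT body FROM posts").fetchone()[0]
print(stored == user_text)
```

The point is that the stored text is byte for byte what the user typed, quotes and all.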
Given that these syntaxes are recursively nested inside each other it's impossible to ever get this filtering and escaping right as you can never keep track of what context you are in when.
It is important to keep track of which context you are in, otherwise you can't tell if a particular string should be escaped or not. For example, if I post a code snippet here containing an SQL query, it should not be tampered with just because it might be a bad query against the Parallax database.
Randomly replacing characters with HTML entities with a regular expression to prevent the user input becoming rendered as HTML is just a disaster. As we see here.
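To show what I mean, here is a rough sketch of the alternative: keep the raw text as typed and escape it exactly once, at the moment it is placed into HTML. The function name `render_code_block` is made up, and this is only my guess at how the visible entity codes could come about, not a description of the forum's real code.

```python
import html

# What the user typed, stored unchanged.
raw = "if (a < b) return 'x';"

def render_code_block(text):
    # Escape exactly once, at the point where the text enters HTML context.
    return "<pre>" + html.escape(text, quote=False) + "</pre>"

once = render_code_block(raw)

# Escaping text that was already escaped is what produces visible entity
# codes: the browser shows "&amp;lt;" to the reader as the literal "&lt;".
already_escaped = html.escape(raw, quote=False)
twice = render_code_block(already_escaped)

print(once)
print(twice)
```

If the database holds the already-escaped version, every later rendering pass risks adding another layer like `twice`.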
This situation only gets worse when we add Unicode into the mix in a vain attempt at internationalization.
Guys, this is your most serious issue. Please move it to top priority.
Everything else is a mere distraction.
Agreed, proper support for Code quoting is a foundation item.
Cosmetic stuff can come later.
There are two parts of Code Support
* New posts should simply work properly
* Old posts of code, should continue to appear as code
(or at least not break into pieces.)
I've noticed even quote-context has not migrated well either.
The earlier SW often mangled quotes, needing manual clean-outs, but those manual clean-outs have not been used in test cases, and it makes ported posts harder to follow.
If this is unable to be properly fixed, a workaround would be to colour tint stuff that has been passed thru a script-bot, to alert users the results may be like google translate
Testing in code - oops, I thought they said that would not happen ?
Still, even tho it looks silly, it does cut and paste ok.
How best to remove the automatic linking-to-profile connections when the @ symbol is used is still being investigated.
Now what worries me is that the text it gave me to edit was the mangled version, not what I typed in originally.
That implies to me that the original source text that I typed in is not stored in the database but rather the mangled version.
Surely that makes it impossible for Parallax to ever fix all that broken code automatically?
Tom, how's the code look now?
- ie all leading spaces are removed.
That will break languages that rely on indent
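A quick illustration of why stripped indents matter, using Python itself as the indent-sensitive language. The two-line snippet and the helper name `compiles` are just for the demonstration.

```python
# A tiny indent-sensitive snippet, as originally posted.
indented = "if True:\n    x = 1\n"

# The same snippet after every line has lost its leading spaces.
stripped = "\n".join(line.lstrip() for line in indented.splitlines()) + "\n"

def compiles(source):
    # Return True if the source parses, False if the parser rejects it.
    try:
        compile(source, "<post>", "exec")
        return True
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False

print(compiles(indented), compiles(stripped))
```

Worse than a syntax error is the case where the stripped code still parses but the block structure has silently changed.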
More murky I see is the intermediate code, new forum, but before BBCode.
Users had different ways of doing that & the script-bots likely cannot manage all of that code.
Page-in-question
Indents are now visible
The operators are fixed, and I see the indents are fixed.
But my print statements show up as:
printi("x = %d
", x2);
It appears that wherever I had a "\n" (line feed) it removed the \n and broke the line. There should be a \n after the %d in the print statement above, and it should have been on 1 line.
I'm not sure what it will do with the other print "\" escape sequences.
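My guess at what is going on, as a small Python sketch: some step treats the two characters backslash and n as a real line feed. Both the faulty step and the repair below are assumptions for illustration, and the repair is a heuristic that only covers a line feed sitting directly before a closing double quote.

```python
# The statement as originally written: backslash followed by n, two characters.
original = r'printi("x = %d\n", x2);'

# Hypothetical faulty step: the two-character sequence becomes a real line feed.
mangled = original.replace("\\n", "\n")

# Best-effort repair for this one pattern: a line feed immediately before
# a closing double quote is turned back into the two characters \n.
repaired = mangled.replace("\n\"", "\\n\"")

print(mangled)
print(repaired)
```

Other escape sequences such as \t or \\ would need the same kind of check before trusting any migrated code.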
Tom
I'll see what we can do.