forum issue - In posted code special characters such as < & > etc are changed to

twm47099 · 2015-08-06 04:58

codes such as & lt ;, & amp ;, & gt ; etc. (without the spaces between the & and the letters and the letters and the ; )

This link has C code I posted:
http://forums.parallax.com/discussion/comment/1337700/#Comment_1337700

I notice that now if I enter new code in a code block it works ok (below), but anything that was migrated from the old forums and converted into the <pre> type of code blocks and then back to the current [ code ] type has the problem.

a > b

I've found the same in other posted code. It seems to have changed after the last forum update, where code blocks were reinstated.
I've tried copying the code into SimpleIDE, but the original operator does not reappear.

Looking further, single quotes are also changed to that format. Note the declaration of "volatile char btstring[] ..." in the link above.
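
As a stopgap for anyone who needs a mangled snippet right away, the entities can be decoded locally before pasting into SimpleIDE. This is only a minimal sketch, assuming Python 3 is available and that the snippet was escaped exactly once; `restore_snippet` is a hypothetical helper name, not anything the forum provides.

```python
import html

def restore_snippet(mangled: str) -> str:
    """Decode HTML entities such as &lt; &gt; &amp; &#39; back to characters."""
    return html.unescape(mangled)

# A made-up line of C as it might appear after the forum conversion.
mangled = "if (a &lt; b &amp;&amp; c &gt; 0) ch = &#39;x&#39;;"
print(restore_snippet(mangled))
```

If the text was escaped more than once, a single pass will leave entities such as & amp ; l t ; behind (without the spaces), so check the result by eye.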

I hope that this can be fixed, because there is a lot of old code in the forum that will no longer work.

Tom

jmg · 2015-08-06 08:19

The mangling seems erratic, and I'm not sure if they can recover all cases of corruption.

eg This reply used to have readable code :

http://forums.parallax.com/discussion/comment/1339056/#Comment_1339056

potatohead · 2015-08-06 09:48

Better left unsaid

Heater. · 2015-08-06 10:22

Yep, my recent code post here: http://forums.parallax.com/discussion/comment/1338711/#Comment_1338711
is totally mangled.

Symbols "<", ">", "'" have been turned into their HTML entities "<", ">", "&#39"

Edit: Bloody hell, see what I mean, even that sentence above is mangled. It should read:
"...HTML entities "& l t ;", "& g t ;", "& # 39"
Without the spaces of course. And without being translated to anything else.

I think I saw elsewhere that "@" was translated to entity or other as well. Can't find it now.

Sadly we see this all over the net nowadays. With such a mess of syntaxes recursively nested inside each other, nobody can ever get the escaping right.

Think about it. The page is written in PHP, it generates HTML syntax as output, it also has embedded SQL statements. That's 3 different syntaxes in the same document straight off the bat. But now we mix in BB code which includes code snippets. Up to five syntaxes in one document. Of course there are many more, because who knows what languages those snippets are in.

Somewhere along the line it is required to filter out or escape dangerous stuff from user input, like malicious SQL query syntax input by the user that could find its way to the database and do untold damage. Or like those <, > in code snippets that could end up being interpreted as HTML, buggering up the page rendering or being used to inject script tags that compromise the user.

Given that these syntaxes are recursively nested inside each other it's impossible to ever get this filtering and escaping right as you can never keep track of what context you are in when.

It is important to keep track of which context you are in, otherwise you can't tell if a particular string should be escaped or not. For example, if I post a code snippet here containing an SQL query, it should not be tampered with just because it might be a bad query on the Parallax database.

Randomly replacing characters with HTML entities with a regular expression to prevent the user input becoming rendered as HTML is just a disaster. As we see here.
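
The usual alternative to blanket replacement is to store the text exactly as typed and escape it once, at the moment it is written into HTML. The sketch below is only an illustration under stated assumptions: it is Python rather than the forum's PHP, `render_post` is a hypothetical function, and the only markup it understands is a `[code]` block.

```python
import html
import re

# Matches one [code]...[/code] block; DOTALL lets the body span lines.
CODE_BLOCK = re.compile(r"\[code\](.*?)\[/code\]", re.DOTALL)

def render_post(raw: str) -> str:
    """Turn the stored raw post into HTML, escaping every piece exactly once."""
    parts = []
    pos = 0
    for match in CODE_BLOCK.finditer(raw):
        # Ordinary text before the code block.
        parts.append(html.escape(raw[pos:match.start()]))
        # Code body: escaped once, wrapped in <pre> to keep whitespace.
        parts.append("<pre>" + html.escape(match.group(1)) + "</pre>")
        pos = match.end()
    parts.append(html.escape(raw[pos:]))
    return "".join(parts)

raw = "Compare: [code]if (a < b && c > 'x')[/code]"
print(render_post(raw))

# Escaping text that is already escaped gives the visible-entity symptom.
print(html.escape(html.escape("a < b")))
```

Because the database still holds the raw text, the edit box can show what was typed, and a browser decodes the single layer of entities back to the original characters.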

This situation only gets worse when we add Unicode into the mix in a vain attempt at internationalization.

Guys, this is your most serious issue. Please move it to top priority.

potatohead · 2015-08-06 10:31

As far as I am concerned, it is the only issue.

Everything else is a mere distraction.

Heater. · 2015-08-06 13:33

Ah yes, of course @blabla gets rendered as a link to user "blabla". How silly.

jmg · 2015-08-06 21:41

potatohead wrote: »

As far as I am concerned, it is the only issue.

Everything else is a mere distraction.

Agreed, proper support for Code quoting is a foundation item.
Cosmetic stuff can come later.

There are two parts of Code Support
* New posts should simply work properly
* Old posts of code, should continue to appear as code
(or at least not break into pieces.)

I've noticed even quote-context has not migrated well either.
The earlier SW often mangled quotes, needing manual clean-outs, but those manual clean-outs have not been used in test cases, and it makes ported posts harder to follow.

If this is unable to be properly fixed, a workaround would be to colour tint stuff that has been passed thru a script-bot, to alert users the results may be like google translate

jmg · 2015-08-06 21:42

Heater. wrote: »

Ah yes, of course @blabla gets rendered as a link to user "blabla". How silly.

Testing in code - oops, I thought they said that would not happen ?

@blabla

Still, even tho it looks silly, it does cut and paste ok.

Courtney Jacobs · 2015-08-06 21:47

Bump focused on removing emoticons from code first, since that changed the appearance of the code blocks entirely.

How best to remove the automatic linking-to-profile connections when the @ symbol is used is still being investigated.
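
One possible approach, sketched here purely as an assumption about how such a filter could work (it is not the forum software's actual code), is to split the post on `[code]` blocks and apply the mention linking only to the text outside them. The `/profile/` URL and the `link_mentions` name are hypothetical.

```python
import re

# The capturing group makes re.split keep the code blocks in the result.
CODE_SPLIT = re.compile(r"(\[code\].*?\[/code\])", re.DOTALL)
MENTION = re.compile(r"@(\w+)")

def link_mentions(text: str) -> str:
    """Link @name mentions, leaving [code]...[/code] blocks untouched."""
    out = []
    for index, piece in enumerate(CODE_SPLIT.split(text)):
        if index % 2 == 1:
            # Odd positions are the captured code blocks: pass through as-is.
            out.append(piece)
        else:
            out.append(MENTION.sub(r'<a href="/profile/\1">@\1</a>', piece))
    return "".join(out)

post = "Thanks @blabla. [code]ptr := @buffer[/code]"
print(link_mentions(post))
```

With this split, an `@` used as an operator inside a code block is never seen by the mention pattern.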

Heater. · 2015-08-06 21:50

I went back and edited my JS code example as I linked to above.

Now what worries me is that the text it gave me to edit was the mangled version, not what I typed in originally.

That implies to me that the original source text that I typed in is not stored in the database but rather the mangled version.

Surely that makes it impossible for Parallax to ever fix all that broken code automatically?
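
To illustrate the worry: if only the converted text is stored, and if some posts were converted while others were not (an assumption, nothing in this thread confirms how the migration worked), then two different originals can end up as the same stored string. A small Python sketch with made-up example posts:

```python
import html

# Post 1: the author typed a C comparison, and it was escaped before storing.
typed_operator = "if (a < b)"
stored_1 = html.escape(typed_operator)

# Post 2: the author really typed the entity (say, in a post about HTML),
# and it was stored without any conversion.
typed_entity = "if (a &lt; b)"
stored_2 = typed_entity

# Both rows now hold identical text, so a repair script cannot tell them apart.
print(stored_1 == stored_2)

# A blanket unescape fixes post 1 but changes post 2 away from what was typed.
print(html.unescape(stored_2) == typed_entity)
```

An automatic fix would therefore need extra information, such as which posts went through which conversion, to be safe.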

Bump · 2015-08-06 22:10

twm47099 wrote: »

I hope that this can be fixed, because there is a lot of old code in the forum that will no longer work.

Tom

Tom, how's the code look now?

jmg · 2015-08-06 22:15

Bump wrote: »

how's the code look now?

I've not got the original to see, but there seems to be indent missing ?
- ie all leading spaces are removed.
That will break languages that rely on indent
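
A quick illustration of that last point, using Python as the indent-sensitive language. The two-line program is made up for the example, and `compiles` is a hypothetical helper.

```python
source = "if x2 > 0:\n    print(x2)\n"

# Simulate a forum filter that drops all leading spaces from every line.
stripped = "\n".join(line.lstrip() for line in source.splitlines()) + "\n"

def compiles(code: str) -> bool:
    """Return True if the text is syntactically valid Python."""
    try:
        compile(code, "<post>", "exec")  # parses only, does not run the code
        return True
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False

print(compiles(source))    # True
print(compiles(stripped))  # False
```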

jmg · 2015-08-06 22:17

Heater. wrote: »

Surely that makes it impossible for Parallax to ever fix all that broken code automatically?

really-old code (earlier forum) they should still have copies of, and it should port correctly, as it was done using code /code

Murkier, as I see it, is the intermediate code: new forum, but before BBCode.
Users had different ways of doing that & the script-bots likely cannot manage all of that code.

Bump · 2015-08-06 22:33

jmg wrote: »

Bump wrote: »

how's the code look now?

I've not got the original to see, but there seems to be indent missing ?
- ie all leading spaces are removed.
That will break languages that rely on indent

I see indents... resubmitting; do you now see indents on this page?
Page-in-question

jmg · 2015-08-06 22:39

Bump wrote: »

I see indents... resubmitting; do you now see indents on this page?
Page-in-question

Indents are now visible

twm47099 · 2015-08-06 22:52

Bump wrote: »

twm47099 wrote: »

I hope that this can be fixed, because there is a lot of old code in the forum that will no longer work.

Tom

Tom, how's the code look now?

The operators are fixed, and I see the indents are fixed.

But my print statements show up as:
printi("x = %d
", x2);

It appears that wherever I had a "\n" (line feed) it removed the \n and broke the line. There should be a \n after the %d in the print statement above, and it should have been on 1 line.
I'm not sure what it will do with the other print "\" escape sequences.
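
One plausible cause, and this is only a guess since nothing here confirms it, is that some conversion step treated the two characters backslash and n as an escape sequence and replaced them with a real line break. A short Python sketch reproduces the symptom:

```python
# The statement as typed: a backslash followed by the letter n.
original = r'printi("x = %d\n", x2);'

# A filter that interprets the escape turns those two characters
# into one real newline character.
mangled = original.replace("\\n", "\n")

print(mangled)
print(mangled.splitlines())
```

If that is what happened, other sequences such as \t or \" would be at risk in the same way.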

Tom

Bump · 2015-08-06 23:02

twm47099 wrote: »

It appears that wherever I had a "\n" (line feed) it removed the \n and broke the line. There should be a \n after the %d in the print statement above, and it should have been on 1 line.
I'm not sure what it will do with the other print "\" escape sequences.

Tom

I'll see what we can do.

jmg · 2015-08-06 23:34

twm47099 wrote: »

But my print statements show up as: (should be a \n after the %d)
printi("x = %d
", x2);

checking in new BBcode

            printi("x = %d\n", x2);
