No bounds checking for object instance arrays: an exploitable compiler feature?
Phil Pilgrim (PhiPi)
Posts: 23,514
While coming up with pathological cases to test CLEAN on, I made a discovery regarding subscripted object instances: neither the compiler nor the interpreter does any bounds checking on the subscripts. In fact, you can subscript an object instance that's not even part of an array. Here's some code to illustrate the point:
''Top object:

CON

  _clkmode = xtal1 + pll16x
  _xinfreq = 5_000_000

OBJ

  test0 : "test_obj0"
  test1 : "test_obj1"

PUB Start

  test0[1].start

''__________
''test_obj0:

PUB start

  dira[0]~~
  repeat
    outa[0]~~
    outa[0]~

''__________
''test_obj1:

PUB stop

  dira[1]~~
  repeat
    outa[1]~~
    outa[1]~
Any idea which method gets called by test0[1].start? If you guessed test1.stop, you'd be right. This is (apparently) because test1.stop is the first public routine in test_obj1, and test1 is the next object after test0 in the OBJ list.
This is a compiler "feature" which could raise all sorts of havoc. But is it also exploitable? Initially, I thought it might be, as a way to provide a universal character output routine, something like this:
CON

  _clkmode = xtal1 + pll16x
  _xinfreq = 5_000_000

  SER   = 0
  TV    = 1
  SYNTH = 2

OBJ

  serout : "myserial"
  tvout  : "mytvtext"
  synout : "myvoicesynth"

PUB CharOut(device, straddr)

  serout[device].str(straddr)
But, really, this is no better — and a lot riskier — than:
CON

  _clkmode = xtal1 + pll16x
  _xinfreq = 5_000_000

  SER   = 0
  TV    = 1
  SYNTH = 2

OBJ

  serout : "myserial"
  tvout  : "mytvtext"
  synout : "myvoicesynth"

PUB CharOut(device, straddr)

  case device
    SER   : serout.str(straddr)
    TV    : tvout.str(straddr)
    SYNTH : synout.str(straddr)
So, no useful tricks here, but certainly a possible trap.
-Phil
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
'Still some PropSTICK Kit bare PCBs left!
Comments
Might be useful for switching back and forth between TV and VGA drivers...
As I illustrated above, the CASE construct would be much more robust for that.
-Phil
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
I would experiment, to find this answer, but you likely already know.
Given 3 objs all with the same number of methods, would using an
index as you mention select a "method number" or just some offset?
tvtext.spin: defined as
PUB start
PUB stop
PUB str
PUB out
PUB setcolors
vgatext.spin: defined as
PUB start
PUB stop
PUB str
PUB out
PUB setcolors
OBJ
  tv  : "tvtext"
  vga : "vgatext"
tv[0].start ' calls tvtext.start ?
tv[1].start ' calls vgatext.start ?
tv[0].out   ' calls tvtext.out ?
tv[1].out   ' calls vgatext.out ?
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Yes to all the questions in your code comments. The reason tv[1].start calls vgatext.start isn't because the methods have the same name, but because vgatext.start has the same relative index of occurrence (among the public methods) in vgatext.spin as tvtext.start has in tvtext.spin. If you were to shuffle the order of these public methods in one of the files, you wouldn't get what you want. You would also get screwed up if the two routines required a different number of parameters. (I haven't verified the latter; but since the number of parameters is set at compile time, I'd be stunned if it weren't true.)
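To make the ordering point concrete, here is a minimal sketch of the failure case, assuming the tvtext/vgatext layout from the question above but with start and stop swapped in one file. Method bodies and parameter lists are omitted, as in the question, and the top-level method name Main is made up for illustration.

```spin
'' tvtext.spin (bodies omitted)
PUB start
PUB stop
PUB out

''__________
'' vgatext.spin (same methods, but start and stop swapped)
PUB stop
PUB start
PUB out

''__________
'' Top object
OBJ
  tv  : "tvtext"
  vga : "vgatext"

PUB Main
  tv[0].start   ' first PUB in tvtext.spin: start
  tv[1].start   ' first PUB in vgatext.spin: stop, not start
  tv[1].out     ' third PUB in both files, so this still lands on vgatext.out
```

The call is resolved by position, so the only methods that stay interchangeable are the ones that sit at the same place in both files.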
-Phil
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Yup, makes sense. I kind of suspected numbering rather than naming, as you've described, having mentioned "method number" before. I have had my nose deep in object reconstruction studies all day today. I wonder how much abuse we can give this thing without creating a problem for such "special coding" going forward.
So, what this means is significant.
Concrete experiments should be done to thoroughly explore and describe this "feature". Unfortunately, I'll not be able to contribute too much on this for a few weeks because of family needs as of tomorrow morning. I'll still check in as a "casual observer" though.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
I'll probably just drop it. It would play immense havoc with my CLEAN program, which I feel is a more robust approach to things like BIOSes. Abusing object indices is an interesting diversion which may work fine if done carefully, but it's too brittle for serious application development. OTOH, it does provide some tantalizing possibilities for code obfuscation!
-Phil
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
I've grown to like Stevenmess2004's DOL model, and am researching alternatives there. It's more of an O/S approach, but the idea does not demand eating up precious prop memory with unused modules. As a run-time solution, it appears to be expandable in a linux-like driver way.
Your work with CLEAN will go a long way to solving memory issues for many users and is akin to finding round pegs for round holes. DOL is kind of an elliptical peg it seems, but holds promise in the "everything to everyone" at run-time arena.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
In terms of a BIOS, if every device driver object had a standard set of functions in the source in the same order with the same number of parameters, it would be very easy to just switch between devices in the device routing object with ...
That's a lot more lightweight than the Case selection. There'd be a need to map which device the user routed to to an index and probably necessary to add a Null device as well to stop code crashing if a non-existent device were selected ( and re-directing output to Null is reasonable anyway ).
By having a "PUB Identify" in every device driver, the router can dynamically determine what devices there are. As long as the devices are in sequence with Null as last, it can run through device[n].Identify to see what there is when it first starts or dynamically ...
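A rough sketch of the Null-device idea, not the routing code elided above: it reuses the object names from Phil's CharOut example and assumes every driver exposes str at the same public-method position with one parameter. The "mynull" driver, the DEV_COUNT constant, and the Route method are hypothetical names added for illustration.

```spin
CON
  DEV_COUNT = 4             ' hypothetical: number of entries in the OBJ list

OBJ
  serout  : "myserial"      ' index 0
  tvout   : "mytvtext"      ' index 1
  synout  : "myvoicesynth"  ' index 2
  nullout : "mynull"        ' index 3, hypothetical do-nothing driver, kept last

PUB Route(n, straddr)
  if n < 0 or n => DEV_COUNT
    n := DEV_COUNT - 1      ' out-of-range requests are redirected to the Null device
  serout[n].str(straddr)    ' indexed call: picks the nth object after serout
```

The range test costs a couple of comparisons per call, which is the price of not running off the end of the OBJ list.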
Your approach makes the technique more robust than I thought it could be. For me, it provides a way to deal with an OBJ list that CLEAN may have pared somewhat. If I use it, I'll have to find a way to identify which ones to keep. This may be as easy as using the CASE construct in the open method (to trigger CLEAN) and returning an index into the object list as the filehandle. One advantage the CASE construct conferred to the I/O routines was inherent bounds checking on the filehandle. This could be added to the indexed methods, too, but at the expense of a little extra overhead.
-Phil
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Another idea for the object indexing method would be to start with empty driver objects to get through compilation, then post-compilation inject the required objects into the image and fix up the related pointer tables.
Just thinking out loud, no thought given to the pros and cons.
How would you handle multiple instances of the same object? I'm thinking about a serial I/O object that can be instantiated several times to accommodate multiple serial channels. Your identify method won't be instance-sensitive, but maybe it could return a null value if the channel is already being used.
-Phil
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
The deviceCode could be split into parts, msb's indicate the type of device, lsb's the instance of it / pin number used. The call into deviceDriver done on the msb's index only, lsb used by the driver itself.
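As a sketch of that split, assuming (arbitrarily) that the low 4 bits carry the instance or pin number and everything above them is the device type; the constant and method names here are made up for illustration.

```spin
CON
  INSTANCE_BITS = 4                        ' hypothetical width of the instance/pin field
  INSTANCE_MASK = (1 << INSTANCE_BITS) - 1

PUB DeviceType(deviceCode) : index
  index := deviceCode >> INSTANCE_BITS     ' msb's: index into the OBJ list

PUB DeviceInstance(deviceCode) : instance
  instance := deviceCode & INSTANCE_MASK   ' lsb's: handed to the driver itself
```

With this layout a deviceCode of $23 would mean device type 2, instance 3. The router would only ever use DeviceType for the object subscript and pass DeviceInstance through to the driver.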
In my Basic Interpreter I went for a fileHandle array [0..N] which indicated which deviceCode had been opened, and that also had to start tying in which pins had been allocated to what, whether Input, Output or Both ( similar with file names for the SD Card driver ) and it all got very complicated very quickly, and I never did get that part finished.
I wanted to have a high-level ( interpreted command ) like ...
tv = Open "TV,PAL,INTERLACE"
sin = Open "KEYB"
sout = Open "COM30,9600,N,8,1"
Then just call Print#tv,"Hello" and Print#sout,"Hello" and have everything directed as required. That should be translatable into raw Spin.