Question for Chip re ROM code
Cluso99
Posts: 18,069
Chip,
Would it be possible for you to post the ROM data files for the Font, Sin, Log, etc.? You have already posted the interpreter, booter and runner, thanks.
How did you build them (instructions)?
I am trying to rebuild the whole ROM file without scrambling and then remove the scrambling from the FPGA source.
I presume I change this
wire [31:0] ramq = !rd ? mem_q : {mem_q[03], mem_q[07], mem_q[21], mem_q[12],   // unscramble rom data if cog loading
                                  mem_q[06], mem_q[19], mem_q[04], mem_q[17],
                                  mem_q[20], mem_q[15], mem_q[08], mem_q[11],
                                  mem_q[00], mem_q[14], mem_q[30], mem_q[01],
                                  mem_q[23], mem_q[31], mem_q[16], mem_q[05],
                                  mem_q[09], mem_q[18], mem_q[25], mem_q[02],
                                  mem_q[28], mem_q[22], mem_q[13], mem_q[27],
                                  mem_q[29], mem_q[24], mem_q[26], mem_q[10]};
to this...
wire [31:0] ramq = !rd ? mem_q[31:0];
Then I can start to change the booter and interpreter.
Thanks Chip
Comments
The font, though, is in a simple graphics file. Some Delphi code assembles the binary data from the image. I can post those things, too.
That line would become:
wire [31:0] ramq = mem_q[31:0];
The ? ternary operator picks the first or second option, separated by a colon, based on whether the value preceding the ? was true or false. In this case you wouldn't be making any decision, but just doing an assignment.
It would be just as easy for me to dump the ROM from a P1 to get everything from $8000..$F003.
How did you build the interpreter, booter and runner so they began at $F004? Did you just fill longs up to $F004 in PropTool?
Since the first option for the wire.... was just
wire [31:0] ramq = !rd ? mem_q : {mem_q[03],....
would this also be correct...
wire [31:0] ramq = mem_q;
You need to put a 'PUB somename' at the top of the file and compile it (Alt-L). There you will find the data of interest nestled into the framework of a Spin image. You can do something like 'LONG $77777777' before and after the area of interest so you'll be able to easily see where it is.
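The marker trick above can be sketched on the PC side as follows. This is a minimal illustration, not the tool used in the thread: the function name extract_between_markers is hypothetical, and it assumes the compiled image has been saved as a binary file, that longs are stored little-endian, and that the value $77777777 appears nowhere else in the image.

```python
# Sketch: pull out the bytes sitting between two 'LONG $77777777' markers
# in a compiled binary image. Names here are hypothetical.

MARKER = (0x77777777).to_bytes(4, "little")  # the marker long as raw bytes


def extract_between_markers(image, marker=MARKER):
    """Return the bytes between the first and the last marker in image."""
    first = image.find(marker)
    last = image.rfind(marker)
    # Both markers must exist and must not overlap each other.
    if first < 0 or last < first + len(marker):
        raise ValueError("need two separate marker longs in the image")
    return image[first + len(marker):last]
```

For example, calling extract_between_markers(open("image.binary", "rb").read()) would return only the data of interest, where "image.binary" stands for whatever file name the compiled image was saved under.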
Here is the code from Delphi that installed the various sections of ROM into the final image:
And here is the font.bmp file:
font.bmp
Last night I wrote a spin program to dump the bytes from Rom in DAT format for compilation. I also have the unscrambling working in this as well. I just need to find out which parts need to be unscrambled. I can work out what sections need to be unscrambled - interpreter and possibly booter and runner.
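For comparison, the same kind of DAT-format dump can be produced on a PC from a binary ROM image. The sketch below is not the Spin program mentioned above; the function name to_spin_dat, the label and the layout of eight longs per line are assumptions made for illustration, and the bytes are assumed to be little-endian longs.

```python
import struct


def to_spin_dat(data, label="rom", per_line=8):
    """Format little-endian longs as Spin DAT 'long' lines (hypothetical layout)."""
    if len(data) % 4:
        raise ValueError("data length must be a multiple of 4 bytes")
    words = [w for (w,) in struct.iter_unpack("<I", data)]
    lines = ["DAT"]
    for i in range(0, len(words), per_line):
        prefix = label if i == 0 else ""          # label only on the first line
        chunk = ", ".join(f"${w:08X}" for w in words[i:i + per_line])
        lines.append(f"{prefix:<8} long    {chunk}")
    return "\n".join(lines) + "\n"
```

The resulting text can be pasted into a .spin file and compiled like any other DAT block.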
Licensing idea...
Could the forum rules be modified to be MIT licensed by default unless clearly stated otherwise as GPL3? An exception shall be Verilog code for the P1 and P2, which shall be GPL3 unless clearly stated otherwise.
We're working on some policy change. Ken said it will be done soon.
About what to unscramble: It's just the interpreter and booter code. The runner code is Spin byte codes. Only program code loaded into a cog from ROM gets unscrambled. So, you could unscramble those sections (interpreter and booter) and then reduce that big ? ternary expression to just a straight assignment.
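As a rough model of that unscrambling step, the bit order from the ramq expression quoted in the question can be applied in software to the interpreter and booter longs. In a Verilog concatenation the first element is the most significant bit, so ramq[31] takes mem_q[03] and ramq[0] takes mem_q[10]. The function names, the little-endian byte order and the byte offsets passed to unscramble_section are assumptions for illustration.

```python
import struct

# Source bit of mem_q for each output bit, listed from ramq[31] down to ramq[0],
# copied from the concatenation in the quoted Verilog.
SRC_BITS = [3, 7, 21, 12, 6, 19, 4, 17, 20, 15, 8, 11, 0, 14, 30, 1,
            23, 31, 16, 5, 9, 18, 25, 2, 28, 22, 13, 27, 29, 24, 26, 10]


def unscramble_long(word):
    """Software equivalent of the rd branch: permute the bits of one long."""
    out = 0
    for i, src in enumerate(SRC_BITS):
        out |= ((word >> src) & 1) << (31 - i)
    return out


def scramble_long(word):
    """Inverse permutation, useful for checking a round trip."""
    out = 0
    for i, src in enumerate(SRC_BITS):
        out |= ((word >> (31 - i)) & 1) << src
    return out


def unscramble_section(image, start, end):
    """Return a copy of image with the longs in byte range [start, end) unscrambled."""
    if (end - start) % 4:
        raise ValueError("section length must be a multiple of 4 bytes")
    out = bytearray(image)
    for off in range(start, end, 4):
        (word,) = struct.unpack_from("<I", out, off)
        struct.pack_into("<I", out, off, unscramble_long(word))
    return bytes(out)
```

Once the interpreter and booter sections of the ROM image have been run through unscramble_section, the hardware no longer needs to permute anything and the wire can become a plain assignment.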
Since you have been changing the pnut compiler for P2, would it be easy for you to release a P1 version with the hub RAM limitation lifted to 64KB or more? I am not asking for the orgh etc., just to remove the 32KB hub limitation. I presume Roy will do this for openspin but I haven't started to use it yet.
There are some Spin address limitations which limit it to 32KB. I can't remember where these limitations are, at the moment, but I know they are hard and fast. As soon as you try to exceed them, you open a Pandora's box. The trick may be designing some enhancements which don't break the current tools. Even the booter expects a Spin image. It's a showstopper to start changing it all around. In hindsight, it would have been good to design things so that they were more easily expanded. At the time, though, I figured it would be a miracle enough if I got it all working, at all, so I exploited the limitations to make things small. When I started working on the Prop2 Spin, there were many really deep changes.
Delphi is great. :cool: I'm having a lot of fun with the latest Delphi version (XE6) and making apps for Android and iOS. Hopefully, I can do a few Propeller chip projects with an Android interface later.
Re:Here is the code from Delphi that installed the various sections of ROM into the final image:
Thanks! Chip. It's very interesting. I'll try it with XE6 sometime.
I was amazed at how you managed to squeeze all that interpreter code into 496 longs.
I don't recall any 32KB limitations but thanks for the heads up.
I don't think any user code calls the booter, but user code does call the interpreter. Therefore, maintaining this entry point is important.
Anyway, once it is soft and unscrambled, we can play and experiment with it.
For a P1.5 version I would expect that the encryption from P2 would be nice, perhaps even mandatory. But that can be decided later.
More important for you to get back to P2 asap.
Another question while we have time...
I have been thinking that for hubexec, the existing JMPRET instruction could be absolute provided it is currently executing from cog, and relative if it is currently executing from hub (i.e. PC > $1FF, which would be PC[x:9] <> 0 and an easy test). Obviously the compiler would need to know, but this could be discussed later. I would need to think about the return address, but the goto address would be fine.
Any thoughts?
I was thinking about the same thing and I recall that there was some complication to it. Maybe it had to do with branching between cog and hub code.
I think I am going to play with this first. I will make one of my cogs 4KB instead of 2KB and test that out using the existing JMPRET as relative for extended cog addresses.
Here are the ROM files created from the P8X32A. I would be pleased if someone could do a cursory check with the Delphi files as I know nothing about Delphi.
My main concern is to ensure I have the bytes in the correct order. FYI I have checked the starting and ending bytes for the Interpreter, booter and runner. I have included the Spin program that I used to dump the various ROM sections.
P8X32A_ROM_DUMP_20140812.zip
There are some holes in the ROM that have data and I don't know what they are for, if anything. I have posted those (unscrambled) in the code window.
Just to bump this in case you missed it.
In my last post, I asked about the other code in the Rom. Obviously there is a copyright message. Are the others just random or are they used for something?
There is the Lazarus. It is Delphi compatible. It is free. BST was made with it
I'm curious how much logic the silicon un-scramble was costing?
59 LEs
43 Combination
2 Registers
Sorry this took me so long. I had to go back into the Delphi files to drum up this code. Here is an expanded set of routines which will answer your questions:
Everything makes sense now. I can create a complete clean unscrambled ROM file(s) in spin from a P1.
The reason I ask is that on a DE-0 nano (with its effectively usable 66kB of block RAM) I'd expect for some applications it might be nice to be able to allocate 48kB for hub SRAM, 16kB for COGRAM and the final 2kB for the SPIN interpreter launched at bootup. The problem with this is if "run" is ever executed it seems there needs to be this runner bytecode located in memory somewhere as well. That pushes us just past the 2kB limit if we want to fit everything in that 2kB block. I'm thinking the original loader code may not be needed if we can boot directly from FPGA RAM already pre-initialized at startup, so we are getting very close to the 2kB target range except for this runner.src thing. When is it used?
Interpreter $F004-F7A3
Booter $F800-FB93
Runner $FF70-FFFF
There is also a copyright message and web address which can be shrunk but should remain.
The booter is a pasm program and runs first. That is what determines whether to download new code into hub ram (and possibly program to eeprom), or load from eeprom if valid, else stop.
The runner is (I think) called by the interpreter when you start a new spin program.
So currently at least, you must have 4KB devoted to ROM.
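The sizes behind that conclusion can be checked with a few lines of arithmetic. The sketch below takes the three ranges listed above and assumes (this is not stated in the thread) that both ends of each range are inclusive byte addresses; the names are hypothetical.

```python
# ROM sections as (first address, last address), taken from the list above.
# Treating the end addresses as inclusive is an assumption.
SECTIONS = {
    "interpreter": (0xF004, 0xF7A3),
    "booter": (0xF800, 0xFB93),
    "runner": (0xFF70, 0xFFFF),
}


def section_sizes(sections=SECTIONS):
    """Return the size in bytes of each section."""
    return {name: last - first + 1 for name, (first, last) in sections.items()}


sizes = section_sizes()
for name, n in sizes.items():
    print(f"{name}: {n} bytes ({n // 4} longs)")
print("interpreter + runner:", sizes["interpreter"] + sizes["runner"], "bytes")
```

Under that assumption the interpreter is 1952 bytes, the booter 916 bytes and the runner 144 bytes, so the interpreter plus runner come to 2096 bytes, which is 48 bytes over a 2KB block.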
BTW I have tried to increase the hub RAM to 50KB and it fails.
Thanks!
I just tried it and got your result. Previously I was incorrectly using a function in another object and getting a different result from this. The first two lines below in "start" were how I was testing it and it wasn't doing the right thing for me, while the third line worked. I don't tend to run SPIN objects, typically just the first one in COG 0 and I usually spawn PASM COGs instead, so I am not so familiar with how it works.
As for reset, I cannot see why it wouldn't always call the booter.
Why do you feel the need for more than 44KB of hub plus 4KB ROM in the emulation?
It seems the spare 2kB could only work if there is no runner bytecode called ie. no spin spawning other spin cogs, and that's too limiting. A customized interpreter with runner code held elsewhere in low memory may possibly allow it, but it won't be compatible with standard setups. Maybe some LEs could be used to hold the 24 longs worth of bytecode stored near $FFFF, but that gets fairly wasteful of LEs.
PS. I wasn't planning on moving the interpreter btw, it would want to stay at $F004 where it normally lives.
If it is not simple/possible to get the spare 2kB to fit with the interpreter it's probably not worth pursuing the idea. I have another idea for that 2kB now anyway, as a palette RAM...
I don't think you understand me. I found that you can only configure a total of 64KB, not 66KB. I don't know why, but it fails.
You are going to require 4KB of hub from $F000-$FFFF from what I can see.
While you may get away with putting the booter into cog, the interpreter and runner combined are > 2KB by a small amount.
What I have currently is all 48KB of hub RAM, with the 32-48KB being remapped as 48-64KB as well. ie it is double mapped.
This way I can preload the whole 16KB of ROM (which also appears in 32-48KB) and then overwrite what I don't want as RAM.
This is great for testing, but ultimately I don't think this can work in real silicon.
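One way to picture the double mapping described above is as an address decode in which anything in the 48-64KB window lands on the same physical RAM as 32-48KB. This is only a software model of that description, not the Verilog used in the thread, and the names are hypothetical.

```python
HUB_RAM_BYTES = 48 * 1024   # physical hub RAM in this configuration
ALIAS_BASE = 0xC000         # start of the 48-64KB window
ALIAS_OFFSET = 0x4000       # 16KB back to the 32-48KB block


def physical_address(addr):
    """Map a 16-bit hub address onto the 48KB of physical RAM."""
    addr &= 0xFFFF
    if addr >= ALIAS_BASE:
        return addr - ALIAS_OFFSET   # 48-64KB aliases 32-48KB
    return addr
```

In this model the interpreter entry point at $F004 and address $B004 refer to the same physical location, which is why preloaded ROM contents show up in both windows.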
I have spaces and comments beginning with // and they seem fine.
The file I am generating is just output from Spin on a P1 to the terminal, which I cut and paste into PropTool and save as a *.spin file.
The Verilog is loading the file into the ROM section without complaining. Hoping I can test that it's working correctly over the weekend.
I didn't get time to post the code and file before I retired for the evening, so I will do that in the morning.
BTW I have found that you can check the syntax by just compiling/checking a single file. Saves heaps of time. So I am gradually finding my way around both Quartus and Verilog.
Well actually that would be ideal, but to date Cluso has found issues with being able to use that last 2kB of the 66kB total I was originally talking about. I don't know why it wouldn't be usable memory, but it seems he wasn't able to make it work as extra hub RAM (at least not yet). Maybe something else in the P1 design stops us from using it; from what I can tell in the data sheets, it's supposed to be there inside the Cyclone IV part.
The palette RAM reuse idea could still work in the loader part of the top 4kB of hub memory. It needs to be a dual ported RAM though as my future SDRAM/gfx controller will need to read from it at high speed independently from the prop core. It only needs to be 256 x 18 bits/24 bits wide, which is only taking 1kB so it wouldn't necessarily clobber the runner.src byte code.
This is a good idea, and you could actually FPGA-Boot-preload all 1~8 COGS and any of the HUB using $readmemh.
That could be useful for final FPGA field deployments, as it saves a boot step, but is less suited to any eventual ASIC.
Better may be a dual approach,
a) * Support pre-load of all memory areas, using Verilog $readmemh options.
b) * Develop a minimal P2-like tiny boot ROM, that flips in on Prop-RESET, and then removes itself.
As in P2, the sole purpose of this is to locally load and run more code. This code could have i2c and SPI choices.
a) is suited to FPGA and especially FLASH based FPGA/CPLD, and b) is closer to real silicon in P1 and P2.
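For option a), a memory file for $readmemh is plain text: hex values separated by whitespace, with // comments allowed. A minimal sketch of generating one from a binary image follows. It assumes the target memory is declared 32 bits wide and that the image holds little-endian longs; the function name to_readmemh is hypothetical.

```python
import struct


def to_readmemh(data, comment=None):
    """Render little-endian longs as one 8-digit hex word per line."""
    if len(data) % 4:
        raise ValueError("data length must be a multiple of 4 bytes")
    lines = []
    if comment:
        lines.append(f"// {comment}")      # $readmemh accepts // comments
    for (word,) in struct.iter_unpack("<I", data):
        lines.append(f"{word:08x}")
    return "\n".join(lines) + "\n"
```

The returned text can be written to a file and named in a $readmemh call inside an initial block for the hub or cog memory being preloaded.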