No, I don't want to touch the Verilog. I just need to know the variable names in the C++ code. I see pin_out, pin_in and pin_dir in the trace file, but the prints don't make much sense to me. I see references to a top_veri__DOT__pin_out element, and I'm guessing that's the variable I'm looking for.
I added a printf to main.cpp that prints top->top_veri__DOT__pin_out and top->top_veri__DOT__pin_dir. I started seeing some activity at line 33030, which corresponds to cycle 264240. That's about 22 ms after startup at 12 MHz. I see values of 0x10000000 and 0x30000000, which match up with the EEPROM on pins 28 and 29. If I get around to it I'll try interfacing my EEPROM simulator to see if I can load a program.
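A minimal sketch of helpers for that kind of printf tracing, assuming the 32-bit pin_out/pin_dir signals come through as plain 32-bit integers in the Verilated model. It turns a mask into pin numbers (0x30000000 is pins 28 and 29) and only reports when the values change. `pins_set`, `describe_pins` and `PinWatcher` are hypothetical names, not part of Verilator or P1V.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// List the pin numbers (0..31) whose bits are set in a port mask.
std::vector<int> pins_set(uint32_t mask) {
    std::vector<int> pins;
    for (int pin = 0; pin < 32; ++pin)
        if (mask & (1u << pin)) pins.push_back(pin);
    return pins;
}

// Render a mask as "0x30000000 -> pins 28 29" for a trace line.
std::string describe_pins(uint32_t mask) {
    char hex[11];  // "0x" + 8 hex digits + NUL
    std::snprintf(hex, sizeof hex, "0x%08X", static_cast<unsigned>(mask));
    std::string text = std::string(hex) + " -> pins";
    for (int pin : pins_set(mask)) text += " " + std::to_string(pin);
    return text;
}

// Remember the previous pin_out/pin_dir pair so the trace only prints on
// a change instead of once per simulated cycle.
struct PinWatcher {
    uint32_t last_out = 0;
    uint32_t last_dir = 0;
    bool changed(uint32_t out, uint32_t dir) {
        bool differs = (out != last_out) || (dir != last_dir);
        last_out = out;
        last_dir = dir;
        return differs;
    }
};
```

In main.cpp the call would look something like `if (w.changed(top->top_veri__DOT__pin_out, top->top_veri__DOT__pin_dir))` followed by a printf of the cycle count and `describe_pins(...).c_str()`, which keeps the output down to the handful of cycles where the EEPROM pins actually move.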
Do we have any performance comparisons between the Verilator-based simulation system and any of the established Propeller simulation tools?
I can't help thinking that the Verilog derived simulators will be slower (read: less efficient in terms of host PC clock cycles) than a hand-crafted native software simulator designed by a squishy earthling. Although, ignoring speed I reckon the translated/transcompiled Verilog method is likely to be unbeatable for fidelity to the real world hardware.
Vtop_veri takes about 34 seconds to run 80 million cycles on my PC. spinsim takes about 2.25 seconds to run a Spin repeat loop for the equivalent of 80 million cycles. That's a factor of 15 to 1. spinsim does use a shortcut where it only executes once every 4 cycles: it doesn't simulate the separate fetch, read src, read dst and write cycles. It also doesn't simulate the hardware counters.
It's not an exact comparison since they're not running the same PASM code. Vtop_veri is running the booter code where it is sitting in a loop trying to read a nonexistent EEPROM. The spinsim test is running the Spin interpreter that is running an empty repeat loop.
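Putting the two runs side by side as throughput figures. The cycle counts and host times are the ones quoted above; the 80 MHz clock used for the real-time fraction is my assumption (the post only gives the cycle count), and the function names are illustrative.

```cpp
#include <cassert>

// Simulated cycles per second of host time.
double cycles_per_second(double cycles, double host_seconds) {
    assert(host_seconds > 0.0);
    return cycles / host_seconds;
}

// Fraction of real time achieved, given the clock rate being simulated
// (1.0 would mean "as fast as the real chip").
double real_time_fraction(double cycles, double host_seconds, double clock_hz) {
    assert(clock_hz > 0.0 && host_seconds > 0.0);
    return (cycles / clock_hz) / host_seconds;
}

// How many times slower run A is than run B over the same cycle count.
double slowdown(double a_host_seconds, double b_host_seconds) {
    assert(b_host_seconds > 0.0);
    return a_host_seconds / b_host_seconds;
}
```

At an assumed 80 MHz, 80 million cycles is one second of chip time, so Vtop_veri runs at roughly 1/34 of real time (about 2.35 million cycles per second), while spinsim's 2.25 s is about 15 times faster. The caveats above still apply: spinsim only evaluates every 4th cycle and the two runs execute different code.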
I can't help thinking that the Verilog derived simulators will be slower
You are so right! It takes 8192 cog clocks to load a cog. That accounts for about 1/4 of a simulation with 1,000,000 steps. It takes 32 simulation steps to generate 1 clock cycle of RCFAST. Loading a cog takes 640 µs on a real Propeller, so without saving waveform data the simulation runs at 1/114 of real time. Bypassing the clock divider might provide a nice speed boost.
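Checking the arithmetic behind those figures: 8192 clocks × 32 steps = 262,144 steps per cog load, which is indeed about a quarter of a 1,000,000-step run, and 8192 clocks in 640 µs implies a clock of about 12.8 MHz. The constants come from the post; the function names are made up for this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Figures quoted above: 8192 cog clocks to load a cog, and 32 simulation
// steps per RCFAST clock cycle.
constexpr uint64_t kCogLoadClocks = 8192;
constexpr uint64_t kStepsPerClock = 32;

// Simulation steps spent on one cog load.
constexpr uint64_t cog_load_steps() { return kCogLoadClocks * kStepsPerClock; }

// Share of a run of `total_steps` that one cog load accounts for.
double cog_load_fraction(uint64_t total_steps) {
    assert(total_steps > 0);
    return static_cast<double>(cog_load_steps()) / static_cast<double>(total_steps);
}

// Clock rate implied if 8192 clocks take `load_seconds` on real hardware.
double implied_clock_hz(double load_seconds) {
    assert(load_seconds > 0.0);
    return static_cast<double>(kCogLoadClocks) / load_seconds;
}
```

The 32-steps-per-clock factor is the part the clock-divider bypass would attack, since every system clock currently costs 32 evaluations of the model.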
Those seem like high cycle counts before the EEPROM access starts?
There is a default UART timeout, but holding RXD low can skip that, and SCL is slowed down with dummy code, so maybe a simulation version of the EEPROM boot ROM could speed this up.
I wonder if there is also a way to avoid clocking ~295,000 SCL cycles over the whole 32 KB EEPROM (~11.8M+ sysclks).
That would need some EOF indicator fed back to the loader, or a file-size count fed into the loader?
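A rough cost model for the EEPROM load, to see what a file-size count could save. Two assumptions, both labeled in the code: 9 SCL clocks per byte (8 data bits plus ACK, ignoring the one-time START/address preamble), and about 40 sysclks per SCL period, which is simply back-computed from the ~295,000 SCL / ~11.8M sysclk figures above. The "stop after the image" loader is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 9 SCL clocks per byte read (8 data bits + ACK). The one-time
// START/address preamble is ignored.
constexpr uint64_t kSclPerByte = 9;

// Assumption: about 40 sysclks per SCL period, back-computed from the
// ~295,000 SCL / ~11.8M sysclk figures above.
constexpr uint64_t kSysclksPerScl = 40;

constexpr uint64_t kEepromBytes = 32768;  // 32 KB boot image area

constexpr uint64_t scl_clocks(uint64_t bytes) { return bytes * kSclPerByte; }
constexpr uint64_t sysclks_for(uint64_t bytes) { return scl_clocks(bytes) * kSysclksPerScl; }

// Sysclks saved if a hypothetical loader stopped after `image_bytes`
// instead of always reading the whole 32 KB.
uint64_t sysclks_saved(uint64_t image_bytes) {
    assert(image_bytes <= kEepromBytes);
    return sysclks_for(kEepromBytes - image_bytes);
}
```

Under these assumptions the full 32 KB costs 294,912 SCL clocks and 11,796,480 sysclks (about 0.98 s of chip time at 12 MHz), while a 4 KB program would skip 10,321,920 of those sysclks.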
I stand happily to be corrected
Of course it is a trade off, but the appeal of accurate simulation is exactly that - accurate simulation.
In many cases, quite small pieces of code would be tested, so raw speed is less important than good results and ease of use.
I found an older thread that would make a great simulator test:
It times the 'flight time' of the WAITPNE and OUT opcodes, i.e. the pin-in to pin-out delay.
Hardware tests come in at 6~7 cycles + 9.31 ns (Tco). The simulation might not get Tco right, but it should nail the flight time?
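One way to read the flight time out of a Verilator run, assuming you record one sample per system clock of the pin the WAITPNE watches and the pin the OUT drives. The simulation supplies the cycle count; Tco has to be added from the hardware measurement, since a cycle-based simulation like this one presumably doesn't model pad delays. The 80 MHz clock in the example is an assumption, and `Sample`, `flight_cycles` and `flight_ns` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// One sample per system clock from the simulator: the level on the pin the
// WAITPNE is watching and the level on the pin the OUT drives.
struct Sample {
    bool in;
    bool out;
};

// Flight time in clocks: from the first input change to the first output
// change at or after it. Returns nullopt if either edge never happens.
std::optional<int> flight_cycles(const std::vector<Sample>& trace) {
    std::optional<std::size_t> in_edge;
    for (std::size_t i = 1; i < trace.size(); ++i) {
        if (!in_edge && trace[i].in != trace[i - 1].in) in_edge = i;
        if (in_edge && trace[i].out != trace[i - 1].out)
            return static_cast<int>(i - *in_edge);
    }
    return std::nullopt;
}

// Convert to nanoseconds at `clock_hz` and add the pad clock-to-output
// delay (Tco), which has to come from a hardware measurement.
double flight_ns(int cycles, double clock_hz, double tco_ns) {
    assert(clock_hz > 0.0);
    return cycles * 1e9 / clock_hz + tco_ns;
}
```

For example, a 6-cycle flight at an assumed 80 MHz plus the measured 9.31 ns Tco gives 84.31 ns, which can then be compared directly against the hardware numbers.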
I tried the Verilog EEPROM models from Microchip and ST. They weren't drop-in ready.
The cogs seem to be fully functional. What if we used another Propeller to emulate the EEPROM? :cool: There is an I2C slave in the OBEX. Hopefully it will be easy to adapt it into a PASM EEPROM emulator. Then we can just add another cog with 64 KB of memory attached to it. The special cog will have its cog RAM pre-loaded with the EEPROM emulator. It will still be a 100% Verilog simulation.
That sounds like a lot of work, and ultimately slower to run, than either fixing or writing a Verilog EEPROM model, or making a P1V that can read directly from a file, skipping the whole emulated-EEPROM delay.
I've attached the EEPROM simulator from spinsim. You just need to call EEPromInit to initialize it, and then CheckEEProm on each clock cycle. EEPromInit will read the file eeprom.dat, or create it if it doesn't exist. If you pass a file name to it, it will open the file and read it into the EEPROM buffer overwriting whatever it read from eeprom.dat. Alternatively, you can create eeprom.dat manually by just copying a P1 binary file to it.
The argument to CheckEEProm is the current value of the output pins. It returns an updated value of the pins. If you write to the EEPROM you should call EEPromClose to write the new contents to the eeprom.dat file. I'm not sure how Verilator handles floating pins. You may have to simulate a pull-up resistor by setting a pin value to 1 if the dir pin is zero.
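A minimal sketch of the pull-up idea, assuming SCL/SDA on pins 28/29 as in the earlier trace and assuming floating pins simply read 0 in the Verilated model unless a pull-up is modeled. `resolve_pins` is a hypothetical helper: in a harness you would pass its result to CheckEEProm each clock and drive the model's input pins from what CheckEEProm returns. That wiring is left out because I haven't checked CheckEEProm's exact C signature.

```cpp
#include <cassert>
#include <cstdint>

// Resolve the level each pin reads back, once per clock. Assumptions:
//  - dir=1: the Propeller drives the pin to its pin_out bit;
//  - dir=0 on a pin in pullup_mask: the pin floats and the pull-up makes it 1;
//  - dir=0 elsewhere: the pin reads 0 (no 'Z' state is modeled);
//  - external open-drain devices (the EEPROM on SDA) can only pull a
//    pulled-up line low, given as ext_low_mask.
uint32_t resolve_pins(uint32_t pin_out, uint32_t pin_dir,
                      uint32_t pullup_mask, uint32_t ext_low_mask) {
    assert((ext_low_mask & ~pullup_mask) == 0);  // only pulled-up lines are open-drain here
    uint32_t driven_high = pin_out & pin_dir;
    uint32_t pulled_high = ~pin_dir & pullup_mask;
    // A low from the external device wins (wired-AND); this also hides any
    // contention with the Propeller driving the same line high.
    return (driven_high | pulled_high) & ~ext_low_mask;
}
```

With nothing driving them, both bus lines read high; when the booter sets dir on P28 with pin_out low, SCL reads 0 while SDA stays pulled up; and when the EEPROM pulls SDA low, only P28 stays high.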
It takes 3 minutes to load from EEPROM. I think we really want to load the RAM directly.
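As a first step toward loading RAM directly, a sketch that reads the same kind of file spinsim's EEPROM simulator uses (a P1 binary, or eeprom.dat) into a zero-padded 32 KB buffer. Copying that buffer into the Verilated model's hub RAM would need the `__DOT__` member name of the RAM array, which I don't know, and the boot sequence would presumably still have to be bypassed somehow; neither step is shown. `load_hub_image` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

constexpr std::size_t kHubRamBytes = 32768;  // P1 hub RAM

// Read a P1 binary (or eeprom.dat) into a zero-padded 32 KB hub image.
// Throws if the file is missing or larger than hub RAM.
std::vector<uint8_t> load_hub_image(const std::string& path) {
    std::ifstream file(path, std::ios::binary);
    if (!file) throw std::runtime_error("cannot open " + path);
    std::vector<uint8_t> ram(kHubRamBytes, 0);
    file.read(reinterpret_cast<char*>(ram.data()),
              static_cast<std::streamsize>(ram.size()));
    assert(static_cast<std::size_t>(file.gcount()) <= kHubRamBytes);
    // A short file leaves the stream at EOF; anything left over means the
    // image would not fit in hub RAM.
    if (file.peek() != std::ifstream::traits_type::eof())
        throw std::runtime_error(path + " is larger than 32 KB");
    return ram;
}
```

Bytes past the end of a short file stay zero, so a small program produces the same 32 KB image an erased-then-programmed EEPROM would hold only if the unused area was zero, which is another assumption worth checking against eeprom.dat.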
Good progress
Even though this is slow, it exercises the simulator reasonably fully, and it works great! That confirms the overall concept is solid.
Very promising! Perhaps we can verify the cycle-correctness of this simulation against a real Prop (don't see why it would be different, apart from some obscure startup condition that hasn't been set right, which can be rectified easily enough).
That might be useful since we seem to have a running conjecture that a hand-crafted and well optimised (e.g. not Verilog derived) implementation of a Propeller simulator might be more "host-PC clock cycle efficient" than the Verilog method.
If we can verify it, could we use this system as a "golden standard" against which one could compare other hand-crafted fast simulations?
A fast simulator which can be integrity checked against a solid (but possibly slow) Verilog model would be the best of both worlds. One sim to rule them all.
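One simple shape for that "golden standard" check, assuming both simulators can dump a per-cycle snapshot: walk the two traces in lockstep and report the first cycle where they disagree. Which fields go into `State` is an assumption (a real check would likely add cog registers or a hub RAM hash), and for something like spinsim that only advances every 4 cycles you would sample the Verilog trace at the matching points.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Snapshot of visible state after one cycle. The field choice is an
// assumption; extend it with whatever both simulators can report.
struct State {
    uint32_t pin_out;
    uint32_t pin_dir;
    uint32_t pc;
    bool operator==(const State& o) const {
        return pin_out == o.pin_out && pin_dir == o.pin_dir && pc == o.pc;
    }
};

// Index of the first cycle where the fast simulator disagrees with the
// golden (Verilog-derived) trace. If one trace is shorter, the first
// missing cycle counts as a divergence. Returns nullopt if they match.
std::optional<std::size_t> first_divergence(const std::vector<State>& golden,
                                            const std::vector<State>& fast) {
    std::size_t n = golden.size() < fast.size() ? golden.size() : fast.size();
    for (std::size_t i = 0; i < n; ++i)
        if (!(golden[i] == fast[i])) return i;
    if (golden.size() != fast.size()) return n;
    return std::nullopt;
}
```

Reporting only the first mismatch is deliberate: once the two models disagree, everything after that point is usually a consequence of the same bug.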
Well, the Propeller was not built from Verilog. Chip wrote the P1V Verilog version many years later. So there is some chance there are differences between the silicon and the Verilog behavior, albeit slim.
I guess there might also be bugs in the Verilog to C++ translation.
All worth some seriously nerdy investigation and testing.
The magnitude of that article Heater posted (here) is extraordinary (shockingly so -- in a positive way) on many levels. I strongly urge any serious engineers among us to read it in detail. Particularly look at the HALT testing categories and figures and ruminate on them for a while. Wow. This has changed my perception of the Propeller 1 very considerably in the last hour or two. Heater, thank you so much, once again. My current project (a from-scratch simulator) was slowing down a bit but is now resuming with renewed enthusiasm!
It's nice that verilator is working out well. The time taken to simulate loading from EEPROM is much faster than I expected, but it would be nice to eliminate that for most tests.
I took a look at that linked document, and it seems that some of the arguments about design quality that were made for the P1 don't apply to the P2. Parallax is now going to use synthesis like everyone else. And it's not clear to me that the FIB machine and e-beam prober will work with the P2 process geometry?
Also I'm guessing that CDM ESD testing eventually had to be done for P1?
It would be good to see some discussion about test coverage in manufacturing. Big companies will want a convincing argument made here. This presentation about what test coverage is required to achieve a certain DPM (defects per million) seems pretty easy to follow - https://www.ee.iitb.ac.in/~viren/Courses/2012/EE709/Lecture3.pdf
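One widely used relation between yield, fault coverage and shipped defect level is the Williams-Brown model, DL = 1 - Y^(1-T), where Y is process yield and T is fault coverage. A minimal sketch of it and its inverse; I'm not claiming the linked lecture uses exactly this form, and the function names are made up.

```cpp
#include <cassert>
#include <cmath>

// Williams-Brown model: defect level DL = 1 - Y^(1 - T), with yield Y and
// fault coverage T both in [0, 1]. Returned as defects per million.
double defect_level_dpm(double yield, double coverage) {
    assert(yield > 0.0 && yield <= 1.0);
    assert(coverage >= 0.0 && coverage <= 1.0);
    return (1.0 - std::pow(yield, 1.0 - coverage)) * 1e6;
}

// Coverage needed for a target DPM at a given yield, from inverting
// DL = 1 - Y^(1-T):  T = 1 - ln(1 - DL) / ln(Y).
// A result below 0 means the target is met even with no testing.
double coverage_for_dpm(double yield, double target_dpm) {
    assert(yield > 0.0 && yield < 1.0);
    double dl = target_dpm / 1e6;
    return 1.0 - std::log(1.0 - dl) / std::log(yield);
}
```

For example, at 90% yield, 99% fault coverage gives roughly 1,050 DPM under this model, and getting down to 100 DPM needs about 99.9% coverage, which is why the coverage number matters so much to large customers.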
I'm very sure the old e-beam prober and whatever are no use on whatever new technology the P2 is made in. And even if it were, what can you do? All those transistors are laid out by synthesis software from Verilog. How do you know what is what?
You worry too much. "Big companies" like Intel have thousands of bugs in their silicon. Back in the day we had a two inch thick document, under NDA from Intel, describing all the bugs in their new 286 processor.
My take on this is that designing a chip, in Verilog, is getting more like my writing code in Javascript. Which depends on the JS engine being correct. Which depends on the C++ compiler that built it being correct. Which depends on the assembler being correct. Which depends on the processor's instruction set being correct. Which depends on the HDL synthesis of the processor being correct. Which depends on the foundry chip build skills being correct....
What are you referring to by "You worry too much"? If it's about my coverage comment that has nothing to do with design errata. It's about catching manufacturing defects. How many bad chips are likely to be shipped to customers. Any major company will ask their suppliers about this and many other esoteric things. Now that Parallax is using a more standard flow I would assume that it has DFT logic and this will all be known. If not it's painful to develop tests with adequate coverage. If one doesn't verify the coverage, then it's probably poor. This has been known for decades.
BTW, I've actually seen ASICs at 65 nm and below being FIBed. It's like patching a binary. You pull up the design database and hopefully you figure something out. You may not get exactly what you want. Note that designs often contain spare gates to make chip spins possible without changing all layers. You can change both the high-level RTL and the low-level gates and formally verify that they are equivalent.
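Formal equivalence checkers answer "do these two descriptions compute the same function?" with SAT or BDD engines. The brute-force version below only illustrates the idea on a tiny combinational block, using a miter: XOR the two outputs and search for any input that makes the result nonzero. `find_mismatch` is a made-up name, and enumeration like this does not scale past a couple of dozen inputs.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>

// Exhaustively compare two combinational functions of `n_inputs` bits.
// Returns the first input where their outputs differ (a counterexample),
// or nullopt if they are equivalent over all 2^n inputs.
std::optional<uint32_t> find_mismatch(const std::function<uint32_t(uint32_t)>& rtl,
                                      const std::function<uint32_t(uint32_t)>& gates,
                                      unsigned n_inputs) {
    assert(n_inputs <= 24);  // keep the enumeration small
    for (uint32_t x = 0; x < (1u << n_inputs); ++x)
        if ((rtl(x) ^ gates(x)) != 0) return x;  // miter output is 1 here
    return std::nullopt;
}
```

For instance, a 2:1 mux written as `sel ? b : a` and a gate-level patch `(a & ~sel) | (b & sel)` come back equivalent, while a patch with the select inverted is caught at the first input where a and b differ.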
Certainly manufacturing defects have to be tested for. Like any other product.
I'm amazed that one can "patch" chips with today's feature sizes. And, as you say, the complexity of having to reverse engineer whatever the synthesis tools created for you. Sounds like something you really want to avoid having to do.
Accessing Signals in Verilator Models
http://embecosm.com/appnotes/ean6/html/ch06s02.html
Could be as simple as this:
http://embecosm.com/appnotes/ean6/html/ch06s02s02.html
Which requires some annotation to be added to the Verilog. Which is a bit of a pain.
It currently takes just over 1 s to load from EEPROM (though that is at 12 MHz), so any improvement in loading time will help here.
Link to that WAITPxx timing thread: http://forums.parallax.com/discussion/155143/waitpxx-detailed-timing
Anyway, there is lots of interesting work going on here.
Regarding the Verilog EEPROM models: I found this post that mentions needing to add a pull-up before the I2C read worked. Maybe that helps?
http://www.microchip.com/forums/m863460.aspx
And there is also this one, which does seem to include memory initialization from a file as an option:
http://www.cypress.com/documentation/models/fm24v02-verilog
I didn't know the Propeller wasn't built from Verilog. How was it created, then? Surely not Tape-Out and Letraset?
Did Chip simply will it into existence? I wouldn't put it past him.
Pretty much almost Tape-Out and Letraset.
To get an idea, read this brilliant story: "Why the Propeller Works": http://www.parallax.com/propeller/qna/Content/HomeTopics/WhyWorks.htm
A choice quote: "Every polygon of the Propeller's mask artwork was made here at Parallax. We designed our own logic, RAMs, ROMs, PLLs, band-gap references,..."
So yes, Chip did indeed will it into existence!
So there is some room for the real Prop silicon not behaving as the later Verilog version.
Holy mackerel, that's an awesome article! Thanks for linking that.
Edit: read it multiple times now. Astonishing.
You know, what's 'is name?
Yep. It is astonishing.