Vinculum Speed ?

rokicki · 2007-04-05 19:31

The fun part, and the part I think you can figure out (though I will help if you really want me to), is how to coordinate the
reading and the sending. It's not that hard: think of how you'd do it if you had one friend doing the reading, another friend
doing the writing, and a big old shared memory. What would you have to say to each other to arbitrate use of that buffer?

I only mentioned the cog limitation in case you were doing a bunch of other things at the same time. (For instance, the SD
assembly itself takes a cog, the TV takes a cog, the keyboard takes a cog---you have eight to work with total.)

I'm not trying to be coy, and I'd be happy to offer suggestions on how to do the arbitration, but I think it will be fun for you
to figure it out and play with it. It is, after all, your design.

Mike Green · 2007-04-05 19:33

One question is whether you have enough memory available for a second buffer. If you have memory for two buffers (or one buffer of twice the needed size), you can have your routines alternate buffers. The SD read routine reads into one buffer while the master-to-slave routine transmits from the second buffer. When they're both done, they swap buffers (actually swap buffer addresses).

rokicki · 2007-04-05 19:38

Right, but don't limit your thinking to two buffers. Consider using three or four, for instance, or even eight, or more, in order to compensate for reading jitter.

Or just a big ring buffer.

Essentially, you want to be prepared for the fact that some reads may occur slower than others. If you use the high-level SD read routines, every once in a
while it needs to actually read *two* blocks to get one block of data (the second block is the appropriate FAT table entry) and this will introduce jitter.
(If you guarantee that the file is contiguous, and you skip the filesystem code and just read sequential blocks, this source of jitter goes away.)

I will note that the SD read routines do do a memory copy to load the data, and this slows things down a bit. That memory copy can be eliminated in some
cases (I just haven't added the code to do so) if the reads are always in chunks of 512 bytes or multiples thereof.

Post Edited (rokicki) : 4/5/2007 7:48:31 PM GMT

BTX · 2007-04-05 20:26

Rokicki said...

2. After the speed test print the return value (r) just to make sure you actually read that many bytes. (The return value
is the count of bytes read.) If it doesn't match something is wrong (maybe the file is too short or something).

The (r) value is 4001 in the example, as we hoped; I just asked it to read 4001 bytes.

Rokicki said...

3. Not sure what this is here for; you don't need it:
repeat while tbuf[4000] == $00

I implemented that to know the exact moment at which the SD routine wrote the last byte of data into the buffer. Is that not correct?

If I erase that line, I will be measuring only the time that the last instruction takes in Spin, won't I? Or is the code:

      r := sdfat.pread(@tbuf, 4000)
still reading the data while it runs?
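
One way to check this is to bracket the call with the system counter. The following is only a sketch: it assumes pread does not return until the data is in the buffer (which is what a byte-count return value suggests), that the card is already mounted and the file opened, and that `sdfat` refers to the "fsrw" object. `TimeRead` is a hypothetical name.

```spin
OBJ
  sdfat : "fsrw"                    ' assumed object file name for the SD routines

VAR
  byte tbuf[4001]

PUB TimeRead : rate | t, r
  '' Returns the measured read speed in bytes per second.
  t := cnt                          ' system counter before the read
  r := sdfat.pread(@tbuf, 4000)     ' r is the number of bytes actually read
  t := cnt - t                      ' elapsed clock ticks; subtraction handles counter wraparound
  ' bytes/sec = r * clkfreq / t, reordered so the product stays inside 32 bits
  rate := r * (clkfreq / 1000) / (t / 1000)
```

If the assumption holds, no polling loop on the last buffer byte is needed, and the measured interval covers the whole read rather than one Spin instruction.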

Rokicki said...

That's not a bad speed for a first go. The next thing to test is, how fast can you ship data from one prop to the other.
This will almost certainly take assembly language. It would be interesting to see how fast you can do it in spin though.

To test how fast I can send data from one prop to another, I'll have some problems for now, because I have not designed that board yet.
I could only try it by connecting my current board to the 'demo board', but the connections between them would be complicated (wire lengths) and each prop would have a different clock. (Chip recommended that I use one "chip oscillator" for the master and all slaves.)

For that communication, I'm thinking of sending a first "address" byte plus a "logic one" on a line common to all props, indicating that the byte on the bus is an address, and then sending the data bytes for the addressed prop. The addressed prop will respond too, on another line common to all of them (OR'ed with diodes), to let the master send the next byte.
Another way could be to use one I/O of the master for each slave (16 I/O total for this), plus eight for the bus, plus four for the SD, plus two for the EEPROM. Although that could give faster code, I think it's better to keep more I/O pins free (for an easier PCB design and better noise immunity, bearing in mind that it'll be a single-sided PCB).

What do you think about it ?

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Regards.

Alberto.

BTX · 2007-04-05 20:28

Sorry guys, I'm leaving now. I'll come back tonight for answers.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Regards.

Alberto.

rokicki · 2007-04-05 20:44

Alberto,

To test sending between props, at least for now, just do it all on the demo board with a single prop. Use one cog for writing, and one cog for
reading, and have them communicate just like they would if they were separate props. At the speeds you will probably be working at, this
is probably just fine (especially if you use a simple asynchronous signalling mechanism such as the one you describe).

By the way, no need for diodes; instead of transitioning the acknowledges between 0 driven and 1 driven, transition them between 0 driven
and undriven, and use a pullup resistor.
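
A minimal Spin sketch of that open-drain style acknowledge, assuming an external pullup on the shared line and an active-low ACK. `ACK_PIN` and the method names are hypothetical. Since `dira` and `outa` are per-cog registers, these methods have to run in the cog that owns the pin.

```spin
CON
  ACK_PIN = 8                       ' hypothetical pin number for the shared ACK line

PUB AckInit
  outa[ACK_PIN] := 0                ' output latch is 0 and never changes
  dira[ACK_PIN] := 0                ' start undriven: the pullup holds the line at 1

PUB AckAssert
  dira[ACK_PIN] := 1                ' pin becomes an output and drives the line to 0

PUB AckRelease
  dira[ACK_PIN] := 0                ' pin floats again; the pullup returns the line to 1

PUB AckIsLow
  return ina[ACK_PIN] == 0          ' true while any device on the line is pulling it low
```

Because no device ever drives the line high, several slaves can share it without diodes and without contention.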

BTX · 2007-04-06 01:24

HI.

Mike, excellent idea! That way I can be working with both routines at the same time.

Rokicki, excellent idea too! I don't need the board made to test the master-slave code. (I also will not use diodes.)

I took my 'repeat xxxxxxxx' line out of the code.

About the buffers, I see that the maximum speed I get is when the buffer size is 4000 bytes. (I don't know why, but I suppose it has to do with your code.) Both more and fewer bytes decrease my read speed.

Having more than two buffers will take a lot of memory too, but if, for example, we have 'N' buffers, I am thinking of a mix of Mike's idea and mine.

Use flags to indicate the state of each of the 'N' buffers. When the card finishes reading some data and a buffer is full, the SD code checks for a valid flag and takes that buffer for the next data. The master-slave routine must do something similar, except that it chooses the buffer previously used by the SD code to get its data. So, with more small buffers, no time is lost. In fact, I need to send out 1152 bytes at a time with each slave. I could apply the same method to the slaves, to communicate with the rest of the electronics.

The files on the card will be large, around 20 MB. How can I guarantee that the file is contiguous?

If you mean that the FAT file system is not necessary in that case, then could I not easily copy the file from a PC? I would have to write a routine to send it over the USB port and save it to the card with the SD code. That would take more cogs, memory, pins, and time to pass the data from the PC and save it on the card. Is that correct?

Another point: you gave me a great idea. Perhaps I could use a keyboard and a TV to have a screen showing me some info about the system state. All of that in the master prop, but only as long as it won't slow me down.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Regards.

Alberto.

rokicki · 2007-04-06 18:01

Adding keyboard/monitor support will not slow things down. It will increase power consumption marginally, but no big deal.

To guarantee the file is contiguous, simply freshly format the card and then copy the file and it should be contiguous.
(You only will get noncontiguous files if you ever delete or overwrite a file.) But I wouldn't worry about this right now.
Just stick with exactly what you are doing. 20MB is not a problem at all.

If I were you I would probably start with a ring buffer of 8K, as 16 512-byte blocks. Or if it's easier for you to think about
just do it as 4 2048-byte blocks. Use 2048, though, not 2000 and not 2001. (It should be a multiple of 512 for the best
speed.) The reason to use more than two buffers is in case one operation is occasionally slow; with only two
buffers, the reading routines have no real ability to get far enough ahead to make a difference.
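
As a rough sketch of such a ring in Spin, using two block counters so that each cog writes only its own counter. Everything here except `sdfat.pread` is a hypothetical name, the "fsrw" object file name is assumed, and `SendBlock` is a stand-in that only checksums the block instead of talking to a slave. A final partial block is simply dropped in this sketch.

```spin
CON
  BLKSIZE = 512
  NBLOCKS = 16                      ' 16 * 512 = 8K ring

OBJ
  sdfat : "fsrw"                    ' assumed object file name for the SD routines

VAR
  byte ring[BLKSIZE * NBLOCKS]
  long filled                       ' blocks read so far; written only by the reader cog
  long sent                         ' blocks sent so far; written only by the sender cog
  long stack[64]                    ' stack space for the sender cog

PUB Start
  '' Assumes the card is mounted and the file is already open.
  cognew(SenderLoop, @stack)        ' sender runs in its own cog
  ReaderLoop                        ' reader runs in the calling cog

PUB ReaderLoop | slot
  repeat
    repeat while filled - sent == NBLOCKS       ' ring full: wait for the sender
    slot := filled // NBLOCKS
    if sdfat.pread(@ring + slot * BLKSIZE, BLKSIZE) < BLKSIZE
      quit                                      ' short read or error: stop reading
    filled++                                    ' publish the block only after it is complete

PUB SenderLoop | slot
  repeat
    repeat while sent == filled                 ' ring empty: wait for the reader
    slot := sent // NBLOCKS
    SendBlock(@ring + slot * BLKSIZE)
    sent++                                      ' hand the slot back to the reader

PRI SendBlock(addr) : sum | i
  ' Placeholder for the real prop-to-prop transfer.
  repeat i from 0 to BLKSIZE - 1
    sum += byte[addr][i]
```

With this layout the reader can run up to 16 blocks ahead, so an occasional slow read (for example one that also fetches a FAT sector) does not stall the sender unless the ring drains completely.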

Later on I can make a special version of the SD routines for you that will eliminate the memory copy in read() and thus be
a tad faster than what you are using.

BTX · 2007-04-06 19:29

Rokicki.
All understood. You've the things very clear...I tried with 512, 1024, 2048, etc and it read faster...390 Kb with my card.

Rokicki said...
Later on I can make a special version of the SD routines for you that will eliminate the memory copy in read() and thus be
a tad faster than what you are using.

That will be great for me! So I suppose I should solve the master and slave code now, testing it at the current speed, and leave the faster version for later, correct?
I'm trying right now to write the master code with 'shared memory' between 'abuf' and 'bbuf', but I'm stuck on a secondary assembler problem, due to my limited experience with it.

Please let me know whether this code works.

Sending   mov           adrrptr,par                        ' adrrptr address of the abuf
          add           adrrptr,toadd                      ' add 2048/4 to the address of abuf to know if data was read by SD
          rdbyte        status,adrrptr   wz                ' read that address of the abuf
    if_nz jmp           #use_abuff                         ' uses the: use_abuf piece of code to send data to slaves from abuf
          add           adrrptr,toadd                      ' add 2048/4 more to the address (same as before)
          rdbyte        status,adrrptr   wz                ' same 
    if_nz jmp           #use_bbuff                         ' same
          jmp           #Sending                           ' If there are not data in abuf or bbuf loop

In my code, I read the data and put it in the appropriate buffer, and then save an additional byte in that buffer to mark when the data has been fully saved. So if I have a buffer of 2048 bytes for the file data, I make it 2049 to hold the 'flag' in the additional location. Each time the SD code saves data into the buffer, I set abuf[2048] := 1, otherwise abuf[2048] := 0, since the SD code saves the data in locations 0 to 2047.
This is NOT working... why?

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Regards.

Alberto.

Post Edited (BTX) : 4/6/2007 7:45:41 PM GMT

Mike Green · 2007-04-06 19:44

Alberto,
When you display code fragments, it helps to also show the variables. Based on the comments (like "add 2048/4"), you're mixing up main memory addresses (byte addressing) with cog addresses (long addressing). You should add 2048 to adrrptr the first time and 2049 the second time.
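
A sketch of the polling loop with byte offsets, for illustration only. It assumes bbuf immediately follows abuf in hub memory, that each buffer is 2048 data bytes plus one flag byte, and that abuf is long-aligned (PAR only carries a long-aligned address). The names `ptr`, `flag_a_ofs`, `buf_stride` and `use_buf` are hypothetical, and the actual send loop is left out.

```spin
DAT
              org       0
Sending       mov       ptr, par                ' hub (byte) address of abuf, from cognew
              add       ptr, flag_a_ofs         ' abuf flag is the byte at abuf + 2048
              rdbyte    status, ptr     wz      ' Z is set when the flag byte is 0
        if_nz jmp       #use_buf                ' abuf is full: send it
              add       ptr, buf_stride         ' bbuf flag is 2049 bytes further on
              rdbyte    status, ptr     wz
        if_nz jmp       #use_buf                ' bbuf is full: send it
              jmp       #Sending                ' nothing ready yet, poll again

' At use_buf, ptr points at the flag byte of the full buffer, and its data is
' the 2048 bytes just below ptr. The send loop itself is omitted here.
use_buf       wrbyte    zero, ptr               ' clear the flag: buffer is free again
              jmp       #Sending

flag_a_ofs    long      2048                    ' too large for a 9-bit immediate
buf_stride    long      2049                    ' 2048 data bytes + 1 flag byte
zero          long      0
ptr           res       1                       ' a full cog long, wide enough for a hub address
status        res       1
```

The offsets are in bytes because rdbyte and wrbyte take hub addresses; only cog register addresses count in longs.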

BTX · 2007-04-06 20:00

Sorry Mike ..
I've this:

var
'
   long cog
   
PUB Start(Adrr) : Success

   Stop
   Success :=  (Cog := cognew(@Sending, Adrr) + 1) ' The data at direc must be 0 so that no data is sent to the TLCs

PUB Stop
{{Stop toggling process, if any.}}

  if Cog
    cogstop(Cog~ - 1)

dat
        org
Sending   mov           adrrptr,par                        ' 
          add           adrrptr,toadd
          rdbyte        status,adrrptr   wz
    if_nz jmp           #use_abuff
          add           adrrptr,toadd
          rdbyte        status,adrrptr   wz
    if_nz jmp           #use_bbuff
          jmp           #Sending

use_abuff nop                                               ' still doing

use_bbuff nop

          call          #delay                            ' Only temporary to check if work it with the spin rutine
          mov           temp,par
          add           temp,toadd
          wrbyte        zero,temp                          ' 
          jmp           #Sending

'------------------------------------------------------------------------------------------------------------------------------
Delay
             mov       t8,delt                         ' 
reta         nop 
             djnz      t8,#reta                          ' Lose 200 clk cycles
Delay_ret    ret
'------------------------------------------------------------------------------------------------------------------------------
delt     long 100000
zero     long 0
toadd    long 512
adrrptr  byte 1
status   res  1
t8       res  1
temp     res  1

The code must loop at 'Sending' until the SD code has saved all the data in abuf. In the Spin code there is an abuf[2048] := 1 to indicate this, after the SD code finishes reading.
Hope it is clear now.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Regards.

Alberto.

BTX · 2007-04-06 20:17

Great Mike ..!!
Correct, I fixed it. Why say that you're a genius too? All forum members know it.
Thanks.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Regards.

Alberto.

Mike Green · 2007-04-06 20:24

Alberto,
You've declared adrrptr as a byte. It's not a byte. You need to declare it as a long.

rokicki · 2007-04-06 21:23

Frankly, I was hoping you would do the top-level management with a spin routine (it's not performance critical after all).
Only the lowest-level SD I/O and prop->prop communication routines should be in assembly, I think. Why mess with
assembly when you don't have to? Spin code is generally more compact anyway.

And I would even start by writing the prop->prop routines in spin, to tell the truth.

What I was hoping to see was a prop->prop sender routine, in Spin, and a prop->prop receiver routine, in Spin, and
a top-level object to start both of these and then report on their speeds/progress. Once this works well we can
recode each in assembly (and test them, spin vs assembly and assembly vs spin, rather than trying to debug both
simultaneously).

BTX · 2007-04-06 21:24

Mike.
You're right again: that's an address, and it can't be a byte unless the address is under 255. (But I'm not even sure it ever could be.)

I have just corrected it. I don't know how it seemed to work before. (Perhaps because I was checking only abuf, and it got an address under 255? I'm not a lucky man.) But never mind that; it is now working in the correct way too.

Thanks.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Regards.

Alberto.

BTX · 2007-04-07 05:07

Here's my first try in Spin at reading the data from the card. I get 368Kb, not so bad for Spin.
I'll move on to the Master code on Saturday.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Regards.

Alberto.

BTX · 2007-04-07 23:35

Hi All!
Here is the Master code in Spin.
At this time, I have the SD reader code working together with the Master code; only two cogs are used so far.
Next I'm going to do the 'Slave' code. I'll do it on the same prop, so that the speed can be checked afterwards.

It seems to work fine, but I was unable to find a way to check the speed of it all (SD and Master), supposing that the 'Slaves' will answer the Master's requests faster.

Here's the current code.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Regards.

Alberto.

BTX · 2007-04-08 02:04

Hi again. I'm speedy this time.
Here is all the code, Master and Slave.

Tom.
Please check it for me, ok?
Too many hours programming. It seems to be ok, but...
Thanks!

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Regards.

Alberto.

BTX · 2007-04-09 16:23

Any suggestions about it?
Thanks.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Regards.

Alberto.

PointDog · 2007-04-12 04:27

Hi, all,
Sorry to butt in, and I know that BTX has since decided to use an SD card instead of the Vinculum VDAP chip, but as I mentioned on my earlier post I am using one, and if any other propeller-heads are searching for answers, they'll probably come across this thread. I have the vdap working perfectly and would like to share some gotchas. I haven't posted my driver code because it's in MPLAB C30, but this info will help propeller users anyway.

The first BIG thing that can cause MAJOR headaches is the power supply. I put a 1uF and a 100nF capacitor in parallel near the chip, because it would randomly stop working. This seemed to fix it.

The reset pin is extremely sensitive to noise, perhaps an RC setup would be good rather than tying the pin directly to the power supply rail, but remember, you _WILL_ need to have a way to re-set the vdap from your code (connect the reset to an IO pin on the prop, with a capacitor to ground) because certain errors can cause the vinculum to lock up irrecoverably.

The transfer speed depends on the device attached. I tried an old 128MB flash disk that I got as a freebie and got 56kB/sec write and 88kB/sec read. But my code is more focused on robustness and error handling than speed. I also noticed that using the command "WRF 256" got about 14kB/sec write, but increasing the block size to "WRF 1024" increased the speed to the 56kB/sec mentioned above.

The 0x90 (IPA) and 0x91 (IPH) commands do not work in shortened command mode. (Using VDAP firmware version 2.19 with the command monitor port in FIFO mode.) Even though they are listed in the datasheet.

The command 0x10 will never be used because the chip must already be in shortened command mode to use it.

The command ECS will never be used because the chip must already be in extended command mode to use it.

The FIFO RD and WR pins are switched on the ver 1.07 VDAP Firmware data sheet. -- Thanks for pointing this out BTX!

The 0x08 command (WRF) actually returns 2x <prompt> after the data has been written. This can cause the expected data stream to be offset by 1 byte on microcontroller systems.

The WRF command will return a prompt if the first part of the command is ok. (WRF<space><size><0x0D>) and then a "Bad Command<0x0D>" if there was a problem with the <data><0x0D> part of the command. So a good WRF command will return <prompt><0x0D><prompt><0x0D>, but a bad command will return <prompt><0x0D>BC<0x0D>, and I assume that a failed write will return <prompt><0x0D>CF<0x0D>.
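
The reply pattern described above could be checked along these lines. This is only a sketch in Spin: `ClassifyWrf` and the constants are hypothetical names, and it assumes the caller has already read both reply lines and stored the second one as a zero-terminated string with the trailing 0x0D removed.

```spin
CON
  WRF_OK          = 0               ' second reply was another prompt
  WRF_BAD_COMMAND = 1               ' second reply was "BC"
  WRF_FAILED      = 2               ' second reply was "CF"

PUB ClassifyWrf(line2)
  '' line2: hub address of the second reply line, zero-terminated, 0x0D stripped
  if strcomp(line2, string("BC"))
    return WRF_BAD_COMMAND
  if strcomp(line2, string("CF"))
    return WRF_FAILED
  return WRF_OK                     ' anything else is treated as the second prompt
```

The main point is that a WRF always produces two replies, so the receiver should consume both before sending the next command; otherwise later replies appear shifted.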

Nowhere in the datasheet is the maximum size of a write (or read) operation mentioned. In actual fact, when using ECS command mode and IPA input mode, the maximum write size using WRF is 9,999,999 bytes, because the chip automatically starts to write data to the open file after 8 ASCII characters, regardless of whether it sees a 0x0D character at the end of the command or not. This behavior should be documented, or better still the chip should return a "Command failed" when it receives the 0x0D byte, if the data size requested is longer than the allowed number of characters. The current behavior can result in an unrecoverable "lock up" if the WRF command format is incorrect, because the chip will not abort a write operation, even if the disk is removed.

If the flash disk is removed during a WRF operation the chip will lock up irrecoverably, requiring a reset or power down. This behavior should be documented, or changed.

The VDIF (Ver 1.06) had to be used to get some of the required information, because the VDAP 1.07 datasheet did not contain all of the information about the disk interface, some of the discrepancies are:

a)The information in section 2.3 (Start Sequence) of the VDIF datasheet is not in the VDAP datasheet.

b)The information on Page 9 of the VDAP datasheet, regarding the difference between ASCII and HEX input mode is not in the VDIF datasheet.

I hope this helps someone.

BTW are there any ICD or ICE tools available for the prop yet?

Mike Green · 2007-04-12 04:47

Thanks PointDog, that's very helpful.
Mike

BTX · 2007-04-12 13:22

Thanks so much, PointDog, for the info.
But I've decided to use the SD card, because the Vinculum is not fast enough for me. I get only about 45-48Kb reading it. I know that with some code improvements it would be possible to get more, but how much more? Surely it would be far from my needs (500Kb - 700Kb).
I was thinking that, if the VMUSIC reads at least 128K to get its data (I don't know which firmware it uses), there must be a way to do it, but not at much more than that speed.

I think the reason I can't get more speed out of it comes from the time it takes to read the FAT with the VDIP (FTDI people confirmed this to me). Once your command has read the data, you can extract it from the VDIP faster in FIFO mode.
As you comment, it depends on the disk used. With the latest 2.19 firmware it works ok. I've also included a reset pin driven directly from the propeller; without it, it was impossible to synchronize my code with the hardware.

About the 0x10 command you are right too; I was racking my brain until some people from FTDI told me that.
And it is also correct that a "CF" message is sent from the VDIP if a command failed.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Regards.

Alberto.

bambino · 2007-04-12 13:44

PointDog said...
BTW are there any ICD or ICE tools available for the prop yet?

There are no hardwire tools other than a debug terminal. There has been some success with GEAR and POD.
I may be wrong, but I think GEAR is an ICE and POD is an ICD.

PointDog · 2007-04-13 06:20

BTX,
you are definitely right. If USB 2.0 full speed is used, the bus runs at 12Mbits/sec; even with a standard serial protocol with no error checking, no retries, framing, etc., the max speed is 1.2Mbytes/sec. I think that the "real" transfer rate of the full-speed USB bus is probably down around 800kb a second anyway, taking into account all the other data that is passed along the bus to control and sync the devices. If you change your mind I can do a test run with a read of a large file, using large blocks, but I think it will still be well below your requirements.
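
The 1.2Mbytes/sec figure corresponds to counting 10 bit times per byte, as on an async serial link with start and stop bits. That framing is an assumption used here only to reproduce the number; with a bare 8 bits per byte the same bus rate gives 1.5Mbytes/sec:

```latex
\[
\frac{12 \times 10^{6}\ \text{bit/s}}{10\ \text{bit/byte}} = 1.2 \times 10^{6}\ \text{byte/s},
\qquad
\frac{12 \times 10^{6}\ \text{bit/s}}{8\ \text{bit/byte}} = 1.5 \times 10^{6}\ \text{byte/s}
\]
```

Both are upper bounds before any protocol overhead, and the 88kB/sec read measured earlier in this thread is more than ten times lower than either.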

All the best with your project, and to all members of this forum, keep it up. This is probably the most active and helpful microcontroller forum around. If I could bear to forget my years of experience with PICs and C, I'd be using props for everything.
