Software Defined Radio - thoughts on P2 application
bob_g4bby
Posts: 440
Just thinking aloud here,
Over the past 20 years, the amateur radio scene has seen a disruptive change in hardware design brought about by software defined radio (SDR) - in fact, hams pioneered it. Here's an article that introduced it to many of us: https://sites.google.com/site/thesdrinstitute/A-Software-Defined-Radio-for-the-Masses. No more superheterodyne circuitry.
The SDR principle is simple: on receive, a small slice of the radio spectrum (say 3.7-3.72 MHz) is mixed with an RF oscillator (say at 3.7 MHz). After low-pass filtering, the difference signal spans 0-8 kHz, or maybe as much as 0-192 kHz. Actually two signals are produced, 90 degrees out of phase with each other, using a Tayloe detector (Dan Tayloe perfected it). These two signals are digitised by a cheap, low-noise, 24-bit stereo sound-card chip into two streams of integers, usually labelled I (in phase) and Q (quadrature).
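To make the mixing step concrete, here's a toy numpy sketch (my own illustration, not anyone's actual radio code) of quadrature downconversion: a 3.701 MHz tone mixed against a 3.7 MHz LO in two phases and low-pass filtered, leaving a 1 kHz complex baseband signal. The 16 Msample/s simulation rate and the moving-average filter are arbitrary stand-ins for the real hardware:

```python
import numpy as np

fs = 16_000_000        # simulation sample rate (illustrative)
f_rf = 3_701_000       # incoming signal, 1 kHz above the LO
f_lo = 3_700_000       # local oscillator

t = np.arange(20_000) / fs
rf = np.cos(2 * np.pi * f_rf * t)

# Mix with two LO phases 90 degrees apart to get I and Q
i_mix = rf * np.cos(2 * np.pi * f_lo * t)
q_mix = rf * -np.sin(2 * np.pi * f_lo * t)

# Crude low-pass (moving average) removes the 7.401 MHz sum product
kernel = np.ones(100) / 100
i_bb = np.convolve(i_mix, kernel, mode='same')
q_bb = np.convolve(q_mix, kernel, mode='same')

# The complex baseband I + jQ should rotate at f_rf - f_lo = +1 kHz
bb = i_bb + 1j * q_bb
phase = np.unwrap(np.angle(bb[2000:18000]))   # skip filter edge effects
f_est = np.mean(np.diff(phase)) * fs / (2 * np.pi)
print(round(f_est))   # close to 1000 (Hz)
```

In a real Tayloe detector the mixing happens in analogue switches before the ADC, of course - this just shows the maths the I/Q pair encodes.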
From there on, the signal path is all digital until the sound output DAC, amplifier and speaker. The platforms hams have used range from multicore PCs, through Raspberry Pis, down to dsPIC33 chips. A pal of mine in northern England has developed an HF radio with the dsPIC33 - he can be heard most weeks chatting with other hams all over Europe, and he gets good signal quality reports. He's also a keen cave explorer, and has developed 'cave radios' for surface-to-ground communications and for cave surveying (VLF signals penetrate rock to a degree). He runs his radios at between 8 and 16 ksamples/s - adequate for the 3 kHz wide signals used on-air.
He's programmed the DSP on a 70 MHz dsPIC33 in C, on a sample-by-sample basis - he doesn't process arrays of samples due to compute/memory constraints. The dsPIC controls the radio hardware as well - tuning, band changes etc. He's recently switched to a dual-core dsPIC and split the tasks: one core does the hardware control and the other the DSP. He still complains that the CPUs are only 16-bit, which requires all sorts of maths tricks to avoid signal degradation. Quite a lot of original work, plus looking back to older DSP techniques for ideas - he's very clever to get so much out of that CPU.
So it raises the question: what could be achieved with a 200 MHz, 32-bit P2 with its CORDIC engine and DSP instructions? The signal processing breaks down into a number of stages - maybe around six - for which the cogs would seem a natural fit, data being handshaked between the stages in a pipeline. An external A/D would probably still be needed for the receiver input because of the very low noise requirement, but the P2's internal converters could be used for mic input, speaker output and possibly transmit output (the latter then being mixed up to the required RF band by other circuitry). DSP sample rates might improve on the 16 ksamples/s my pal achieved, 32-bit arithmetic with 64-bit intermediate results would be much easier to manage, and there might be enough processing time spare for a spectrum display - always good for searching for signals.
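As a toy model of that cog-pipeline idea, here's plain Python with threads and queues standing in for cogs and hub handshaking (the three stage functions are made-up placeholders, not real DSP):

```python
import queue
import threading

def stage(fn, q_in, q_out):
    """One 'cog': take a buffer, process it, hand it on."""
    while True:
        buf = q_in.get()
        if buf is None:          # shutdown token propagates down the chain
            q_out.put(None)
            break
        q_out.put(fn(buf))

# Three hypothetical stages standing in for mixer / filter / demodulator
stages = [lambda b: [x * 2 for x in b],
          lambda b: [x + 1 for x in b],
          lambda b: [x - 3 for x in b]]

qs = [queue.Queue() for _ in range(len(stages) + 1)]
for fn, qi, qo in zip(stages, qs, qs[1:]):
    threading.Thread(target=stage, args=(fn, qi, qo), daemon=True).start()

qs[0].put([1, 2, 3])             # feed one buffer into the front of the chain
qs[0].put(None)
result = qs[-1].get()            # ((x*2)+1)-3 applied to each element
print(result)                    # [0, 2, 4]
```

On the P2 the queues would be hub-RAM buffers with ATN or lock handshakes rather than `queue.Queue`, but the shape of the pipeline is the same.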
Most of the DSP primitives would be assembly code for speed. For gluing the primitives together to make the complete radio - TAQOZ every time; Forth is great for interactive prototyping. To provide stimuli and measure responses, LabVIEW for Windows is fantastic, and the Community edition is free for non-commercial use. It suits hardware engineers very well; like Forth, you get quick results. High-speed serial link(s) could connect P2 and PC during development to exercise the DSP in real time. For a standalone radio, there are plenty of small displays we know the Propeller can drive that would make good front panels.
The latest SDRs use very high speed A/Ds to convert the whole of the short-wave band to a digital stream, which is then processed by a large FPGA. Both these devices are power hogs, expensive, and a dog to develop - the FPGA tools are very slow. So that technique is not the best for portable narrowband equipment, but great for high-speed data and the intelligence community!
If we stick to the lower-speed SDR technique, the P2 is simpler to program than most CPUs and well suited to battery operation, 2-3 W at most. Not forgetting it would save quite a few other parts too.
I'm looking forward to seeing how much 'radio' can be squeezed into the P2 - I'm sure it's more than people think. Any other radio heads - have you been thinking about this too?
Cheers, Bob G4BBY
Comments
This is AM modulation, sounds like, right?
Wonder if you could generate the 3.7 MHz with the Prop2... Well, I'm sure you could, but maybe have to pick a special crystal frequency...
ADC with 20 kHz bandwidth sounds like should work...
But, googling, I found this right away:
https://www.analog.com/media/en/training-seminars/design-handbooks/Software-Defined-Radio-for-Engineers-2018/SDR4Engineers.pdf
It talks about this ADF7030 chip. Looks like it does everything for $5.
Hard to beat that...
Looks like you could control it over SPI with your P2 though...
Actually, Mouser doesn't seem to have anything that works at 3.7 MHz...
Can the RTL-SDR work for this?
This one appears like it could: https://www.amazon.com/RTL-SDR-Blog-RTL2832U-Software-Defined/dp/B011HVUEME
Would be a LOT easier, if it did...
https://www.rtl-sdr.com/rtl-sdr-direct-sampling-mode/
My pal has developed code to receive and transmit AM, FM, single sideband, double sideband, CW and so on. That's 'just' down to the maths.
I like the ACORN - the 50W output is a plus; many such radios are low power only. Great for portable. The Chinese transceivers are difficult competition, even if some of them have poor signal purity.
I use a Multus Proficio, bought as a semi-populated PCB, together with a 7" Chinese tablet - see www.qrz.com and type in my callsign, G4BBY. It only delivers 6W, so I added a 100W amplifier kit.
The P2 should be able to generate a fairly clean RF signal directly.* The streamer has a Goertzel mode, which was intended to output sine waves and observe the output, and there is also the chroma modulator, which is a quadrature modulator. Both of these operate like a direct digital synthesizer (DDS).
*This assumes you don't use the P2 PLL; its phase noise is not quite good enough. The P2 could use an 80-125 MHz crystal oscillator to keep the phase noise under control. Unfortunately, the reduced clock rate will reduce ADC performance, but we can run 4 ADCs in parallel. There is a chance that the DAC is not up to serious RF usage.
Goertzel mode has a slight annoyance for RF receiver usage: the sampling rate is based on a whole number of cycles of the digital oscillator, so when the tuning frequency changes, the sample rate changes as well.
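For reference, the Goertzel algorithm that the mode is named after can be sketched in a few lines - a single-bin DFT, which is why it's cheap enough to run per-sample on a small CPU (plain Python, with my own illustrative tone and block size):

```python
import math

def goertzel(samples, f_target, fs):
    """Magnitude of the f_target frequency bin over one sample block."""
    n = len(samples)
    k = round(n * f_target / fs)          # nearest DFT bin
    w = 2 * math.pi * k / n
    coeff = 2 * math.cos(w)
    s1 = s2 = 0.0
    for x in samples:                     # one multiply-add per sample
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return math.sqrt(s1 * s1 + s2 * s2 - coeff * s1 * s2)

fs = 8000
n = 256
tone = [math.sin(2 * math.pi * 1000 * i / fs) for i in range(n)]

# The 1 kHz bin lights up; a bin the tone isn't in stays near zero
print(goertzel(tone, 1000, fs) > goertzel(tone, 2000, fs))   # True
```

The hardware mode wraps this kind of accumulation around the streamer's digital oscillator, which is where the whole-number-of-cycles constraint on the sample rate comes from.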
As regards the P2's A/D for receiver input - it needs trying before being dismissed. I wonder whether the P2's extensive power supply system and decoupling network tames noise levels enough?
Running ADCs in parallel - my Chinese oscilloscope does that to achieve higher performance at lower cost.
Most multiband HF SDRs cover the range 2-30 MHz, some to 60 MHz.
I remember when PC-based SDRs first appeared, the motherboard sound chips were noticeably noisier than the high-end sound cards from M-Audio and the like. Better than 16 bits, with the lowest noise spec and highest dynamic range, was the norm for the best HF SDRs, before high-speed direct sampling took over.
Haven't done much with it lately...couldn't nail down the cause of the staircase effect on some signals.
I'd love to see what can be done on HF though... every once in a while I'll lurk on HF to see what parts of the world I can pick up here.
LMS8001 is a single chip up/down RF frequency shifter with continuous coverage up to 10 GHz.
https://limemicro.com/technology/lms8001/
Specs:https://limemicro.com/technology/
It also has a companion board:
LMS8001 Companion
The LMS8001 Companion board provides a highly integrated, highly configurable, four-channel frequency shifter platform, utilising the LMS8001A integrated circuit.
https://www.crowdsupply.com/lime-micro/limesdr-mini/updates/lms8001-companion-extends-coverage-to-10-ghz
1. Lower power for battery operation - run the CPU at the lowest clock speed the application needs and avoid unnecessary activity within the chip
2. Control of SPI devices such as RF oscillators and low-noise A/Ds by SMARTPIN - no driver chip needed, which reduces code size while retaining full bus speed
3. Measurement of V or I within the radio requires no extra A/Ds - e.g. for the transmitter output power meter
4. Rotary user controls can be read - no extra parts and little software needed to read shaft encoders or analogue voltages, thanks to SMARTPIN
5. No external decoders needed for bandswitch selection - a massive 64 I/O pins to use!
6. Saving radio settings - very good flash memory support
7. Saving and playing signal recordings, station frequency lists, providing bespoke radio applications, upgrades and bug fixes - very good SD card support
8. Microphone input - no external A/D needed
9. Speaker output - no external D/A needed
10. Choice of sample-by-sample or 2^n-sample buffer-by-buffer DSP - no need for external RAM; the internal 512 kbyte hub RAM has room for either
11. Front panel display - no driver chips needed; good hardware/software support for many display types, from 2-line LCD to HDMI
12. PC link - USB/serial support from SMARTPIN to enable PC control and data transfer
13. The above chip savings mean a smaller PCB is required - good for handheld devices
14. Easy expansion - two or more P2s can communicate over a high-speed serial link to share the radio processing or provide data decoding - SMARTPIN makes this trivial
15. The CORDIC solver and other DSP-related instructions bring FFTs and other time-consuming DSP at audio bandwidths within reach on a low-cost, low-power platform
16. Faster DSP - N cogs linked in a signal 'pipeline' greatly reduce the time taken to process a signal buffer compared with single-core platforms like the dsPIC33
The P2 still reduces the parts count overall and promises some serious signal processing.
The amplifier is driven by a frequency-modulated signal and powered from an amplitude-modulated source, which 'magically' combine to produce SSB. This forum is developing a small HF transceiver using that principle, albeit with a very poor choice of processor ;-)
Why mention this in a P2 forum? The microphone input is passed through a Hilbert transform to produce in-phase and quadrature signal streams. These are converted to polar format (amplitude/phase): the amplitude signal then varies the power amplifier supply voltage, while the phase signal varies the frequency of an RF oscillator - most suitable are direct digital synthesis devices, which are capable of glitch-free, rapid frequency changes, though it's possible something like the Si5351 would do. So the P2's cartesian-to-polar instruction in the CORDIC solver would speed the signal path significantly, and the PWM feature of a smartpin would be useful for the power supply drive signal. Makes yer fink, dunnit?
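The Hilbert-plus-polar maths above can be sketched with numpy's FFT (the FFT method is one standard way to form the analytic signal; the test tone is placed exactly on an FFT bin to avoid leakage, and all names here are my own illustration):

```python
import numpy as np

fs = 8000
n = 1024
t = np.arange(n) / fs
f_tone = 90 * fs / n                  # ~703 Hz, exactly on an FFT bin
audio = np.sin(2 * np.pi * f_tone * t)

# Analytic signal via FFT: keep DC and Nyquist bins, double the positive
# frequencies, zero the negative ones (equivalent to audio + j*Hilbert(audio))
spec = np.fft.fft(audio)
spec[1:n // 2] *= 2
spec[n // 2 + 1:] = 0
analytic = np.fft.ifft(spec)          # I + jQ

# Cartesian-to-polar: the step the P2 CORDIC solver could take over
amplitude = np.abs(analytic)          # would drive the PA supply voltage
phase = np.unwrap(np.angle(analytic)) # would drive the RF oscillator

# A single tone has a constant envelope, so the amplitude sits at ~1.0
print(round(float(np.mean(amplitude)), 2))   # 1.0
```

A real transmitter would do this block-by-block (or per-sample with a FIR Hilbert filter) rather than over one big FFT, but the amplitude/phase split is the same.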
Actually, there's a method that works better than the Hilbert transform for producing quadrature signal components. I describe it here:
https://forums.parallax.com/discussion/comment/1020855/#Comment_1020855
It's based upon this paper by Clay S. Turner:
https://www.iro.umontreal.ca/~mignotte/IFT3205/Documents/TipsAndTricks/AnEfficientAnalyticSignalGenerator.pdf
-Phil
I'd be a buyer. Sounds like a great project and involves two of my favorite things, radios and Propellers! Please keep us informed, I'd love to follow along.
http://forums.parallax.com/discussion/171196/pnut-spin2-latest-version-v34u-debugger-improved/p1
It has
ADC_to_VGA_millivolts.spin2
VGA_1280x1024_text_160x85.spin2
vga_text_demo.spin2
Plus others
When I eventually get a P2, I'd like to use multiple cogs, arranged in a signal chain, each doing part of the DSP needed for software radio. I can see that the CORDIC engine would be useful in many parts of the signal path.
1. Can the cordic engine be shared by cogs?
2. Can cordic instructions from different cogs be interleaved?
3. Is the result automatically routed back to the caller as if it alone were using the engine?
4. How can the program sense the % utilisation of the engine?
Cheers, Bob
1 & 2. From my understanding, a new CORDIC instruction can be issued on every clock. However, as each cog only gets a slot every 8 clocks, a cog can only issue a CORDIC instruction every 8 clocks, although the other cogs can intersperse CORDIC instructions between these.
3. Yes.
4. Only by software, but there's no need to do this.
The P1 wasn't fast enough to receive WWV, but what about the P2?
I would love to be able to calibrate timers such as the mechanical stopwatch I just got.
As Cluso already said, each cog gets access to the CORDIC every 8th clock cycle, so it doesn't matter how many other cogs use it - it won't slow down computation.
You only have to be careful if you interleave CORDIC commands on the same cog, I mean if you start another computation before the first (of the same cog) is finished. This is possible, but you have to disable interrupts to avoid losing results due to a pipeline "traffic jam".
As for sensing utilisation: it might be possible, but I don't know how. Theoretically, though, utilisation could be analyzed at compile time.
Other basic thoughts - the P2 is specced at 180 MHz, but several folks have overclocked the rev C prototype to 300 MHz. Assuming the incoming IQ signal stream to be 96 ksamples/s max, the number of (typical) 2-clock instructions available per IQ sample pair would be 937 at 180 MHz and 1562 when pushed to 300 MHz.
With (say) a 2048-sample buffer, each cog has to perform what it needs to do within 1920 kilo-instructions at a 180 MHz clock, or 3200 kilo-instructions at 300 MHz. Some initial tests performing an FFT on an IQ buffer would give some idea as to whether the P2 is up to the type of SDR I know most about, which typically runs on a PC. This sounds like a tall order - but SDRs running on PCs typically use only 10-15% of the CPU resource, to avoid gaps in reception due to Windows not being a real-time OS. The P2, however, can be run at very nearly full load if no interrupts are used in the signal path. There are plenty of examples of working SDRs on dsPIC33 processors - these are less powerful than the P2, so I suspect the P2 is up to the task and then some.
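Spelling out that arithmetic, with the same assumptions as above (2-clock instructions, 96 ksample/s IQ rate):

```python
def budget(clock_hz, sample_rate=96_000, clocks_per_instr=2):
    """Instructions available between successive IQ sample pairs."""
    return clock_hz // clocks_per_instr // sample_rate

# Per IQ sample pair
print(budget(180_000_000))            # 937
print(budget(300_000_000))            # 1562

# Per 2048-sample buffer instead of per sample
print(budget(180_000_000) * 2048)     # 1918976, i.e. ~1920 kilo-instructions
```

Halve these numbers if an instruction averages 4 clocks (hub accesses and CORDIC waits will eat into the budget), so treat them as upper bounds.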
Because the complete application may end up pushing the P2 pretty hard, assume it will get warm. Choose a development board with higher-power regulators, good decoupling and a decent 'heatplane'.
Sample FFT code, not fully optimized:
https://forums.parallax.com/discussion/170948/1024-point-fft-in-79-longs
How to stream audio data back and forth between LabVIEW on a Windows PC and TAQOZ on the P2? Well, the terminal serial port 'only' runs at 921600 baud, and that's not fast enough for real time.
In the 'good old days' experimenters used the 8-bit 'parallel printer port' to drive external logic; it was quite simple to do. Nowadays, Windows tends to prohibit such shenanigans.
I had bought a PCIe parallel port card a while ago, but my favourite language, LabVIEW 2020, no longer supports reading and writing the bare parallel port. So I wrote my own LabVIEW interface:-
The two VIs I wrote are attached, together with a demo that just toggles the outputs slowly. LabVIEW 2020 Community edition is free from National Instruments for non-commercial use - the quickest way I know to write a measurement and control program - and it's not crippled in any way.
Postscript: LabVIEW isn't fast enough. By bit-banging, I could toggle an output at around 30 kHz at best, so not quite up to audio sample rate. Nevertheless, it might prove useful sometime. Instead of handing off bytes to the DLL, passing a pointer to a buffer of data would probably do it, with the DLL (modified to handle arrays) being written in a faster language. The source code is available.
The scheduling of tasks in a radio is a pretty fundamental issue. In the past, with a 1- or 2-core processor, it was pretty easy to partition the work. Now the P2 (and other micros) with more than 2 cores make the best approach less obvious.
A software radio processes a stream of IQ data from the receiver front end or the microphone (or its equivalent). A sample of IQ data must be processed every X µs, or a buffer of IQ data every Y ms. My friend Ron G4GXO, with a dual-core dsPIC33, fixed the allocation of radio tasks: core 1 did all the signal processing, core 2 all the non-signal processing. This was so that core 1 completed the signal processing in time - if it did not, the radio would be useless. So a radio is a mixture of tasks that must complete within a timeframe, and tasks that don't have to complete in that timeframe but must still complete with varying levels of urgency.
With an 8-core processor like the P2, the same approach as with the dsPIC33 could be taken - a fixed task allocation scheme: N cores must complete the signal-path chores in time for the next 'drum beat', while the remaining 8-N cores cycle round a number of non-real-time tasks as fast as they can.
The question is: would a dynamic (but kept simple) scheduling mechanism be 'better' than a static allocation of tasks? I read a little about scheduling in this paper
Perhaps the scheduling of tasks is not the key issue: what really matters in a radio is the scheduling of data - when it is ready for the next step - and the computation of the signal through a number of stages, in the right order, in the small space that is the on-board RAM.
I like using Taqoz - it's good for experimentation. I idly wonder whether all N 'free' cores could be made to check a task table and, like a well-organised team, pick up, execute and tick off tasks in a self-balancing fashion, all handling both time-critical and non-critical tasks. If this still allowed the user to interact with Taqoz, the radio's operation could be inspected while it was running.
Multiple cores sharing a common set of data acting together to process a time-critical signal path implies:-
Reading the Propeller 2 documentation, both of the above may be helped by using the up to 16 locks (sometimes called semaphores) described on page 67.
To economise on the limited number of locks available, if there are many shared variables they can be grouped, each group protected by one lock. Alternatively, a section of code could be locked, so that a common resource is assigned to one cog until released.
Regular synchronisation of cores could be achieved by setting the required bits in a 'rendezvous' byte for the cores that need to sync. Each bit of the byte would signal that the corresponding core had reached the synchronising 'rendezvous' in its program (core 0 resets bit 0, etc.). Once the byte is all zero, all the cores set off running their programs - a 'starting gun' or 'drum beat' kind of signal. This is a feature I have pinched from the LabVIEW language, in which multi-loop programs are very easy to write, but where needed the rendezvous feature keeps 'em in sync.
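The rendezvous-byte scheme can be modelled in a few lines - here with Python threads and a condition variable standing in for cogs and hub RAM (purely illustrative; on the P2 it would be a hub byte guarded by a lock, or WAITATN):

```python
import threading

NUM_COGS = 4
state = {'byte': (1 << NUM_COGS) - 1}   # one bit per cog, all set initially
cv = threading.Condition()
order = []                              # record of who got past the barrier

def cog(n):
    with cv:
        state['byte'] &= ~(1 << n)      # "core n has reached the rendezvous"
        cv.notify_all()
        while state['byte'] != 0:       # wait for the 'starting gun'
            cv.wait()
    order.append(n)                     # past the barrier: run the next leg

threads = [threading.Thread(target=cog, args=(i,)) for i in range(NUM_COGS)]
for th in threads:
    th.start()
for th in threads:
    th.join()
print(sorted(order))   # [0, 1, 2, 3] - all four crossed together
```

For repeated drum beats the byte would be re-armed to all-ones after each release, which needs a little care so a fast core can't race round and clear its bit before the reset.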
Taqoz does have COGATN and POLLATN, but doesn't provide the lock or WAITATN features - using the assembler, though, they were easy to add.
My SDR will have a direct-conversion receiver with a Tayloe detector that delivers an analogue IQ signal in the band 0 Hz to ~100 kHz. After a low-pass filter, a cheap PCM1808 stereo codec (eBay has thousands) outputs an IQ digital stream at 24 bits per sample. The PCM1808 requires a clock of up to 20 MHz, which could be provided by a smartpin or one channel of an SI5351 clock generator. The PCM1808 data interface is I2S, maybe in master mode feeding a smartpin receiver, with a sample rate somewhere between 16 and 100 ksamples/s. The stream will be stored in two alternating buffers: one being written by the I2S receiver, the other being read by the receiver DSP. Once a buffer is full, or maybe full enough, ATN flag(s) would be raised to start the waiting cogs.
The cog handling the receiver's I2S input may also have time to handle the microphone and speaker channels via the smartpin A/D and PCM converters.
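The alternating-buffer scheme above is classic ping-pong (double) buffering; here's a minimal model of just the role-swapping, with made-up names (no real I2S, and on the P2 the swap would be two hub pointers exchanged at each ATN):

```python
class PingPong:
    """Two buffers that alternate between writer (I2S) and reader (DSP)."""
    def __init__(self, size):
        self.bufs = [[0] * size, [0] * size]
        self.write_idx = 0              # buffer the "I2S cog" is filling

    def swap(self):
        """Called when the write buffer is full: the roles flip."""
        self.write_idx ^= 1

    @property
    def write_buf(self):
        return self.bufs[self.write_idx]

    @property
    def read_buf(self):
        return self.bufs[self.write_idx ^ 1]

pp = PingPong(4)
pp.write_buf[:] = [1, 2, 3, 4]   # I2S receiver fills buffer 0
pp.swap()                        # buffer full: DSP now reads buffer 0
print(pp.read_buf)               # [1, 2, 3, 4]
pp.write_buf[:] = [5, 6, 7, 8]   # meanwhile I2S fills buffer 1
pp.swap()
print(pp.read_buf)               # [5, 6, 7, 8]
```

The invariant that matters for real time: the DSP must finish with its read buffer before the writer fills the other one, or samples are lost.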
Thinking ahead to writing DSP words in inline assembler within Taqoz, it's probably preferable to store I and Q samples in adjacent array addresses. Since indirect addressing is used to index arrays, this means just an increment or decrement is needed to access an IQ sample. Array size is also TBD. From experience with PC-based SDR, the longer the array, the lower the CPU load; however, hub RAM is limited, so that caps the maximum array size, and there will be much use of [A]=function([A],[B]) to economise on space. If (say) 1024 IQ samples are stored per array, each array is 8 kbytes in size.
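The adjacent-address layout looks like this in a toy sketch (sample n's I at offset 2n and Q at 2n+1 - illustrative Python, not Taqoz; the helper names are my own):

```python
from array import array

# 4 IQ pairs, interleaved in one flat array of signed longs
iq = array('l', [0] * 8)

def store(buf, n, i_val, q_val):
    """Sample n lands at offsets 2n (I) and 2n+1 (Q)."""
    buf[2 * n] = i_val
    buf[2 * n + 1] = q_val

def load(buf, n):
    return buf[2 * n], buf[2 * n + 1]

store(iq, 0, 100, -50)
store(iq, 1, 200, -75)
print(load(iq, 1))   # (200, -75)
```

In PASM the same walk is just one pointer with auto-increment touching I then Q, instead of two separate index registers for two split arrays.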
I'm not sure what use Taqoz makes of LUT RAM. If it's unused, it's useful extra space and could be set up as a pipeline between cogs. LUT RAM is too small for IQ arrays, but it would be useful for local variables, or constants such as filter responses. As LUT RAM totals 4k longs across the 8 cogs and is faster than hub RAM, it's a good resource to tap into.
A timing plan will be essential to divide the signal processing between the minimum number of cogs - a task a little like using a project management tool to minimise project time: the tool predicts when each task has to start, and all the dependencies feed into the critical path to meet the end time (a little before the next array of codec samples is ready). Another analogy is those patterns of upright dominoes people used to set up and knock down - it was all self-sequencing.