The Art of Recording the Right Amount of Data
The ability to record data and process it offline is used in every engineering field, and the amount of data varies greatly from field to field. When signals in the electromagnetic spectrum are recorded, the amount of data depends on the bandwidth of the signal, and as signal bandwidths grow wider, more and more data must be stored, transferred, and analyzed. This article focuses on applications such as COMINT, ELINT and SATCOM, which all produce large amounts of raw data, and on how to avoid unnecessarily large data files that make further offline processing challenging.
More data than ever is being produced. Most of it will never be used.
– By Anders Svensson, Chief Technical Officer
Large Data Files
Acquiring and working with large data files can be separated into three different stages:
- Record. Recording takes place when the acquired data is saved to disk. The disks must be fast enough and large enough, and the whole chain from acquisition to disk must meet the requirements.
- Offload Data. If the data is not, or cannot be, processed locally where it is first stored, it must be transferred to another system. This can be done in many ways, but they all come down to two general paths: either the data is physically moved to another system, e.g., by using a USB drive or by removing the disks and moving them to the other system, or the data is transferred electronically, e.g., sent over the internet.
- Analyze Data. The raw data has limited value until it is analyzed. All data that was recorded must therefore be processed by a CPU, GPU, FPGA, or some other kind of processing mechanism.
Example – Event Horizon Telescope
An extreme example of this is the Event Horizon Telescope, which produced the data for the famous first picture of a black hole. Once the massive amount of data, about 5 petabytes, had been acquired, the fastest way to transfer it for processing was to physically ship the hard drives. For some applications that works, but it depends on many factors such as available technology, budget, and timeline.
Handle Large Data Files
Sample and Data Rates
The Nyquist theorem states that a signal must be sampled at a rate of at least twice its bandwidth (for complex IQ data, a sample rate equal to the bandwidth is sufficient), but a higher sample rate is desirable because of anti-alias filter roll-off and other factors. In signal processing and RF it is common to work with IQ data (in-phase and quadrature data); explanations of IQ data can be found both in the literature and online. Each sample in time is a pair of I and Q. A common oversampling factor is IQ rate = 1.25 × bandwidth, which gives enough room for the anti-aliasing filter. A modern vector signal analyzer can have bandwidths of 800 MHz and beyond, and with the oversampling just mentioned the IQ rate becomes 1 GS/s. With 16-bit I and 16-bit Q, the data rate becomes 4 GB/s.

In RF monitoring applications the system often needs to monitor the spectrum for long periods of time, and 24/7 operation is not uncommon. A data rate of 4 GB/s produces 345.6 TB of data per day, which requires a lot of storage space, and that storage can also be quite expensive since it must sustain high write speeds. One way around this is to store only the last hours of data and continuously delete the oldest recordings. Storage devices do, however, have a limited lifespan, and a measure of that is TBW, terabytes written. Consumer-grade SSDs are rated up to 600 TBW for a 1 TB model. At the data rate mentioned above, 600 TBW gives less than 42 hours of recording time on a single device, which is not much for 24/7 operation. Increasing the number of drives, or buying industrial-grade drives rated up to 2000 TBW for a 1 TB model, improves the time to failure but also the cost. Thus, recording all data is not always an option.

Something else to consider is how to process large amounts of data to extract the needed information, and where to do it. Once the data has been stored and needs to be processed, other bottlenecks become apparent. Reading from the storage media is often somewhat faster than writing to it, but the system is always limited by its slowest link. Even if a RAID array with multiple NVMe drives would theoretically deliver 72 GB/s based purely on the number of drives and their read/write speeds, the data backplane, e.g., a PCIe bus, will limit the transfer rate. This bottleneck is there whether the data is sent to the CPU for processing or to an external device such as a back-end server. Having the largest computational cluster in the world does not help much if data cannot be sent to it fast enough.

Because of this, it is important to find ways to record only the data that is needed. Doing so reduces storage space, transfer time, and processing time. The results are a shorter time from recording to result and reduced cost.
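As a quick sanity check of the figures above, the arithmetic can be reproduced in a few lines. The sketch below is plain Python; the bandwidth, oversampling factor, sample width and TBW rating are the example values from the text, not parameters of any particular system.

```python
# Back-of-the-envelope storage budget for a wideband IQ recorder.
# The figures (800 MHz bandwidth, 1.25x oversampling, 16-bit I/Q, 600 TBW)
# are the example values used in the text, not fixed system parameters.

bandwidth_hz = 800e6          # instantaneous bandwidth
oversampling = 1.25           # margin for the anti-alias filter
bytes_per_sample = 4          # 16-bit I + 16-bit Q

iq_rate = bandwidth_hz * oversampling          # samples per second
data_rate = iq_rate * bytes_per_sample         # bytes per second

per_day_tb = data_rate * 86_400 / 1e12         # terabytes written per day
ssd_tbw = 600                                  # consumer-grade 1 TB SSD endurance
hours_to_wearout = ssd_tbw * 1e12 / data_rate / 3600

print(f"IQ rate:       {iq_rate / 1e9:.2f} GS/s")   # 1.00 GS/s
print(f"Data rate:     {data_rate / 1e9:.1f} GB/s") # 4.0 GB/s
print(f"Storage/day:   {per_day_tb:.1f} TB")        # 345.6 TB
print(f"SSD endurance: {hours_to_wearout:.0f} h")   # ~42 h
```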
Solutions
Data Reduction Techniques
- Reduce record time
- Reduce signal bandwidth
- Reduce signal bit width
- Compress data
- Analyze and demodulate data
Reduce Record Time
Recording the signal of interest only while it is active is an effective way to reduce the data size if there are long periods with no signal present. Some applications, radar pulses for example, require very short bursts of recording. The system's retrigger time then becomes an important factor, i.e., how quickly it can start a new recording after the previous one has finished. Automatically triggered recordings can suffer from the event leading up to the trigger not being included, because the signal had not yet reached the threshold level. It is therefore important to buffer the incoming data so that pre-trigger samples can also be included in the recording when the trigger occurs, as sketched below.
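A minimal sketch of the pre-trigger idea, assuming a simple threshold-on-magnitude trigger (real systems use the trigger types discussed later and typically implement the buffering in hardware):

```python
import collections

import numpy as np

# Keep the last `pretrigger` samples in a ring buffer so that a triggered
# recording also contains the rising edge of the signal.

def record_with_pretrigger(samples, threshold, pretrigger, length):
    """Return one recording of `length` samples, including pre-trigger history."""
    history = collections.deque(maxlen=pretrigger)   # ring buffer of recent samples
    capture = None
    for x in samples:
        if capture is None:
            if abs(x) >= threshold:                  # trigger condition met
                capture = list(history)              # start with pre-trigger samples
                capture.append(x)
            else:
                history.append(x)                    # keep buffering history
        else:
            capture.append(x)
            if len(capture) >= length:
                return np.array(capture)
    return np.array(capture) if capture else np.array([])
```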
Reduce Signal Bandwidth
When searching for unknown signals, a wide bandwidth is often used. Instead of capturing the whole bandwidth, the signal of interest can be extracted using digital signal processing. If the band being monitored is 765 MHz wide but the signal of interest is only 5 MHz wide, the data size can be reduced by a factor of 153 by recording only the frequency band of interest. Extracting a narrowband signal from a wideband input can be accomplished with a Digital Downconverter (DDC), also known as a digital drop receiver (DDR). More advanced systems have multiple DDCs that can be used independently to capture several narrowband signals simultaneously.
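A minimal DDC sketch in Python using NumPy and SciPy is shown below. The rates, mixing frequency and filter length are illustrative assumptions chosen to match the 153x example above (956.25 MS/s input, 6.25 MS/s output), not the parameters of any specific receiver.

```python
import numpy as np
from scipy import signal

# Minimal DDC: mix the wideband IQ capture so the signal of interest sits at
# 0 Hz, low-pass filter, then decimate.

def ddc(iq, fs_in, f_center, fs_out):
    """Extract a narrowband channel centred at f_center from wideband IQ data."""
    n = np.arange(len(iq))
    mixed = iq * np.exp(-2j * np.pi * f_center / fs_in * n)   # shift to baseband
    decim = int(fs_in // fs_out)                              # integer decimation
    # Low-pass filter to the new, narrower band before throwing samples away.
    taps = signal.firwin(129, cutoff=0.8 * (fs_out / 2), fs=fs_in)
    filtered = np.convolve(mixed, taps, mode="same")
    return filtered[::decim]

# Example: pull a 5 MHz channel out of a 956.25 MS/s wideband capture.
# narrow = ddc(wideband_iq, fs_in=956.25e6, f_center=123.4e6, fs_out=6.25e6)
```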
Reduce Signal Bit Width
If the signal does not have a high dynamic range, it may be possible to reduce the number of bits and still represent the signal sufficiently well. Going from 16 bits to 8 bits halves the required storage space. It is also possible to use bit widths that are not a multiple of a byte, but that makes reading and writing the data more complicated. An example is 12-bit data, where four 12-bit samples are stored as three 16-bit words. This method is called bit packing and is sketched below. It is usually not preferred since it complicates the reading process: even if the data is stored in a binary format, most CPU-based file functions operate on bytes, and bit packing makes it necessary to unpack the 12-bit data into a 16-bit number format, which takes time.
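A sketch of 12-bit packing and unpacking with NumPy, assuming the sample values are already constrained to 12 bits. It packs two samples into three bytes, which is equivalent to four samples in three 16-bit words.

```python
import numpy as np

def pack12(samples):
    """Pack an even number of 12-bit values into a byte array (3 bytes per pair)."""
    s = np.asarray(samples, dtype=np.uint16) & 0x0FFF
    a, b = s[0::2], s[1::2]
    out = np.empty(3 * len(a), dtype=np.uint8)
    out[0::3] = a >> 4                        # upper 8 bits of first sample
    out[1::3] = ((a & 0x0F) << 4) | (b >> 8)  # low nibble of first + top nibble of second
    out[2::3] = b & 0xFF                      # lower 8 bits of second sample
    return out

def unpack12(packed):
    """Inverse of pack12: recover the 12-bit values from the packed bytes."""
    p = np.asarray(packed, dtype=np.uint16)
    b0, b1, b2 = p[0::3], p[1::3], p[2::3]
    a = (b0 << 4) | (b1 >> 4)
    b = ((b1 & 0x0F) << 8) | b2
    return np.stack([a, b], axis=1).reshape(-1)
```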
Compress Data
Data compression is something many are familiar with, e.g., when combining files into a zip archive. Compressing acquired data from the RF spectrum is hard, since the compression usually must be lossless so that the original signal can be reconstructed. A truly random signal cannot be compressed. If there are statistical properties of the signal that can be exploited, it can be compressed to some degree, but performing that statistical analysis in real time is difficult, and compressing the data is not common practice.
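The point about random data can be illustrated with a few lines of Python: losslessly compressing noise-like 16-bit samples with a general-purpose compressor gives essentially no size reduction.

```python
import zlib

import numpy as np

# Noise-like 16-bit samples have almost no redundancy for zlib to exploit.
rng = np.random.default_rng(0)
noise = rng.integers(-2**15, 2**15, size=1_000_000, dtype=np.int16)

raw = noise.tobytes()
ratio = len(raw) / len(zlib.compress(raw, 9))
print(f"Compression ratio on noise-like data: {ratio:.2f}x")   # close to 1.0x
```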
Analyze and Demodulate Data
A further step in the recording process is to analyze the data and store only the analysis results. By analyzing the signal, much of the overhead required to send the actual message can be discarded. The earlier in the system the signal can be analyzed, the better from a data storage and transfer perspective. One example is demodulating audio transmissions, where 16-bit audio sampled at 8 kHz may be sufficient instead of a 50 kHz wide IQ signal. The best option is to analyze the signal in real time, so that only the result needs to be stored instead of the entire signal. The second-best option is to analyze the signal after it has been stored locally but before it is transferred to another system. Even if the storage must temporarily accommodate the original signal, the original file can be deleted once the analysis is done and the results have been transferred, thus saving space. Furthermore, transferring only the results reduces the required transfer rate.
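As a rough sketch of the audio example, assume a narrowband FM channel captured as 62.5 kS/s IQ (50 kHz × 1.25 oversampling) that is reduced to 8 kHz, 16-bit audio; the demodulator and rates below are illustrative, not a description of any particular system.

```python
import numpy as np
from scipy import signal

def fm_to_audio(iq, fs_iq=62_500, fs_audio=8_000):
    """Return int16 audio demodulated from complex FM IQ samples."""
    # Instantaneous frequency is the sample-to-sample phase difference of the IQ signal.
    demod = np.angle(iq[1:] * np.conj(iq[:-1]))
    audio = signal.resample_poly(demod, fs_audio, fs_iq)   # 62.5 kS/s -> 8 kS/s
    audio /= max(np.max(np.abs(audio)), 1e-12)             # normalise to full scale
    return (audio * 32767).astype(np.int16)

# Data-rate comparison for this example:
#   IQ:    62_500 S/s * 4 bytes = 250 kB/s
#   Audio:  8_000 S/s * 2 bytes =  16 kB/s   (~15x smaller)
```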
Automatic Signal Detection
The techniques mentioned above can all be applied manually by letting the operator configure the bandwidth and then start and stop the recording. A better way is to have the recording system automatically detect and record the signals of interest.
To determine when and for how long a signal is active, or how much bandwidth it occupies, different kinds of signal detection algorithms can be used. Most general methods rely on detecting an increase in energy. Two common methods are the frequency mask trigger and the analog edge trigger.

The analog edge trigger measures the signal level or power of the entire band and triggers when it goes above a threshold. Despite the name, it is usually implemented digitally. With this method it is possible to detect very short signals and get a very accurate trigger time for when the signal started. One downside is that the trigger responds to all frequencies in the band, so if there are multiple signals present it may trigger on the wrong one.

The frequency mask trigger is built on the Fast Fourier Transform (FFT). The FFT divides the spectrum into many frequency bins, and the signal level in each bin can be compared to a threshold. An FFT is calculated over a certain number of samples, e.g., 1024. A longer FFT gives better frequency resolution but requires more samples and therefore has worse time resolution. A simplified frequency mask trigger is sketched below.

For triggered applications, even if not all data is saved, all data must be processed in real time to detect whether a trigger condition has occurred. As mentioned before, as bandwidths become wider the amount of data increases, which puts more strain on the triggering task. Processing 4 GB/s in a deterministic way on a CPU is challenging, which is why triggering is typically handed off to hardware such as an FPGA that can continuously process all data sent to it.

A common measure of FFT-based systems is probability of intercept (POI). 100% POI specifies how long a signal must be active for the system to be guaranteed to detect it at full amplitude. For this to happen, the signal must be present during at least one entire FFT, and by changing the sample rate, FFT length, and FFT overlap, the POI figure can be tailored to the application.
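A simplified frequency mask trigger can be expressed in a few lines of Python. The windowing, dB scaling and mask format below are illustrative choices; a real system runs the equivalent logic continuously in an FPGA.

```python
import numpy as np

def frequency_mask_trigger(iq, mask_db, fft_len=1024, overlap=0.5):
    """Yield the starting sample index of every FFT frame that violates the mask."""
    hop = int(fft_len * (1 - overlap))
    window = np.hanning(fft_len)
    for start in range(0, len(iq) - fft_len + 1, hop):
        frame = iq[start:start + fft_len] * window
        spectrum = np.fft.fftshift(np.fft.fft(frame))
        power_db = 20 * np.log10(np.abs(spectrum) + 1e-12)
        if np.any(power_db > mask_db):          # any bin above its mask level?
            yield start

# Example: a flat -40 dB mask across all 1024 bins.
# triggers = list(frequency_mask_trigger(iq_samples, mask_db=np.full(1024, -40.0)))
```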
Overlap is the percentage by which consecutive FFT frames overlap. As an example, a signal sampled at 1 GS/s with a 1024-point FFT and 50% overlap gives a new FFT every 512 samples; in the worst case the signal must therefore span 1024 + 512 = 1536 samples, about 1.5 µs, to be guaranteed to fill one entire FFT and be detected at full amplitude.
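Under the condition stated above (the signal must fully cover one FFT frame, and a new frame starts every fft_len × (1 − overlap) samples), the minimum duration for 100% POI can be computed with a small helper:

```python
def min_duration_for_full_poi(fs, fft_len, overlap):
    """Shortest signal duration (s) guaranteed to fully occupy one FFT frame."""
    hop = int(fft_len * (1 - overlap))
    return (fft_len + hop) / fs

# 1 GS/s, 1024-point FFT, 50% overlap -> 1536 samples = 1.536 microseconds
print(min_duration_for_full_poi(1e9, 1024, 0.5))
```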
ODEN
This article has provided some examples of the different challenges involved in data recording. Which method or combination of methods is most effective for reducing data depends on your application and your requirements. The two most effective ways of reducing the data required for offline processing are limiting the data in time and in bandwidth.
An example of a system capable of doing this is ODEN, an intelligent recorder that uses several of these technologies in a new and innovative way. With up to 765 MHz of instantaneous bandwidth it covers a wide span and can capture very wide signals. Thanks to its built-in functionality, the size of the recorded signals is reduced as much as possible without jeopardizing data quality.
It has a frequency mask trigger with up to 32768 points, making it possible to create very fine masks that trigger only on the frequencies you are actually interested in. The frequency mask trigger is used to automatically initiate recordings of both wideband and narrowband signals. The frequency band can be divided into multiple sub-bands, each with its own recording rules that define what happens when the frequency mask is triggered. The system can then record the full instantaneous bandwidth, the sub-band, or individual narrowband signals. The latter is made possible by Novator Solutions' channelization technology, which enables ODEN to automatically extract up to 128 individual narrowband signals simultaneously. ODEN automatically calculates the frequency of each signal and begins the recording. When the stop trigger condition is met, the recording stops automatically. Thanks to this innovative technology, it is possible to reduce the signal size in both time and bandwidth to a new level.
ODEN is also equipped with a powerful embedded computer that can perform all the analysis tasks you would ordinarily do on your own CPU. By analyzing the data directly on the recorder, the amount of data that needs to be sent over networks or offloaded to external drives can be reduced drastically. ODEN is prepared to let you add your own custom signal post-processing to extract the parameters of interest before transferring the data.
Download the article or learn more about ODEN and its data reduction features at:
www.novatorsolutions.com/oden