Documentation:Tutorial:LargeSignals - MdsWiki

Handling large signals in MDSplus

In the previous section we saw how to create pulse files and fill them with data. Data may take a variety of forms, from scalars to multidimensional arrays and complex types. In particular, we saw that the "signal" data type is very useful for representing the time evolution of a given quantity. There are, however, some limitations in the use of signals:

  • The number of samples in a signal cannot exceed a few billion (~4 GSamples), because the length of arrays handled in MDSplus is stored in a 32-bit variable.
  • In practice the maximum number of manageable samples in a signal is even smaller, because of memory requirements and long access times: a program accessing a very large signal is very likely to crash or, in any case, data access will take an unacceptably long time.
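
The two limits above can be made concrete with a little arithmetic. The following Python sketch assumes an unsigned 32-bit length field and 4-byte (float32) samples, as used in the examples later in this section; the names are illustrative only and are not part of MDSplus.

```python
# Back-of-the-envelope check of the limits on a non-segmented signal.
# Assumptions: unsigned 32-bit array length, 4-byte (float32) samples.

MAX_ARRAY_LENGTH = 2**32 - 1   # largest value of an unsigned 32-bit field
BYTES_PER_SAMPLE = 4           # float32


def memory_gib(n_samples, bytes_per_sample=BYTES_PER_SAMPLE):
    """Memory needed to hold n_samples in a single array, in GiB."""
    return n_samples * bytes_per_sample / 2**30


print(MAX_ARRAY_LENGTH)              # 4294967295, i.e. about 4 GSamples
print(round(memory_gib(10**9), 2))   # 3.73 GiB for one billion samples
```

Even well below the 32-bit limit, a one-billion-sample signal already needs several gigabytes of memory if it is read as a single array.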

MDSplus provides the concept of segmented data for handling large signals. When a signal is stored in segments, there is no limit on its size and data readout is managed efficiently. Basically, a signal is stored in a segmented node in chunks (also known as segments). At any time it is possible to add a new chunk, that is, to enlarge the signal by adding new samples. This feature is useful for long-lasting experiments because the signal samples acquired so far are accessible even while the signal is still growing.

When reading a segmented node, the inner layers of MDSplus join the segments together in order to return a signal data type composed of all the signal samples and the associated timebase. However, if the number of samples actually stored in the segmented node exceeds the maximum number of samples in an MDSplus array, signal readout will fail. Even if the total number of samples is below that limit, the time and memory required to read the whole signal at once may still be unacceptable.

To overcome this limitation there are two possible solutions:

  • Read each segment using method getSegment(segmentNumber).
  • Use method setTimeContext() to read (a portion of) the (possibly resampled) signal.

Using the first solution, the signal corresponding to the given segment is returned. The program must then handle the signal one portion at a time, which can make it more complicated.
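
A minimal Python sketch of the segment-by-segment approach is shown here. It assumes a node object offering getSegment(idx), as named above, plus a getNumSegments() method and a data() method on each returned segment; these last two are assumptions of this sketch. FakeNode and FakeSegment are hypothetical stand-ins so that the code runs without an MDSplus installation.

```python
def iter_segments(node):
    """Yield the segments of a segmented node one at a time."""
    for idx in range(node.getNumSegments()):   # getNumSegments(): assumed
        yield node.getSegment(idx)


def segmented_mean(node):
    """Mean of all samples, holding only one segment in memory at a time."""
    total = 0.0
    count = 0
    for segment in iter_segments(node):
        samples = segment.data()
        total += float(sum(samples))
        count += len(samples)
    return total / count if count else float("nan")


# Hypothetical stand-ins, used only to make the sketch runnable
class FakeSegment:
    def __init__(self, samples):
        self._samples = samples

    def data(self):
        return self._samples


class FakeNode:
    def __init__(self, segments):
        self._segments = segments

    def getNumSegments(self):
        return len(self._segments)

    def getSegment(self, idx):
        return FakeSegment(self._segments[idx])


node = FakeNode([[1.0, 2.0], [3.0, 4.0, 5.0]])
print(segmented_mean(node))   # 3.0
```

The bookkeeping (running totals across segments) is exactly the extra complexity that this solution pushes onto the program.
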
Using the second solution, the program is the same as for traditional signals: methods getData() and data() return the desired samples, leaving the inner data access layers of MDSplus to handle the joining of the segments and the resampling. The region of interest (ROI) and the resampling interval are defined by the Tree method:

setTimeContext(startTime, endTime, delta)

The arguments are optional. When startTime (endTime) is missing (i.e. defined as null in Java and C++, or as None in Python), no start time (end time) is defined for the ROI. When delta is missing, no resampling is done.

Method setTimeContext() has a global effect, that is, all subsequent readouts of segmented nodes (even when they are referenced in an expression being evaluated) will use the defined ROI.
To reset the ROI, call setTimeContext() with all three parameters defined as None (Python) or null (C++, Java).
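
Because the setting is global, it is easy to forget the reset and affect later readouts. One possible way to guard against this in Python is a small context manager, sketched below. The helper time_context is hypothetical (it is not part of MDSplus) and takes the setter as an argument; a recording stand-in replaces it here so that the sketch runs without MDSplus.

```python
from contextlib import contextmanager


@contextmanager
def time_context(set_time_context, start=None, end=None, delta=None):
    """Apply a ROI/resampling setting and always reset it afterwards."""
    set_time_context(start, end, delta)
    try:
        yield
    finally:
        # Reset: all three parameters undefined
        set_time_context(None, None, None)


# Stand-in for the real setter, recording the calls it receives
calls = []


def fake_set_time_context(start, end, delta):
    calls.append((start, end, delta))


with time_context(fake_set_time_context, 0.5, 0.50001):
    pass  # segmented nodes would be read here

print(calls)   # [(0.5, 0.50001, None), (None, None, None)]
```

With MDSplus available, the setter passed in would presumably be Tree.setTimeContext, and the node readout would go inside the with block.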

It is recommended to always use setTimeContext() when handling large signals: MDSplus then manages the segments while minimizing the use of memory resources. For example, segments lying entirely outside the ROI are simply skipped when building the resulting signal, with a dramatic reduction in access time.
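
The benefit of skipping segments can be illustrated with a conceptual Python sketch. It is not the MDSplus implementation: it only assumes that each segment carries a start and an end time, and selects the segments overlapping the ROI. The layout of 1000 one-second segments is the one used in the example later in this section.

```python
def segments_in_roi(segment_bounds, start=None, end=None):
    """Return the indexes of the segments overlapping the ROI.

    segment_bounds is a list of (seg_start, seg_end) pairs; a start or
    end of None means that side of the ROI is unbounded.
    """
    selected = []
    for idx, (seg_start, seg_end) in enumerate(segment_bounds):
        if start is not None and seg_end < start:
            continue   # segment ends before the ROI begins
        if end is not None and seg_start > end:
            continue   # segment begins after the ROI ends
        selected.append(idx)
    return selected


# 1000 segments of one second each, covering times 0 to 1000
bounds = [(float(i), float(i + 1)) for i in range(1000)]

print(segments_in_roi(bounds, 0.5, 0.50001))   # [0]
print(len(segments_in_roi(bounds)))            # 1000
```

For the narrow window, only one segment out of 1000 has to be read at all.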

In the following C++ example a very large signal, composed of one billion samples and representing a quantity acquired at 1 MHz for 1000 seconds (from time 0 to time 1000), is built and stored in node HUGE_SIGNAL of pulse file big_tree, in segments of 1 million samples each.

#include <mdsobjects.h>
#include <cmath>
#include <iostream>
int main(int argc, char *argv[])
{
  try {
    //Open the model
    MDSplus::Tree *tree = new MDSplus::Tree("big_tree", -1);
    //Create shot 1
    tree->createPulse(1);
    delete tree;
    //Open shot 1
    tree = new MDSplus::Tree("big_tree", 1);
    
    //Get the node object
    MDSplus::TreeNode *signalNode = tree->getNode("HUGE_SIGNAL");
     
    //Build 1000 segments of 1 MSample each
    int count = 0;
    float *buf = new float[1000000];
    for(int segIdx = 0; segIdx < 1000; segIdx++)
    {
      std::cout << "Building segment " << segIdx << std::endl;
      for(int i = 0; i < 1000000; i++)
      {
        buf[i] = sin(count/1000.);
        count++;
      }
      //Build the timebase using the Range data type
      //The Range data type specifies start time, end time and time interval
      MDSplus::Data *startTime = new MDSplus::Float64(segIdx);
      MDSplus::Data *endTime = new MDSplus::Float64(segIdx+1);
      MDSplus::Data *delta = new MDSplus::Float64(1E-6);
      MDSplus::Data *segDimension = new MDSplus::Range(startTime, endTime, delta);
      
      //Build the segment data from the float buffer
      MDSplus::Array *segData = new MDSplus::Float32Array(buf, 1000000);

      signalNode->makeSegment(startTime, endTime, segDimension, segData);
       
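      //Free stuff. NOTE: startTime, endTime and delta do not need to be deallocated
      //since they have been passed to a Data constructor
      MDSplus::deleteData(segDimension);
      MDSplus::deleteData(segData);
    }
    //Release the sample buffer, the node and the tree objects
    delete[] buf;
    MDSplus::deleteData(signalNode);
    delete tree;
  }catch(MDSplus::MdsException &exc)
  {
    std::cout << exc.what() << std::endl;
  }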
  
  return 0;
}

In the following example, the whole signal is read in a Python program, resampled at 10 kHz:

>>> from MDSplus import *
>>> t = Tree('big_tree',1)
>>> Tree.setTimeContext(None, None, 1E-4)
>>> n= t.getNode('HUGE_SIGNAL')
>>> sig = n.data()
>>> sig
array([ 0.09983341,  0.19866933,  0.29552022, ..., -0.61119074,
       -0.52912086, -0.44176418], dtype=float32)
>>> time=n.getDimensionAt(0).data()
>>> time
array([  1.00000000e-04,   2.00000000e-04,   3.00000000e-04, ...,
         9.99999700e+02,   9.99999800e+02,   9.99999900e+02])

In the following code snippet, a time window between times 0.5 and 0.50001 is read, with no resampling:

>>> t.setTimeContext(0.5,0.50001,None)
>>> sig1=n.data()
>>> time1=n.getDimensionAt(0).data()
>>> sig1
array([-0.4677718 , -0.46865541, -0.46953857, -0.47042125, -0.47130346,
      -0.47218519, -0.47306645, -0.47394723, -0.47482756, -0.47570738,
      -0.47658676], dtype=float32)
>>> time1
array([ 0.5     ,  0.500001,  0.500002,  0.500003,  0.500004,  0.500005,
        0.500006,  0.500007,  0.500008,  0.500009,  0.50001 ])
>>>
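
The values returned for this window can be cross-checked against the generator used when the signal was built. The Python sketch below assumes, as in the C++ program above, that sample number count holds sin(count/1000) and that samples are 1 microsecond apart starting at time 0; it needs no MDSplus installation.

```python
import math

SAMPLE_PERIOD = 1e-6   # 1 MHz acquisition


def expected_sample(t):
    """Value written by the C++ example for the sample at time t."""
    count = round(t / SAMPLE_PERIOD)   # index of the sample at time t
    return math.sin(count / 1000.0)


# The 11 samples from time 0.5 to time 0.50001, both ends included
window = [expected_sample(0.5 + k * SAMPLE_PERIOD) for k in range(11)]

print(len(window))                                   # 11
print(round(window[0], 6), round(window[-1], 6))     # -0.467772 -0.476587
```

The first and last values agree with sig1 above to within float32 precision.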

Finally, the ROI is reset with the following command:

>>> Tree.setTimeContext(None, None, None)

Further improving access time of resampled signals

We have seen so far how to use setTimeContext() to handle large signal readout. In particular, resampling is mandatory unless only a very small portion of the signal is read. MDSplus can make resampling more efficient, and so reduce data access times, at the cost of a very small change in the method used when building large signals. In this case, it is necessary to reserve a new tree node that will contain a resampled version of the signal, built at the time the signal segments are written. The following C++ code is almost the same as the previous example, except for the use of a new node (HUGE_RESAMP in the example) and of method makeSegmentResampled() in place of makeSegment():

#include <mdsobjects.h>
#include <cmath>
#include <iostream>

int main(int argc, char *argv[])
{
  try {
    //Open the model
    MDSplus::Tree *tree = new MDSplus::Tree("big_tree", -1);
    //Create shot 1
    tree->createPulse(1);
    delete tree;
    //Open shot 1
    tree = new MDSplus::Tree("big_tree", 1);
    
    //Get the node object
    MDSplus::TreeNode *signalNode = tree->getNode("HUGE_SIGNAL");
    MDSplus::TreeNode *resampledNode = tree->getNode("HUGE_RESAMP");
    
    //Build 1000 segments of 1 MSample each
    int count = 0;
    float *buf = new float[1000000];
    for(int segIdx = 0; segIdx < 1000; segIdx++)
    {
      std::cout << "Building segment " << segIdx << std::endl;
      for(int i = 0; i < 1000000; i++)
      {
        buf[i] = sin(count/1000.);
        count++;
      }
      //Build the timebase using the Range data type
      //The Range data type specifies start time, end time and time interval
      MDSplus::Data *startTime = new MDSplus::Float64(segIdx);
      MDSplus::Data *endTime = new MDSplus::Float64(segIdx+1);
      MDSplus::Data *delta = new MDSplus::Float64(1E-6);
      MDSplus::Data *segDimension = new MDSplus::Range(startTime, endTime, delta);
      
      //Build the segment data from the float buffer
      MDSplus::Array *segData = new MDSplus::Float32Array(buf, 1000000);
      
      signalNode->makeSegmentResampled(startTime, endTime, segDimension, segData, resampledNode);
      
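      //Free stuff. NOTE: startTime, endTime and delta do not need to be deallocated
      //since they have been passed to a Data constructor
      MDSplus::deleteData(segDimension);
      MDSplus::deleteData(segData);
    }
    //Release the sample buffer, the nodes and the tree objects
    delete[] buf;
    MDSplus::deleteData(resampledNode);
    MDSplus::deleteData(signalNode);
    delete tree;
  }catch(MDSplus::MdsException &exc)
  {
    std::cout << exc.what() << std::endl;
  }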
  
  return 0;
}

Method makeSegmentResampled() requires an additional argument, i.e. the node that is going to receive the resampled version of the signal. Nothing changes in signal readout: MDSplus keeps in the node metadata all the information needed to decide whether to carry out on-the-fly resampling on the original signal or on its resampled version. The final result is a significant reduction in access time when resampling large signals. Note that the node containing the resampled version of the signal is for internal MDSplus usage only.

Visualization of large signals using jScope

Dynamic resampling of large waveforms is performed automatically by jScope when using the MDSDataProvider data source (the recommended one). When jScope is asked to display an entire signal, it works out the required resampling factor (depending on the number of original signal samples) and asks the data provider (the mdsip server) for a resampled version of the signal. When zooming into a displayed waveform, resolution may be lost, so jScope dynamically asks the data provider for the signal in the ROI corresponding to the zoomed window. The user will likely experience a small delay just after zooming before the required signal resolution is displayed (depending on the speed of the mdsip server and of the network connection). In order to avoid aliasing when resampling, jScope actually requests from the data server the minimum and maximum value of every resampling interval, rather than a single resampled value. All this is completely transparent to the user. However, when an additional tree node is used to keep a resampled version of a large signal, method makeSegmentMinMax() is preferred over makeSegmentResampled() in order to avoid aliasing: makeSegmentMinMax() stores in the support node all the information required to avoid aliasing in jScope.
Note that the use of either makeSegmentMinMax() or makeSegmentResampled() is not mandatory. However, for very large signals the time required for data access (and jScope visualization) can be greatly reduced in this way.
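
The reason why minimum and maximum values protect against aliasing can be seen in a small, self-contained Python sketch. It is a conceptual illustration only, not the algorithm used by jScope or MDSplus: a sine wave with exactly one period per resampling interval is reduced first by keeping one sample per interval, then by keeping the (min, max) pair of each interval. The function names are illustrative.

```python
import math


def decimate(samples, factor):
    """Keep one sample out of every `factor` samples."""
    return samples[::factor]


def min_max(samples, factor):
    """Keep the (minimum, maximum) pair of every block of `factor` samples."""
    return [(min(samples[i:i + factor]), max(samples[i:i + factor]))
            for i in range(0, len(samples), factor)]


# 10 000 samples of a sine wave with one full period every 100 samples
samples = [math.sin(2 * math.pi * i / 100) for i in range(10_000)]

picked = decimate(samples, 100)     # always hits the zero crossing
envelope = min_max(samples, 100)    # keeps the full excursion of each block

print(max(abs(v) for v in picked) < 1e-9)                         # True
print(all(lo < -0.999 and hi > 0.999 for lo, hi in envelope))     # True
```

Plain decimation makes the oscillation disappear (the plot would show a flat line at zero), while the min/max pairs still show a waveform spanning the range from -1 to 1.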