LongPulseExtensions - MdsWiki

Segmented Records

Segmented records were added to MDSplus to support appending data to an MDSplus signal node and retrieving subsampled data without retrieving the entire signal from the MDSplus datafile. This feature is particularly useful for handling data that is collected over a period of time, such as long pulse experiments or storage of trend data. With segmented records, data can be stored in linked blocks, and blocks of data can be added to this list incrementally. Functions are provided to determine the number of segments/blocks of data stored in a node and the start and end times of each block of data. These segments can be retrieved individually, so it is no longer necessary to retrieve the entire data set for a node. Special handling of segmented records has also been added to the expression evaluator so that you can specify a time region of interest, and optionally a sub-sampling delta time, to use when retrieving data from nodes containing segmented records.

Storing Segmented Records

There are currently two types of segmented records. The first type is designed for storing data from transient recorders where many time slices are buffered by the recorder. Using a double buffering scheme, a buffer of data can be stored in a segment while the next buffer is being recorded. Each segment consists of a data block and a dimension description describing the time base for the segment. The second type is designed for measurements that are obtained one time stamp at a time. This type of data is stored one row at a time, and a segment is completed when an application-specific number of rows has been recorded. Each row consists of a set of data values and a corresponding time stamp. In both types of segmented records, each segment is indexed by a start time and an end time. The time stamps are stored as a 64-bit unsigned integer value.

Storing segmented data is essentially stateless: a program can begin storing segmented data and then exit, and another program can later be run to continue writing data to the same node.

Support Routines for Writing Segmented Records

There are several routines found in the TreeShr library to support storage and retrieval of segmented records. In addition, TDI (MDSplus expression evaluator) functions have been written to simplify the calling of these TreeShr routines. The following table lists the TreeShr routines for storing segmented data.

TreeShr Routines
Routine | Format | Description
TreeBeginSegment | int status=TreeBeginSegment(int nid, struct descriptor *start, struct descriptor *end, struct descriptor *dim, struct descriptor_a *initialData, int idx) | Begin a new segment
TreePutSegment | int status=TreePutSegment(int nid, int rowidx, struct descriptor_a *data) | Put data into segment
TreeUpdateSegment | int status=TreeUpdateSegment(int nid, struct descriptor *start, struct descriptor *end, struct descriptor *dim, int idx) | Update the start, end and dimension info for a segment
TreeBeginTimestampedSegment | int status=TreeBeginTimestampedSegment(int nid, struct descriptor_a *initialValue, int idx) | Begin timestamped data segment
TreePutTimestampedSegment | int status=TreePutTimestampedSegment(int nid, _int64 *timestamp, struct descriptor_a *rowdata) | Store row of timestamped data
TreePutRow | int status=TreePutRow(int nid, int bufsize, _int64 *timestamp, struct descriptor_a *rowdata) | Simple timestamped row appender

There are matching TDI helper functions for most of these entry points.

TDI Functions
Routine | Format | Description
BeginSegment | status=BeginSegment(node, start, end, dim, initialData [, idx]) | Begin a new segment
PutSegment | status=PutSegment(node, rowidx, data) | Put data into segment
BeginTimestampedSegment | status=BeginTimestampedSegment(node, initialValue [, idx]) | Begin timestamped data segment
PutTimestampedSegment | status=PutTimestampedSegment(node, timestamp_64, rowdata) | Store row of timestamped data
PutRow | status=PutRow(node, bufsize, timestamp_64, rowdata) | Simple timestamped row appender

Writing Normal Segments

When writing "normal segments" (using the BeginSegment and PutSegment routines) you must first call BeginSegment to reserve space for the segment, to initialize the data, and to define the start time, end time and dimension definition for the segment. Normally you would call this with idx=-1, which appends a new segment each time the routine is called. If you need to go back and overwrite a segment, you can specify an idx greater than -1 to indicate which segment you are overwriting. Idx values greater than or equal to the total number of segments stored for this node will be rejected. The initialData array specifies the data type of the values for all segments and also sets the shape of the data array in the segment. The last dimension in the array specifies the number of "rows" in the segment. For example, if initialData is an array[100,200,300] then the segment contains 300 rows where each row is 100x200. Subsequent segments may contain more rows, but the shape of each row must match the first segment. If initialData is a single-dimension array then the number of rows is the size of the array and each row consists of a scalar value.
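
The row/shape rule above can be illustrated with a small standalone Python sketch. It does not call MDSplus; the function name segment_layout is hypothetical and exists only for this illustration. Shapes are written in the same order as in the text, so the last entry counts rows.

```python
def segment_layout(initial_shape):
    """Split an initialData shape into (row_shape, n_rows).

    The last dimension counts the rows of the segment; the leading
    dimensions describe one row. A one-dimensional array has scalar
    rows, represented here by an empty row shape.
    """
    if len(initial_shape) == 0:
        raise ValueError("initialData must be an array")
    *row_shape, n_rows = initial_shape
    return tuple(row_shape), n_rows


# The example from the text: array[100,200,300] -> 300 rows of 100x200.
print(segment_layout((100, 200, 300)))
# A single-dimension array of 50 elements -> 50 scalar rows.
print(segment_layout((50,)))
```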

Once the segment has been initialized with the BeginSegment call, data can be added using the PutSegment call. The rowidx argument specifies the offset in the segment at which to begin storing the data. Use -1 for the rowidx to store rows sequentially. The data array must match the data type and row shape specified in the BeginSegment call. If, for example, BeginSegment was called with initialData being a 32-bit integer array of [100,200,300], then PutSegment can only be called with a data array which is a 32-bit integer array of either [100,200] or [100,200,n]. The latter case stores multiple rows in one call. An error will be returned if the segment was already full. If the segment was not full but the data contains more rows than will fit in the segment, the data will be truncated to the number of rows that will fit.

IDL> mdsopen,'mytree',42
IDL> status=mdsvalue('BeginSegment(mynode, $1, $2, make_dim(*,$1 : $2 : $3),$4)',.2,10.,.2,fltarr(100,200,50))
IDL> for i=0,49 do status=mdsvalue('PutSegment(mynode,-1,$)',fltarr(100,200)+i)
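
As a rough model of the PutSegment rules described above (shape matching, the error on a full segment, and truncation), the following standalone Python sketch may help. It is not the TreeShr implementation; the function name rows_to_store and the exception types are assumptions made for this illustration, and the data type check is left out.

```python
def rows_to_store(row_shape, capacity, rows_filled, data_shape):
    """Return how many rows of a PutSegment-style call would be stored.

    row_shape   -- shape of one row, fixed by the BeginSegment call
    capacity    -- number of rows reserved for the segment
    rows_filled -- rows already written to the segment
    data_shape  -- shape of the array being stored
    """
    row_shape = tuple(row_shape)
    data_shape = tuple(data_shape)
    if data_shape == row_shape:
        offered = 1                      # a single row, e.g. [100,200]
    elif (len(data_shape) == len(row_shape) + 1
          and data_shape[:-1] == row_shape):
        offered = data_shape[-1]         # n rows at once, e.g. [100,200,n]
    else:
        raise ValueError("data shape does not match the row shape")
    free = capacity - rows_filled
    if free <= 0:
        raise RuntimeError("segment is already full")
    # Rows that do not fit are dropped (truncation).
    return min(offered, free)


print(rows_to_store((100, 200), 50, 0, (100, 200)))      # 1
print(rows_to_store((100, 200), 50, 48, (100, 200, 5)))  # 2
```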

Writing Timestamped Data

When writing "timestamped data" (using the BeginTimestampedSegment, PutTimestampedSegment, or PutRow routines) rows are added one row at a time. Generally the simple PutRow routine is used for this type of data. This function calls the BeginTimestampedSegment and PutTimestampedSegment routines for you. The PutRow routine is called with a bufsize argument indicating the number of rows to store per segment. The first time PutRow is called to store data into a node, a new segment is created, reserving enough room to hold bufsize rows of the size and shape of the rowdata argument. Each row is identified by a 64-bit timestamp specified by the timestamp_64 argument. Each row should have a timestamp which is greater than that of the previous row, but this restriction is not enforced. This design decision was based on the premise that you would not want to stop recording data if a hardware glitch caused a large timestamp to be stored.

Each row is appended to the previous rows until the segment fills. At that point a new segment is automatically created allocating bufsize rows for storage. The BeginTimestampedSegment or PutTimestampedSegment routines can be used if you want to add a complete segment of rows. Overwriting timestamped segments is not currently supported.

The following is a very simple example of storing timestamped data using IDL:

IDL> mdsopen,'mytree',shot
IDL> data=[1.,2.,3.]
IDL> for timestamp=1ul,10000ul do $
       dummy=mdsvalue('PutRow(mynode,1000,$,$)',timestamp,data+(timestamp * 10))

In this simple example the timestamps are not really date/time but simply an index. Note: timestamps must be greater than zero.
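
The buffering behaviour of PutRow described above can be modelled in a few lines of standalone Python. This is only a toy in-memory model, not the MDSplus implementation: the class TimestampedNode and its fields are hypothetical, and it uses the same numbers as the IDL loop above (bufsize of 1000, timestamps 1 to 10000).

```python
class TimestampedNode:
    """Toy in-memory stand-in for a node receiving PutRow-style calls."""

    def __init__(self):
        # Each segment: {"capacity": int, "rows": [(timestamp, row), ...]}
        self.segments = []

    def put_row(self, bufsize, timestamp, row):
        if timestamp <= 0:
            raise ValueError("timestamps must be greater than zero")
        last = self.segments[-1] if self.segments else None
        # Open a new segment on the first call or when the last one is full.
        if last is None or len(last["rows"]) >= last["capacity"]:
            last = {"capacity": bufsize, "rows": []}
            self.segments.append(last)
        # Timestamp ordering is deliberately not enforced, as in the text.
        last["rows"].append((timestamp, list(row)))


node = TimestampedNode()
for timestamp in range(1, 10001):
    node.put_row(1000, timestamp, [v + timestamp * 10 for v in (1.0, 2.0, 3.0)])

print(len(node.segments))           # 10
print(node.segments[0]["rows"][0])  # (1, [11.0, 12.0, 13.0])
```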

The following example shows that the data was indeed stored. It uses some of the retrieval routines described below:

IDL> print,mdsvalue('GetNumSegments(mynode)')
IDL> print,mdsvalue('GetSegment(mynode,0)')
     11.0000      12.0000      13.0000
     21.0000      22.0000      23.0000
     31.0000      32.0000      33.0000
     41.0000      42.0000      43.0000
     51.0000      52.0000      53.0000
     9981.00      9982.00      9983.00
     9991.00      9992.00      9993.00
     10001.0      10002.0      10003.0
IDL> print,mdsvalue('dim_of(GetSegment(mynode,0))')
                    1                     2                     3                     4                     5                     6
                    7                     8                     9                    10                    11                    12
                   13                    14                    15                    16                    17                    18
                  985                   986                   987                   988                   989                   990
                  991                   992                   993                   994                   995                   996
                  997                   998                   999                  1000
IDL> dummy=mdsvalue('SetTimeContext(1qu,10qu)')
IDL> y=mdsvalue('mynode')
IDL> help,y
Y               FLOAT     = Array[3, 10]

Reading Data Stored In Segmented Records

Segmented records contain blocks of data in the data file which are indexed by start and end times. There are TreeShr routines and matching TDI functions for accessing the index values and the individual segments of data, so you can implement functions for retrieving the data from the segments as needed. There is also a built-in feature which allows you to specify a region of interest by specifying a start time, an end time and optionally a delta time. When nodes containing segmented records are referenced in TDI expressions, the data from these nodes are automatically sub-sampled using these region-of-interest settings. This enables you to use regular TDI expressions the same way you would when referencing any node in the tree, but instead of retrieving the entire time history of the signal, MDSplus will only retrieve the segments containing data in this region of interest. This greatly improves performance and reduces computer resource requirements compared to retrieving the entire time history and then subscripting it using TDI subscripting. Default re-sampling procedures are provided, but you can provide your own re-sampling procedures on a per-node basis. (See the Overriding Re-sampling Procedures section below.)
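
To make the region-of-interest idea concrete, here is a standalone Python sketch of how segments could be selected from their start/end index alone. It is a model under stated assumptions, not the MDSplus code: the function segments_in_context is hypothetical, and treating a missing start or end (None) as unbounded is an assumption of this sketch.

```python
def segments_in_context(limits, start=None, end=None):
    """Return indices of segments overlapping the region [start, end].

    limits -- list of (seg_start, seg_end) pairs, one per segment
    start, end -- region of interest; None means "no bound" (assumption)
    """
    selected = []
    for idx, (seg_start, seg_end) in enumerate(limits):
        if start is not None and seg_end < start:
            continue  # segment ends before the region begins
        if end is not None and seg_start > end:
            continue  # segment begins after the region ends
        selected.append(idx)
    return selected


# Ten segments of 1000 rows each, indexed by timestamps 1..10000.
limits = [(1 + 1000 * i, 1000 * (i + 1)) for i in range(10)]
print(segments_in_context(limits, 1, 10))      # [0]
print(segments_in_context(limits, 950, 2100))  # [0, 1, 2]
```

Only the selected segments would need to be read, which is why the cost depends on the amount of data in the region rather than on the size of the whole signal.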

Functions for Reading Segmented Records

There are several TreeShr routines and matching TDI functions for accessing segmented records.

TreeShr Routines

The following table describes the routines available for accessing segmented data.

Routine | Format | Description
TreeSetTimeContext | int status=TreeSetTimeContext( struct descriptor *start, struct descriptor *end, struct descriptor *delta) | Set region of interest parameters
TreeGetNumSegments | int status=TreeGetNumSegments(int nid, int *num) | Get number of segments stored
TreeGetSegmentLimits | int status=TreeGetSegmentLimits(int nid, int segidx, struct descriptor_xd *start, struct descriptor_xd *end) | Get segment start and end time
TreeGetSegment | int status=TreeGetSegment(int nid, int segidx, struct descriptor_xd *data, struct descriptor_xd *dim) | Get segment data and dimension

TDI Functions

The following table lists the TDI functions available for reading segmented data.

Routine | Format | Description
SetTimeContext | status=SetTimeContext( start, end, delta) | Set region of interest parameters
GetNumSegments | num=GetNumSegments(node) | Get number of segments stored
GetSegmentLimits | limits=GetSegmentLimits(node, segidx) | Get segment start and end time
GetSegment | signal=GetSegment(node, segidx) | Get segment data and dimension

Examples to be added by Tom.

Default Re-sampling behavior

Once a region of interest has been defined by calling TreeSetTimeContext, subsequent calls to TreeGetRecord will return a (possibly subsampled) subset of data, starting at the start time and ending at the end time, where the start and end times are those passed as arguments to TreeSetTimeContext.
When a region of interest is defined, TreeGetRecord passes control to the routine XTreeGetTimedRecord() defined in the library xtreeshr, which in turn reads the stored data segments, resamples each segment based on the start, end and delta parameters, and then merges the resulting segments to produce the output signal. The improved efficiency of this operation comes from the fact that data segments which lie outside the specified time range are not read at all. It is therefore possible to achieve very efficient data retrieval even for very large signals, provided they have been stored using an adequate number of segments. Suppose, for example, that you need to retrieve the subsampled data for a given time interval from a very large signal. Without segments, it would be necessary to read the whole signal first and then discard all data not lying within the specified interval. With an adequate number of segments, the time and memory required to retrieve data for the specified time interval depend only on the amount of data within that interval, and not on the size of the whole signal.

Overriding Re-sampling Procedures

The default resampling can be overridden by providing a new version of the resample function. By default, the routine XTreeGetTimedRecord calls the routine XTreeDefaultResample each time a data segment has to be resampled. To instruct MDSplus to call a different resampling routine for a given data item, develop a TDI function with the following arguments:
- IN signal, the current segment to be resampled, passed as a signal;
- IN start, the start time;
- IN end, the end time;
- IN delta, the delta time.
The TDI function will then return the resampled version of the passed signal.
To make MDSplus aware that this TDI function has to be invoked when retrieving resampled signals, it is necessary to call routine TreeSetXNci(int nid, char *xnciname, struct descriptor *value), where xnciname is set to "ResampleFun", and value is the name of the user provided resampling function (passed by descriptor).
In this way it is possible to integrate in MDSplus different resampling strategies such as interpolation, not provided by the default resampling implementation.
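
As an illustration of what such a user-provided strategy might compute, the following standalone Python sketch resamples one segment by linear interpolation using the same four inputs (signal, start, end, delta). It is only a model of the arithmetic: a real override would be written as a TDI function and registered through TreeSetXNci as described above. The function name resample_linear and the representation of the signal as two lists (times and values, with strictly increasing times) are assumptions of this sketch.

```python
import math


def resample_linear(times, values, start, end, delta):
    """Linearly interpolate one segment onto the grid start + k*delta.

    Only grid points that fall inside both [start, end] and the time
    range covered by the segment are produced.
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    lo = max(start, times[0])
    hi = min(end, times[-1])
    out_times, out_values = [], []
    k = math.ceil((lo - start) / delta)  # first grid index inside the segment
    j = 0                                # index of the current sample interval
    while True:
        t = start + k * delta
        if t > hi:
            break
        # Advance until times[j] <= t <= times[j + 1].
        while j + 2 < len(times) and times[j + 1] < t:
            j += 1
        if len(times) == 1:
            v = values[0]
        else:
            t0, t1 = times[j], times[j + 1]
            w = (t - t0) / (t1 - t0)
            v = values[j] + w * (values[j + 1] - values[j])
        out_times.append(t)
        out_values.append(v)
        k += 1
    return out_times, out_values


t, v = resample_linear([0, 1, 2, 3], [0, 10, 20, 30], 0.5, 2.5, 0.5)
print(t)  # [0.5, 1.0, 1.5, 2.0, 2.5]
print(v)  # [5.0, 10.0, 15.0, 20.0, 25.0]
```

Computing each grid time as start + k*delta, rather than accumulating delta, keeps the output grid aligned across segments so that the per-segment results can be merged without gaps or duplicates at segment boundaries.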