Apply a function to a whole Raster Attribute Table (RAT), block by block, so as to avoid using large amounts of memory. Transparently takes care of the details of reading and writing columns from the RAT.
This was written in rough mimicry of the RIOS image applier functionality.
The most important components are the rios.ratapplier.apply() function and the rios.ratapplier.RatApplierControls class. Pretty much everything else is for internal use only. The docstring for the rios.ratapplier.apply() function gives a simple example of its use.
In order to work through the RAT(s) block by block, we rely on having available routines to read/write only a part of the RAT. This is available with GDAL 1.11 or later. If this is not available, we fudge the same thing by reading/writing whole columns, i.e. the block size is the full length of the RAT. This last case is not efficient with memory, but at least provides the same functionality.
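The block strategy described above can be sketched in plain Python. This is an illustration only, not the RIOS implementation: `block_ranges` and its `partial_io_available` flag are hypothetical names standing in for whatever check is made on GDAL's support for partial reads and writes.

```python
def block_ranges(row_count, requested_block_len, partial_io_available=True):
    """Yield (start_row, block_len) pairs covering rows 0..row_count-1.

    Hypothetical helper: when partial reads/writes are not available,
    fall back to one block spanning the whole table.
    """
    block_len = requested_block_len if partial_io_available else row_count
    if row_count > 0 and block_len <= 0:
        raise ValueError("block length must be positive")
    start = 0
    while start < row_count:
        # The final block may be shorter than the others
        length = min(block_len, row_count - start)
        yield (start, length)
        start += length


blocks = list(block_ranges(10, 4))
fallback = list(block_ranges(10, 4, partial_io_available=False))
```

In the fallback case the single block holds every row at once, which is why that path is not memory efficient.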
- class rios.ratapplier.BlockCollection(ratAssoc, state, allGdalHandles)¶
Hold a set of RatBlockAssociation objects, for all currently open RATs
Clear all caches
Called after the block loop completes, to reset the row count of each output RAT, in case it had been over-allocated.
In some raster formats, this will not reclaim space, but we still would like the row count to be correct.
- writeCache(outputRatHandleNameList, controls, state)¶
Write all cached data blocks
- class rios.ratapplier.GdalHandles(ratHandle, update=False, sharedDS=None)¶
Hang onto all the required GDAL objects relating to a given opened RAT. Attributes are:
ds The gdal.Dataset object
band The gdal.Band object
gdalRat The gdal.RasterAttributeTable object
columnNdxByName A lookup table to get column index from column name
- class rios.ratapplier.GdalHandlesCollection(inRats, outRats)¶
A set of all the GdalHandles objects
Check the consistency of the set of input RATs opened on the current instance. It is assumed that the output RATs will become consistent, although this is by no means guaranteed.
Checks the current set of filenames in use, and if it finds one with the same filename as the given ratHandle, assumes that it is already open, but with a different layer number. If so, return the gdal.Dataset associated with it, so it can be shared. If not found, return None.
Return the number of rows in the RATs of all files. Actually just returns the row count of the first input RAT, assuming that they are all the same (see self.checkConsistency())
- class rios.ratapplier.OtherArguments¶
Simple empty class which can be used to pass arbitrary arguments in and out of the apply() function, to the user function. Anything stored on this object persists between iterations over blocks.
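As a rough sketch of how such an empty container can carry values between block iterations, the following uses a stand-in class and a plain list in place of a real RAT column. The names `OtherArgs`, `count_large`, `threshold` and `total` are hypothetical and chosen only for illustration; the four-argument call signature follows the one shown for internalCopyRAT.

```python
class OtherArgs:
    """Stand-in for an empty container class (hypothetical, not the RIOS class)."""
    pass


def count_large(info, inputs, outputs, otherargs):
    # Values stored on otherargs persist from one block to the next
    otherargs.total += sum(1 for v in inputs if v > otherargs.threshold)


otherargs = OtherArgs()
otherargs.threshold = 5   # value passed in to the user function
otherargs.total = 0       # value accumulated and passed back out

column = [1, 7, 3, 9, 6, 2, 8]
block_len = 3
for start in range(0, len(column), block_len):
    # Each call only ever sees one block of the column
    count_large(None, column[start:start + block_len], None, otherargs)
```

After the loop, `otherargs.total` holds the count over the whole column, even though no single call saw more than three values.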
- class rios.ratapplier.RatApplierControls¶
Controls object for the ratapplier. An instance of this class can be given to the apply() function, to control its behaviour.
- outputRowCountHandling(method=0, totalsize=None, incrementsize=None)¶
Determine how the row count of the output RAT(s) is handled. The method argument can be one of the following constants:
RCM_EQUALS_INPUT Output RAT(s) have same number of rows as input RAT(s)
RCM_FIXED Output row count is set to a fixed size
RCM_INCREMENT Output row count is incremented as required
The totalsize and incrementsize arguments, if given, should be int.
totalsize is used to set the output row count when the method is RCM_FIXED. It is required, if the method is RCM_FIXED.
incrementsize is used to determine how much the row count is incremented by, if the method is RCM_INCREMENT. If not given, it defaults to the length of the block being written.
The most common case is the default (i.e. RCM_EQUALS_INPUT). If the output RAT row count will be different from the input, and the count can be known in advance, then you should use RCM_FIXED to set that size. Only if the output RAT row count cannot be determined in advance should you use RCM_INCREMENT.
For some raster formats, using RCM_INCREMENT will result in wasted space, depending on the incrementsize used. Caution is recommended.
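The three methods can be compared with a small sketch. The function `new_row_count` and its parameters are hypothetical and only mirror the description above; the way RIOS actually computes the count may differ in detail. The constant values are those listed for this module.

```python
RCM_EQUALS_INPUT = 0
RCM_FIXED = 1
RCM_INCREMENT = 2


def new_row_count(method, input_rows, current_rows, rows_needed,
                  block_len, totalsize=None, incrementsize=None):
    """Hypothetical sketch of choosing an output row count."""
    if method == RCM_EQUALS_INPUT:
        return input_rows
    if method == RCM_FIXED:
        if totalsize is None:
            raise ValueError("totalsize is required for RCM_FIXED")
        return totalsize
    if method == RCM_INCREMENT:
        # Assumed default: grow by the length of the block being written
        step = block_len if incrementsize is None else incrementsize
        if step <= 0:
            raise ValueError("increment must be positive")
        count = current_rows
        while count < rows_needed:
            count += step
        return count
    raise ValueError("unknown row count method")


small_step = new_row_count(RCM_INCREMENT, 100, 20, 25, block_len=10)
big_step = new_row_count(RCM_INCREMENT, 100, 20, 25, block_len=10,
                         incrementsize=100)
```

With the larger increment, 120 rows are allocated to hold 25, which illustrates how RCM_INCREMENT can waste space in formats that do not reclaim it.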
Change the number of rows used per block
Set the progress display object. Default is no progress object.
Set the total number of rows to be processed. This is normally only useful when doing something like writing an output RAT without any input RAT, so the number of rows is otherwise undefined.
- class rios.ratapplier.RatApplierState(rowCount)¶
Current state of RAT applier. An instance of this class is passed as the first argument to the user function.
blockNdx Index number of current block (first block is zero, second block is 1, …)
startrow RAT row number of first row of current block (first row is zero)
blockLen Number of rows in current block
inputRowNumbers Row numbers in whole input RAT(s) corresponding to current block
rowCount The total number of rows in the input RAT(s)
- setBlock(i, requestedBlockLen)¶
Sets the state to be pointing at the i-th block. i starts at zero.
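The relationship between these attributes can be sketched as follows. `BlockState` is a hypothetical stand-in that mirrors the documented attribute names; it uses a plain list for inputRowNumbers and is not the RIOS class itself.

```python
class BlockState:
    """Hypothetical stand-in mirroring the documented state attributes."""

    def __init__(self, rowCount):
        self.rowCount = rowCount
        self.blockNdx = None
        self.startrow = None
        self.blockLen = None
        self.inputRowNumbers = None

    def setBlock(self, i, requestedBlockLen):
        self.blockNdx = i
        self.startrow = i * requestedBlockLen
        # Clamp the last block so it does not run past the end of the table
        endrow = min(self.startrow + requestedBlockLen, self.rowCount)
        self.blockLen = endrow - self.startrow
        self.inputRowNumbers = list(range(self.startrow, endrow))


state = BlockState(rowCount=10)
state.setBlock(2, 4)   # third block of a 10-row table, 4 rows requested
```

Here the third block starts at row 8 and contains only the two remaining rows, so blockLen is smaller than the requested length.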
- class rios.ratapplier.RatAssociations¶
Class associating external raster attribute tables with internal names. Each attribute defined on this object should be a RatHandle object.
Return a list of the names of the RatHandle objects defined on this object
- class rios.ratapplier.RatBlockAssociation(state, gdalHandles)¶
Hold one or more blocks of data from RAT columns of a single RAT. This class is kind of at the heart of the module.
Most generic attributes on this class are blocks of data read from and written to the RAT, and so are not actually attributes at all, but are managed by the __setattr__/__getattr__ override methods. Their names are the names of the columns to which they correspond. However, there are a number of genuine attributes which also need to be present, for internal use, and it is important that their names not be the same as any columns. Since we cannot guarantee this, we have named them beginning with “Z__”, in the hope that no-one ever has a column with a name like this. These are all created within the __init__ method.
The main purpose of using __getattr__ is to avoid reading columns which the userFunc is not actually using. As a consequence, one also needs to use __setattr__ to handle the data the same way.
Clear the current cache of data blocks
If the row count for this RAT has been over-allocated, reset it back to the actual number of rows we wrote.
Return the usage of the given column
- guessNewRowCount(rowsToWrite, controls, state)¶
When we are writing to a new RAT, and we find that we need to write more rows than it currently has, we guess what we should set the row count to be, depending on how the controls have told us to do this.
- setUsage(columnName, usage)¶
Set the usage of the given column.
- writeCache(controls, state)¶
Write all cached data blocks. Creates the columns if they do not already exist.
- class rios.ratapplier.RatHandle(filename, layernum=1)¶
A handle onto the RAT for a single image layer. This is used as an easy way for the user to nominate both a filename and a layer number.
- rios.ratapplier.apply(userFunc, inRats, outRats, otherargs=None, controls=None)¶
Apply the given function across the whole of the given raster attribute tables. The attribute table is processed one chunk at a time, allowing very large tables to be handled without running out of memory.
All raster files must already exist, but new columns can be created.
Normal pattern is something like the following:
    from rios import ratapplier

    def myFunc(info, inputs, outputs):
        # Column blocks are read and written via attribute access
        outputs.vegclass.colSum = inputs.vegclass.col1 + inputs.vegclass.col2

    inRats = ratapplier.RatAssociations()
    outRats = ratapplier.RatAssociations()

    inRats.vegclass = ratapplier.RatHandle('vegclass.kea')
    outRats.vegclass = ratapplier.RatHandle('vegclass.kea')

    ratapplier.apply(myFunc, inRats, outRats)
rios.ratapplier.RatHandle defaults to using the RAT from the first layer of the image, which is usual for thematic imagery. This can be overridden using the layernum parameter. The names of the columns are reflected in the names of the fields on the inputs and outputs parameters, and multiple input and output RATs can be specified.
The otherargs argument can be any object, and is typically an instance of rios.ratapplier.OtherArguments. It will be passed in to each call of the user function, unchanged between calls, so that other values can be passed in, and calculated quantities passed back. The values stored on this object are not directly associated with rows of the RAT, and must be managed entirely by the user. If it is not required, it need not be passed.
The controls object is an instance of the rios.ratapplier.RatApplierControls class, and is only required if the default control settings are to be changed.
The info object which is passed to the user function is an instance of the rios.ratapplier.RatApplierState class.
By default new columns are marked as ‘Generic’. If they need to be marked as having a specific usage, the following syntax is used:
    from osgeo import gdal

    def addCols(info, inputs, outputs):
        "Add two columns and output"
        outputs.outimg.colSum = inputs.inimg.col1 + inputs.inimg.col4
        # someRedValue stands for some calculated red value, in the 0-255 range
        outputs.outimg.Red = someRedValue
        outputs.outimg.setUsage('Red', gdal.GFU_Red)
Since the RAT is now read one chunk at a time, calling numpy functions like mean() will only return statistics for the current chunk, not for the whole table. The solution is to use the rios.fileinfo.RatStats class:
    from rios.fileinfo import RatStats

    columnsOfInterest = ['col1', 'col4']
    ratStatsObj = RatStats('file.img', columnlist=columnsOfInterest)

    print(ratStatsObj.col1.mean, ratStatsObj.col4.mean)
Each column attribute is an instance of rios.fileinfo.ColumnStats and is intended to be passed through the apply function via the otherargs mechanism.
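To see why per-chunk statistics differ from whole-table statistics, and how a precomputed value can be handed to the user function, consider this plain-Python sketch. It uses a list in place of a RAT column and `types.SimpleNamespace` in place of the otherargs object; `subtract_mean` and `col1_mean` are hypothetical names, and the outputs argument is a simple list rather than a RIOS outputs object.

```python
from types import SimpleNamespace


def subtract_mean(info, inputs, outputs, otherargs):
    # Uses the whole-table mean supplied from outside, not the chunk mean
    outputs.extend(v - otherargs.col1_mean for v in inputs)


column = [2.0, 4.0, 6.0, 8.0]

# Whole-table statistic computed once, before the block loop
otherargs = SimpleNamespace(col1_mean=sum(column) / len(column))

block_len = 2
chunk_means = []
centred = []
for start in range(0, len(column), block_len):
    chunk = column[start:start + block_len]
    # What a mean computed inside the user function would have seen
    chunk_means.append(sum(chunk) / len(chunk))
    subtract_mean(None, chunk, centred, otherargs)
```

The chunk means are 3.0 and 7.0, neither of which equals the whole-table mean of 5.0, so centring each chunk on its own mean would give the wrong result.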
- rios.ratapplier.copyRAT(input, output, progress=None)¶
Given input and output filenames, copies the RAT from the input file and writes it to the output file.
- rios.ratapplier.internalCopyRAT(info, inputs, outputs, otherArgs)¶
Called from copyRAT. Copies the RAT
- rios.ratapplier.RCM_EQUALS_INPUT = 0¶
Same as input
- rios.ratapplier.RCM_FIXED = 1¶
- rios.ratapplier.RCM_INCREMENT = 2¶
Incremented as required