ratapplier¶
Apply a function to a whole Raster Attribute Table (RAT), block by block, so as to avoid using large amounts of memory. Transparently takes care of the details of reading and writing columns from the RAT.
This was written in rough mimicry of the RIOS image applier functionality.
The most important components are the rios.ratapplier.apply() function, and
the rios.ratapplier.RatApplierControls class. Pretty much everything else is for internal
use only. The docstring for the rios.ratapplier.apply() function gives a simple example
of its use.
In order to work through the RAT(s) block by block, we rely on having available routines to read/write only a part of the RAT. This is available with GDAL 1.11 or later. If this is not available, we fudge the same thing by reading/writing whole columns, i.e. the block size is the full length of the RAT. This last case is not efficient with memory, but at least provides the same functionality.
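The block-by-block traversal described above can be sketched in plain Python. This is only an illustration of the idea, not RIOS code: `iter_blocks` and `full_column_fallback` are hypothetical names, and the block length of 4 is an arbitrary choice for the example.

```python
def iter_blocks(row_count, block_len):
    """Yield (block index, start row, block length) for each block.

    Hypothetical helper for illustration only; not part of RIOS.
    """
    if block_len <= 0:
        raise ValueError("block_len must be positive")
    block_ndx = 0
    for start_row in range(0, row_count, block_len):
        # The final block may be shorter than block_len
        length = min(block_len, row_count - start_row)
        yield block_ndx, start_row, length
        block_ndx += 1


def full_column_fallback(row_count):
    """Model the fallback case: a single block as long as the whole table."""
    return list(iter_blocks(row_count, max(row_count, 1)))


blocks = list(iter_blocks(row_count=10, block_len=4))
print(blocks)
print(full_column_fallback(10))
```

In the fallback case the whole column is held in memory at once, which is why it is the less memory-efficient path.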
- class rios.ratapplier.BlockCollection(ratAssoc, state, allFileHandles, controls)[source]¶
Hold a set of RatBlockAssociation objects, for all currently open RATs
- class rios.ratapplier.FileHandles(ratHandle, update=False, sharedDS=None)[source]¶
Hang onto all the required file-related objects relating to a given opened RAT. For a GDAL RAT, these are the GDAL objects; for a Zarr-based RAT, just the RatZarr object. The unused objects are None.
Attributes are:
ds                 The gdal.Dataset object
band               The gdal.Band object
gdalRat            The gdal.RasterAttributeTable object
columnNdxByName    A lookup table to get column index from column name
rz                 The RatZarr object
- class rios.ratapplier.FileHandlesCollection(inRats, outRats)[source]¶
A set of all the FileHandles objects
- checkConsistency()[source]¶
Check the consistency of the set of input RATs opened on the current instance. The output RATs are assumed to become consistent, although this is by no means guaranteed.
- checkExistingDS(ratHandle)[source]¶
Checks the current set of filenames in use, and if it finds one with the same filename as the given ratHandle, assumes that it is already open, but with a different layer number. If so, return the gdal.Dataset associated with it, so it can be shared. If not found, return None.
- class rios.ratapplier.OtherArguments[source]¶
Simple empty class which can be used to pass arbitrary arguments in and out of the apply() function, to the user function. Anything stored on this object persists between iterations over blocks.
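As a hedged sketch of how values can persist between blocks, the user function below keeps running totals on the otherargs object. The association name `img`, the column name `col1`, and the helper names `accumulateSum` and `columnMean` are assumptions made for this example, and it assumes that apply() accepts an empty outRats association when nothing is written. The stand-in loop at the end uses plain lists in place of RAT blocks so that the logic can be followed without RIOS installed.

```python
from types import SimpleNamespace


def accumulateSum(info, inputs, outputs, otherargs):
    """User function: add this block's contribution to the running totals."""
    block = inputs.img.col1
    otherargs.total += float(sum(block))
    otherargs.count += len(block)


def columnMean(filename):
    """Run accumulateSum over a whole RAT and return the mean of col1."""
    from rios import ratapplier  # imported here so the sketch loads without RIOS

    inRats = ratapplier.RatAssociations()
    outRats = ratapplier.RatAssociations()  # assumed: no outputs are needed
    inRats.img = ratapplier.RatHandle(filename)

    otherargs = ratapplier.OtherArguments()
    otherargs.total = 0.0
    otherargs.count = 0
    ratapplier.apply(accumulateSum, inRats, outRats, otherargs=otherargs)
    return otherargs.total / otherargs.count


# Stand-in demonstration: two "blocks" of a column, as plain lists
state = SimpleNamespace(total=0.0, count=0)
for block in ([1, 2, 3], [4, 5]):
    fakeInputs = SimpleNamespace(img=SimpleNamespace(col1=block))
    accumulateSum(None, fakeInputs, None, state)
demoMean = state.total / state.count
print(demoMean)
```

With real numpy blocks, `block.sum()` would be the faster way to total each block; the built-in `sum()` is used here only so the same function works on the plain lists in the demonstration.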
- class rios.ratapplier.RatApplierControls[source]¶
Controls object for the ratapplier. An instance of this class can be given to the apply() function, to control its behaviour.
- outputRowCountHandling(method=0, totalsize=None, incrementsize=None)[source]¶
Determine how the row count of the output RAT(s) is handled. The method argument can be one of the following constants:
RCM_EQUALS_INPUT    Output RAT(s) have same number of rows as input RAT(s)
RCM_FIXED           Output row count is set to a fixed size
RCM_INCREMENT       Output row count is incremented as required
The totalsize and incrementsize arguments, if given, should be int.
totalsize is used to set the output row count when the method is RCM_FIXED. It is required, if the method is RCM_FIXED.
incrementsize is used to determine how much the row count is incremented by, if the method is RCM_INCREMENT. If not given, it defaults to the length of the block being written.
The most common case is the default (i.e. RCM_EQUALS_INPUT). If the output RAT row count will be different from the input, and the count can be known in advance, then you should use RCM_FIXED to set that size. Only if the output RAT row count cannot be determined in advance should you use RCM_INCREMENT.
For some raster formats, using RCM_INCREMENT will result in wasted space, depending on the incrementsize used. Caution is recommended.
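The guidance above on choosing a method could be expressed roughly as follows. This is a sketch under assumptions: `chooseRowCountMethod` and `makeControls` are hypothetical helpers, and the three constants are redefined locally only to mirror the documented values of the module constants.

```python
# Local copies mirroring the documented values of the module constants
RCM_EQUALS_INPUT = 0
RCM_FIXED = 1
RCM_INCREMENT = 2


def chooseRowCountMethod(sameAsInput, knownInAdvance):
    """Pick a row count method following the advice in the text above."""
    if sameAsInput:
        return RCM_EQUALS_INPUT
    if knownInAdvance:
        return RCM_FIXED
    # Last resort: the output size cannot be determined beforehand
    return RCM_INCREMENT


def makeControls(sameAsInput=True, totalsize=None):
    """Build a controls object configured with the chosen method."""
    from rios import ratapplier  # imported here so the sketch loads without RIOS

    method = chooseRowCountMethod(sameAsInput, totalsize is not None)
    controls = ratapplier.RatApplierControls()
    controls.outputRowCountHandling(method=method, totalsize=totalsize)
    return controls
```

For example, `makeControls(sameAsInput=False, totalsize=5000)` would request a fixed output size, and the resulting object would be passed to apply() through its controls argument.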
- setRowCount(rowCount)[source]¶
Set the total number of rows to be processed. This is normally only useful when doing something like writing an output RAT without any input RAT, so the number of rows is otherwise undefined.
- setUseStringDType(useStringDType)[source]¶
Set whether to use the numpy-2.x StringDType when reading GFT_String columns. If this is True, then when data is read from a GFT_String column, it will be converted to StringDType (i.e. an array of variable-length strings) before presenting it to the user.
The default is the old behaviour, i.e. the returned string arrays are fixed-width bytes string arrays.
If StringDType is unavailable (numpy < 2.0), this flag is always False.
- class rios.ratapplier.RatApplierState(rowCount)[source]¶
Current state of RAT applier. An instance of this class is passed as the first argument to the user function.
Attributes:
blockNdx           Index number of current block (first block is zero, second block is 1, …)
startrow           RAT row number of first row of current block (first row is zero)
blockLen           Number of rows in current block
inputRowNumbers    Row numbers in whole input RAT(s) corresponding to current block
rowCount           The total number of rows in the input RAT(s)
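One use of these attributes is converting a position within a block into a row number in the whole RAT. The sketch below tracks the row holding the largest value of a column. The names `trackMaximum`, `img` and `col1` are assumptions for the example, and the loop at the end builds stand-in state objects from plain lists rather than running RIOS itself.

```python
from types import SimpleNamespace


def trackMaximum(info, inputs, outputs, otherargs):
    """User function: remember the largest value seen and its RAT row number."""
    for offset, value in enumerate(inputs.img.col1):
        if otherargs.maxValue is None or value > otherargs.maxValue:
            otherargs.maxValue = value
            # startrow turns a within-block offset into a whole-RAT row number
            otherargs.maxRow = info.startrow + offset


# Stand-in demonstration with two blocks of a six-row column
state = SimpleNamespace(maxValue=None, maxRow=None)
startrow = 0
for blockNdx, block in enumerate([[3, 9, 1], [4, 12, 7]]):
    info = SimpleNamespace(blockNdx=blockNdx, startrow=startrow,
                           blockLen=len(block))
    fakeInputs = SimpleNamespace(img=SimpleNamespace(col1=block))
    trackMaximum(info, fakeInputs, None, state)
    startrow += info.blockLen
print(state.maxValue, state.maxRow)
```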
- class rios.ratapplier.RatAssociations[source]¶
Class associating external raster attribute tables with internal names. Each attribute defined on this object should be a RatHandle object.
- class rios.ratapplier.RatBlockAssociation(state, fileHandles, controls)[source]¶
Hold one or more blocks of data from RAT columns of a single RAT. This class is kind of at the heart of the module.
Most generic attributes on this class are blocks of data read from and written to the RAT, and so are not actually attributes at all, but are managed by the __setattr__/__getattr__ over-ride methods. Their names are the names of the columns to which they correspond. However, there are a number of genuine attributes which also need to be present, for internal use, and it is obviously important that their names not be the same as any columns. Since we obviously cannot guarantee this, we have named them beginning with “Z__”, in the hope that no-one ever has a column with a name like this. These are all created within the __init__ method.
The main purpose of using __getattr__ is to avoid reading columns which the userFunc is not actually using. As a consequence, one also needs to use __setattr__ to handle the data the same way.
- checkZarrfileParams()[source]¶
Check if the Zarr file has only just been created. If so, initialize it with suitable parameters
- finaliseRowCount()[source]¶
If the row count for this RAT has been over-allocated, reset it back to the actual number of rows we wrote.
- class rios.ratapplier.RatHandle(filename, layernum=1)[source]¶
A handle onto the RAT for a single image layer. This is used as an easy way for the user to nominate both a filename and a layer number.
- class rios.ratapplier.RatZarrHandle(filename)[source]¶
Equivalent of RatHandle, but for a RAT stored in a Zarr file
New in version 2.0.9
- rios.ratapplier.apply(userFunc, inRats, outRats, otherargs=None, controls=None)[source]¶
Apply the given function across the whole of the given raster attribute tables. The attribute table is processed one chunk at a time, allowing very large tables to be handled without running out of memory.
All raster files must already exist, but new columns can be created.
Normal pattern is something like the following:
    from rios import ratapplier

    def myFunc(info, inputs, outputs):
        # Column names appear as attributes on each RAT association
        outputs.vegclass.colSum = inputs.vegclass.col1 + inputs.vegclass.col2

    inRats = ratapplier.RatAssociations()
    outRats = ratapplier.RatAssociations()
    inRats.vegclass = ratapplier.RatHandle('vegclass.kea')
    outRats.vegclass = ratapplier.RatHandle('vegclass.kea')
    ratapplier.apply(myFunc, inRats, outRats)
The rios.ratapplier.RatHandle defaults to using the RAT from the first layer of the image, which is usual for thematic imagery. This can be overridden using the layernum parameter. The names of the columns are reflected in the names of the fields on the inputs and outputs parameters, and multiple input and output RATs can be specified.
The otherargs argument can be any object, and is typically an instance of rios.ratapplier.OtherArguments. It will be passed in to each call of the user function, unchanged between calls, so that other values can be passed in, and calculated quantities passed back. The values stored on this object are not directly associated with rows of the RAT, and must be managed entirely by the user. If it is not required, it need not be passed.
The controls object is an instance of the rios.ratapplier.RatApplierControls class, and is only required if the default control settings are to be changed.
The info object which is passed to the user function is an instance of the rios.ratapplier.RatApplierState class.
By default new columns are marked as ‘Generic’. If they need to be marked as having a specific usage, the following syntax is used:
    from osgeo import gdal

    def addCols(info, inputs, outputs):
        "Add two columns and output"
        outputs.outimg.colSum = inputs.inimg.col1 + inputs.inimg.col4
        outputs.outimg.Red = someRedValue  # some calculated red value, in 0-255 range
        # Mark the new column as the red colour column rather than 'Generic'
        outputs.outimg.setUsage('Red', gdal.GFU_Red)
Statistics
Since the RAT is now read one chunk at a time, calling numpy functions like mean() etc. will only return statistics for the current chunk, not globally. The solution is to use the rios.fileinfo.RatStats class:
    from rios.fileinfo import RatStats
    columnsOfInterest = ['col1', 'col4']
    ratStatsObj = RatStats('file.img', columnlist=columnsOfInterest)
    print(ratStatsObj.col1.mean, ratStatsObj.col4.mean)
Each column attribute is an instance of rios.fileinfo.ColumnStats and is intended to be passed through the apply function via the otherargs mechanism.
Non-GDAL RATs
An alternative form of RAT is supported, based on Zarr arrays. Instead of using the RatHandle class to connect to a RAT in a GDAL file, use the RatZarrHandle class to associate with a Zarr file. This has a very specific internal structure, and is intended for use in writing columns outside of the GDAL raster file, if it cannot be written for some reason. The main use case is to read and/or write a RAT stored on AWS S3, but it also works the same if the file is on local disk.
Example:
    inRats.vegclass = ratapplier.RatHandle('vegclass.kea')
    outRats.extra = ratapplier.RatZarrHandle('s3://mybucket/extra.zarr')
This can then be used in the same way as a GDAL-based RAT.
It requires the ratzarr package (https://github.com/ubarsc/ratzarr).
- rios.ratapplier.copyRAT(infile, outfile, progress=None, omitColumns=None)[source]¶
Given input and output filenames, copies the RAT from the input and writes it to the output.
If omitColumns is set, then it should be a sequence of column names that are to be omitted from the copying. For example, the ‘Histogram’ column may need to be omitted so that the pixel counts stay the correct values in the output image.
infile and outfile can both be RatZarr files. If the output file does not exist, it will be created as a RatZarr file.
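A minimal sketch of the Histogram case mentioned above, assuming RIOS is installed. The wrapper name `copyRatKeepingCounts` and the file names in the usage note are hypothetical; only the copyRAT call itself comes from the signature documented here.

```python
def copyRatKeepingCounts(infile, outfile):
    """Copy a RAT but leave the output file's own Histogram column untouched."""
    from rios import ratapplier  # imported here so the sketch loads without RIOS

    # Omitting 'Histogram' keeps the pixel counts already held by the output
    ratapplier.copyRAT(infile, outfile, omitColumns=['Histogram'])
```

It might be called as `copyRatKeepingCounts('classified.kea', 'resampled.kea')`, where both file names are placeholders.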
- rios.ratapplier.internalCopyRAT(info, inputs, outputs, otherArgs)[source]¶
Called from copyRAT. Copies the RAT
- rios.ratapplier.RCM_EQUALS_INPUT = 0¶
Same as input
- rios.ratapplier.RCM_FIXED = 1¶
Fixed size
- rios.ratapplier.RCM_INCREMENT = 2¶
Incremented as required