Opened 4 years ago
Last modified 2 years ago
#153 new enhancement
Requirements related to specific standard names
Reported by: | martin.juckes | Owned by: | cf-conventions@… |
---|---|---|---|
Priority: | medium | Milestone: | |
Component: | cf-conventions | Version: | |
Keywords: | Cc: |
Description (last modified by martin.juckes)
A significant number of standard names contain, in their definitions, explicit specifications for additional required metadata. For instance, if the standard_name is region then there are constraints on the allowed values of the data variable. The standard name descriptions cannot include examples or markup, and the specification of the rules is not as clear as in the convention text. It also appears that the rules are not checked by the CF checker (at least not the few that I have looked at in detail), and I think the best way to get consistent checking would be to first create a well-structured summary of these rules in the conventions document.
The specific proposal is to add a new appendix that lists the rules, with examples where appropriate.
It will take some time to complete the list. I propose that we add a provisional list, after agreeing the format and approach, and work towards completion later.
Appendix D: Rules associated with standard names
Some standard names bring additional constraints on the metadata and/or data values of the variables they are associated with. This appendix lists such names, grouped according to the type of constraint, and provides usage examples where needed.
Required Coordinates
A common constraint involves the requirement that a particular coordinate or set of coordinates be present. The following table lists the rules and associated standard names. An explanation of each rule follows below.
Rule | Description | Standard Name(s) | Required coordinate(s) |
---|---|---|---|
1. Area Fraction | The fractional area in a cell covered by a particular area type | area_fraction | area_type |
2. Lifted from | Parameters defined in terms of lifting from a reference level | atmosphere_convective_available_potential_energy, atmosphere_convective_inhibition, atmosphere_level_of_free_convection, atmosphere_lifting_condensation_level | original_air_pressure_of_lifted_parcel |
3. Lifting range | Parameter defined in terms of lifting through a specified range | temperature_difference_between_ambient_air_and_air_lifted_adiabatically | original_air_pressure_of_lifted_parcel, final_air_pressure_of_lifted_parcel |
4. Radiances | For radiance variables a direction must be specified | downwelling_photosynthetic_photon_radiance_in_sea_water and others | zenith_angle |
5. Reference state | Variables which depend on reference air temperature and humidity | mass_concentration_of_pm_*_ambient_aerosol_in_air, mass_fraction_of_pm_*_ambient_aerosol_in_air | air_temperature, relative_humidity |
6. Wavelength | Functions of wavelength | *_per_unit_wavelength_in_air | radiation_wavelength |
In all cases, the structure follows the same pattern, illustrated by the following example for rule 1 (Area Fraction):
  float cropcover(lat, lon) ;
    cropcover:standard_name = "area_fraction" ;
    cropcover:coordinates = "crop" ;
    cropcover:units = "1" ;
  char crop(nchar) ;
    crop:standard_name = "area_type" ;
data:
  crop = "crop" ;
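To show how a checker might test the required-coordinate rules, here is a minimal Python sketch. It assumes a hypothetical in-memory representation of a file (a dict mapping variable names to attribute dicts) rather than any particular netCDF library. The `REQUIRED_COORDINATES` table is transcribed from rules 1 and 3 only, and for brevity the check looks only at the `coordinates` attribute, ignoring coordinate variables named after dimensions.

```python
# Hypothetical in-memory representation: variable name -> attribute dict.
Dataset = dict[str, dict[str, str]]

# Required coordinate standard names, transcribed from rules 1 and 3 above.
REQUIRED_COORDINATES = {
    "area_fraction": ["area_type"],
    "temperature_difference_between_ambient_air_and_air_lifted_adiabatically": [
        "original_air_pressure_of_lifted_parcel",
        "final_air_pressure_of_lifted_parcel",
    ],
}


def missing_coordinates(ds: Dataset, var: str) -> list[str]:
    """Return the required coordinate standard names not found for ``var``."""
    attrs = ds[var]
    required = REQUIRED_COORDINATES.get(attrs.get("standard_name", ""), [])
    # Variables named in the coordinates attribute (auxiliary or scalar coordinates).
    coord_names = attrs.get("coordinates", "").split()
    present = {ds[c].get("standard_name") for c in coord_names if c in ds}
    return [name for name in required if name not in present]


example = {
    "cropcover": {"standard_name": "area_fraction", "coordinates": "crop", "units": "1"},
    "crop": {"standard_name": "area_type"},
}
print(missing_coordinates(example, "cropcover"))  # []
```

A fuller check would also walk the variable's dimensions, since a coordinate such as radiation_wavelength could equally be supplied as a coordinate variable.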
Other rules
Regions, Area Types, Cyclone keywords
If the standard name is in the list below, then the variables must either be of character type or use flag values to associate each element with a character string. The string values must match the vocabularies or rules specified below.
Standard Name | Vocabulary |
---|---|
region | CF standard region |
area_type | CF area type |
scene_type_of_dvorak_tropical_cyclone_cloud_region | uniform_central_dense_overcast; embedded_center; irregular_central_dense_overcast; curved_band; shear |
scene_type_of_dvorak_tropical_cyclone_eye_region | clear_ragged_or_obscured_eye; pinhole_eye; large_eye; no_eye |
automated_tropical_cyclone_forecasting_system_storm_identifier | LLnnnnnn, where LL is one of AL, SL, EP, CP, WP, IO or SH, and each n is a digit |
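The "character or flag values" rule can be sketched in Python as follows. This is an illustration only: flag data are mapped to strings through `flag_meanings` (the CF flag convention), character data are used as-is, and the strings are then compared against the Dvorak vocabularies or the storm-identifier pattern from the table. The function names are hypothetical, and the region and area_type lists are not included; they would have to be loaded from the published CF lists.

```python
import re

# Vocabularies transcribed from the table above (Dvorak rows only).
VOCABULARIES = {
    "scene_type_of_dvorak_tropical_cyclone_eye_region": {
        "clear_ragged_or_obscured_eye", "pinhole_eye", "large_eye", "no_eye",
    },
    "scene_type_of_dvorak_tropical_cyclone_cloud_region": {
        "uniform_central_dense_overcast", "embedded_center",
        "irregular_central_dense_overcast", "curved_band", "shear",
    },
}

# LLnnnnnn: a two-letter basin code followed by six digits.
STORM_ID = re.compile(r"(AL|SL|EP|CP|WP|IO|SH)[0-9]{6}")


def string_values(attrs: dict, data: list) -> list[str]:
    """Character data are used as-is; flag data are mapped via flag_meanings."""
    if "flag_values" in attrs:
        lookup = dict(zip(attrs["flag_values"], attrs["flag_meanings"].split()))
        # Unknown flag values are reported by their numeric value.
        return [lookup.get(v, str(v)) for v in data]
    return list(data)


def invalid_values(standard_name: str, attrs: dict, data: list) -> list[str]:
    """Return the string values that break the vocabulary for this name."""
    values = string_values(attrs, data)
    if standard_name == "automated_tropical_cyclone_forecasting_system_storm_identifier":
        return [v for v in values if not STORM_ID.fullmatch(v)]
    vocab = VOCABULARIES[standard_name]
    return [v for v in values if v not in vocab]
```

For example, flag values `[0, 1, 7]` with `flag_meanings = "pinhole_eye no_eye"` would report `"7"` as invalid for the eye-region name.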
Quantities representing a layer average or sum
Quantities whose standard names match the patterns below describe a layer, and therefore require a vertical coordinate with bounds:
- *_atmosphere_layer[_*];
- *_ocean_layer[_*];
- *_soil_layer[_*];
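One way a checker could decide which names fall under these patterns is sketched below in Python. The translation is an assumed reading of the notation: `*` stands for any text and square brackets mark an optional part. Under that reading `*_atmosphere_layer[_*]` matches any name containing `_atmosphere_layer`, either at the end or followed by `_`.

```python
import re

# Patterns from the list above: "*" is any text, "[...]" marks an optional part.
LAYER_PATTERNS = ["*_atmosphere_layer[_*]", "*_ocean_layer[_*]", "*_soil_layer[_*]"]


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate the wildcard notation into a regular expression."""
    out = []
    for ch in pattern:
        if ch == "*":
            out.append(".*")
        elif ch == "[":
            out.append("(?:")  # start of an optional group
        elif ch == "]":
            out.append(")?")   # end of the optional group
        else:
            out.append(re.escape(ch))
    return re.compile("".join(out))


LAYER_REGEXES = [pattern_to_regex(p) for p in LAYER_PATTERNS]


def needs_vertical_bounds(standard_name: str) -> bool:
    """True if the whole name matches one of the layer patterns."""
    return any(rx.fullmatch(standard_name) for rx in LAYER_REGEXES)


print(needs_vertical_bounds("moisture_content_of_soil_layer"))  # True
print(needs_vertical_bounds("air_temperature"))  # False
```

Using `fullmatch` anchors the pattern at both ends, so a name only qualifies if it matches in its entirety.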
Variation of variables in sigma coordinates due to surface pressure change
change_in_energy_content_of_atmosphere_layer_due_to_change_in_sigma_coordinate_wrt_surface_pressure: must have a vertical coordinate variable (axis=Z).
  float deltae(sig) ;
    deltae:standard_name = "change_in_energy_content_of_atmosphere_layer_due_to_change_in_sigma_coordinate_wrt_surface_pressure" ;
    deltae:units = "J m-2" ;
  float sig(sig) ;
    sig:axis = "Z" ;
    sig:standard_name = "atmosphere_sigma_coordinate" ;
    sig:bounds = "sig_bnds" ;
    sig:units = "1" ;
  float sig_bnds(sig, nv) ; // nv = 2; bounds required because of _atmosphere_layer
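The combined "vertical axis with bounds" requirement in this example could be checked roughly as follows. This Python sketch again assumes a hypothetical dict representation, with each variable holding its dimension names and attributes. The bounds dimension name `nv` is illustrative, and the test relies on the CF cell-bounds layout, in which the bounds variable has the coordinate's dimensions plus one trailing vertex dimension.

```python
# Hypothetical representation: variable name -> {"dims": [...], "attrs": {...}}.
example = {
    "deltae": {"dims": ["sig"], "attrs": {
        "standard_name": "change_in_energy_content_of_atmosphere_layer_due_to_change_in_sigma_coordinate_wrt_surface_pressure",
        "units": "J m-2"}},
    "sig": {"dims": ["sig"], "attrs": {
        "axis": "Z", "standard_name": "atmosphere_sigma_coordinate",
        "bounds": "sig_bnds", "units": "1"}},
    "sig_bnds": {"dims": ["sig", "nv"], "attrs": {}},
}


def has_bounded_axis(ds: dict, var: str, axis: str) -> bool:
    """True if some coordinate of ``var`` has the given axis and a bounds variable."""
    v = ds[var]
    # Coordinate variables share their dimension's name; auxiliary ones are listed.
    candidates = list(v["dims"]) + v["attrs"].get("coordinates", "").split()
    for name in candidates:
        coord = ds.get(name)
        if coord is None or coord["attrs"].get("axis") != axis:
            continue
        bnds = ds.get(coord["attrs"].get("bounds", ""))
        # Bounds must exist and have the coordinate's dims plus a trailing vertex dim.
        if bnds is not None and bnds["dims"][:-1] == coord["dims"]:
            return True
    return False


print(has_bounded_axis(example, "deltae", "Z"))  # True
```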
Temporal change
A change or displacement over time requires bounds on the time coordinate:
- change_over_time_*;
- *_displacement;
Integrals over the whole atmosphere
When the quantity is defined to be an integral over the whole atmosphere, the spatial coordinates should be absent or defined with bounds representing the whole atmosphere:
- [tendency_of_]atmosphere_moles_of_*;
Comments for discussion
In some cases the wording of standard_name definitions could be interpreted as a recommendation or suggestion rather than a requirement. If some of these are intended only as suggestions, that should be flagged.
Attachments (4)
Change History (20)
comment:1 Changed 4 years ago by martin.juckes
- Description modified (diff)
comment:2 Changed 4 years ago by martin.juckes
- Description modified (diff)
- Summary changed from Requires related to specific standard names to Requirements related to specific standard names
comment:3 Changed 4 years ago by jonathan
comment:4 Changed 4 years ago by martin.juckes
Dear Jonathan,
I'm happy with that proposal .. perhaps with one extra line in the conventions document to say that "Use of some standard names introduces additional constraints on the variable attributes and/or values, as detailed in link to: Requirements Related to Specific Standard Names."
I also agree on the need for a machine-readable form. I was thinking that something would be needed to assist the proofreading, e.g. a JSON file which can be used to generate a spreadsheet displaying the definitions of all the variables listed under each constraint.
I think the more legible wiki form is also necessary, in order to provide the usage examples. In the earlier email discussion, Roy Lowry suggested encoding the rules in RDF and serving them through the NERC Vocab Server alongside the standard names. This would be neat, but I think it may be worth generating wiki and JSON versions first, in order to get a clearer view of the range of constraints that we are dealing with.
regards, Martin
comment:5 Changed 4 years ago by jonathan
Dear Martin
OK, good. I agree we also need a human-readable version - wiki and JSON would be fine.
Best wishes
Jonathan
comment:6 Changed 4 years ago by martin.juckes
Dear Jonathan,
On 2nd thoughts, however, in connection with updating "frequently and easily", we need to be careful about backward compatibility. E.g. if we introduce it in parallel with CF-1.7, files which were considered valid under CF-1.6 might become invalid. We want, I think, such files to continue to be considered valid under CF-1.6, hence the checker should not use this extension when checking against earlier convention versions. This differs from the policy for the standard name list, for which the latest version is always used. This implies, I think, that this document would need to be clearly versioned in a way which makes the link to convention versions clear, e.g. we might start with 1.7.00 and increment to 1.7.01 etc. until the convention moves to 1.8.
I can see that we want flexibility to add rules about new standard names when the standard name table is updated, and this is far more frequent than convention updates. We need to be careful about dealing with rules for existing standard names which might have been overlooked. Once we have a 1.7.00 version, we should not change any rules for existing standard names until 1.8.00 is launched, though we could perhaps add advisory notes where appropriate.
Does this sound workable?
regards, Martin
comment:7 Changed 4 years ago by jonathan
Dear Martin
I appreciate your caution but I think we can be a bit more relaxed. This new document does not have any information which isn't already in the standard name table, so it may be regarded as an adjunct to that. Hence I think the new document should have the same version numbers as the standard name table, though it will probably not be updated every time. Although at the moment the CF checker doesn't verify that the constraints are satisfied, it could already have done so - it's a matter of implementation, not the convention. The constraints are not new. We're simply making it easier to check them. We aren't changing anything about the convention. We also have a choice to make about whether the checker would regard not meeting these constraints as an error (i.e. breaking a requirement) or bad practice (i.e. not respecting a recommendation).
Best wishes
Jonathan
comment:8 Changed 4 years ago by ros
Sorry, coming to this rather late.... Happy to have these rules listed in a separate document so long as it is easily readable by the CF Checker. With the standard-name and area-types tables being in XML that would obviously require the least amount of work, but not against another format if that would be more appropriate to fit other requirements.
The extra rules have obviously been overlooked and have not made it into the CF Checker document. I'm happy to add these in the next release.
Cheers, Ros.
comment:9 Changed 4 years ago by martin.juckes
Dear Ros,
OK, I think the rules could be expressed in a simple XML format, one element per rule, e.g. <rule target="area_fraction" requiredCoordinate="area_type"/> to specify that "area_fraction" must have an area_type variable as a coordinate. Where a variable has more than one required coordinate, I would list these in separate XML elements. Other rules would be of the form <rule target="...." requiredAxis="Z"/>, to specify that a variable must have a dimension or coordinate with "axis=Z", and <rule target="...." requiredBoundAxis="Z"/>, to specify additionally that the dimension or coordinate in question must have bounds set.
The region and area_type rules have a choice, which we could encode as follows:
<choice target="region">
  <option type="char"/>
  <option attribute="flag_values"/>
</choice>
<rule target="region" dataValuesIn="CF Standard Region"/>
The data values referred to in the last case should be interpreted as the flag values if present. On the other hand, it may be easier to just have a named test for these two, rather than using a complex schema like this.
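Reading the attribute form proposed in this comment takes only a few lines with Python's standard `xml.etree.ElementTree`. In the sketch below, the enclosing `<rules>` root element and the `load_rules` helper are hypothetical; the rule elements reuse names from the appendix. It groups (rule type, value) pairs by target standard name, so a name with two required coordinates simply collects two entries.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Rules in the attribute form, wrapped in a hypothetical <rules> root element.
RULES_XML = """
<rules>
  <rule target="area_fraction" requiredCoordinate="area_type"/>
  <rule target="temperature_difference_between_ambient_air_and_air_lifted_adiabatically"
        requiredCoordinate="original_air_pressure_of_lifted_parcel"/>
  <rule target="temperature_difference_between_ambient_air_and_air_lifted_adiabatically"
        requiredCoordinate="final_air_pressure_of_lifted_parcel"/>
  <rule target="moisture_content_of_soil_layer" requiredBoundAxis="Z"/>
</rules>
"""


def load_rules(text: str) -> dict[str, list[tuple[str, str]]]:
    """Group (rule type, value) pairs by target standard name."""
    rules = defaultdict(list)
    for elem in ET.fromstring(text.strip()).iter("rule"):
        target = elem.get("target")
        # Every attribute other than "target" names a rule type.
        for key, value in elem.attrib.items():
            if key != "target":
                rules[target].append((key, value))
    return dict(rules)


rules = load_rules(RULES_XML)
print(rules["area_fraction"])  # [('requiredCoordinate', 'area_type')]
```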
Cheers, Martin
comment:10 Changed 4 years ago by ros
Dear Martin,
I think that would work absolutely fine from my point of view. (Having no experience with JSON, I have just briefly looked at reading it into Python, which does look pretty easy, if that were deemed more appropriate.)
One thing we would also need to do is put the Standardised Region List into a readable format as it currently only appears to exist on the website as an HTML list. I have just discovered that I did put the check of valid regions into the checker but it never got released - I'll include it in the next release.
Regards, Ros.
comment:11 Changed 4 years ago by martin.juckes
Dear Ros,
Loading JSON files is trivial ... a one-line command and then you get a Python object (e.g. a list or a dictionary). You then have to parse the object. The advantage of XML is that there is a well-tested approach to enforcing structure on the object, which, I find, tends to make parsing more reliable. I can easily export it as JSON as well, as I suspect that will make it more accessible to others. Thinking about defining the structure, it will be easier to have rules of the form: <rule targ="some_standard_name"><requiredAxis value="Z"/></rule>.
The region/area_type rule could be encoded more simply as
<rule targ="region"><charOrFlagIn value="CF Standard Region"/></rule>
This would then give 4 rules to encode: requiredCoordinate, requiredAxis, requiredBoundAxis, charOrFlagIn.
regards, Martin
comment:12 Changed 4 years ago by martin.juckes
Dear Ros,
I agree that we need a machine readable version of the Standardised Region List .. I'll start a new ticket for that,
regards, Martin
Changed 4 years ago by martin.juckes
CF Standard Name Rules Schema (based on CF Standard Name Schema)
comment:13 Changed 4 years ago by martin.juckes
Hello Ros,
After looking at the schema of the standard name list, I've adapted that for the rules, with typical entries of the form:
<rule id="moisture_content_of_soil_layer.boundAxis"> <description>The soil layer must be described by a bounds attribute on a vertical coordinate.</description> <target>moisture_content_of_soil_layer</target> <requiredBoundAxis>Z</requiredBoundAxis> </rule>
The "id" has to be unique, so may need to be extended beyond the standard name which the rule is intended to apply to. Ideally, the target standard name should be constrained by the schema to be in the CF list, but I haven't implemented that yet. attachment:CF_Standard_Name_Rules.xml is a demo XML document with 3 rules, and attachment:CF_Standard_Name_Rules.json is the same in JSON. The schema for the XML is attachment:CFStandardNameRules-1.1.xsd.
This approach makes it easier to impose the schema rules on the names of the rules and the associated restrictions on the values (e.g. "requiredBoundAxis" should take a value "X", "Y", ...).
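The attached XSD schema is what enforces these constraints. As a rough stand-in, the Python sketch below applies two of the same checks to entries of the form quoted above: `id` values must be unique, and `requiredBoundAxis` must take an allowed value. The `<rules>` root element, the `check_rules` helper and the allowed set `{"X", "Y", "Z", "T"}` are assumptions (the set taken from the CF `axis` attribute values), not the contents of the attached schema.

```python
import xml.etree.ElementTree as ET

# The typical entry quoted above, wrapped in a hypothetical <rules> root element.
RULES_XML = """<rules>
  <rule id="moisture_content_of_soil_layer.boundAxis">
    <description>The soil layer must be described by a bounds attribute on a vertical coordinate.</description>
    <target>moisture_content_of_soil_layer</target>
    <requiredBoundAxis>Z</requiredBoundAxis>
  </rule>
</rules>"""

# Assumed set of permitted axis values (the values of the CF axis attribute).
AXES = {"X", "Y", "Z", "T"}


def check_rules(text: str) -> list[str]:
    """Return problems found: duplicate ids or invalid requiredBoundAxis values."""
    problems, seen = [], set()
    for rule in ET.fromstring(text).iter("rule"):
        rule_id = rule.get("id")
        if rule_id in seen:
            problems.append(f"duplicate id {rule_id}")
        seen.add(rule_id)
        axis = rule.findtext("requiredBoundAxis")  # None if the element is absent
        if axis is not None and axis.strip() not in AXES:
            problems.append(f"{rule_id}: bad axis {axis!r}")
    return problems


print(check_rules(RULES_XML))  # []
```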
cheers, Martin
comment:14 Changed 4 years ago by martin.juckes
I've updated the format slightly again, making it a straight list of records, each containing 'rule', 'id', 'label', 'comment', 'target', 'value'. The 'target' is the standard name that this rule applies to, 'rule' identifies the type of rule, and 'value' is additional information needed by the rule. I'm using this now to verify specifications related to these rules in the data request.
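With the flat record format, loading and indexing the rules is straightforward in Python. The six field names come from the description above. The two records themselves (their `id` and `label` values in particular) and the `rules_by_target` helper are illustrative assumptions, not entries copied from the attached files.

```python
import json
from collections import defaultdict

# Field names from the description above; the record contents are illustrative.
FIELDS = {"rule", "id", "label", "comment", "target", "value"}

RECORDS_JSON = """[
  {"rule": "requiredCoordinate", "id": "area_fraction.area_type",
   "label": "Area Fraction", "comment": "", "target": "area_fraction",
   "value": "area_type"},
  {"rule": "requiredBoundAxis", "id": "moisture_content_of_soil_layer.boundAxis",
   "label": "Layer", "comment": "", "target": "moisture_content_of_soil_layer",
   "value": "Z"}
]"""


def rules_by_target(text: str) -> dict[str, list[dict]]:
    """Load the flat list of records and index it by target standard name."""
    records = json.loads(text)  # the one-line load; checking the structure is up to us
    index = defaultdict(list)
    for rec in records:
        missing = FIELDS.difference(rec)
        if missing:
            raise ValueError(f"record {rec.get('id')} lacks {sorted(missing)}")
        index[rec["target"]].append(rec)
    return dict(index)


index = rules_by_target(RECORDS_JSON)
print([r["rule"] for r in index["area_fraction"]])  # ['requiredCoordinate']
```

Because JSON has no schema step of its own, the explicit field check stands in for the structural enforcement that the XML schema provides.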
comment:15 Changed 2 years ago by martin.juckes
- Description modified (diff)
Adding a rule for names which represent the integral of a quantity over the whole atmosphere and hence should not have spatial coordinates.
comment:16 Changed 2 years ago by martin.juckes
- Description modified (diff)
Extended the rules to cover 3 names holding tropical cyclone keywords.
Dear Martin
I think this is a good idea, thank you, and I agree with the proposal except that I suggest it would be better to have it as a separate document on the standard name page, like the guidelines, rather than as an appendix in the CF convention document. That is because
The CF checker could consult it in either case. For use by the CF checker or other software, I suppose that this list of constraints should be made available in a form convenient for reading by programs. Ros's opinion would be useful.
Best wishes
Jonathan