WIPO ST.26 Breakdown - Part V
Continuing our way through the DNA requirements, the next point indicates that any uracil in DNA or thymine in RNA is considered a modified nucleotide and, in both instances, must be represented in the sequence as a "t" and described with the feature key "modified_base", the qualifier "mod_base" with the value "OTHER", and the qualifier "note" identifying the nucleotide as either uracil or thymine.
An example of this type of feature/qualifier is below. As indicated in a previous blog, representing uracil as a "t" rather than a "u" differs from ST.25 and requires more manual updating of sequences before they can be properly included in the Sequence Listing. It also makes it impossible to tell RNA from DNA without referencing the molecule type tag.
<INSDFeature>
  <INSDFeature_key>modified_base</INSDFeature_key>
  <INSDFeature_location>15</INSDFeature_location>
  <INSDFeature_quals>
    <INSDQualifier>
      <INSDQualifier_name>mod_base</INSDQualifier_name>
      <INSDQualifier_value>OTHER</INSDQualifier_value>
    </INSDQualifier>
    <INSDQualifier>
      <INSDQualifier_name>note</INSDQualifier_name>
      <INSDQualifier_value>uracil</INSDQualifier_value>
    </INSDQualifier>
  </INSDFeature_quals>
</INSDFeature>
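To make the manual update described above more concrete, here is a minimal Python sketch of how a DNA sequence containing "u" might be converted to "t" while generating the matching "modified_base" features. The function names (build_modified_base_feature, annotate_uracil_in_dna) are hypothetical and are not part of ST.26 or any WIPO tool; the sketch assumes 1-based positions and covers only the uracil-in-DNA case.

```python
import xml.etree.ElementTree as ET


def build_modified_base_feature(location: str, note: str) -> ET.Element:
    """Build an INSDFeature element for a modified_base annotation.

    Hypothetical helper: mirrors the element names in the example above.
    """
    feature = ET.Element("INSDFeature")
    ET.SubElement(feature, "INSDFeature_key").text = "modified_base"
    ET.SubElement(feature, "INSDFeature_location").text = location
    quals = ET.SubElement(feature, "INSDFeature_quals")
    # Two qualifiers: mod_base=OTHER, plus a note naming the nucleotide.
    for name, value in (("mod_base", "OTHER"), ("note", note)):
        qual = ET.SubElement(quals, "INSDQualifier")
        ET.SubElement(qual, "INSDQualifier_name").text = name
        ET.SubElement(qual, "INSDQualifier_value").text = value
    return feature


def annotate_uracil_in_dna(sequence: str):
    """Replace each 'u' in a DNA sequence with 't'.

    Returns the converted sequence and one feature per uracil position
    (positions are 1-based, an assumption of this sketch).
    """
    seq = sequence.lower()
    positions = [i + 1 for i, base in enumerate(seq) if base == "u"]
    features = [build_modified_base_feature(str(p), "uracil") for p in positions]
    return seq.replace("u", "t"), features


converted, features = annotate_uracil_in_dna("acgtuacg")
print(converted)
print(ET.tostring(features[0], encoding="unicode"))
```

The same helper could be reused for thymine in RNA by passing "thymine" as the note, since the sequence residue is a "t" in both cases.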
A sequence that includes an unknown position must also include a feature like the one shown above, with each unknown position represented as "n" and the feature key "unsure".
Finally, if there are consecutive positions with the same modifications, the features can be combined into a single feature by using the syntax "x..y" in the "<INSDFeature_location>" field.
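As a rough illustration of the "x..y" syntax, the following Python sketch collapses a list of modified positions into location strings, merging consecutive positions into a single range. The helper name collapse_positions is hypothetical, and the sketch assumes every position in the list carries the same modification and that positions are 1-based.

```python
def collapse_positions(positions):
    """Collapse 1-based positions into location strings.

    Runs of consecutive positions become "x..y"; isolated positions
    stay as a single number. Assumes all positions share one modification.
    """
    locations = []
    start = prev = None
    for pos in sorted(set(positions)):
        if start is None:
            start = prev = pos
        elif pos == prev + 1:
            prev = pos  # extend the current run
        else:
            # Close the current run and begin a new one.
            locations.append(str(start) if start == prev else f"{start}..{prev}")
            start = prev = pos
    if start is not None:
        locations.append(str(start) if start == prev else f"{start}..{prev}")
    return locations


print(collapse_positions([15, 16, 17, 22, 30, 31]))
# prints ['15..17', '22', '30..31']
```

Each resulting string would go into its own "<INSDFeature_location>" field, so the six positions in this example would need three features instead of six.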
While this is the end of the nucleotide section of the standard (for now), there are still many items/scenarios that will be covered as we move into the structure of the sequences in the examples section.
I hope these breakdowns convey the complexity of the new standard and the level of detail required to ensure that sequences are included accurately in the Sequence Listing.
If you have any questions, please reach out to me to discuss.