Structure of a Sequence Listing in XML (WIPO ST.26 format)
As stated in the Standard, a sequence listing comprises two portions: 1) the general information part and 2) the sequence data part. Examples of a valid sequence listing file can be downloaded from the WIPO website here.
Specific requirements of the XML file are as follows:
1. The first line must contain "<?xml version="1.0" encoding="UTF-8"?>"
2. The second line must contain "<!DOCTYPE ST26SequenceListing PUBLIC "-//WIPO//DTD Sequence Listing 1.3//EN" "ST26SequenceListing_V1_3.dtd">".
3. The entire sequence listing must be in a single file.
4. The file must be encoded in Unicode UTF-8.
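For anyone who wants to sanity-check a file before opening it in WIPO Sequence, the four requirements above can be roughly sketched in Python. This is only an illustration under my own assumptions: the function name `check_header` is hypothetical, it only looks at the first two lines and the encoding, and it is no substitute for the validation performed by WIPO Sequence.

```python
XML_DECLARATION = '<?xml version="1.0" encoding="UTF-8"?>'
DOCTYPE_PREFIX = "<!DOCTYPE ST26SequenceListing PUBLIC"


def check_header(raw: bytes) -> list[str]:
    """Return a list of problems found in the encoding and first two lines."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return ["file is not valid UTF-8"]

    problems = []
    lines = text.splitlines()
    # Requirement 1: the XML declaration on the first line.
    if not lines or lines[0].strip() != XML_DECLARATION:
        problems.append("first line is not the expected XML declaration")
    # Requirement 2: the ST26SequenceListing DOCTYPE on the second line.
    if len(lines) < 2 or not lines[1].strip().startswith(DOCTYPE_PREFIX):
        problems.append("second line is not an ST26SequenceListing DOCTYPE")
    return problems
```

An empty list means nothing was flagged by this limited check; it does not mean the listing is compliant.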
Taking a closer look at the general information element, the following attributes/elements are required.
· dtdVersion
· nonEnglishFreeTextLanguageCode
· ApplicationIdentification (mandatory if the application has been filed and received an application number)
· IPOfficeCode
· ApplicationNumberText
· FilingDate (mandatory if the application has been filed and received an application number)
· ApplicantFileReference (optional if an application number has been included)
· EarliestPriorityApplicationIdentification
· ApplicantName
· ApplicantNameLatin (mandatory if applicant name includes non-Latin characters)
· InventionTitle (mandatory in the language of filing)
· SequenceTotalQuantity
It is important to note that certain fields must be given in the language of filing, together with a translation or transliteration if they contain non-Latin characters.
Fortunately, these items can be easily entered into WIPO Sequence. The sequence data portion is slightly more complicated and will be broken down in the next post.
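As a rough illustration of the list above, here is a minimal Python sketch that reports which general information items are absent from a listing. The function name `missing_general_info` is hypothetical, and the rule that either ApplicationIdentification or ApplicantFileReference should be present is my reading of the conditions noted in the list, not wording taken from the Standard. Conditional items such as ApplicantNameLatin are not checked.

```python
import xml.etree.ElementTree as ET

# Elements treated here as always required (an assumption based on the list above).
REQUIRED_CHILDREN = ("ApplicantName", "InventionTitle", "SequenceTotalQuantity")


def missing_general_info(xml_text: str) -> list[str]:
    """Return the names of general information items that were not found."""
    root = ET.fromstring(xml_text)
    missing = []
    if "dtdVersion" not in root.attrib:
        missing.append("dtdVersion")
    for tag in REQUIRED_CHILDREN:
        if root.find(tag) is None:
            missing.append(tag)
    # Assumed rule: at least one of the two identifiers must be present.
    has_app_id = root.find("ApplicationIdentification") is not None
    has_file_ref = root.find("ApplicantFileReference") is not None
    if not (has_app_id or has_file_ref):
        missing.append("ApplicationIdentification or ApplicantFileReference")
    return missing
```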
Only 17 days to go!
WIPO ST.26 Breakdown – Part VIII
As mentioned in the last post, this post is going to focus on sequences with D-amino acids. As you are now aware, all sequences with D-amino acids are required to be included in a Sequence Listing (unless fewer than 4 amino acids in length). This is a drastic change from ST.25, which did not require these sequences. This means that, going forward, if a Sequence Listing was submitted in a parent application under ST.25, it will be necessary to review the application for any disclosures of D-amino acid sequences that will need to be added to the Sequence Listing for submission under ST.26.
In addition, since each D-amino acid is considered a modified residue, the ST.26 listing will need to include a feature/qualifier for each D-amino acid position with the full name of the D-amino acid residue as shown below.
<INSDFeature>
  <INSDFeature_key>SITE</INSDFeature_key>
  <INSDFeature_location>9</INSDFeature_location>
  <INSDFeature_quals>
    <INSDQualifier>
      <INSDQualifier_name>note</INSDQualifier_name>
      <INSDQualifier_value>D-Arginine</INSDQualifier_value>
    </INSDQualifier>
  </INSDFeature_quals>
</INSDFeature>
As you can tell, depending on the sequence, defining each D-amino acid position will add significant time and room for error, especially considering that these sequences were not even required previously.
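One way to reduce the room for error is to generate these features rather than typing them by hand. The following is a minimal Python sketch that builds a feature of the shape shown above for a single D-amino acid position. The function name `d_amino_acid_feature` is my own, and the choice of the SITE feature key with a note qualifier simply mirrors the example in this post.

```python
import xml.etree.ElementTree as ET


def d_amino_acid_feature(position: int, full_name: str) -> ET.Element:
    """Build an INSDFeature element annotating one D-amino acid position."""
    feature = ET.Element("INSDFeature")
    ET.SubElement(feature, "INSDFeature_key").text = "SITE"
    ET.SubElement(feature, "INSDFeature_location").text = str(position)
    quals = ET.SubElement(feature, "INSDFeature_quals")
    qualifier = ET.SubElement(quals, "INSDQualifier")
    ET.SubElement(qualifier, "INSDQualifier_name").text = "note"
    # The full, unabbreviated name of the residue, e.g. "D-Arginine".
    ET.SubElement(qualifier, "INSDQualifier_value").text = full_name
    return feature


# Example usage: serialize the feature for position 9.
xml_text = ET.tostring(d_amino_acid_feature(9, "D-Arginine"), encoding="unicode")
```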
The overall impact of this new requirement to include D-amino acids will be seen in the overall increase in the number of sequences in Sequence Listings, as well as the increased number of features/qualifiers. The burden will be on the applicant to ensure these sequences are both included and annotated properly.
Any questions, please reach out to me!
WIPO ST.26 Breakdown - Part VII
We now move into the section explaining amino acids, the first portion of which is fairly redundant with the information in this post, as well as ST.25. For example, amino acid sequences must be disclosed from N-term to C-term. Furthermore, as with DNA sequences, if a sequence is circular the applicant must choose the amino acid in residue position 1.
One difference to note is that, unlike in ST.25, residue numbering must start with 1. Under ST.25, applicants could include signal sequences, which would incorporate negative numbering for the amino acids comprising the signal sequence. Under ST.26, this information will need to be incorporated into the Sequence Listing in a different manner to be maintained, especially in instances where this information may not be present elsewhere in the application.
Like ST.25, if internal terminator symbols are present in a sequence, each portion must be assigned its own unique identifier and be included in the Sequence Listing separately. The exception is any portion that is fewer than 4 amino acids in length, which MUST NOT be included in the Sequence Listing under the ST.26 guidelines. Under ST.25, these often-small peptides were easily incorporated into the Sequence Listing; under ST.26, greater caution will be needed to ensure that none of the included fragments are under 4 amino acids in length.
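The splitting rule above can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, I am assuming the terminator is written as "*", and the length check counts every residue in a fragment rather than only "specifically defined" ones.

```python
def split_on_terminators(sequence: str, terminator: str = "*", min_length: int = 4):
    """Split a protein sequence at internal terminators.

    Returns (included, excluded): fragments long enough to receive their own
    sequence identifier, and fragments that are too short to be listed.
    """
    included, excluded = [], []
    for fragment in sequence.split(terminator):
        if not fragment:
            continue  # skip empty pieces from leading, trailing or doubled terminators
        if len(fragment) >= min_length:
            included.append(fragment)
        else:
            excluded.append(fragment)
    return included, excluded
```

The excluded fragments are returned rather than silently dropped so that someone can confirm they are disclosed elsewhere in the application.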
In the instance of modified amino acids, the unmodified residue should be included within the sequence itself with a feature defining the modified residue. If the modification cannot be represented by any other symbol an "X" will represent the position within the sequence.
We will further break down the inclusion of modifications in the next post, including the inclusion of D-amino acids. As sequences containing D-amino acids are now required to be included in Sequence Listings and these positions must be annotated as modifications, ST.26 will result in significantly more feature information being included in Sequence Listings, which will increase the time spent preparing listings as well as the opportunity for error.
The next post will focus entirely on D-amino acid sequences as this is one of the largest changes to sequence inclusion requirements.
With implementation less than two months away, let me know if you have any questions.
WIPO ST.26 Breakdown - Part V
Continuing our way through DNA requirements, the next point indicates that any uracil in DNA or thymine in RNA is considered a modified nucleotide and, in both instances, must be included as a "t" with the feature key "modified_base", the qualifier "mod_base" with "OTHER" as the qualifier value, and a "note" qualifier defining the residue as either uracil or thymine.
An example of this type of feature/qualifier is below. As indicated in a previous blog, the inclusion of uracil as a "t" rather than a "u" differs from ST.25 and requires more manual updating of sequences for them to be properly included in the Sequence Listing. It also makes differentiating RNA and DNA impossible without referencing the molecule type tag.
<INSDFeature>
  <INSDFeature_key>modified_base</INSDFeature_key>
  <INSDFeature_location>15</INSDFeature_location>
  <INSDFeature_quals>
    <INSDQualifier>
      <INSDQualifier_name>mod_base</INSDQualifier_name>
      <INSDQualifier_value>OTHER</INSDQualifier_value>
    </INSDQualifier>
    <INSDQualifier>
      <INSDQualifier_name>note</INSDQualifier_name>
      <INSDQualifier_value>uracil</INSDQualifier_value>
    </INSDQualifier>
  </INSDFeature_quals>
</INSDFeature>
A sequence including an unknown position must also include features as shown above, with the position(s) of the unknowns included as "n" and the feature key "unsure."
Finally, if there are consecutive positions with the same modifications, the features can be combined into a single feature by using the syntax "x..y" in the "<INSDFeature_location>" field.
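To illustrate the "x..y" syntax, here is a small Python sketch that collapses a list of positions into location strings. The function name is hypothetical, and it assumes every position passed in carries the same modification, since only those can share a feature.

```python
def collapse_positions(positions):
    """Turn positions such as [3, 4, 5, 9] into locations such as ["3..5", "9"]."""
    ordered = sorted(set(positions))
    locations = []
    start = prev = None
    for pos in ordered:
        if start is None:
            start = prev = pos
        elif pos == prev + 1:
            prev = pos  # still inside a consecutive run
        else:
            locations.append(str(start) if start == prev else f"{start}..{prev}")
            start = prev = pos
    if start is not None:
        locations.append(str(start) if start == prev else f"{start}..{prev}")
    return locations
```

Each returned string would be the value of one "<INSDFeature_location>" field, so three modified positions in a row become one feature instead of three.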
While this is the end of the nucleotide section of the standard (for now), there are still many items/scenarios that will be covered as we move into the structure of the sequences in the examples section.
I hope you can take away from these breakdowns the complexity of the new standard and the level of detail required to ensure that the sequences are included accurately in the Sequence Listing.
If you have any questions, please reach out to me to discuss.
WIPO ST.26 Breakdown - Part IV
As we continue to work our way through the "representation of sequences" section, the focus remains on nucleotide sequences, specifically the complicated concepts of modified nucleotides. The first item discussed is that any "ambiguity symbol" (n, m, r, w, s, y, k, v, h, d or b) must be included as the most concise option. For example, a position that may be "a" or "g" must be included as an "r" rather than an "n". The symbol "n" is always interpreted as "a", "c", "g" or "t/u" unless further defined in a feature table. "n" can also only be a nucleotide and each "n" represents a single nucleotide.
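The "most concise option" rule lends itself to a simple lookup. Below is a minimal Python sketch using the standard IUPAC ambiguity codes; the function name `most_concise_symbol` is my own, and treating "u" as "t" reflects how ST.26 represents uracil.

```python
# Each set of possible bases maps to exactly one symbol.
AMBIGUITY_SYMBOLS = {
    frozenset("a"): "a",
    frozenset("c"): "c",
    frozenset("g"): "g",
    frozenset("t"): "t",
    frozenset("ac"): "m",
    frozenset("ag"): "r",
    frozenset("at"): "w",
    frozenset("cg"): "s",
    frozenset("ct"): "y",
    frozenset("gt"): "k",
    frozenset("acg"): "v",
    frozenset("act"): "h",
    frozenset("agt"): "d",
    frozenset("cgt"): "b",
    frozenset("acgt"): "n",
}


def most_concise_symbol(bases):
    """Return the single symbol covering exactly the given alternative bases."""
    key = frozenset(base.lower().replace("u", "t") for base in bases)
    if key not in AMBIGUITY_SYMBOLS:
        raise ValueError(f"no symbol for {sorted(key)}")
    return AMBIGUITY_SYMBOLS[key]
```

For a position disclosed as "a or g", `most_concise_symbol("ag")` returns "r", matching the example above.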
It is important to note that if "n" represents "a", "c", "g" or "t/u", no features or qualifiers are required in the feature table. This differs from ST.25, which requires a "modified_base" feature with a description in the <223> line for all "n" positions regardless of whether the position is a standard "a", "c", "g" or "t/u."
Getting even more complicated, we begin to dive into the language surrounding modified nucleotides. ST.26 states: “A modified nucleotide must be further described in the feature table…using the feature key “modified_base” and the mandatory qualifier “mod_base” in conjunction with a single abbreviation from Annex I (see Section 2, Table 2) as the qualifier value; if the abbreviation is “OTHER”, the complete unabbreviated name of the modified nucleotide must be provided as the value in a “note” qualifier. For a listing of alternative modified nucleotides, the qualifier value “OTHER” may be used in conjunction with a further “note” qualifier (see paragraphs 97 and 98).”
The requirements for modifications above are much more complicated than in WIPO ST.25, as they require the applicant to understand and be aware of the modifications in Section 2, Table 2 rather than being able to account for all modifications in the same manner. If the modification is known and described with an abbreviation in Table 2, it must be abbreviated in the listing. If the modification is not in Table 2, it must be described with its full unabbreviated name. In theory, it makes sense to standardize common modifications by using their abbreviations; however, sequence disclosures in patent applications are known to be wildly inconsistent, which may make the application of this rule complicated.
With about 3 months left before the implementation of ST.26, please contact me if you have any questions.
WIPO ST.26 Breakdown - Part III
As mentioned previously, the "Representation of Sequences" section goes into detail about the requirements and format of the sequence listing portion of the XML document; therefore, this section will be broken down into multiple posts.
As with WIPO ST.25, the first paragraph indicates that each sequence must be assigned a unique identifier. One noted difference is that, unlike in ST.25 where a sequence could be identified as "residues 1-20 of SEQ ID NO: 1" (for example), this is no longer acceptable. All sequences require a unique number, regardless of whether they are identical to a region of a longer sequence. As with ST.25, "000" placeholders are still acceptable if there is no sequence corresponding to an identifier.
Moving into nucleotide sequences, all sequences must be included in the 5' to 3' direction or in the direction that mimics 5'-3' for modified molecules. Where a double-stranded sequence is disclosed and the strands are fully complementary, either a single strand or both strands may be included; each strand that is included must be in 5'-3' orientation. If the strands are not fully complementary, both must be included.
As with WIPO ST.25, the first nucleotide presented in a sequence is always position 1, except for circular sequences, in which case the applicant may choose which nucleotide to designate as position 1.
All nucleotides in a sequence must be represented using the following symbols (a, c, g, t, m, r, w, s, y, k, v, h, d, b, n) and be lowercase characters. You'll note that "u" is not an acceptable character for nucleotide sequences, therefore, unlike ST.25, all uracil residues in RNA molecules must be included as "t". This is particularly important to remember if a Sequence Listing is being converted from ST.25 format to ST.26 as ST.25 sequences will have "u" characters within the RNA sequences.
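When converting an ST.25 sequence, the symbol rules above can be applied mechanically. The following Python sketch lowercases a sequence, strips whitespace, replaces "u" with "t" and rejects anything outside the permitted symbol set. The function name is hypothetical, and the sketch deliberately does nothing about modified nucleotides, which still need features added by hand.

```python
# The symbols permitted in an ST.26 nucleotide sequence, as listed above.
ALLOWED_SYMBOLS = set("acgtmrwsykvhdbn")


def to_st26_nucleotides(sequence: str) -> str:
    """Normalize an ST.25-style nucleotide string to ST.26 symbols."""
    # Remove the spacing used in ST.25 listings, then lowercase and swap u for t.
    normalized = "".join(sequence.split()).lower().replace("u", "t")
    invalid = sorted(set(normalized) - ALLOWED_SYMBOLS)
    if invalid:
        raise ValueError(f"unsupported symbols: {invalid}")
    return normalized
```

Note that any digits copied from an ST.25 listing (such as the running residue count at the end of a line) would be rejected here, which is intentional: they need to be removed before the sequence is imported.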
The items above represent just a small portion of the requirements for nucleotide sequences. The section continues to describe modified nucleotides which will be discussed in the next post.
Again, if you have any questions, please reach out.
WIPO ST.26 Breakdown – Part II
Working our way through WIPO Standard ST.26, the next section, titled "SCOPE", provides further clarification regarding the XML document, specifically that the sequence listing must be a single file in XML format. This is a big change from the ASCII text format currently being used.
The file itself must contain a general information part and a sequence data part. It is noted that the general information part is solely for association of the sequence listing to the patent application. The sequence data part is composed of sequence data elements each containing information about a single sequence. The feature keys and qualifiers are based on INSDC and UniProt specifications.
The scope section continues to clarify the types of sequences to be included which are:
1. An unbranched sequence or a linear region of a branched sequence containing 10 or more specifically defined nucleotides, wherein the adjacent nucleotides are joined by phosphodiester linkage or a chemical bond that mimics the arrangement of nucleotides in a naturally occurring molecule.
2. An unbranched sequence or a linear portion of a branched sequence containing four or more specifically defined amino acids, wherein the amino acids form a single peptide backbone (adjacent amino acids are joined by peptide bonds).
*I emphasized the references to branched sequences above, as this is a new item added in ST.26 that was not previously required. The language in WIPO ST.25 stated, "branched sequences, sequences with fewer than four specifically defined nucleotides or amino acids as well as sequences comprising nucleotides or amino acids other than those listed in Appendix 2, Tables 1, 2, 3 and 4, are specifically excluded from this definition." This is important to remember when preparing a sequence listing for ST.26. If converting an ST.25 sequence listing, these sequences are most likely not present and will need to be added. Additionally, there are residues (O and U) that are "specifically defined" that were previously undefined and included as an X in an ST.25-formatted listing. Straight conversion of an ST.25 listing may not be possible for these reasons, among others.
The final portion of this section emphasizes that sequences that do not meet the requirements above must not be included.
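The length thresholds can be expressed as a quick check. This Python sketch assumes that "specifically defined" means any nucleotide other than "n" and any amino acid other than "X"; the function name is hypothetical, and the sketch ignores branching and linkage chemistry entirely, so it only answers the length question.

```python
def meets_length_threshold(sequence: str, is_protein: bool) -> bool:
    """Check the 10-nucleotide / 4-amino-acid minimum of specifically defined residues."""
    if is_protein:
        # Assumption: "X" is the only amino acid symbol that is not specifically defined.
        defined = sum(1 for r in sequence.upper() if r.isalpha() and r != "X")
        return defined >= 4
    # Assumption: "n" is the only nucleotide symbol that is not specifically defined.
    defined = sum(1 for r in sequence.lower() if r.isalpha() and r != "n")
    return defined >= 10
```

A result of False means the sequence falls below the threshold and, per the final portion of the section, must not be included.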
The next section ("REPRESENTATION OF SEQUENCES") goes into the details of the XML sequence data portion, which will be broken down in future posts. In the meantime, if you have any questions or concerns regarding ST.26, please contact me to discuss.
WIPO ST.26 Breakdown - Part I
As we settle into 2022, the year of ST.26 implementation, I’ll break down the standard and guidance document and point out important changes to Sequence Listing preparation. I’ll begin with the first section by highlighting various definitions that impact the preparation of a compliant Sequence Listing.
If you are familiar with Sequence Listings, then you are familiar with DNA and protein sequences and their components; however, it is important to understand how the Standard defines these components in order to include them accurately in a Sequence Listing. Below is a breakdown of some key definitions and the impact they have on Sequence Listing preparation.
The Definitions:
"Amino acids" – Any of the amino acids “A”, “R”, “N”, “D”, “C”, “Q”, “E”, “G”, “H”, “I”, “L”, “K”, “M”, “F”, “P”, “O”, “S”, “U”, “T”, “W”, “Y”, or “V,” regardless of whether they are L- or D- oriented and/or have modified or synthetic side chains. A peptide nucleic acid (PNA) is not considered an amino acid but is considered a nucleic acid.
What this means – The new ST.26 definition for amino acids broadens the sequences that are required to be included in a Sequence Listing when the standard takes effect. First, the amino acids listed include "O" and "U", which were classified as "X" under ST.25. The definition also includes D-amino acids, which were not required under ST.25; a sequence containing even a single D-amino acid would not have been required in an ST.25 Sequence Listing. All D-amino acid sequences will now be required, with features/qualifiers for the D-amino acid positions.
"enumeration of its residues" – the disclosure of a sequence in a patent application by listing the residues in order. This includes by name, abbreviation, symbol or structure.
What this means – Again, the definition broadens the scope of the sequences that need to be included in an ST.26 Sequence Listing, especially regarding structures. Previously, under ST.25, sequences disclosed as structures were not required to be included except in rare instances where the structure was claimed and the examiner specifically requested their inclusion. Depending on the disclosure, this change could have a large impact on preparation, as structures take time to manually review in order to determine the residues (especially if they are modified residues) prior to being imported into WIPO Sequence.
"nucleotide" – Any nucleotide of a, c, g, t, m, r, w, s, y, k, v, h, d, b, or n or any nucleotide analog with a backbone of a 2’ deoxyribose 5’ monophosphate or ribose 5’ monophosphate or "an analogue of a 2’ deoxyribose 5’ monophosphate or ribose 5’ monophosphate, which when forming the backbone of a nucleic acid analogue, results in an arrangement of nucleobases that mimics the arrangement of nucleobases in nucleic acids containing a 2’ deoxyribose 5’ monophosphate or ribose 5’ monophosphate backbone, wherein the nucleic acid analogue is capable of base pairing with a complementary nucleic acid." This definition includes peptide nucleic acids, glycol nucleic acids, threose nucleic acids, morpholinos and cyclohexenyl nucleic acids.
What this means – As with the definition of amino acids, the nucleic acid definition is significantly broader than the definition in ST.25 with PNA, threose nucleic acids, morpholinos and glycol nucleic acids all now required. These molecule types will also require features/qualifiers with the appropriate definitions which will add complexity to the ST.26 Sequence Listing.
"may" – any optional and permissible approach but not a requirement.
"must" – a requirement of the standard. Failure to include will result in a noncompliant listing.
"must not" – prohibition of the standard. Inclusion will result in a noncompliant listing.
"should" – strongly encouraged approach but not required.
"should not" – strongly discouraged approach but not prohibited.
What these mean – ST.26 more clearly outlines items that are required versus prohibited. The most obvious "must not" is the prohibition of sequences with fewer than 10 specifically defined nucleotides or 4 specifically defined amino acids. These sequences were previously accepted in ST.25 (although not required); therefore, caution must be used if converting an ST.25 listing to ST.26. Many features/qualifiers fall under the "should"/"should not" categories; therefore, careful review of the guidance document is required to fully understand the instances in which various features/qualifiers should be included. It also remains to be seen how different patent offices will interpret the “should”/“should not” definitions and whether the “gray area” will lead to issues with the acceptance of Sequence Listings.
To review the standard yourself, the most recently adopted version of the standard is available here.
I will continue to break down the standard and guidance documents in future blogs. If you have any questions or comments regarding ST.26, please do not hesitate to reach out to me.
Next steps for WIPO ST.26
For years now, we have anticipated the implementation of WIPO ST.26 on January 1, 2022. This date has been noted in calendars around the world (at least mine) for a very long time. So, what has changed?
Looking back to the end of last year, the Committee on WIPO Standards (CWS) met from November 30th to December 4th and approved several revisions to the Standard. They also discussed the continued progress and development of WIPO Sequence, which was first available in beta form on April 7, 2020. The now stable version 1.0.0 is available to download and test free of cost to all applicants, and 4 helpful webinars were provided to guide applicants on the standard and its use (foreign language versions are upcoming). The final English webinar took place on May 19th and everything seemed right on track. So, what has happened since then?
A month or so ago, the WIPO Sequence website noted that patent offices were having difficulty completing the steps required for implementation before January 1, 2022 and that the CWS was recommending a postponement to July 1, 2022. This decision is anticipated to take place at the WIPO General Assembly meeting next month.
The concern: the amended PCT Regulations are not expected to be adopted until the PCT Assembly meeting in October, therefore, members of various Offices have indicated to the International Bureau that they will be unable to recognize these changes or amend their national processes in time for the January 1st implementation. As a result, the Sequence Listing Task Force has recommended and agreed upon a 6-month postponement of the implementation. The International Bureau issued a circular and the responses from 29 member states indicated the following:
· All responses supported the postponement.
· While some offices were ready for implementation, they recognized the need to maintain a single date and simultaneous transition.
· Offices viewed the additional time as useful to perfect the WIPO Sequence suite, including adding further detail to the user guide and operations manual.
· Refresher webinars will be provided closer to the implementation.
As the responses to the circular were overwhelmingly in favor of the postponement, it is all but guaranteed that the new implementation date will be July 1, 2022. We will know for sure at the conclusion of next month’s meeting and I will keep you posted on the outcome!
Will Conversion from ST.25 to ST.26 be Possible?
Sequence Listings have been submitted to patent offices under WIPO Standard ST.25 for over 20 years. Applicants have processes in place for preparing such listings; however, on January 1, 2022, everything will change. In an ideal world, a simple process would convert compliant ST.25 sequence listings to the proper ST.26 format. Unfortunately, this ideal scenario is not possible for several reasons, some of which are outlined below.
1. The requirements for sequence inclusion are not the same. Sequences previously allowed under ST.25 are no longer permitted and vice versa. This means, ST.25 sequence listings may need to have sequences removed and/or added to be compliant.
2. ST.26 requires controlled vocabulary for DNA or RNA molecules in a mol_type qualifier (including genomic DNA, genomic RNA, mRNA, tRNA, rRNA, other DNA, other RNA, transcribed RNA, viral cRNA, unassigned DNA and unassigned RNA) which is information not present in ST.25 listings. It is suggested that the most generic value be selected to avoid added subject matter unless this information is clearly described in the application.
3. ST.26 includes default values for variant positions, specifically X or Xaa, which includes any of the 22 amino acids listed in Annex I. This default value, if not further defined, may represent added or deleted subject matter in comparison to the definitions of “any amino acid” in a ST.25 formatted listing. Caution will need to be used to ensure definitions of variables are carried over appropriately.
4. ST.25 sequence listings include uracil as “u” characters in RNA sequences. ST.26 will only permit inclusion of uracil as “t” characters.
5. ST.25 sequence listings may contain abbreviations for modified positions that are not present in the list of modified nucleotides/amino acids in ST.26. ST.26 only permits the inclusion of the full unabbreviated names for these modifications. If the abbreviation is known in the art and only represents a single modified nucleotide/amino acid, the unabbreviated name should not constitute new matter. If the abbreviated name is not known in the art or could represent more than one modified nucleotide/amino acid, compliance is not possible without introduction of added subject matter.
6. ST.25 contains feature keys that are not present in ST.26. Guidance has been provided for how to handle these feature keys to avoid adding or deleting subject matter, however, these items will need to be manually reviewed and entered and cannot simply be converted.
7. In ST.25, synthetic sequences were described as “artificial sequence”, whereas they will need to be listed as “synthetic construct” for ST.26. The same is true for “unknown” sequences, which will now be classified as “unidentified.”
8. ST.25 allows for the inclusion of sequence publication information including but not limited to GenBank accession numbers. There are no equivalent fields in ST.26, therefore, this information would need to be included in the body of the application.
This list exemplifies some of the major differences encountered when attempting to convert an ST.25 listing to ST.26; however, it is in no way comprehensive. Additional challenges are present involving some of the more intricate components of sequence listings.
If you have any questions regarding the transition to ST.26 on January 1, 2022, please do not hesitate to contact SmartBased IP Services.
ST.25 vs. ST.26 - A Comparison
As we are all aware at this point, there is a large difference between ASCII text format and XML. While this in and of itself is a dramatic change, the transition is further complicated by the differences in sequence requirements between the two standards.
A recent WIPO webinar provided a detailed comparison of the differences from which I have summarized some key information.
Required sequences:
ST.25: L-amino acid sequences with at least 4 specifically defined residues, DNA with at least 10 specifically defined nucleotides
ST.26: L- and D-amino acid sequences with at least 4 specifically defined residues, linear portions of branched sequences, DNA with at least 10 specifically defined nucleotides and nucleotide analogs.
Permitted sequences:
ST.25: DNA and protein sequences shorter than required length.
Prohibited sequences:
ST.26: DNA and protein sequences shorter than required length.
Organism designations:
ST.25: “artificial sequence” and “unknown” designations accepted.
ST.26: “synthetic construct” and “unidentified” designations accepted.
While some of the items above will be addressed by the WIPO Sequence tool, it is important to understand these items when creating an ST.26 sequence listing. Furthermore, the differences complicate the process of converting an ST.25 sequence listing to ST.26 and mean that a straight conversion will not always be possible. We’ll break down some of these complications in a future blog.
WIPO will continue to provide webinars/training on the new standard and the use of its WIPO Sequence tool, which is available for download and trial here. I encourage everyone to download and test the tool prior to the transition in January 2022.
ST.26 - A Brief History
How Did We Get Here?
In my last post, we talked about what ST.26 is. Now let’s take a closer look at the timeline since its proposal leading up to its implementation on January 1, 2022.
It all began in October 2010 when the Sequence Listing Task Force was created to “Prepare a recommendation on the presentation of nucleotide and amino acid sequence listings based on the eXtensible Markup Language (XML) for adoption as a WIPO standard. The proposal of the new WIPO standard should be presented along with a report on the impact of the said standard on the current WIPO Standard ST.25, including the proposed necessary changes to the Standard ST.25.”
The EPO was designated the Task Force Leader for the standard and after several years of discussion, the standard entitled WIPO ST.26 was formally adopted in March 2016.
Once the standard was adopted, conversations shifted to the transition options, as well as the requirements of an authoring/validation tool.
In May/June of 2017 at the fifth session of the CWS, the Task Force decided that the transition would proceed under the “big bang” scenario in which all Intellectual Property Offices (IPOs) would transition at the same time and that the date of transition would be January 2022.
Also at the fifth session, the International Bureau informed the CWS that it would develop a common authoring and validation software. As it was envisioned, the software tool would be used by both applicants and IPOs to prepare and validate Sequence Listings.
At the sixth and seventh sessions held in October 2018 and July 2019, additional revisions were made to the standard to address free text qualifiers and the option for the inclusion of language dependent free text within the XML Sequence Listing. The seventh session also discussed the status of the Sequence tool. The session also accepted a revised WIPO Standard which was published as version 1.3.
After the seventh session, conversations and meetings were moved online to discuss the language dependent free text and feature location formats to align with various databases including UniProt. The International Bureau also requested that IPOs provide their implementation plans, to which 22 offices responded and 4 had already provided.
In July 2020 the Task Force met online to discuss the modifications to the main body and Annex C of the Administrative Instructions which will be necessary to permit the filing and processing of international applications with WIPO ST.26-compliant sequence listings from January 1, 2022.
The PCT Working Group in October 2020 discussed the proposals to amend the PCT Regulations with a view to submitting the amendments to the PCT Assembly for consideration at its next session in the first half of 2021. The International Bureau will also work and consult with the IPOs to ensure legal provisions are in place to enter ST.26 into force on January 1, 2022.
What is ST.26?
Or How to Stop Worrying and Learn to Love ST.26
The current Sequence Listing standard, ST.25, was implemented in 1998, over 20 years ago. In the field of biotechnology, 22 years equates to decades of innovation and changes in patent disclosure. As such, a new standard, abbreviated WIPO ST.26, was proposed and approved for implementation on January 1, 2022.
Unique to this standard is the new format which makes the shift from an ASCII text file to an XML file. Additional sequence requirements have been made to capture sequences and molecule types not required in current Sequence Listings.
The goal of the new XML format is to facilitate the transfer and sharing of sequence information across patent offices and database producers which will also allow for more comprehensive sequence searches. The standardization should also allow for applicants to prepare and file the same Sequence Listing across multiple international, national and regional patent offices.
To give a quick glimpse at how dramatic the format change is, here is an example of a short DNA sequence in the old and new format:
Current format: ST.25
<210> 1
<211> 17
<212> DNA
<213> Artificial Sequence
<220>
<223> primer
<400> 1
atgcgtccgg cgtagag 17
New format: ST.26
<ST26SequenceListing dtdVersion="V1_3" fileName="Sample ST.26" softwareName="WIPO Sequence" softwareVersion="1.1.0-beta2" productionDate="2021-02-15">
  <ApplicationIdentification>
    <IPOfficeCode>US</IPOfficeCode>
    <ApplicationNumberText>77/777,777</ApplicationNumberText>
    <FilingDate></FilingDate>
  </ApplicationIdentification>
  <ApplicantFileReference>SAMPLE</ApplicantFileReference>
  <ApplicantName languageCode="en">APPLICANT</ApplicantName>
  <InventionTitle languageCode="en">TITLE</InventionTitle>
  <SequenceTotalQuantity>1</SequenceTotalQuantity>
  <SequenceData sequenceIDNumber="1">
    <INSDSeq>
      <INSDSeq_length>17</INSDSeq_length>
      <INSDSeq_moltype>DNA</INSDSeq_moltype>
      <INSDSeq_division>PAT</INSDSeq_division>
      <INSDSeq_feature-table>
        <INSDFeature>
          <INSDFeature_key>source</INSDFeature_key>
          <INSDFeature_location>1..17</INSDFeature_location>
          <INSDFeature_quals>
            <INSDQualifier>
              <INSDQualifier_name>mol_type</INSDQualifier_name>
              <INSDQualifier_value>other DNA</INSDQualifier_value>
            </INSDQualifier>
            <INSDQualifier>
              <INSDQualifier_name>organism</INSDQualifier_name>
              <INSDQualifier_value>synthetic construct</INSDQualifier_value>
            </INSDQualifier>
          </INSDFeature_quals>
        </INSDFeature>
      </INSDSeq_feature-table>
      <INSDSeq_sequence>atgcgtccggcgtagag</INSDSeq_sequence>
    </INSDSeq>
  </SequenceData>
</ST26SequenceListing>