CAS Coverage of Sequences

CAS indexing practices change over time in response to scientific developments. In the 1990s, with the rapid advance of biotechnology and the "Genomic Revolution," CAS began to register not only sequences we encountered in the process of analyzing journal literature and patents but also all sequences added to GenBank, even those not reported in the literature. CAS revised its coverage policy in 2005. From that point, CAS (1) limited the registration of GenBank sequences to those reported in the journal literature or patents and (2) stopped adding to CAS REGISTRY(SM) any sequences from patents that contained more than approximately 4,000 sequences. In 2007, CAS stopped adding to CAS REGISTRY GenBank sequences in journal articles that referenced more than 1,000 GenBank accession numbers.
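The size limits described above can be illustrated with a minimal sketch. This is not CAS's actual process; the names `Document` and `within_registration_limits` are hypothetical, the patent cutoff is treated as exactly 4,000 even though the text says "approximately," and the document year is used as a stand-in for when each policy took effect.

```python
from dataclasses import dataclass

# Thresholds taken from the text; the exact patent cutoff is an assumption,
# since the policy says "approximately 4,000".
PATENT_SEQUENCE_LIMIT = 4000
JOURNAL_ACCESSION_LIMIT = 1000


@dataclass
class Document:
    """Hypothetical record for a source document (illustration only)."""
    kind: str                    # "patent" or "journal"
    year: int                    # assumed proxy for the policy in effect
    sequence_count: int = 0      # sequences disclosed (patents)
    genbank_accessions: int = 0  # GenBank accession numbers referenced (journals)


def within_registration_limits(doc: Document) -> bool:
    """Return True if the document's sequences fall under the size limits."""
    if doc.kind == "patent" and doc.year >= 2005:
        return doc.sequence_count <= PATENT_SEQUENCE_LIMIT
    if doc.kind == "journal" and doc.year >= 2007:
        return doc.genbank_accessions <= JOURNAL_ACCESSION_LIMIT
    # Before the policy changes, no size limit applied.
    return True
```

For example, under these assumptions a 2006 patent with 5,000 sequences would exceed the limit, while a 2006 journal article citing 2,000 accession numbers would not, because the journal limit began in 2007.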

Effect of the Policy Change

Only 172 patents and 326 journal articles contain sequences exceeding these established limits.  CAS continues to register a great number of sequences each year and indexes a great many articles and patents that report them.  In 2007, CAS added records for more than 1.2 million sequences to the CAS REGISTRY, which as of December 2007 contained a total of 59.6 million sequences.

Value of CAS Sequence Information

CAS is the only source that collects sequence and small organic molecule information from both journals and patents in one place.  Since CAS scientists analyze all documents, they can identify and add to our REGISTRY database unique content such as chemically modified sequences not deposited in GenBank.  In fact, CAS currently has referenced more than 540,000 chemically modified sequences and an additional 800,000 sequences from journals that are not deposited in GenBank, as well as more than 11.5 million patent sequences not in GenBank.

CAS sequence coverage dates back to 1907, with the vast majority of sequences (95%) having been added since 2000.  We hope this clarification of CAS practices for the registration of sequences and our coverage of the associated literature and patents will help our users gain the greatest value from our search services and the CAS databases.


More Information

For additional information on sequences, see:

  • Structure Searching for Small Sequences in the CAS REGISTRY File (~1.1 MB PDF)
  • STN User Documentation: Searching for Sequences 
Updated: 2/22/2008 7:23:06 AM
Copyright © 2008 American Chemical Society