colas (table)
Details
Rows
2,500,000+
Columns
67
Data size
380MB+
Updates
Daily
Description
This table contains Certificates of Label Approval (COLAs) issued by the Alcohol and Tobacco Tax and Trade Bureau (TTB) to producers and importers of alcohol beverages. These records are sourced from the TTB's own Public COLA Registry.
COLAs from 2005 through yesterday are included. Features are extracted directly from the registry, or derived through various post-processing methods including barcode extraction, image text-extraction (OCR), text parsing, and LLM inference.
Relations
cola_images (via ttb_id)
Each COLA can have one or more cola_images.
Physically submitted COLAs have no associated images, as the entire form is a PDF with the images embedded within it.
cola_image_barcodes (via ttb_id)
Each COLA can have one or more cola_image_barcodes extracted from its associated images.
ttb_permittees (via permit_number)
Each COLA belongs to a permit_number, which ties out to a company name and location, though these details are also included in the COLA itself.
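As a rough illustration of the ttb_id relation, the sketch below joins colas to cola_images using Python's built-in sqlite3 module. The in-memory database and its rows are invented stand-ins, and the cola_images schema (a ttb_image_id plus a ttb_id column) is an assumption; only the join key comes from the description above.

```python
import sqlite3

# In-memory stand-in for the real tables; all rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE colas (ttb_id TEXT PRIMARY KEY, brand_name TEXT, is_form_physical BOOLEAN);
    CREATE TABLE cola_images (ttb_image_id TEXT PRIMARY KEY, ttb_id TEXT);
    INSERT INTO colas VALUES ('A1', 'Example Brand', 0), ('B2', 'Paper Form Brand', 1);
    INSERT INTO cola_images VALUES ('A1_0', 'A1'), ('A1_1', 'A1');
""")

# LEFT JOIN keeps physically submitted COLAs, which have no cola_images rows.
rows = conn.execute("""
    SELECT c.ttb_id, c.brand_name, COUNT(i.ttb_image_id) AS n_images
    FROM colas AS c
    LEFT JOIN cola_images AS i ON i.ttb_id = c.ttb_id
    GROUP BY c.ttb_id, c.brand_name
    ORDER BY c.ttb_id
""").fetchall()

for row in rows:
    print(row)
```

An inner join would silently drop the physically submitted COLA, so a left join is usually the safer default when counting images per COLA.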
Key Filters
is_form_physical
- indicates whether the COLA was submitted to the TTB by physical mail. Physically submitted COLAs make up <1% of submissions and lack several derived features, as the TTB provides only a PDF scan of the document.
is_resubmission
- indicates whether the COLA is a resubmission of a previous COLA, which is referenced via for_resubmission_ttb_id.
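A minimal sketch of how these two filters might be used, again with sqlite3 and invented rows. It assumes the boolean columns are stored as 0/1 and that for_resubmission_ttb_id is NULL when a COLA is not a resubmission; neither detail is confirmed above.

```python
import sqlite3

# In-memory stand-in for the colas table; all rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE colas (
        ttb_id TEXT PRIMARY KEY,
        is_form_physical BOOLEAN,
        is_resubmission BOOLEAN,
        for_resubmission_ttb_id TEXT
    );
    INSERT INTO colas VALUES
        ('X1', 0, 0, NULL),
        ('X2', 0, 1, 'X1'),
        ('X3', 1, 0, NULL);
""")

# Keep only digitally submitted COLAs, which carry the full set of derived features.
digital = [r[0] for r in conn.execute(
    "SELECT ttb_id FROM colas WHERE NOT is_form_physical ORDER BY ttb_id")]

# Self-join: pair each resubmission with the previous COLA it refers to.
pairs = conn.execute("""
    SELECT r.ttb_id AS resubmission_id, p.ttb_id AS previous_id
    FROM colas AS r
    JOIN colas AS p ON p.ttb_id = r.for_resubmission_ttb_id
    WHERE r.is_resubmission
""").fetchall()

print(digital)
print(pairs)
```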
Additional Considerations
Repeat COLAs
Imported products can have their labels submitted and approved multiple times by different importers, often operating in different states.
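One possible way to surface repeat COLAs is to group by brand_name and product_name and look for more than one distinct permit_number. This is only a heuristic sketch with invented rows and permit values, not a method prescribed by the table; name matching in real data will likely need normalization first.

```python
import sqlite3

# In-memory stand-in for the colas table; all rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE colas (
        ttb_id TEXT PRIMARY KEY,
        brand_name TEXT,
        product_name TEXT,
        permit_number TEXT
    );
    INSERT INTO colas VALUES
        ('R1', 'Example Brand', 'Reserve', 'PERMIT-NY'),
        ('R2', 'Example Brand', 'Reserve', 'PERMIT-CA'),
        ('R3', 'Other Brand', 'Classic', 'PERMIT-TX');
""")

# Heuristic: the same brand and product approved under more than one permit.
repeats = conn.execute("""
    SELECT brand_name, product_name,
           COUNT(*) AS n_colas,
           COUNT(DISTINCT permit_number) AS n_permits
    FROM colas
    GROUP BY brand_name, product_name
    HAVING COUNT(DISTINCT permit_number) > 1
""").fetchall()

print(repeats)
```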
COLAs not Required
Regulatory requirements are complex, and there are several reasons why a product might not require a COLA. The three most common reasons are (and this is not legal advice!):
- Domestic products that are not sold across state lines.
- Products that fall outside the three categories of malt beverage, wine, and distilled spirits, such as hard seltzers made without malt.
- Products already covered by another COLA, where the labelling differences are not significant enough to warrant a new application.
Columns
ttb_id (TEXT)
Unique identifier of a COLA, provided by the TTB and used in the Public Registry Search.
application_type (TEXT)
The purpose of the application: approval or exemption.
application_status (TEXT)
The current status of the application: approved, revoked, surrendered, or expired.
is_distinctive_container (BOOLEAN)
Indicates if the intended container is unusual and requires specific approval.
for_distinctive_capacity (TEXT)
The volume of the distinctive container as free text entered on the application.
is_resubmission (BOOLEAN)
Indicates if the COLA is a resubmission of a previous COLA.
for_resubmission_ttb_id (TEXT)
The ttb_id of the previous COLA.
for_exemption_state (TEXT)
For exemption applications, the US state abbreviation where the product will be exclusively sold.
approval_qualifications (TEXT)
A large text block of qualifying statements by the TTB relating to specific conditions of approval.
off_label_information (TEXT)
Manufacturer-specified details about product information appearing on the container but not included on the provided labels.
is_form_physical (BOOLEAN)
Indicates whether the application was submitted as a physical form rather than a digital one. Physically submitted COLAs do not have associated imagery, and lack several other features.
form_image_s3_key (TEXT)
The S3 key to the form document scan image of physically submitted COLAs.
application_date (DATE)
The date when the application was made.
approval_date (DATE)
The date when the application was approved.
expiration_date (DATE)
The date the approval expires, when applicable.
latest_update_date (DATE)
The latest date in the process, whether an update, application, or approval. Corresponds to the 'completed date' in the COLA Search Registry.
product_name (TEXT)
The 'fanciful name' in the COLA Search Registry, with additional logic for missing names or names placed only in the brand_name field.
brand_name (TEXT)
The 'brand name' in the COLA Search Registry, with additional logic for apparent product names placed in the brand_name field.
permit_number (TEXT)
The plant registry, basic permit, or brewer's number of the applicant, if applicable. Strongly indicates business identity.
origin_id (TEXT)
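The code of the origin, from the TTB's fixed set of origins.
origin_name (TEXT)
The descriptive name of the origin, from the TTB's fixed set of origins.
class_id (TEXT)
The code of the class, from the TTB's fixed set of classes.
class_name (TEXT)
The name of the class, from the TTB's fixed set of classes.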
product_type (TEXT)
The type of alcohol product: malt beverage, distilled spirits, or wine.
domestic_or_imported (TEXT)
Label indicating whether the product is domestic or imported.
grape_varietals (ARRAY)
The wine grape varietals, where applicable. Drawn from both the COLA and LLM interpretation of label text, and semi-standardized.
wine_vintage_year (NUMBER)
The vintage year of wine and liquor products drawn from both the COLA and LLM interpretation of label text.
wine_appellation (TEXT)
The appellation of wines where applicable. Drawn from both the COLA and LLM interpretation of label text.
formula_code (TEXT)
A code relating to formulation approvals.
applicant_name (TEXT)
The name of the applicant.
applicant_phone_number (TEXT)
The business phone number of the applicant.
address_text (TEXT)
The business address of the applicant.
address_recipient (TEXT)
The business recipient extracted from the applicant address (the first line).
address_zip_code (TEXT)
The zip code extracted from the business address.
address_state (TEXT)
The US state abbreviation extracted from the business address.
ocr_abv (NUMBER)
The ABV extracted using OCR over associated COLA Images.
ocr_abv_ttb_image_id (TEXT)
The ttb_image_id of the COLA Image from which the ABV was extracted.
ocr_volume (NUMBER)
The volume numeric quantity extracted using OCR over associated COLA Images.
ocr_volume_unit (TEXT)
The units of volume extracted using OCR over associated COLA Images.
ocr_volume_ttb_image_id (TEXT)
The ttb_image_id of the COLA Image from which the volume and volume units were extracted.
main_ttb_image_id (TEXT)
The ttb_image_id foreign key of the front image, or a fallback if no image is labeled as the front.
main_image_s3_key (TEXT)
The S3 key of the front image, or a fallback if no image is labeled as the front.
image_count (NUMBER)
The number of associated label images, not counting form images.
image_count_broken (NUMBER)
The number of associated label images that are not openable with standard Python libraries.
has_front_image (BOOLEAN)
Indicates if the COLA has a front (or top of keg) label image.
has_back_image (BOOLEAN)
Indicates if the COLA has a back label image.
has_neck_image (BOOLEAN)
Indicates if the COLA has a neck label image.
has_strip_image (BOOLEAN)
Indicates if the COLA has a strip label image.
barcode_type (TEXT)
The barcode type found in the COLA's images, like upca. Resolved to a single value if there are multiples.
barcode_value (TEXT)
The value of the barcode found in the COLA's images, like 123456789012.
ttb_image_barcode_id (TEXT)
Unique identifier of the barcode, formed by concatenating the ttb_image_id and the index of the barcode within the image.
qrcode_url (TEXT)
A URL extracted from the text of QR-code type barcodes in the COLA's images. Likely the company's website, or a shortened version thereof.
llm_container_type (TEXT)
The type of container inferred from the label text, like can, bottle, or keg.
llm_product_description (TEXT)
A free-text description of the product inferred from the label text.
llm_tasting_notes (TEXT)
A free-text description of the tasting notes for the product inferred from the label text.
llm_brand_established_year (NUMBER)
The year the brand was established, inferred from the label text.
llm_category (TEXT)
Name of a fixed hierarchical category inferred from the label text and the categories table.
llm_category_path (TEXT)
Full path through the hierarchical categories to this category.
llm_tasting_note_flavors (ARRAY)
Array of tasting note flavors inferred from the label text.
llm_artwork_credit (TEXT)
Free text crediting an artist or designer for designs on the container.
llm_wine_designation (TEXT)
Free-text representing any special designations applied to wines, inferred from the label text.
llm_beer_ibu (NUMBER)
International Bitterness Units (IBU), a measure of bitterness in beers ranging from ~5 to 120, inferred from the label text.
llm_beer_hops_varieties (ARRAY)
Array of beer hops names inferred from the label text of beer products.
llm_liquor_aged_years (NUMBER)
Number of years a spirit was aged, inferred from the label text.
llm_liquor_finishing_process (TEXT)
Free-text detailing any finishing process used in the production of a spirit, inferred from the label text.
llm_liquor_grains (ARRAY)
Array of grains used in the production of a spirit, inferred from the label text.
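Because barcode_value is derived from label images, it may be worth sanity-checking values before matching them against other product data. The sketch below applies the standard UPC-A check-digit rule to values whose barcode_type is upca. The function name is hypothetical, and whether the table already validates check digits is not stated above.

```python
def is_valid_upca(value: str) -> bool:
    """Return True if value is 12 ASCII digits with a correct UPC-A check digit."""
    if len(value) != 12 or not (value.isascii() and value.isdigit()):
        return False
    digits = [int(ch) for ch in value]
    # Digits in positions 1, 3, ..., 11 (indexes 0, 2, ..., 10) are weighted by 3.
    odd_sum = sum(digits[0:11:2])
    # Digits in positions 2, 4, ..., 10 (indexes 1, 3, ..., 9) are weighted by 1.
    even_sum = sum(digits[1:11:2])
    # The check digit brings the weighted total up to a multiple of 10.
    check = (10 - (3 * odd_sum + even_sum) % 10) % 10
    return check == digits[11]


print(is_valid_upca("123456789012"))  # the example value from the barcode_value description
print(is_valid_upca("123456789013"))  # same digits with a wrong check digit
```

Other barcode types, such as EAN-13, use different lengths and weightings and would need their own checks.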