What COLA Cloud is
The Public COLA Registry, run by the Alcohol and Tobacco Tax and Trade Bureau (TTB), is the federal database of every alcoholic beverage label approved for sale in the United States. (COLA stands for Certificate of Label Approval.) It's the largest, most granular source of structured information on the US alcohol industry: 2.9 million records going back to 2005, with roughly 2,500 new approvals every week.
It's also a 1990s-era government website. Search is brutally slow. Bulk export doesn't exist. Label images are buried behind multiple clicks. There's no API. Anyone who wants to actually use this data ends up writing their own scraper from scratch.
COLA Cloud is the version of that data product I wanted to exist. We scrape the registry daily, normalize the records, run every label image through OCR and barcode extraction, apply LLM-powered categorization, and serve it through a clean REST API, SDKs, a CLI, an MCP server, and a Snowflake data share for bulk customers. Alongside the paid product we publish a growing family of free, CC0-licensed datasets at colacloud.us/ttb-data-pulse.
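To give a feel for what "normalize the records" can involve, here is a minimal sketch in Python. It is an illustration only, not COLA Cloud's actual pipeline: the field names (ttb_id, brand_name, approval_date), the MM/DD/YYYY date format, and the sample ID are all assumptions made up for the example.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass
class LabelRecord:
    """A cleaned label record. Field names are hypothetical."""
    ttb_id: str
    brand_name: str
    approval_date: date


def normalize(raw: dict) -> LabelRecord:
    """Turn one scraped row of raw strings into a typed record."""
    return LabelRecord(
        ttb_id=raw["ttb_id"].strip(),
        # Collapse runs of whitespace left over from HTML scraping.
        brand_name=" ".join(raw["brand_name"].split()),
        # Assumes dates arrive as MM/DD/YYYY text.
        approval_date=datetime.strptime(
            raw["approval_date"].strip(), "%m/%d/%Y"
        ).date(),
    )


def dedupe(records: list[LabelRecord]) -> list[LabelRecord]:
    """Keep the first record per ID so a repeated daily scrape adds nothing new."""
    seen: dict[str, LabelRecord] = {}
    for rec in records:
        seen.setdefault(rec.ttb_id, rec)
    return list(seen.values())


# A made-up row, scraped twice on consecutive days.
raw_row = {
    "ttb_id": " 21001001000001 ",
    "brand_name": "  OLD   OAK ",
    "approval_date": "01/15/2021",
}
records = dedupe([normalize(raw_row), normalize(raw_row)])
```

Deduplicating on the record ID is one simple way to make a daily scrape safe to re-run; the OCR, barcode, and categorization steps would sit downstream of records shaped like this.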
Who's behind it
I'm Jay Sobel, an analytics engineer with several years at Drizly and Gopuff — two of the leading e-commerce companies in alcohol. My work has centered on making messy external datasets usable inside a business, increasingly in service of AI applications.
COLA Cloud represents that work applied to a public dataset: data pipelines, data modeling, and AI-based feature extraction, packaged as a product.