Chatbots with Snowflake Cortex Analyst

How Semantic Layers Power Modern Business Chatbots with Snowflake Cortex Analyst

We built many chatbots recently, with open source tools like Vanna.ai as query engine, Langchain, Langraph Agents and now with Snowflake Cortex AI.

While building chatbots in last 3 months one thing came clear, LLMs are not naturally capable of having context of your business and with vector databases it becomes ongoing maintenance to keep the quality high all the time.

Snowflakes Cortex Analyst made a good dent in managing all this backend hitchhiking and incorporating that in Product/Service whatever we call it.

The Foundation: Dimensions, Facts, and Metrics in Data Warehouses

At the heart of any robust business intelligence system lies a well-structured data warehouse. These repositories organise information using three fundamental components:

Dimensions provide context to business data. They represent the “who, what, where, when, and why” of business processes. In the marketing domain, dimensions might include:

Brand (Acme, Google, LinkedIn)
Channel (Facebook, Google, LinkedIn)
Campaign names (2023 Festive Campaign, 2024 ISA Campaign)
Customer IDs and Brand IDs for unique identification

Facts are the quantitative measurements that businesses track. These are the numerical values that answer “how many” or “how much” questions:

Impressions (number of times an ad was viewed)
Clicks (user interactions with promotional content)
Conversions (completed desired actions)
Revenue (money generated from campaigns)
Cost (expenses associated with marketing efforts)

Metrics are derived calculations that provide business insights based on facts and dimensions:

Conversion Rate (percentage of users who completed desired actions)
Return on Ad Spend (ROAS — revenue generated relative to advertising costs)
Year-over-Year Growth Rate (annual performance comparisons)
Market Share (proportion of total spend in a specific segment)

While these structures organise data effectively for technical users, they remain inaccessible to business users who lack SQL expertise or intimate knowledge of database schemas.

visit our blog https://warehows.ai/blog
For project enquiries write us on sales@warehows.io

Enter the Semantic Layer: The Bridge Between Data and Business Users

A semantic layer serves as a crucial abstraction layer that translates complex database structures into business-friendly terminology. Think of it as a business-oriented lens through which users view their data warehouse.

The semantic layer delivers several key advantages:

Business-Friendly Terminology: Instead of navigating cryptic table and column names, users interact with familiar business concepts.
Consistency in Definitions: Metrics like “Conversion Rate” are calculated consistently across reports and dashboards.
Query Optimization: The semantic layer can optimize queries behind the scenes, improving performance without requiring user intervention.
Enhanced Data Discovery: Users can explore related concepts and metrics without needing to understand the underlying data relationships.

In the YAML files we’re working with, we can see these principles in action. For example, the campaigns.yaml file defines dimensions like "BRAND" and "CHANNEL", facts like "IMPRESSIONS" and "CLICKS", and provides clear descriptions for each.

What makes this semantic layer particularly powerful is its incorporation of synonyms. The system recognizes that users might refer to “brand” as “company,” “manufacturer,” or “trademark” — all mapping to the same underlying data field.

How Chatbots Leverage Semantic Layers for Natural Language Queries

The convergence of semantic layers with conversational AI creates a particularly powerful combination. Users can now simply ask questions in plain English, and the system translates these natural language queries into precise database operations.

Here’s how this works behind the scenes:

Query Interpretation: When a user asks, “what was last 6 months revenue?”, the chatbot must interpret this as a request for aggregated revenue data with a specific time filter.
Intent Recognition: The system identifies the core business metric being requested (revenue) and the time dimension constraint (last 6 months).
Synonym Matching: Using the semantic layer’s synonym mappings, the system recognises that “revenue” might also be called “sales,” “income,” or “earnings” in the database.
Query Construction: The chatbot translates the natural language request into a structured query, applying appropriate aggregations and filters.
Result Presentation: Finally, the system returns a user-friendly response: “3357.34” (as shown in screenshot).

The Role of Verified Queries in Building Trust

One of the most intriguing aspects of the files you’ve shared is the inclusion of “verified_queries” sections. These represent pre-validated questions and their corresponding SQL implementations, which serve several critical functions:

Training Data: They provide examples to help the NLP engine understand the relationship between natural language questions and SQL queries.
Quality Assurance: Each verified query has been reviewed by a human expert (e.g., “verified_by: PRANIT”) to ensure accuracy.
Performance Optimisation: Common questions can be pre-optimized for faster response times.
Consistency Check: They establish a baseline for how specific business questions should be interpreted.

For example, in the media_spends_sl.yaml file, we see verified queries like "What was the total digital spend in United States by year?" mapped to appropriate SQL statements that properly join tables and apply the correct filters.

Backend Search: How the System Finds the Right Answers

When a user submits a question to the chatbot, a sophisticated search process begins:

Document Parsing: The system examines all available semantic layer definitions (from the YAML files).
Relevance Determination: It identifies which dimensions, facts, and metrics are most relevant to the query.
Synonym Expansion: The query terms are expanded using synonym dictionaries to capture variations in terminology.
SQL Generation: Using all this context, the system constructs an optimized SQL query.
Execution and Return: The query runs against the data warehouse, and results are formatted for presentation.

The power of this approach is evident when handling ambiguous requests. For instance, if a user asks about “market performance,” the system can recognise that “market” might refer to geographical regions and “performance” could map to various metrics like revenue, growth, or market share.

Conclusion: The Future of Data Interaction

The combination of semantic layers, natural language processing, and conversational interfaces represents a fundamental shift in how businesses interact with their data. No longer restricted to technical users with query language expertise, business intelligence becomes accessible to everyone in the organization.

The real power comes from eliminating the translation layer between business questions and data answers. When marketing executives can simply ask, “Which channel delivered the highest conversion rate last quarter?” and receive immediate, accurate responses, the entire organization becomes more agile and data-driven.

As these technologies continue to evolve, we can expect even more sophisticated capabilities:

Proactive insights that anticipate user questions
Multimodal responses that combine text, visualizations, and interactive elements
Context-aware conversations that build upon previous questions and insights

For organizations looking to democratize data access and foster a culture of informed decision-making, implementing semantic layers and conversational interfaces should be high on the strategic priority list.

Write to us