This page outlines the syntax and semantics of MongoSQL, a SQL dialect that the SQL Interface uses to retrieve SQL from users or tools to then translate those statements into MQL (MongoDB Query Language). This page lists and describes supported clauses, operators, expressions, and functions.
Compatibility and Limitations
MongoSQL is based on the SQL-92 standard. However MongoSQL is not fully SQL-92 compatible due to the following limitations:
The
date
data type is not supported. Usetimestamp
instead.Interval and date interval arithmetic are not supported.
MongoSQL does not support MongoDB Vector Search and MongoDB Search.
Type System
Data Types
MongoSQL data types are the set of BSON types. All of these types can be queried in MongoSQL. They are:
String (
STRING
)Document (
DOCUMENT
)Array (
ARRAY
)BinData (
BINDATA
)ObjectId (
OBJECTID
)Boolean (
BOOL
)Date (
TIMESTAMP
)Null (
NULL
)Regex (
REGEX
)32-bit integer (
INT
)Double (
DOUBLE
)Long (
LONG
)Timestamp (
BSON_TIMESTAMP
)Decimal (
DECIMAL
)MinKey (
MINKEY
)MaxKey (
MAXKEY
)DBPointer (
DBPOINTER
)Symbol (
SYMBOL
)Javascript with scope (
JAVASCRIPTWITHSCOPE
)JavaScript (
JAVASCRIPT
)
Each type in MongoSQL has a name (in parentheses above), which is a keyword that can be used to
reference the type when necessary, such as in an expression like CAST
.
Type Conversions
Explicit type conversions are expressed via the CAST
function or the ::
operator.
Numeric types are all mutually comparable;
MongoSQL allows operations between the various numeric types without casting the operands to be the same numeric type.
Schemas
MongoSQL converts MongoDB's flexible document values into types using a schema. An MongoSQL schema is a collection of facts about an expression or collection that are known to be true at compile time.
For example, an MongoSQL schema might dictate that an expression is either a boolean or a document with subfields, or that an expression is either an array with a length of one or a positive integer.
If a static type constraint is not satisfied, the query will fail to compile.
Schema management differs between the On-Premise manual tool and Atlas automated tool.
Clauses
MongoSQL queries support a basic set of SQL clauses. The available clauses are:
SELECT
FROM
WHERE
GROUP BY
HAVING
ORDER BY
OFFSET
LIMIT
SELECT
SELECT
begins every Atlas SQL query.
MongoSQL allows SELECT VALUE
and SELECT VALUES
to be used interchangeably.
MongoSQL requires nested SELECT
statements to have an alias.
SELECT foo FROM (SELECT bar FROM baz) as subSelect
SELECT DISTINCT
Use SELECT DISTINCT
to exclude duplicate rows from the result set.
Duplicate checking follows MongoDB equality semantics, where field
order matters for document comparison and both element order and value
matter for array comparison.
SELECT DISTINCT bar FROM baz
CAST()
MongoSQL supports the CAST()
function, which allows you to
dynamically convert values in your query to a given data type.
SELECT * FROM table WHERE period_start_utc >= CAST('2023-01-01T00:00:00.000Z' AS TIMESTAMP)
FROM
FROM
is the first clause evaluated in every MongoSQL query.
FROM
can pull data from various sources including collections
(SELECT * FROM foo)
, arrays (SELECT * FROM [{'a': 1}])
, joins
(SELECT * FROM a JOIN b)
, derived tables
(SELECT * FROM (SELECT a FROM foo) d)
, and FLATTEN and UNWIND.
WHERE
The WHERE
clause is a filter on incoming data.
Its expression must statically have type BOOL
or NULL
and may
evaluate to MISSING
.
GROUP BY
GROUP BY
provides a means for grouping and aggregating data.
If using GROUP BY
on a field that has been flattened using FLATTEN or UNWIND,
an alias is required.
SELECT customerAge, COUNT(*) FROM Sales GROUP BY customer.age AS customerAge
Aggregation Functions
MongoSQL supports the following aggregation functions.
Name | Description | Notes |
---|---|---|
| Pushes the argument to the end of an array. The total output of this function will be an array. | The argument to |
| Pushes the argument to the end of an array removing duplicates.
The total output of this function will be an array with all duplicate items removed.
Duplicates are determined using the | The argument to |
| Returns the average of all the arguments. | The argument must be statically typed to a numeric type. |
| Counts the number of elements.
| The argument to |
| Returns the first element in the group. | Deterministic only when the input has deterministic order, otherwise undefined. |
| Returns the first element in the group. Deterministic only when the input has deterministic order, otherwise undefined. | The argument to |
| Returns the maximum element as ordered by the MongoSQL | The argument must be statically typed to be comparable via the |
| Returns a document formed by successively merging documents,
with the previous element used as the left hand side.
In the case of duplicate keys, the value of the key in the new element is kept.
As with | The argument must be statically typed as |
| Returns the minimum element as ordered by the MongoSQL | The argument must be statically typed to be comparable via the |
| Returns the standard deviation of all elements over the entire group population. | The argument must be statically typed to a numeric type. See stdDevPop. |
| Returns the standard deviation of a sample of all elements in the group. See stdDevPop. | The argument must be statically typed to a numeric type. |
| Returns the sum of all the arguments. | The argument must be statically typed to a numeric type. |
HAVING
The HAVING
clause operates in the same way as a WHERE
clause, but after the GROUP BY
clause.
Like the WHERE
clause, the HAVING
clause takes an expression
that must statically have type BOOL
or NULL
and may evaluate to MISSING
.
It can reference aliases defined in the GROUP BY
and can contain expressions with aggregation functions.
Only aliases defined in the GROUP BY
are available to the HAVING
clause.
ORDER BY
The ORDER BY
clause provides a way to order a result set by one or more sort keys.
Each sort key can be either a column reference,
or an integer literal referring to a SELECT
expression by its position in the select expression list.
Sort keys that are column references can be compound identifiers.
These compound identifiers can be qualified with datasource names or refer to document subfields.
MongoSQL sorts MISSING
before NULL
, and NULL
before all other values.
The ORDER BY
clause requires that all possible values in a sort key expression
can be statically verified to be comparable via the >
(greater than) and <
(less than) operators.
LIMIT and OFFSET
The LIMIT
and OFFSET
clauses allow users to retrieve only some of the rows returned by a query.
If a LIMIT
number is provided, no more than that number of rows will be returned.
If an OFFSET
number is provided, that number of rows is skipped before returning rows.
Both LIMIT
and OFFSET
numbers must be positive integers.
Using LIMIT
or OFFSET
without ORDER BY
does not guarantee the same result.
When LIMIT
and OFFSET
are both set,
the OFFSET
rows will be skipped before returning the rest of the results,
which should contain no more than the LIMIT
number rows.
LIMIT i, j
is a shorter form of LIMIT i OFFSET j
.
LIMIT
and OFFSET
can be used in subqueries.
Set operations
The UNION
and UNION ALL
set operators return a single result set
for two SELECT
queries. The UNION
operator removes duplicate
rows from the result set, while the UNION ALL
operator doesn't
remove duplicate rows from the result set. The result set returned by
these operators doesn't have a defined order.
MongoSQL does not support INTERSECT
or EXCEPT
set operations.
Expressions
Identifiers
Identifiers in MongoSQL refer to databases, tables, and columns.
MongoSQL identifiers support all UTF-8 characters except the null character \x00
.
In MongoSQL, some identifiers are restricted to avoid conflicts
with characters that have other semantic meaning; for an identifier to include such a character, it must be delimited,
that is, surrounded by double quotes or backticks.
For example, an identifier must be delimited if it begins with a digit or if it conflicts with a reserved keyword
(e.g. "10cent"
).
Identifiers are case-sensitive, whether delimited or not.
Aliases
Identifiers are used for all aliases in MongoSQL.
In most cases, MongoSQL returns an error if an alias is used more than once in the same clause.
The exception to this is that aliases can be repeated on both sides of a UNION ALL
.
This also applies to automatically generated aliases.
Keywords
MongoSQL keywords (such as SELECT
, FROM
, JOIN
, etc.) cannot be used as undelimited identifiers.
Literals
MongoSQL supports literals for booleans, null, numbers, and strings.
Strings are enclosed in single quotes.
To include a single quote character in a string literal, double it ('o''clock'
).
Literal integers are typed as INT
when within the 32-bit signed integer range and LONG
otherwise.
Literal floating point numbers or scientific notation numbers have type DOUBLE
.
Note
MongoSQL supports implicit type conversions of extended JSON-encoded strings
into their corresponding type. As MongoSQL supports all BSON types, and all
BSON types can be represented as extended JSON, you can include
literal values of any type in a query. That is, you can include an extended
JSON string value anywhere an expression is expected and MongoSQL automatically
converts to the corresponding literal type. For example, SELECT
'{"$numberInt": "1"}' + 2 FROM foo
interprets as SELECT 1 + 2 FROM foo
.
This is particularly useful for datetime literals. For example, SELECT * FROM
foo WHERE myDate > '{"$date": "1995-06-28T03:05:00.000Z"}'
.
We recommend that you use implicit extended JSON
instead of the
CAST()
function (or its shorthand operator, ::
) for writing
literal values. For example,
we don't recommend CAST('1995-06-28T03:05:00.000Z' AS TIMESTAMP)
to include a literal datetime value in a query. Using CAST()
or
::
is valid and works, but might have negative performance
implications, especially when used in a WHERE
clause. Extended
JSON implicit conversion doesn't have any negative performance
impact.
We recommend that you use CAST()
in a WHERE
clause when you
want to ensure type safety and avoid ambiguity.
Parenthesized Expressions
A parenthesized expression is an expression grouped by parentheses.
Any time infix operators are present, the need for parentheses (or a similar mechanism) to distinguish order of operations may be necessary.
MongoSQL has several infix operators, such as +
and ::
.
For example, the value of 1 + 2 * 3
is 7, while the value of (1 + 2) * 3
is 9.
Operators
MongoSQL supports the following basic operators:
+
-
*
/
||
<
<=
!=
==
>
>=
BETWEEN
AND
OR
NOT
Subquery Expressions
A subquery is a SQL query within a query. You can use a subquery anywhere an expression can be used.
MongoSQL supports scalar subquery and table subquery. A scalar subquery returns a result set with zero or one row and one column. It can be used in most places a literal or single column value is valid. A table subquery returns zero or more rows and one or more columns.
Document and Field-Access Expressions
Documents can be represented with a syntax similar to JSON objects. Keys must be strings and values can have any of the supported types. To access document fields, MongoSQL supports two options: "dot" notation and "bracket" notation.
Dot notation is similar to field access in MongoDB aggregation.
For example, if a document doc
contains a field f
, then the expression doc.f
is used to access the value of that field.
Bracket notation uses square brackets ([
and ]
) around a field name to access the field with that name.
For example, consider the same document described before: doc["f"]
is used to access the value of that field.
Null and Missing
BSON distinguishes between NULL
and MISSING
.
In the case of NULL
there is a field with the literal value NULL
,
whereas in the case of MISSING
, the field is gone.
Comments
Comments are sequences of characters within queries that do not impact query execution. MongoSQL supports both standard SQL comments and C-style block comments.
Standard SQL comments begin with double dashes and end with a new line:
\-- This is a standard SQL comment
Block comments begin with \*
and end at the matching occurrence of */
.
\* This is a multiline comment */