ONYPHE Query Language (OQL)
OQL can be used with the following APIs:
It allows you to search through data using filters and boolean operators. A number of integrations exist in various languages if you want to avoid developing your own integration with our APIs; see the Integrations chapter.
You can use it either from the CLI tools or from the Web interface, which leverages the Search API under the hood.
General OQL syntax
The syntax is the following:
category:<CATEGORY> filter1:<VALUE1> filter2:<VALUE2> -<FUNCTION1>:<FUNCTION_VALUE1> -<FUNCTION2>:<FUNCTION_VALUE2>
- category: you choose which category of information you want to query. For instance, category:datascan, category:vulnscan or any other category we have;
- filter: you can pass as many filters as you need;
- function: you may also pass as many functions as needed.
- Search historical data on a given domain & protocol:
category:datascan domain:google.com protocol:rdp -monthago:3
- Identify all exposed VPN servers in the last 30 days:
category:datascan device.class:"vpn server"
NOTE: field values are NOT case-sensitive, while field names ARE case-sensitive; field names are always available in lowercase.
NOTE2: if you need to pass values containing space characters, you have to enclose them in double quotes. Examples: device.class:"vpn server" (quoted because of the space) vs device.class:database (no quotes needed).
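If you generate queries from scripts, the syntax above can be sketched as a small Python helper. This is a minimal, hypothetical sketch: `oql_value` and `build_query` are illustrative names, not part of any ONYPHE library, and the helper only covers the rules stated here (category first, then filters, then `-function:value` pairs, with double quotes around values containing spaces).

```python
from __future__ import annotations


def oql_value(value: str) -> str:
    """Enclose a filter value in double quotes when it contains whitespace."""
    if any(ch.isspace() for ch in value):
        return f'"{value}"'
    return value


def build_query(category: str, filters: dict[str, str],
                functions: dict[str, str] | None = None) -> str:
    """Assemble: category:<CATEGORY> field:<VALUE> ... -function:<VALUE> ..."""
    parts = [f"category:{category}"]
    # Field names are case-sensitive and exposed in lowercase, so normalize them.
    parts += [f"{field.lower()}:{oql_value(value)}" for field, value in filters.items()]
    parts += [f"-{name}:{value}" for name, value in (functions or {}).items()]
    return " ".join(parts)


print(build_query("datascan", {"domain": "google.com", "protocol": "rdp"}, {"monthago": "3"}))
# category:datascan domain:google.com protocol:rdp -monthago:3
print(build_query("datascan", {"device.class": "vpn server"}))
# category:datascan device.class:"vpn server"
```

Using a dict means each field appears only once; queries that repeat a field, such as the OR examples in the boolean operators section, need a list of pairs instead.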
Supported boolean operators
OQL supports the following boolean operators:
- AND: implied by default between all filters;
- NOT: by prefixing a field name with the ! character;
- OR: by prefixing a field name with the ? character.
NOTE: the OR boolean operator is available starting from Lion Views.
- Search for assets exposing a given protocol AND associated with a specific domain name:
category:datascan protocol:rdp domain:google.com
- Search a domain for all identified assets, except those belonging to a given organization:
category:datascan domain:google.com !organization:google
- Search a domain for either the rdp or the ssh protocol:
category:datascan ?protocol:rdp ?protocol:ssh domain:google.com
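Since AND is implicit and NOT and OR are plain prefixes, a query generator only needs to prepend `!` or `?` to each filter. The Python sketch below is hypothetical (`term` and `boolean_query` are illustrative names); it takes a list of triples rather than a dict so that the same field can appear twice, as in the rdp/ssh example. Keep in mind that `?` terms require Lion Views or above.

```python
from __future__ import annotations

# Boolean prefixes: AND is implied (no prefix), "!" negates, "?" marks an OR term.
PREFIXES = {"and": "", "not": "!", "or": "?"}


def term(field: str, value: str, op: str = "and") -> str:
    """Render one filter with its boolean prefix, quoting values that contain spaces."""
    if op not in PREFIXES:
        raise ValueError(f"unknown boolean operator: {op!r}")
    if any(ch.isspace() for ch in value):
        value = f'"{value}"'
    return f"{PREFIXES[op]}{field}:{value}"


def boolean_query(category: str, terms: list[tuple[str, str, str]]) -> str:
    """Join (field, value, op) triples after the category, keeping their order."""
    return " ".join([f"category:{category}"] + [term(f, v, op) for f, v, op in terms])


print(boolean_query("datascan", [("domain", "google.com", "and"),
                                 ("organization", "google", "not")]))
# category:datascan domain:google.com !organization:google
print(boolean_query("datascan", [("protocol", "rdp", "or"),
                                 ("protocol", "ssh", "or"),
                                 ("domain", "google.com", "and")]))
# category:datascan ?protocol:rdp ?protocol:ssh domain:google.com
```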
Full-text search vs exact search
By default, all fields are searchable with exact values only. That means you have to enter the exact value for a filter. For instance, to search with protocol:rdp, you have to give the exact string rdp.
For some specific fields, you can perform full-text searches. The list is the following:
- data: the raw data we have collected. For instance, the raw application response to an application request;
- summary: the most important words taken as a subset of the data field;
- app.http.title: the HTML title of an HTTP response body;
- app.http.description: the HTML description from an HTTP response body;
- app.http.keywords: the HTML keywords from an HTTP response body;
- app.http.copyright: the HTML copyright from an HTTP response body;
- title: used in category:pastries only, that’s the title of the paste.
Thus, only the fields listed above can be used to perform full-text searches; all the others only accept exact values.
- Full-text search on HTML title:
- Exact search of aforementioned software:
category:datascan app.http.component.productvendor:"Atlassian" app.http.component.product:"Confluence"
Listing all available filters
You can navigate the Web interface to find the fields you need to refine your search, either from the displayed tabs or from the JSON tab. In fact, every field displayed in the JSON output can be used as a filter. You can also list all available filters with the User API.
IP vs CIDR or network searches
When you need to find assets on a specific network block, you can use CIDR notation. However, to avoid I/O-intensive searches, you cannot specify networks larger than /16. You may use the splitsubnet CLI procedure to automatically split CIDR searches into smaller subnets.
- Search a specific IP address:
- Search a specific network block:
- You can even search for an entire ASN, if that makes sense to you:
Not all fields support CIDR searches. The following fields are capable of that:
- ip: the asset IP address, that is, the address we connected to and from which the data content was collected;
- alternativeip: DNS resolution from hostname bound to the given address;
- app.extract.ip: when IP addresses are identified in the raw data content, we extract them and make them searchable through this field.
NOTE: the subnet field is NOT capable of CIDR searches, you have to pivot from this field value and use the value against the ip field.
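The splitsubnet CLI procedure handles the /16 limit for you. If you script searches yourself, the same idea can be sketched with Python's standard `ipaddress` module: split anything larger than /16 into /16 blocks and emit one query per block against the CIDR-capable ip field. This illustrates the approach only; it is not the code behind splitsubnet, it handles IPv4 only, and `split_cidr` and `cidr_queries` are hypothetical names.

```python
from __future__ import annotations

import ipaddress

MAX_PREFIX = 16  # largest network accepted in a single OQL search


def split_cidr(cidr: str, max_prefix: int = MAX_PREFIX) -> list[str]:
    """Split an IPv4 network larger than /max_prefix into /max_prefix blocks."""
    net = ipaddress.ip_network(cidr, strict=False)
    if net.version != 4:
        raise ValueError("this sketch only handles IPv4 networks")
    if net.prefixlen >= max_prefix:
        return [str(net)]  # already small enough, search it as-is
    return [str(sub) for sub in net.subnets(new_prefix=max_prefix)]


def cidr_queries(cidr: str, category: str = "datascan") -> list[str]:
    """One OQL query per subnet, using the ip field (which supports CIDR)."""
    return [f"category:{category} ip:{sub}" for sub in split_cidr(cidr)]


for q in cidr_queries("10.0.0.0/14"):
    print(q)
# category:datascan ip:10.0.0.0/16
# category:datascan ip:10.1.0.0/16
# category:datascan ip:10.2.0.0/16
# category:datascan ip:10.3.0.0/16
```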
How hostnames are split
Our approach to Attack Surface Discovery and Attack Surface Management is domain-based. To achieve that goal, we split hostnames (Fully-Qualified Domain Names, or FQDNs, sometimes called subdomains) into different parts. A FQDN is split into the following fields:
- tld: the Top-Level Domain part, which may be a first-level or second-level domain, or even a regional or sector-level suffix. Example: net for sam.probe.onyphe.net;
- domain: the domain name, which includes the tld. Example: onyphe.net for sam.probe.onyphe.net;
- subdomains: when a FQDN has many labels, this field may hold an array of values. Example: probe.onyphe.net for sam.probe.onyphe.net;
- host: the hostname part, like sam for sam.probe.onyphe.net.
If you are not sure which field holds a specific domain-based value, you can always perform an OR query:
category:datascan ?domain:sam.probe.onyphe.net ?subdomains:sam.probe.onyphe.net ?hostname:sam.probe.onyphe.net
NOTE: to perform this split, we rely on a list of TLDs built from the IANA list. Our list is also available on our GitHub.
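The split described above can be approximated with a longest-suffix match against a TLD list. The Python sketch below is hypothetical: `TLDS` is a tiny made-up excerpt standing in for the real list, `split_fqdn` is an illustrative name, and for deeper names it assumes that subdomains holds every intermediate parent name between the host and the domain, which the description only hints at ("an array of values").

```python
from __future__ import annotations

# Tiny hypothetical excerpt; the real split relies on a full list built from IANA data.
TLDS = {"net", "com", "uk", "co.uk"}


def split_fqdn(fqdn: str, tlds: set[str] = TLDS) -> dict[str, object]:
    """Split a FQDN into tld, domain, subdomains and host fields."""
    labels = fqdn.lower().rstrip(".").split(".")
    # Try suffixes from longest to shortest so that "co.uk" wins over "uk".
    for i in range(1, len(labels)):
        tld = ".".join(labels[i:])
        if tld in tlds:
            break
    else:
        raise ValueError(f"no known TLD in {fqdn!r}")
    domain = ".".join(labels[i - 1:])    # registrable name, tld included
    host = labels[0] if i > 1 else None  # leftmost label, unless the FQDN is the domain
    # Assumption: every parent name strictly between host and domain.
    subdomains = [".".join(labels[j:]) for j in range(1, i - 1)]
    return {"tld": tld, "domain": domain, "subdomains": subdomains, "host": host}


print(split_fqdn("sam.probe.onyphe.net"))
# {'tld': 'net', 'domain': 'onyphe.net', 'subdomains': ['probe.onyphe.net'], 'host': 'sam'}
print(split_fqdn("www.example.co.uk"))
# {'tld': 'co.uk', 'domain': 'example.co.uk', 'subdomains': [], 'host': 'www'}
```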
Functions
To refine your searches, a number of functions are available. They let you go back in time to find previously exposed assets, reverse the sort order, or search for assets where a specific field exists.
NOTE: wildcard search is supported, but regex search is NOT, as it is too I/O-intensive for a shared platform.
NOTE2: functions are only available with Enterprise licenses.
Time range functions
These functions allow you to search through historical data.
The -hourago function queries data collected a given number of hours ago. A typical use case is to run your searches every hour to find specific gems in the previous hour of collected information.
category:datascan protocol:rdp -hourago:1
To query current hour:
category:datascan protocol:rdp -hourago:0
NOTE: an hour starts at minute 00 and ends at minute 59.
NOTE2: you can increase the hour counter as far as your license allows. For Lynx Views, that may be up to 30 days of data, so -hourago:720.
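For the hourly automation use case, it helps to know exactly which clock hour a given counter targets. A minimal sketch, assuming hours are aligned on UTC (the time zone is not stated here) and that -hourago:0 is the current hour as described; `hour_window` and `max_hourago` are hypothetical helper names.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def hour_window(now: datetime, hours_ago: int) -> tuple[datetime, datetime]:
    """[start, end) of the clock hour targeted by -hourago:<hours_ago>."""
    current = now.replace(minute=0, second=0, microsecond=0)  # minute 00 of this hour
    start = current - timedelta(hours=hours_ago)
    return start, start + timedelta(hours=1)  # end is exclusive: covers minutes 00-59


def max_hourago(license_days: int) -> int:
    """Largest hour counter that stays within a retention given in days."""
    return license_days * 24


now = datetime(2024, 5, 13, 10, 42, tzinfo=timezone.utc)
start, end = hour_window(now, 1)
print(start.isoformat(), end.isoformat())
# 2024-05-13T09:00:00+00:00 2024-05-13T10:00:00+00:00
print(max_hourago(30))  # 30 days of data, as for Lynx Views
# 720
```

A job scheduled a few minutes after each hour can then run category:datascan protocol:rdp -hourago:1 and cover exactly the hour that just ended.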
In the same way, the -dayago function executes searches at day granularity. To query the previous day of data:
category:datascan protocol:rdp -dayago:1
To query current day:
category:datascan protocol:rdp -dayago:0
NOTE: a day starts at 00:00 hour and ends at 23:59 hour.
NOTE2: you can increase the day counter as far as your license allows. For Lynx Views, that may be up to 30 days of data, so -dayago:30.
The -weekago function does the same at week granularity. To query the previous week of data:
category:datascan protocol:rdp -weekago:1
To query current week:
category:datascan protocol:rdp -weekago:0
NOTE: a week starts on Monday at 00:00 and ends on Sunday at 23:59.
NOTE2: you can increase the week counter as far as your license allows. For Lynx Views, that may be up to 30 days of data, so -weekago:4.
The -monthago function does the same at month granularity. To query the previous month of data:
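The day and week boundaries can be sketched the same way. This assumes UTC (not specified here) and relies on Python's `weekday()` convention where Monday is 0, which matches the Monday-to-Sunday weeks described above; `day_window` and `week_window` are hypothetical names.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def day_window(now: datetime, days_ago: int) -> tuple[datetime, datetime]:
    """[start, end) of the day targeted by -dayago:<days_ago> (00:00 to 23:59)."""
    start = now.replace(hour=0, minute=0, second=0, microsecond=0) - timedelta(days=days_ago)
    return start, start + timedelta(days=1)


def week_window(now: datetime, weeks_ago: int) -> tuple[datetime, datetime]:
    """[start, end) of the Monday-to-Sunday week targeted by -weekago:<weeks_ago>."""
    today = now.replace(hour=0, minute=0, second=0, microsecond=0)
    monday = today - timedelta(days=today.weekday())  # weekday(): Monday == 0
    start = monday - timedelta(weeks=weeks_ago)
    return start, start + timedelta(weeks=1)


now = datetime(2024, 5, 15, 14, 30, tzinfo=timezone.utc)  # a Wednesday
print([d.date().isoformat() for d in day_window(now, 1)])
# ['2024-05-14', '2024-05-15']
print([d.date().isoformat() for d in week_window(now, 1)])
# ['2024-05-06', '2024-05-13']
```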
category:datascan protocol:rdp -monthago:1
To query current month:
category:datascan protocol:rdp -monthago:0
NOTE: a month starts on the 1st at 00:00 and ends on the last day of the month at 23:59.
NOTE2: you can increase the month counter as far as your license allows. For Lion Views, that may be up to 90 days of data, so -monthago:3.
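Month boundaries are the trickiest to compute by hand, because months differ in length and the counter can cross a year boundary. A minimal sketch, again assuming UTC; `month_window` is a hypothetical helper.

```python
from __future__ import annotations

from datetime import datetime, timezone


def month_window(now: datetime, months_ago: int) -> tuple[datetime, datetime]:
    """[start, end) of the calendar month targeted by -monthago:<months_ago>."""
    # Count months on one axis (year * 12 + zero-based month) so year boundaries just work.
    index = now.year * 12 + (now.month - 1) - months_ago
    start = now.replace(year=index // 12, month=index % 12 + 1, day=1,
                        hour=0, minute=0, second=0, microsecond=0)
    nxt = index + 1
    end = start.replace(year=nxt // 12, month=nxt % 12 + 1)  # 1st of the following month
    return start, end


now = datetime(2024, 1, 20, 8, 0, tzinfo=timezone.utc)
print([d.date().isoformat() for d in month_window(now, 1)])
# ['2023-12-01', '2024-01-01']
```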
Sometimes, you may want to query the full time range allowed by your license with the -since function. Please note that this function is subject to some limitations based on your license.
For instance, Eagle Views can use -since:7M with the Search API but not with the Export API. Griffin Views can use the full time range on all APIs, up to 12 months of historical data. To search for all exposed rdp services over the full 7-month time range:
category:datascan protocol:rdp -since:7M
OQL can also search using wildcards with the -wildcard, -notwildcard and -orwildcard functions. This is only possible against exact-search fields, not against full-text search fields. These functions also have the same limitations as the -since function: you can only use them against the last 30 days of data with Eagle Views, but on the full time range with Griffin Views.
Wildcards accept the same syntax as usual UNIX shells:
- ?: substitute exactly one unknown character;
- *: substitute zero or more unknown characters.
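These two wildcards behave like Python's `fnmatch` patterns, so you can preview locally what a pattern would catch before spending a query on it. This sketch makes two assumptions: matching is done on lowercase strings because values are not case-sensitive, and patterns avoid `[...]`, which `fnmatch` treats as a character class but which is not documented for OQL. `FULL_TEXT_FIELDS` mirrors the full-text list given earlier, since wildcards are not accepted on those fields; `wildcard_match` is a hypothetical name.

```python
from fnmatch import fnmatchcase

# Full-text fields listed earlier: wildcards only apply to exact-search fields.
FULL_TEXT_FIELDS = {"data", "summary", "app.http.title", "app.http.description",
                    "app.http.keywords", "app.http.copyright", "title"}


def wildcard_match(field: str, value: str, pattern: str) -> bool:
    """Local preview of -wildcard:<field>,<pattern> on a single value."""
    if field in FULL_TEXT_FIELDS:
        raise ValueError(f"{field} is a full-text field; wildcards are not supported")
    # "?" matches exactly one character, "*" matches zero or more.
    return fnmatchcase(value.lower(), pattern.lower())


candidates = ["google.com", "g00gle.com", "goggle.com", "gogle.com", "gooogle.com"]
# Mirrors: -wildcard:domain,g??gle.com !domain:google.com
print([d for d in candidates
       if wildcard_match("domain", d, "g??gle.com") and d != "google.com"])
# ['g00gle.com', 'goggle.com']
```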
One use case for wildcard searches is to identify typosquatting or phishing hostnames or domains. You may want to identify domains that look like yours, or to search all TLDs for a given domain:
- Search typosquatting against google.com:
category:resolver -wildcard:domain,g??gle.com !domain:google.com
- Search phishing hostnames for google.com:
category:datascan -wildcard:hostname,*.google.com.* -notwildcard:domain,google.*
WARNING: this request is I/O intensive. You may receive request timeout errors. Feel free to relaunch your search until it succeeds.
- Search all TLDs for google:
You may also want to pass multiple wildcard conditions. Simply replace your -wildcard functions with multiple -orwildcard functions:
- Search some typosquatting against google.com:
category:resolver -orwildcard:domain,g?ogle.com -orwildcard:domain,googl?.com !domain:google.com
You can even exclude some wildcards:
- Search some typosquatting against google.*:
category:resolver -orwildcard:domain,g?ogle.* -orwildcard:domain,googl?.* -notwildcard:domain,google.*
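The way these functions combine can be modeled locally: a value is kept when at least one -orwildcard pattern matches and no -notwildcard pattern does. This is a sketch of the semantics as described here, not ONYPHE's implementation; `keep` is a hypothetical name, and `fnmatch` stands in for OQL's `?` and `*` matching.

```python
from __future__ import annotations

from fnmatch import fnmatchcase


def keep(value: str, orwildcards: list[str], notwildcards: list[str]) -> bool:
    """Local model: at least one -orwildcard matches and no -notwildcard matches."""
    v = value.lower()
    any_or = not orwildcards or any(fnmatchcase(v, p.lower()) for p in orwildcards)
    none_not = not any(fnmatchcase(v, p.lower()) for p in notwildcards)
    return any_or and none_not


domains = ["gxogle.com", "googlx.net", "google.fr", "goggle.com"]
# Mirrors: -orwildcard:domain,g?ogle.* -orwildcard:domain,googl?.* -notwildcard:domain,google.*
print([d for d in domains if keep(d, ["g?ogle.*", "googl?.*"], ["google.*"])])
# ['gxogle.com', 'googlx.net']
```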
The -exists function identifies assets on which a specific field is set. For instance, you may want to identify assets with an identified CVE, whatever the CVE is. The datascan and vulnscan categories are the most interesting ones to use this function against.
- Identify potential CVEs:
category:datascan domain:google.com -exists:cve
category:vulnscan domain:google.com -exists:cve
The -notexists function does the opposite of -exists. For instance, you may want to check that an asset has been scanned for vulnerabilities and is not vulnerable.
- Identify assets not vulnerable to the CVEs we verify:
category:vulnscan domain:google.com -notexists:cve
You may also want to search for different existing fields with the -orexists function. A use case would be to search for an existing CVE or an existing product:
- Search for CVEs impacting an asset or identified CPEs:
category:vulnscan domain:google.com -orexists:cve -orexists:cpe
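To reason about how -exists, -notexists and -orexists narrow results, here is a local model over already-fetched records. It is an assumption-laden sketch: records are flat Python dicts with made-up values, and a field counts as existing when it is present and not None or empty, which is common search-engine behavior but is not stated here. `filter_records` is a hypothetical name.

```python
from __future__ import annotations


def field_exists(record: dict, field: str) -> bool:
    """Assumption: a field exists when it is present and not None or empty."""
    return record.get(field) not in (None, "", [])


def filter_records(records: list[dict], exists=(), notexists=(), orexists=()) -> list[dict]:
    """Model of -exists (all must be set), -notexists (none set), -orexists (any set)."""
    kept = []
    for record in records:
        if not all(field_exists(record, f) for f in exists):
            continue
        if any(field_exists(record, f) for f in notexists):
            continue
        if orexists and not any(field_exists(record, f) for f in orexists):
            continue
        kept.append(record)
    return kept


records = [  # hypothetical records using documentation IP addresses
    {"ip": "192.0.2.1", "cve": ["CVE-2021-44228"]},
    {"ip": "192.0.2.2", "cpe": ["cpe:/a:apache:http_server:2.4.49"]},
    {"ip": "192.0.2.3"},
]
print([r["ip"] for r in filter_records(records, exists=["cve"])])
# ['192.0.2.1']
print([r["ip"] for r in filter_records(records, notexists=["cve"])])
# ['192.0.2.2', '192.0.2.3']
print([r["ip"] for r in filter_records(records, orexists=["cve", "cpe"])])
# ['192.0.2.1', '192.0.2.2']
```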
The -fields function has been designed to reduce the volume of data before applying some local processing, or before integrating results into a SIEM whose license price is based on the volume of indexed data. Sometimes, you may only be interested in the IP addresses matching a specific search, so you want to receive only the ip field as a result.
- Fetch a list of IP addresses with some specific open ports and only fetch ip & port information:
category:synscan ?port:3389 ?port:3390 ?port:3391 -fields:ip,port
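When the list of ports comes from another tool, the query can be generated, and the same projection can be applied locally to records you already hold. This is a hypothetical sketch: `port_query` and `project` are illustrative names, records are assumed to be dicts keyed by field name, and the `?port` terms require Lion Views or above.

```python
from __future__ import annotations


def port_query(ports: list[int], fields: tuple[str, ...] = ("ip", "port")) -> str:
    """OR together port filters and restrict the output with -fields."""
    terms = " ".join(f"?port:{p}" for p in ports)
    return f"category:synscan {terms} -fields:{','.join(fields)}"


def project(record: dict, fields: tuple[str, ...] = ("ip", "port")) -> dict:
    """Local equivalent of -fields: keep only the requested keys that are present."""
    return {k: record[k] for k in fields if k in record}


print(port_query([3389, 3390, 3391]))
# category:synscan ?port:3389 ?port:3390 ?port:3391 -fields:ip,port
print(project({"ip": "192.0.2.10", "port": 3389, "asn": "AS64496"}))
# {'ip': '192.0.2.10', 'port': 3389}
```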
By default, the latest result is displayed first. In some cases, you want to find the oldest result instead, which the -sort:0 function lets you do.
- Search for the oldest asset compromised in the ESXiArgs campaign:
category:datascan app.http.title:"How to Restore Your Files" -since:7M -sort:0
Forgetting to renew a certificate happens, and so does forgetting to decommission an asset. By searching for expired certificates with the -tlsexpired function, you can find some gems.
- Search expired certificates:
category:datascan domain:google.com -tlsexpired:1
Now, you may wonder how to search for specific products or devices. The dorkpedia is for you. You also have a list of dorks to help you identify the most important risks exposed by your assets.