Newsletter #3

Quite some time has passed since newsletter #2, and we are proud to announce major new features and new information categories.

In the past, we scanned 10 ports across the Internet; we have now raised the bar to 30 ports. We have added new information categories: ctl (Certificate Transparency Logs), sniffer (listening to Internet background noise) and onionscan (Dark Web scanning). We have also enriched the datascan information category with X509 certificate data whenever an SSL/TLS connection is negotiated, the threatlist information category now includes some ONYPHE entries for botnets (mirai-like ones, for instance), and the resolver information category is enriched with geolocation information. A number of new extractions are also performed on data, like extracting the HTML description, copyright or keywords.

But the most important feature is the capability to classify a remote device and identify its vendor, product or even productversion. For instance, we are able to tell whether a source is, say, a Mikrotik device, which product it is and, in some cases, its exact productversion. The same is true for a fair number of other device or product vendors. As a consequence of this version identification, we are able to add a cpe filter, allowing users to search for CVE vulnerabilities simply by using the information returned in most new datascan entries.


More ports scanned

We used to scan 10 ports and have now reached the 30-port milestone. They are scanned once a month across the full IPv4 Internet address space. The list is the following:

80/tcp (http), 443/tcp (https), 7547/tcp (tr069), 8080/tcp (http), 22/tcp (ssh), 21/tcp (ftp), 25/tcp (smtp), 53/tcp (dns), 110/tcp (pop3), 8000/tcp (http), 3306/tcp (mysql), 23/tcp (telnet), 3389/tcp (rdp), 554/tcp (rtsp), 111/tcp (rpc), 8888/tcp (http), 5000/tcp (upnp), 1521/tcp (oracle), 3128/tcp (http), 135/tcp (msrpc), 5555/tcp (adb), 5900/tcp (vnc), 9200/tcp (elasticsearch), 1433/tcp (mssql), 139/tcp (netbios), 2323/tcp (telnet), 445/tcp (smb), 502/tcp (modbus), 102/tcp (s7comm), 11211/tcp (memcached).

That does not mean we only scan these ports; it simply means we guarantee a once-a-month frequency for them. We also scan other ports with no specific algorithm, just for the love of research.
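For scripting purposes, the guaranteed monthly list above is easy to encode as a lookup table. A minimal Python sketch, with port numbers and service names taken directly from the list above:

```python
# Ports guaranteed to be scanned once a month (from the list above),
# mapped to the service usually expected on them.
MONTHLY_PORTS = {
    21: "ftp", 22: "ssh", 23: "telnet", 25: "smtp", 53: "dns",
    80: "http", 102: "s7comm", 110: "pop3", 111: "rpc", 135: "msrpc",
    139: "netbios", 443: "https", 445: "smb", 502: "modbus",
    554: "rtsp", 1433: "mssql", 1521: "oracle", 2323: "telnet",
    3128: "http", 3306: "mysql", 3389: "rdp", 5000: "upnp",
    5555: "adb", 5900: "vnc", 7547: "tr069", 8000: "http",
    8080: "http", 8888: "http", 9200: "elasticsearch", 11211: "memcached",
}

def is_guaranteed_monthly(port: int) -> bool:
    """True if the port is on the once-a-month scan list."""
    return port in MONTHLY_PORTS
```

Remember that other ports are scanned opportunistically; a False result only means no monthly guarantee.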


New information category: ctl

We have started integrating Certificate Transparency Logs. As this represents a huge quantity of data, we are currently monitoring all the Cloudflare Nimbus logs (all years). We will monitor more CTLs in the future.

By monitoring CTLs, we also perform massive numbers of DNS requests to enrich the resolver information category. That is, our passive DNS technology is now enriched with information gathered from CTLs.

As a matter of fact, and to give an idea of the proportion of standard DNS requests versus those derived from CTLs, 22% of the DNS information in the resolver information category is now gathered thanks to CTLs. As we also perform reverse DNS requests for the full IPv4 address space, that source accounts for around 75% of DNS data. The rest is shared between DNS requests performed on hostnames and IP addresses extracted from the pastries or sniffer information categories.

Example of a CTL entry from the ctl information category:

category:ctl tld:fr
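Such a query can also be sent through the API. The sketch below only builds the request URL: the endpoint path used here is an assumption (check the ONYPHE API documentation for the exact layout); only the query syntax comes from this newsletter.

```python
from urllib.parse import quote

# Hypothetical endpoint layout -- verify against the official API
# documentation before use.
API_BASE = "https://www.onyphe.io/api"

def build_search_url(query: str, apikey: str) -> str:
    """URL-encode an ONYPHE query string and append the API key."""
    return f"{API_BASE}/search/{quote(query)}?apikey={apikey}"

url = build_search_url("category:ctl tld:fr", "MY_API_KEY")
# The actual request could then be made with urllib.request.urlopen(url).
```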


New information category: sniffer

Since May this year, we have been listening to Internet background noise. We listen for both TCP and UDP traffic and perform passive OS fingerprinting on TCP with our own technology.

We detect patterns for some botnets (such as mirai-like ones) and, upon detection, launch a synscan along with a datascan against the remote device.

For instance, when a mirai-like signature is detected, we perform active datascan requests against the potentially infected host. Thus, application data is written to the datascan information category with a mirai tag, and the same is true for the synscan information category. Whenever we enrich other information categories this way, we add a tag to keep track of that information when creating synscan or datascan entries.

From that activity, we have created our own threatlist called “ONYPHE – botnet/mirai”. Thus, you can use the Web search or the API to search for potentially infected hosts. For instance, by running the following search (as long as you have the proper credentials), you can find infected hosts in France:

category:threatlist country:FR threatlist:"ONYPHE - botnet/mirai"

Or you can analyze the data returned from the datascan information category by using the mirai tag:

category:datascan tag:mirai


New information category: onionscan

In fact, we were already scanning the Dark Web, but it was integrated within the datascan information category. As we didn’t want to participate in potentially illegal activities, we chose to create a brand new information category so we can control which entities may access that content. The new information category is called onionscan and is not accessible with free credentials.

The fields you may find in this category are the same as those in the datascan information category, but in the onionscan information category you will only get HTTP protocol related information. You can use the data filter to perform searches on content gathered from scanned .onion Web sites.

We are also working on automatically classifying Dark Web content in order to better control which parts may be accessible. For instance, we do not want to participate in the spread of child abuse content, so we want to be able to state that a .onion Web site is classified as such and filter that data out.

To search for a string contained in .onion Web sites, just use the data filter:

category:onionscan data:market

Interestingly, this one even serves some cryptomining JavaScript from CoinHive, and it is tagged as such in the tag filter.


New fields for resolver information category

As we have seen, we gather DNS information from our different information categories. We needed a way to know which source gathered that information, so we decided to add a source filter. This filter states from which information category the DNS lookup originated. For instance, when we listen to Internet background noise, we perform reverse PTR lookups. Thus, data is written to the resolver information category with a sniffer value for the source field.
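A reverse PTR lookup simply resolves the in-addr.arpa name built from the reversed IPv4 octets. A minimal sketch using only the standard library:

```python
import ipaddress

def reverse_ptr_name(ip: str) -> str:
    """Build the in-addr.arpa name queried during a reverse PTR lookup.
    The actual resolution could then be done with socket.gethostbyaddr()."""
    return ipaddress.ip_address(ip).reverse_pointer
```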

We have also added geolocation information to resolver information category entries. You can now search by asn, country or city (to name a few) on the resolver information category.

category:resolver country:FR source:sniffer


New fields for pastries information category

Since July, the pastries information category has been enriched with new fields: subdomains, host and tld. When a hostname is extracted from pastries content, we also split this hostname into multiple new fields to make it easy to search for specific values within the pastries information category.

For example, searching all pastries within the .fr TLD is as simple as running the search:

category:pastries tld:fr
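The splitting described above can be sketched naively as follows. Note that the exact semantics of the host field are an assumption here, and a real implementation should rely on the Public Suffix List to handle multi-label TLDs such as .co.uk:

```python
def split_hostname(hostname: str) -> dict:
    """Naively split a hostname into tld, host (registered name) and
    subdomains, mirroring the pastries fields described above.
    NOTE: a production version should use the Public Suffix List."""
    labels = hostname.lower().rstrip(".").split(".")
    result = {"tld": labels[-1], "host": ".".join(labels[-2:])}
    if len(labels) > 2:
        result["subdomains"] = ".".join(labels[:-2])
    return result
```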


New data enrichments

From the datascan information category, we now also extract HTML copyright, description and keywords information. These fields are searchable via app.http.copyright, app.http.description and app.http.keywords, along with the already existing filters app.http.title and app.http.realm.
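An extraction of this kind can be sketched with the standard library HTML parser; this is an illustration of the idea, not the actual ONYPHE extraction code:

```python
from html.parser import HTMLParser

class MetaExtractor(HTMLParser):
    """Pull <title> and <meta name="description|keywords|copyright">
    out of an HTML page, mirroring the app.http.* fields above."""
    def __init__(self):
        super().__init__()
        self.fields = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            if name in ("description", "keywords", "copyright"):
                self.fields[name] = attrs.get("content", "")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.fields["title"] = self.fields.get("title", "") + data

# Illustrative page content.
html_doc = ('<html><head><title>Cam</title>'
            '<meta name="keywords" content="ipcam"></head></html>')
parser = MetaExtractor()
parser.feed(html_doc)
```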

We have added a new enrichment field: device. By using pattern matching, we are able to identify various information about a remote device, like its class, vendor, product or even productversion. This information is searchable using the filters device.class, device.product, device.productvendor and device.productversion.

For example, if you want to gather information on D-Link devices, just perform the following search (as long as you have the right credentials to access this kind of data):

category:datascan device.productvendor:"D-Link"

Well, it looks like this D-Link DCS-960L is infected by a mirai-like variant, as you can see in the tag field. You can also note that the device is classified as a Router.

Regarding HTTPS datascan entries, we now extract X509 certificate information.

For instance, looking for all certificates issued by a known CA is as easy as:

category:datascan tls:true issuer.organization:"Super Micro Computer Inc."

Well, another one infected by a mirai-like variant.

Which brings us to the next addition: the cpe filter. As we identify the product, productvendor and hopefully the productversion, we can set the cpe field and perform lookups on that normalized naming scheme to map results to existing CVEs.

For instance, if we search for the latest libssh vulnerability:

category:datascan cpe:"cpe:/a:libssh:libssh:0.6.0"
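A CPE 2.2 URI such as the one above follows the fixed field order part:vendor:product:version. A minimal parsing sketch:

```python
def parse_cpe(cpe: str) -> dict:
    """Split a CPE 2.2 URI (cpe:/part:vendor:product:version[:...])
    into its named components; trailing fields may be absent."""
    if not cpe.startswith("cpe:/"):
        raise ValueError("not a CPE 2.2 URI")
    parts = cpe[len("cpe:/"):].split(":")
    names = ("part", "vendor", "product", "version", "update", "edition")
    return dict(zip(names, parts))
```

The part component is "a" for applications, "h" for hardware and "o" for operating systems.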

The resulting CPE can then be searched on the NIST portal.



You may have wondered why it took us so long to write this newsletter. We trust that, now that you know about all these new additions, you understand we were simply working hard on these new features.

In our last newsletter, we said we would unveil the commercial offer. Unfortunately, it is not ready yet. Only a couple of weeks remain before we can announce it.

Of course, you will be informed, provided you have created your free user account at the link below. At the minimum, a free registration will allow you to use the current API:

Newsletter #2

Since the last newsletter, we have been working on a language to perform searches on the ONYPHE search engine. It was put into production last week, that is, at the end of April. Thanks to that, new APIs are now possible and have been made available.

But there is more: we have enriched the threatlist and inetnum categories of data by adding geolocation information. We also added subnet information for every single IP address in the synscan and datascan categories of data. This makes it easy to pivot on a host and find anything else on the same subnet.

Finally, we industrialized Dark Web scanning and added two new protocols we are now watching: RDP (Remote Desktop Protocol) and DNS (Domain Name System). This makes a total of 13 protocols we are able to identify, whatever the listening port.

ONYPHE query language

We are in the last mile regarding the launch of our commercial offer. We had to finalize the query language, or filters, before being able to sell the service. This is now done but not yet accessible to the masses (be patient).

This language allows users to query ONYPHE data with filters and CIDR masks, for instance. It is as easy as typing key:value and adding more of these filters to get exactly the information you want.

Another special filter is the category one. By default, ONYPHE searches the datascan category (application data), but you may want to search resolver data (passive DNS) or pastries (pastebin-like sites).

Sample queries:

category:datascan product:Apache port:443 os:Windows
category:synscan port:23 country:FR os:Linux
category:synscan ip: os:Linux port:23
category:inetnum organization:"OVH SAS"
category:inetnum netname:APNIC-LABS
category:threatlist country:RU
category:threatlist ip:
category:pastries ip:
category:resolver ip:
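A query line like the samples above boils down to whitespace-separated key:value pairs, with double quotes grouping multi-word values. A naive tokenizer sketch, which is not the official ONYPHE grammar:

```python
import shlex

def parse_query(query: str) -> dict:
    """Tokenize a key:value query line such as the samples above.
    shlex keeps quoted values (e.g. organization:"OVH SAS") together.
    Duplicate keys overwrite earlier ones in this naive sketch."""
    filters = {}
    for token in shlex.split(query):
        key, _, value = token.partition(":")
        filters[key] = value
    return filters
```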

With full access, nearly 70 filters are available. You may use these filters from the Web site search line or from the API, as described in the dedicated documentation. We will document this language more thoroughly when it becomes available to end-users.

Geolocation enrichment for threatlist and inetnum categories

Starting from February 2017, we added geolocation information to threatlists. This was not done before because the threatlists we aggregate do not provide this information. We considered that it may make sense: if an IP was classified as a threat and its geolocation changed from one day to the next, it may not be harmful anymore. We let the user decide how to interpret that new information.

The same is true for inetnum: it is good to get these netblocks from the RIRs, but only the country is given. By adding geolocation information, we can enrich them with the organization and GPS coordinates, for instance. Adding the organization makes it possible to perform a netname-to-organization lookup (or the reverse).

Both of these enrichments are now readily available on any new data.

Subnet enrichment

When searching for information about your own IP addresses, you may find yourself wanting everything on your complete subnet. For that to work, the subnet information has to be stored somewhere. This is now done: subnet information is added to synscan and datascan, and to every category where geolocation is applied.

Thanks to that, you will be able to pivot on ip or subnet data with a simple click (or query filter) when the commercial offer becomes available.

Scanning the Dark Web

Another addition is the scanning of the so-called Dark Web, that is, .onion Web sites reachable only from the Tor network. We have compiled a first-pass list of nearly 40,000 onion sites. Thanks to that list, we will be able to crawl the Dark Web and enrich it by discovering new onion links, just like any search engine.

At the time of writing and taking into account this list, we have indexed more than 5,100 active hidden sites.

Note: don’t try the displayed search query as it is only available for ONYPHE purposes.

New watched protocols and fingerprinting

Finally, we added two new protocols along with fingerprinting of their services: RDP (Remote Desktop Protocol) and DNS (Domain Name System). For RDP, we are able to differentiate between the Microsoft implementation and the XRDP one. That’s a start and should be very helpful. Thanks to that, we can also enrich the information with the os field.

For the DNS protocol, we simply use the version.bind request. Here is the TOP 10 of products in use on the Internet, whether resolvers or authoritative servers. The percentages are relative to this TOP 10 only, not to all detected servers. Thus, BIND accounts for 78% of the TOP 10 products discovered on the Internet.
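A version.bind request is simply a TXT query in the CHAOS class. A minimal sketch building the raw DNS packet (actually sending it over UDP port 53 is left commented out):

```python
import struct

def build_version_bind_query(txid: int = 0x1234) -> bytes:
    """Build a raw DNS query for version.bind TXT in the CHAOS class,
    the classic request used to fingerprint DNS server software."""
    # Header: id, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack(">HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: length-prefixed labels "version" and "bind", null-terminated.
    qname = b"".join(bytes([len(l)]) + l.encode() for l in ("version", "bind"))
    qname += b"\x00"
    # QTYPE 16 = TXT, QCLASS 3 = CHAOS.
    question = qname + struct.pack(">HH", 16, 3)
    return header + question

# To actually query a server (requires network access):
# import socket
# s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# s.sendto(build_version_bind_query(), ("ns.example.net", 53))
```

Note that many operators configure their servers to hide or fake the version string, so the answer is a hint, not a guarantee.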


As you can see, we have many new additions to share with you in this newsletter. The next time we share something with you, it will be the final pricing and the opening of commercial subscriptions.

In the meantime, for those not already registered, you can create your free user account and gain access to your API key by registering here:

Newsletter #1

We have been working on the ONYPHE portal to bring you new additions.

One of them is the preparation of the commercial launch of the service along with the user API; the other is the addition of an abuse field to some categories of data.

The pricing model will be disclosed at a later time, when we are ready to launch the commercial service.

Abuse email address field added to inetnum category

Following a request from a user of the service, we have added the extraction of abuse email addresses from RIR data (RIPE, for instance). You will be able to look up abuse email addresses for a given IP address. The field’s name is “abuse”.


Of course, this field is now available from the inetnum API. It may be composed of multiple addresses, so expect it to be multi-valued.
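Extracting such a multi-valued field from RIPE-style whois output can be sketched with a simple regular expression; the abuse-mailbox attribute name follows the RIPE database format, and the sample text below is illustrative:

```python
import re

def extract_abuse_emails(whois_text: str) -> list:
    """Collect abuse contact addresses from RIPE-style whois output.
    Returns a de-duplicated list (the field may be multi-valued)."""
    pattern = re.compile(r"^abuse-mailbox:\s*(\S+@\S+)", re.MULTILINE)
    # dict.fromkeys de-duplicates while preserving first-seen order.
    return list(dict.fromkeys(pattern.findall(whois_text)))

# Illustrative whois excerpt.
sample = """\
inetnum:        192.0.2.0 - 192.0.2.255
netname:        EXAMPLE-NET
abuse-mailbox:  abuse@example.net
abuse-mailbox:  abuse@example.net
"""
```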

Should you have requests for addition, you can reach us at support[at]

Limitation of the number of requests

We are working on the capability to sell the service, and before we go to market, we have to be able to limit the number of queries a user can make on a monthly basis.

For now, it is set to 0, meaning it is still unlimited. We will activate the limitation on the number of queries when we are ready to launch the commercial service.

New API: user

The user API gives you information about your user account. For instance, you can make a free query to see how many credits remain.

The field giving this information is named “credits”. More information about this new API is available at the documentation page.


You can test the service for free, just register to get access to your free API and receive updates via our newsletter:

Samba Internet Exposure

Back in November 2017, a number of security vulnerabilities were disclosed impacting numerous versions of the Samba software. CVE-2017-14746 is a use-after-free issue, while CVE-2017-15275 leads to a memory leak vulnerability. The former impacts all Samba versions starting from 4.0.0, while the latter affects all versions starting from 3.6.0. Now, the question we may ask is: how many of these affected products can be reached from the Internet?
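The two "starting from" bounds quoted above can be encoded with simple tuple comparison. Note this sketch only checks the lower bounds and does not exclude patched point releases:

```python
def parse_version(version: str) -> tuple:
    """Turn a dotted version string like '4.6.2' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))

def possibly_affected(version: str) -> list:
    """Map a Samba version to the CVEs whose 'starting from' bound it
    meets. Patched point releases are NOT excluded here -- this only
    encodes the lower bounds quoted above."""
    v = parse_version(version)
    cves = []
    if v >= (3, 6, 0):
        cves.append("CVE-2017-15275")  # memory leak, versions >= 3.6.0
    if v >= (4, 0, 0):
        cves.append("CVE-2017-14746")  # use-after-free, versions >= 4.0.0
    return cves
```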

Samba Exposure

This question is important because, if successfully exploited, these issues may lead to the compromise of affected devices, with, as a potential result, new hosts joining yet another botnet. By performing a simple search on ONYPHE with the string “samba”, we find around 1 million results.

The next obvious question is now: how many of these hits are using a vulnerable version of Samba? By querying for the TOP 10 versions of Samba, we obtain the following results:

80% of the TOP 10 versions are vulnerable versions of Samba. That means a little more than 37,000 devices may be at risk of compromise.

Note: these results were collected at the end of November 2017.

We were specifically searching for Samba 3.6.x and 4.x. Those versions may not be the most prevalent on the Internet, though, so what about querying for the most frequently seen results of a Samba query listing available shares? We can do that by querying for the TOP 10 MD5 sums computed over collected banners.

Our results show that only two MD5 sums account for roughly 600,000 devices. For instance, if you query one of these sums, you will find more than 300,000 results:

In fact, if you check for distinct IP addresses across those two hashes, you will find around 300,000 unique addresses. That is because those devices expose Samba through both ports 139/tcp and 445/tcp.
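Grouping banners by their MD5 sum and counting distinct IP addresses, as done above, can be sketched as follows (the records are illustrative):

```python
import hashlib
from collections import defaultdict

def group_by_banner_md5(records):
    """Group scan records by the MD5 of their banner and collect the
    distinct IP addresses per hash -- a host exposing the same banner
    on both 139/tcp and 445/tcp is only counted once."""
    groups = defaultdict(set)
    for record in records:
        digest = hashlib.md5(record["banner"].encode()).hexdigest()
        groups[digest].add(record["ip"])
    return groups

# Illustrative records: one host seen on both SMB ports, one on 445 only.
records = [
    {"ip": "198.51.100.1", "port": 139, "banner": "Samba 3.2.15 shares..."},
    {"ip": "198.51.100.1", "port": 445, "banner": "Samba 3.2.15 shares..."},
    {"ip": "198.51.100.2", "port": 445, "banner": "Samba 3.2.15 shares..."},
]
groups = group_by_banner_md5(records)
```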

They are all Samba 3.2.15 hosts at the Emirates Telecommunications Corporation organization. The exact same product sits behind this Samba version: the D-Link DIR850L. The good news is that it is not impacted by the previously discussed CVEs. Unfortunately, if you search for vulnerabilities impacting this product, you will find a blogpost dating back to September 2017 describing a fair number of issues:


The results shown here were presented during a lightning talk at the latest Botconf security conference in Montpellier, France. We showed that Samba is quite heavily exposed on the Internet and may be abused to build a botnet, just like many other vulnerable products.

If you are interested in querying our data, you can register for free to get your API key and have access to ONYPHE queries.