Yes This Is A Really Long Request URL

Posted on 20 April 2017 in Asides

Yesterday, while reviewing some logs I came across a curious entry in an Apache error log:

[Wed Apr 19 08:51:48.119666 2017] [core:error] [pid 29210] (36)File name
too long: [client 137.226.113.7:40907] AH00036: access to
/YesThisIsAReallyLongRequestURLbutWeAreDoingItOnPurposeWeAreScanningForR
esearchPurposePleaseHaveALookAtTheUserAgentTHXYesThisIsAReallyLongReques
tURLbutWeAreDoingItOnPurposeWeAreScanningForResearchPurposePleaseHaveALo
okAtTheUserAgentTHXYesThisIsAReallyLongRequestURLbutWeAreDoingItOnPurpos
eWeAreScanningForResearchPurposePleaseHaveALookAtTheUserAgentTHXYesThisI
sAReallyLongRequestURLbutWeAreDoingItOnPurposeWeAreScanningForResearchPu
rposePleaseHaveALookAtTheUserAgentTHXYesThisIsAReallyLongRequestURLbutWe
AreDoingItOnPurposeWeAreScanningForResearchPurposePleaseHaveALookAtTheUs
erAgentTHXYesThisIsAReallyLongRequestURLbutWeAreDoingItOnPurposeWeAreSca
nningForResearchPurposePleaseHaveALookAtTheUserAgentTHXYesThisIsAReallyL
ongRequestURLbutWeAreDoingItOnPurposeWeAreScanningForResearchPurposePlea
seHaveALookAtTheUserAgentTHXYesThisIsAReallyLongRequestURLbutWeAreDoingI
tOnPurposeWeAreScanningForResearchPurposePleaseHaveALookAtTheUserAgentTH
XYesThisIsAReallyLongRequestURLbutWeAreDoingItOnPurposeWeAreScann failed
(filesystem path '[...]')

Formatted to plain English: Yes, this is a really long request URL but we are doing it on purpose. We are scanning for research purpose. Please have a look at the user agent. Thanks!

What does the user agent for this request have to say?

Here is the access log entry:

137.226.113.7 - - [19/Apr/2017:08:51:48 -0400] "GET [...] HTTP/1.1" 403
1471 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_1)
AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36
Scanning for research (researchscan.comsys.rwth-aachen.de)"

The website referenced in the user agent, researchscan.comsys.rwth-aachen.de, explains that this request is part of a research project at RWTH Aachen University in Germany and 137.226.113.7 is indeed a part of the university's network.

Interestingly, this is the first time I have seen such a request in any web server log (I have had occasion to look through more than a few). The Wayback Machine has an archive of the user agent page from 6 Dec 2015, so it seems that the research has been going on for at least a year. I checked for the YesThisIsAReallyLongRequestURL string in (a lot of) logs for about 10 different websites of varrying size and did not find any other instances. I wonder how they determine which sites to scan...