LennyLi-9907 asked XingyuZhao-MSFT commented

How to overcome webbot rejection

I have a VB.NET app that tries to download a web page from a URL. In a normal browser on Windows 10 the page displays correctly, but when the app requests it, what comes back is a "Request Rejected" message. The same app can download a text file from another URL without trouble. I suspect the server or a firewall at the destination is trying to block web scraping.

Can someone show me a coding solution that will not get blocked? My code already goes beyond the most basic commands for accessing the URL: I already specify the user agent, host name, etc. I cannot find any information online about this issue.

dotnet-visual-basic

1 Answer

LennyLi-9907 answered XingyuZhao-MSFT commented
         ' Requires: Imports System.IO and Imports System.Net
         Dim url = "https://e-services.judiciary.hk/dcl/view.jsp?lang=en&date=" & dateOfCauselist & "&court=CFA"

         Dim responseData As String = String.Empty
         Try
             ' WebRequest.Create returns the base type, so cast explicitly (required under Option Strict On).
             Dim hwrequest As HttpWebRequest = DirectCast(WebRequest.Create(url), HttpWebRequest)

             hwrequest.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
             hwrequest.Host = "e-services.judiciary.hk"
             hwrequest.UserAgent = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.57 Safari/536.11"

             hwrequest.Timeout = 6000000 ' milliseconds, i.e. 100 minutes
             hwrequest.Method = WebRequestMethods.Http.Get
             ' Note: Content-Type describes a request body; browsers do not send it on a GET.
             hwrequest.ContentType = "text/xml;charset=UTF-8"
             ' Using blocks close the response and the reader even if an exception is thrown.
             Using hwresponse As HttpWebResponse = DirectCast(hwrequest.GetResponse(), HttpWebResponse)
                 If hwresponse.StatusCode = HttpStatusCode.OK Then
                     Using responseStream As New StreamReader(hwresponse.GetResponseStream())
                         responseData = responseStream.ReadToEnd()
                     End Using
                     If responseData.Contains("Request Rejected") Then
                         responseData = "The webmaster has rejected your access. Please contact them or try using a normal web browser to manually see information."
                     End If
                 End If
             End Using
         Catch ex As Exception
             ' Keep the exception message so the cause of the failure is visible.
             responseData = "Error occurred: " & ex.Message
             Console.WriteLine(responseData)
         End Try

The only thing I can think of is picking the user agent string at random from a list, to avoid giving the impression that the request comes from a robot. But I don't think it is that simple. Please enlighten me, thanks.
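
A random user agent on its own may not change much if the rest of the request still looks different from a browser's. As a minimal sketch, and not a confirmed fix for this site, the function below sends a fuller browser-like header set and keeps cookies between calls. The name FetchLikeBrowser, the user agent string, and the Accept-Language value are assumptions for illustration; the values worth copying are the ones the browser's developer tools show for a request that succeeds.

```vb
Imports System.IO
Imports System.Net

Module BrowserLikeRequest
    ' Hypothetical helper: the name and all header values are illustrative assumptions.
    Function FetchLikeBrowser(url As String, cookies As CookieContainer) As String
        Dim request = DirectCast(WebRequest.Create(url), HttpWebRequest)
        request.Method = WebRequestMethods.Http.Get
        request.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36"
        request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
        request.Headers.Add("Accept-Language", "en-US,en;q=0.9")
        ' Advertise gzip/deflate and let the framework decompress the body.
        request.AutomaticDecompression = DecompressionMethods.GZip Or DecompressionMethods.Deflate
        ' Reuse one container across calls so cookies set by the server are sent back.
        request.CookieContainer = cookies

        Using response = DirectCast(request.GetResponse(), HttpWebResponse)
            Using reader As New StreamReader(response.GetResponseStream())
                Return reader.ReadToEnd()
            End Using
        End Using
    End Function
End Module
```

A caller would create one `CookieContainer` up front and pass the same instance to every call, for example `Dim html = FetchLikeBrowser(url, jar)` with `jar` declared as `New CookieContainer()`.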


Hi @LennyLi-9907 ,
Based on your description, I have two questions for you.
What is the value of 'dateOfCauselist'?
I cannot access the URL 'https://e-services.judiciary.hk'. Could you provide another URL so that we can run a test?
We look forward to your update.


It’s any date close to the current date, e.g. 30052021.

I suspect the judiciary website has implemented a system that rejects bots that try to scrape data, because if I use a browser on the same computer the website loads correctly.
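
One way to test that suspicion is to print everything the server returns to the app and compare it with what the browser receives. The sketch below is only a diagnostic aid under that assumption; the name DumpResponse is hypothetical. It also reads error responses, which GetResponse reports as a WebException rather than returning.

```vb
Imports System.IO
Imports System.Net

Module ResponseDiagnostics
    ' Hypothetical helper: prints the status, headers and body the server sent.
    Sub DumpResponse(request As HttpWebRequest)
        Dim response As HttpWebResponse = Nothing
        Try
            response = DirectCast(request.GetResponse(), HttpWebResponse)
        Catch ex As WebException When ex.Response IsNot Nothing
            ' Statuses such as 403 or 500 arrive here; the response is still readable.
            response = DirectCast(ex.Response, HttpWebResponse)
        End Try

        Using response
            Console.WriteLine("{0} {1}", CInt(response.StatusCode), response.StatusDescription)
            For Each name As String In response.Headers.AllKeys
                Console.WriteLine("{0}: {1}", name, response.Headers(name))
            Next
            Using reader As New StreamReader(response.GetResponseStream())
                Console.WriteLine(reader.ReadToEnd())
            End Using
        End Using
    End Sub
End Module
```

A failure with no response at all, such as a timeout, is not caught by the filter and propagates to the caller.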


Hi @LennyLi-9907 ,

I suspect the judiciary website has implemented a system that rejects bots that try to scrape data

I get an exception that the webmaster has rejected my access, so you can consider choosing another website to test the code.
