
Intermittent connection issues loading/changing site- secure connection failed/site can't be reached

EDIflyer
Involved

Is anyone else occasionally getting a 'connection down' error from Chrome (or 'Secure Connection Failed' from Firefox) when changing page? If I hit refresh it works fine. At first I wondered if it was the site I was using, but I have noticed it on multiple major websites over the past couple of months and have not noticed it from other locations where I don't use Three Broadband, so it does seem to be related to the router/connection (this is via wired Ethernet too, so not a WiFi problem). I've tried changing DNS server in case that helps, but it made no difference.

It mainly seems to happen when trying to first load a site or (annoyingly) at checkout when a different site is being loaded as part of the checkout process. It certainly doesn't happen every time but does happen with reasonable frequency. I've also noticed I often get it when trying to pull/push from/to GitHub too and have to do so a number of times for it to work (browsing the GitHub website works fine).

 
I'm using a NR5103E with Firmware Version V1.00(ACBJ.0)b14 - I tried rebooting it but to no effect.


509 REPLIES
JonathanB
Community Moderator

Hi everyone,

@EDIflyer mentioned they haven't been seeing the same error recently, but I don't want to write this off as resolved, particularly when others have mentioned this issue coming and going in the past.

We'd like to do some live tracing against affected connections. Is anyone interested in participating in this?

Thanks,
Jonathan



Mod tip! The author of a post can hit 'Accept as Solution' to highlight a reply that helped solve their query.


thelostlambda
Fledgling

Hi Jonathan!

I've been away from the UK since December or so, but since coming back last week I've been suffering this problem quite badly... I'd love to participate / provide helpful information to get this issue resolved ASAP!

Thanks a ton!

bytespider
Involved

I'd be happy to help.

toaster
Active

Hi @JonathanB 

I'm still having this error across multiple sites. I'd be happy to help with some live tracing.

JonathanB
Community Moderator

Thanks @toaster,

I've dropped you a PM now to get a couple of details to set this up.

To view your private messages on the community, click on your avatar image in the top right of any community page, then "Messages".

Thanks,
Jonathan





EDIflyer
Involved

Although I'm not glad you're having issues, @toaster, I'm equally glad you are, if you know what I mean! Weird that it stopped for me today, but great if you could help with live tracing.

toaster
Active

I haven't had any communication from Three about sending the packet captures from my previous investigation, but I've carried on digging... and I think I've found a couple more nuggets of info.

 

I tried seeing what the failure rate was for some of the sites that have been listed so far in this thread. Typically I get around 0.5% - 2% failure rate. Then I checked https://royalmail.com - I'm getting a ~20%-25% failure rate which is interesting.

What's also interesting is that https://www.royalmail.com hasn't failed yet so has a 0% failure rate.
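A quick sanity check on how much a clean run proves: if the www host really failed at the same rate as the bare domain, the chance of seeing zero failures in n attempts would be (1 - p)^n. A rough Python sketch of that arithmetic follows. It assumes each attempt is independent, and the attempt counts in the usage note are illustrative only, since the post doesn't say how many requests were made.

```python
import math


def prob_no_failures(failure_rate: float, attempts: int) -> float:
    """Chance of zero failures in `attempts` independent tries."""
    return (1.0 - failure_rate) ** attempts


def attempts_needed(failure_rate: float, confidence: float = 0.95) -> int:
    """Smallest run length at which a clean run is that unlikely by chance."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))
```

At a 20% failure rate, `attempts_needed(0.20)` gives 14, so a few dozen clean requests is already strong evidence of a real difference. At 0.5%, `attempts_needed(0.005)` gives 598, so a short clean run says very little.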

 

What do www.royalmail.com and royalmail.com do differently?

The most obvious thing is that www.royalmail.com looks like it's fronted by Akamai. My client ends up with a TLS 1.3 connection. royalmail.com doesn't look like it's fronted by Akamai and my client ends up with a TLS 1.2 connection.

I ran curl (and openssl s_client) with the option to specify the ciphers against royalmail.com and www.royalmail.com, trying a small variety of ciphers, which made no noticeable difference to the failure rates.
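For anyone wanting to repeat the cipher test without curl or openssl s_client, here is a minimal Python stand-in using the standard `ssl` module. It is not what was run in the post. The default cipher name is an assumption picked for illustration, and the server may not accept it.

```python
import socket
import ssl


def pinned_context(cipher: str = "ECDHE-RSA-AES128-GCM-SHA256") -> ssl.SSLContext:
    """Client context capped at TLS 1.2 and limited to one cipher (OpenSSL name).

    The default cipher is an illustrative assumption, not one named in the thread.
    """
    ctx = ssl.create_default_context()
    # set_ciphers() only governs TLS 1.2-and-below suites, so cap the version too.
    ctx.maximum_version = ssl.TLSVersion.TLSv1_2
    ctx.set_ciphers(cipher)
    return ctx


def negotiated(host: str, ctx: ssl.SSLContext, timeout: float = 5.0):
    """Connect once and return (protocol version, cipher name) as negotiated."""
    with socket.create_connection((host, 443), timeout=timeout) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            return tls.version(), tls.cipher()[0]
```

Calling `negotiated("royalmail.com", pinned_context())` in a loop with different cipher names would mirror the test described above. A handshake failure raises `ssl.SSLError` or another `OSError`.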

At the moment, I don't think the TLS connection is a factor. Looking back at the packet capture from my server in the previous tests, the ACK from the 3-way handshake and the initial FIN-ACK appear to be sent at the same time - well before the TLS negotiation even begins and before any intermediary would be able to see the supported ciphers/certs/etc from the server.

I think the decision to close the connection is made during the 3-way handshake. I don't know if there's much more that I can dig into as I can't see from the client side what would be triggering an intermediary to close the connection so early because all the requests from a client to initiate a connection to a server will be very similar. There's no real difference in the SYN packets sent to royalmail.com vs www.royalmail.com from the same client.  I can't reliably see the SYN-ACK responses from the server because something in the middle is rewriting things.

I'd be interested in any thoughts on what to look at next, and in whether anyone else sees the same difference between royalmail.com and www.royalmail.com.

 

EDIflyer
Involved

Weirdly both of those seem to be working OK for me in the browser. I did notice my connection went down for about 10 min last night but each one I'm trying this morning is working without a VPN active.

toaster
Active

Ok, this might be a long post. I've been lurking here for a while hoping to see a resolution to this problem. As we've been asked to provide more information, I decided to do a bit of digging...

Firstly, I'm not a network engineer so bear that in mind.

I've been seeing the same issues connecting to a variety of sites since before Christmas. I don't think it's isolated to specific sites. I think all sites can experience this behaviour. I'm not 100% sure that it only affects https either. I've been seeing similar issues with plain http connections - I think https failures just have a more distinctive error.

I started testing connecting to GitHub with curl. I was getting failures around 0.5% of the time. Capturing the packets on the client (on Three BB), I was able to see a distinctive pattern which looked like the server was completing the 3-way handshake and then immediately sending FIN-ACK to close the connection while the client was starting the TLS Client Hello. I couldn't see any reason for this to happen when nothing was being changed between requests.
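A rough way to reproduce this kind of measurement without curl is sketched below in Python. It is a stand-in, not the test that was actually run. It separates failures at the TCP connect stage from failures during the TLS handshake, since the pattern described above would show up as the latter. The function names are made up for illustration.

```python
import socket
import ssl
from collections import Counter


def attempt(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """One connection attempt: 'ok', 'tcp' (connect failed) or 'tls' (handshake failed)."""
    try:
        raw = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "tcp"
    try:
        # The handshake runs inside wrap_socket(); a FIN or reset here raises OSError.
        with ssl.create_default_context().wrap_socket(raw, server_hostname=host):
            return "ok"
    except OSError:  # ssl.SSLError and timeouts are OSError subclasses
        return "tls"
    finally:
        raw.close()


def tally(host: str, attempts: int, attempt_fn=attempt) -> Counter:
    """Repeat the attempt and count outcomes by stage."""
    return Counter(attempt_fn(host) for _ in range(attempts))


def failure_rate(counts: Counter) -> float:
    """Fraction of attempts that did not end in 'ok'."""
    total = sum(counts.values())
    return (total - counts["ok"]) / total if total else 0.0
```

For example, `failure_rate(tally("github.com", 1000))` gives a figure comparable to the percentages quoted in this thread. At around 0.5% it takes several hundred attempts before the number means much.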

 

To dig further, I set up an HTTPS server running nginx on AWS with a Let's Encrypt cert. I was able to get the same failures about 0.5% of the time.

I decided to capture the packets on the client and on the server to compare them.

Some interesting things came to light:

  • The TTL changes from 64 -> 60 for packets going from server to client (suggesting 4 hops) but from 64 -> 46 for packets going to the server (suggesting 18 hops). Either packets are routed differently or something is rewriting the TTL.
  • Other parts of the packets change between being sent by the sender and being received. Specifically the timestamp values change which could be indicative of some stateful appliance in the middle using that data to keep track of the sessions.
  • For failing connections: To the server, it looks like the FIN-ACK is first received from the client. To the client, it looks like the server sends the first FIN-ACK.
  • For failing connections, the client sends the expected TLS Client Hello, but this is never received by the server. The client receives an ACK to that Hello but the server never sends one. I believe this missing communication is with something that sits between the two parties.
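The hop counts in the first bullet are just the sent TTL minus the received TTL. A tiny Python sketch of that arithmetic is below. The list of common initial TTLs is general knowledge rather than something from the captures, and is only needed when the sender's side isn't captured.

```python
COMMON_INITIAL_TTLS = (64, 128, 255)  # typical OS defaults, not taken from the captures


def hops_from_ttl(observed: int, initial: int = 64) -> int:
    """Routers that decremented the TTL between sender and receiver."""
    if not 0 < observed <= initial:
        raise ValueError("observed TTL must be in 1..initial")
    return initial - observed


def guess_initial_ttl(observed: int) -> int:
    """Smallest common initial TTL that could have produced the observed value."""
    return min(t for t in COMMON_INITIAL_TTLS if t >= observed)
```

With the values above, `hops_from_ttl(60)` is 4 and `hops_from_ttl(46)` is 18. That asymmetry is what suggests either different routing in each direction or a device rewriting the TTL.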

What I think is happening is that something in the middle is intercepting the Client Hello, sending an ACK back to the client, then initiating the connection close by sending the FIN-ACK to the client, which responds to the server with FIN-ACK etc...
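The comparison of the two captures can be sketched in Python as below. The packet lists are synthetic, written by hand to mirror the sequence described in the bullets. They are not the real capture data, and the 517-byte Client Hello size is a made-up placeholder. Segments are matched on sender, flags and payload length only, because the timestamps are apparently rewritten in transit.

```python
from collections import Counter
from typing import NamedTuple


class Segment(NamedTuple):
    src: str          # "client" or "server"
    flags: str        # e.g. "SYN", "ACK", "FIN-ACK"
    payload_len: int  # bytes of TCP payload


# SYNTHETIC captures of one failing connection, in arrival order at each end.
CLIENT_VIEW = [
    Segment("client", "SYN", 0), Segment("server", "SYN-ACK", 0),
    Segment("client", "ACK", 0),
    Segment("client", "PSH-ACK", 517),  # TLS Client Hello (size is hypothetical)
    Segment("server", "ACK", 0),        # ACK of the Hello
    Segment("server", "FIN-ACK", 0), Segment("client", "FIN-ACK", 0),
]
SERVER_VIEW = [
    Segment("client", "SYN", 0), Segment("server", "SYN-ACK", 0),
    Segment("client", "ACK", 0),
    Segment("client", "FIN-ACK", 0), Segment("server", "FIN-ACK", 0),
]


def only_in(a, b, src):
    """Segments from `src` that appear in capture `a` but not in capture `b`."""
    in_a = Counter(s for s in a if s.src == src)
    in_b = Counter(s for s in b if s.src == src)
    return sorted((in_a - in_b).elements())


def first_fin(capture):
    """Which side appears to send the first FIN in this capture."""
    return next((s.src for s in capture if "FIN" in s.flags), None)
```

Here `only_in(CLIENT_VIEW, SERVER_VIEW, "client")` returns the Client Hello that never reached the server, `only_in(CLIENT_VIEW, SERVER_VIEW, "server")` returns the ACK the server never sent, and `first_fin` disagrees between the two views. That is the signature of something in the middle speaking for both sides.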

I'm not sure if this is a single bad configuration in a cluster of middleware/firewalls or if something in a request occasionally trips a rule that makes a firewall decide to end the connection.

@JonathanB I have the Wireshark-compatible packet captures. Please let me know how you'd like me to send them to you.

BrummyGit
Active

I had something remarkably similar in my work life. It was due to a firewall in the middle that was inspecting traffic within the TLS stream and therefore acting as a TLS proxy. We had an issue with missing servers in the certificate's Subject Alternative Name. The inbound connection from the internet to the firewall was established, but the onward tunnel never established correctly, so the inbound connection was closed due to a timeout (the destination server just ignored invalid connection attempts).

Sounds like Three might have a certificate missing individual names or IP addresses of one or more of their proxy array servers.