
Investigating TR-069/easycwmpd log records and losses of internet connectivity on the Y5-210MU external hub

MymsMan
Rising star

 

Introduction

I have spent some time looking at the system log records from the Greenpacket Y5-210MU external hub, trying to understand why I lost internet connectivity for a short period at around 01:03 UTC on 21 Sep. I have discovered that the loss is related to the hub's poor implementation of the TR-069 protocol.

 

TR-069 is implemented on the hub by the easycwmpd program and allows Three to remotely update the hub settings and firmware. Normally easycwmpd runs every hour, on the hour, starting a ‘PERIODIC’ session in which it contacts the ACS (Auto Configuration Server) and exchanges information with it. Independently of these ‘periodic inform’ sessions, the ACS can also send a Connect Request to the hub to establish a session at any time.

 

I used the VS Code editor to examine 3 log files from the hub; the significant log records are attached at the end of this note. I have also kept the system logs for the last month (collected every 3 hours), but I have no reason to believe they would show different patterns.

 

I hope the forum moderators will pass this post on to the Three technical team and Greenpacket.

Analysis

The most striking feature of the logs is the sheer number of failures when easycwmpd is interacting with the ACS. There are 96 send events logged but 114 libcurl error responses.

Error response 401 (Unauthorized) is retried immediately in the same session, but other error responses cause the session to fail and be retried. On top of the 6 PERIODIC sessions expected, there are a further 29 retry sessions!

The 1st retry comes after a 7-second delay, the 2nd after 15 seconds, the 3rd after 30 seconds and the 4th after 1 minute.

The 5th retry comes after 2 minutes, at which point the session type changes to ‘BOOT’ and the retry counter is reset.
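
To make the escalation easier to follow, here is a minimal Python sketch of the retry schedule described above. The delays are the ones observed in the logs. The name `retry_plan` is made up for illustration, and labelling the first four retries as PERIODIC is an assumption. It is not easycwmpd's actual code.

```python
# Retry delays in seconds, as observed in the hub logs described above.
OBSERVED_RETRY_DELAYS = [7, 15, 30, 60, 120]


def retry_plan(delays=OBSERVED_RETRY_DELAYS):
    """Return (retry_number, delay_s, elapsed_s, session_type) tuples.

    The last retry is the one where the session type switches to BOOT.
    Earlier retries are assumed to stay PERIODIC.
    """
    plan = []
    elapsed = 0
    for attempt, delay in enumerate(delays, start=1):
        elapsed += delay  # time waited since the first failed session
        session_type = "BOOT" if attempt == len(delays) else "PERIODIC"
        plan.append((attempt, delay, elapsed, session_type))
    return plan


if __name__ == "__main__":
    for attempt, delay, elapsed, kind in retry_plan():
        print(f"retry {attempt}: wait {delay:>3}s (t+{elapsed}s) -> {kind}")
```

Adding up the delays, a PERIODIC session that keeps failing reaches the BOOT stage just under four minutes after the first failure, not counting the time each failed session itself takes.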

 

If the initial BOOT session succeeds, that completes the cycle. If it fails, the retry sessions that follow disconnect and reconnect the WAN connection, ending with a ‘VALUE CHANGE’ session to notify the ACS of the new WAN IP address.

 

Diagnosis 

Why are so many requests to the ACS server failing?  Normal web access is fast and reliable with very few failures.

 

I think the essence of the problem is the “every hour, on the hour” scheduling of the Periodic Inform.

This means that every hub in the country is attempting to ‘phone home’ simultaneously! It is not surprising if the ACS server is unable to process so many simultaneous requests and rejects many of them, and the retries then add further to the server overload.

 

No doubt when the software was developed and tested it was with a much smaller number of concurrently active hubs and overloads were a less significant issue.

 

I can think of no good reason why every hub has to connect to the server at the same time or so frequently.

 

Possible Solutions 

Fortunately there are a number of potential solutions, and they should not be difficult to implement. Essentially they all involve spreading the ACS server's workload across the hour so that fewer hubs attempt to connect at the same time.

 

  • Schedule the Periodic Inform to run at the same number of minutes past the hour as the last hub reset, rather than on the hour, or
  • Add a random delay (0-3600 seconds) before initiating the Periodic Inform, or
  • Increase the default periodic interval to a much larger value. Is there a real need for it to be as low as one hour? Wouldn’t daily be adequate for keeping the hub up to date?
    Three enforce the 3600-second inform interval via TR-069, so they should be able to change the value without requiring changes to the Greenpacket firmware.
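
As a rough illustration of the first two options, the Python sketch below compares the worst-case number of hubs contacting the ACS in the same second. The function names and the fleet size of 36,000 hubs are hypothetical, chosen only to make the arithmetic easy. It is not based on any Three or Greenpacket code or figures.

```python
import random
from collections import Counter

INTERVAL_S = 3600  # inform interval that Three enforce via TR-069 (from the post)


def offset_from_boot(boot_epoch_s, interval_s=INTERVAL_S):
    """Option 1: inform at the same offset past the hour as the last boot."""
    return boot_epoch_s % interval_s


def random_offset(rng, interval_s=INTERVAL_S):
    """Option 2: inform after a uniformly random delay in [0, interval_s)."""
    return rng.randrange(interval_s)


def peak_per_second(offsets):
    """Largest number of hubs that share any single one-second slot."""
    return max(Counter(offsets).values())


if __name__ == "__main__":
    hubs = 36_000  # hypothetical fleet size, not a real Three figure
    rng = random.Random(1)  # fixed seed so the run is repeatable
    on_the_hour = [0] * hubs
    jittered = [random_offset(rng) for _ in range(hubs)]
    print("peak, on the hour:  ", peak_per_second(on_the_hour))
    print("peak, random delay: ", peak_per_second(jittered))
```

With everything scheduled on the hour, all of the hubs land in the same second. With a random delay the load averages out to the number of hubs divided by 3600 per second, which for this made-up fleet is 10.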

 

Unresolved questions

The above does not answer all disconnection-related questions:

  • Why does this problem not affect all users equally?
    Some users see disconnections hourly, while for others they are less frequent. I went 3.5 days between disconnections; others have gone even longer.
  • Why do some users continue to see frequent disconnections even with TR-069 disabled?
  • Why do some users fail to reconnect automatically after disconnection and need to reboot the hub?

Selected log extracts

See Selected Log records 

4 REPLIES
Frustratedby3
Fledgling

We have just spent 2 hours on a chat to complaints, technical and cancellations: no solutions offered. Very keen for us to cancel. Complete failure to acknowledge that it's their failure. Perhaps everyone with this issue should go to Martin Lewis or someone who might have some clout and get 3 to acknowledge and deal with this.

MymsMan
Rising star

@Frustratedby3 

What symptoms are you actually seeing?

If it is frequent disconnections then there can be a fairly simple workaround:

  • Log on to the hub at https://192.168.0.1
    You MUST use userid admin and the password printed on the hub label
  • Navigate to Advanced->System->TR069
  • Disable TR069 and Periodic Inform
  • Submit. If that fails with a red message, set the 'Request user name' field to admin and try again. Do NOT clear the 'ACS user name' field
  • Navigate to Advanced->System->Maintenance
  • Disable Scheduled Reboot or change to a less frequent interval (days)
  • Click on Scheduled Reboot to save

While I am disappointed that Three have not yet solved the underlying issues and that support do not suggest this simple workaround, I am actually a happy user of Three with a fast, stable connection.

MymsMan
Rising star

My earlier diagnosis that the problems were caused by the “every hour, on the hour” scheduling of the Periodic Inform may not be correct.

I have set the Periodic Inform interval to slightly less than 1 hour so that my inform call does not occur on the hour, and I am still seeing a significant number of libcurl error responses and retry sessions ☹️

There must be some other issue with the ACS server, but I have no idea what it might be.

MichaelP
Community Support Team

Hi there @MymsMan,

Thanks a lot for sharing this feedback with us, and I can understand your frustration with the disconnections you've been experiencing. 

I have passed the details you've kindly shared over to a colleague from our Broadband team.

Thanks,
Michael

 

 


