(While testing on Monday, keith noticed that the GPS sync was lost. To make sure that this is not the cause of the problem, he re-tested with proper GPS sync, unfortunately with the same result.)
Meanwhile I have analyzed the problem a bit further. I am very sure that the problem is not due to an unreliable E1 connection. In fact, the connection and the TRAU sync seem to be ok. When filtering the log with
grep "CCU-\|LOST\|PCU-\|555555555555555555\|synchronized" ./pcu-startup-64k.log
one can see that it goes through the sync procedure. But even though we send TRAU frames to the CCU, it does not stop sending SYNC indications. That is why we see an "In sync with CCU" message in response to every data indication we send. The CCU on the other end also seems to understand the frames, since dbe and dfe are 0. If there were problems with the frame format, we should get at least dfe=1 in the log. Also, both logs show exactly the same behavior. So I think we can rule out a frame format problem.
I have the impression that the sync procedure somehow fails. Maybe the CCU fails to finish the procedure and still thinks that it is not synced yet. Maybe it is trying to tell us: "You are sending me data frames, but I am not synced, here is a SYNC indication!"
One must also take into account that the latency of the E1 line is about 13 TRAU frames. So when we send a CCU_SYNC_IND, it takes some frames until it reaches the CCU, then the CCU has to respond, and this also takes time. Maybe we have to keep sending the SYNC indications for longer. I have hacked up osmo-pcu so that it does not immediately stop the synchronization. Once in sync, it will continue to send CCU-SYNC-IND for a few more TRAU frames.
The changes can be found in osmo-pcu.git, branch pmaier/ccuhacks.
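To illustrate the idea of the hack, here is a minimal sketch in Python. It is not the code from the branch: the class name SyncHoldoff, the parameter extra_frames and the "SYNC"/"DATA" labels are made up for this example. It only shows the intended behavior: after sync is detected, keep sending SYNC indications for a configurable number of additional TRAU frames before switching to data frames.

```python
class SyncHoldoff:
    """Hypothetical helper, not the actual osmo-pcu implementation."""

    def __init__(self, extra_frames):
        # Number of additional SYNC indications to send after sync is detected
        self.extra_frames = extra_frames
        self.remaining = extra_frames

    def next_frame(self, in_sync):
        """Decide which kind of TRAU frame to send in this frame period."""
        if not in_sync:
            # Not (or no longer) in sync: restart the hold-off and keep syncing
            self.remaining = self.extra_frames
            return "SYNC"
        if self.remaining > 0:
            # In sync, but still inside the hold-off window
            self.remaining -= 1
            return "SYNC"
        # Hold-off window has elapsed, switch to data frames
        return "DATA"


holdoff = SyncHoldoff(extra_frames=3)
sync_state = [False, False, True, True, True, True, True]
sent = [holdoff.next_frame(in_sync=s) for s in sync_state]
print(sent)
```

With a line latency of about 13 TRAU frames, the number of extra frames would presumably have to be larger than that to have any effect; the value 3 above is only there to keep the example short.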
keith: can you try the modified osmo-pcu version and attach the log output to this ticket?