How to debug PPPoS connection? Link is up, but (almost) no data going through

  1. My goal is: To have an ESP32 with a SIM7000E modem connect to MQTT+SLL and mDash, after having made a connection to my provider through PPPoS / LTE-M.

  2. My actions are: Having set up everything according to docs. Booting the Firmware on the ESP, seeing how PPPoS establishes a connection and gets an IP address. But never seeing MQTT or mDash making a connection.

  3. The result I see is: See log below (excerpt from a fresh boot of the ESP). Funnily enough, SNTP seems to work…occasionally.

Trying baud rate 115200 (fc off)...
[Dec 16 16:16:14.730] mgos_pppos.c:137        12 -> 5 80.906829 0.000000
[Dec 16 16:16:14.740] mgos_pppos.c:137        5 -> 6 80.915198 0.000000
[Dec 16 16:16:14.755] mgos_pppos.c:137        6 -> 7 80.929560 0.000000
[Dec 16 16:16:14.768] mgos_pppos.c:664        Connecting (UART1, APN 'iot.1nce.net')...
[Dec 16 16:16:14.776] mgos_pppos.c:696        Trying to connect to 26201,Telekom.de 1nce.net
[Dec 16 16:16:14.789] mgos_pppos.c:137        7 -> 8 80.964475 0.000000
[Dec 16 16:16:14.798] mgos_pppos.c:127        >> AT
[Dec 16 16:16:14.798] mgos_pppos.c:137        8 -> 9 80.975913 82.973014
[Dec 16 16:16:14.804] mgos_net.c:86           PPP: connecting
[Dec 16 16:16:14.818] Net event: 1313166337 
[Dec 16 16:16:14.818] mgos_event.c:135        ev NET1 triggered 3 handlers
[Dec 16 16:16:14.844] mgos_pppos.c:137        9 -> 8 81.018420 0.000000
[Dec 16 16:16:14.919] mgos_pppos.c:127        >> ATH
[Dec 16 16:16:14.919] mgos_pppos.c:137        8 -> 9 81.096927 83.093935
[Dec 16 16:16:14.945] mgos_pppos.c:137        9 -> 8 81.120135 0.000000
[Dec 16 16:16:14.998] mgos_pppos.c:127        >> ATE0
[Dec 16 16:16:15.001] mgos_pppos.c:137        8 -> 9 81.175717 83.172649
[Dec 16 16:16:15.024] mgos_pppos.c:137        9 -> 8 81.198834 0.000000
[Dec 16 16:16:15.119] mgos_pppos.c:127        >> AT+CFUN=0
[Dec 16 16:16:15.122] mgos_pppos.c:137        8 -> 9 81.297196 83.293697
[Dec 16 16:16:15.701] mgos_pppos.c:137        9 -> 8 81.876122 0.000000
[Dec 16 16:16:15.798] mgos_pppos.c:127        >> AT+CFUN=1
[Dec 16 16:16:15.802] mgos_pppos.c:137        8 -> 9 81.976966 83.973459
[Dec 16 16:16:16.075] mgos_pppos.c:137        9 -> 8 82.249952 0.000000
[Dec 16 16:16:16.139] mgos_pppos.c:127        >> ATI
[Dec 16 16:16:16.139] mgos_pppos.c:137        8 -> 9 82.316925 84.313935
[Dec 16 16:16:16.166] mgos_pppos.c:137        9 -> 8 82.340471 0.000000
[Dec 16 16:16:16.702] mgos_pppos.c:127        >> AT+GSN
[Dec 16 16:16:16.705] mgos_pppos.c:137        8 -> 9 82.880323 84.877044
[Dec 16 16:16:16.724] mgos_pppos.c:279        SIM7000E R1351, IMEI: 865234032678814
[Dec 16 16:16:16.736] mgos_pppos.c:137        9 -> 8 82.910973 0.000000
[Dec 16 16:16:16.809] mgos_pppos.c:127        >> AT+CIMI
[Dec 16 16:16:16.812] mgos_pppos.c:137        8 -> 9 82.987113 84.983776
[Dec 16 16:16:16.835] mgos_pppos.c:137        9 -> 8 83.009969 0.000000
[Dec 16 16:16:16.839] main.c:25               Tick uptime: 83.01, RAM: 4511380, 4325552 free
[Dec 16 16:16:16.909] mgos_pppos.c:127        >> AT+CCID
[Dec 16 16:16:16.912] mgos_pppos.c:137        8 -> 9 83.086796 85.083459
[Dec 16 16:16:16.935] mgos_pppos.c:137        9 -> 8 83.110022 0.000000
[Dec 16 16:16:17.009] mgos_pppos.c:127        >> AT+CPIN?
[Dec 16 16:16:17.012] mgos_pppos.c:137        8 -> 9 83.187240 85.183831
[Dec 16 16:16:17.036] mgos_pppos.c:349        SIM is ready, IMSI: 901405102309107, ICCID: 89882280666023091071
[Dec 16 16:16:17.043] mgos_event.c:135        ev PoS2 triggered 0 handlers
[Dec 16 16:16:17.049] mgos_pppos.c:137        9 -> 8 83.224733 0.000000
[Dec 16 16:16:17.109] mgos_pppos.c:127        >> AT+CREG=0
[Dec 16 16:16:17.112] mgos_pppos.c:137        8 -> 9 83.287446 85.283924
[Dec 16 16:16:17.136] mgos_pppos.c:137        9 -> 8 83.310639 0.000000
[Dec 16 16:16:17.195] mgos_pppos.c:127        >> AT+COPS=4,2,"26201"
[Dec 16 16:16:17.198] mgos_pppos.c:137        8 -> 9 83.374231 143.369853
[Dec 16 16:16:17.945] mgos_pppos.c:137        9 -> 8 84.119341 0.000000
[Dec 16 16:16:17.999] mgos_pppos.c:127        >> AT+CREG?
[Dec 16 16:16:18.002] mgos_pppos.c:137        8 -> 9 84.177352 86.173935
[Dec 16 16:16:18.026] mgos_pppos.c:405        Connected to mobile network (roaming)
[Dec 16 16:16:18.030] mgos_pppos.c:137        9 -> 8 84.207547 0.000000
[Dec 16 16:16:18.098] mgos_pppos.c:127        >> AT+COPS=3,2
[Dec 16 16:16:18.102] mgos_pppos.c:137        8 -> 9 84.277192 86.273504
[Dec 16 16:16:18.125] mgos_pppos.c:137        9 -> 8 84.299979 0.000000
[Dec 16 16:16:18.199] mgos_pppos.c:127        >> AT+COPS?
[Dec 16 16:16:18.202] mgos_pppos.c:137        8 -> 9 84.376956 86.373541
[Dec 16 16:16:18.711] mgos_pppos.c:137        9 -> 8 84.885599 0.000000
[Dec 16 16:16:18.733] init.js:57              Solar state:
[Dec 16 16:16:18.761] init.js:58              {"LONG":"700","BMV":"700","SOC":1000,"CE":"0","I":"0","V":26.200000,"PID":"0x203"}
[Dec 16 16:16:18.808] mgos_pppos.c:127        >> AT+COPS=3,0
[Dec 16 16:16:18.811] mgos_pppos.c:137        8 -> 9 84.986611 86.982911
[Dec 16 16:16:18.835] mgos_pppos.c:137        9 -> 8 85.009781 0.000000
[Dec 16 16:16:18.899] mgos_pppos.c:127        >> AT+COPS?
[Dec 16 16:16:18.902] mgos_pppos.c:137        8 -> 9 85.077337 87.073920
[Dec 16 16:16:18.926] mgos_pppos.c:456        Operator: 26201,Telekom.de 1nce.net
[Dec 16 16:16:18.930] mgos_pppos.c:137        9 -> 8 85.107648 0.000000
[Dec 16 16:16:18.999] mgos_pppos.c:127        >> AT+CSQ
[Dec 16 16:16:19.002] mgos_pppos.c:137        8 -> 9 85.177001 87.173739
[Dec 16 16:16:19.025] mgos_pppos.c:137        9 -> 8 85.199920 0.000000
[Dec 16 16:16:19.098] mgos_pppos.c:127        >> AT+CGDCONT=1,"IP","iot.1nce.net"
[Dec 16 16:16:19.102] mgos_pppos.c:137        8 -> 9 85.278849 87.273330
[Dec 16 16:16:19.148] mgos_pppos.c:137        9 -> 8 85.322753 0.000000
[Dec 16 16:16:19.199] mgos_pppos.c:127        >> ATDT*99***1#
[Dec 16 16:16:19.202] mgos_pppos.c:137        8 -> 9 85.377407 87.373653
[Dec 16 16:16:19.710] mgos_pppos.c:137        9 -> 10 85.884399 0.000000
[Dec 16 16:16:19.713] mgos_pppos.c:812        Starting PPP, user ''
[Dec 16 16:16:19.721] mgos_pppos.c:137        10 -> 11 85.895970 115.895913
[Dec 16 16:16:19.756] mgos_mongoose.c:66      New heap free LWM: 4314680
[Dec 16 16:16:19.771] mgos_net.c:90           PPP: connected
[Dec 16 16:16:19.783] Net event: 1313166338 
[Dec 16 16:16:19.783] mgos_event.c:135        ev NET2 triggered 3 handlers
[Dec 16 16:16:19.801] mgos_mongoose.c:66      New heap free LWM: 4314332
[Dec 16 16:16:19.884] mgos_net.c:102          PPP: ready, IP 10.215.141.1, GW 10.64.64.64, DNS default
[Dec 16 16:16:19.900] Net event: 1313166339 
[Dec 16 16:16:19.900] mgos_sntp.c:129         SNTP next query in 962 ms
[Dec 16 16:16:19.906] mgos_mqtt_conn.c:471    MQTT0 connecting to mqtt.ulmernest.de:8883
[Dec 16 16:16:19.921] Sys event: 1297044230 
[Dec 16 16:16:19.921] mgos_event.c:135        ev MOS6 triggered 2 handlers
[Dec 16 16:16:19.927] mongoose.c:3145         0x3ffcbdc0 mqtt.ulmernest.de:8883 -,-,ca.pem
[Dec 16 16:16:19.933] mgos_vfs.c:283          ca.pem -> /ca.pem pl 1 -> 1 0x3ffbc428 (refs 1)
[Dec 16 16:16:19.956] mgos_vfs.c:377          open ca.pem 0x0 0x1b6 => 0x3ffbc428 ca.pem 1 => 257 (refs 1)
[Dec 16 16:16:19.962] mgos_vfs.c:410          close 257 => 0x3ffbc428:1 => 0 (refs 0)
[Dec 16 16:16:19.968] mongoose.c:3145         0x3ffcc0e4 udp://8.8.8.8:53 -,-,-
[Dec 16 16:16:19.974] mongoose.c:3013         0x3ffcc0e4 udp://8.8.8.8:53
[Dec 16 16:16:19.980] mgos_rpc_channel_ws:263 0x3ffbccf4 Connecting to wss://mdash.net/api/v2/rpc, SSL? 1
[Dec 16 16:16:19.987] mongoose.c:8604         wss://mdash.net/api/v2/rpc use_ssl? 1 tcp://mdash.net:443
[Dec 16 16:16:19.996] mongoose.c:3145         0x3ffdf2e4 tcp://mdash.net:443 -,-,ca.pem
[Dec 16 16:16:20.002] mgos_vfs.c:283          ca.pem -> /ca.pem pl 1 -> 1 0x3ffbc428 (refs 1)
[Dec 16 16:16:20.008] mgos_vfs.c:377          open ca.pem 0x0 0x1b6 => 0x3ffbc428 ca.pem 1 => 257 (refs 1)
[Dec 16 16:16:20.017] mgos_vfs.c:410          close 257 => 0x3ffbc428:1 => 0 (refs 0)
[Dec 16 16:16:20.023] mongoose.c:3145         0x3ffdff50 udp://8.8.8.8:53 -,-,-
[Dec 16 16:16:20.029] mongoose.c:3013         0x3ffdff50 udp://8.8.8.8:53
[Dec 16 16:16:20.032] mgos_event.c:135        ev NET3 triggered 4 handlers
[Dec 16 16:16:20.049] mongoose.c:3028         0x3ffcc0e4 udp://8.8.8.8:53 -> 0
[Dec 16 16:16:20.052] mongoose.c:3028         0x3ffdff50 udp://8.8.8.8:53 -> 0
[Dec 16 16:16:20.059] mgos_mongoose.c:66      New heap free LWM: 4310972
[Dec 16 16:16:20.700] mgos_mongoose.c:66      New heap free LWM: 4309568
[Dec 16 16:16:20.870] mongoose.c:3145         0x3ffcbd18 udp://time.google.com:123 -,-,-
[Dec 16 16:16:20.876] mongoose.c:3145         0x3ffe30b8 udp://8.8.8.8:53 -,-,-
[Dec 16 16:16:20.879] mongoose.c:3013         0x3ffe30b8 udp://8.8.8.8:53
[Dec 16 16:16:20.885] mgos_sntp.c:96          SNTP query to time.google.com
[Dec 16 16:16:20.892] mgos_sntp.c:129         SNTP next query in 1897 ms
[Dec 16 16:16:20.906] mongoose.c:3028         0x3ffe30b8 udp://8.8.8.8:53 -> 0
[Dec 16 16:16:21.010] mgos_mongoose.c:66      New heap free LWM: 4308504
[Dec 16 16:16:21.026] mongoose.c:3013         0x3ffcbd18 udp://216.239.35.0:123
[Dec 16 16:16:21.029] mongoose.c:3028         0x3ffcbd18 udp://216.239.35.0:123 -> 0
[Dec 16 16:16:21.035] mgos_sntp.c:48          SNTP sent query to 216.239.35.0
[Dec 16 16:16:21.042] mgos_mongoose.c:66      New heap free LWM: 4308396
[Dec 16 16:16:21.193] mgos_sntp.c:59          SNTP reply from 216.239.35.0: time 1639667781.675384, local 87.371710, delta 1639667694.303674
[Dec 16 16:16:21.212] Sys event: 1297044227 
[Dec 16 16:16:21.212] mgos_event.c:135        ev MOS3 triggered 4 handlers
[Dec 16 16:16:21.218] mgos_sntp.c:79          SNTP close
[Dec 16 16:16:21.221] mgos_sntp.c:129         SNTP next query in 6881684 ms
[Dec 16 16:16:21.228] mgos_mongoose.c:66      New heap free LWM: 4306864
[Dec 16 16:16:21.820] mongoose.c:3013         0x3ffdf2e4 tcp://157.90.34.55:443
[Dec 16 16:16:21.837] mongoose.c:3013         0x3ffcbdc0 tcp://18.158.254.200:8883
[Dec 16 16:16:21.908] mongoose.c:3028         0x3ffdf2e4 tcp://157.90.34.55:443 -> 0
[Dec 16 16:16:21.948] mongoose.c:3028         0x3ffcbdc0 tcp://18.158.254.200:8883 -> 0
[Dec 16 16:16:21.962] mgos_mongoose.c:66      New heap free LWM: 4305340
[Dec 16 16:16:22.973] mgos_mongoose.c:66      New heap free LWM: 4303816
[Dec 16 16:16:26.952] main.c:25               Tock uptime: 93.13, RAM: 4511232, 4313880 free
[Dec 16 16:16:36.851] main.c:25               Tick uptime: 103.02, RAM: 4511248, 4315436 free
[Dec 16 16:16:38.712] init.js:45              MQTT state:false
[Dec 16 16:16:46.851] main.c:25               Tock uptime: 113.02, RAM: 4511252, 4315440 free
[Dec 16 16:16:56.978] main.c:25               Tick uptime: 123.15, RAM: 4511248, 4315436 free
[Dec 16 16:17:06.851] main.c:25               Tock uptime: 133.02, RAM: 4511248, 4315452 free
[Dec 16 16:17:08.711] init.js:45              MQTT state:false
[Dec 16 16:17:16.851] main.c:25               Tick uptime: 143.02, RAM: 4511252, 4315440 free
[Dec 16 16:17:26.948] main.c:25               Tock uptime: 153.12, RAM: 4511240, 4313904 free
[Dec 16 16:17:36.851] main.c:25               Tick uptime: 163.02, RAM: 4511256, 4315460 free
[Dec 16 16:17:38.711] init.js:45              MQTT state:false
[Dec 16 16:17:46.851] main.c:25               Tock uptime: 173.02, RAM: 4511256, 4315460 free
[Dec 16 16:17:56.947] main.c:25               Tick uptime: 183.12, RAM: 4511252, 4315456 free
[Dec 16 16:18:06.851] main.c:25               Tock uptime: 193.03, RAM: 4511256, 4315460 free
[Dec 16 16:18:08.710] init.js:45              MQTT state:false
[Dec 16 16:18:16.852] main.c:25               Tick uptime: 203.02, RAM: 4511256, 4315460 free
[Dec 16 16:18:26.947] main.c:25               Tock uptime: 213.12, RAM: 4511252, 4315456 free

  1. My question is: How do I debug this? Obviously the connection is being made successfully, also I can see in the backend of my provider that the SIM has successfully established the connection and got an IP (the same as in the logs). So my initial, naive assumption would be that something in the lower levels of mgos (or even outside of it) like LwIP or esp-idf are failing to make that connection usable to the system and/or mgos. What would be the way to verify this?

On a side note: To rule out modem/provider issues, I connected the modem to a linux VM, and had pppd / pon / poff establish a connection with the same settings (APN, also I checked the AT command sequences to be equivalent). Here, the connection was successful as well (with the same IP), but I was also able to run any kind of traffic over the interface. So I’d guess it’s either not related to modem and/or provider, or there’s some small detail that I’m missing that’s different between mgos pppos and pppd.

Thanks in advance!

One small addition: While doing the pppd test, I also did not change anything w/ regard to location or antenna, so signal quality issues also seem to be unlikely. If anything, it would have got something to do with RF / EM issues from cabling or something.

I suggest you just try a simple demo first so you can have a small task and a small scenario with a small number of variables and a small number of unknowns we can actually handle.

I would say your networking is working and you don’t need to debug PPP.

Those ca.pem lines make me think you are actually TCP connecting but something in your configuration or your certificates is wrong and TLS is not even tried. But I don’t see a “connected” indicator.
I suggest you try with WiFi first and sniff that in order to actually see what is going on. You can also try with no TLS first and on a local MQTT broker.

Hi scaprile, thanks for your quick answer! I should have mentioned that everything is as well working when using Wifi as the internet uplink (with the ESP being in STA mode) instead of PPPoS. So certificates and everything should be fine. I already tried and removed everything unneccessary from the project temporarily, but that didn’t change anything.

However, I think the only option left by now is to still follow your advice, and build a minimal example from ground up to try and reproduce the error. If it works there, no need for debugging, if not, we at least have some graspable test case to debug further.

your SIMcom should reply with a +CSQ: x,y line and we should see a RSSI: line here. That will tell you whether you are having a good signal or not. What is your debug.level ? If LL_INFO messages are allowed, and signal quality is positive and less than 32, we should see that line there. Otherwise manually check with a terminal what is going on (send AT+CSQ)

Try this also:
- ["pppos.hexdump_enable", true]
that should fill your screen with whatever is going on at the lowest level and hog your ESP32 resources too…

Thanks - CSQ returns values between 17 and 20 (via pppos lib and when using the mentioned debian VM with minicom as well), so I’d say it’s an okay-ish signal (actually I’ve got a direct line of sight to the cell tower, but a mediocre antenna). But in the VM (with the modem untouched apart from wiring), it’s still sufficient to get me occasional sub-100ms ping times to google and transfer rates around 30kb/sec with wget, which is probably close to the maximum you can get out of LTE-M under non-ideal conditions.

I also had the hexdump enabled for quite some time, everything AT is fine there, when switching over to PPP mode it’s hard to tell whats going on (and I indeed noticed that it seems quite resource-heavy, so I decided to turn it off once everything AT was okay).
I can say that there’s definitely regular data transfer happening which kinda “matches” the higher level commands going on, but for inspecting it in detail I’d need to capture the raw bytes somehow and pipe them to Wireshark somehow (never done that in this configuration - do you happen to have any advice on that?).

Edit: I’m already in the process of creating a minimal example, so further insights might be available soon.

you can always solder to the serial pins and plug them to serial ports on a Wireshark-loaded PC via a level translator…

If WiFi works and PPP does not… your provider might be firewalling you out or limiting traffic to certain ports.

Okay, will see if I can arrange that capture setup somehow as a last resort…

As for the Wifi vs PPP topic - this was also my initial guess, however when using the modem from the VM, it is also possible to connect to the MQTT Server without problems (using the node command line MQTT client). Also I can wget the mDash endpoint without any problems (didn’t try a real WS connect, though).

I see… looks strange. Try a minimal MQTT only and I guess the standard debug.level is enough.

Okay, I think I’m getting a little bit closer - with a really minimal setup, I get somewhat better results.

Connection itself works most of the time, with mDash, MQTT and SNTP connecting successfully after some time. However, 9 out of 10 times, it takes two PPPoS connection attempts, only getting a stable connection after the second attempt. However, in terms of logging, the first attempt also looks okay and ends with the device getting an IP - but no data going through, and the connection breaks shortly after.

So my guess would be that this is some sort of timing problem with the modem (with the ESP being too fast). With the minimal setup I’m also seeing occasional “SIM not ready” messages in the log, sometimes, indicating that the modem is not yet fully up and running. With my other setup, I think that the rest of the code slowed down all steps to some degree, so that message never occurred.

However, I also think that there’s some other condition with the modem not being fully ready yet the moment the PPPoS lib initiates the first connection, as the modem also soon jumps back to AT command mode after the first connection fails after some time (or maybe the lib initiates this, but I’m not seeing any +++ being sent). I’m still wondering how I can even get an IP with this first connection, after all…

Will now try to play around with the timings of rst_hold / rst_wait, maybe thats enough to mitigate the problem.

I did some testing a while ago with SIM800L and SIM808 modules and did not have any problems with them.

I’ve just rebuilt the firmware for the SIM800L dev board (TTGO_T-Call_SIM800) with dash support and it works ok.

[Dec 16 21:20:59.260] mgos_dash.c:203         Dash init ok, server: wss://mdash.net/api/v2/rpc)
[Dec 16 21:20:59.268] mgos_pppos.c:1087       PPPoS UART1 (RX:26 TX:27 CTS:27 RTS:13), 115200, fc off, APN 'net'
[Dec 16 21:20:59.268] mgos_rpc_channel_ua:320 0x3ffc63f8 UART0
[Dec 16 21:20:59.277] main.c:51               modem_start - set power_ctrl on
[Dec 16 21:20:59.285] mgos_init.c:36          Init done, RAM: 4531696 total, 4428324 free, 4428324 min free
[Dec 16 21:20:59.291] mgos_ota_core.c:1590    UID: 0fff7bad4a58164f, license: none
[Dec 16 21:20:59.295] mgos_net.c:84           PPP: disconnected
[Dec 16 21:21:00.304] mgos_mongoose.c:66      New heap free LWM: 4428292
[Dec 16 21:21:00.482] mgos_net.c:89           PPP: connecting
[Dec 16 21:21:00.582] mgos_pppos.c:599        Resetting modem...
[Dec 16 21:21:03.469] mgos_mongoose.c:66      New heap free LWM: 4428280
[Dec 16 21:21:05.780] mgos_pppos.c:678        Connecting (UART1, APN 'net')...
[Dec 16 21:21:05.786] mgos_pppos.c:712        Trying to connect to 22610,ORANGE
[Dec 16 21:21:05.791] mgos_mongoose.c:66      New heap free LWM: 4427740
[Dec 16 21:21:07.469] mgos_mongoose.c:66      New heap free LWM: 4427624
[Dec 16 21:21:07.576] mgos_pppos.c:285        SIM800 R14.18, IMEI: 863576043242523
[Dec 16 21:21:07.877] mgos_pppos.c:355        SIM is ready, IMSI: 226102601501603, ICCID: 8940101603182542017f
[Dec 16 21:21:07.882] mgos_mongoose.c:66      New heap free LWM: 4427540
[Dec 16 21:21:09.294] main.c:37               tz: EET-2EEST,M3.5.0/3,M10.5.0/4, epoch: 10, local: 1970-01-01 02:00:10 EET, utc: 1970-01-01 00:00:10 Z
[Dec 16 21:21:09.294] mgos_mongoose.c:66      New heap free LWM: 4427476
[Dec 16 21:21:13.175] mgos_pppos.c:415        Connected to mobile network (home)
[Dec 16 21:21:13.574] mgos_pppos.c:468        Operator: 22610,ORANGE
[Dec 16 21:21:13.672] mgos_pppos.c:477        RSSI: -77
[Dec 16 21:21:13.874] mgos_pppos.c:828        Starting PPP, user ''
[Dec 16 21:21:13.879] mgos_mongoose.c:66      New heap free LWM: 4419932
[Dec 16 21:21:13.883] mgos_net.c:93           PPP: connected
[Dec 16 21:21:13.894] mgos_mongoose.c:66      New heap free LWM: 4416972
[Dec 16 21:21:13.900] mgos_mongoose.c:66      New heap free LWM: 4416756
[Dec 16 21:21:15.497] mgos_net.c:104          PPP: ready, IP 10.113.154.158, GW 192.168.254.254, DNS 109.166.202.53, NTP 0.0.0.0
[Dec 16 21:21:15.504] mgos_rpc_channel_ws:269 0x3ffb85ec Connecting to wss://mdash.net/api/v2/rpc, SSL? 1
[Dec 16 21:21:16.487] mgos_mongoose.c:66      New heap free LWM: 4415844
[Dec 16 21:21:16.618] mgos_mongoose.c:66      New heap free LWM: 4414644
[Dec 16 21:21:16.624] mgos_mongoose.c:66      New heap free LWM: 4414436
[Dec 16 21:21:17.089] mgos_sntp.c:64          SNTP reply from 216.239.35.4: time 1639682477.617752, local 18.662561, delta 1639682458.955191
[Dec 16 21:21:17.918] mongoose.c:4906         0x3ffc7ac8 ciphersuite: TLS-ECDHE-RSA-WITH-AES-128-GCM-SHA256
[Dec 16 21:21:17.923] mgos_mongoose.c:66      New heap free LWM: 4412656
[Dec 16 21:21:18.130] mgos_mongoose.c:66      New heap free LWM: 4412604
[Dec 16 21:21:18.136] mgos_mongoose.c:66      New heap free LWM: 4412504
[Dec 16 21:21:22.790] mgos_mongoose.c:66      New heap free LWM: 4395600
[Dec 16 21:21:22.803] main.c:37               tz: EET-2EEST,M3.5.0/3,M10.5.0/4, epoch: 1639682483, local: 2021-12-16 21:21:23 EET, utc: 2021-12-16 19:21:23 Z
[Dec 16 21:21:24.104] mgos_rpc_channel_ws.:54 0x3ffb85ec WS handshake resp 101
[Dec 16 21:21:26.989] mg_rpc.c:314            Sys.GetInfo via WSS_out 157.90.34.55:443
[Dec 16 21:21:27.632] mg_rpc.c:314            FS.Get via WSS_out 157.90.34.55:443
[Dec 16 21:21:27.873] mg_rpc.c:314            FS.Rename via WSS_out 157.90.34.55:443
[Dec 16 21:21:28.155] mg_rpc.c:314            FS.Get via WSS_out 157.90.34.55:443

The escape sequence is only sent when you actually need to escape from data mode to command mode. The modem starts in command mode and is only in data mode after a PPP connection is established. AFAIK no command is sent (unless requested by the user) after ATD, so there is no need to escape. Once the connection drops, the modem is on command mode again.

Thanks for your efforts! I’m actually also starting to think that it might have to something with the fact that I’m using LTE and not a classic GPRS/GSM RAT. With the latter, I didn’t experience similar issues before as well. But as LTE implies using CGREG with the library, which might behave slightly different and report readyness when there is none yet, this could explain the timing issues as well. But this is still wild guesses.

As for the escape sequence - my reasoning was that when the lib starts the PPP connection, it might issue a “+++” in advance, to account e.g. for sudden reboots of mgos (which would leave the modem in PPP state unless physical modem reset is performed as well). A quick look into the source code of the PPPoS lib seemed to confirm this, however I didn’t do a deep dive into that yet.

My current progress is that with the minimal example, I can get a working connection more or less reliably on the second try, worst case. Sometimes it even works first try, but I’m still struggling to find the differences in the communication sequence or modem state that cause this. Will see if I can set up a capture circuit and tap into the serial port directly as a next step. Having the modem do a hardware reset and wait for >15sec until I can start to establish the connection, which might take another 2-3 minutes or more (in the case of multiple attempts necessary) is unfortunately not acceptable with the current project I’m working on.

Given the fact that I can do connection setup, packet sending and connection teardown running in a loop in the VM within a few seconds, without a single failure over a few hundred attempts, tells me that it is possible - I’m gonna update this post when I find out how exactly :slight_smile:

The first command sent would time out and then you would try sending the escape sequence; or at least that is what I would do (though I always control modem power and so it is off while the microcontroller is starting). I’m too lazy to check the actual code today.

If you can get a good connection with the Linux box, then you “somehow” have the proper command sequence. I’m not aware of differences between GPRS-like and LTE SIMcom modems, though I’ve not used the SIM7000E (I guess it was the 7000, not sure).
You can change the command sequence by sending your own commands, but I guess you’ve already seen that when peeking into the code.

Some short update on my further findings (no solution yet, unfortunately):

  • I managed to create a makeshift sniffer using another ESP and a sketch which bridges the ESP UARTs to TCP streams. This works nicely, however I found out that Wireshark won’t accept raw PPP byte streams and is only usable with “pre-packetized” captured streams from proper Win/Unix capture drivers. But even if that would be doable, Wireshark seemingly does not support analyzing TCP-in-PPP-in-TCP data – it’s just too much encapsulation.

  • While tinkering with the mentioned setup, I still got closer to the actual culprit by streaming the raw UART data w/ netcat. Spoiler: I’m quite sure by now that the lib does everything right, apparently. All AT responses from the module indicate that it is ready for making a connection, and the commands for the first (failing) and second (successful) attempt are identical.

  • With the first (in the end unsuccessful) connection attempt however, after a short period of time, sometimes even with very short data exchanges before (e.g. a working TCP CONNECT), I get a NO CARRIER from the module out of nowhere. When the lib notices this, a reconnect is triggered, which ends up being successful. This message somehow always went unnoticed by me w/ the hexdump enabled :confused:

  • Another little difference is the module indicating readyness after modem reset (done on every attempt with rst_mode=1. While the modem LEDs tell me that the reset should have worked, I only get the RDY message on the serial line with the first try. So possibly the modem does actually not reset at all for some reason – which would be again another indicator for non-readyness for the PPP connection for some time.

My next goal is to check the reset line being triggered, and the finding out whether the first failed connection attempt is somehow responsible for the modem to be ready, or if it’s just the time since reset. Hopefully, that’ll shed some light on the issue overall.

(Final) Update, also for those who might need to debug PPPoS as well for some reason:

  • I’ve given up on finding the root cause of the second connect being necessary with the SIM7000. Wait times only helped when going beyond 60sec, which is inacceptable for me. The modem didn’t indicate any status changes during that time (or at any time later), so I have absolutely no clue what changes after those 60 secs.
    However, with the modem, doing two retries (at any time) helped consistently, so I resorted to lowering the LCP echo intervals and allowed fails, to just have the reconnects fired with minimum delay for now. Overall, with this approach I’m seemingly getting a working connection within the first minute after the ESP is powered on.

  • To answer my initial question myself – onboard debugging (at least on the ESP) of the lower PPP workings is indeed possible, by specifying CONFIG_PPP_DEBUG_ON=y in the ESP SDKCONFIG section in mos.yml. This gets you to see LCP/IPCP packets. Maybe it’s also necessary to specify esp_log_level_set("esp-netif_lwip-ppp", ESP_LOG_VERBOSE); in your code somewhere.

Thanks scaprile again for your help!