My wife’s new Toshiba Tecra M3 was hobbled by two major Cisco VPN bugs. The first is a bad interaction between the Cisco VPN client and the Intel Wi-Fi driver that leaves the network unusable after resuming from a suspend. The second is that the newest Cisco VPN drivers cause random Matlab execution failures while the VPN is connected. I fixed both bugs and thought others might find the solutions useful. However, Windows Explorer still crashes occasionally, so I’m very interested in a new Cisco VPN Client release that fixes both bugs without any side effects. Having to deal with issues like this is a fairly damning indictment of the whole QA process theoretically employed by three of the biggest tech companies: Cisco, Intel, and Microsoft. Each bug on its own was a nightmare to diagnose (who would think at first that the Matlab problems could be caused by the VPN?).
The suspend/resume error occurs between the last 5 versions (4.0.4 through 4.6.02.0011) of the Cisco VPN client and the Intel(R) PRO/Wireless 2915ABG built-in Wi-Fi running Network Connection Driver 22.214.171.124, dated 2004-10-29. This is the latest Wi-Fi driver for the Toshiba Tecra M3 running Windows XP SP2. I can consistently connect to the VPN with this software. But when I then suspend and resume the laptop, the Wi-Fi no longer works, and therefore the VPN is unable to reconnect. Specifically, the Wi-Fi can see and get on a local network, and it can pass traffic within the subnet to itself, the router, and another computer. But it is not able to communicate beyond the router, including not being able to do any DNS lookups. Choosing Repair for the connection, using ipconfig /release and /renew, and choosing a different network all fail to work.
The problem is caused by corruption in the route tables, which store the Default Gateway. When the VPN is active, this gateway is correctly set to the VPN server, which is 126.96.36.199 in my case. The problem is that when the laptop comes back from a suspend/resume, the VPN Client loses the connection, but it does not restore the Default Gateway to the IP address of my local router, which is 192.168.200.1. At that point, all network connections besides pinging nodes on my local subnet fail, since the computer can’t talk through the router. VPN reconnection also fails since the VPN server is no longer accessible either.
The quick fix (which is also useful as a fallback) is to create a batch file with two commands (or to just run them from a command window). “route -f” deletes the (incorrect) Default Gateway from the route table. “ipconfig /renew” reconfirms the IP address with the Wi-Fi router and also restores the correct gateway to the route table. To create the batch file, right-click on the Desktop, select New Text File, name it “networkfix.bat” and hit enter, right-click the file and select Edit, enter the two commands on two lines, and close and save. Run it by double-clicking.
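For anyone who would rather script the fix than keep a batch file around, here is a minimal Python sketch of the same two steps. It only wraps the two commands named above; the names FIX_COMMANDS and run_network_fix are my own for illustration, and the runner parameter exists purely so the sequence can be exercised without touching the real route table.

```python
import subprocess

# The same two commands the batch file would contain, in the same order.
FIX_COMMANDS = [
    ["route", "-f"],         # clear the stale gateway from the route table
    ["ipconfig", "/renew"],  # renew the lease, which restores the local gateway
]


def run_network_fix(runner=subprocess.run):
    """Run each fix command in order and return the list of exit codes.

    `runner` defaults to subprocess.run; it is a hypothetical hook so a
    stand-in can be passed when trying this out away from the laptop.
    """
    exit_codes = []
    for command in FIX_COMMANDS:
        result = runner(command)
        exit_codes.append(result.returncode)
    return exit_codes
```

Calling run_network_fix() on the affected Windows machine should have the same effect as double-clicking the batch file, assuming the account is allowed to change routes.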
The more permanent fix is to lower the interface metric for the Wi-Fi so that its Default Gateway is ranked higher. However, this seems to occasionally result in Windows Explorer crashing after a suspend/resume. (If Explorer crashes, hit Ctrl-Alt-Del, choose Task Manager, select File: New Task (Run…), type Explorer, and hit enter.) Interface metric is a measure of cost, so lower is better. The default is 30. When the Cisco VPN is connected to the VPN server, it sets the metric to 1, which is the highest priority. After disconnection, when the Cisco VPN incorrectly adds the VPN server as a gateway, it sets the metric to 10. So, by setting the Wi-Fi router metric to 5, we make it lower priority than the VPN gateway while connected to the VPN but higher priority when we are not, which is the correct behavior. To do this, right-click the Wi-Fi icon in the system tray, choose Status, click Properties, select Internet Protocol (TCP/IP), click Properties, click Advanced, uncheck Automatic metric, enter 5, and hit OK twice and Close once. More details, including the route tables, are below.
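The metric arithmetic is easier to see in a toy model. The Python sketch below is not how Windows implements routing; it simply assumes that, among competing default routes, the one with the lowest metric wins, and plugs in the numbers from above (1 while connected, 10 for the stale VPN entry, 30 or 5 for the Wi-Fi router). The name "VPN_SERVER" is a placeholder for the VPN server's address, and DefaultRoute, pick_default, and scenario are names invented for this illustration.

```python
from dataclasses import dataclass

WIFI_ROUTER = "192.168.200.1"  # local router from the write-up
VPN_GATEWAY = "VPN_SERVER"     # placeholder, not a real address


@dataclass(frozen=True)
class DefaultRoute:
    gateway: str
    metric: int


def pick_default(routes):
    """Return the default route with the lowest metric (lowest cost wins)."""
    return min(routes, key=lambda route: route.metric)


def scenario(wifi_metric):
    """Return (gateway while connected, gateway after resume) for a Wi-Fi metric."""
    connected = [DefaultRoute(VPN_GATEWAY, 1), DefaultRoute(WIFI_ROUTER, wifi_metric)]
    # After suspend/resume the VPN client leaves its gateway behind at metric 10.
    after_resume = [DefaultRoute(VPN_GATEWAY, 10), DefaultRoute(WIFI_ROUTER, wifi_metric)]
    return pick_default(connected).gateway, pick_default(after_resume).gateway


print(scenario(30))  # default metric: the stale VPN gateway still wins after resume
print(scenario(5))   # metric 5: VPN wins while connected, the router wins after resume
```

Under this model any Wi-Fi metric strictly between 1 and 10 gives the desired behavior; 5 just sits comfortably in the middle.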
The real fix is for Cisco and Intel to get their drivers to work together. And, of course, Microsoft should not be using a driver architecture that occasionally causes Windows Explorer to crash.
MATLAB/CISCO VPN BUG
The Matlab problem occurs with the two newest Cisco VPN Client versions, 4.6.01.0019 and 4.6.02.0011, and Matlab 188.8.131.525 (R14SP2). Matlab fails to execute while the VPN is connected and, even worse, throws a different error each time it is run. The same Matlab runs complete correctly when the Cisco VPN Client is running but disconnected. I fixed the problem by uninstalling the buggy version and then installing Cisco VPN 4.0.5(B); it also does not occur with versions 4.0.5 and 4.0.4. I’ve never seen a network driver bug cause random errors in a regular program, particularly one that shouldn’t even be accessing the network.
Here are the Matlab errors from 3 sequential runs where Cisco VPN 4.6.02.0011 was connected:
??? Subscript indices must either be real positive integers or logicals.
Error in ==> sortrows>sort_back_to_front at 162
ndx = ndx(ind);
Error in ==> sortrows at 123
ndx = sort_back_to_front(x_sub);
Error in ==> griddata at 75
sxyz = sortrows([x y z],[2 1]);
Error in ==> put_ncom at 157
??? NaN’s cannot be converted to logicals.
Error in ==> interp1 at 142
Error in ==> put_ncom at 180
??? Attempted to access cmsk(-2.14748e+009,55); index must be a positive integer or logical.
Error in ==> creep at 37
if (cmsk(i,j) < 0.5)
Error in ==> put_ncom at 164
ROUTE TABLES FOR SUSPEND/RESUME BUG
The route tables are in the Usenet posting.