Windows Server Forum / Host Integration Server / June 2008
Connection failure
Asher Levi - 02 Jun 2008 13:05 GMT
Hello all,
Since I moved to HIS 2006 we occasionally get event 227:
FRMR received 0x01
(01 = invalid N(R) value sent.)
After this event we get event 23, connection failure (00AD).
The connection then disconnects and reconnects.
I have 53 connections (DLC 802.2) to a fast OSA.
I am using only XID type 0 on these connections.
Does anyone know the cause of this and the cure?
Thanks,
Asher Levi
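For readers unfamiliar with the FRMR reason above: in 802.2 LLC type 2, a received N(R) is acceptable only if it acknowledges frames the station has actually sent and that have not already been acknowledged. Below is a minimal Python sketch of that window check. The function and parameter names are illustrative only (they are not taken from HIS or from any library), and a sequence modulus of 128 is assumed.

```python
def nr_is_valid(last_acked: int, nr: int, vs: int, modulus: int = 128) -> bool:
    """Return True if N(R) lies in last_acked <= N(R) <= V(S), modulo `modulus`.

    last_acked: the N(R) most recently accepted from the peer
    nr:         the N(R) carried by the frame just received
    vs:         V(S), the sequence number of the next I-frame we would send
    """
    # Measure both distances forward from last_acked so that wrap-around
    # of the sequence space (e.g. 127 -> 0) is handled correctly.
    return (nr - last_acked) % modulus <= (vs - last_acked) % modulus


if __name__ == "__main__":
    print(nr_is_valid(5, 7, 9))      # N(R) inside the window
    print(nr_is_valid(5, 10, 9))     # acknowledges a frame never sent
    print(nr_is_valid(126, 0, 2))    # valid across the wrap-around
```

A station that receives an N(R) outside this range rejects the frame with an FRMR, which is why lost or reordered frames on the LAN can show up as this event.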
Neil Pike - 02 Jun 2008 17:20 GMT
Hi Asher, Is HIS2006 the *only* change? I.e. is this the same server that HIS2004 was on, same NIC, same o/s, same NIC driver, same switch port(s) etc. etc.? Also, has nothing changed at the o/s end regarding VTAM/OSA type software? And nothing changed in the network/routing between the HIS server and the mainframe OSA adapter(s)?
With this sort of error I'd be suspecting other change(s)/issues rather than anything HIS related, mainly because I would expect the HIS 802.2/DLC code hasn't had many/any changes done to it. (Not that I know this for sure one way or the other.) In any event, an SNA trace at the link service / 802.2 level is probably going to be needed to help get to the bottom of what is occurring. Does the issue occur randomly to all the connections?
Neil Pike. Protech Computing Ltd
Microsoft SNA/HIS MVP
https://mvp.support.microsoft.com/profile=BE66F0D8-9D78-47EF-840A-08E6D8522A2D
http://www.linkedin.com/in/neilpike
Stephen Jackson [MSFT] - 02 Jun 2008 20:49 GMT
In addition to what Neil indicated, a network trace (Network Monitor, Sniffer, Ethereal, etc.) of the DLC traffic capturing the problem will also be needed. In most cases, these types of things occur because of network glitches. If there are routers/switches/bridges between the HIS 2006 Server and the mainframe, you'll be best off capturing network traces on the network segment on which HIS 2006 is connected and on the network segment where the mainframe is connected. These two concurrent traces will provide a good view of the traffic that is seen on both network segments.
Stephen Jackson
Microsoft® HIS Support
Please do not send e-mail directly to this alias. This alias is for newsgroup purposes only. This posting is provided "AS IS" with no warranties, and confers no rights.
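Once both traces are captured, the comparison described above amounts to finding frames that appear on one segment but not on the other. Here is a rough Python sketch, assuming each capture has already been exported to a list of (source MAC, destination MAC, N(S)) tuples. That record layout and the function name are hypothetical and do not correspond to the export format of any particular tool.

```python
from typing import Iterable, List, Tuple

# Hypothetical record layout: (source MAC, destination MAC, N(S) of the I-frame).
Frame = Tuple[str, str, int]


def missing_frames(sent_side: Iterable[Frame],
                   received_side: Iterable[Frame]) -> List[Frame]:
    """Return frames seen on the sending segment but absent on the receiving one.

    The order of the sending-side capture is preserved in the result.
    """
    seen = set(received_side)
    return [frame for frame in sent_side if frame not in seen]


if __name__ == "__main__":
    his_segment = [("his", "osa", 0), ("his", "osa", 1), ("his", "osa", 2)]
    host_segment = [("his", "osa", 0), ("his", "osa", 2)]
    print(missing_frames(his_segment, host_segment))
```

Because N(S) wraps around, the same tuple recurs in a long capture, so in practice the comparison would need to be done over short time slices or with timestamps added to the key.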
Asher Levi - 03 Jun 2008 08:10 GMT
Hello Neil & Stephen,
Thank you very much for your information.
I moved to HIS 2006 from HIS 2000.
The main changes were:
Many more connections per server (53 compared with 30).
I now use dual homing (NFT mode). (I know that SLB mode is prohibited.)
I now use only XID 0 connections. (I used to use XID 3 as well.)
The event usually occurs about once a day.
It usually affects only one or two connections, apparently at random.
There are also some days with no problems at all.
I saw some receive errors (no buffers) in the network card's statistics.
On the mainframe side the only change was to switch the node type to XID type 0.
(No more CPNAME and so on).
As usual, the network team didn't see any problem on the network port…
In conclusion, I also suspect that the event is related to network problems,
and I will focus on this.
Best regards,
Asher Levi
Chris Mason - 03 Jun 2008 17:08 GMT
Asher
Using XID 0 rather than XID 3 is a step backwards and may be the cause of trouble, possibly even the trouble you are experiencing. There are some parameters, for example, the PU statement MAXDATA operand value which are passed from one link station to another when using XID 3 which need to be specified - and, of course, specified correctly - when using XID 0.
Chris Mason
Asher Levi - 05 Jun 2008 07:45 GMT
Hello Chris,
I moved to XID 0 because I no longer need APPC/LU 6.2 (ILU).
At the moment I am checking the network side of the connection.
By the way, I also have other HIS servers with only 2 XID 0 connections (for printing),
and they have never had this problem.
So I also think it is related to the number of connections (53).
Anyway, if it turns out not to be a network problem, I will focus on
the connection type (XID and so on).
Best regards,
Asher
Chris Mason - 05 Jun 2008 19:32 GMT
Asher
I made the point about your changing from using XID 3 to XID 0 because this is definitely a step backward in general. There was only a slight possibility that this change related to your problem.
It is clear you do not have a good grasp of the difference between and the significance of XID 0 and XID 3. To remedy this situation, take a look at Chapter 3 of SNA Formats:
http://www.elink.ibmlink.ibm.com/publications/servlet/pbi.wss?CTY=US&FNC=SRX&PBL=GA27-3136-20
I make this claim based on your comment "(No more CPNAME and so on)". When you use connections to VTAM over a LAN, you use "switched" as opposed to "nonswitched" definitions in VTAM. When you use "switched" definitions, you use PU statements in a Switched Major Node. There has to be some matching between such a PU statement and an incoming connection attempt - I'm leaving aside the possibility to use the ISTEXCCS exit or the DYNPU=YES function[1]. With XID 0, you are obliged to match on the basis of the "node identification" field of the XID - which is common to all XID formats. This breaks down into two parts which are represented in operands of the PU statement as the IDBLK and IDNUM operands. If you are using XID 0, you must be using these operands on your PU statement - or if you are not, please let us all know what you are doing!
When you were using XID 3, you appear to have been using the PU statement CPNAME operand. Perhaps you were not aware that, having specified the CPNAME operand, you no longer needed the IDBLK and IDNUM operands and, indeed, specifying all of them probably led you to assume that VTAM was imposing excessive definition activity on you, a reputation VTAM has spent most of its working life - since the bad old days of the late seventies! - trying to shake off!
As for the "so on", if you examine the fields in the XID 3, you could see - if you had the necessary education - that a number of the operands of the PU statement were no longer needed. A number of seemingly needed parameter matches - matching additional to that required for identification - evident in the operands of the PU statement is *removed* by use of the XID 3. Thus there is no "so on". In changing from XID 3 to XID 0, you may find that some of the 802.2 parameters which were passed automatically in XID 3 now need explicitly to be specified with XID 0. Clearly in going from XID 3 to XID 0 you lay yourself open to getting such matches wrong. SNA designers went to a lot of trouble in creating XID 3 as an enhancement to XID 0.
I expect that the significance may not be clear from the information provided by the HIS product and that the HIS documentation is not clear enough. I participate in this newsgroup in order to "catch" the VTAM and SNA issues rather than solve problems deep within HIS - about which I know very little. This XID 0/XID 3 "thing" is very definitely an SNA issue - and also maybe a VTAM issue - that needs addressing!
However, you seem to have got some sort of wrong impression that XID 3 applies only when using APPC/LU 6.2 in the context of SSCP-independent LU (ILU) definitions. This is quite wrong. It was with considerable relief that those who were obliged to create definitions for the very popular 3174 using only SSCP-dependent LU types 2 and 3 greeted the advent of the XID 3 and threw out definition activity associated with XID 0 with glee.
I can't really help you with your basic problem. It seems rather unlikely that the XID format has anything to do with it. Either you should stick with XID 0 and make sure you do all the necessary desk checking with your VTAM counterpart to make sure that all the necessary parameter matches are there - or you just use XID 3 and use either the CPNAME operand - preferred - or the IDBLK and IDNUM operands - but there's no need whatsoever to use both.
Your problem is at the level of the 802.2 protocols. I could guess that if there is a buffer constraint which somehow extends to the adapter where the protocol is observed, it could lead to the sort of problem you describe. I would expect your Microsoft helpers to be of more use here.
If a buffer problem in the adapter is really to blame, you might like to relieve the stress - as it were - by operating with a smaller receive window for the adapter. When I taught this topic I used to encourage the use of an NCP tool which indicated what the required window size was for the sort of traffic I was generating in my test systems over the local centre's LAN. It was interesting that the required window size was never more than 11 or 12 and so I set my window size maxima to 20 so that, under normal operating conditions, the window size was never a constraint on the traffic.
Incidentally, you may decide to specify a receive window for the adapter of, say, 20. If you use XID 0, you now need to make sure that the value of the MAXOUT operand of the matching PU statement is 20. If you use XID 3, you can leave out the MAXOUT operand since the value you specified for the adapter is conveyed to VTAM in the XID 3 data. All so, so much easier!
The reason I'm making all this fuss is that I don't want anything in the archives of the newsgroup which begins to imply that changing from XID 3 to XID 0 is any way to try to solve problems - of any sort!
Chris Mason
[1] Since you point out you are not using ILU, you cannot be using DYNPU=YES.
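The desk check described above for XID 0 can be sketched as a simple comparison of the link-station settings against the PU statement operands. This is only an illustration in Python: the dictionary keys on the HIS side (such as `receive_window`) are made-up names, and only the receive window to MAXOUT pairing comes from the discussion above. Any further pairs would have to be supplied by whoever knows both definitions.

```python
from typing import Any, Dict, List, Optional, Tuple

# Only this pairing is taken from the thread; the HIS-side key name is hypothetical.
DEFAULT_PAIRS: Dict[str, str] = {"receive_window": "MAXOUT"}

Mismatch = Tuple[str, Any, str, Any]  # (HIS key, HIS value, PU operand, PU value)


def xid0_mismatches(his_link: Dict[str, Any],
                    vtam_pu: Dict[str, Any],
                    pairs: Optional[Dict[str, str]] = None) -> List[Mismatch]:
    """List parameters that are missing on either side or that differ.

    With XID 0 nothing is negotiated on the wire, so every paired value
    has to be kept in step by hand.
    """
    pairs = DEFAULT_PAIRS if pairs is None else pairs
    problems: List[Mismatch] = []
    for his_key, operand in pairs.items():
        his_value = his_link.get(his_key)
        vtam_value = vtam_pu.get(operand)
        if his_value is None or vtam_value is None or his_value != vtam_value:
            problems.append((his_key, his_value, operand, vtam_value))
    return problems


if __name__ == "__main__":
    print(xid0_mismatches({"receive_window": 20}, {"MAXOUT": 7}))
```

An empty list means the paired values agree. It says nothing about parameters that were never added to `pairs`.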
Asher Levi - 12 Jun 2008 09:01 GMT
Hello Chris,
Thank you for your great tutorial on VTAM.
I didn't mean to recommend that anyone use the XID 0 format.
Indeed, XID type 3 is the preferred method.
My problem didn't come from the VTAM side.
In my experience (10 years with SNA gateway/HIS and 15 years with MVS/VTAM),
XID type 0 works fine if you do some tuning (especially of the DLC timers).
Furthermore, for me it solved the problem of connections stuck in pending state (in the SNA gateway) after an IPL.
The pending connections occurred randomly after an IPL of the mainframe.
Best regards,
Asher
Asher Levi - 03 Jun 2008 12:10 GMT
Hello again,
I just noticed a very important point (I think…).
Event 227 and the problem always occur at the same time as someone updates the pools, workstations and so on (an update to snacfg.com, event 670).
So whenever I see a 227 event, a 670 event appears at the same time.
By the way, I also have many 670 events without any problem.
Does this give us any clues?
Best regards,
Asher Levi
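To put numbers on the observation above, the event log can be split into 670 events that have a 227 event close by and 670 events that do not. Here is a small Python sketch, assuming the events have already been exported as (timestamp, event ID) pairs. The record layout, the function name and the 60-second window are all assumptions made for illustration.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

Event = Tuple[datetime, int]  # (timestamp, event ID), hypothetical record layout


def split_triggers(events: Iterable[Event],
                   trigger_id: int = 670,
                   symptom_id: int = 227,
                   window: timedelta = timedelta(seconds=60),
                   ) -> Tuple[List[datetime], List[datetime]]:
    """Return (triggers with a symptom nearby, triggers with none nearby).

    "Nearby" means within `window` before or after the trigger.
    """
    events = list(events)  # allow generators; we iterate twice
    symptoms = [t for t, event_id in events if event_id == symptom_id]
    with_symptom: List[datetime] = []
    without_symptom: List[datetime] = []
    for t, event_id in events:
        if event_id != trigger_id:
            continue
        hit = any(abs(s - t) <= window for s in symptoms)
        (with_symptom if hit else without_symptom).append(t)
    return with_symptom, without_symptom


if __name__ == "__main__":
    base = datetime(2008, 6, 3, 12, 0, 0)
    log = [(base, 670),
           (base + timedelta(seconds=5), 227),
           (base + timedelta(hours=1), 670)]
    hit, clean = split_triggers(log)
    print(len(hit), "of", len(hit) + len(clean), "670 events had a 227 nearby")
```

A high share of "clean" 670 events, as reported above, would suggest that the configuration update is at most a contributing factor rather than the cause.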
Stephen Jackson [MSFT] - 03 Jun 2008 16:46 GMT
Updating the configuration by updating a Pool or Workstation wouldn't have any effect on the underlying DLC connections. At least, I've never seen this occur before.
Neil Pike - 05 Jun 2008 00:00 GMT
Asher - are the same NIC(s) used for admin/users as well as for the DLC connections to the mainframe? If so then it might be a network issue with traffic/congestion/collisions caused by the extra admin access.
I'd expect to see network errors on the switch ports and/or switch/nic negotiation being done at half duplex for something like that....
Asher Levi - 05 Jun 2008 07:28 GMT
Hello Neil,
I have two Gigabit server adapter ports (HP 373I) teamed together (NFT mode),
so I believe the network card cannot be a bottleneck.
When I checked the network statistics I saw some receive errors (no buffers),
but my network team didn't find any problem with the network port.
Anyway, as you wrote before, I also believe that the problem is related to the network.
So I disabled one of my network cards in order to stop the team,
to check whether the NIC teaming is related to the problem.
I know for sure that if you use NIC teaming in SLB mode you get a similar problem.
I made the change yesterday and the error has not recurred since.
I will post an update when I have the final results.
Best regards,
Asher
Stephen Jackson [MSFT] - 05 Jun 2008 21:00 GMT
Asher,
Since you mentioned that your NICs are teamed, I'd like to mention that DLC does not support NIC teaming. We have seen various issues when using NIC teaming with DLC connections. In some cases, customers never have a problem. However, we have found that random network related issues have been resolved when NIC teaming was disabled.
Thanks...
Neil Pike - 05 Jun 2008 21:25 GMT
Hi Asher - let us know the outcome. As you say, it shouldn't be a network throughput/congestion issue with 1 Gbit/sec NICs, as long as the NICs/switch ports aren't misconfigured or erroring, that is. Hopefully your network dept know what they're doing; my experience is that, unfortunately, they often don't...
Asher Levi - 12 Jun 2008 08:28 GMT
Hello Neil & Stephen,
I disabled one of the teamed NICs on all my HIS servers and since then I have not had the 227 event.
I have not yet decided to remove the team definition entirely.
Removing it would mean downtime for the users.
Furthermore, if a NIC fails we can enable the other NIC immediately.
If I get more network problems I will remove the team definition entirely.
Thank you for your help with this issue.
Neil Pike - 12 Jun 2008 17:39 GMT
Thanks for the update Asher. Yet another odd glitch that can be put down to teaming drivers then! It won't be the last one either.