Fedora Linux Support Community & Resources Center
  #1  
Old 22nd March 2008, 03:20 AM
hw_tech Offline
Registered User
 
Join Date: Feb 2008
Posts: 4
Resource(s) for basic hardware problems?

I have recently found myself supporting more and more Linux servers. I have no Linux background. I know a few things, such as how to get into single-user mode during boot to change passwords in CentOS.

What I am looking for is a resource, or perhaps someone here who can help with the specifics, so that I can do my new tasks.

For example, I know there are commands to deal with hard drive problems, such as smartctl, fsck, badblocks and hdparm. I'm not sure when to use which command and which level I should be at. I know you can run smartctl and badblocks from the root prompt - remotely. I'm not sure about the others. If one suspects a drive is experiencing problems, what would the strategy be: run smartctl to determine whether the drive is failing and, if it shows good, run badblocks (which takes forever)? I'm very cautious about data loss, so I tend to be a bit shy about anything that is or can be destructive, like badblocks.
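
A minimal sketch of the cautious, least-invasive-first order of checks asked about above. It assumes smartmontools is installed and uses /dev/sda purely as a placeholder device. The helper name smart_verdict is made up for illustration; it only interprets the overall-health line that smartctl -H prints for ATA drives, and a passing verdict is not a guarantee that the drive is fine.

```shell
# smart_verdict: read `smartctl -H` output on stdin and print a one-word verdict.
# Hypothetical helper; it only looks at the ATA overall-health line.
smart_verdict() {
    result=$(grep 'overall-health self-assessment test result' | awk -F': ' '{print $2}')
    case "$result" in
        PASSED) echo "healthy" ;;
        "")     echo "unknown" ;;
        *)      echo "failing" ;;
    esac
}

# Suggested order, least invasive first (run as root; /dev/sda is a placeholder):
#   smartctl -H /dev/sda | smart_verdict   # the drive's own health verdict
#   smartctl -a /dev/sda                   # full attribute table and error log
#   badblocks -sv /dev/sda                 # read-only surface scan (the default mode)
# Avoid `badblocks -w` on a disk holding data: that mode overwrites every block.
# Run fsck only on an unmounted filesystem, e.g. from single-user or rescue mode.
```

The smartctl and read-only badblocks steps do not write to the disk, which is why they come before anything else.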

So, is there a good resource which deals with tackling such problems? I've done massive Google searches and forum searches, but no hits seem to address what I'm trying to find out.

I would also like to know service-related info. I can stop and start network, httpd and ssh (pretty basic, I know). But how do I check if other services are running, such as pop, smtp, etc.? How would I restart these?

One particular situation seems to present itself often: a kernel-supported NIC is listed in lspci, but no corresponding ethX file exists. I can't seem to get NICs to be "seen" and used by the OS (typically CentOS). I've tried all the hardware - even trying three different NICs from different manufacturers, in various slots. I've modified the BIOS settings in any number of ways and still nada.

So I would like to know more about how to deal with server issues which are related to hardware and startup problems.

In a few months, I may know enough that I can be a contributor here. Who knows? lol. Thanks in advance,
hw_tech.. who landed in linuxland, and for the most part is enjoying it.
  #2  
Old 22nd March 2008, 03:28 AM
bob Offline
Administrator (yeah, back again)
 
Join Date: Jul 2004
Location: Colton, NY; Junction of Heaven & Earth (also Routes 56 & 68).
Age: 69
Posts: 22,137
I'm sure others will jump in soon to help and I'm little more than a desktop daily user, but this is a resource that I'm sure you'll want to have on hand: http://rute.2038bug.com/index.html.gz
__________________
Linux & Beer - That TOTALLY Computes!
Registered Linux User #362651


Don't use any of my solutions on working computers or near small children.
  #3  
Old 22nd March 2008, 04:27 AM
scottro Offline
Retired Community Manager -- Banned from Texas by popular demand.
 
Join Date: Sep 2007
Location: NYC
Posts: 8,142
The CentOS docs are good too, but they don't cover those things all that well. Unfortunately, there really isn't anything comparable to Microsoft's TechNet.

You might have to wind up buying a book. Of course, most of them will be for people using GUIs and cover very little about important things like this, so you should definitely make an effort to browse through one at a bookstore before purchasing it.

As for services---yeah, when I changed jobs and went to CentOS from FreeBSD, I was in shock--the documentation is horrible. The best place to look is probably the mjmwired pages, (I guess look for Fedora 5, which will be closest.) He mentions most of the services that are started by default on startup--a surprising amount, especially considering it's supposed to be a server O/S.

What is your background? MS? Mac?

Anyway, there's a command called service, but I use the scripts in /etc/init.d/ directly. If you look in there, you'll see the startup services.

There's also a command called chkconfig. You can do
chkconfig --list |grep 3:on

This will show you what services are starting by default in runlevel 3, that is, when you boot into a console (run level 5 is when you boot into a GUI).
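
As a rough sketch of the same idea, the filter below prints only the service names that are on for a given runlevel. The function name services_on is hypothetical, and it assumes the usual chkconfig --list layout of a service name followed by 0:off ... 6:off columns.

```shell
# services_on LEVEL: read `chkconfig --list` output on stdin and print the
# names of services marked "on" for that runlevel (hypothetical helper).
services_on() {
    awk -v want="$1:on" '{
        for (i = 2; i <= NF; i++)
            if ($i == want) { print $1; break }
    }'
}

# Usage on a real system:
#   chkconfig --list | services_on 3
```

Compared with a plain grep, this drops the other six columns so the list fits on a screen more easily.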

You can also type ntsysv at a command prompt. That will bring up a curses-based dialog (ncurses--looks sort of like a GUI but is really text) listing which services are set to start in the current runlevel. By "the current runlevel" I mean that if you turn them off while in runlevel 5, they will still start in runlevel 3.

What I do after installation is something like
chkconfig --list |grep 3:on
This will give me an extremely long list. So I run a little command
for i in <list of all the ones I don't want, e.g., cups bluetooth pcscd etc.> ;
do /etc/init.d/$i stop;done
This will shut down those services. I then repeat the command, but instead of /etc/init.d/$i stop I type
do chkconfig $i off;done

As there are more than a terminal's worth of things running at start, after this I repeat the command to see what's left.
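
The stop-then-disable loop described above could be sketched like this. The service names are just the examples from this post, and the run and disable_services helpers are made up here; the sketch defaults to a dry run that only prints what it would do.

```shell
# Example list only; replace with the services you actually want gone.
UNWANTED="cups bluetooth pcscd"

# run CMD...: print the command when DRY_RUN is 1 (the default), else execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# disable_services NAME...: stop each service now, then keep it from starting at boot.
disable_services() {
    for svc in "$@"; do
        run /etc/init.d/"$svc" stop
        run chkconfig "$svc" off
    done
}

# Usage:
#   disable_services $UNWANTED               # dry run, prints the commands
#   DRY_RUN=0; disable_services $UNWANTED    # really do it (as root)
```

Printing the commands first is a cheap way to catch a typo in the list before anything is stopped.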

Now your NIC problem is unusual and makes me wonder if it's a motherboard issue. I suppose you're at one of those companies where they only spend money on laptops for the bosses when their kid breaks the last one, but consider the servers to be overhead. At my last job, a server would run out of room and they'd give me someone's old workstation and tell me to set up some shares.

CentOS really is pretty good at recognizing the standard server NICs--we mostly use Supermicro and Dell, I think, and I've yet to have a problem. Which CentOS are you using? (You can tell by doing cat /etc/redhat-release)
Sigh, I didn't answer your other question. To tell if a service is running, you can use pgrep, e.g.,
pgrep nfsd
For pop and smtp--hrm, I don't know if pgrep shows that actually--I test those two by using telnet--for example

telnet localhost 25
If I get a connection, it should tell me smtp is running--actually, if your mailserver is postfix, you can do pgrep master
If you get a number back, it means it's running. With httpd, you'll get lots of them, as each httpd process has its own PID (Process ID).
To restart a service, find it in /etc/init.d and do
/etc/init.d/sshd restart
You should be able to figure out their names from looking at the files in /etc/init.d/
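
A small sketch of the pgrep check, wrapped so it prints a readable answer. The function name is_running is hypothetical, and the process names in the comments are only examples; what a given daemon is called varies by system.

```shell
# is_running NAME: report whether a process with exactly this name exists.
is_running() {
    if pgrep -x "$1" >/dev/null 2>&1; then
        echo "$1: running"
    else
        echo "$1: not running"
    fi
}

# Examples (service and process names vary by system):
#   is_running sshd
#   is_running master            # postfix's main process
#   /etc/init.d/sshd restart     # restart via the init script
#   telnet localhost 25          # manual check that something answers on the SMTP port
```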

Don't forget the CentOS forums either, they can be helpful, though sometimes complete newcomer questions will be ignored. (Not like yours, but questions like, how do I set up a mailserver, with no more information than that).
CentOS also has a wiki that has some excellent articles--some of the other articles aren't that great, but, some are really good.

Sorry for the ramble, I'm sleepy. So, back to your NICs. As I said, a bit odd. What hardware (both NIC and server) are you using? There is a GUI tool, system-config-network, that might be useful as you're learning. The files, as I think you know, are in /etc/sysconfig/network-scripts (at least on later versions of CentOS).

lspci sees it. However, you do ifconfig eth0 and get nothing back, correct? This is one I haven't run into with server hardware, and it does make me wonder if there is a problem with the board.
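
For comparing the two views, here is a hedged sketch. lspci only reports what answers on the PCI bus, while an ethX interface appears once a driver has claimed the card, so one can show up without the other. Plain ifconfig also lists only interfaces that are up, so ifconfig -a is worth trying first. The list_ifaces helper is made up for illustration and assumes the kernel exposes interfaces under /sys/class/net.

```shell
# list_ifaces [DIR]: print "name MAC" for every interface the kernel has
# registered. DIR defaults to /sys/class/net; it is a parameter only so the
# function can be tried against a test directory.
list_ifaces() {
    dir=${1:-/sys/class/net}
    for path in "$dir"/*; do
        [ -r "$path/address" ] || continue
        echo "$(basename "$path") $(cat "$path/address")"
    done
}

# Related checks (run as root):
#   lspci | grep -i ethernet    # what is physically on the bus
#   ifconfig -a                 # all interfaces, including ones that are down
#   dmesg | grep -i eth         # driver messages from boot
#   lsmod                       # loaded modules; a missing driver is one possible cause
```

If the card is in lspci but absent from list_ifaces, the driver side (module not loaded, or the driver failing at boot) is a reasonable next place to look before blaming the board.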
  #4  
Old 22nd March 2008, 05:20 AM
hw_tech Offline
Registered User
 
Join Date: Feb 2008
Posts: 4
I really appreciate the resource posted and the detailed response, Scottro. You have it about right - servers are purchased on the cheap. My background was Windows servers back "in the day!" However, my recent past dealt with full-blown storage/fiber RAID systems, as opposed to slotted hardware RAID. I got out of the support arena for general computers and dealt with firmware manipulation and rebuilding failed RAID arrays.

Anyhow, my current job deals with a massive number of tower and racked servers, but nothing on the scale of storage. It's a data center in many respects and each system is for a different client. Many are Windows, which is easy enough. But with Linux I am just an unschooled fool. So, my day is currently spent putting out hardware fires.

I am an expert, as much as one can be, with respect to editing the network config files for static IP, hostname and DNS resolution in CentOS (all very basic). We use no GUI, ever.

Our server motherboards are Gigabyte (the majority) and Supermicro (more and more).

We've noted many of the Gigabyte boards are experiencing great difficulty with SATA2. Perhaps SATA1 as well, for all I know. Since I began here, it's all been SATA2. Often there is nothing you can do; the BIOS simply does not see the drives. Take them to another Gigabyte board with the same model number stamped on it and they work fine. The power supplies seem a bit low (285 watts for up to 4 drives), but the majority seem to have no issue. This suggests to me that a 285 watt power supply is just adequate, and that a slightly greater draw from elsewhere in the system may be enough to underpower the drives, so that they do not show up in the BIOS.

Those NIC situations really cause me a great source of frustration. This has happened several times, with various cards. We install via PXE. When a situation like that occurs, often I am able to install via cdrom and no nic issue. However, there have been numerous times (like about 7-12) where I cannot get a working nic (functional in another system) to show in ifconfig.

I usually use ifconfig to note the hardware address and check the ethX file for that address. In my limited experience, I've found it's a good idea to check that they match, even if it is rarely a problem. I'm not skilled enough to get that address elsewhere. I'm not sure if lspci contains the address - I'll look into that. However, I do note that when I disable the onboard NIC, the disabled NIC still shows in lspci, as does each of the other cards I put in - that is, lspci reflects whatever card I put in the slot, such as Realtek, D-Link, 3Com. And this causes me to wonder: if the slot is bad, then how is lspci seeing it? Also, it's a curiosity for me that CentOS is not getting its info from the same place lspci is, or vice versa. I'm trying to reconcile why lspci sees it and the kernel does not.
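
One way to read the hardware address without ifconfig and compare it to the config file, sketched under assumptions: the kernel exposes the MAC in /sys/class/net/<iface>/address, and the ifcfg file carries an HWADDR= line. The helper name hwaddr_matches is made up for illustration.

```shell
# hwaddr_matches CFG ADDRFILE: succeed if the HWADDR= line in an ifcfg file
# equals the MAC in ADDRFILE, ignoring case (hypothetical helper).
hwaddr_matches() {
    cfg_mac=$(sed -n 's/^HWADDR=//p' "$1" | tr -d '"' | tr 'A-F' 'a-f')
    real_mac=$(tr 'A-F' 'a-f' < "$2")
    [ -n "$cfg_mac" ] && [ "$cfg_mac" = "$real_mac" ]
}

# Usage (paths as on CentOS 5; adjust the interface name):
#   hwaddr_matches /etc/sysconfig/network-scripts/ifcfg-eth0 /sys/class/net/eth0/address \
#       && echo "MAC matches" || echo "MAC differs or HWADDR not set"
```

The comparison ignores case because the config file and sysfs do not always agree on upper versus lower case hex digits.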

I'm not certain if disabling the onboard NIC causes the slotted NIC to become eth0. I recall being somewhat perplexed because on some systems it did and on some it did not. But it wasn't a priority at the time, so I may remember incorrectly. Just something I want to know one day..lol, but it's not a real issue so far.

In the end, we toss the motherboard and get another one, which may or may not be new. But it just seems odd for such a high number of motherboards to fail reading the pci slots - particularly if lspci reads correctly. I was hoping for other answers - I hate having to say, repeatedly (for various reasons other than the nic) It's a bad mobo. I feel like I'm missing something.

But the drives are really the mainstay of what I do so I have to learn the proper strategy of attack and also when and which level, and what to mount and when and all that jazz.

Services have also been a royal pain, but largely because no one there is good with them. So I can't really rely on their advice. I really appreciate your response on services, I will put that in use ASAP.

It's late for me too. I really appreciate your advice, technical assistance and willingness to assist another.

Thank you.
  #5  
Old 22nd March 2008, 05:23 AM
hw_tech Offline
Registered User
 
Join Date: Feb 2008
Posts: 4
Oh, sorry, we typically install CentOS 5.1. Sometimes clients prefer 4.6 - so we install 4.5 and yum it, which I understand brings it to 4.6.

We also use debian and freebsd to a far lesser extent. I have to read the scripts for these. With Centos, it's already ingrained in the brain.
  #6  
Old 22nd March 2008, 08:18 AM
scottro Offline
Retired Community Manager -- Banned from Texas by popular demand.
 
Join Date: Sep 2007
Location: NYC
Posts: 8,142
I'm not too familiar with Deb--with FreeBSD, I find it much easier as it's a far more logical layout--plus, their documentation is what made me realize that it was the writers of Linux docs, not me, who were at fault. FreeBSD will have it all in one easily configured--and well documented--file, /etc/rc.conf.
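
To give a flavor of that single file, here is an illustrative fragment. The host name and the em0 interface name are placeholders, not taken from any real machine; rc.conf is plain sh variable assignments.

```shell
# /etc/rc.conf (FreeBSD): illustrative fragment, values are placeholders
hostname="server1.example.com"   # placeholder host name
ifconfig_em0="DHCP"              # em0 is an example interface name
sshd_enable="YES"                # start sshd at boot
```

Network settings and service switches sit side by side, which is the "all in one file" point made above.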

(bsdnexus.com has some small, but good forums, though there's a few anti-everything not BSD zealots--the best thing to do, till you get to know and love 'em, is ignore them.)

I wonder if it has to do with disabling the onboard nics though--that's about the only thing I can think of that would cause such widespread problems--or, if they're under warranty, SuperMicro, or the vendor who sold them, can sometimes help.
One weird thing I've found--we get our SuperMicro machines from a third party vendor, put together already--that is, we let them put in the raid card, processor and everything else. Some come with one nic, some with two.
Now, we have one machine that had FTP on CentOS 4.something. We updated it on a separate drive and left the old configuration around for a little while, as we made sure we had everything moved over. One nic is on the internal network and the other goes to the DMZ. The funny thing was, that without us changing anything, on the 4.x machine eth0 was one nic and eth1 the other. However, on the 5.x installation, the nics reversed. There was never time to really look into this--we just put it down to some quirk in something or another (It's not relevant to anything here save to say that sometimes it might be completely software related.)
Going back to services, if you are only booting into runlevel three then the ntsysv command should be a good way to view services. Then you can stop (or start or restart) with the /etc/init.d/<servicename> that I gave before. I think the service command uses the same syntax, that is, for example
service httpd start, it's just that I don't use it--no reason, just habit, although it's a bit less typing.

One problem with the CentOS/RH docs is that much of their deployment guide, oddly enough for a server O/S, seems to assume you're running a GUI--sometimes I've found that they won't give the command line way to do something, which is a bit frustrating. FreeBSD's handbook, conversely, assumes you're working from the command line. Again, its documentation is, even in most Linux lovers' opinion, far superior.

On the other hand, much of this probably has to do with the different structures. FreeBSD (and the other BSDs) are a complete system, kernel and userland. However, third-party apps are third party, and the handbook, for example, doesn't go into any more detail on, say, samba than the RH guide does.

Linux itself is only a kernel--device drivers. Everything else, including the shell, is third party, so it's more of a mishmash. One could argue that samba is no more third party than the bash shell.
Tags
basic, hardware, problems, resources