A technical overview of our server issues

Status
Not open for further replies.
Thanks for all your effort, BigTex! :)
 
Adding my heartfelt thanks, too. :)
 
Are ya done yet, Fred?! LOL Just kidding! Keep up the good work, BigTex!

A suggestion for those of us who are officially addicted: I have been saving pages on my computer of threads I have honestly been wanting to read, but with all the breaking stories never seem to make time for! This way it gives me a chance to look them over and ease my WS addiction somewhat. : )
 
Thanks for your efforts on behalf of WS.

I would like to know what changed? Why are we experiencing hangs now when we did not a month ago? And what about the picture threads? Wouldn't they be more of a problem than a signature? I mean, would it make a difference if we just posted a url rather than a direct link? What is the difference between the sig pictures and the picture threads?

You don't have to answer if time does not allow. I was just wondering.
 
My thought also: what was the change that produced this mess? :innocent:
 
I read it all and understood every word :liar: THANKS so much Big-Tex for helping us on this.. hugs to you :blowkiss:
I understood every word as well paperDoll:D

Thanks for your time and effort Big_Tex.. your blood's worth bottling :)
 
Thank you Big Tex & Tricia, your hearts are as vast as the ocean.

As long as I can read the information & opinions, links to pics are just fine, as it is the community here that is rare.

Websleuths ROCKS!!!!!!
 
The problem could be as simple as growth in the number of concurrent connections, growth in database size, or something from an external source (such as a malformed HTTP GET/POST command). The problem could also be the result of a combination where a particular event causes the system to consume all of its available memory and stop working - a small problem that triggers a larger event. The trick now is to watch the server and try to figure it out. My gut instinct says that we are short on memory but that there is something (and it could be as simple as active views) that causes the webserver to run out of memory and freak out.
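
One way to "watch the server" for the memory theory is to sample free memory at intervals and note when it gets low. The sketch below is only an illustration and assumes a Linux host that exposes `/proc/meminfo` (the thread does not say what operating system the server runs). The function names, the sample figures, and the 10% threshold are all made up for the example.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: kilobytes}."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not "Name: value"
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def free_fraction(info):
    """Rough share of RAM still usable: free memory plus buffers and page cache."""
    usable = info.get("MemFree", 0) + info.get("Buffers", 0) + info.get("Cached", 0)
    return usable / info["MemTotal"]


def is_short_on_memory(info, threshold=0.10):
    """True when less than `threshold` of RAM looks usable (threshold is an assumption)."""
    return free_fraction(info) < threshold


# Invented sample for a 1 GB machine; not real data from the WS server.
SAMPLE = """MemTotal:  1048576 kB
MemFree:     20480 kB
Buffers:     10240 kB
Cached:      52428 kB
SwapTotal: 2097152 kB
SwapFree:   104857 kB
"""

if __name__ == "__main__":
    info = parse_meminfo(SAMPLE)
    print(f"usable fraction: {free_fraction(info):.3f}")
    print("short on memory:", is_short_on_memory(info))
```

On a real Linux box the same functions could be fed `open("/proc/meminfo").read()` every minute or so, with the result written next to a timestamp, so that low-memory moments can later be lined up against the times of the hangs.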

[Edited to clarify]
The picture thread is only a single thread of messages with pictures in it, but the pictures in signatures are in all threads. This means that those signature pics are loaded by the server way more often than the pictures in the Picture Thread.
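
To put rough numbers on that, here is a small back-of-the-envelope sketch. Every figure in it is invented purely for illustration (the real view counts are not given anywhere in this thread), and it ignores browser caching and the question of where the images are actually hosted.

```python
def signature_image_loads(site_page_views, sig_images_per_page):
    """Signature pictures appear on pages in every thread, so they scale with all page views."""
    return site_page_views * sig_images_per_page


def picture_thread_loads(thread_page_views, pictures_per_page):
    """Picture-thread images are only loaded when that one thread is viewed."""
    return thread_page_views * pictures_per_page


# Hypothetical daily figures, chosen only to show the difference in scale.
SITE_PAGE_VIEWS = 100_000      # all thread pages viewed across the forum
SIG_IMAGES_PER_PAGE = 10       # posts on a page whose author has a signature picture
THREAD_PAGE_VIEWS = 2_000      # views of the Picture Thread alone
PICTURES_PER_PAGE = 20         # pictures on one page of the Picture Thread

if __name__ == "__main__":
    sigs = signature_image_loads(SITE_PAGE_VIEWS, SIG_IMAGES_PER_PAGE)
    pics = picture_thread_loads(THREAD_PAGE_VIEWS, PICTURES_PER_PAGE)
    print("signature image loads:", sigs)
    print("picture thread loads: ", pics)
```

With these made-up inputs the signature pictures account for 1,000,000 loads against 40,000 for the Picture Thread, which is the point: anything attached to every post is multiplied by the traffic of the whole forum.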


Hope that helps clear it up a tad,
-BigTex
 
As to the picture thread and the signatures - I think a single object (a picture), even a largish one, is not as harsh on the resources as loading a thread with many objects (each object would need at least a single read from the database). If I find that an external source is causing the issues and we can resolve that, then sigs would probably be okay, but that is something we simply don't know yet.

Hope that helps clear it up a tad,
-BigTex

I hope you are considering hacking. Small viruses with tracking cookies can make a website or computer go slow. Many members have experienced this and it has been reported. I would look for a server coming from the UK. I would then report it to LE.
 
While security threats are always a concern, it is far more likely that the server would be attacked directly via a DoS attack or buffer overflow exploit than through manipulated cookie information. The typical client-server model is built around the idea that the server is secured against anything the client attempts to do, since the most likely security threat for a web server is a manipulation of the HTTP protocol. Security remains a constant concern for any machine on the internet, but this problem does not seem to be caused by an external source that I can see yet. Lastly (and I speak from years of experience), LE will rarely even take notice of a security issue unless you can tie it to an offense such as child *advertiser censored*, national security, etc. Most hackers use Far East servers these days or actually use compromised home computers here in the US in a formation called a botnet. It is unbelievably hard to track down a serious hacker - most so-called hackers are termed script kiddies because they attack machines with pre-made scripts/tools they don't really understand. The real hackers (termed crackers by most computer security people) are usually like professional thieves: you only know they were there when you find your stuff gone, and they are long, long gone by the time LE arrives.

-BigTex
 
Well, I think tracking IP's is good too. And, I wouldn't be so sure that LE isn't capable of figuring things out.
 
I think that tracking IPs is a good idea, but if you are dealing with someone who is truly skilled then you will find that they are "washing" their connection through many hosts and often have penetrated the security of the host such that they can modify the logs unless those are offsited to an inaccessible (trapdoor) host. I didn't say that LE could not figure things out; the FBI is very good at tracking hackers and such, but they would have to have a reason to do so. If you call your local police or even the state bureau computer crimes divisions you will find it is like reporting a bike theft - they take your info but there is little more they will do unless it poses other threats such as child *advertiser censored*, security information, banking/financial info, etc. The quickest way is to email the ISP you are getting attacked from and ask them to look into it, then the next ISP up the chain, and so on. It is like tracing a phone call prior to the age of digital switching, where you had to call the phone company of each trunk to track back the line physically. It can be done, but the amount of time and effort has to be worth taking LE away from other things, and I doubt that even an outright blatant attack on a non-profit server such as WS would garner much attention.

Having said all that, I am looking into the logs of each http ip address to see if it correlates with any times we suffered outages - nothing yet however.
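
For anyone curious what that kind of check can look like, here is a rough sketch, not the actual procedure used on the WS server. It assumes the web server writes access logs in the Apache Common Log Format and that the outage times are known; the sample log lines, addresses, URLs, dates, and the five-minute window are all invented for the example.

```python
import re
from collections import Counter
from datetime import datetime, timedelta, timezone

# Matches the client IP and the bracketed timestamp at the start of a Common Log Format line.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\]')


def parse_line(line):
    """Return (ip, timezone-aware datetime) for one log line, or None if it does not match."""
    match = LOG_LINE.match(line)
    if not match:
        return None
    ip, stamp = match.groups()
    return ip, datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")


def requests_before_outages(lines, outages, window=timedelta(minutes=5)):
    """Count requests per IP that arrived within `window` before any outage time."""
    counts = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        ip, when = parsed
        if any(timedelta(0) <= outage - when <= window for outage in outages):
            counts[ip] += 1
    return counts


# Invented sample data (documentation-range IP addresses, made-up times).
SAMPLE_LOG = [
    '203.0.113.7 - - [10/Oct/2006:13:55:36 -0500] "GET /forums/showthread.php?t=1 HTTP/1.1" 200 2326',
    '203.0.113.7 - - [10/Oct/2006:13:57:02 -0500] "GET /forums/showthread.php?t=2 HTTP/1.1" 200 1804',
    '198.51.100.4 - - [10/Oct/2006:09:12:10 -0500] "GET /forums/index.php HTTP/1.1" 200 5120',
]
OUTAGES = [datetime(2006, 10, 10, 14, 0, 0, tzinfo=timezone(timedelta(hours=-5)))]

if __name__ == "__main__":
    for ip, hits in requests_before_outages(SAMPLE_LOG, OUTAGES).most_common():
        print(ip, hits)
```

An address that keeps turning up just before several separate outages would be worth a closer look; one that appears before a single outage proves very little, since ordinary members are browsing at those times too.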

-BigTex
 
Hallo BigTex,

Thanks for your efforts!

You know, Pharlap and Colomom have the minds of trouble shooters! i.e. how long ago were we okay, and what changed just before we were not okay. Troubleshooting Rule #1, "Ask what just changed before this mess?" Troubleshooting Rule #2 "Ask (yourself) can you really trust/believe the answer of the person to whom you just asked question #1?" In some cases they are not withholding information, just very forgetful or ignorant of what has changed. In other cases, they have a selective memory, preferring to cop to "dumb user" and "if you would do this, that wouldn't happen" (i.e. I ain't telling, because you will give me the heat.) 'SCUSE ME, ALL WAS WELL, NOW WE HAVE EXPLOSIONS AND I KNOW NOTHING HAS SUBSTANTIALLY CHANGED??? Granted, something "can change" with the use of the software (software can have its oddball bugs and limits), sometimes the software is not being managed/adjusted to the best of its needs (whoa do I know that), BUT often there is an underlying "change" that occurred triggering the fall of the dominoes.

In 1998, I had a buggy database software running on a friend's server. The software crashed their server so often that it endangered their other customers' sites. We parted on friendly terms, I didn't want them losing their customers due to the "piece" I was running (great idea, poor craftsmanship in many ways). I KNEW my software was buggy, though the manufacturer would never admit it (prideful lot they are), UNTIL they were basically forced to do so (so that people could overcome the failures, or they would lose business, this was high dollar stuff). Fortunately, I had a server to go to who hated the stuff I was running due to the attitude of the software company who put out buggy garbage and would not work with their users, BUT my server knew ways to work around the nasty failures. We just had to keep working with the junk until there were fixes.

In a nutshell (some brainstorming, "needle in a haystack" ideas):
* Was the server's OS upgraded recently? Conflicts with the vBulletin software? Is a change/upgrade needed for RAM or resource apportionment (swap file or whatever you fellows know is needed) due to the OS upgrade?
* Has the vBulletin software received an upgrade lately, conflict with the OS or requiring other setups with resource management? If so, has the vBulletin support group been hit for thoughts?
* Did some new techy at the server company recently piddle with the settings/files that apportion the server's resources OR WS's?
* Has the server company recently added a heavy volume user (or has one of their users recently been hit with heavy volume), or installed a new or updated database software that is sucking the resources as they have been set?
* Is that pathetic little 1GB of RAM actually functioning or did it kick the bucket (or is it kicking the bucket)? Gads, my PC (yes Windows) has 1 GB! This company can't afford more than 1 GB of RAM for its servers?
* Is there some other server type software which has been on that machine that has recently received an upgrade or which is recently taking some new resource sucking hit? (a mail server??)
* How about hardware, have they recently put some hardware on that server which is causing some type of resource sucking hit/conflict?
* Has some new "procedure" for backup or otherwise been instigated lately and that is causing a hit?

Big Tex, you can bet all of us are rooting for you in troubleshooting this. There is no doubt that "how" we run our software can make a dif (I limit all manner of thing with my software, file sizes, attachment sizes, graphic sizes, message sizes, etc.), but has WS changed any of that lately? Did someone mistakenly go in and change what they thought was a slight "of no import" setting and this kicked off the bomb? OR did a recent "above this size in the database" thing set off a well known predicament with the software (requiring another setting change)? Did a recent database size thing recently kick off the type of thing that someone mentioned earlier, something like a flushing of a cache or "resetting" need? I realize I am clutching at some straws and am not well describing some necessities, but thinking of some of the MYRIAD settings my software has, and how I have had to work with my gurus, geeks and server pros to keep things running pristinely.

Bottomline...as you know, a place like WS hopes for a 24/7 presence with "slim to no" downtime, especially that which is unplanned. We count upon those serving our software and our software (and those maintaining it.) Something has hit...and I'm thinking our detectives Pharlap and Colomom may have to find themselves a job in the troubleshooting industry :)

The question is, "What has changed?" AND "Can you get an honest answer to that question from those running the servers?" Further, has anyone pulling the vBulletin strings around here (upgrades and/or setting changes) given you an answer to help settle the mystery?

Wrinkles
 
Wrinkles....you mean I wasn't supposed to click that delete button over there??? OMG :eek: :eek: :eek: :truce:
 
Hallo BigTex,

The question is, "What has changed?" AND "Can you get an honest answer to that question from those running the servers?" Further, has anyone pulling the vBulletin strings around here (upgrades and/or setting changes) given you an answer to help settle the mystery?

Wrinkles

I snipped the quote but be assured I read it in its entirety. The troubleshooting can be simplified a great deal in this case with regards to what has changed, because Tricia leases the entire server (not just a website on someone else's server) and the logs show that no one has modified any software, operating system component, etc. This leads me to think that either the data size or content, which changes constantly, or the memory, which is directly affected by the data and server connections, is the most likely culprit. I fully agree with you that most of the time someone who tells you nothing has changed is not being accurate, either by accident or on purpose, and that is more often than not the problem. A favorite quote of mine is "Change is the enemy of a stable system".

-BigTex
 
You know, maybe this thread needs to get locked so Big Tex can do what needs to be done. And a big thanks for what you are doing, Big Tex.
 
With respect to the virus aspect: ever since WS had that major meltdown, my laptop has been having issues where it freaks out, the screen turns blue with white text for a second, and then it turns itself off. The blue screen with the text is only up for a split second so I have no clue what it says. I have been picking up a lot more cookies than in the past, and really I only visit 5 or 6 sites on a regular basis. I have also been picking up a virus, not sure if it is the same one every day, which my security suite deletes for me. My point being I never had these problems before "the weekend without WS", and they started shortly after I was able to get back on. I know it's probably not related, but I thought I should let you know, just in case.
 
Big Tex is so right on all counts here. And explains it so well.

I would compare reporting a "crack the web server" attack to a person reporting a phishing email. You can do it, I guess. But someone may have given personal info, an account login for example, and not known it until money goes missing. Or you can be suspicious that someone is trying to steal that information, and you can try to report it, but I think they could only give you an incident # to use for insurance purposes in all reality. That's what they do when you report your car broken into, for example.

From what I've seen in the threads on these outages, it looks to me like the timing coincided with a change in facility management, and at least one major outage was the whole data center going down, so I think the odds of something else cropping up independently at the same time are low.

I think there must have been a downgrade of bandwidth available which stressed the server which was already incredibly loaded with this busy site and maybe pushed it into running out of memory on stacked up work to do. I'm sort of amazed one server with 1 GB of memory is serving all this. (or even could do it with more memory, although Big Tex is right about that too.)

Concerning links to pictures, I think they handled it well here. I had links using the BBCode to load the image, which they disabled, and now the link shows instead. The link is a small number of characters, and clicking on it uses the bandwidth of wherever the image is, such as photobucket or whatever, so that's sort of a perfect medium: the links to images are there, but no bandwidth is used here when a viewer chooses to click on one.

I don't know about where other images are stored such as sig graphics and whether the server is involved in downloading any of them.

rd
 