Distinguish backup file names or pay the price!

So far, no one has found exercise to be beneficial to servers. Purposeless repetitive motion may be good for human muscles, but your SQL Server instance experiences no gain for the pain. Here’s a good example: taking useless backups. Let me explain…



(“Did she say useless backups? I’ve never heard of such a thing!” Yeah, just wait.)

Backup file names are critical

Traditionally, backup files are named after the database and the backup type, and given a timestamp. So you’ll see something like master_FULL_20170101.bak. If you like to stripe your backups, you might name the files something like 1of5MasterFull20170101.bak, and so on.

But I have run across shops that take backups without bothering to timestamp the file name: master_FULL.bak. These shops either overwrite each backup file with each successive backup (using INIT and FORMAT), or add to the backup set (which I find mildly annoying, but to each their own).

The problem with using the same backup file name over and over arises when you have a cleanup mechanism that deletes old backup files!

The same-name cleanup issue

Let’s say that in your shop, you have Minion Backup (MB) installed and running with the following options:

  • INIT is enabled
  • FORMAT is enabled
  • Backup file retention (in hours) is 48, so we’ll keep 2 days’ worth of backups
  • Backup name is set to %DBName%%BackupType%.bak, which works out to DB1Full.bak for a full backup of DB1.

Note: For more information on the %DBName% and %BackupType% inline tokens, see “About: Inline Tokens“ in our help center.

Here is what happens:

  1. On day 1, MB takes a backup of DB1, to \\NAS1\SQLBackups\DB1Full.BAK.
  2. On day 2, MB takes a backup of DB1, which overwrites the file \\NAS1\SQLBackups\DB1Full.BAK.
  3. On day 3, MB takes a full backup of DB1 (which overwrites the same file). And then the delete procedure sees that it has a file from day 1 (>48 hours ago) that needs deleting. And so it deletes \\NAS1\SQLBackups\DB1Full.BAK. Remember, this is the file that MB has been overwriting again and again.
  4. On day 4, MB takes a backup of DB1, to \\NAS1\SQLBackups\DB1Full.BAK. Then, it sees that it has a file from day 2 that needs deleting, and so deletes \\NAS1\SQLBackups\DB1Full.BAK.

See? From Day 4 on, we’re creating a backup file just to delete it again!

Fixing the issue if you have MB

One excellent way to figure out if you have this problem is to notice that, hey, you don’t have any backup files. Another indicator in Minion Backup is if you see “Complete: Deleted out-of-process. SPECIFIC ERROR:Cannot find path ‘…’ because it does not exist.” in the Minion.BackupFiles Status column.

But the real smoking gun is if you haven’t time stamped your backup files in Minion.BackupSettingsPath. Here’s how to fix that:
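A sketch of the fix, assuming a FileName-style column in Minion.BackupSettingsPath and %Date%/%Time% inline tokens (verify both against the “About: Inline Tokens” help topic before running):

```sql
-- Sketch only: the column holding the name pattern and the exact inline
-- tokens (%Date%, %Time%) are assumptions; check the Minion docs.
UPDATE Minion.BackupSettingsPath
SET    FileName = '%DBName%%BackupType%%Date%%Time%.bak'
WHERE  FileName = '%DBName%%BackupType%.bak';
```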

That will give each file its own name, and eliminate the chance of this ever happening.

And it will prevent your server from performing useless exercise. Me? I’m going to hit the gym.

Technology: Shoot yourself in the foot faster!

Most of us love technology. And most of us have experienced how blindingly fast technology can provide some degree of cataclysmic failure. Especially when you start by saying, “I should go ahead and do that real quick…”


I’ve been getting better and better about planning out my work week, listing out what needs doing each day, and sticking to it. So I knew that today I would be working a little on a few website updates in the morning, and then the rest of the day would be free for coding.

(I pause here for the audience to have a hearty chuckle.)

So here we are. I log onto the WordPress back end for MinionWare.net, change the wording and format of some text elements, and double-check that they look okay.

The other thing on my list is making sure I have a specific plugin installed. Is it? Why, yes it is! That’s grand, I guess my work here is about done.

But hey, look at that. A bunch of the other plugins need updating. I should go ahead and do that real quick…

(I pause here for the audience’s gasps of horror.) But wait, let me reassure you: I did pause and download a fresh WP backup.

And then I updated three WordPress plugins. Why, oh why did I do that? Why, when I know better?

The website came up as gibberish. Every page, every post, came up as complete gibberish. Of course I immediately restored the WP backup…which did absolutely nothing to help. It turns out that these backups are pretty much for content, not for restoring plugins to a specific state.

Goody.

After a good deal of fighting and rage-coffee, I narrowed everything down to one culprit, killed the plugin with fire, and confirmed that the site was up and looking good.

Why is it so much easier to destroy than to fix?

This is totally a common theme in life. From the big glass bowl my kid shattered in the sink, to the car we (I won’t say who) scraped against a wall, to the appointment we missed. So much of our time is spent cleaning up mistakes, paying to have them fixed, and making up for lost time.

And technology lets you break things so much faster! I can drop a bowl and spend 30 minutes cleaning it up, but I can drop 2 TB of data without a thought and spend weeks trying to get it back. (I mean literally, that’s what it takes to drop 2 TB…not thinking at all.) I can scrape the paint on my car and just leave the thing scraped…but I can bring down a years-old website with the click of a button.

You see a theme here?

I’m starting to see that a large percentage of an IT professional’s life is (or should be) disaster prevention – teaching yourself to triple check what server you’re connected to, making sure backups are up and running. And another very large percentage is disaster recovery, in one form or another. Yes, of course I mean traditional SQL disaster recovery. But I also mean recovering from the borked website, the forgotten perfmon trace, the third missed meeting this week (where your manager noticed particularly that you weren’t there).

Prevention and recovery

With technology, as with life, automation is a huge part of the solution. But, it’s not the whole solution.

  • We can automate database backups.
  • We can automate WordPress Backups.
  • We must set reminders for meetings.
  • We must set reminders to get the car’s oil changed. (Also: to keep off your dang phone while driving.)
  • We should create standard procedures for maintenance and downtime.
  • We should create standard procedures for managing personal tasks. (I’ve become a huge fan of the Bullet Journal method for this.)

And of course, we can’t prevent everything. So sometimes, we spend the morning staring furiously at wp-admin folders in FileZilla, instead of coding.

Good luck with the chaos.

-Jen

Five minutes to freedom: Installing Minion Enterprise

Get your repo server, your Minion Enterprise download, and your license key (trial or permanent). Install and config take just five minutes, and your entire enterprise is in order.


1. Install

Extract the MinionEnterprise2.2Setup.exe and run it on your repository server. Give the installation “localhost” for the instance name.

2. Install the license key

Once you receive your license key – trial or permanent – from MinionWare:

  1. Copy License.txt to C:\MinionByMidnightDBA\Collector on the repository server.
  2. Rename the file to License.dll.
  3. From PowerShell, run the command:
    C:\MinionByMidnightDBA\Collector\License.exe Install

3. Configure email for alerts

Connect to the repo server and insert your alert email:
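For example (the table and column names below are hypothetical placeholders; use the real ones from the Minion Enterprise setup guide):

```sql
-- Hypothetical names: substitute the actual alert-settings table and column
-- from the ME documentation.
UPDATE dbo.AlertSettings
SET    EmailTo = 'dba-team@yourcompany.com';
```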

4. Configure servers

Insert (or bulk insert) servers into the repo:
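Something like this; dbo.Servers is the table the collector jobs depend on, though treating ServerName as the only required column is an assumption here:

```sql
-- dbo.Servers drives the collection jobs. Check the ME docs for the full
-- column list; the server names are examples.
INSERT INTO dbo.Servers (ServerName)
VALUES ('SQLProd01'), ('SQLProd02'), ('SQLDev01');
```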

5. You’re done!

Jobs will begin kicking off to collect data within the next hour. Some jobs run hourly, some run daily or weekly or monthly. You’ll start getting tables full of useful data, and email alerts that actually mean something.

If you’re impatient to start getting some of the good stuff right away, kick off the CollectorServerInfo% jobs manually. These populate data in the dbo.Servers table, which other jobs need to run. You’ll start noticing data in the Collector.DBProperties, Collector.ServiceProperties, and Collector.DriveSpace tables first, as these jobs run most frequently.

While ME is taking care of your shop for you, you can get to know it better with some of our resources:

Download our free maintenance modules:

  • Minion Backup: Free SQL Server Backup utility with world-class enterprise features and full lifecycle management.
  • Minion Reindex: Free SQL Server Reindex utility with unmatched configurability.
  • Minion CheckDB: Free SQL Server CHECKDB that solves all of your biggest CHECKDB problems.

SQL maintenance is a lifecycle, not an event!

You’ve heard me talk about this many times, in so many different ways, but it’s worth repeating: SQL maintenance lifecycles are important.

People who disagree, disagree because they spend too much time firefighting and don’t have time to really think about it. Don’t know anything about SQL maintenance lifecycles? Let’s touch on a couple of points…today, we’ll cover SQL backup.


SQL maintenance lifecycles: Backups

Backups have the biggest lifecycles of all, because the uses for backup files are so varied.

They’re used for restoring databases to the original server, sure.  But here’s a short list of the other kinds of things you need to do with backup files:

  • Restore to development, QA, etc.
  • Send to vendor
  • Initialize replication
  • Initialize Availability Groups
  • Restore to a secondary server to run CheckDB

A backup is not a single event

A database backup isn’t a single event.  But unfortunately, nearly every vendor out there – and most DBAs – treat backups as a single event that has a start and a finish.  “Look, I took a SQL backup and produced a file, and now I’m finished!”  But that’s not remotely the case.  We use and shuffle SQL backup files around all day and all week long on our various servers.

This is why I’m so against those tools that I call “network backup tools”.  You know the ones: Backup Exec, AppAssure, etc.  Your network admin loves those tools, because he doesn’t understand SQL and he wants the same backup utility for the whole shop.

But network backup tools simply aren’t going to cut it for SQL maintenance lifecycles, because they take snapshots of the data and log files.  And quite often those two files aren’t fully synchronized when the snapshot takes place, so you wind up creating a new log file and losing whatever data was in there.  Not only that, but you can only attach those files somewhere, usually on the same server they came from.

Without the understanding that your maintenance has a lifecycle, you wind up with a lot of piecemeal solutions.  You write something to restore a database on a development box, then another DBA does the same thing, then another, then another.  Soon you have a huge mix of solutions that need support, all created by a mix of DBAs over the years…it’s an unmanageable web of solutions, each one with its own quirks and limitations.  That’s no way to run a shop, and it’s no way to support your data.

SQL Maintenance Lifecycles: SQL backup done right

This is why we’ve made Minion Backup the way we have.  Minion Backup is the only backup tool on the market that treats your environment holistically.  Minion Backup understands your SQL maintenance lifecycle, and it works for you by providing built-in solutions that you just deploy.  There’s no need to write your own processes, or manage something separate, because it’s all built into Minion Backup by default.

Say, for instance, that you need to back up your SQL certificates (because, you know, who doesn’t?).  With Minion Backup you can easily back up every certificate, even to multiple locations; in fact, you can back up your certificates to as many locations as you need.  This functionality is built into Minion Backup, and setting it up is a simple table setting.

Not only that, but since we store the private key password encrypted, and give it to you in the log, you can even rotate the private key passwords on any schedule you like to increase your security.  This way if you change the private key password every month, then you’ve always got access to the encrypted value in the log so you can restore that certificate with ease.  This is what we mean by SQL maintenance lifecycle management.  All of our SQL maintenance products have this level of lifecycle management.  And we’re improving them all the time.  We work closely with our users to give them built-in solutions that matter.

All of our SQL maintenance products are completely FREE, so download them and try them now.  And if you like what you see, then try our enterprise products and see the power of bringing your SQL maintenance lifecycles into the enterprise architecture.

Download our FREE maintenance modules below:

  • Minion Backup: Free SQL Server Backup utility with world-class enterprise features and full lifecycle management.
  • Minion Reindex: Free SQL Server Reindex utility with unmatched configurability.
  • Minion CheckDB: Free SQL Server CHECKDB that solves all of your biggest CHECKDB problems.

 

Integrity Checks are Useless*

Integrity checks have become more and more important over the years, as companies store more and more data. This data is absolutely critical, so when corruption happens, it can break a company. But in all but the smallest shops, running any kind of integrity check is all but completely useless. Unless you have Minion CheckDB.


Integrity checks without management

DBCC CHECKDB was originally written without consideration of management. For a very long time, you could only output CHECKDB results to the client – meaning you had to run it by hand – or to the error log. Seriously, what good is that? Later, Microsoft added the ability to output CHECKDB results to a table, but that remained undocumented for years so very few people knew it even existed.
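To illustrate, the table-style output option looks like this (it existed for a long time before it was documented):

```sql
-- Return CHECKDB results as a result set (one row per finding) instead of
-- messages, so they can be captured into a table.
DBCC CHECKDB ('DB1') WITH TABLERESULTS, NO_INFOMSGS;
```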

That’s not the only problem. A lot of DBAs don’t run CHECKDB at all, because large databases take a long time to process. CHECKDB takes an internal snapshot for its operations. For large databases, that snapshot can fill up the drive, killing the integrity check process entirely. So, you have nothing to show for your efforts. Some DBAs tried to get around this issue by moving to CHECKTABLE instead, but that process comes with the same issues, and is even harder to manage because there are so many more tables than databases.

Integrity checks without reporting

Another reason that running DBCC CHECKDB is useless is this: reporting is still very difficult. Let’s say you have 50 servers – yes, I know a lot of you may have many more than that – and CHECKDB is set to run through a maintenance plan, outputting results to the SQL error log. In this scenario, you have to mine the log for CHECKDB results. That means you not only have to know when the CHECKDB completes on each and every server, but you also have to know what the log entry will look like, and how to parse out the results so they’re readable…again, on every single one of 50 servers. There is simply no easy way to consume CHECKDB results on any kind of scale.

Minion makes integrity checks useful again

The whole goal of Minion CheckDB was to transform these issues surrounding integrity check processing into simple settings. Now DBAs of every skill level can have a stable, standardized integrity check process for even their largest databases.

Minion CheckDB has a huge number of groundbreaking features. MC solves the reporting problem by capturing integrity check results to our own log table, along with a huge amount of additional information. Now it just takes a simple query to get the results. Instead of the native raw CHECKDB results, the MC log table tells you how many corruption errors were encountered. Then you can drill deeper from there if you like.

We’ve also solved the problem of the large database that can’t be processed because it runs out of snapshot space. In fact, we’ve solved that a few ways, and I’ll discuss a couple of them here.

  • You can create a custom snapshot on a different drive that has more space. MC manages those snapshots for you, so it’s literally a set and forget operation.
  • Or, you can restore a recent backup of your database to a remote system and run CHECKDB against that instead. Remote CHECKDB even brings the results back to the primary server, so you don’t have to go hunting for them. Again, this is a setting; you configure it, and then MC manages the whole process for you.

Of course, there’s more

There are many more features in Minion CheckDB that I didn’t mention. But since MC is completely free, you can download it and install it on as many instances as you want. Go explore!

Put MC on all your systems and start doing integrity checks again. We’ve taken away all the excuses. And when coupled with Minion Backup and Minion Reindex (both also free), you’ve got a maintenance suite that meets all your needs from planning to reporting, and everything in between.

 

*Without proper management and reporting!

Webinar: Introducing Minion CheckDB!

Minion CheckDB will be available for download on February 1, 2017! In celebration, we’re having a Minion CheckDB webinar on Wednesday, February 1 and on Friday, February 3 at 12:00 PM.

 


Minion CheckDB is available for download as of February 1, 2017!

In celebration, we had Minion CheckDB webinars.

Done: We gave away 3 licenses of Minion Enterprise, one to a lucky winner* at each webinar. Winners had to be present to win!

Minion CheckDB completes the MinionWare maintenance and backups suite in style. Each solution is plug-and-play for the busy DBA, and deeply configurable for those shops with in-depth needs.

This new module is MinionWare’s most ambitious free release yet, featuring all of the rich scheduling and logging functionality in previous products, plus remote CheckDB, multithreading, custom snapshots, rotational scheduling, and more.

In this webinar we’ll show you how this FREE tool by MinionWare can meet your needs with almost effortless management. You are going to LOVE it.

If you simply can’t wait, check out the Minion CheckDB tutorials!

*(Giveaway offer is not open to previous ME winners.)

Managing Backup Issues in Availability Groups

Let’s talk about some of the challenges that DBAs face when managing backups with Availability Groups. Then we’ll talk about solving these issues with Minion Backup.

A user asked me the other day how to go about managing backup issues in Availability Groups.  Well, Minion Backup is fully Availability Group aware, and has innovative features that make it a complete solution.  This is one of the big reasons we coded MB: any solution for dealing with the issues listed below would otherwise have to be manually coded. We wanted to stop re-inventing the wheel over and over.


Managing Backup Issues in Availability Groups – The Issues

1. Back up from…wherever

Availability Groups don’t let you pick a specific server to run backups on.  You can only give weight to a specific node, but you can’t say that you want logs to run on Server1 and Fulls to run on Server2, etc.  You can code those choices in, but if you decide to make a change, you have to make it on all the nodes manually.

2. Mixed file locations

Pretty much every backup solution on the market appends the server and/or instance name to the backup path.  Usually, this is a good idea: if you have a centralized backup drive, you’ll want your backups separated by server name.  Availability Groups introduce a problem though, because you can back up on different nodes. So of course, each backup will be under a different path (based on the node name).  Full backups could be going to \\NAS1\SQLBackups\Server1, while the log backups for the same Availability Group would be going to \\NAS1\SQLBackups\Server3 and even \\NAS1\SQLBackups\Server2.  The backups can move around as nodes go up and down.  Restoring in this scenario can be difficult if all the backups aren’t in one place.

3. COPY_ONLY woes

Backups have to be run with COPY_ONLY on non-primary nodes.  This isn’t an issue on its own, but it is something that you must code into your solution.  And if you remove a database from an Availability Group, then you have to remember to manage that in your code (or preferably, to have coded it properly in the first place).  Either way, it’s just one more thing to manage.

4. Work multiplier

Any change to a schedule, or to a backup setting must be made on all nodes.  Nothing in the Availability Group setup lets you manage your multiple backup locations, and remembering to do this on all the nodes is doomed to fail.

5. Backup file backlog

If your backup node fails, you lose your record of where its backups were. Without that log, the backup files won’t delete on your defined schedule.  Sure, you could probably get that log back by restoring a copy of the management database from that server, but how much time has gone by since then?  That is, of course, if the management database was being backed up properly.  And if all that happens, it’s just one extra step you don’t need when you’ve got a node down.  You shouldn’t have to worry about those tasks that should be taken care of for you.

These are the biggest issues with managing backups in Availability Groups. Now let’s look at Minion Backup. It solves each one of them with no coding required.

Managing Backup Issues in Availability Groups – The Solutions


1. Back up from the right server

Backing up from a specific server is easy in Minion Backup.  Minion Backup knows if your database is in an Availability Group, and you can set the “PreferredServer” column in the Minion.BackupSettingsDB table to tell it where you want each backup type taken.  You have the option to use the backup preference in the Availability Group settings by configuring PreferredServer to ‘AGPreferred’, or you can use the name of a specific server.  That means you can set a specific server for fulls, and a specific server for logs if you want.  If you want Minion Backup to determine your PreferredServer dynamically, you can code your own logic and use it as batch precode.  While it’s an easy setting, you can make it as flexible as you need it to be.
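As a sketch, the setting is just an update; the table, column, and ‘AGPreferred’ value are named above, but the filter columns here are assumptions:

```sql
-- Take full backups for DB1 on a specific node; set 'AGPreferred' instead to
-- follow the AG's own backup preference. DBName/BackupType filters are
-- illustrative assumptions.
UPDATE Minion.BackupSettingsDB
SET    PreferredServer = 'Server2'
WHERE  DBName = 'DB1'
  AND  BackupType = 'Full';
```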

2. Files in one place

We know that backups coming from different servers means that all of your different backups can be in different locations.  However, Minion Backup solves this problem easily using server labels.  The ServerLabel field (in Minion.BackupSettingsPath) overrides the ServerName in the backup path, so that all of your backups go to the same location.  You can set ServerLabel to the Availability Group or listener name, or even the application name if you want.  So, instead of sending backups to \\NAS1\SQLBackups\Server1, the backups would go to \\NAS1\SQLBackups\MyAGName.  As long as you set all the nodes to the same ServerLabel, then no matter which node takes the backup, the files all go to the same location.  Now you don’t have to search around for where the different backups are, and your restores and file deletes are easy.
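Again as a sketch (ServerLabel and Minion.BackupSettingsPath are named above; the label value is an example):

```sql
-- Run on every node, so all backups land under \\NAS1\SQLBackups\MyAGName
-- no matter which node takes them.
UPDATE Minion.BackupSettingsPath
SET    ServerLabel = 'MyAGName';
```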

3. COPY_ONLY automatically

If you want to take a full backup on a non-primary node of an Availability Group, then you have to use COPY_ONLY.  And of course, you do have to code that into your own solution.  However, Minion Backup is fully aware of the Availability Group, so it does this for you automatically.  There’s nothing you need to code, and nothing you need to manage.  It just does it.  As databases come and go out of your Availability Groups, Minion Backup makes the right choice and does what it needs to do.

4. Sync settings between nodes

Manually changing settings on each node of your Availability Group is just asking for trouble.  You could have a new node that you haven’t heard about, or you could just forget about one, or a node could be down and then you get busy and forget to push the changes when it comes back up.  So what you need is a way to automatically push all of your settings changes to all the nodes.  This is where Minion Backup comes in.  All you need to do is turn on the SyncSettings column in the BackupSettingsServer table and Minion Backup will keep all your node settings in sync.  This also means that as nodes come and go, they’ll get the new settings, and if a node is offline, Minion Backup will push them when it comes back online.

5. Backup files, handled

Keeping your file deletion schedules intact even when a node goes down, and being able to easily find your file locations, can save you from a very bad day.  Minion Backup spares you the worry about where your files are, or whether they’re still going to be deleted on schedule.  Again, Minion Backup makes this very easy with the SyncLogs column in the BackupSettingsServer table.  Your backup logs are automatically synchronized to all nodes, so locations and delete times are always in place.
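Both sync options live in the same table. SyncSettings and SyncLogs are the column names given above; treating them as bit flags in the Minion schema is my assumption:

```sql
-- Keep settings and backup-log records synchronized across all AG nodes.
UPDATE Minion.BackupSettingsServer
SET    SyncSettings = 1,
       SyncLogs     = 1;
```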

That’s a good roundup of the Availability Group features in Minion Backup.  We love these features, because we’ve seen firsthand how much easier it is to manage your Availability Group backups with Minion.

Go on, solve your problems: download Minion Backup!

Database snapshots for everyone!


Database snapshots have been available in SQL Server Enterprise edition for the last several versions. But if you’re lucky enough to have 2016 at work, go on and upgrade to SP1…you get extra-lucky special database snapshots in any edition!

(Be sure to read the footnote for “Database snapshots” in that link.)

What are database snapshots for?

Oh what’s that? You don’t use snapshots? Well let’s see a couple of good use cases for them:

  • Take a database snapshot before an upgrade, data load, or major change. It’s WAY faster than a backup, and you can roll back the changes if you need to, almost instantly.
  • Customize integrity checks. When you run DBCC CheckDB or DBCC CheckTable, behind the scenes SQL Server creates a snapshot of the database to run the operation against. You can also choose to create a custom snapshot…among other reasons, so you can specify where the snapshot files are placed. This is especially useful for shops running tight on disk space.
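For the first use case above, creating and reverting a snapshot is just a couple of statements (the snapshot name, logical file name, and path here are illustrative):

```sql
-- Snapshot DB1 before a risky change.
CREATE DATABASE DB1_PreUpgrade
ON ( NAME = DB1_Data,                              -- DB1's logical data file name
     FILENAME = 'E:\Snapshots\DB1_PreUpgrade.ss' ) -- sparse snapshot file
AS SNAPSHOT OF DB1;

-- If the change goes wrong, revert almost instantly:
-- RESTORE DATABASE DB1 FROM DATABASE_SNAPSHOT = 'DB1_PreUpgrade';
```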

Speaking of integrity checks and snapshots, Minion CheckDB is slated to come out this coming February 1.  Be on the lookout…it’s going to blow your mind.

Sign up for your trial of Minion Enterprise today, and sign up for our newsletter for updates!


Proper Backup Alerting

Today I’d like to talk about two topics that get overlooked quite often, the “backups” to the backup, so to speak. First up: proper backup alerting. And second, missing backup recovery.

When I saw the topic for T-SQL Tuesday this time, I just had to get in. Maybe I’ve never mentioned it, but backups are one of my big things.

Traditional alerting falls short

Well, let’s begin with a story from my days as senior DBA. Years ago, one of the application groups messed something up in their database, and they needed a restore.  “Sure thing,” I said.  No problem.  So I went to the backup drive, and there wasn’t anything that could even be vaguely considered a fresh backup. The last backup file on the drive was from about three months ago.

OOPS.  Oh crap…so what do I tell the app team?

First, a little investigation.  I had to find out why the backup alert didn’t kick off. Every box was set up to alert us when a backup job failed.  I found the problem right away.  The SQL Agent was turned off.  And from the looks of things, it had been turned off for quite some time.  And as you may realize, there’s just no way to alert on missing backups if the Agent is off and can’t fire the alert.

But that was just the first part of the problem.  The SQL Agent couldn’t send the email, of course. But the job never actually failed, because it didn’t start in the first place.

This is the crux of the issue: jobs that don’t start, can’t fail.  Alerting on failed backup jobs isn’t the way to go.

“But it’s okay, we have…”

Hold on, I know what you’re thinking. You have service alerts through some other monitoring tool, so that could never happen to you! To a degree, you’re right. But let’s see what else can go wrong along those same lines:

  • The database in question isn’t included in the backup job.
  • The network monitor agent was turned off, or not deployed to that server.
  • SMTP on the server has stopped working.
  • The backup job has the actual backup step commented out.
  • Someone deleted that backup job.
  • Someone disabled the backup job, or just disabled the job’s schedule.

Service alerts won’t help you in any of these circumstances.

Proper backup alerting

I’ve run into every one of those scenarios, many times.  And there are only two ways to mitigate every one of them (and any other situation you come across) with proper backup alerting.

Number one: Move to a centralized alerting system.  You can’t rely on alerts that live on each of your servers. When you do that, you’re at the mercy of the conditions on that box, and those conditions can be whimsical at best.

Move the backup alerts from the server level to the enterprise level.  Then, when there’s an issue with SMTP or something, you only have one place to check. It’s much easier to keep track of whether an enterprise-level alerting system is working than to keep track of dozens, hundreds, or even thousands of servers.  After all, if you haven’t heard from a server in a long time, how do you know whether it’s because there’s nothing to hear, or because the alerting mechanism is down?

Number two: Stop alerting on failed backups.  Alert on missing backups.  When you alert on missing backups, it doesn’t matter if the job didn’t kick off, if the database wasn’t part of the job, or if the job was deleted.  The only thing that matters is that it’s been 24 hours since your last backup. Then when you get the alert, you can look into what the problem is.

The important point is that the backup may or may not have failed, but your enterprise alert will fire no matter what.  This is a very effective method for alerting on backups, because it’s incredibly resilient to all types of issues…not only in the backups, but also in the alerting process.  If you do it right, it’s just about foolproof.
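As an illustration, a missing-backup check can be as simple as asking msdb what actually happened, instead of asking the job whether it failed. This is a minimal sketch (not production code, and the 24-hour threshold is just an example); run it against each instance from your central alerting server:

```sql
-- Sketch: list every database with no full backup finished in the last 24 hours.
-- Because it reads backup history rather than job status, it still fires when
-- the job was deleted, disabled, or never included the database at all.
SELECT d.name AS DatabaseName,
       MAX(b.backup_finish_date) AS LastFullBackup
FROM   sys.databases d
LEFT JOIN msdb.dbo.backupset b
       ON b.database_name = d.name
      AND b.type = 'D'                  -- 'D' = full backup
WHERE  d.name <> 'tempdb'               -- tempdb is never backed up
GROUP BY d.name
HAVING MAX(b.backup_finish_date) IS NULL    -- never backed up at all
    OR MAX(b.backup_finish_date) < DATEADD(HOUR, -24, GETDATE());
```

A centralized system would collect these results from every instance and alert from one place, which is exactly the one-place-to-check property described above.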

Part 2: Missing Backups

Handling missed backups is not the same as alerting on missing backups (like we talked about above). What we want to do is avoid the need for the alert to begin with.

Minion Backup (which is free, so we get to talk about it all we want, ha!) includes a feature called “Missing Backups”, which allows you to rerun any backups that failed during the last run.

Here’s what this looks like:  You set your backups to run at midnight, and they’re usually done by around 2:00 AM.  However, occasionally they fail for one reason or another. Then you get an alert in the middle of the night, and you have to get up to deal with it.

Missing Backups lets you set Minion Backup to run again at, say, 2:30 or 3:00 AM with the @Include = ‘Missing’ parameter.  This will look at the last run and see if there were any backups that failed; if there were, then MB will retry them.  This will prevent the need for alerts in the first place.
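In practice, that second run is just another scheduled call to Minion Backup with the @Include = ‘Missing’ parameter. The sketch below shows the shape of it; the procedure name is an assumption on my part, so check the Minion Backup documentation for the exact signature in your version:

```sql
-- Illustrative only: schedule this in a SQL Agent job for ~2:30 AM,
-- after the regular midnight backup run has had time to finish.
-- @Include = 'Missing' tells MB to look at the last run and retry
-- only the backups that failed; if nothing failed, it does nothing.
EXEC Minion.BackupMaster
     @Include = 'Missing';
```

If the retry succeeds, the missing-backup alert never fires, which is the whole point of the feature.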

We use this feature in many of the shops we consult in, because we see backups that fail from time to time for weird reasons, but always succeed the second time. So Minion Backup improves your backup coverage simply by giving those backups a second chance.

Now we mention Minion Enterprise

We’ve got you covered for enterprise-level alerting, too. Our flagship product, Minion Enterprise, was made for just that purpose, and it comes with many enterprise-level features, not just backup alerting. I invite you to take a look at it if you like.

But if you don’t, then by all means write yourself an enterprise-level alerting system, and stop relying on alerts that only fire on failed backups.

And, improve your situation in general by switching to the free Minion Backup.

What Really Causes Performance Problems?

Every IT shop has its problems with performance: some localized, and some that span a server, or even multiple servers. Technologists tend to treat these problems as isolated incidents – solving one, then another, and then another. This happens especially when a problem is recurring but intermittent. When a slowdown or error happens every so often, it’s far too easy to lose the big picture.

Some shops suffer from these issues for years without ever getting to the bottom of it all. So, how can you determine what really causes performance problems?

First, a story

A developer in your shop creates an SSIS package to move data from one server to another. He makes the decision to pull the data from production using SELECT * FROM dbo.CustomerOrders.  This works just fine in his development environment, and it works fine in QA, and it works fine when he pushes it into production.  The package runs on an hourly schedule, and all is well.

What he doesn’t realize is that there’s a VARCHAR(MAX) column in that table that holds 2GB of data in almost every row…in production.
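The fix in this story is trivial once you see it: name only the columns the package needs, so the VARCHAR(MAX) column never crosses the wire. The column names below are invented for illustration (the post never lists the table’s schema); the point is the shape of the query:

```sql
-- Hypothetical replacement for SELECT * in the SSIS data-flow source.
-- By naming columns explicitly, the huge VARCHAR(MAX) payload column
-- is excluded, and future schema additions can't silently bloat the pull.
SELECT OrderID,
       CustomerID,
       OrderDate,
       TotalDue
FROM   dbo.CustomerOrders;
```

Of course, as the rest of the story shows, the deeper fix is a policy one: nobody would have had to catch this in code review if the developer’s production access had been limited in the first place.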

Things run just fine for a couple months.  Then without warning, one day things in production start to slow down.  It’s subtle at first, but then it gets worse and worse.  The team opens a downtime bridge, and a dozen IT guys get on to look at the problem.  And they find it!  An important query is getting the wrong execution plan from time to time.  They naturally conclude that they need to manage statistics, or put in a plan guide, or whatever other avenue they decide to take to solve the problem.  All is well again.

A couple of days later, it happens again.  And then again and then again.  Then it stops.  And a couple weeks later they start seeing a lot of blocking.  They put together another bridge, and diagnose and fix the issue.  Then they start seeing performance issues on another server that’s completely unrelated to that production server.  There’s another bridge line, and another run through the process again.

What’s missing here?

The team has been finding and fixing individual problems, but they haven’t gotten to the root of the issue: the SSIS package data pull is very expensive.  It ran fine for a while, but once the data grew (or more processes or more users came onto the server), the system was no longer able to keep up with demand.  The symptoms manifested differently every time.  While they’re busy blaming conditions on the server, or blaming the way the app was written, the real cause of the issues is that original data pull.

Now multiply this situation times several dozen, and you’ll get a true representation of what happens in IT shops all over the world, all the time.

What nobody saw is that the original developer should never have had access to pull that much data from production to begin with.  He didn’t need to pull all of the columns in that table, especially the VARCHAR(MAX) column.  By giving him access to prod – by not limiting his data access in any way – they allowed this situation to occur.

What Really Causes Performance Problems?

Just as too many cooks spoil the broth, too many people with access to production will cause instability. Instability is probably the biggest performance killer. But IT shops are now in the habit of letting almost anyone make changes as needed, and then treating the resulting chaos one CPU spike at a time.

This is why performance issues go undiagnosed in so many shops. The people in the trenches need the ability to stand back and see the real root cause of issues, beyond the singular event they’re in the middle of, and that’s not an easy skill to develop. It takes a lot of experience and it takes wisdom, and not everyone has both. So these issues can be very difficult to ferret out.

Even when someone does have this experience, they’re likely only one person in a company of others who aren’t able to make the leap.  Management quite often doesn’t understand enough about IT to see how these issues can build on each other and cause problems, so they’ll often refuse to make the necessary changes to policy.

So really, the problem is environmental, from a people point of view:

  • Too many people in production make for an unstable shop.
  • It takes someone with vision to see that this is the problem, as opposed to troubleshooting the symptoms.
  • Most of the time, that person is overridden by others who only see the one issue.

What’s the ultimate solution?

In short: seriously limit the access people have in production. It’s absolutely critical to keep your production environments free from extra processes.

Security is one of those areas that must be constantly managed and audited, because it’s quite easy to inadvertently escalate permissions without realizing it.  This is where Minion Enterprise comes in: I spent 20 years in different shops, working out the best way to manage these permissions, and even harder, working out how to make sure permissions didn’t get out of control.
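The kind of audit worth automating here is simple to state: who actually has elevated access on each instance right now? A sketch of one such check (which an enterprise tool would run across every server and collect centrally):

```sql
-- Sketch: list every login that is a member of the sysadmin fixed server role.
-- Run regularly and diff against the previous result to catch permissions
-- that have quietly escalated.
SELECT p.name      AS LoginName,
       p.type_desc AS LoginType
FROM   sys.server_role_members rm
JOIN   sys.server_principals r ON r.principal_id = rm.role_principal_id
JOIN   sys.server_principals p ON p.principal_id = rm.member_principal_id
WHERE  r.name = 'sysadmin'
ORDER BY p.name;
```

Running this once tells you where you stand today; running it on a schedule, across the whole shop, is what turns it into the constant management and auditing the paragraph above calls for.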

Minion Enterprise gives you a complete view of your entire shop to make it effortless to audit management conditions on all your servers.

That’s the difference between performance monitoring and management monitoring. The entire industry thinks of performance as a single event, when in reality, performance is multi-layered. It’s made up of many management-level events where important decisions have been ignored or pushed aside. And these decisions build on each other. One bad decision – giving developers full access to production – can have drastic consequences that nobody will realize for a long time.

Sign up for your trial of Minion Enterprise today.

MinionWare