Channel: Symantec Connect - Backup and Recovery - Discussions

Two Backup jobs, two media write errors - one job resumes, one doesn't

I need a solution

A Tale of Two Backups

 
The Scene: Two backup jobs, "a" and "h", running simultaneously on client ned18 (along with three other backup jobs). Both policies have checkpointing enabled, both jobs back up the same fileserver.  Job "a" backs up the user directories beginning with "a" (one directory), and job "h" backs up the user directories beginning with "h" (three directories). 
 
NetBackup 7.0, server and client both running Linux
 
 
Act 1: Job "a"
- Job "a", having backed up 12TB and 800K files, stops with a "1: (84) media write error"
- Select the job, click Actions -> Resume Job
- Within minutes job "a" got set up, mounted and positioned a fresh tape
- Activity monitor immediately shows the byte count and file count incrementing
- In short job "a" is back on track and has been running fine for more than 8 hours
 
A few hours later...
 
Act 2, scene 1: Job "h"
- Job "h", having backed up 15TB and 150K files, stops with a "1: (84) media write error"
- Select the job, click Actions -> Resume Job
- Within minutes job "h" got set up, mounted and positioned a fresh tape
- BUT, the activity monitor shows no activity - the byte count and file count remain unchanged for job "h"
- Three hours later the master server timed out, "Error bpbrm (pid=nnn) socket read failed: errno = 62 - Timer expired" and "file read failed  (13)"
 
Act 2, scene 2: Job "h" (continued)
- Undaunted, enabled debug logging for bpbkar on the client (touched bpbkar_path_tr, VERBOSE = 5, Debug_Database = 5, ENABLE_ROBUST_LOGGING = YES)
- Select the job, click Actions -> Resume Job
- The bpbkar log file grows very quickly, BUT in a tight loop, and recording the same 19 lines over and over
- This continued for three more hours until the master server timed out (again, "errno = 62 - Timer expired").
 
Act 3: Looking for a happy ending
- Don't know why "Resume Job" for job "a" worked immediately
- Don't know why "Resume Job" for job "h" didn't work
- Once job "h" was resumed, what is the bpbkar process doing?
- And don't know what to try next
 
 
References:
 
Endlessly repeating "bpbkar" debug log entries for job "h"
 
15:56:46.048 [15296] <2> mount build_mount_list: INF - Processing (ext4) /dev/mapper/VolGroup00-LogVol01 on /
15:56:46.048 [15296] <2> mount build_mount_list: INF - Processing (proc) proc on /proc
15:56:46.048 [15296] <2> mount build_mount_list: INF - Processing (sysfs) sysfs on /sys
15:56:46.048 [15296] <2> mount build_mount_list: INF - Processing (devpts) devpts on /dev/pts
15:56:46.048 [15296] <2> mount build_mount_list: INF - Processing (tmpfs) tmpfs on /dev/shm
15:56:46.048 [15296] <2> mount build_mount_list: INF - Processing (ext4) /dev/sda1 on /boot
15:56:46.048 [15296] <2> mount build_mount_list: INF - Processing (ext4) /dev/sda2 on /boot-rcvy
15:56:46.048 [15296] <2> mount build_mount_list: INF - Processing (ext4) /dev/mapper/VolGroup00-LogVol05 on /crash
15:56:46.048 [15296] <2> mount build_mount_list: INF - Processing (ext4) /dev/mapper/VolGroup00-LogVol00 on /rcvy
15:56:46.048 [15296] <2> mount build_mount_list: INF - Processing (ext4) /dev/mapper/VolGroup00-LogVol03 on /var
15:56:46.048 [15296] <2> mount build_mount_list: INF - Processing (ext4) /dev/mapper/VolGroup00-LogVol02 on /var-rcvy
15:56:46.048 [15296] <2> mount build_mount_list: INF - Processing (binfmt_misc) none on /proc/sys/fs/binfmt_misc
15:56:46.048 [15296] <2> mount build_mount_list: INF - Processing (ipathfs) none on /ipathfs
15:56:46.048 [15296] <2> mount build_mount_list: INF - Processing (rpc_pipefs) sunrpc on /var/lib/nfs/rpc_pipefs
15:56:46.048 [15296] <2> mount build_mount_list: INF - Processing (gpfs) /dev/gsfs3 on /gpfs/gsfs3
15:56:46.048 [15296] <2> mount build_mount_list: INF - Processing (gpfs) /dev/gsfs2 on /gpfs/gsfs2
15:56:46.048 [15296] <2> bpbkar resolve_path: INF - /gs3/users/hcs2 resolves to /gpfs/gsfs3/users/hcs2
15:56:46.048 [15296] <2> bpbkar resolve_path: INF - Actual mount point of /gs3/users/hcs2 is /gpfs/gsfs3/users/hcs2
15:56:46.059 [15296] <2> bpbkar cd: remap name would return=/gs3/users/hcs2
<REPEAT>
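
To confirm that a log really is cycling rather than just looking repetitive, one option is to strip the timestamp and PID from each line and check whether the tail of the file is the same block repeated. The sketch below is only an illustration: the prefix pattern is inferred from the excerpt above, and the function names (strip_prefix, find_cycle) are made up for this example, not part of NetBackup.

```python
import re

# Prefix as it appears in the excerpt above, e.g. "15:56:46.048 [15296] ".
# Assumption: every bpbkar debug line starts with time and [pid].
PREFIX = re.compile(r"^\d{2}:\d{2}:\d{2}\.\d{3} \[\d+\] ")


def strip_prefix(line):
    """Remove the timestamp and PID so repeated messages compare equal."""
    return PREFIX.sub("", line.rstrip("\n"))


def find_cycle(lines, max_period=200):
    """Return the smallest period p for which the last p normalized lines
    equal the p lines before them, or None if no such period exists."""
    msgs = [strip_prefix(l) for l in lines if l.strip()]
    for p in range(1, min(max_period, len(msgs) // 2) + 1):
        if msgs[-p:] == msgs[-2 * p:-p]:
            return p
    return None


if __name__ == "__main__":
    import sys

    # Usage: python find_cycle.py <path to bpbkar debug log>
    if len(sys.argv) > 1:
        with open(sys.argv[1], errors="replace") as fh:
            print("repeating block length:", find_cycle(fh.readlines()))
```

If the log for job "h" is looping as described, this should report a period equal to the length of the repeated block, and the last line of that block (the "bpbkar cd: remap name" entry here) marks where each pass ends.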
 

 

"ps" output for processes for "a" and "h" backup jobs on client machine ned18
 
"a"
root     25535     1  6 08:05 ?        00:33:17 bpbkar -r 5356800 -ru root -dt 0 -to 0 -clnt ned18 -class dis-gs3-a -sched biomart -st FULL -bpstart_to 300 -bpend_to 300 -read_to 10800 -ckpt_time 7200 -blks_per_buffer 2048 -use_otm -fso -b ned18_1392781497 -kl 7 -use_ofb
 
"h"
root     15296     1  9 12:50 ?        00:20:04 bpbkar -r 5356800 -ru root -dt 0 -to 0 -clnt ned18 -class dis-gs3-h -sched biomart -st FULL -bpstart_to 300 -bpend_to 300 -read_to 10800 -ckpt_time 7200 -blks_per_buffer 2048 -use_otm -fso -b ned18_1392746035 -kl 7 -use_ofb
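
Since one job resumed and the other did not, it is worth checking whether the two bpbkar processes were started any differently. A minimal sketch that compares the two command lines from the "ps" output above flag by flag; the parsing rule (a token after a flag is its value unless it also starts with "-") is an assumption made for this example, and parse_args and diff_args are hypothetical helper names.

```python
# Command lines copied from the "ps" output above.
A_CMD = ("bpbkar -r 5356800 -ru root -dt 0 -to 0 -clnt ned18 -class dis-gs3-a "
         "-sched biomart -st FULL -bpstart_to 300 -bpend_to 300 -read_to 10800 "
         "-ckpt_time 7200 -blks_per_buffer 2048 -use_otm -fso "
         "-b ned18_1392781497 -kl 7 -use_ofb")
H_CMD = ("bpbkar -r 5356800 -ru root -dt 0 -to 0 -clnt ned18 -class dis-gs3-h "
         "-sched biomart -st FULL -bpstart_to 300 -bpend_to 300 -read_to 10800 "
         "-ckpt_time 7200 -blks_per_buffer 2048 -use_otm -fso "
         "-b ned18_1392746035 -kl 7 -use_ofb")

MISSING = "<absent>"  # marks a flag present on only one command line


def parse_args(cmdline):
    """Split a command line into {flag: value}; bare flags map to None."""
    tokens = cmdline.split()[1:]  # drop the program name
    args = {}
    i = 0
    while i < len(tokens):
        flag = tokens[i]
        if i + 1 < len(tokens) and not tokens[i + 1].startswith("-"):
            args[flag] = tokens[i + 1]
            i += 2
        else:
            args[flag] = None
            i += 1
    return args


def diff_args(cmd_a, cmd_b):
    """Return {flag: (value_a, value_b)} for every flag that differs."""
    pa, pb = parse_args(cmd_a), parse_args(cmd_b)
    return {
        k: (pa.get(k, MISSING), pb.get(k, MISSING))
        for k in sorted(set(pa) | set(pb))
        if pa.get(k, MISSING) != pb.get(k, MISSING)
    }


if __name__ == "__main__":
    for flag, (a, h) in diff_args(A_CMD, H_CMD).items():
        print(flag, a, h)
```

For these two lines the only differing flags are -class and -b, so both jobs were launched with the same options apart from the policy name and backup ID. Note also that -read_to is 10800 for both; if that value is the client read timeout in seconds, it equals three hours, which would be consistent with job "h" timing out about three hours after each resume.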
 
