SearchFAQMemberlist Log in
Reply to topic Page 1 of 1
Finding (and killing) Phanton job IDs
Author Message
Post Finding (and killing) Phanton job IDs 
Hello,

We have a working bacula (5.0.3) instance with a MySQL backend. A few weeks ago we had a power outage that caused the server to shut down and restart abruptly. The jobs are running successfully but now there is a complete set of phantom jobs running in parallel that fail.

Bconsole only shows the successful JobIds. If I do a status of a client only the successful job IDs show up. The failed job IDs increment just as one would expect if they were valid jobs. An example of the successful job IDs are 730, 731, 732. An example of the phantom/failing job IDs are 900, 901, 902. As you can see they are not in the same sequence. The successful job IDs follow the previously successful job IDs (prior to the crash).

I looked around in the database but I was unable to determine where they are stored so that I could remove them. I have confirmed that there is only one instance of bacula running on the server.

Any pointers as to where I might look would be appreciated,

Karyn


--
Karyn Stump
IT Administrator
FastSoft, Inc
karyn < at > fastsoft.com

Post Finding (and killing) Phanton job IDs 
On Fri, Apr 27, 2012 at 06:13:42PM -0700, Karyn Stump wrote:
We have a working bacula (5.0.3) instance with a MySQL backend. A few
weeks ago we had a power outage that caused the server to shut down and
restart abruptly. The jobs are running successfully but now there is a
complete set of phantom jobs running in parallel that fail.

Your problem should't be possible to exist. At least not if the db is
fine, I checked with my bacula-mysql instance and it says

CREATE TABLE `Job` (
`JobId` int(10) unsigned NOT NULL AUTO_INCREMENT,
...

If you still get 2 different sequences, I'd assume the database has
issues due to the crash. Have you tried 'mysqlrepair' aka 'mysqlcheck
-r' and possibly mysqlanalyze and mysqloptimze while you are at it?

If these find no errors, you can just dump the database and loead it
again, this should find inconsistencies as well.

Another possility would be not having 2 bacula-directors, but 2 mysqld
instances due to some strange startup error or some sort of redundant
setup with both instances running with the same IP.

Hope that helps.

Regards,
Adrian
--
LiHAS - Adrian Reyer - Hessenwiesenstraße 10 - D-70565 Stuttgart
Fon: +49 (7 11) 78 28 50 90 - Fax: +49 (7 11) 78 28 50 91
Mail: lihas < at > lihas.de - Web: http://lihas.de
Linux, Netzwerke, Consulting & Support - USt-ID: DE 227 816 626 Stuttgart

------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and
threat landscape has changed and how IT managers can respond. Discussions
will include endpoint security, mobile security and the latest in malware
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
Bacula-users mailing list
Bacula-users < at > lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users

Post Finding (and killing) Phanton job IDs 
On Fri, 27 Apr 2012 18:13:42 -0700, Karyn Stump said:

We have a working bacula (5.0.3) instance with a MySQL backend. A few
weeks ago we had a power outage that caused the server to shut down and
restart abruptly. The jobs are running successfully but now there is a
complete set of phantom jobs running in parallel that fail.

Bconsole only shows the successful JobIds. If I do a status of a client
only the successful job IDs show up. The failed job IDs increment just
as one would expect if they were valid jobs. An example of the
successful job IDs are 730, 731, 732. An example of the phantom/failing
job IDs are 900, 901, 902. As you can see they are not in the same
sequence. The successful job IDs follow the previously successful job
IDs (prior to the crash).

Where exactly are these phantom jobids listed? Can you post some output?

__Martin

------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and
threat landscape has changed and how IT managers can respond. Discussions
will include endpoint security, mobile security and the latest in malware
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
Bacula-users mailing list
Bacula-users < at > lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users

Post Finding (and killing) Phanton job IDs 
-----Original Message-----
From: Adrian Reyer [mailto:bacula-lists < at > lihas.de]
Sent: Saturday, April 28, 2012 4:29 AM
To: Karyn Stump
Cc: bacula-users < at > lists.sourceforge.net
Subject: Re: [Bacula-users] Finding (and killing) Phanton job IDs

On Fri, Apr 27, 2012 at 06:13:42PM -0700, Karyn Stump wrote:
We have a working bacula (5.0.3) instance with a MySQL backend. A few
weeks ago we had a power outage that caused the server to shut down
and restart abruptly. The jobs are running successfully but now there
is a complete set of phantom jobs running in parallel that fail.

Your problem should't be possible to exist. At least not if the db is fine, I checked with my bacula-mysql instance and it says

CREATE TABLE `Job` (
`JobId` int(10) unsigned NOT NULL AUTO_INCREMENT, ...

If you still get 2 different sequences, I'd assume the database has issues due to the crash. Have you tried 'mysqlrepair' aka 'mysqlcheck -r' and possibly mysqlanalyze and mysqloptimze while you are at it?

If these find no errors, you can just dump the database and loead it again, this should find inconsistencies as well.

Another possility would be not having 2 bacula-directors, but 2 mysqld instances due to some strange startup error or some sort of redundant setup with both instances running with the same IP.

Hope that helps.

Regards,
Adrian
--
LiHAS - Adrian Reyer - Hessenwiesenstraße 10 - D-70565 Stuttgart
Fon: +49 (7 11) 78 28 50 90 - Fax: +49 (7 11) 78 28 50 91
Mail: lihas < at > lihas.de - Web: http://lihas.de Linux, Netzwerke, Consulting & Support - USt-ID: DE 227 816 626 Stuttgart

I ran it before I posted the first message, just forgot to include that info. I didn't do a repair, just a check. Here is the output again:

[root < at > back karyn]# /usr/bin/mysqlcheck bacula
Enter password:
bacula.BaseFiles OK
bacula.CDImages OK
bacula.Client OK
bacula.Counters OK
bacula.Device OK
bacula.File OK
bacula.FileSet OK
bacula.Filename OK
bacula.Job OK
bacula.JobHisto OK
bacula.JobMedia OK
bacula.Location OK
bacula.LocationLog OK
bacula.Log OK
bacula.Media OK
bacula.MediaType OK
bacula.Path OK
bacula.PathHierarchy OK
bacula.PathVisibility OK
bacula.Pool OK
bacula.Status OK
bacula.Storage OK
bacula.UnsavedFiles OK
bacula.Version OK



--
Karyn Stump
IT Administrator
FastSoft, Inc
karyn < at > fastsoft.com

------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and
threat landscape has changed and how IT managers can respond. Discussions
will include endpoint security, mobile security and the latest in malware
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
Bacula-users mailing list
Bacula-users < at > lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users

Post Finding (and killing) Phanton job IDs 
-----Original Message-----
From: Karyn Stump
Sent: Monday, April 30, 2012 10:43 AM
To: 'Martin Simmons'
Subject: RE: [Bacula-users] Finding (and killing) Phanton job IDs

-----Original Message-----
From: Martin Simmons [mailto:martin < at > lispworks.com]
Sent: Monday, April 30, 2012 7:24 AM
To: bacula-users < at > lists.sourceforge.net
Subject: Re: [Bacula-users] Finding (and killing) Phanton job IDs

On Fri, 27 Apr 2012 18:13:42 -0700, Karyn Stump said:

We have a working bacula (5.0.3) instance with a MySQL backend. A few
weeks ago we had a power outage that caused the server to shut down
and restart abruptly. The jobs are running successfully but now there
is a complete set of phantom jobs running in parallel that fail.

Bconsole only shows the successful JobIds. If I do a status of a
client only the successful job IDs show up. The failed job IDs
increment just as one would expect if they were valid jobs. An example

of the successful job IDs are 730, 731, 732. An example of the
phantom/failing job IDs are 900, 901, 902. As you can see they are not

in the same sequence. The successful job IDs follow the previously
successful job IDs (prior to the crash).

Where exactly are these phantom jobids listed? Can you post some
output?

__Martin

The jobIDs show up in the Bacula generated emails relating the job
status. The Bacula Director is running on CentOS and the catalog backup
is doing the same thing (one fails, one is successful). Here is a
sample, one each for the same client/fileset:

30-Apr 05:05 back-dir JobId 967: Start Backup JobId 967,
Job=MM_Exchange.2012-04-30_05.05.00_02
30-Apr 05:05 back-sd JobId 967: Error: dev.c:120 Unable to stat device
/var/backups: ERR=No such file or directory 30-Apr 05:05 back-sd JobId
967: Warning:
Device "FileStorage" requested by DIR could not be opened or does
not exist.
30-Apr 05:05 back-sd JobId 967: Error: dev.c:120 Unable to stat device
/var/backups: ERR=No such file or directory 30-Apr 05:05 back-sd JobId
967: Warning:
Device "FileStorage" requested by DIR could not be opened or does
not exist.
30-Apr 05:05 back-sd JobId 967: Error: dev.c:120 Unable to stat device
/var/backups: ERR=No such file or directory 30-Apr 05:05 back-sd JobId
967: Warning:
Device "FileStorage" requested by DIR could not be opened or does
not exist.
30-Apr 05:05 back-sd JobId 967: Fatal error: Device reservation failed
for JobId=967: Jmsg Job=MM_Exchange.2012-04-30_05.05.00_02 type=5
level=1335787502 back-sd JobId 967: Warning:
Device "FileStorage" requested by DIR could not be opened or does
not exist.

30-Apr 05:05 back-dir JobId 967: Fatal error:
Storage daemon didn't accept Device "FileStorage" because:
3924 Device "FileStorage" not in SD Device resources.
30-Apr 05:05 back-dir JobId 967: Error: Bacula back-dir 5.0.3 (04Aug10):
30-Apr-2012 05:05:02
Build OS: x86_64-pc-linux-gnu gentoo
JobId: 967
Job: MM_Exchange.2012-04-30_05.05.00_02
Backup Level: Full
Client: "mm-fd" 5.0.3 (04Aug10)
Linux,Cross-compile,Win32
FileSet: "NTBackup" 2011-06-21 12:02:07
Pool: "ex-full-backup" (From Job FullPool override)
Catalog: "MyCatalog" (From Client resource)
Storage: "back-sd" (From Job resource)
Scheduled time: 30-Apr-2012 05:05:00
Start time: 30-Apr-2012 05:05:02
End time: 30-Apr-2012 05:05:02
Elapsed time: 0 secs
Priority: 10
FD Files Written: 0
SD Files Written: 0
FD Bytes Written: 0 (0 B)
SD Bytes Written: 0 (0 B)
Rate: 0.0 KB/s
Software Compression: None
VSS: no
Encryption: no
Accurate: no
Volume name(s):
Volume Session Id: 61
Volume Session Time: 1334731875
Last Volume Bytes: 0 (0 B)
Non-fatal FD errors: 0
SD Errors: 0
FD termination status:
SD termination status:
Termination: *** Backup Error ***
------------------------------------------------------------------------
--------
30-Apr 05:05 back-dir JobId 838: Start Backup JobId 838,
Job=MM_Exchange.2012-04-30_05.05.00_59
30-Apr 05:05 back-dir JobId 838: There are no more Jobs associated with
Volume "ex-full-0078". Marking it purged.
30-Apr 05:05 back-dir JobId 838: All records pruned from Volume
"ex-full-0078"; marking it "Purged"
30-Apr 05:05 back-dir JobId 838: Recycled volume "ex-full-0078"
30-Apr 05:05 back-dir JobId 838: Using Device "FileStorage"
30-Apr 05:07 back-sd JobId 838: Recycled volume "ex-full-0078" on device
"FileStorage" (/var/backups), all previous data lost.
30-Apr 05:07 back-dir JobId 838: Max Volume jobs=1 exceeded. Marking
Volume "ex-full-0078" as Used.
30-Apr 05:07 mm-fd JobId 838: Generate VSS snapshots. Driver="VSS Win
2003", Drive(s)="C"
30-Apr 06:22 mm-fd JobId 838: VSS Writer (BackupComplete): "System
Writer", State: 0x1 (VSS_WS_STABLE) 30-Apr 06:22 back-sd JobId 838: Job
write elapsed time = 01:15:29, Transfer rate = 21.20 M Bytes/second
30-Apr 06:22 mm-fd JobId 838: VSS Writer (BackupComplete): "MSDEWriter",
State: 0x1 (VSS_WS_STABLE) 30-Apr 06:22 mm-fd JobId 838: VSS Writer
(BackupComplete): "Microsoft Exchange Writer", State: 0x1
(VSS_WS_STABLE) 30-Apr 06:22 mm-fd JobId 838: VSS Writer
(BackupComplete): "Event Log Writer", State: 0x1 (VSS_WS_STABLE) 30-Apr
06:22 mm-fd JobId 838: VSS Writer (BackupComplete): "Registry Writer",
State: 0x1 (VSS_WS_STABLE) 30-Apr 06:22 mm-fd JobId 838: VSS Writer
(BackupComplete): "IIS Metabase Writer", State: 0x1 (VSS_WS_STABLE)
30-Apr 06:22 mm-fd JobId 838: VSS Writer (BackupComplete): "COM+ REGDB
Writer", State: 0x1 (VSS_WS_STABLE) 30-Apr 06:22 mm-fd JobId 838: VSS
Writer (BackupComplete): "WMI Writer", State: 0x1 (VSS_WS_STABLE) 30-Apr
06:22 mm-fd JobId 838: VSS Writer (BackupComplete): "BITS Writer",
State: 0x1 (VSS_WS_STABLE) 30-Apr 06:22 back-dir JobId 838: Bacula
back-dir 5.0.3 (30Aug10): 30-Apr-2012 06:22:31
Build OS: x86_64-redhat-linux-gnu redhat
JobId: 838
Job: MM_Exchange.2012-04-30_05.05.00_59
Backup Level: Full
Client: "mm-fd" 5.0.3 (04Aug10)
Linux,Cross-compile,Win32
FileSet: "NTBackup" 2011-12-20 05:05:00
Pool: "ex-full-backup" (From Job FullPool override)
Catalog: "MyCatalog" (From Client resource)
Storage: "File" (From Job resource)
Scheduled time: 30-Apr-2012 05:05:00
Start time: 30-Apr-2012 05:05:05
End time: 30-Apr-2012 06:22:31
Elapsed time: 1 hour 17 mins 26 secs
Priority: 10
FD Files Written: 2
SD Files Written: 2
FD Bytes Written: 96,038,868,528 (96.03 GB)
SD Bytes Written: 96,038,868,780 (96.03 GB)
Rate: 20671.3 KB/s
Software Compression: None
VSS: yes
Encryption: no
Accurate: no
Volume name(s): ex-full-0078
Volume Session Id: 56
Volume Session Time: 1335214872
Last Volume Bytes: 96,110,084,699 (96.11 GB)
Non-fatal FD errors: 0
SD Errors: 0
FD termination status: OK
SD termination status: OK
Termination: Backup OK

30-Apr 06:22 back-dir JobId 838: Begin pruning Jobs older than 6 months
.
30-Apr 06:22 back-dir JobId 838: No Jobs found to prune.
30-Apr 06:22 back-dir JobId 838: Begin pruning Jobs.
30-Apr 06:22 back-dir JobId 838: No Files found to prune.
30-Apr 06:22 back-dir JobId 838: End auto prune.



And here is the output from logwatch. The phantom jobIDs are not listed
here either:


################### Logwatch 7.3 (03/24/06) ####################
Processing Initiated: Mon Apr 30 04:02:02 2012
Date Range Processed: yesterday
( 2012-Apr-29 )
Period is day.
Detail Level of Output: 0
Type of Output: unformatted
Logfiles for Host: back.fastsoft.com
##################################################################

--------------------- bacula Begin ------------------------

Jobs Run:
2012-04-29 830 MM_Exchange.2012-04-29_05.05.00_51
BackupOK

2012-04-29 831 DC1_DFS.2012-04-29_23.05.00_52
BackupOK

2012-04-29 832 DC2_DFS.2012-04-29_23.05.00_53
BackupOK

2012-04-29 833 MM_Logs.2012-04-29_23.05.00_54
BackupOK

2012-04-29 834 TS_Quickbooks.2012-04-29_23.05.00_55
BackupOK

2012-04-29 835 support-files.2012-04-29_23.05.00_56
BackupOK

2012-04-29 836 soup-files.2012-04-29_23.05.00_57
BackupOK

2012-04-29 837 BackupCatalog.2012-04-29_23.10.00_58
BackupOK


---------------------- bacula End -------------------------


--------------------- Disk Space Begin ------------------------

Filesystem Size Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
7.1T 3.8T 2.9T 57% /
/dev/sda3 92M 20M 67M 23% /boot


---------------------- Disk Space End -------------------------


###################### Logwatch End #########################


And the output form the client status from bconsole:

mm-fd Version: 5.0.3 (04 August 2010) VSS Linux Cross-compile Win32
Daemon started 18-Apr-12 08:50. Jobs: run=25 running=0.
Heap: heap=0 smbytes=91,694 max_bytes=174,983 bufs=83 max_bufs=119
Sizeof: boffset_t=8 size_t=4 debug=0 trace=1

Running Jobs:
Director connected at: 30-Apr-12 10:34
No Jobs running.
====

Terminated Jobs:
JobId Level Files Bytes Status Finished Name
======================================================================
801 Incr 12 27.40 M OK 25-Apr-12 23:18 MM_Logs
806 Full 2 96.02 G OK 26-Apr-12 06:52 MM_Exchange
809 Incr 10 31.02 M OK 26-Apr-12 23:18 MM_Logs
814 Full 2 96.05 G OK 27-Apr-12 07:10 MM_Exchange
817 Incr 8 29.56 M OK 27-Apr-12 23:19 MM_Logs
822 Full 2 96.02 G OK 28-Apr-12 06:26 MM_Exchange
825 Incr 8 49.44 M OK 28-Apr-12 23:18 MM_Logs
830 Full 2 96.02 G OK 29-Apr-12 06:26 MM_Exchange
833 Diff 93 419.6 M OK 29-Apr-12 23:20 MM_Logs
838 Full 2 96.03 G OK 30-Apr-12 06:22 MM_Exchange
====
*


--
Karyn Stump
IT Administrator
FastSoft, Inc
karyn < at > fastsoft.com


------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and
threat landscape has changed and how IT managers can respond. Discussions
will include endpoint security, mobile security and the latest in malware
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
Bacula-users mailing list
Bacula-users < at > lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users

Post Finding (and killing) Phanton job IDs 
On Mon, 30 Apr 2012 10:43:16 -0700, Karyn Stump said:

The jobIDs show up in the Bacula generated emails relating the job
status. The Bacula Director is running on CentOS and the catalog backup
is doing the same thing (one fails, one is successful). Here is a
sample, one each for the same client/fileset:

You almost certainly have two different Director daemons running and possibly
two different Storage daemons as well. This is proved by the different build
dates and "Build OS" lines in the output:

30-Apr 05:05 back-dir JobId 967: Error: Bacula back-dir 5.0.3 (04Aug10):
30-Apr-2012 05:05:02
Build OS: x86_64-pc-linux-gnu gentoo


30-Apr 06:22 back-dir JobId 838: Bacula back-dir 5.0.3 (30Aug10):
30-Apr-2012 06:22:31
Build OS: x86_64-redhat-linux-gnu redhat

You might be able to figure out the IP addresses of the machines by looking at
the email Received: headers.

__Martin

------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and
threat landscape has changed and how IT managers can respond. Discussions
will include endpoint security, mobile security and the latest in malware
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
Bacula-users mailing list
Bacula-users < at > lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users

Post Finding (and killing) Phanton job IDs 
-----Original Message-----
From: Martin Simmons [mailto:martin < at > lispworks.com]
Sent: Tuesday, May 01, 2012 11:32 AM
To: bacula-users < at > lists.sourceforge.net
Subject: Re: [Bacula-users] Finding (and killing) Phanton job IDs

On Mon, 30 Apr 2012 10:43:16 -0700, Karyn Stump said:

The jobIDs show up in the Bacula generated emails relating the job
status. The Bacula Director is running on CentOS and the catalog
backup is doing the same thing (one fails, one is successful). Here
is
a sample, one each for the same client/fileset:

You almost certainly have two different Director daemons running and
possibly two different Storage daemons as well. This is proved by the
different build dates and "Build OS" lines in >the output:

30-Apr 05:05 back-dir JobId 967: Error: Bacula back-dir 5.0.3
(04Aug10):
30-Apr-2012 05:05:02
Build OS: x86_64-pc-linux-gnu gentoo


30-Apr 06:22 back-dir JobId 838: Bacula back-dir 5.0.3 (30Aug10):
30-Apr-2012 06:22:31
Build OS: x86_64-redhat-linux-gnu redhat

You might be able to figure out the IP addresses of the machines by
looking at the email Received: headers.

__Martin

Yep Martin,

That was it. I had migrated our backups to a new server, and when the
power outage happened, the old server restarted too and re-started
bacula which I had only stopped before but hadn't removed its start-up
config.

Thank you very much,

Karyn

--
Karyn Stump
IT Administrator
FastSoft, Inc
karyn < at > fastsoft.com


------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and
threat landscape has changed and how IT managers can respond. Discussions
will include endpoint security, mobile security and the latest in malware
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
Bacula-users mailing list
Bacula-users < at > lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users

Display posts from previous:
Reply to topic Page 1 of 1
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
  


Magic SEO URL for phpBB