
Odd non-fatal errors in amdump reports.

Posted by Austin S. Hemmelgarn
Odd non-fatal errors in amdump reports.
November 07, 2017 06:59AM
Where I work, we recently switched from manually triggered vaulting to
automatic vaulting using the vault-storage, vault, and dump-selection
options. Things appear to be working correctly, but we keep getting
some odd non-fatal error messages (that might be bogus as well, since
I've verified the dumps mentioned restore correctly) in the amdump
e-mails. I've been trying to figure out these 'errors' for the past
few weeks now, and I'm hoping someone on the list might have some advice
(or better yet, might recognize the symptoms and know how to fix them).

In our configuration, we have three different backup sets (each on
its own schedule). Of these, two are consistently showing the following
error in the amdump e-mail report (I've redacted hostnames and exact paths;
the second path listed is a parent directory of the first):

taper: FATAL Header of dumpfile does not match command from driver 0 XXXXXXX /home/XXXXXXXXXXXXXXXXX 20171031074642 ------ 0 XXXXXXX /home/XXXXXX 20171031074642 at /usr/lib64/perl5/vendor_perl/5.24.1/Amanda/Taper/Worker.pm line 1168

For a given backup set, the particular hostname and paths are always the
same, but the backup appears to get taped correctly, and restores
correctly as well.

With the third backup set, we're regularly seeing things like the
following in the dump summary section, but no other visible error
messages:

                                               DUMPER STATS           TAPER STATS
HOSTNAME   DISK   L  ORIG-KB  OUT-KB  COMP%  MMM:SS    KB/s   MMM:SS    KB/s
--------------------------------------------- ---------------------- ---------------
XXXXXXXXXX /boot  0       --  FAILED
XXXXXXXXXX /boot  1       10      10      --    0:00   168.8     0:00     0.0

In this case, the particular DLEs affected are always the same,
and the first line that claims a failure always shows dump level
zero, even when the backup is supposed to be at another level.
Just like the other error, the affected dumps always restore
correctly when tested, and get correctly vaulted as well. The
affected DLEs are only on Linux systems, but the issue seems
indifferent to distro and Amanda version (it has affected
Debian, Gentoo, and Fedora systems, covering 5 different
Amanda client versions). The affected filesystems are invariably
small (sub-gigabyte), but I've found no other commonality among them.

All three sets use essentially the same amanda.conf file (the
differences are literally just in when they get run), which
I've attached in-line at the end of this e-mail with
sensitive data redacted. The thing I find particularly odd is
that this config is essentially identical to what I use on my
personal systems, which are not exhibiting either problem.

8<------------------------------------------------------------

org "XXXXX"
mailto "admin"
dumpuser "amanda"
inparallel 2
dumporder "Ss"
taperalgo largestfit

displayunit "k"
netusage 8000000 Kbps

dumpcycle 4 weeks
runspercycle 28
tapecycle 128 tapes

bumppercent 20
bumpdays 2

etimeout 900
dtimeout 1800
ctimeout 30

device_output_buffer_size 256M

compress-index no

flush-threshold-dumped 0
flush-threshold-scheduled 0
taperflush 0
autoflush yes

runtapes 16

define changer vtl {
tapedev "chg-disk:/net/XXXXXXXXXXXXXXXXXX/amanda/XXXXX"
changerfile "/etc/amanda/XXXXX/changer"
property "num-slot" "128"
property "auto-create-slot" "yes"
}

define changer aws {
tapedev "chg-multi:s3:XXXXXXXXXXXXXXXX/slot-{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127}"
changerfile "/etc/amanda/XXXXX/s3-changer"
device-property "S3_SSL" "YES"
device-property "S3_ACCESS_KEY" "XXXXXXXXXXXXXXXXXXXX"
device-property "S3_SECRET_KEY" "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
device-property "S3_MULTI_PART_UPLOAD" "YES"
device-property "CREATE_BUCKET" "NO"
device-property "S3_BUCKET_LOCATION" "XXXXXXXXX"
device-property "STORAGE_API" "AWS4"
}

define storage local-vtl {
tpchanger "vtl"
tapepool "$r"
tapetype "V64G"
labelstr "^XXXX-[0-9][0-9]*$"
autolabel "XXXX-%%%" any
erase-on-full YES
erase-on-failure YES
vault cloud 0
}

define storage cloud {
tpchanger "aws"
tapepool "$r"
tapetype "S3TAPE"
labelstr "^Vault-XXXX-[0-9][0-9]*$"
autolabel "Vault-XXXX-%%%" any
erase-on-full YES
erase-on-failure YES
dump-selection ALL FULL
}

storage "local-vtl"
vault-storage "cloud"

maxdumps 4
maxdumpsize -1

amrecover_changer "vtl"

holdingdisk hd1 {
comment "main holding disk"
directory "/var/lib/amanda/XXXXX"
use 128 Gb
chunksize 1Gb
}

infofile "/etc/amanda/XXXXX/curinfo"
logdir "/etc/amanda/XXXXX"
indexdir "/var/lib/amanda/XXXXX/index"
tapelist "/etc/amanda/XXXXX/tapelist"

define tapetype V64G {
length 65536 MB
part-size 1G
part-cache-type memory
}

define tapetype S3TAPE {
length 2048 GB
part-size 1G
part-cache-type memory
}

define application amgtar {
plugin "amgtar"
comment "amgtar"
property append "ignore" "file changed as we read it$"
property append "ignore" "File removed before we read it$"
property "CHECK-DEVICE" "NO"
}


define dumptype global {
comment "Global definitions"
index yes
exclude list ".amanda.excludes"
compress client fast
}

define dumptype root-tar {
global
program "APPLICATION"
application "amgtar"
comment "root partitions dumped with tar"
compress none
index
priority low
}

define dumptype high-tar {
root-tar
comment "partitions dumped with tar"
priority high
}

define dumptype remote-high {
high-tar
auth "ssh"
ssh_keys "/XXXXXXXXXXXXXXXXXXX"
estimate calcsize
maxdumps 4
compress server custom
server-custom-compress "/usr/bin/zstd"
}

define dumptype remote-low {
remote-high
priority low
}

define interactivity inter_tty {
plugin "tty"
}
define interactivity inter_email {
plugin "email"
property "mailto" "admin1"
property "resend-delay" "10"
property "check-file" "/tmp/email_input"
property "check-file-delay" "10"
}
define interactivity inter_tty_email {
plugin "tty_email"
property "mailto" "admin1"
property "resend-delay" "10"
property "check-file" "/tmp/email_input"
property "check-file-delay" "10"
}
interactivity "inter_tty_email"

define taperscan taper_traditional {
comment "traditional"
plugin "traditional"
}
define taperscan taper_oldest {
comment "oldest"
plugin "oldest"
}
define taperscan taper_lexical {
comment "lexical"
plugin "lexical"
}
taperscan "taper_lexical"
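As a side note, the two storage definitions above rely on their anchored labelstr patterns to keep the local and vault pools separate. A quick illustrative check with grep (the labels here are made up, and 'XXXX' stands for the redacted prefix):

```shell
# grep -Eq succeeds only when the label matches the anchored pattern,
# mirroring how labelstr decides which pool a tape label belongs to.
matches() { printf '%s\n' "$2" | grep -Eq "$1"; }
matches '^XXXX-[0-9][0-9]*$' 'XXXX-042' && echo local
matches '^Vault-XXXX-[0-9][0-9]*$' 'Vault-XXXX-042' && echo vault
matches '^XXXX-[0-9][0-9]*$' 'Vault-XXXX-042' || echo 'vault label excluded'
```

Because both patterns are anchored, a 'Vault-XXXX-*' label can never be mistaken for a local-pool label, and vice versa.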
This message was imported via the External PhorumMail Module
Jean-Louis Martineau
Re: Odd non-fatal errors in amdump reports.
November 07, 2017 07:59AM
Austin,

It's hard to say something with only the error message.

Can you post the amdump.<datestamp> and log.<datestamp>.0 for the two
backup sets that fail?

The tapedev of the aws changer can be written like:

tapedev "chg-multi:s3:XXXXXXXXXXXXXXXX/slot-{0..127}"
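This is the same brace-sequence expansion bash performs, so it can be sanity-checked in a shell (a short illustrative example, not Amanda itself; a four-slot range keeps the output readable):

```shell
# bash expands {0..N} into the full numeric sequence, which is exactly
# what the long comma-separated changer string spells out by hand.
echo slot-{0..3}
# -> slot-0 slot-1 slot-2 slot-3
```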


Jean-Louis

On 07/11/17 09:17 AM, Austin S. Hemmelgarn wrote:
> Where I work, we recently switched from manually triggered vaulting to
> automatic vaulting using the vault-storage, vault, and dump-selection
> options. Things appear to be working correctly, but we keep getting
> some odd non-fatal error messages (that might be bogus as well, since
> I've verified the dumps mentioned restore correctly) in the amdump
> e-mails. I've been trying to figure out these 'errors' for the past
> few weeks now, and I'm hoping someone on the list might have some advice
> (or better yet, might recognize the symptoms and know how to fix them).
> [...]
This message is the property of CARBONITE, INC. and may contain confidential or privileged information.
If this message has been delivered to you by mistake, then do not copy or deliver this message to anyone. Instead, destroy it and notify me by reply e-mail
Austin S. Hemmelgarn
Re: Odd non-fatal errors in amdump reports.
November 07, 2017 09:59AM
On 2017-11-07 10:22, Jean-Louis Martineau wrote:
> Austin,
>
> It's hard to say something with only the error message.
>
> Can you post the amdump.<datestamp> and log.<datestamp>.0 for the two
> backup sets that fail?
Yes, though it may take me a while since our policy is pretty strict
about scrubbing hostnames and usernames from any internal files we make
visible publicly.

Just to clarify, it will end up being 3 total pairs of files, two from
backup sets that show the first issue I mentioned (the complaint about a
header mismatch), and one from the backup set showing the second issue I
mentioned (the apparently bogus dump failures listed in the dump summary).
>
> The tapedev of the aws changer can be written like:
>
> tapedev "chg-multi:s3:XXXXXXXXXXXXXXXX/slot-{0..127}
Thanks, I hadn't known that the configuration file syntax supported
sequences like this; it makes the config look much nicer!
>
>
> Jean-Louis
>
> On 07/11/17 09:17 AM, Austin S. Hemmelgarn wrote:
> > Where I work, we recently switched from manually triggered vaulting to
> > automatic vaulting using the vault-storage, vault, and dump-selection
> > options. Things appear to be working correctly, but we keep getting
> > some odd non-fatal error messages (that might be bogus as well, since
> > I've verified the dumps mentioned restore correctly) in the amdump
> > e-mails. I've been trying to figure out these 'errors' for the past
> > few weeks now, and I'm hoping someone on the list might have some advice
> > (or better yet, might recognize the symptoms and know how to fix them).
> >
> > In our configuration, we have three different backup sets (each is on
> > it's own schedule). Of these, two are consistently showing the following
> > error in the amdump e-mail report (I've redacted hostnames and exact
> paths,
> > the second path listed though is a parent directory of the first):
> >
> > taper: FATAL Header of dumpfile does not match command from driver 0
> XXXXXXX /home/XXXXXXXXXXXXXXXXX 20171031074642 ------ 0 XXXXXXX
> /home/XXXXXX 20171031074642 at
> /usr/lib64/perl5/vendor_perl/5.24.1/Amanda/Taper/Worker.pm line 1168
> >
> > For a given backup set, the particular hostname and paths are always the
> > same, but the backup appears to get taped correctly, and restores
> > correctly as well.
> >
> > With the third backup set, we're regularly seeing things like the
> > following in the dump summary section, but no other visible error
> > messages:
> >
> > DUMPER STATS TAPER STATS
> > HOSTNAME DISK L ORIG-KB OUT-KB COMP% MMM:SS KB/s MMM:SS KB/s
> > --------------------------------------------- ----------------------
> ---------------- ---------------
> > XXXXXXXXXX /boot 0 -- FAILED
> > XXXXXXXXXX /boot 1 10 10 -- 0:00 168.8 0:00 0.0
> >
> > In this case, the particular DLE's affected are always the same,
> > and the first line that claims a failure always shows dump level
> > zero, even when the backup is supposed to be at another level.
> > Just like the other error, the affected dumps always restore
> > correctly when tested, and get correctly vaulted as well. The
> > affected DLE's are only on Linux systems, but it seems to not
> > care what distro or amanda version is being used (it's affected,
> > Debian, Gentoo, and Fedora systems, and covers 5 different
> > Amanda client versions), and are invariably small (sub-gigabyte)
> > filesystems, but I've not found any other commonality among them.
> >
> > All three sets use essentially the same amanda.conf file (the
> > differences are literally just in when they get run), which
> > I've attached in-line at the end of this e-mail with
> > sensitive data redacted. The thing I find particularly odd is
> > that this config is essentially identical to what I use on my
> > personal systems, which are not exhibiting either problem.
> >
> > 8<------------------------------------------------------------
> >
> > org "XXXXX"
> > mailto "admin"
> > dumpuser "amanda"
> > inparallel 2
> > dumporder "Ss"
> > taperalgo largestfit
> >
> > displayunit "k"
> > netusage 8000000 Kbps
> >
> > dumpcycle 4 weeks
> > runspercycle 28
> > tapecycle 128 tapes
> >
> > bumppercent 20
> > bumpdays 2
> >
> > etimeout 900
> > dtimeout 1800
> > ctimeout 30
> >
> > device_output_buffer_size 256M
> >
> > compress-index no
> >
> > flush-threshold-dumped 0
> > flush-threshold-scheduled 0
> > taperflush 0
> > autoflush yes
> >
> > runtapes 16
> >
> > define changer vtl {
> > tapedev "chg-disk:/net/XXXXXXXXXXXXXXXXXX/amanda/XXXXX"
> > changerfile "/etc/amanda/XXXXX/changer"
> > property "num-slot" "128"
> > property "auto-create-slot" "yes"
> > }
> >
> > define changer aws {
> > tapedev
> "chg-multi:s3:XXXXXXXXXXXXXXXX/slot-{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127}"
> > changerfile "/etc/amanda/XXXXX/s3-changer"
> > device-property "S3_SSL" "YES"
> > device-property "S3_ACCESS_KEY" "XXXXXXXXXXXXXXXXXXXX"
> > device-property "S3_SECRET_KEY"
> "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
> > device-property "S3_MULTI_PART_UPLOAD" "YES"
> > device-property "CREATE_BUCKET" "NO"
> > device-property "S3_BUCKET_LOCATION" "XXXXXXXXX"
> > device-property "STORAGE_API" "AWS4"
> > }
> >
> > define storage local-vtl {
> > tpchanger "vtl"
> > tapepool "$r"
> > tapetype "V64G"
> > labelstr "^XXXX-[0-9][0-9]*$"
> > autolabel "XXXX-%%%" any
> > erase-on-full YES
> > erase-on-failure YES
> > vault cloud 0
> > }
> >
> > define storage cloud {
> > tpchanger "aws"
> > tapepool "$r"
> > tapetype "S3TAPE"
> > labelstr "^Vault-XXXX-[0-9][0-9]*$"
> > autolabel "Vault-XXXX-%%%" any
> > erase-on-full YES
> > erase-on-failure YES
> > dump-selection ALL FULL
> > }
> >
> > storage "local-vtl"
> > vault-storage "cloud"
> >
> > maxdumps 4
> > maxdumpsize -1
> >
> > amrecover_changer "vtl"
> >
> > holdingdisk hd1 {
> > comment "main holding disk"
> > directory "/var/lib/amanda/XXXXX"
> > use 128 Gb
> > chunksize 1Gb
> > }
> >
> > infofile "/etc/amanda/XXXXX/curinfo"
> > logdir "/etc/amanda/XXXXX"
> > indexdir "/var/lib/amanda/XXXXX/index"
> > tapelist "/etc/amanda/XXXXX/tapelist"
> >
> > define tapetype V64G {
> > length 65536 MB
> > part-size 1G
> > part-cache-type memory
> > }
> >
> > define tapetype S3TAPE {
> > length 2048 GB
> > part-size 1G
> > part-cache-type memory
> > }
> >
> > define application amgtar {
> > plugin "amgtar"
> > comment "amgtar"
> > property append "ignore" "file changed as we read it$"
> > property append "ignore" "File removed before we read it$"
> > property "CHECK-DEVICE" "NO"
> > }
> >
> >
> > define dumptype global {
> > comment "Global definitions"
> > index yes
> > exclude list ".amanda.excludes"
> > compress client fast
> > }
> >
> > define dumptype root-tar {
> > global
> > program "APPLICATION"
> > application "amgtar"
> > comment "root partitions dumped with tar"
> > compress none
> > index
> > priority low
> > }
> >
> > define dumptype high-tar {
> > root-tar
> > comment "partitions dumped with tar"
> > priority high
> > }
> >
> > define dumptype remote-high {
> > high-tar
> > auth "ssh"
> > ssh_keys "/XXXXXXXXXXXXXXXXXXX"
> > estimate calcsize
> > maxdumps 4
> > compress server custom
> > server-custom-compress "/usr/bin/zstd"
> > }
> >
> > define dumptype remote-low {
> > remote-high
> > priority low
> > }
> >
> > define interactivity inter_tty {
> > plugin "tty"
> > }
> > define interactivity inter_email {
> > plugin "email"
> > property "mailto" "admin1"
> > property "resend-delay" "10"
> > property "check-file" "/tmp/email_input"
> > property "check-file-delay" "10"
> > }
> > define interactivity inter_tty_email {
> > plugin "tty_email"
> > property "mailto" "admin1"
> > property "resend-delay" "10"
> > property "check-file" "/tmp/email_input"
> > property "check-file-delay" "10"
> > }
> > interactivity "inter_tty_email"
> >
> > define taperscan taper_traditional {
> > comment "traditional"
> > plugin "traditional"
> > }
> > define taperscan taper_oldest {
> > comment "oldest"
> > plugin "oldest"
> > }
> > define taperscan taper_lexical {
> > comment "lexical"
> > plugin "lexical"
> > }
> > taperscan "taper_lexical"
> >
Jean-Louis Martineau
Re: Odd non-fatal errors in amdump reports.
November 08, 2017 05:59AM
On 07/11/17 02:58 PM, Austin S. Hemmelgarn wrote:
> On 2017-11-07 10:22, Jean-Louis Martineau wrote:
>> Austin,
>>
>> It's hard to say something with only the error message.
>>
>> Can you post the amdump.<datestamp> and log.<datestamp>.0 for the two
>> backup sets that fail?
>>
> I've attached the files (I would put them inline, but one of the sets
> has over 100 DLEs, so the amdump file is huge, and the others are
> still over 100k each, and I figured nobody wants to try and wade
> through those inline).
>
> The set1 and set2 files are for the two backup sets that show the
> header mismatch error, and the set3 files are for the one that claims
> failures in the dump summary.


I looked at set3; the errors in the 'DUMP SUMMARY' are related to the
errors in the 'FAILURE DUMP SUMMARY':

client2 /boot lev 0 FLUSH [File 0 not found]
client3 /boot lev 0 FLUSH [File 0 not found]
client7 /boot lev 0 FLUSH [File 0 not found]
client8 /boot lev 0 FLUSH [File 0 not found]
client0 /boot lev 0 FLUSH [File 0 not found]
client9 /boot lev 0 FLUSH [File 0 not found]
client9 /srv lev 0 FLUSH [File 0 not found]
client9 /var lev 0 FLUSH [File 0 not found]
server0 /boot lev 0 FLUSH [File 0 not found]
client10 /boot lev 0 FLUSH [File 0 not found]
client11 /boot lev 0 FLUSH [File 0 not found]
client12 /boot lev 0 FLUSH [File 0 not found]

These are VAULT attempts, not FLUSHes. Looking only at the first entry, it
tries to vault 'client2 /boot 0 20171024084159', which it expects to find on
tape Server-01. That is an older dump.

Is Server-01 still there? Does it still contain the dump?
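One way to check by hand, assuming a chg-disk layout (one slot-N directory per virtual tape, each containing a 00000.* label file; the helper below is an illustrative sketch, not an Amanda command, and the path in the example is hypothetical):

```shell
# find_slot LABEL ROOT: print each slot directory whose 00000.* label
# file mentions LABEL. Layout and file names are a sketch of a chg-disk
# vtape tree; adjust to match the real changer root.
find_slot() {
  label=$1; root=$2
  grep -l "$label" "$root"/slot*/00000.* 2>/dev/null | sed 's|/[^/]*$||'
}

# e.g.  find_slot Server-01 /net/fileserver/amanda/set3
```

If a slot turns up, listing its contents should show whether the image for 'client2 /boot 0 20171024084159' is actually still on that virtual tape.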

Jean-Louis
Austin S. Hemmelgarn
Re: Odd non-fatal errors in amdump reports.
November 08, 2017 08:02AM
On 2017-11-08 08:03, Jean-Louis Martineau wrote:
> [...]
>
>
> I looked at set3; the errors in the 'DUMP SUMMARY' are related to the
> errors in the 'FAILURE DUMP SUMMARY':
>
> client2 /boot lev 0 FLUSH [File 0 not found]
> [...]
>
> These are VAULT attempts, not FLUSHes. Looking only at the first entry, it
> tries to vault 'client2 /boot 0 20171024084159', which it expects to find on
> tape Server-01. That is an older dump.
>
> Is Server-01 still there? Does it still contain the dump?
>
Hmm, looks like that's a leftover from changing our labeling format
shortly after switching to this new configuration. I thought I purged
all the stuff with the old label scheme, but I guess not.

It somewhat surprises me that this doesn't give any kind of error
indication in the e-mail report beyond the 'FAILED' line in the dump
summary.
Austin S. Hemmelgarn
Re: Odd non-fatal errors in amdump reports.
November 10, 2017 05:59AM
On 2017-11-08 08:03, Jean-Louis Martineau wrote:
> [...]
> I looked at set3; the errors in the 'DUMP SUMMARY' are related to the
> errors in the 'FAILURE DUMP SUMMARY':
>
> client2 /boot lev 0 FLUSH [File 0 not found]
> [...]
>
> These are VAULT attempts, not FLUSHes. Looking only at the first entry, it
> tries to vault 'client2 /boot 0 20171024084159', which it expects to find on
> tape Server-01. That is an older dump.
>
> Is Server-01 still there? Does it still contain the dump?
>
OK, I've done some further investigation by tweaking the labeling a bit
(which actually fixed a purely cosmetic issue we were having), but I'm
still seeing the same problem that prompted this thread, and I can
confirm that the dumps are where Amanda is trying to look for them, it's
just not seeing them for some reason. I hadn't thought of this before,
but could it have something to do with the virtual tape library being
auto-mounted over NFS on the backup server?
Jean-Louis Martineau
Re: Odd non-fatal errors in amdump reports.
November 10, 2017 06:00AM
On 10/11/17 07:57 AM, Austin S. Hemmelgarn wrote:
> [...]
> OK, I've done some further investigation by tweaking the labeling a
> bit (which actually fixed a purely cosmetic issue we were having), but
> I'm still seeing the same problem that prompted this thread, and I can
> confirm that the dumps are where Amanda is trying to look for them,
> it's just not seeing them for some reason. I hadn't thought of this
> before, but could it have something to do with the virtual tape
> library being auto-mounted over NFS on the backup server?
>
Austin,

Can you try to see if amfetchdump can restore it?

* amfetchdump CONFIG client2 /boot 20171024084159

Jean-Louis
This message is the property of CARBONITE, INC. and may contain confidential or privileged information.
If this message has been delivered to you by mistake, then do not copy or deliver this message to anyone. Instead, destroy it and notify me by reply e-mail
This message was imported via the External PhorumMail Module
Austin S. Hemmelgarn
Re: Odd non-fatal errors in amdump reports.
November 10, 2017 06:02AM
On 2017-11-10 08:27, Jean-Louis Martineau wrote:
> Austin,
>
> Can you try to see if amfetchdump can restore it?
>
> * amfetchdump CONFIG client2 /boot 20171024084159
At the moment, I'm re-testing things after tweaking some NFS parameters
for the virtual tape library (apparently the FreeNAS server that's
actually storing the data didn't have NFSv4 turned on, so it was mounted
with NFSv3, which we've had issues with before on our network), so I
can't exactly check immediately, but assuming the problem repeats, I'll
do that first thing once the test dump is done.
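For what it's worth, which NFS version a mount actually negotiated can be checked mechanically. A minimal sketch, assuming a /proc/mounts-style line; the server name and paths below are placeholders, not the real ones:

```shell
# Minimal sketch: verify which NFS version the vtape mount negotiated.
# On a real backup server the line would come from /proc/mounts; the
# server name and paths here are placeholders.
line='freenas:/mnt/tank/vtl /var/lib/amanda/vtl nfs4 rw,vers=4.1 0 0'
fstype=$(printf '%s\n' "$line" | awk '{print $3}')
case "$fstype" in
    nfs4) echo "vtape mount is NFSv4" ;;
    nfs)  echo "vtape mount fell back to NFSv3; check the server exports" ;;
esac
```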
Jean-Louis Martineau
Re: Odd non-fatal errors in amdump reports.
November 10, 2017 07:59AM
Austin,

Can you try the attached patch? I think it could fix the set1 and set2
errors.

Jean-Louis

Austin S. Hemmelgarn
Re: Odd non-fatal errors in amdump reports.
November 10, 2017 08:01AM
On 2017-11-10 10:00, Jean-Louis Martineau wrote:
> Austin,
>
> Can you try the attached patch? I think it could fix the set1 and set2
> errors.
>
Yes, but I won't be able to log in this weekend to revert it if it
doesn't work, so I won't be able to test it until Monday.

Am I correct in assuming that it only needs to be applied on the server
and not the clients?
Jean-Louis Martineau
Re: Odd non-fatal errors in amdump reports.
November 10, 2017 08:04AM
On 10/11/17 10:10 AM, Austin S. Hemmelgarn wrote:
> On 2017-11-10 10:00, Jean-Louis Martineau wrote:
>> Austin,
>>
>> Can you try the attached patch? I think it could fix the set1 and set2
>> errors.
>>
> Yes, but I won't be able to log in this weekend to revert it if it
> doesn't work, so I won't be able to test it until Monday.
>
> Am I correct in assuming that it only needs to be applied on the
> server and not the clients?
Yes, only on the server.

Jean-Louis
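As a sketch of that server-side workflow: stage a copy of the target module, dry-run the patch, then apply it. The file, path, and diff contents below are stand-ins, not the actual set2.diff or the server's installed Amanda perl tree:

```shell
# Illustrative only: dry-run a patch before committing to it. --dry-run
# reports whether the hunks apply without modifying anything.
work=$(mktemp -d)
cd "$work" || exit 1
printf 'old line\n' > Worker.pm
cat > set2.diff <<'EOF'
--- Worker.pm
+++ Worker.pm
@@ -1 +1 @@
-old line
+new line
EOF
patch --dry-run -p0 < set2.diff >/dev/null && patch -p0 < set2.diff >/dev/null
grep -q 'new line' Worker.pm && echo "patch applied cleanly"
```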

Austin S. Hemmelgarn
Re: Odd non-fatal errors in amdump reports.
November 10, 2017 08:06AM
On 2017-11-10 08:27, Jean-Louis Martineau wrote:
> Austin,
>
> Can you try to see if amfetchdump can restore it?
>
> * amfetchdump CONFIG client2 /boot 20171024084159
>
amfetchdump doesn't see it, and neither does amrecover, but the files
for the given parts are definitely there (I know for a fact that the
dump in question has exactly one part, and the file for that does exist
on the virtual tape mentioned in the log file).

I'm probably not going to be able to check more on this today, but I'll
likely be checking whether amrestore and 'amadmin find' can see them.
Jean-Louis Martineau
Re: Odd non-fatal errors in amdump reports.
November 10, 2017 09:59AM
The previous patch broke something.
Try this new set2-r2.diff patch.

Jean-Louis

On 10/11/17 10:40 AM, Austin S. Hemmelgarn wrote:
> amfetchdump doesn't see it, and neither does amrecover, but the files
> for the given parts are definitely there (I know for a fact that the
> dump in question has exactly one part, and the file for that does
> exist on the virtual tape mentioned in the log file).
>
> I'm probably not going to be able to check more on this today, but
> I'll likely be checking if amrestore and amadmin find can see them.
>
Austin S. Hemmelgarn
Re: Odd non-fatal errors in amdump reports.
November 13, 2017 04:59AM
On 2017-11-10 08:45, Austin S. Hemmelgarn wrote:
> At the moment, I'm re-testing things after tweaking some NFS parameters
> for the virtual tape library (apparently the FreeNAS server that's
> actually storing the data didn't have NFSv4 turned on, so it was mounted
> with NFSv3, which we've had issues with before on our network), so I
> can't exactly check immediately, but assuming the problem repeats, I'll
> do that first thing once the test dump is done.

It looks like the combination of fixing the incorrect labeling in the
config and switching to NFSv4 fixed this particular case.
Austin S. Hemmelgarn
Re: Odd non-fatal errors in amdump reports.
November 13, 2017 05:01AM
On 2017-11-10 12:52, Jean-Louis Martineau wrote:
> The previous patch broke something.
> Try this new set2-r2.diff patch
Given that the switch to NFSv4 combined with a change to the labeling
scheme fixed the other issue, I'm going to re-test these two sets with
the same changes before I test the patch just so I've got something
current to compare against. I should have results from that later
today, and will likely be testing this patch tomorrow if things aren't
resolved by the other changes (and based on what you've said and what
I've seen, I don't think the switch to NFSv4 or the labeling change will
fix this one).
Austin S. Hemmelgarn
Re: Odd non-fatal errors in amdump reports.
November 13, 2017 11:04AM
On 2017-11-10 12:52, Jean-Louis Martineau wrote:
> The previous patch broke something.
> Try this new set2-r2.diff patch

Unfortunately, that doesn't appear to have fixed it, though the errors
look different now. I'll try to get the log scrubbed by the end of the
day and post it here.
Jean-Louis Martineau
Re: Odd non-fatal errors in amdump reports.
November 13, 2017 01:59PM
On 13/11/17 02:53 PM, Austin S. Hemmelgarn wrote:

driver: send-cmd time 9300.326 to taper1: VAULT-WRITE worker1-0 00-00120 local-vtl local-vtl Home-0001 client0 /home/1D 0 20171113073255 "" "" "" "" 1073741824 memory "" "" 0

> FAIL taper "ST:cloud" "POOL:cloud" client0 /home/1D 20171113073255 0 error "File 0 not found"

Does that dump still exist on tape Home-0001? Find it with amfetchdump.

If yes, send me the taper debug file.


Jean-Louis
Austin S. Hemmelgarn
Re: Odd non-fatal errors in amdump reports.
November 14, 2017 04:59AM
On 2017-11-13 16:42, Jean-Louis Martineau wrote:
> On 13/11/17 02:53 PM, Austin S. Hemmelgarn wrote:
>
> driver: send-cmd time 9300.326 to taper1: VAULT-WRITE worker1-0 00-00120
> local-vtl local-vtl Home-0001 client0 /home/1D 0 20171113073255 "" "" ""
> "" 1073741824 memory "" "" 0
>
> > FAIL taper "ST:cloud" "POOL:cloud" client0 /home/1D 20171113073255 0
> error "File 0 not found"
>
> Does that dump still exist on tape Home-0001? Find it with amfetchdump.
>
> If yes, send me the taper debug file.
amfetchdump does not see it, but looking directly at the virtual tape
directories, I can see it there.
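To show what "looking directly at the virtual tape directories" means here, a fabricated slot in the general chg-disk shape: a 00000.<label> file plus one numbered file per taped part. The names are illustrative, not taken from the real configuration:

```shell
# Fabricated vtape slot: a label file plus one file per part, so "the
# file for that part exists" can be verified with plain ls.
vtl=$(mktemp -d)
mkdir "$vtl/slot1"
: > "$vtl/slot1/00000.Home-0001"           # volume label file
: > "$vtl/slot1/00001.client0._home_1D.0"  # part 1 of the dump
ls "$vtl/slot1"
```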
Austin S. Hemmelgarn
Re: Odd non-fatal errors in amdump reports.
November 14, 2017 04:59AM
On 2017-11-14 07:34, Austin S. Hemmelgarn wrote:
> amfetchdump does not see it, but looking directly at the virtual tape
> directories, I can see it there.
>
Just tried an amcheckdump on everything; it looks like some of the dump
files are corrupted, but I can't for the life of me figure out why (I
test our network regularly and it has no problems, and any problems with
a particular system should show up as more than just corrupted tar
files). I'm going to try disabling compression and see if that helps at
all, as that's the only processing other than the default that we're
doing on the dumps (long term, it's not really a viable option, but if
it fixes things at least we know what's broken).
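One way to sanity-check a compressed part by hand, assuming the dump stream starts after a single 32 KiB Amanda header block (the default blocksize); the example fabricates a part file so it is self-contained:

```shell
# Sketch: skip the assumed 32 KiB header block, then let gzip -t check
# the compressed stream's integrity without extracting it.
part=$(mktemp)
dd if=/dev/zero of="$part" bs=32k count=1 2>/dev/null   # stand-in header
printf 'dump payload\n' | gzip >> "$part"               # compressed stream
if dd if="$part" bs=32k skip=1 2>/dev/null | gzip -t 2>/dev/null; then
    status="stream OK"
else
    status="stream corrupt"
fi
echo "$status"
```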
Austin S. Hemmelgarn
Re: Odd non-fatal errors in amdump reports.
November 14, 2017 11:59AM
On 2017-11-14 07:43, Austin S. Hemmelgarn wrote:
> Just tried an amcheckdump on everything, it looks like some of the dump
> files are corrupted, but I can't for the life of me figure out why (I
> test our network regularly and it has no problems, and any problems with
> a particular system should show up as more than just corrupted tar
> files).  I'm going to try disabling compression and see if that helps at
> all, as that's the only processing other than the default that we're
> doing on the dumps (long term, it's not really a viable option, but if
> it fixes things at least we know what's broken).
No luck changing compression. I would suspect some issue with NFS, but
I've started seeing the same symptoms on my laptop as well now (which is
completely unrelated to any of the sets at work, apart from having an
almost identical configuration except for the paths and the total number
of tapes).
Austin S. Hemmelgarn
Re: Odd non-fatal errors in amdump reports.
November 16, 2017 06:59AM
On 2017-11-14 14:37, Austin S. Hemmelgarn wrote:
> No luck changing compression.  I would suspect some issue with NFS, but
> I've started seeing the same symptoms on my laptop as well now (which is
> completely unrelated to any of the sets at work other than having an
> almost identical configuration other than paths and the total number of
> tapes).

So, I finally got things working by switching from:

storage "local-vtl"
vault-storage "cloud"

To:

storage "local-vtl" "cloud"

And removing the "vault" option from the local-vtl storage definition.
Strictly speaking, this is working around the issue instead of fixing
it, but it fits within what we need for our usage, and actually makes
the amdump runs complete faster (since dumps get taped to S3 in parallel
with getting taped to the local vtapes).
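Roughly, the two layouts look like this in amanda.conf terms; the storage names are from this thread, but the parameter bodies (changer path, vault arguments) are illustrative, not the actual definitions:

```text
# Vaulting layout (what we had):
define storage "local-vtl" {
    tpchanger "chg-disk:/path/to/vtl"   # placeholder path
    vault "cloud" 0                     # vault finished dumps to "cloud"
}
storage "local-vtl"
vault-storage "cloud"

# Parallel-taping layout (what works): both storages written during amdump
define storage "local-vtl" {
    tpchanger "chg-disk:/path/to/vtl"   # placeholder path
}
storage "local-vtl" "cloud"
```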

Based on this, and the fact that amcheckdump was reporting corrupted
dumps, I think the issue is probably an interaction between the vaulting
code and the regular taping code, but I'm not certain.

Thanks for the help.