SearchFAQMemberlist Log in
Reply to topic Page 1 of 1
[Bacula-devel] Releasing the new batch DB insert code
Author Message
Post [Bacula-devel] Releasing the new batch DB insert code 
1. With the batch insert code turned on there are a number of
regression
tests
that fail. They must all pass without errors prior to production
release.
Responsible: Eric
Deadline: Roughly the end of March

Makes sense.=20

- First, have a default limit of the number of records that will be
inserted
in any one batch request. This should guarantee that an out of memory
problem will not normally occur.

Can we calculate this based on available memory at execution time?
General tuning wisdom for the commercial databases (DB/2, Oracle, etc)
target about 60% of real memory as the goal for this kind of
segmentation. That allows some query optimization overhead, and a little
wiggle room if the optimizer guesses wrong.=20

- Second, add a new directive to the Catalog resource that allows the
user
to
change the default batch size (by setting the value to 0 or very
large,
you
can effectively allow the batch to be arbitrarily large).

Makes sense.=20

- Third (this is optional, but very desirable), implement a new
directive
that
allows the user to enable/disable the batch insert mechanism. This
would
require a bit of change in the subroutine names in the cats directory
so
that
we can enable the new code and the old code at the same time, but
select
which is used at runtime for each DB based on the directive value. If
the
first and second items are implemented, the batch insert would be
enabled
by
default, otherwise it will be turned off.
Responsiblity for above 3 (4) points: Eric (and possibly Marc)
Deadline: rougly the end of March

OK. If the new algorithm really works better, it seems like extra work
that might not be needed, but c'est la vie.=20

=20
3. Documentation
Before putting this into production, I would like to see a bit more
documentation about the new algorithms -- this is documentation that
would
be
placed in the code. Marc has offerred to answer questions and write
some
documentation, I offer to integrate it into the code, and continue to
ask
questions until it is clear. This item should be no problem to
resolve.
Responsible: Kern + Marc (with help from Eric if he wants)
Deadline: Kern understands the algorthm by mid April.

Works for me. Analyzing this for race conditions will be really
interesting, I think. In fact, I think it'll be a really good final exam
question for my informatics students...*evil grin* Thanks, guys!

Post [Bacula-devel] Releasing the new batch DB insert code 
On Wed, 21 Mar 2007, David Boyes wrote:

- First, have a default limit of the number of records that will be
inserted
in any one batch request. This should guarantee that an out of memory
problem will not normally occur.

Can we calculate this based on available memory at execution time?
General tuning wisdom for the commercial databases (DB/2, Oracle, etc)
target about 60% of real memory as the goal for this kind of
segmentation. That allows some query optimization overhead, and a little
wiggle room if the optimizer guesses wrong.

In MySQL: /etc/my.cnf

====
# If your system supports the memlock() function call, you might want to
# enable this option while running MySQL to keep it locked in memory and
# to avoid potential swapping out in case of high memory pressure. Good
# for performance.
memlock
====

Comment this out. If MySQL runs out of RAM it will hit swap and slow down,
but it will not crash.

I did send Kern some updated query forms some time back for dbcheck which
reduced the load and sped things up a LOT, but I can't find them now.

(Basically, SELECT COUNT(*) instead of SELECT, etc)

No need to bring out the rows and count them when the SQL database can do
it for you.....

Post [Bacula-devel] Releasing the new batch DB insert code 
I think I haven't explained the memory issue correctly :

The example Kern gave is :

"SELECT JobMedia.JobMediaId,Job.JobId FROM JobMedia "
"LEFT OUTER JOIN Job ON (JobMedia.JobId=Job.JobId) "
"WHERE Job.JobId IS NULL LIMIT 300000";

and it only fails if I remove the "LIMIT 3000000".

The problem you mention is that you select 300,000 records from the database.
What happens then is that your SQL CLIENT process will allocate all these
records in memory (except if you use a cursor, but that's not the question).

As a consequence, if you remove the LIMIT code, the client will have to store
millions of records in memory (for you to use them). The server process won't
have to.

There is no such code in the batch inserts :

In the batch DB insert code, there is no SELECT statement (at least no SELECT
statement retrieving rows). It means no data is sent from the SQL server
process to the SQL client process.
The only memory allocated will be by the SQL backend. This memory is already
limited by administrative limits put by the DBA. Even if the DBA hasn't put
these limits, they are in place, as there are default (small) values put
there by the installer...
The memory allocated by the SQL server process is used for sorting, hashing,
small buffers to send data to the SQL client process. These resources are
controlled (and better be, or a single query could easily kill a database,
using a cross join for instance).
Bigger sorts go to disk, in chunks, in temp files...

On Wednesday 21 March 2007 15:25, Alan Brown wrote:
On Wed, 21 Mar 2007, David Boyes wrote:
- First, have a default limit of the number of records that will be
inserted
in any one batch request. This should guarantee that an out of memory
problem will not normally occur.

Can we calculate this based on available memory at execution time?
General tuning wisdom for the commercial databases (DB/2, Oracle, etc)
target about 60% of real memory as the goal for this kind of
segmentation. That allows some query optimization overhead, and a little
wiggle room if the optimizer guesses wrong.

In MySQL: /etc/my.cnf

====
# If your system supports the memlock() function call, you might want to
# enable this option while running MySQL to keep it locked in memory and
# to avoid potential swapping out in case of high memory pressure. Good
# for performance.
memlock
====

Comment this out. If MySQL runs out of RAM it will hit swap and slow down,
but it will not crash.


I did send Kern some updated query forms some time back for dbcheck which
reduced the load and sped things up a LOT, but I can't find them now.

(Basically, SELECT COUNT(*) instead of SELECT, etc)

No need to bring out the rows and count them when the SQL database can do
it for you.....

Post [Bacula-devel] Releasing the new batch DB insert code 
On Wed, 21 Mar 2007, Marc Cousin wrote:

I think I haven't explained the memory issue correctly :

I realise it's an issue for large selects, but in the case given:


The example Kern gave is :

"SELECT JobMedia.JobMediaId,Job.JobId FROM JobMedia "
"LEFT OUTER JOIN Job ON (JobMedia.JobId=Job.JobId) "
"WHERE Job.JobId IS NULL LIMIT 300000";

and it only fails if I remove the "LIMIT 3000000".

Would be better solved with:

"SELECT COUNT(*) FROM JobMedia "
"LEFT OUTER JOIN Job ON (JobMedia.JobId=Job.JobId) "
"WHERE Job.JobId IS NULL ";

Because the very next lines in dbcheck simply count the resulting lines of
output.

Similarly the deletion routine can be altered from the Kern's example,
plus a loop using DELETE statements, to:

"DELETE FROM Job "
"LEFT OUTER JOIN Job ON (JobMedia.JobId=Job.JobId) "
"WHERE Job.JobId IS NULL ";

The problem you mention is that you select 300,000 records from the database.
What happens then is that your SQL CLIENT process will allocate all these
records in memory (except if you use a cursor, but that's not the question).

As a consequence, if you remove the LIMIT code, the client will have to store
millions of records in memory (for you to use them). The server process won't
have to.

This is exactly what I was seeing with dbcheck.

Why have a dog and then do all the barking yourself?

In this case the dog is the SQL database and the barking is the needless
extraction and [counting|deleting] of individual NULL JobIds

The comments about SQL crashes are because I have seen this happen as
databases grow HUGE. Mine is somewhat in excess of 250 million entries.

AB

Post [Bacula-devel] Releasing the new batch DB insert code 

This is exactly what I was seeing with dbcheck.

Why have a dog and then do all the barking yourself?

In this case the dog is the SQL database and the barking is the needless
extraction and [counting|deleting] of individual NULL JobIds


The comments about SQL crashes are because I have seen this happen as
databases grow HUGE. Mine is somewhat in excess of 250 million entries.

AB
That wasn't the point...
I wasn't really interested in how the dbcheck example should be improved, I
just explained why the problem occured. And why it won't occur with the batch
inserts...

Post [Bacula-devel] Releasing the new batch DB insert code 
Hi,

1. With the batch insert code turned on there are a number of regression
tests that fail. They must all pass without errors prior to production
release. Responsible: Eric
Deadline: Roughly the end of March

I work on it

2. I am very concerned that the new batch insert code will under certain
circumstances fail with an out of memory condition because load placed on
the SQL engine will be too large. Marc points out that the SQL engine
should handle this, but my experience is that most users (including myself)
install MySQL or PostgreSQL and do not understand the details tuning the
engine, and in its default mode, it can have a catastrophic failure (out of
memory). This currently never happens within Bacula code because of a very
conservative design, but it does happen in dbcheck. This happens when
*all* the data is maintained in the engine. The particular command that
fails is:

"SELECT JobMedia.JobMediaId,Job.JobId FROM JobMedia "
"LEFT OUTER JOIN Job ON (JobMedia.JobId=Job.JobId) "
"WHERE Job.JobId IS NULL LIMIT 300000";

and it only fails if I remove the "LIMIT 3000000".

The bottom line is that I believe that there are three measures that we
should take to ensure that this does not happen, and if it does, the user
will have a way to easily workaround the problem without needing two users
of training as a DBA.

- Write some test code inside Bacula that will create 10 million batch
insert records (this item would be nice, but it is not required) so that we
can test it on "default" database installations.

I'm creating a bbatch tool wich will do that. (and we will have a bench tool
for bacula database).

- First, have a default limit of the number of records that will be
inserted in any one batch request. This should guarantee that an out of
memory problem will not normally occur.

It's the database job... I never seen a database (mysql, postgres or oracle)
saying something like "Sorry i'm out of memory". Database takes memory
that you give, never more.

- Second, add a new directive to the Catalog resource that allows the user
to change the default batch size (by setting the value to 0 or very large,
you can effectively allow the batch to be arbitrarily large).

I can do that

- Third (this is optional, but very desirable), implement a new directive
that allows the user to enable/disable the batch insert mechanism. This
would require a bit of change in the subroutine names in the cats directory
so that we can enable the new code and the old code at the same time, but
select which is used at runtime for each DB based on the directive value.
If the first and second items are implemented, the batch insert would be
enabled by default, otherwise it will be turned off.
Responsiblity for above 3 (4) points: Eric (and possibly Marc)
Deadline: rougly the end of March

It's possible to, but i must wrote bbatch before, i think we will have
suprises...

3. Documentation
Before putting this into production, I would like to see a bit more
documentation about the new algorithms -- this is documentation that would
be placed in the code. Marc has offerred to answer questions and write
some documentation, I offer to integrate it into the code, and continue to
ask questions until it is clear. This item should be no problem to
resolve. Responsible: Kern + Marc (with help from Eric if he wants)
Deadline: Kern understands the algorthm by mid April.

Marc is writing it.

Bye

Post [Bacula-devel] Releasing the new batch DB insert code 
Eric Bollengier wrote:


It's the database job... I never seen a database (mysql, postgres or oracle)
saying something like "Sorry i'm out of memory". Database takes memory
that you give, never more.


Talking about Oracle - I can tell you those erros can happen (generally,
not speaking about bacula). I saw them some times in my short life Wink
Error codes are:

ORA-04030
Error out of process memory when trying to allocate string bytes
(string,string)
Cause Operating system process private memory has
been exhausted

ORA-04031
Error unable to allocate string bytes of shared memory
("string","string","string","string")
Cause More shared memory is needed than was allocated in the
shared pool.

--
Oliver Lehmann
http://www.pofo.de/
http://wishlist.ans-netz.de/

Post [Bacula-devel] Releasing the new batch DB insert code 
This cause a database/system crash ? or a just a transaction rollback ?
=2E..

Le Vendredi 23 Mars 2007 17:03, Oliver Lehmann a =E9crit=A0:
Eric Bollengier wrote:
It's the database job... I never seen a database (mysql, postgres or
oracle) saying something like "Sorry i'm out of memory". Database takes
memory that you give, never more.

Talking about Oracle - I can tell you those erros can happen (generally,
not speaking about bacula). I saw them some times in my short life Wink
Error codes are:

ORA-04030
Error out of process memory when trying to allocate string bytes
(string,string)
Cause Operating system process private memory has
been exhausted

ORA-04031
Error unable to allocate string bytes of shared memory
("string","string","string","string")
Cause More shared memory is needed than was allocated in the
shared pool.

Post [Bacula-devel] Releasing the new batch DB insert code 
I don't think ORA 04030 is a good example ...

It means that the oracle process has tried to allocate memory as asked by t=
he=20
DBA, and couldn't, because either the server has no more memory, the proces=
s=20
has hit an administrative (OS) limit, the OS has done an optimistic memory=
=20
allocation...

As far as I know, it occurs when (for instance) a DBA puts a gigantic sort=
=20
area size, coupled with a very high number of process limit, or some other=
=20
(voluntarily) high limits.

What I mean is that it's not something that happens with 'out of the box'=20
configurations... It's something the DBA has 'voluntarily' done when tuning=
=20
the database.

Of course, you can trigger something like that with postgresql, providing y=
ou=20
put a high default work_mem, coupled with a high max_connections, and of=20
course, lots of session simultaneously doing big sorts ... but these aren't=
=20
out of the box parameters.

On Friday 23 March 2007 17:38, Eric Bollengier wrote:
This cause a database/system crash ? or a just a transaction rollback ?
...

Le Vendredi 23 Mars 2007 17:03, Oliver Lehmann a =E9crit=A0:
Eric Bollengier wrote:
It's the database job... I never seen a database (mysql, postgres or
oracle) saying something like "Sorry i'm out of memory". Database tak=
es
memory that you give, never more.

Talking about Oracle - I can tell you those erros can happen (generally,
not speaking about bacula). I saw them some times in my short life Wink
Error codes are:

ORA-04030
Error out of process memory when trying to allocate string bytes
(string,string)
Cause Operating system process private memory has
been exhausted

ORA-04031
Error unable to allocate string bytes of shared memory
("string","string","string","string")
Cause More shared memory is needed than was allocated in the
shared pool.

-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share
your opinions on IT & business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=3Djoin.php&p=3Dsourceforge&CID=3D=
DEVDEV
_______________________________________________
Bacula-users mailing list
Bacula-users < at > li...
https://lists.sourceforge.net/lists/listinfo/bacula-users

Display posts from previous:
Reply to topic Page 1 of 1
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
  


Magic SEO URL for phpBB