[Issue 4794] New - Mercurial cache job

jglick-2

[Issue 4794] New - Mercurial cache job

https://hudson.dev.java.net/issues/show_bug.cgi?id=4794
                 Issue #|4794
                 Summary|Mercurial cache job
               Component|hudson
                 Version|current
                Platform|All
              OS/Version|All
                     URL|
                  Status|NEW
       Status whiteboard|
                Keywords|
              Resolution|
              Issue type|FEATURE
                Priority|P3
            Subcomponent|mercurial
             Assigned to|jglick
             Reported by|jglick






------- Additional comments from [hidden email] Sat Nov  7 19:04:00 +0000 2009 -------
For Hudson installations that have a lot of jobs all running off one (or a small
number of) Mercurial repositories, it is inefficient to have them all pull over
the network, as they will be repeatedly pulling the exact same changesets. The
situation is even worse when you consider that polling effectively pulls in
changesets as well (just discarding them after logging their metadata).

Suggest a new special job type, Mercurial Cache, which would have attributes:

1. List of repository URLs.

2. Optional schedule, like a project.

There is a corresponding workspace on the master and possibly on some or all
slaves. Whenever the scheduler fires or the job is otherwise run (e.g.
manually), the following actions will be taken:

1. For each repo, if there is a matching cache in the master's workspace, run
'hg in --bundle incoming.hg && hg pull incoming.hg' to pull all changesets into
it.

2. For each repo and for each slave, if the slave's workspace also contains that
repo, send incoming.hg to the slave (over the usual channel) and have the slave
'hg unbundle' it.
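
The two update steps above could be scripted roughly as follows. This is only a
minimal sketch, not the plugin's implementation: the function names, the cache
paths and the HG override variable are hypothetical, and copying incoming.hg to
the slave over the Hudson channel is left out. It assumes the master cache was
cloned from the remote repository, so that its default path points there.

```shell
#!/bin/sh
# Sketch only. HG can be overridden (e.g. with a stub) to dry-run the commands.
HG=${HG:-hg}

# Step 1 (master): write new changesets to a bundle, then pull from that
# bundle instead of going over the network a second time. 'hg incoming'
# exits non-zero when there is nothing new, so the pull is skipped then.
update_master_cache() {
    cache=$1     # hypothetical path of the cache repo on the master
    bundle=$2    # where to write incoming.hg
    $HG -R "$cache" incoming --bundle "$bundle" &&
        $HG -R "$cache" pull "$bundle"
}

# Step 2 (slave): apply the bundle once it has been copied to the slave.
update_slave_cache() {
    cache=$1     # hypothetical path of the cache repo on the slave
    bundle=$2    # local copy of incoming.hg
    $HG -R "$cache" unbundle "$bundle"
}
```

For example, 'update_master_cache /path/to/cache /tmp/incoming.hg' on the
master, followed by 'update_slave_cache' on each slave that holds the repo.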

Whenever a project using Mercurial SCM with a matching repository location is
run or does polling:

1. If on the master, quietly swap in the local cache repo location for all Hg
operations that would normally use the remote repo URL (I think this is always
'hg incoming' in some variant). Note that this means sharing hardlinks in most
cases. If the cache repo does not yet exist, 'hg clone -U' it and then proceed.

2. If on a slave, swap in the local (slave) cache repo location. If it does not
yet exist on the slave, run 'hg bundle --all' on the master, send to the slave
over the channel, and 'hg init && hg unbundle ...' to create a clone. If it does
not yet exist on the master, clone it as in #1.
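
A rough sketch of the lookup and bootstrap logic described above. Everything
named here is an assumption made for illustration: the CACHE_ROOT location, the
function names, and the way a repository URL is turned into a directory name
(two different URLs could in principle map to the same name). Only the hg
commands themselves come from the proposal.

```shell
#!/bin/sh
# Sketch only. HG and CACHE_ROOT are hypothetical, overridable settings.
HG=${HG:-hg}
CACHE_ROOT=${CACHE_ROOT:-/var/cache/hg-cache}

# Map a remote URL to a local cache directory by replacing every character
# outside [A-Za-z0-9._-] with an underscore.
cache_dir_for() {
    name=$(printf '%s\n' "$1" | sed 's/[^A-Za-z0-9._-]/_/g')
    printf '%s/%s\n' "$CACHE_ROOT" "$name"
}

# Master: create the cache without a working copy if it is missing, then
# print the local path to use in place of the remote URL.
ensure_master_cache() {
    dir=$(cache_dir_for "$1")
    mkdir -p "$CACHE_ROOT" || return 1
    [ -d "$dir/.hg" ] || $HG clone -U "$1" "$dir" >&2 || return 1
    printf '%s\n' "$dir"
}

# Master: dump the complete history into a bundle file for transfer.
bundle_all() {
    $HG -R "$1" bundle --all "$2"
}

# Slave: create an empty repo and load the transferred bundle into it.
init_from_bundle() {
    $HG init "$1" && $HG -R "$1" unbundle "$2"
}
```

A job would then run its usual Hg operations against the path printed by
'ensure_master_cache' rather than against the remote URL.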

There needs to be some synchronization so that master and slave caches remain in
lockstep.

There is no configuration for named branches in the caches; only complete
repositories are cached. Projects using branches will still only pull that
branch from the cache. The cache does not keep a checkout ("working copy"), so
no configuration is needed for that either.

One possible side benefit of this setup is that the slave does not perform any
network operations except over its channel to the master. Provided that the
project build does not perform any network operations, you could then have a
slave with no internet connection: the master does all pulls from the remote
repository.

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

ddecristo

RE: [Issue 4794] New - Mercurial cache job

This performance improvement would really help us if it worked with multi-config projects. Would it be possible to make it a configuration option for a project rather than a new project type?  The master Mercurial cache could be configured first through the Hudson master's Configure System link and then assigned to each project in its own configuration.

Our single Mercurial repository has 150,000 files, and we build off four different branches across four different platforms using the multi-config project. I have clusters of slaves so that we can build the branches in parallel.  The time spent cloning and pulling kills us.

Do you also know why multi-config projects perform a clone on an executor assigned to the parent before cloning on the children?  The parent job runs on an executor that cannot be assigned, and it seems to reliably choose one of my Windows VM slaves over my Linux slaves.  Cloning can take us up to an hour and a half, since it takes twice as long on Windows.


Thanks,
Dianna



issue_police

RE: [Issue 4794] New - Mercurial cache job

Hi,

This is a friendly reminder that the 'issues' list is only used for
automatic e-mail notification from the issue tracker, and not meant
for interactive discussion. Please request an observer role for the
project and add the comment to the issue, or please redirect your
e-mails to the 'users' list.

(This is an automatically generated message. If you have a comment
about this bot itself, please contact [hidden email].)

----
Java.net issue tracker auto responder


jglick-2

[Issue 4794] Mercurial cache job

In reply to this post by jglick-2
https://hudson.dev.java.net/issues/show_bug.cgi?id=4794






------- Additional comments from [hidden email] Fri Nov 13 00:38:46 +0000 2009 -------
Dianna DeCristo writes:
"This performance improvement would really help us if it worked with the
multi-config projects so would it be possible to not make it a new project type
but rather a configuration option for a project?  The master Mercurial cache
could be configured through the Hudson Master Configure System link first and
then assigned to each project in its own configuration.

Our single Mercurial repository has 150,000 files and we build off four
different branches across 4 different platforms using the multi-config project.
I have clusters of slaves so that we can build the branches in parallel.  The
time spent cloning and pulling kills us."

The special job type would work equally well for this setup because the cache
job would be separate from your regular project - freestyle, Maven 2, matrix,
whatever. You have one cache job on the server, and any projects which use
Mercurial as their SCM and request matching repository locations will
automatically employ the cache.

A refinement to my initial proposal would be to make the cache configuration
just be part of global Hudson config, not a job at all, and with no schedule.
Whenever any job, through its MercurialSCM, requested access to any of these
repos - whether for 'hg incoming' or during a build - the cache would first be
created or updated. To simplify administration, you could even omit the list of
repositories to cache and simply cache _any_ remote repository that was
encountered by any job, turning the cache configuration into a single
checkbox...though in this case some scheme for discarding cached repos not in
use for a long time would be useful.
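
The comment above does not say how unused caches would be detected. One
possible scheme, sketched here purely as an assumption, is to touch a marker
file whenever a job uses a cache and to list caches whose marker is older than
some number of days. The marker name '.last-used', both function names and the
directory layout (one cache directory per repository under a common root) are
hypothetical. The -mindepth and -maxdepth options are supported by GNU and BSD
find.

```shell
#!/bin/sh
# Sketch only: a hypothetical "last used" marker for cache eviction.

# Record that the cache repo at $1 has just been used by a job.
mark_used() {
    touch "$1/.last-used"
}

# Print every cache directory directly under $1 whose marker was last
# touched more than $2 days ago. The caller decides whether to delete them.
stale_caches() {
    root=$1
    days=$2
    find "$root" -mindepth 2 -maxdepth 2 -name .last-used -mtime +"$days" |
        while IFS= read -r marker; do
            dirname "$marker"
        done
}
```

A periodic task could then remove each directory printed by, say,
'stale_caches /path/to/cache-root 30'.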


jglick-2

[Issue 4794] Mercurial cache job

In reply to this post by jglick-2
https://hudson.dev.java.net/issues/show_bug.cgi?id=4794






------- Additional comments from [hidden email] Fri Nov 13 00:40:19 +0000 2009 -------
Interaction with the Forest extension (issue #1143) may be problematic. For
simplicity, I would probably just disable cache usage for projects using Forest.


jglick-2

[Issue 4794] Mercurial cache job

In reply to this post by jglick-2
https://hudson.dev.java.net/issues/show_bug.cgi?id=4794






------- Additional comments from [hidden email] Sat Nov 14 16:00:27 +0000 2009 -------
While broadcasting incoming.hg to all slaves ought to be reliable in principle
(since their caches should never be doing anything besides pulling from master),
this might run into trouble if some slaves went offline and missed some earlier
updates, etc. A more robust way to push changesets over a Hudson channel is
using file transfer:

hg -R repo-a bundle `hg -R repo-b heads --template ' --base {node}'` /tmp/xfer.hg
hg -R repo-b unbundle /tmp/xfer.hg

This style has the advantage that slave caches can be updated lazily, since
there is no requirement that all slaves have been updated to the same point:
when running a Hg operation on a slave, simply pull on master cache, then update
that slave cache, then continue.

(There seems to be a bug in bundle: --base on a head revision does not prevent
that revision from being included, though it excludes its ancestors. So it may
be necessary to also run heads on repo-a and filter repo-b's list to avoid
transmitting extra changesets. This would be useful anyway, since if the
filtered list is empty, we can avoid running any commands: the repos are already
in synch.)
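
The lazy update could be sketched as below. This is one possible reading, not a
tested implementation: the function names and the HG override are hypothetical,
and both repos are addressed as local paths as in the two-repo example above.
It assumes the slave cache only ever receives changesets from the master cache
and already holds at least one changeset, in which case identical head lists
mean there is nothing to transfer. It does not attempt to work around the
bundle bug, so a few redundant changesets may still be sent.

```shell
#!/bin/sh
# Sketch only. HG can be overridden (e.g. with a stub) to dry-run the commands.
HG=${HG:-hg}

# Print the heads of the repo at $1, one full hash per line, sorted.
list_heads() {
    $HG -R "$1" heads --template '{node}\n' | sort
}

# Succeed if both repos report exactly the same set of heads.
in_sync() {
    [ "$(list_heads "$1")" = "$(list_heads "$2")" ]
}

# Bring the slave cache up to date with the master cache, skipping the
# bundle/unbundle round trip when the head lists already match.
sync_slave_cache() {
    master=$1
    slave=$2
    xfer=$3      # scratch file for the bundle
    in_sync "$master" "$slave" && return 0
    # One '--base <hash>' pair per slave head; left unquoted on purpose
    # below so that it splits into separate arguments.
    bases=$(list_heads "$slave" | sed 's/^/--base /')
    $HG -R "$master" bundle $bases "$xfer" &&
        $HG -R "$slave" unbundle "$xfer"
}
```

Before running an Hg operation on a slave, the master cache would be pulled
first and 'sync_slave_cache' called for that slave only.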


jglick-2

[Issue 4794] Mercurial cache job

In reply to this post by jglick-2
https://hudson.dev.java.net/issues/show_bug.cgi?id=4794






------- Additional comments from [hidden email] Mon Nov 16 20:59:03 +0000 2009 -------
http://mercurial.selenic.com/bts/issue1910 tracks the Hg bug.
