July 26, 2010

To Build or Not to Be - Seminar Videos

JFrog's Continuous Integration and Build Seminar, "To Build or Not to Be", took place on July 1st, 2010 and was a big success.

The sessions of Kohsuke Kawaguchi, creator of Hudson and CEO of InfraDNA, and Hans Dockter, creator of Gradle and CEO of Gradle Inc, are now available online.

Watch the videos of "Gradle - A Better Way To Build" and "Doing More with Hudson".



Enjoy!

May 11, 2010

The case study of JBoss Repository Manager

Most of the issues encountered by JBoss developers with their new build infrastructure are discussed openly here. This is an important source of information about the problems encountered with a Hudson, Maven and Sonatype-Nexus integration.

Since we (JFrog) have worked with Hudson, Maven, and repository manager environments for many years, we provided some feedback to JBoss.

The main thing is: We are not promoting or supporting any particular build technology and so when Maven does not fit we are not afraid to say it as it is.

This case study exposes vividly the differences between Nexus and Artifactory!

Here are the issues I'm talking about:

  1. Mod dav or http(s) deployment URL
  2. Managing the distributionManagement XML tag in pom.xml
  3. Using password for Maven deployment
  4. Timeout and performance issues
  5. Maven metadata XML issue
  6. Maven deploy failures
  7. Creating and verifying checksums

1) Mod dav or http(s) deployment URL

To solve the timeout-on-deploy issue for richfaces, the recommendation was to use pure https instead of dav:https as the deployment URL.

Here are all the caveats you need to know about Maven deploy wagons:

  • There are 3 main kinds of Wagon used with Maven: Lightweight http (the default today), Heavyweight http (the default for older Maven versions), and Dav.
  • Dav uses transactions when deploying, manages memory better on the client side, and has a bug calculating checksums. Dav is a must when deploying big files. However, dav can time out because of its transactions, so the recommendation is correct!
  • Lightweight http(s), included in Maven, caches the whole byte array in memory, so it cannot work for big deployments.
  • The Dav and Heavyweight protocols suffer from a default configuration that transfers each deployed file twice over the net. This is because the standard HTTP flow always tries to send a request without authentication first, then with it. Check this to find the way to configure preemptive http authentication in the Maven simple http client.
  • Maven caches credentials based on URL. This means that if you authenticate read queries, the same read credentials are used for deploy! See "Configuring Authentication" here.
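To make the double transfer described above concrete, here is a minimal Python sketch of what preemptive authentication means: the Basic credentials are computed up front and attached to the very first request, so the body is not sent once anonymously and then again after the server's challenge. This is not Maven's wagon code; the helper name, URL, and credentials are made up for illustration, and nothing is sent over the network.

```python
import base64
import urllib.request


def preemptive_put_request(url, username, password, payload):
    """Build a PUT request that already carries Basic credentials.

    Hypothetical helper for illustration only: it builds the request
    object and does not contact any server.
    """
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    request = urllib.request.Request(url, data=payload, method="PUT")
    # The header is present on the first (and only) attempt.
    request.add_header("Authorization", f"Basic {token}")
    return request


req = preemptive_put_request(
    "https://repo.example.com/libs-snapshots/demo/demo-1.0.jar",  # made-up URL
    "deployer",
    "secret",
    b"jar bytes",
)
print(req.get_method(), req.get_header("Authorization") is not None)
```

Without the preemptive header, a client following the usual challenge-response flow would upload the payload, receive a 401, and upload it again with credentials.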

Basically, looking at the above options, nothing is satisfactory with standard Maven.
To solve this issue, we (JFrog) created a build client plugin that manages deployment of the artifacts using Apache commons http client with preemptive authentication and optimized response time. For info: This client is used to deploy full DVD images of 4.5 GB to Artifactory multiple times per day, from multiple Hudson servers and agents...

2) Managing the distributionManagement XML tag in pom.xml

To correctly manage all the above parameters, and the deployment URL of the exact repository location for each project, you need to configure the distributionManagement tag in the pom.xml. This generates a lot of friction, and makes rebuilds of SVN tags almost impossible.

For good reason, JBoss wants to split the repository manager into multiple servers and local repositories, so each project will deploy its artifacts to a dedicated repository.

Since this distribution is a progressive change, all the previous builds and SVN tags are obsolete and cannot be reproduced...

We believe that managing deployment configuration parameters should be done outside Maven, and that's why we have an Artifactory-Hudson plugin that does exactly this. Your pom.xml files are clean of repository manager URLs and you can tune deployment parameters (timeout, http or https, ...) in Hudson, outside Maven.

BTW: The Artifactory-Hudson plugin provides a lot more features, like full bidirectional traceability between Artifactory and Hudson for all the artifacts produced and used during each Hudson job execution.

3) Using password for Maven deployment

As described here, having an encrypted password in your settings.xml is a little annoying, but it works. However, you need to know that it still forces you to use https, since the clear text password will pass over the net. The other issue with the Repository Manager password is that all kinds of automation scripts besides Maven need it (Hudson, buildr, Ant, auto-backup, rsync, ...), so you always find yourself in a tricky situation; see this and that.

To solve this issue, we implemented an encrypted password system that works as follows:

  • Artifactory generates an asymmetric key pair for each user,
  • Entering your clear text password in the UI gives you a block of the settings.xml with your personal encrypted password,
  • The above encrypted password can be used anywhere (not only maven), and allows only REST API access (maven deploy and so on), not UI access,
  • With the encrypted password, https is no longer a must for maven deploy => a lot less load on the servers,
  • Stealing the encrypted password or the private key (kept in the Artifactory DB) has minimal security impact compared to full clear text decryption of all the passwords used on the JBoss Hudson server,
  • You can enforce the use of encrypted passwords for REST and maven deploy.

4) Timeout and performance issues

This issue is related to the dav mode, the JVM memory parameters, OS open file handles tuning, but also to the next issue about maven-metadata.xml files.

From our experience, one issue with the latest Sun 1.6 JVM is the miscalculation of the young generation size. So, when updating/upgrading your server you should verify that -XX:NewSize and -XX:MaxNewSize are set to at least 2/3 of the total heap size. Artifactory (like Nexus) needs a lot of memory per request (managing files), so the young eden space of the JVM is the most important parameter. Just be careful to set a round multiple of megabytes, as the JVM has a bug moving memory blocks from young to old gen space.
From what I read here, this is what I recommend: "-server -Xms1g -Xmx1g -XX:PermSize=128m -XX:MaxPermSize=128m -XX:NewSize=512m -XX:MaxNewSize=512m"

5) Maven metadata XML issue

This issue is a little tricky to explain, I hope I'll be clear :)
For each maven pom or jar file of a SNAPSHOT version, maven needs to update 2 maven-metadata.xml files. I'll take the richfaces issue as an example:

Deploying: org/richfaces/ui/modal-panel/3.3.4-SNAPSHOT/modal-panel-3.3.4-SNAPSHOT.pom needs to update:

  • org/richfaces/ui/modal-panel/3.3.4-SNAPSHOT/maven-metadata.xml to update the timestamp and/or the latest unique version number of this version,
  • org/richfaces/ui/modal-panel/maven-metadata.xml to update the list of available SNAPSHOT versions and the latest one created.
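The relationship between a deployed file and the two metadata files listed above follows directly from the repository layout. A small Python sketch (the function name is made up for illustration) derives both paths from the deploy path:

```python
import posixpath


def metadata_paths_for(deploy_path):
    """Return the two maven-metadata.xml paths touched by a SNAPSHOT deploy.

    deploy_path is a repository-relative path laid out as
    group/dirs/artifactId/version/file.
    """
    version_dir = posixpath.dirname(deploy_path)    # .../modal-panel/3.3.4-SNAPSHOT
    artifact_dir = posixpath.dirname(version_dir)   # .../modal-panel
    return [
        posixpath.join(version_dir, "maven-metadata.xml"),
        posixpath.join(artifact_dir, "maven-metadata.xml"),
    ]


paths = metadata_paths_for(
    "org/richfaces/ui/modal-panel/3.3.4-SNAPSHOT/modal-panel-3.3.4-SNAPSHOT.pom"
)
for path in paths:
    print(path)
```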

The issue is that the maven client is doing this complicated work, combined with the fact that maven-metadata.xml files are requested over and over by maven to build its version resolution scheme.
So, in this example, for every module of richfaces 3.3.X-SNAPSHOT, maven deploys the jar (twice due to issue 1), the pom, and the sources, reads the previous values of maven-metadata.xml in both folders, then writes them back (twice due to issue 1). Of course, due to dav and the heavy reads on these files, you get timeouts.

The performance tuning planned here will help solve this issue since the reads will be offloaded to nginx.

Be aware, you will hit a bigger problem due to the above design: Repeated corruption of maven-metadata.xml files!

Like everyone's, most JBoss projects have upstream and downstream dependencies in Hudson. For example: richfaces-jsf2 depends on richfaces. So, it is common that multiple separate SNAPSHOT Hudson jobs (let's say richfaces-3.3.4-SNAPSHOT and richfaces-3.3.5-SNAPSHOT) will get executed in parallel. As you can see from the above maven design, the calculation of maven-metadata.xml files will almost certainly fail.

To solve this issue, our deployment system (in the Hudson plugin) knows that Artifactory is a smart repository manager, and so sends only the pom, jars and verified checksums to the repository manager. Artifactory then queues maven metadata recalculation tasks asynchronously, avoiding parallelism conflicts and timeouts. Implementing this mechanism was far from trivial, but today we have integration tests deploying 5 different snapshot versions concurrently (and repeatedly) without a hitch :)
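The corruption scenario and the queuing idea can be illustrated with a toy Python model. This is only a sketch of the general read-modify-write problem, not Artifactory's implementation: the in-memory `store` dictionary stands in for the artifact-level maven-metadata.xml, and all names are hypothetical.

```python
from collections import deque


def client_side_update(store, versions_read, new_version):
    """Maven-style read-modify-write: write back what was read plus one version."""
    store["versions"] = sorted(set(versions_read) | {new_version})


# Two parallel jobs both read the metadata before either one writes it back.
store = {"versions": ["3.3.3-SNAPSHOT"]}
read_by_job_a = list(store["versions"])
read_by_job_b = list(store["versions"])
client_side_update(store, read_by_job_a, "3.3.4-SNAPSHOT")
client_side_update(store, read_by_job_b, "3.3.5-SNAPSHOT")
lost_update_result = list(store["versions"])  # job A's version has been overwritten


def server_side_recalculate(store, pending):
    """Drain queued recalculation tasks one at a time against the current state."""
    while pending:
        version = pending.popleft()
        store["versions"] = sorted(set(store["versions"]) | {version})


# Same two deployments, but the metadata is recalculated from a single queue.
store = {"versions": ["3.3.3-SNAPSHOT"]}
pending = deque(["3.3.4-SNAPSHOT", "3.3.5-SNAPSHOT"])
server_side_recalculate(store, pending)
queued_result = list(store["versions"])

print(lost_update_result)
print(queued_result)
```

In the first half, the second writer silently discards the first writer's change because it works from a stale read; in the second half, each task sees the result of the previous one.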

6) Maven deploy failures

As seen here, when failed Hudson jobs try to deploy to Nexus, you end up with half-deployed projects. It means that until the successful build 572 of richfaces, the repository contained a random mix of good and bad jars and POMs for the same project/version. It is recommended to do the deployment only once the full build is successful.

Deployment needs to be a post build action of Hudson (like it is done in our plugin).


7) Creating and verifying checksums

Since maven is calculating and deploying the checksums to the JBoss Nexus instance, you have general issues like:

  • Playing with dav may have generated a lot of bad checksums that you don't know of;
  • There is no bad checksum verification on deploy, so the build will be successful but users will get a wrong checksum error;
  • Since the correspondence between Hudson and maven checksums cannot be trusted, you'll lose traceability between Hudson and Nexus.

For info: Artifactory is capable of failing the deployment if checksums do not match, and has many checksum policy configurations.
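Failing a deployment on a bad checksum amounts to recomputing the digest on the receiving side and comparing it with the one the client sent. A minimal Python sketch of that check (the exception and function names are hypothetical, and this is not Artifactory's code):

```python
import hashlib


class ChecksumMismatch(Exception):
    """Raised when the client-supplied checksum does not match the content."""


def verify_sha1(content, claimed_sha1):
    """Return the actual SHA-1 hex digest, or raise if it differs from the claim."""
    actual = hashlib.sha1(content).hexdigest()
    if actual != claimed_sha1.strip().lower():
        raise ChecksumMismatch(f"expected {claimed_sha1}, got {actual}")
    return actual


good = hashlib.sha1(b"jar bytes").hexdigest()
print(verify_sha1(b"jar bytes", good) == good)

try:
    verify_sha1(b"corrupted bytes", good)
    rejected = False
except ChecksumMismatch:
    rejected = True
print(rejected)
```

Rejecting the upload at this point makes the build fail at deploy time, instead of leaving users to discover the wrong checksum later.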


I hope Red Hat will solve their build infrastructure issues for the good of JBoss developers and users :)

March 3, 2010

Building an Enterprise Repository with Artifactory


*The content of this blog is a translation of a blog posted in Portuguese by Diego Pacheco.*

I clearly recall experiencing DLL hell while working predominantly with Microsoft products. We suffered from dependency issues back then and we still suffer from them today.

When I started working with Java I suffered from a similar development concern. The specifics are a little different, but the essence of the problem is the same – dependencies. Java developers know this experience as JAR hell.

It's amazing to me that companies still struggle with these issues in 2010. There are good solutions for solving dependency problems. Two such solutions are Apache Maven 2 and Apache Ivy. Both these Apache solutions provide dependency management.

It's common to see a folder called *lib* in the classpath of Java applications containing more than 100 jars. Often, 70% or more of these jars are completely useless: never used, never touched by the system ClassLoader, and seldom if ever accessed by any system or user-defined ClassLoader. Still, they remain in the build. But why? Why leave so many jars to bloat the application? For two reasons:
  1. Dependency Problems: Everyone has probably seen the famous ClassNotFoundException at one time or another. Often, this problem is solved by copying all the jars from the application server and framework distributions we use to the good old *lib* folder. Of course, this is not an elegant solution, and it is not a good solution, but is widely used because it is easy to implement and it saves time. The problem you get to live with is that you're hoarding a bunch of jars you don't need and the size of your application will swell.

    Note: The more robust application servers, like WebSphere for instance, let you create shared libraries. You don't have to pack all the jars with your distribution (i.e. ear, war, jar) because you can let the server put them in the classpath.

  2. Ignorance: Many companies still don't use any dependency management solutions. As a consultant I've seen this a lot, in Brazil and abroad. Maybe some companies believe their developers are doing great without some fancy management solution. The rest of us will be investing in dependency management solutions. Personally, I like either Maven or Ivy for dependency resolution.

Using a Dependency Manager

If you still don't make use of a framework for dependency resolution, I recommend that you start using one ASAP.
Either Apache Maven 2 or Apache Ivy will provide you with benefits like:
  • Transitive dependencies: making it easier to find those elusive dependencies
  • Versioning: Using the "dump it in the *lib* folder" approach there is no control over which version is needed or which version is in production or development. A dependency management framework provides you with fine granularity control without going mad.
  • Automatic downloading: With a dependency management solution, you don't have to search for dependencies, download them, and put them into the classpath of your application. This is a huge advantage.
  • Versions update: Here's another big headache handled for you, especially if you have several applications that make use of a jar that you develop internally. Good dependency management keeps you from going over every application to perform version changes by hand. This will save you some real hard work and a huge amount of time.

I hope that you are already convinced that having a dependency management solution like Ivy or Maven is a good idea. Maybe you use one or the other already.

Often, companies that use Maven or Ivy forget to set up a good corporate repository solution behind it all.


Building a corporate repository with Artifactory

I've used Java corporate repository solutions such as Nexus and Archiva in the past. Recently I have used only Artifactory and I've never had a reason to look back. Artifactory is a very elegant solution to dependency management. The main issue I had with Archiva is that it downloads and resolves dependencies rather slowly.

Firstly, I recommend Artifactory because it does a much better and faster job of downloading and resolving dependencies.

Secondly, I heartily recommend using Artifactory because it provides a comfortable web UI for the management of your third party jars (solutions and plugins) like Spring, Hibernate, JBoss, as well as managing those that are produced in-house by your company. This separation can be done using Artifactory's repositories.

Thirdly, I strongly recommend Artifactory because it provides separation and control of the many versions of development and production material. Most of the time people use Artifactory only as a proxy solution that saves some of the company's bandwidth, but it's actually much more.


Dependency Control

Artifactory allows you to control the permissions of users and groups. You decide which applications can consume or use certain repositories and jars. For example, you can restrict the use of Spring only for some projects. I wouldn't do that :) but it can be done. Artifactory also provides you with a much better way to manage your internal dependencies and the solutions you write by letting you limit and centralize access to those dependencies. You may never miss a dependency again.


M2 and Ivy Consolidated Repository



The figure above shows that by using *virtual* Artifactory repositories you can serve both Ant-Ivy and Maven clients. The coolest thing is that you can serve both clients from the same repository.

How is this possible? Because Artifactory is flexible.

Artifactory works under the Maven 2 dependencies standard. This standard includes:
  • GroupId: the group of the solution, macro grouping of a module, or a providing company (e.g. org.springframework)
  • ArtifactId: represents or identifies the available / consumed artifact – could be the name of the jar like in our old *lib* folder (e.g. spring)
  • Version: the version of the jar (In Maven when we see something like 1.0, normally it indicates that the solution is a production release and if available, you should use GA. When a solution is under development, a SNAPSHOT version is used; sometimes followed by the date and time of the last build.)

With Ivy, things work pretty much the same; the attributes have direct equivalents in Maven 2:
  • Organisation: similar to Maven GroupId
  • Module: similar to Maven ArtifactId
  • Revision: similar to Maven Version

The big difference between the two Apache Dependency solutions is that Ivy uses a regex-style pattern for dependency resolution, and Maven uses a predefined pattern for dependency resolution. It's a fact that Ivy gives you more flexibility, but also demands more work. What I suggest is to use the same Maven pattern for Ivy, so deployments to Artifactory will be consistent. This way you'll be able to use the same repository for Ivy and Maven.
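To see why using the Maven pattern in Ivy makes both tools agree on where a jar lives, here is a small Python sketch that expands an Ivy-style pattern into a repository path. It is a simplification written for illustration (the function name is made up, and Ivy's optional parenthesised pattern parts are not handled); the Spring coordinates are example values.

```python
import re


def expand_pattern(pattern, tokens, m2compatible=True):
    """Fill [token] placeholders in an Ivy-style pattern.

    With m2compatible=True, dots in the organisation become path
    separators, mirroring how Maven lays out a groupId on disk.
    """
    values = dict(tokens)
    if m2compatible:
        values["organisation"] = values["organisation"].replace(".", "/")
    return re.sub(r"\[(\w+)\]", lambda match: values[match.group(1)], pattern)


maven_style = "[organisation]/[module]/[revision]/[artifact]-[revision].[ext]"
path = expand_pattern(
    maven_style,
    {
        "organisation": "org.springframework",
        "module": "spring",
        "revision": "2.5.6",
        "artifact": "spring",
        "ext": "jar",
    },
)
print(path)
```

The resulting path is the same one a Maven client would request for groupId org.springframework, artifactId spring, version 2.5.6, which is what lets a single repository serve both clients.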


Configuring Ivy to download and deploy to Artifactory

To accomplish this, all you have to do is use this XML file I've named ivy-artifactory-settings.xml:
<ivysettings>
    <settings defaultResolver="public"/>
    <credentials realm="Artifactory Realm"
                 host="your_artifactory_host"
                 username="admin_user" passwd="admin_password"/>
    <resolvers>
        <ibiblio name="public" m2compatible="true"
                 root="http://your_server:8080/artifactory/your_proxy_repository"/>
        <url name="publish_artifactory" m2compatible="true">
            <artifact pattern="http://your_server:8080/artifactory/your_release_repository/[organisation]/[module]/[revision]/[artifact]-[revision].[ext]"/>
        </url>
    </resolvers>
</ivysettings>
Do not forget to use the pattern [organisation]/[module]/[revision]/[type]s/[artifact]-[revision].[ext] to resolve the dependencies. When publishing your jar via Ivy, you can use a task to generate the Maven POM from Ivy's dependencies. It should be something like this:
<ivy-makepom ivyfile="${ivy.xml.file}"
             pomfile="${basedir}/dist/${ivy.organisation}/${ivy.module}/${ivy.revision}/${ivy.module}-${ivy.revision}.pom">
    <mapping conf="default" scope="compile"/>
    <mapping conf="runtime" scope="runtime"/>
</ivy-makepom>
Taking into account that you created your jar in the dist folder and that it follows the Maven 2 pattern, you will notice that it's located within the hierarchy of organisation, module and version folders and that the jar is tagged with the module name and the version.

Consuming and publishing your artifacts using Maven shouldn't give you any trouble, since Ivy now follows the same pattern as Maven.

So no matter if the project uses Maven 2 or Ant with Ivy, by using Artifactory and following the patterns and tips from this post, you will be able to use the same repository for everything.

Best regards until next time.

January 14, 2010

So you've decided to configure a remote repo and avoid headaches?!

Background
There are a lot of public Maven 2 repositories out there (repo1, JBoss, SpringSource, etc.). When setting up a repository manager for your organization, configuring remote repositories can be one of the most difficult tasks. Finding the correct URL for those remote repositories, and more importantly, defining the correct include/exclude patterns for artifacts, is not always a trivial thing to do.

However, we at JFrog believe that if you are using Artifactory as your repository manager, this task can become a snap using remote repository configuration sharing!


Remote Repository Provisioning

To share the configuration of your remote repositories, all you need to do is configure a remote repository once. All other Artifactories that you allow access can then connect to yours and pull the remote repository configuration via REST, and they are set and ready to go in a matter of minutes!

Artifactory allows you to define include/exclude patterns on the repository level (as opposed to group level), which is important for repository configuration sharing, since it deems each remote repository responsible for a predefined set of artifacts.
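The idea that each remote repository is responsible for a predefined set of artifacts can be sketched in Python as follows. This is an approximation for illustration only: it uses Python's `fnmatch` wildcards rather than Artifactory's own pattern matcher, and the repository keys and patterns are invented examples.

```python
from fnmatch import fnmatchcase


def accepts(path, includes, excludes):
    """Return True if path matches an include pattern and no exclude pattern."""
    included = any(fnmatchcase(path, pattern) for pattern in includes)
    excluded = any(fnmatchcase(path, pattern) for pattern in excludes)
    return included and not excluded


def first_responsible(path, repositories):
    """Return the key of the first repository whose patterns accept the path."""
    for key, (includes, excludes) in repositories.items():
        if accepts(path, includes, excludes):
            return key
    return None


# Hypothetical per-repository (includes, excludes) configuration.
repositories = {
    "jboss": (["org/jboss/*"], []),
    "repo1": (["*"], ["org/jboss/*", "com/mycompany/*"]),
}

print(first_responsible("org/jboss/logging/1.0/logging-1.0.jar", repositories))
print(first_responsible("org/apache/ant/1.7/ant-1.7.jar", repositories))
print(first_responsible("com/mycompany/app/1.0/app-1.0.jar", repositories))
```

Because the patterns travel with each repository definition, importing a shared configuration also imports the decision about which requests that repository should ever see, such as never asking a public repository for internal artifacts.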


The Process

The process is split into two separate parts: sharing and importing. First, you must choose which repository configurations you want to expose (you don't want to expose repositories with sensitive information); then other Artifactories simply pull the configuration.


1. Sharing: choose which repositories to expose

Go to your Repositories page and select which remote "repo" you would like to share. Then, in the "Advanced" area of the panel you need to select the "Share Configuration" check box and click Save. It's that easy.




2. Importing: adding repositories to a new Artifactory instance

Now let's go to the other side. Let's say that you have a brand new instance of Artifactory running. In the repositories admin page you can see the default list of remote repositories. You can select these predefined common remote repositories (JBoss, SpringSource, Java.net, Google, and so on).


If you wish to update one of the repository definitions or add a new one that is not currently there, all you need to do is click the "Import" button. Enter a remote Artifactory URL, or just use the default one that points to "http://repo.jfrog.org", to get a list of the most common well-known repositories.

Now, click "Load".

You will get a list of readily available repositories at your disposal. Simply check the ones you want, modify the repository key if needed, and import. The process is quite easy and lightweight!




Assuming you are in a large company that has several Artifactories running in multiple locations, you do not need an expert to configure those repositories again. All you need to do is connect to one central Artifactory that holds all remote repositories already configured and working, and simply pull the configuration over to your side. You are ready to go in a matter of minutes and without the headache of trying to make sure that configuration matches the one on the central Artifactory.

Remote repositories may change (often leading to a domino effect in artifacts resolution). Artifactory deals with that too by allowing a repository configuration to be overridden with a newly retrieved one. You can of course rename an imported configuration in case it conflicts with an existing configured repository.


Conclusion:

The Artifactory OSS version offers a powerful way to make remote repository configuration easy, eliminating redundant maintenance pains. By following a simple and easy-to-use process, the entire aspect of repository sharing becomes a no-brainer.

That's it—enjoy and happy building!

December 30, 2009

Empower Hudson with Artifactory - Track and Replay Your Build Artifacts

Overview
In this blog, I will demonstrate how to integrate Hudson with JFrog's Artifactory repository manager to have full build-to-artifacts traceability. We will use the Artifactory plug-in to deploy the Hudson build artifacts and track them back to their original build.

Keeping the history and reproducibility of code is a must-have for any modern project.
Using one of the different flavors of version control applications, you can easily reproduce the state of any point in the past using the different methods of SCM tagging.

But what happens when you want to reproduce binary products from a certain phase?
Are dependencies considered? Does anyone really remember what version of dependency X was used in version 1.0 or in version 3.1 of your application? What if you used version ranges or dynamic properties? Was the application compiled using JDK 5 or 6?

All this information can be recorded during the publication of your binaries, which is usually done by a CI server of your choice.
Your CI server has all the knowledge required in order to reproduce a build:
  • Information on the builds themselves
  • The published items
  • Version information
  • Dependencies
  • Build environment details
But how can you capture all this data? This is where Artifactory kicks in!

Artifactory (v2.1.3+ OSS) is open for communication from any build process to receive information needed for tracing/reproducing a build - the sender of this information is typically your build server!
The information is transferred via REST in the form of a BuildInfo JSON object and contains details about the modules, artifacts, dependencies, environments, properties, and more.
All builds and binaries are provided with bi-directional links that enable you to reproduce and analyze the impact of any action.
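To give a feel for the kind of document a build server might send, here is a Python sketch that assembles a build-info-like JSON object. The field names and values are assumptions chosen for illustration and are not taken from the actual BuildInfo schema; the helper name is hypothetical.

```python
import json


def make_build_info(name, number, modules, properties):
    """Assemble an illustrative build-info document (field names are assumed)."""
    return {
        "name": name,
        "number": number,
        "properties": dict(properties),
        "modules": [
            {
                "id": module_id,
                "artifacts": list(artifacts),
                "dependencies": list(dependencies),
            }
            for module_id, artifacts, dependencies in modules
        ],
    }


build_info = make_build_info(
    "my-app",
    "42",
    modules=[("com.example:my-app:1.0", ["my-app-1.0.jar"], ["junit-4.7.jar"])],
    properties={"java.version": "1.6.0_17"},
)
payload = json.dumps(build_info, indent=2)
print(payload)
```

Recording artifacts and dependencies per module is what makes the links bi-directional: from a build you can list its binaries, and from a binary you can find the builds that produced or consumed it.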

Presently, JFrog provides a first integration with Hudson and Maven 2. Other technology stacks are coming, but for the purpose of this blog I will use a setup of Hudson with a Maven 2 build.

So let's get "crackin'"!

We assume that your instance of Hudson is already configured to request all its dependencies from Artifactory. This of course ensures that all your build's dependencies are cached in Artifactory and can be used for build reproducibility.

Installing Hudson's Artifactory Plug-in
To install the Artifactory plug-in, simply browse from the main menu to "Manage Hudson" -> "Manage Plugins" -> "Available" Tab, and check to enable "Artifactory Plug-in". Once the plug-in has been downloaded and installed, restart Hudson for the changes to take effect.

Now we'll configure the plug-in on a system-wide level and point it to the Artifactory to which we would like to publish the information (please note that Artifactory should be running and available at this point).
To do this, enter the "Configure System" menu via "Manage Hudson" -> "Configure System", and then configure the URL (up to the application context name, like "http://localhost:8081/artifactory"), and optional credentials (if anonymous access is enabled, you don't need to provide them). The need for credentials of an authenticated user comes from the fact that Hudson requests a list of deployable repositories from Artifactory, so you can choose the destination of your binaries at a later stage.
Notice that you can add multiple Artifactory configurations to suit your needs.

Next, we'll configure the plug-in at the "Job" level.
Enter the Job configuration by selecting your Job and clicking the "Configuration" link. Scroll down to the Post-build Actions option group, and select "Deploy artifacts to Artifactory". Once selected, the menu will expand and will let you choose the "Artifactory server" and "Target repository" to which to deploy. As implied by the field names and the "Deployer username" and "Deployer password" credentials you provide, you must have "Deploy" permission on the target repository you select.
Being able to select your deployment target from a ready-made list, which is received directly from Artifactory, helps you avoid the pitfall of configuring your "Distribution Management" with typos.

The Artifactory plug-in deploys via REST API, which optimizes the process of unique/non-unique snapshots and doesn't require credential and distribution management configuration in your settings.xml and POM files.
Unlike Maven, which deploys each module as its build is completed (which may result in a partial deployment of your project's artifacts if your build fails at some point), the Artifactory plug-in deploys only when the entire build completes successfully (much like the built-in Hudson deployer). Each deployed artifact is tagged with buildName and buildNumber properties, and finally the Build Info is published.

At this point, you can run your Job, and then view the "Console output" to see the deployment and build info publication log messages.


Artifactory's Build Management

Now that the Job is complete, the artifacts are deployed, and the build info is published, we can view the build info in Artifactory by clicking the "Artifacts" tab under the new "Browse:Builds" sub-menu.
Here, we can see a list of all the published build names and the time each was last built.
Drilling down through the build number list, we can view the general info, the published modules, and the XML representation of the selected build.
Notice that the top of the build browser displays navigable breadcrumbs that are also synchronized with a RESTful URL that provides easy access to every part of every build.
The general info tab displays the main details about the build (name, number, type, etc.), the properties that were attached, and even the option to save the published module's artifacts and dependencies as saved search results (requires the "Smart Searches" add-on).

Clicking on the name of a module displays a list of the artifacts and dependencies that are part of the selected module.





When deleting an artifact that's associated with a build, either as a product or a dependency, Artifactory will notify you of the association prior to the removal.

Promotion of published modules is also made possible by the "Save search results" actions that are available through the General Build Info tab (requires the "Smart Searches" add-on).
Moreover, the buildName and buildNumber properties allow us to manually search build artifacts through the Property Searcher (requires the "Properties" add-on).

Conclusion
Using Hudson (and others to be supported soon) and Artifactory we've:
  • Supplied Hudson with all the needed dependencies from Artifactory—helping us keep the exact dependencies that were used in each build
  • Deployed all produced binaries to Artifactory—helping us keep and promote all the products of the build
  • Published build information to Artifactory—helping us manage and keep track of every build, environment, product, and dependency
With the assistance of these tools and methods, you will be able to reproduce and execute a build from any point of recorded time or compare information between different builds.
You may want to visit our build integration wiki page for a more in-depth explanation of the process.

Have fun, and be careful not to break the build. ;)




December 24, 2009

The one that talks, the one that does!

In a blog post, "Why Putting Repositories in your POMs is a Bad Idea", Sonatype "asked" the open source community to manage their Maven 2 POM files correctly.

This is a good and important request, since with repositories declared in POMs Maven will not work correctly:
  • Over time (due to URL changes)
  • In a closed environment (no direct access to the Internet from a developer machine)
  • Because it will shortcut the repository manager of your choice (Nexus, Archiva, or Artifactory) for resolving dependencies—this is most important.


We face this problem with almost every customer that uses Maven, and most of them use the lazy and dirty solution of "mirrorOf". It is argued in the blog that mirroring all Maven requests to a single URL is a good idea. We know it is a bad idea, as it completely takes away control over isolating the sources for releases, snapshots, and plug-ins!

There is, however, a dramatic sentence in the blog about POM files coming from Open Source projects (it actually applies to everyone):
"The entries you have defined will be burned forever into your released POMs."
It sounds like Maven is broken by design and forever, because of all the bad POM files that already exist out there.

Since we support our customers, and they are suffering from actions that are not under their control, we decided to fix it.

In the latest version of Artifactory (2.1.3), there is now a new feature: Automatic cleanup of remote repositories declared in POMs.
You can now configure any virtual repository to automatically clean up rogue remote repositories declared in POM files.

By default, Artifactory will do it for repositories and plug-in repositories directly declared under the project POM entity, or declared inside an active-by-default profile. You can enforce a deeper cleanup that removes all repository and plug-in repository declarations in all profiles.
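The cleanup rules described above can be sketched in Python with the standard library's ElementTree. This is an illustration of the rules as stated in this post, not Artifactory's implementation; the function name and the `all_profiles` flag are hypothetical, and the sketch assumes a POM that uses the standard Maven 4.0.0 namespace.

```python
import xml.etree.ElementTree as ET

POM_NS = "http://maven.apache.org/POM/4.0.0"
ROGUE_TAGS = ("repositories", "pluginRepositories")


def strip_repositories(pom_xml, all_profiles=False):
    """Remove repository declarations from a POM and return the new XML text."""
    ET.register_namespace("", POM_NS)
    project = ET.fromstring(pom_xml)

    def remove_from(parent):
        for tag in ROGUE_TAGS:
            for child in parent.findall(f"{{{POM_NS}}}{tag}"):
                parent.remove(child)

    # Declarations directly under <project> are always removed.
    remove_from(project)
    # Profiles are cleaned only if active by default, unless all_profiles is set.
    for profile in list(project.iter(f"{{{POM_NS}}}profile")):
        flag = profile.find(f"{{{POM_NS}}}activation/{{{POM_NS}}}activeByDefault")
        active = flag is not None and (flag.text or "").strip() == "true"
        if all_profiles or active:
            remove_from(profile)
    return ET.tostring(project, encoding="unicode")


pom = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <repositories>
    <repository><id>rogue</id><url>http://example.org/repo</url></repository>
  </repositories>
  <profiles>
    <profile><id>optional</id><pluginRepositories/></profile>
  </profiles>
</project>"""

cleaned = strip_repositories(pom)
deep_cleaned = strip_repositories(pom, all_profiles=True)
print(cleaned)
```

In the default mode the top-level declaration disappears while the one inside the inactive profile is kept; with `all_profiles=True` both are removed.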

Using Artifactory as your repository manager means that you will never get "burnt forever" by innocent mistakes done in POM files of nice, popular Open Source projects.

"The one that talks, the one that does!"

November 5, 2009

Search-based Promotion - Staging and Promotion Finally Made Simple!

Overview

One of the greatest features of Artifactory 2.1 is the support for artifacts staging and promotion.
The idea behind this feature is that in many environments, before exposing a new release for public consumption, the release needs to go through a well-known life cycle: it is first made available in a staging environment where it is validated and undergoes final QA, and only then is it moved or 'promoted' to a place where it can be found and used by clients.

In the world of releasing artifacts this means that when to-be-released artifacts are built and deployed into Artifactory, you want them to be accessible at first only by a selected group (staging). Only when the artifacts have been 'blessed' and found to be release-ready can you change their access level and make them available for download by a wider audience (promotion).

There are two problems with managing staging and promotion. The first problem is identification - how can we manage and identify a group of artifacts deployed to Artifactory by the same build, in order to promote them later on as a single unit?
The second problem is promotion composition - once to-be-promoted artifacts can be identified, what is the best way to collect them and change their visibility? Also, what if you wish to be picky about promotion? For example, what if you want to sieve test artifacts from the release, or remove by hand an internal artifact that was originally part of the deployed files?

The solution described here takes advantage of two of Artifactory's coolest features - the ability to attach properties to artifacts and the ability to compose and operate on search results. It uses some features found in the Artifactory Add-ons Power Pack, but with some additional setup it can also be adapted to the open source version of Artifactory.

Identifying Artifacts Deployed as a Group

The Quarantine Way
One way of identifying same-build artifacts is by creating isolation: you create a "quarantine repository" per-build and deploy your artifacts to it. This guarantees that artifacts are isolated in a confined place where they can be identified. The next step is to make the quarantine repository visible to your staging clients.
The main problem with this approach is that it is a somewhat heavyweight solution:
  • You need to create a pseudo repository on-the-fly, usually using a cryptic handle (such as a time stamp and the originating IP address) which makes it complicated to remember and identify later on.
  • For each such temporary repository you need to configure how it will be exposed under a more user-friendly repository and who will have permissions to view it.
  • You need to manage the repository and decide when to close it and how long it needs to be kept.
So, we decided not to go this way of creating a quarantine area and introducing unnecessary maintenance effort. Moreover, the whole concept of creating an on-the-fly repository for the sake of supporting promotion just felt unnatural to us! What we really wanted instead was to deploy our artifacts to a normal managed repository accessible by the QA users. Then, we wanted to find in this repository all the artifacts that were deployed by the same build and promote them as one unit to a different repository. A way to do some manual artifact filtering before promotion would also be nice.

Meet Artifacts Tagging
Artifactory supports attaching properties to artifacts (and folders). You can search for property values and retrieve artifacts that were tagged with properties. So, if we could somehow attach properties to our build artifacts at build time in a simple manner, we could use these properties to locate and identify the deployed artifacts that originated from the same build, for the sake of promoting them later on.
Properties can be attached to artifacts in many ways: via the UI (the Artifactory Properties add-on allows you to attach custom strongly-typed properties), via the REST API using PUT requests, and as a piggy-back on artifact PUT requests using matrix parameters.
We are going to use the last method to transparently tag our artifacts as they are deployed to Artifactory by the build process. To accomplish this, all we need to do is add matrix parameters to the deployment URL, as part of our POM's 'distributionManagement' section:
<distributionManagement>
  <repository>
    <id>qa-releases</id>
    <url>http://myserver:8081/artifactory/qa-releases;buildNumber=${buildNumber};revision=${revision}</url>
  </repository>
</distributionManagement>

Matrix parameters are a set of key-value pairs separated by semicolons (;). They are an established URL convention for attaching parameters to a path segment (in addition to query parameters and path parameters).

In our example, we added two parameters to the deployment URL: buildNumber and revision. Both will be replaced by Maven at deployment time with dynamic values from our project properties (e.g. by using the Maven build-number plugin). When building a multi-module project, all the artifacts deployed from the parent project will end up having the same buildNumber and revision values. When Artifactory sees the values as part of the URL, it automatically attaches new properties with the supplied keys and values to the deployed artifacts.
This mechanism works with any Artifactory 2.1+, without having to install any add-on. It also works with any REST client, which means you do not have to customize Maven or use a Maven plugin - it just works out-of-the-box.
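
For readers deploying from a script rather than from Maven, the following Python sketch shows how such a URL can be composed, and how the parameters can be picked out of the path again. The helper names and the sample values (build 42, revision r1507, the org/acme path) are our own assumptions for illustration; only the semicolon-separated key=value syntax comes from the text above.

```python
from urllib.parse import quote, unquote, urlsplit


def with_matrix_params(repo_url, params):
    """Append ;key=value pairs to a repository URL (hypothetical helper)."""
    parts = [repo_url.rstrip("/")]
    for key, value in params.items():
        parts.append(f"{quote(str(key), safe='')}={quote(str(value), safe='')}")
    return ";".join(parts)


def extract_matrix_params(url):
    """Return (path without matrix parameters, dict of the parameters found)."""
    params = {}
    clean_segments = []
    for segment in urlsplit(url).path.split("/"):
        name, *pairs = segment.split(";")
        clean_segments.append(name)
        for pair in pairs:
            key, _, value = pair.partition("=")
            params[unquote(key)] = unquote(value)
    return "/".join(clean_segments), params


if __name__ == "__main__":
    deploy_url = with_matrix_params(
        "http://myserver:8081/artifactory/qa-releases",
        {"buildNumber": 42, "revision": "r1507"},
    )
    # The deploying client appends the artifact path after the repository URL,
    # so the parameters end up in the middle of the request path.
    artifact_url = deploy_url + "/org/acme/app/1.0/app-1.0.jar"
    print(artifact_url)
    print(extract_matrix_params(artifact_url))
```

Values are percent-encoded on the way in, so a property value containing a semicolon or a slash does not break the URL.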

The artifacts in this example are deployed to the 'qa-releases' local repository, which is our common place for staging artifacts before they undergo final testing and approval by the QA team.

Composing And Promoting Artifacts

The Goal - Search and Move
So, we managed to have all our build artifacts tagged with properties and deployed into the staging area. Next, the QA team went through all known issues and approved our artifacts for further distribution to clients. We now need to find a way to copy or move our artifacts to another area where they can be downloaded by these clients. In fact, we can think of 'promotion' as a fancy term for moving (or copying) artifacts to a different place where different access rules apply.

So, what we really need to do is search for our recently deployed artifacts by their build number property and then move the results to another repository.
One way to do this is via REST API - search for artifacts by metadata (properties are stored internally as XML metadata) and send DAV MOVE requests to move the results one by one. This will require writing a small external script to send the relevant HTTP queries and can be done using the basic open source Artifactory version. However, there is a much simpler and more powerful way to do this inside Artifactory itself, which involves the 'Smart Searches' and 'Properties' add-ons.
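
As a rough idea of what the move half of such an external script could look like, here is a Python sketch that prepares one WebDAV MOVE request per artifact. It assumes you already hold the list of artifact paths returned by your search; the server URL, the 'releases' target repository and the helper name are placeholders of ours, and the requests are only built here, not sent.

```python
from urllib.request import Request


def build_move_requests(artifact_paths, base_url, source_repo, target_repo):
    """Prepare one WebDAV MOVE request per artifact path (hypothetical helper).

    WebDAV names the target of a MOVE in the Destination header.
    """
    base_url = base_url.rstrip("/")
    requests = []
    for path in artifact_paths:
        source = f"{base_url}/{source_repo}/{path}"
        destination = f"{base_url}/{target_repo}/{path}"
        requests.append(
            Request(source, method="MOVE", headers={"Destination": destination})
        )
    return requests


if __name__ == "__main__":
    # Paths as they might come back from a property search (made-up values).
    found = [
        "org/acme/app/1.0/app-1.0.jar",
        "org/acme/app/1.0/app-1.0.pom",
    ]
    for request in build_move_requests(
        found, "http://myserver:8081/artifactory", "qa-releases", "releases"
    ):
        # Sending would be urllib.request.urlopen(request), with credentials
        # for a user allowed to delete from the source and deploy to the target.
        print(request.get_method(), request.full_url,
              "->", request.get_header("Destination"))
```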

Finding To-Be-Promoted Artifacts - Meet Smart Searches and Properties
The idea behind smart searches is as follows:
You collate artifacts by searching - start with performing a search, then save the search results found into a named 'search result'. A 'search result' is merely a collection of found items. You can now perform additional searches, and for each search you can choose whether to create new search results or, more interestingly, add or subtract the items found to/from any previously saved search result. You can combine artifacts from any search type (simple, GAVC, class, XML etc.) - the combinations are really unlimited!
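
If it helps to think of it in code, a saved search result behaves much like a named set that you add to and subtract from. The Python sketch below models that idea only; the class, the two toy search functions and the sample artifacts are invented for illustration and say nothing about how Artifactory stores search results internally.

```python
class SavedSearchResult:
    """A named collection of artifact paths built up from several searches."""

    def __init__(self, name):
        self.name = name
        self.items = set()

    def add(self, found):
        self.items |= set(found)
        return self

    def subtract(self, found):
        self.items -= set(found)
        return self


def property_search(artifacts, key, value):
    """Toy property search: paths whose properties contain key=value."""
    return [path for path, props in artifacts.items() if props.get(key) == value]


def classifier_search(artifacts, classifier):
    """Toy classifier search: jar paths ending in -<classifier>.jar."""
    return [path for path in artifacts if path.endswith(f"-{classifier}.jar")]


# Made-up repository content: path -> properties attached at deployment.
ARTIFACTS = {
    "org/acme/app/1.0/app-1.0.jar": {"buildNumber": "42"},
    "org/acme/app/1.0/app-1.0-sources.jar": {"buildNumber": "42"},
    "org/acme/lib/0.9/lib-0.9.jar": {"buildNumber": "41"},
}

if __name__ == "__main__":
    result = SavedSearchResult("to-be-promoted")
    result.add(property_search(ARTIFACTS, "buildNumber", "42"))
    result.subtract(classifier_search(ARTIFACTS, "sources"))
    print(sorted(result.items))
```

Here the first search collects everything from build 42 and the second removes the sources jar, leaving only the binary.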

In our case we will simply run a property search for all artifacts deployed with a certain build number (tagged in deployment with the buildNumber property). For this, we will use the Properties search user interface:



We will then save our results as a new search result called 'to-be-promoted'. We can perform other searches to add or remove artifacts from the saved results.



We can even manually tweak our saved results by discarding specific artifacts - say one artifact in the build needs to be closed source, we can discard its sources from the results (if we wanted to discard all sources we could do this easily by subtracting the results of a GAVC search for the 'sources' classifier - see the screenshot below).



Once we are happy with our saved search results, we can go on and promote them!

Promotion - Just Move It!
Promotion is super easy - we just move the saved search results artifacts to a target, more open, repository, thus making them available for public consumption. We can choose whether to move the original artifacts or create a copy of them in the destination repository.



We can even perform a 'dry run' to see that everything moves fine and we have no errors or warnings (for example, the destination repository might not accept some artifacts due to its snapshots/releases policy, security policies or include/exclude path patterns).
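
To picture what a dry run checks for, here is a small Python sketch that tests a list of artifact paths against a destination policy without moving anything. The RepoPolicy fields, the messages and the shell-style patterns are simplifying assumptions of ours and do not mirror Artifactory's actual configuration model or pattern syntax; security checks are left out altogether.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass
class RepoPolicy:
    """Hypothetical, simplified view of a destination repository's rules."""
    handles_releases: bool = True
    handles_snapshots: bool = False
    includes: tuple = ("*",)
    excludes: tuple = ()


def dry_run(paths, policy):
    """Return {path: reason} for every artifact the destination would reject."""
    problems = {}
    for path in paths:
        is_snapshot = "-SNAPSHOT" in path
        if is_snapshot and not policy.handles_snapshots:
            problems[path] = "snapshots not accepted"
        elif not is_snapshot and not policy.handles_releases:
            problems[path] = "releases not accepted"
        elif any(fnmatchcase(path, pattern) for pattern in policy.excludes):
            problems[path] = "matches an exclude pattern"
        elif not any(fnmatchcase(path, pattern) for pattern in policy.includes):
            problems[path] = "matches no include pattern"
    return problems


if __name__ == "__main__":
    candidates = [
        "org/acme/app/1.0/app-1.0.jar",
        "org/acme/app/1.1-SNAPSHOT/app-1.1-SNAPSHOT.jar",
        "org/acme/internal/1.0/internal-1.0.jar",
    ]
    policy = RepoPolicy(excludes=("org/acme/internal/*",))
    for path, reason in dry_run(candidates, policy).items():
        print(f"WARN {path}: {reason}")
```

An empty result means every artifact would be accepted, which is the signal to go ahead with the real move.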



Finally, we can also export the search result artifacts to the file system if we wish to use them elsewhere outside Artifactory.
When search results are copied or moved they maintain their original metadata and properties, so any valuable information you might have on your artifacts is kept.

Wrapping Up
Artifactory takes a unique approach towards staging and promotion - instead of artificially requiring artifacts to be deployed into a quarantine area, we simply tag artifacts with metadata upon deployment and have them deployed to a central staging repository that is accessible internally to the development and QA team members. This process is cheap, doesn't require complicated setup and is natural to both developers and testers. If we wish to avoid double deployments to the staging repository, we can easily do so using permissions (using the delete privilege) - there is no need to manually close the whole repository for this.
Once we are happy with the staging build artifacts, we can collect them by searching and simply move or copy them to another target repository where they are publicly available. In this example we tagged our artifacts with a buildNumber property which we searched for later, but we could attach any properties to our artifacts and perform many kinds of searches to control our promotion source (by a combination of property values, by GAVC etc.) - with metadata on artifacts we are not limited by the physical repository our artifacts were deployed into!

Finally, we created a screencast showing the promotion methodology described here, in action.

Enjoy!