Alfresco, NAS or SAN, that’s the question!

The main requirement on the shared storage is being able to cross-mount the storage between the Alfresco servers. Whether this is done via an NAS or SAN is partly a decision around which technology your organization’s IT department can best support. Faster storage will have positive implications on the performance of the system, with Alfresco recommending throughput at 200 MB/sec.

NAS allows us to mount the content store via NFS or CIFS on all Alfresco servers, and they are able to read/write the same file system at the same time. The only real requirement is that the OS on which Alfresco is installed supports NFS (which is any Linux box actually). NFS tends to be cheaper and easier, but is not the fastest option. It is typically sufficient, though.

SAN is typically faster and more reliable, but obviously more expensive and complex (dedicated hardware and configuration requirements). In order to read/write from all Alfresco servers from/to the SAN, special file system types are necessary. For Red Hat, we use GFS2, other Linux flavors use OCFS or many others.

You are maybe thinking what happen in case of having multiple Alfresco servers writing to the same LUN could result in corruption (especially in header files), so it sounds like NAS (NFS/CIFS) would take care of that issue, however, if using a SAN, the filesystem must be managed properly to allow for read/write from multiple servers. For the Alfresco stand point, you don’t have to take care of that in both SAN or NAS approaches because Alfresco manages the I/O such that no collisions or corruption occur.

Note: If using a SAN, ensure the file system is managed properly to allow for read/write from multiple servers.

I also wanted to share this presentation I did internally some time ago but I think it would be useful.

Alfresco Tuning Shortlist

During last few years, I have seen dozens of Alfresco installations in production without any kind of tuning. That makes me thing that 1) nobody cares about performance or 2) nobody cares  about documentation or 3) both of them!
I know people prefer to read a blog post instead the product official documentation. Since Alfresco have improved A LOT our official documentation and most of the information provided below can be found there, I want to point out some tips that EVERYONE has to take into account before going live with your Alfresco environment. Remember, it’s easy Tuning = Live, No Tuning = Dead.
Tuning the Alfresco side:
  • Increase number of concurrent connections to the DB in alfresco-global.properties

[bash]
# Number below has to be the maxThreads value + 75
db.pool.max=275
[/bash]

  • Increase number of threads that Tomcat will use in server.xml – section 8080, 8443 and 8009 in case you use AJP

[bash]
maxThreads=“200”
[/bash]

  • Adjust the amount of memory you want to assign to Alfresco in setenv.sh or ctl.sh (which is the default one):

[bash]
export CATALINA_OPTS=" -Xmx=16G -Xms=16G"
[/bash]

in JAVA_OPTS make sure you have the flag “-server” that gives 1/3 of memory for new objects, do not use “XX:NewSize=” unless you know what you are doing, Solr takes many new objects and it will need more than 1G in production.

[bash]
ooo.enabled=false
jodconverter.enabled=true
[/bash]

Tuning the Solr side:
In solrcore.properties for both workspace and archive Spaces Store

[bash]
alfresco.batch.count=2000
solr.filterCache.size=64
solr.filterCache.initialSize=64
solr.queryResultCache.size=1024
solr.queryResultCache.initialSize=1024
solr.documentCache.size=64
solr.documentCache.initialSize=64
solr.queryResultMaxDocsCached=2000
solr.authorityCache.size=64
solr.authorityCache.initialSize=64
solr.pathCache.size=64
solr.pathCache.initialSize=64
[/bash]

In solrconfig.xml for both workspace and archive Spaces Store 

[bash]
mergeFactor change it to 25
ramBufferSizeMB change it to 64
[/bash]

April/9/2015 Update! For Solr4 (Alfresco 5.x) add next options to its JVM startup options:

[bash]
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC
[/bash]

Tuning the DB side:
Max allowed connections, adjust that value to the total amount of your Alfresco or Alfrescos plus 200, consider increase it in case you use that DB for other than only Alfresco.
  • For MySQL in my.cnf configuration file:

[bash]
innodb_buffer_pool_size = 4GB
max_connections=600
innodb_log_buffer_size=50331648
innodb_log_file_size=31457280
innodb_flush_neighbors=0
[/bash]

  • For Postgres in postgresql.conf configuration file

[bash]
max_connections = 600
[/bash]

Do maintenance on your DB often. Run ANALYZE or VACCUM (MySQL or Postgres), a DB also needs love!
Tuning the OS side:
I’m not very good on Windows so I will cover only a few tips for Linux:
  • Change limits in /etc/security/limits.conf to the user who is running your app server, for example “tomcat”:

[bash]
tomcat soft nofile 4096
tomcat hard nofile 65535
[/bash]

If you start Alfresco with a su -c option in /etc/init.d/, for Ubuntu you have to uncomment the pam_limits.so line here /etc/pam.d/su, if this is using login (by ssh) it is uncommented by default. For RedHat/Centos this line has to be uncommented here /etc/pam.d/system-auth.

  • Your storage throughput should be greater than 200 MB/sec and this can be checked by:

[bash]
# hdparm -t /dev/sda
/dev/sda:
Timing buffered disk reads: 390 MB in  3.00 seconds = 129.85 MB/sec
[/bash]

  • Allow more concurrent requests by editing /etc/sysctl.conf

[bash]
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535
net.ipv4.ip_local_port_range = 2048 64512
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 10
[/bash]

Run “sysctl -p” in order to reload changes.
  • A server full reboot is a good preventive measure before going live, it should start all needed services in case of contingency and we will find if we left something back on the configuration.
Remember, this is ONLY A SHORTLIST, you can do much more depending on your use case. Reading the documentation and taking our official training will be helpful and take advantege that we were polishing our training materials lately.

Understanding Alfresco Content Deletion

As part of the work I’m doing for the upcoming Alfresco Summit, where I will be talking about my favorite topic: “Security and Alfresco”, I have written a few lines about Alfresco node deletion, how it works and why is important to take it into account in terms of security control.
I just wanted to clarify how Alfresco works when a content item is deleted and also how content deletion works in Records Management (RM). Basic content deletion is already very well explained in this Ixxus blog post but there are some differences in the database schema between Alfresco 4.1 and 4.2 worth noting, such as the alf_node table has a field named ‘node_deleted in versions 4.0 and earlier.
To develop a deep knowledge about Alfresco security and also how to configure Alfresco backup and disaster recovery, you should first need to understand how the Alfresco repository manages the lifecycle of a content item.
Node creation:
When a node is created,regardless how it is uploaded or created in Alfresco (via the API, web UI, FTP, CIFS, etc.)Alfresco will do the following:

  1. Metadata properties are stored into the Database in the logical store workspace://SpacesStore (alf_node, alf_content_url among others).
  2. The file itself is store and renamed as .bin under alf_data/contentstore/YYYY/MM/DD/hh/mm/url-id-of-the-file.bin
  3. Next, depending on your indexing you chose, its index entries are created within Lucene (alf_data/lucene-indexes/workspace/SpacesStore) or Solr (alf_data/solr/workspace/SpacesStore).
  4. Finally, in most cases, a content thumbnail is created as a child of the file created.

Node deletion:
There are two phases to node deletion:
Phase 1- A user or admin deletes a content item (sending it to the trashcan):

  1. When someone deletes a content item, the content and its children (eg. thumbnails) are moved (archived) within  the DB from workspace://SpacesStore to archive://SpacesStore. Nothing else happens in the DB.
  2. The actual content “.bin” file remains in the same location inside the contentstore directory.
  3. Finally,the indexes are moved from the existing location to the corresponding archive alf_data/lucene-indexes/archive/SpacesStore) or Solr (alf_data/solr/archive/SpacesStore) depending on your index engine selection.

NOTE: A deleted node stays in the trashcan FOREVER, unless the user or admin either empties the trashcan or recovers the file. This default” behavior can be changed by using third party modules that empty the trashcan automatically on a custom schedule. See below for more information on these modules.
The trashcan may be found at these locations:
 Alfresco Share: User -> My Profile -> Trashcan (admin user will see all users deleted files, since 4.2 all users can also see and restore their own deleted files).
Alfresco Explorer: User Profile -> Manage Deleted Items (for all users).
Phase 2- Any user or admin (or trashcan cleaner) empties the trashcan:
That means the content is marked as an “orphan” and after a pre-determined amount of time elapses, the orphaned content item ris moved from the alf_data/contentstore directory to alf_data/contentstore.deleted directory.
Internally at DB level a timestamp (unix format) is added to alf_content_url.orphan_time field where an internal process called contentStoreCleanerJobDetail will check how many long the content has been orphaned.,f it is more than 14 days old, (system.content.orphanProtectDays option) .bin file is moved to contentstore.deleted. Finally, another process will purge all of its references in the database by running nodeServiceCleanupJobDetail and once the index knows the node has bean removed, the indexes will be purged as well.
NOTE: Alfresco will never delete content in alf_data/contentstore.deleted folder. It has to be deleted manually or by a scheduled job configured by the system administrator.
By default, the contentStoreCleanerJobDetail runs every day at 4AM by checking how the age of an orphan node and if it exceeds system.content.orphanProtectDays (14 days) it is moved to contentstore.deleted.
Additionally, the nodeServiceCleanupJobDetail runs every day at 9PM and purges information related to deleted nodes from the database.
Now, that we understand how Alfresco works by default, let’s learn how to modify Alfresco’s behavior in order to clean the trashcan automatically:
There are several third party modules to achieve this, but I recommend the Alfresco Trashcan Cleaner by Alfresco’s very own Rui Fernandes. Tt can be found at https://code.google.com/p/alfresco-trashcan-cleaner/.
Once the amp is installed, you can use this sample configuration  by copying it to alfresco-global.properties:

[bash]
trashcan.cron=0 30 * * * ?
trashcan.daysToKeep=7
trashcan.deleteBatchCount=1000

[/bash]

The options above configure the cleaner to run every hour at thethe half hour and it will remove content from the trashcan and mark them as orphan if a content has been in the trashcan for more than 7 days. It will do this in batches of 1000 deletions every time it runs. To delete from the trashcan without waiting any grace period set the trashcan.daysToKeep property value to -1.
Can I configure Alfresco to avoid using contentstore.deleted and ensure it really deletes a file after the trashcan is cleaned?
Yes, this is possible by setting system.content.eagerOrphanCleanup=true in alfresco-global.properties and once the trashcan is emptied, the file will not be moved to contentstore.deleted but it will be deleted from the file system (contentstore). After that, nodeServiceCleanupJobDetail will purge any related information from the database. Using sys:temporary aspect it also perform same behavior.
So, what is the recommended configuration for a production server?
This is something you have to figure out based on your backup and disaster recovery strategy. See my  Alfresco Summit presentation and white paper here: http://blyx.com/2013/12/04/my-talk-about-alfresco-backup-and-recovery-tool-in-the-alfresco-summit/.
If you have a proper l backup strategy, you can offer your users a grace period of 30 days to recover their own deleted documents from the trashcan and after the grace period delete them simultaneously from the trashcan and the filesystem. This can be achieved by installing the previously mentioned trashcan-cleaner and with this configuration in alfresco-global.properties:

[bash]
system.content.eagerOrphanCleanup=true
trashcan.cron=0 30 * * * ?
trashcan.daysToKeep=30
trashcan.deleteBatchCount=1000

[/bash]

And what about Alfresco Records Management, does it work in the same way? How a record destruction works?
In the Records Management world you don’t tend to delete documents as often it is done in Document Management. When a content item is deleted from the RM file plan, it is considered to be a regular delete operation. This is rarely used and only done by RM admins when there is some justifiable reason such as correcting  a mistake that requires a record to be removed.
The only difference is that the deleted record by-passes the archive store, hence it never goes to the trashcan, it is marked as orphan once it is deleted. Then it will be moved to contentstore.deleted after orphanProtectDays or it is truly deleted if eagerOrphanCleanup is set as true.
Destruction of a record works in the same way that a record is removed, this will by-pass the archive and immediately trigger the clean-up (eagerOrphanCleanup) process so the content does not stay in the file system contentstore or contentstore.deleted.
As far as the meta-data goes, there are two options; the first is that all the meta-data (and hence the node itself) are completely deleted, the alternative method cleans out all the content but the node remains with only the meta-data (called ghosting). In Alfresco RM versions before 2.2 this was a global configuration value (rm.ghosting.enabled=true), in 2.2 it can be defined on the destroy step of the disposition schedule: “Maintain record metadata after destroy”.

Alfresco content deletion graph
Alfresco content deletion

Some final words on content deletion:
As we have seen, Alfresco offers different ways to delete content. It is important to remember, even if Alfresco completely deletes content such as when using the destroy option in RM or by using eagerOrphanCleanup, Alfresco will not wipe the removed content from the physical storage, it therefore can be recovered by file system recovery tools. Wiping a deleted content item may vary depending on multiple factors, since filesystem type to hardware configuration, etc. If you want to guarranty a real physical wipe of a file in your file system, a third party software must be used to “zero out” the corresponding disk sectors. The specific tools depend on the operating system type, hardware, etc.
Thanks to my colleagues at Alfresco Kevin Dorr, Roy Wetherall for the Records Management section and Luis Sala for the document syntax review.

Where and how to change any Alfresco related port

Due to a conversation in Twitter with @binduwavell I haven’t found a single place where to find how to change any port related to all services that Alfresco runs. So, I have decided to write a blog post about it with some notes from Rich McKnight.

Here you go a comprehensive list of all ports and where to change them:

Tomcat:

  • HTTP 8080: tomcat/conf/server.xml
  • HTTPS 8443: tomcat/conf/server.xml
  • Shutdown Port 8005:  tomcat/conf/server.xml
  • AJP 8009:  tomcat/conf/server.xml
  • JPDA 8000: catalina.sh

Alfresco:

Alfresco context inside Alfresco configuration: alfresco-global.properties

  • alfresco.port=8080

Share:
Share context inside Alfresco configuration: alfresco-global.properties

  • share.port=8080

If repository ports are changed you change Alfresco Share connection ports in web-extenxion/share-config-custom.xml

Alfresco SharePoint Protocol: alfresco-global.properties

  • vti.server.port=7070
  • vti.server.external.port=7070

OpenOffice – LibreOffice: alfresco-global.properties

  • ooo.port=8100

JodConverter: alfresco-global.properties

  • jodconverter.portNumbers=8100

FTP: alfresco-global.properties
Can be mapped to non-privileged ports, then use firewall rules to forward requests from the standard ports

  • ftp.port=21

CIFS – SMB shared drive: alfresco-global.properties
Can be mapped to non-privileged ports, then use firewall rules to forward requests from the standard ports

  • cifs.tcpipSMB.port=445
  • cifs.netBIOSSMB.sessionPort=139
  • cifs.netBIOSSMB.namePort=137
  • cifs.netBIOSSMB.datagramPort=138

IMAP: alfresco-global.properties
Can be mapped to non-privileged ports, then use firewall rules to forward requests from the standard ports

  • imap.server.port=143

Inbound Email (SMTP): alfresco-global.properties
Can be mapped to non-privileged ports, then use firewall rules to forward requests from the standard ports

  • email.server.port=25

NFS server: alfresco-global.properties
Mount/NFS server ports, 0 will allocate next available port

  • nfs.mountServerPort=0
  • nfs.nfsServerPort=2049

RPC registration port, 0 will allocate next available port
Some portmapper/rpcbind services require a privileged port to be used

  • nfs.rpcRegisterPort=0

To disable NFS and mount server registering with a portmapper set

  • nfs.portMapperPort to -1
  • nfs.portMapperPort=111

Cluster in 4.2 with Hazelcast: alfresco-global.properties

  • alfresco.hazelcast.port=5701

Cluster in 4.1 with JGroups: alfresco-global.properties

  • alfresco.tcp.start_port=7800

Solr:
From Solr to Alfresco workspace queries:  ./alf_data/solr/workspace-SpacesStore/conf/solrcore.properties

  • alfresco.port=8080
  • alfresco.port.ssl=8443

From Solr to Alfresco archive queries:  ./alf_data/solr/archive-SpacesStore/conf/solrcore.properties

  • alfresco.port=8080
  • alfresco.port.ssl=8443

From Alfresco to Solr queries:  alfresco-global.properties

  • solr.port=8080
  • solr.port.ssl=8443

RMI service, JMX ports: alfresco-global.properties

  • alfresco.rmi.services.port=50500
  • avm.rmi.service.port=0
  • avmsync.rmi.service.port=0
  • attribute.rmi.service.port=0
  • authentication.rmi.service.port=0
  • repo.rmi.service.port=0
  • action.rmi.service.port=0
  • deployment.rmi.service.port=0

Monitoring RMI:

  • monitor.rmi.service.port=50508

Alfresco Tip: How to enable SSL in Alfresco SharePoint Protocol

There are two ways to approach getting the Alfresco SharePoint Protocol to run over SSL and avoid having to modify the Windows registry for allow non-ssl connections from MS Office (in both Windows and Mac).

One way is to use the out of the box SSL certificate that Alfresco uses for communications between itself and Solr (this blog post is about this option). The other is to generate a new certificate and configure Alfresco to use it, which is the option if you want to use a custom certificate. Next steps tested on Alfresco 4.2, it should work in 4.2 as well for both Enterprise and Community. Please, let me know through a comment if you have an objection on this.

  • 1. Rename file tomcat/shared/classes/alfresco/extension/vti-custom-context.xml.ssl to tomcat/shared/classes/alfresco/extension/vti-custom-context.xml, if it does not exist just create it like below:

[xml]

<?xml version=’1.0′ encoding=’UTF-8′?>
<!DOCTYPE beans PUBLIC ‘-//SPRING//DTD BEAN//EN’ ‘http://www.springframework.org/dtd/spring-beans.dtd’>

<beans>
<!–
<bean id="vtiServerConnector" class="org.mortbay.jetty.bio.SocketConnector">
<property name="port">
<value>${vti.server.port}</value>
</property>
<property name="headerBufferSize">
<value>32768</value>
</property>
</bean>
–>

<!– Use this Connector instead for SSL communications –>
<!– You will need to set the location of the KeyStore holding your –>
<!– server certificate, along with the KeyStore password –>
<!– You should also update the vti.server.protocol property to https –>
<bean id="vtiServerConnector" class="org.mortbay.jetty.security.SslSocketConnector">
<property name="port">
<value>${vti.server.port}</value>
</property>
<property name="headerBufferSize">
<value>32768</value>
</property>
<property name="maxIdleTime">
<value>30000</value>
</property>
<property name="keystore">
<value>${vti.server.ssl.keystore}</value>
</property>
<property name="keyPassword">
<value>${vti.server.ssl.password}</value>
</property>
<property name="password">
<value>${vti.server.ssl.password}</value>
</property>
<property name="keystoreType">
<value>JCEKS</value>
</property>
</bean>
</beans>

[/xml]

  • 2. Now add the required attributes to alfresco-global.properties:

[bash]

vti.server.port=7070
vti.server.protocol=https
vti.server.ssl.keystore=/opt/alfresco/alf_data/keystore/ssl.keystore
vti.server.ssl.password=kT9X6oe68t
vti.server.url.path.prefix=/alfresco
vti.server.external.host=localhost
vti.server.external.port=7070
vti.server.external.protocol=https
vti.server.external.contextPath=/alfresco

[/bash]

Remember to change localhost to your server full name (i.e. your-server-name.domain.com).

  • 3. Restart the Alfresco application server and try the “Edit online” action on a MS Office document through Alfresco Share. A warning message will appear to accept the Alfresco self-signed certificate but is a common behavior.

Running the Alfresco Solr backup from the command line

SOLR can be backed up by different ways. It uses a scheduled job by default but also can be triggered by the JMX interface in Alfresco Enterprise. Additionally can be done by direct using next URLs. Example for doing a backup of the alfresco solr core and only keep 1 backup:

https://localhost:8443/solr/alfresco/replication?command=backup&location=/opt/alfresco/alf_data/solrBackup/alfresco&numberToKeep=1

For the archive core and only keep 1 backup:

https://localhost:8443/solr/archive/replication?command=backup&location=/opt/alfresco/alf_data/solrBackup/archive&numberToKeep=1

In order to do the backup from the command line, you may use the “curl” command and run it like this (see comment about pem certificate below):

[bash]curl -k –cert /opt/alfresco/alf_data/keystore/browser.pem:alfresco "https://localhost:8443/solr/alfresco/replication?command=backup&location=/opt/alfresco/alf_data/solrBackup/alfresco&numberToKeep=1"
[/bash]

 

[bash]curl -k –cert /opt/alfresco/alf_data/keystore/browser.pem:alfresco "https://localhost:8443/solr/archive/replication?command=backup&location=/opt/alfresco/alf_data/solrBackup/archive&numberToKeep=1"
[/bash]

Please, note that “curl” does not support p12 certificates therefore you need to convert the default browser.p12 to browser.pem by running (password is alfresco):

[bash]
openssl pkcs12 -in /opt/alfresco/alf_data/keystore/browser.p12 -out /opt/alfresco/alf_data/keystore/browser.pem –nodes
[/bash]

This option will be included in next version (0.3) of the Alfresco BART (Backup and Recovery Tool).

Deploying an Alfresco cluster in Amazon AWS in just minutes

I have been playing with Amazon Web Services since few months ago. AWS is for a SysAdmin like Disneyland is for a 8 years old child, I enjoy so much doing this kind of stuff.
If you are not familiar with AWS products/services, let me describe with Amazon words and in my own words what are the most important services and concepts we have been using for deploying an Alfresco on-premise installation in AWS:

  • EC2: virtual servers in the cloud.
  • VPC: isolated cloud resources. Yes, a real isolated cloud architecture and resources.
  • S3: Scalable storage, like a CAS (Content Addressable Storage) for your local or cloud servers.
  • RDS: Managed Relational Database Service (MySQL, Oracle or MS SQL Server).
  • ELB: Elastic Load Balancer, as part of EC2 allows you to create load balancers easily.
  • CloudFormation: Templated AWS resource creation. *This is why I’m writing this article. A CloudFormation template is a json file which creates a wizard and options based in our needs.
  • AWS Region: a location with multiples AZ .
  • AZ: Availability Zone (data centers connected through low-latency links in the same region).

Once said so, my colleague Luis Sala has been working together with the Amazon AWS crew and they have made a CloudFormation template to deploy an Alfresco cluster in just minutes. This template is available here: https://github.com/AlfrescoLabs/alfresco-cloudformation.

This CloudFormation template will create a 2 nodes Alfresco cluster inside a virtual private cloud (VPC), a Load Balancer (ELB) with sticky sessions bases on the Tomcat JSESSIONID, a shared ContentStore based on S3, a shared MySQL DB based on a RDS instance. Each Alfresco node will be in a separate Availability Zone and finally the template includes auto-scaling roles for add extra Alfresco nodes when some thresholds are reached.

We will have something like the diagram below, I say “like this” because we will have only 2 Alfresco nodes in the cluster and the auto-scaling will add more nodes in case of thresholds are reached (clic to see it bigger).

aws-cf-alfresco

Finally in the video below you can see step by step a real CloudFormation deployment, I think the video screencast is self-explanatory, it does not have audio. As you can see, the video is 6 minutes length after cropping some dead times but it was around 15 minutes total.

I thought it is a very interesting approach about Alfresco clustering and it worth it to share with you all. Any question or feedback is welcome, even in spanish or english 😉

Alfresco Backup and Recovery Tool, release v0.1

Project was moved to Github!

Please go to https://github.com/toniblyx/alfresco-backup-and-recovery-tool for downloads, questions, issues, suggestions or feedback. Thanks!

Here you go, first release of the Alfresco Backup and Recovery Tool (Alfresco BART). An Apache 2.0 licensed tool for backup and restore of Alfresco ECM.

DESCRIPTION
Alfresco BART is a tool written in shell script on top of Duplicity for Alfresco backups and restore from a local file system, FTP, SCP or Amazon S3 of all its components: indexes, data base, content store and all deployment and configuration files. It should runs in most Linux distributions, for Windows you may use Cygwin (non tested yet).

Brief description of its features: full and incremental backups, backup policies, backup volume control, encryption with GPG, compression. Also it has a restore wizard with shortcuts for quick restore of some key components (alfresco-global.properties and more).

DISCLAIMER
This is an initial version, it has bugs and needs many improvements, please take care 🙂

FEATURES
Features in this version (v0.1):

  • 5 different modes of work: backup, restore, verify, collection and list
    • backup: runs an incremental backup or a full if first time or configured
    • restore: runs the restore wizard
    • verify: verifies the backup
    • collection: shows all the backup sets in the archive
    • list: lists the files currently backed up in the archive
  • Full and incremental backups.
  • Backup policies:
    • Periodicity: number of days of every full backup, if not backup found it does a full
    • Retention: keep full or incremental copies, clean old backups.
    • Control of number of moths to remove all backups older than or backup retention period.
  • Separated components (backup sets or groups), ability to enable or disable any set (cluster and dedicated search server aware), all backup sets supported are:
    • Indexes (SOLR or Lucene)
    • Data base (MySQL, PostgreSQL and Oracle)
    • Content Store plus deleted, cached and content store selector (optional).
    • Files: all configuration files, deployments, installation files, etc.
  • Restore wizard with support to:
    • restore a full backup (all sets)
    • given backup set
    • restore from a given date or days, month, year ago
    • restore alfresco-global.properties from a point in time
  • Backup volume control:
    • All backups collections are split in a volume size 25MB by default, this can help to store your backup in tapes or in order to upload to a FTP, SCP or S3 server.
  • Backup to different destinations:
    • Local filesystem
    • Remote FTP or FTPS server
    • SCP server (should have shared keys already configured, no authentication with user and password supported)
    • Amazon S3
  • Encryption with GnuPG, all backup volumes are encrypted, this feature is configurable (enable or disable).
  • Compression, all backup volumes are compressed by default
  • Log reporting, Alfresco BART creates a log file each day of operation with in a report of any activity.

DEPENDENCES

  • Duplicity 0.6 (with boto and fabric)
  • Python 
  • GnuPG
  • NcFTP
  • librsync
  • mysqldump for MySQL backup
  • pg_dump for PostgreSQL backup
  • exp for Oracle backup

TODO

  • TEST, TEST and TEST with JBOSS, MySQL, Oracle, S3, FTPs, SCP, etc.
  • Add more input and task controllers (and configuration, first run).
  • Restore single repository file.
  • Snapshots (LVM if exist, AWS if exist).
  • Support for MS SQL Server.
  • Configuration wizard (shell).
  • Share admin panel configuration page as main point to configure more options related to backup (eager, cleaner, index backup, trascan cleaner, etc.).
  • Custom logging control and reporting improvement.

DOWNLOADS and INSTALLATION 

Most recent information about tool and latest code is available in:
http://blyx.com/alfresco-bart

Please report bugs and improvements to: reverse moc.xylb@inot

Alfresco trick: bulk users invitation to a site (external and internal users)

For a personal project I was wondering if I can invite a group of friends to a site without having to get them access to my Alfresco, just wanted to give them access to certain site as consumers.

Here is how I did that, once I generate a list of friends like below (file solo-mails.txt):

[bash]
[email protected]
[email protected]
[email protected]
[/bash]

I run next curl command in JSON format. Remember that $i is the mail address of any friend, use your own admin credentials as user:password, you should change ‘surname’, localhost, and site name ‘mysite’ in the URL. The option -H “Accept-Language: en,en;q=0.8” will send the invitation in english, if you want to sent it in spanish use Accept-Language: es,en;q=0.8.

for i in `cat solo-mails.txt`; do curl -i -u user:password -H “Content-Type: application/json” -H “Accept-Language: en,en;q=0.8” -d “{‘invitationType’:’NOMINATED’,’inviteeUserName’:”,’inviteeRoleName’:’SiteConsumer‘,’inviteeFirstName’:’$i‘,’inviteeLastName’:’surname‘,’inviteeEmail’:’$i‘,’serverPath’:’http://localhost:8080/share/’,’acceptURL’:’page/accept-invite’,’rejectURL’:’page/reject-invite’}” “http://localhost:8080/alfresco/s/api/sites/mysite/invitations“; done
This command will send an invitation with an autogenerated username and a password.

As I mentioned, command above is for external users, but if you want to do same thing for internals use same command but the value ‘inviteeUserName’ has to have the username you want to invite, for example ‘inviteeUserName’:’toni’. Obviously I run this command from my Mac also valid from a Linux with curl.

Thanks to my colleague at Alfresco Rui Fernandes, he pointed me out about where to start.

How to enable Tomcat Manager in an Alfresco installation

In order to address some maintenance tasks in Tomcat, may be useful to get access to the Tomcat Manager (http) interface, things like stop or start an application if you are doing some changes in Alfresco or Share, even a different way to access to its JMX interface using jmxproxy if you are working remotely.

This is a easy step by step guide about how you can enable the Tomcat Manager that comes with an Alfresco default (bundle) installation. Tested with Alfresco Enterprise 4.1.4, but should work with any other Alfresco 4 version.

  • Edit tomcat/conf/tomcat-users.xml and adapt it like below:

[xml]
<tomcat-users>

<role rolename="manager-gui"/>

<role rolename="manager-status"/>

<role rolename="manager-jmx"/>

<role rolename="manager-script"/>

<user username="CN=Alfresco Repository Client, OU=Unknown, O=Alfresco Software Ltd., L=Maidenhead, ST=UK, C=GB" roles="repoclient" password="null"/>

<user username="CN=Alfresco Repository, OU=Unknown, O=Alfresco Software Ltd., L=Maidenhead, ST=UK, C=GB" roles="repository" password="null"/>

<user username="manager" roles="manager,manager-gui,manager-status" password="manager"/>

<user username="manager2" roles="manager-jmx,manager-script" password="manager"/>

</tomcat-users>
[/xml]

  • Then edit tomcat/conf/Catalina/localhost/manager.xml and change like this:

[xml]
<Context antiResourceLocking="false" privileged="true" useHttpOnly="true" override="true">

<Valve className="org.apache.catalina.authenticator.BasicAuthenticator" securePagesWithPragma="false" />

</Context>
[/xml]

  • Restart your Tomcat and thats all.

Once Alfresco is up agan, lets try to access to the manager with user “manager” and password “manager”, please avoid using this credentials in production environments.

To access html interface:

http://localhost:8080/manager/html

Screen Shot 2013-05-30 at 12.38.33 PM

To list all applications:

http://localhost:8080/manager/list

To list server information:

http://localhost:8080/manager/serverinfo

To see default session info (use / or /context):

http://localhost:8080/manager/sessions?path=/

To start, stop, and undeploy alfresco or share

http://localhost:8080/manager/start?path=/alfresco

http://localhost:8080/manager/stop?path=/alfresco

http://localhost:8080/manager/undeploy?path=/alfresco

http://localhost:8080/manager/start?path=/share

http://localhost:8080/manager/stop?path=/share

http://localhost:8080/manager/undeploy?path=/share

To see all MBeans (jmxproxy):

http://localhost:8080/manager/jmxproxy.

Screen Shot 2013-05-30 at 12.39.34 PM

Sources: http://forums.alfresco.com/forum/developer-discussions/other-apis/unable-access-tomcat-manager-03292012-1345

and http://www.ixxus.com/blog/2011/02/monitor-and-manage-alfresco-jmx