Key Data Science

Dec 15

Pentaho Kettle issues with FTPS

Pentaho is a Swiss Army knife when it comes to data transfer. I often use it to collect data from various remote locations over different protocols such as HTTP, FTP, FTPS or SFTP.

Every Friday I put on my security hat and work on at least one task to improve security at my workplace. This time I noticed that some potentially sensitive data was being downloaded over the Internet using plain FTP. I probed the server and discovered that it also supported FTPS. Switching to the more secure protocol seemed like a quick win.

On a side note, FTPS is often confused with SFTP and vice versa, even though the two protocols have nothing in common except their ability to transfer files securely. FTPS is classic FTP with SSL/TLS added on top, while SFTP is based on the SSH protocol, which is best known for providing secure access to shell accounts on remote Unix servers.

I changed the job to use FTPS and ran it in Pentaho. The job finished very quickly and didn't produce any errors. Success!?

Not quite. On closer inspection, I noticed that no data had been processed at all. It looked as if the downloaded file was empty or had not been transferred in the first place.

I connected manually over FTPS using lftp. The files were there and definitely not empty. Unfortunately, Pentaho doesn't offer any option to increase the logging verbosity of its FTP client. Luckily, the FTP server was local, and I was able to switch it to debug mode with the following settings in vsftpd.conf:

xferlog_enable=YES
xferlog_std_format=NO
log_ftp_protocol=YES

Note: log_ftp_protocol
When enabled in conjunction with xferlog_enable, and with xferlog_std_format set to NO, all FTP commands and responses are logged. This directive is useful for debugging.

The debug log revealed an additional error that points to the require_ssl_reuse option:
Sun Mar 1 11:09:02 2015 [pid 12118] [super] FTP command: Client "195.224.x.x", "LIST /data"
Sun Mar 1 11:09:02 2015 [pid 12118] [super] FTP response: Client "195.224.x.x", "150 Here comes the directory listing."
Sun Mar 1 11:09:02 2015 [pid 12117] [super] DEBUG: Client "195.224.x.x", "No SSL session reuse on data channel."
Sun Mar 1 11:09:02 2015 [pid 12118] [super] FTP response: Client "195.224.x.x", "522 SSL connection failed; session reuse required: see require_ssl_reuse option in vsftpd.conf man page"
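To find this failure quickly in a large vsftpd log, a few lines of Python are enough. The sketch below assumes the log line layout shown above (`[pid N]`, an optional `[user]` tag, the entry type, then the quoted client address and message); the function name `find_ssl_reuse_failures` is hypothetical and not part of vsftpd or Pentaho.

```python
import re
from typing import Iterable, List, Tuple

# Assumed layout, as in the excerpt above:
# ... [pid 12118] [super] FTP response: Client "195.224.x.x", "522 ..."
LOG_RE = re.compile(
    r'\[pid (?P<pid>\d+)\] (?:\[(?P<user>[^\]]*)\] )?'
    r'(?P<kind>[^:"]+): Client "(?P<client>[^"]*)", "(?P<msg>.*)"$'
)


def find_ssl_reuse_failures(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (pid, entry type, message) for lines indicating a failed SSL session reuse."""
    hits = []
    for line in lines:
        m = LOG_RE.search(line.rstrip("\r\n"))
        if not m:
            continue  # not a command/response/debug line in the assumed format
        kind, msg = m.group("kind"), m.group("msg")
        # The 522 reply is what the client receives when reuse is required but missing.
        if kind == "FTP response" and msg.startswith("522 "):
            hits.append((m.group("pid"), kind, msg))
        # The DEBUG line is vsftpd's own note about the data channel.
        elif kind == "DEBUG" and "No SSL session reuse" in msg:
            hits.append((m.group("pid"), kind, msg))
    return hits
```

Feeding it the four lines above returns the DEBUG note and the 522 response, in log order.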

Note: require_ssl_reuse
If set to yes, all SSL data connections are required to exhibit SSL session reuse (which proves that they know the same master secret as the control channel). Although this is a secure default, it may break many FTP clients, so you may want to disable it.
Default: YES
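For comparison, here is roughly what "session reuse" means on the client side, sketched with Python's standard `ftplib` rather than the Java library Pentaho uses. The data connection is wrapped in TLS while offering the control connection's session, so the server can see that both channels share the same master secret. The class name `ReusedSessionFTP_TLS` is hypothetical, the code relies on the private `ftplib` attribute `_prot_p`, and whether the session actually resumes can depend on the TLS version negotiated (TLS 1.3 session handling differs from TLS 1.2).

```python
import ftplib


class ReusedSessionFTP_TLS(ftplib.FTP_TLS):
    """Explicit FTPS client that offers the control channel's TLS session
    on each protected data connection (hypothetical class name)."""

    def ntransfercmd(self, cmd, rest=None):
        # Open the raw data connection via plain FTP, bypassing
        # FTP_TLS.ntransfercmd, which wraps it without passing a session.
        conn, size = ftplib.FTP.ntransfercmd(self, cmd, rest)
        # _prot_p is a private ftplib flag set by prot_p() (PROT P).
        if self._prot_p:
            # Resume the control channel's session on the data channel,
            # which is what vsftpd checks when require_ssl_reuse=YES.
            conn = self.context.wrap_socket(
                conn,
                server_hostname=self.host,
                session=self.sock.session,
            )
        return conn, size
```

It is used like `ftplib.FTP_TLS`: connect, call `login()`, switch the data channel to protected mode with `prot_p()`, then call `retrbinary()` or `retrlines()`. Depending on a private attribute is a shortcut that may break in future Python versions.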

After a bit more research, it turned out that the Apache Commons Net FTPS library used by Pentaho does not support SSL session reuse; in fact, there's an open Apache Commons Net (NET) Jira ticket to fix this issue.

With this option disabled, an attacker who connects and establishes the SSL data connection before the legitimate client can either steal the download or supply their own upload data. The likelihood of successful exploitation, especially in an internal environment, is low, so I decided to disable it for now:

require_ssl_reuse=NO
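Because require_ssl_reuse defaults to YES, a vsftpd.conf that never mentions it still enforces session reuse. A minimal Python check of the effective value might look like the following. It assumes one `option=value` per line, `#` comments, that the last occurrence of an option wins, and that the value is spelled YES or NO as in the man page; `require_ssl_reuse_enabled` is a hypothetical helper name.

```python
def require_ssl_reuse_enabled(conf_text: str) -> bool:
    """Return the effective require_ssl_reuse setting from vsftpd.conf text."""
    enabled = True  # vsftpd man page: "Default: YES"
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments and anything that is not option=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == "require_ssl_reuse":
            # Assumption: the last occurrence wins; only YES counts as enabled.
            enabled = value.strip().upper() == "YES"
    return enabled
```

Running it against the configuration before and after the change is a quick way to confirm the edit actually took effect.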

With the setting disabled, a quick look at the log confirmed a successful download via Pentaho:

Sun Mar 1 11:12:27 2015 [pid 12259] [super] FTP command: Client "195.224.x.x", "RETR /seetickets/to_commission_monthlyreport_201701.csv"
Sun Mar 1 11:12:27 2015 [pid 12259] [super] FTP response: Client "195.224.x.x", "150 Opening BINARY mode data connection for /data/monthlyreport_201512.csv (761332 bytes)."
Sun Mar 1 11:12:27 2015 [pid 12259] [super] OK DOWNLOAD: Client "195.224.x.x", "/data/monthlyreport_201512.csv", 761332 bytes, 840.62Kbyte/sec
Sun Mar 1 11:12:27 2015 [pid 12259] [super] FTP response: Client "195.224.x.x", "226 Transfer complete."
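Rather than eyeballing the log, the OK DOWNLOAD entries can be extracted programmatically, which makes it easy to check that each file was actually transferred and that the byte counts look plausible. A sketch, assuming the OK DOWNLOAD line format shown above; `successful_downloads` is a hypothetical helper name.

```python
import re
from typing import Iterable, List, Tuple

# Assumed format: ... OK DOWNLOAD: Client "<ip>", "<path>", <n> bytes, <rate>
OK_DOWNLOAD_RE = re.compile(
    r'OK DOWNLOAD: Client "[^"]*", "(?P<path>[^"]+)", (?P<size>\d+) bytes'
)


def successful_downloads(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (path, size in bytes) for every OK DOWNLOAD entry."""
    results = []
    for line in lines:
        m = OK_DOWNLOAD_RE.search(line)
        if m:
            results.append((m.group("path"), int(m.group("size"))))
    return results
```

An empty result after a Pentaho run, or a zero byte count, would point back to the kind of silent failure described at the start of this post.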

I hope the Apache Commons Net project fixes the issue soon so that I can revert to the default, more secure setting.
