
Using R to import and analyse SFM1 Sap Flow Meter data

R is widely used by the research community and features in many scientific publications. Recognising this, ICT International has produced a series of R code examples that researchers can use to manage the import and organisation of their SFM1 Sap Flow data in CSV format.

Whilst researchers are often familiar with importing CSV data into R, there are some peculiarities that need to be addressed. In time, these code examples will be expanded to include the option to import JSON files.

The options for accessing the data and setting it up for analysis are explained in the tabs:

  • The CATM1 tab is for access via the ICT International CATM1 server.
  • The SD Card Access tab is for downloaded CSV files stored on your own computer (or at a file location that R can access).
  • Finally, the Visualisation tab provides some examples of visualisation options within R.

To set up R, it is first necessary to download and install the required packages. The examples use common packages that many users will already be familiar with:

Packages Used

Several packages are used for the import, handling and preliminary analysis of the data. The following code installs and loads them:

#General Import
#readr for importing CSV files
install.packages("readr")
library(readr)
#dplyr: additional functionality
install.packages("dplyr")
library(dplyr)
#Lubridate to manage the date/time formatting
install.packages("lubridate")
library(lubridate)

For data visualisation:

#ggplot2: install and load ggplot2 for graphing and analytics (or package of your choice)
install.packages("ggplot2")
library(ggplot2)
#Plotly: 3d time series diagrams
install.packages("plotly")
library(plotly)
#magrittr is required by plotly and by other lines in the script where the pipe function is used
#magrittr is normally installed as a dependency of dplyr and plotly; uncomment the line below if it is missing
#install.packages("magrittr")
library(magrittr)

Optional Geolocation:

#Geolocation: Using leaflet to produce the maps
install.packages("leaflet")
library(leaflet)


With the SFM1x Sap Flow Meter, devices using CAT-M1 (SFM1C) allow data to be accessed directly from a cloud server, where the file is stored as a CSV.

Furthermore, the header file is also stored in the cloud and can be added from within the R script.

Once the SFM1x has been identified and R told where to find the file, the entire file is downloaded. The time taken will vary depending on file size and connection; it is recommended to keep a locally stored data frame in R for analysis rather than re-downloading the entire file every time the analysis script is run.

Setting up

Specify the location of the header file. This can be offline (local) or on the ICT International server. When importing the list of field names from the CSV file, ensure all names are in a single row, that there are no spaces (use _), and that the degree symbol has been removed.

Locally stored CSV header:

If using a locally stored header file, use this section:

##note: Importing the names for the column headings from the csv file requires the csv to be saved as a plain (not UTF or Mac/Windows format) csv.
SFM_Header_csv <- read.csv("SFM1x_Col_Headers.csv", header=FALSE)
##If the header file has not been tidied the following is required
#Remove degree symbol
SFM_csv <- data.frame(lapply(SFM_Header_csv, function(x) {
gsub("\xb0C", "C", x)
}))
#Remove brackets
SFM_csv <- data.frame(lapply(SFM_csv, function(x) {
gsub("\\*|\\(|\\)", "", x)
}))
#Remove /hr
SFM_csv <- data.frame(lapply(SFM_csv, function(x) {
gsub("/hr", "phr", x)
}))
#Remove double spaces
SFM_csv <- data.frame(lapply(SFM_csv, function(x) {
gsub("  ", " ", x)
}))
#Remove excess whitespace
SFM_csv <- SFM_csv %>%
mutate_if(is.character, trimws)
#Replace spaces with _
SFM_csv <- data.frame(lapply(SFM_csv, function(x) {
gsub(" ", "_", x)
}))
#Name columns
colnames(SFM_csv) <- SFM_csv[1,]
SFM_csv <- SFM_csv[-c(1),]

Cloud stored header file:

If retrieving data from the ICT International CATM1 server, use this section:
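
The header file can also be read directly from the server. The URL pattern below is an assumption based on the data URL used later and is only a placeholder; substitute the header file address supplied for your device, and apply the tidy-up steps from the previous section if the names have not already been cleaned.

#Cloud-stored header file - the URL is a placeholder; substitute the address supplied for your device
SFM_Header_csv <- read.csv("http://ictcatm1.com/serial_number_header.csv", header = FALSE)
#If the header has not already been tidied, run the clean-up steps from the locally stored header section before continuing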

#Device Names/URLs: Edit the URL here and name your instrument - if you change the name here, you will need to edit all the subsequent appearances of that name in "ICT_International_CATM1.R". If you have additional instruments you will need to add them as well.
#Device 1: the device serial number replaces "serial_number" in the link below:
SFM_01 <- "http://ictcatm1.com/serial_number.csv"
#If you have the coordinates for the sensors, then define them below in the following format (the coordinates below are for ICT International)
SFM_01_Latitude <- (-30.516904) #North/south (south prefixed by -n.n)
SFM_01_Longitude <- (151.651451) #East/west (west prefixed by -n.n)
###Importing data from the Sap Flow Meter; data is imported as a CSV direct from the server
#Read from the URL and import as data
#!! Note that the object name changes from SFM_01 (the URL) to SFM_001 (the imported data) - this allows the script to be re-run from this point, rather than from the Master_Script.
SFM_001i <- read.csv(SFM_01, header = FALSE)
SFM_001 <- SFM_001i
#Append the header file from the header_csv file:
colnames(SFM_001) <- SFM_Header_csv

The test data used here comes from an ICT International development instrument, so it contains some errors.

#Using test data, some cleaning is required; record 25351 has an erroneous time stamp, so it is removed here
SFM_001 <- SFM_001[-c(25351),]

To avoid excessive calls on the data, it is good to save a local copy within your R workspace (a sketch for this follows the next block). First, the static metadata needs to be added:

#Calling the data in from the code above is fresh each time, so the static metadata needs to be added
#Add these to the dataframes above (needs to be done each time the data is called)
SFM_001 <- cbind(SFM_001,SFM_01_Latitude)
SFM_001 <- cbind(SFM_001,SFM_01_Longitude)
#rename the columns for use later - as these are bound to the end of the dataframe, they should be as follows:
names(SFM_001)[16] = "Latitude"
names(SFM_001)[17] = "Longitude"
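
One way to keep the local copy recommended above is to save the prepared data frame with saveRDS() and reload it with readRDS() in later sessions rather than re-downloading the full file each run; the filename here is only an example.

#Save a local copy of the prepared data frame so the full file does not need to be re-downloaded each run
#(the filename is an example - choose any path within your R workspace)
saveRDS(SFM_001, "SFM_001_local.rds")
#In later sessions, reload the saved copy instead of re-running the download above
#SFM_001 <- readRDS("SFM_001_local.rds")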

Device time settings are set to UTC, but it is necessary to check this before deployment. Furthermore, the time stamps need to be organised so that they are consistent and stored in a recognised time format rather than as character strings.

#Code to clean ISO date stamps (this will allow the data to be used with Plotly and time-series tools):
#Date/Time is set to UTC/GMT
SFM_001$Date_Time <- ymd_hms(SFM_001$Date_Time)
#Confirm TimeZone:
SFM_001[1,1]
#Change Timezone to local time zone
##use Sys.timezone(location = TRUE) to identify local timezone, and then use this subsequently
#Beware of setting this if analysing data overseas from where the data was collected - it may need a manually configured timezone.
local_tz <- Sys.timezone(location = TRUE)
local_tz
SFM_001$Date_Time <- strftime(SFM_001$Date_Time, tz = local_tz)
#Confirm TimeZone Change:
SFM_001[1,1]

If you are working remotely/overseas from the sensors, then it is possible to configure the data to use the relevant local timezone:

#Use to check for Timezone name (tested with Europe/Paris - GMT+2hrs in first record)
ListTimeZone <- as.data.frame(OlsonNames())
View(ListTimeZone)
#Example - use to change your timezone if necessary
SFM_001$Date_Time <- with_tz(SFM_001$Date_Time,"Europe/Paris")
#Confirm TimeZone Change:
SFM_001[1,1]

If you want to exclude data that has been recorded before installation, then it is possible to set an installation date:

#Set the installation date for your sensors, removing Test data - use date for first recorded entry if necessary (in this case it's not)
#SFM_001 <- subset(SFM_001, SFM_001$Date_Time > SFM01_Installation_Date)
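
To use the subset above, the installation date must first be defined; a minimal sketch is shown below, with a placeholder date/time to replace with your own installation date.

#Example only: define the installation date before running the subset above (the date/time shown is a placeholder)
#SFM01_Installation_Date <- ymd_hms("2023-01-01 00:00:00", tz = local_tz)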

For those who use a comma (,) as the decimal separator, this can be set with the following:

#Optional: European format:
#SFM_001 <- format(SFM_001, decimal.mark=",")
View(SFM_001)

The CSV files downloaded from the SD card on both the SFM1 Sap Flow Meter and the SFM1x are configured to work with ICT International’s Combined Instrument Software and Sap Flow Tool. With this compatibility, they contain header data that is redundant in R.

This header uses the first 16 rows of the CSV file, and therefore needs to be cleaned before use – this step can be included in the import process.

Importing the CSV and attach header file
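
A minimal import sketch, assuming the file has been copied from the SD card into the R working directory: the filename is a placeholder, and skip = 16 reflects the 16 rows of instrument metadata noted above (adjust this so that the first row of the imported data frame holds the column names used by the tidy-up code below).

#Import the SD card CSV - the filename is an example; edit it to match your own file
#skip = 16 drops the instrument metadata rows so that the first remaining row holds the column names
SFM_csv <- read.csv("SFM1x_SD_Data.csv", header = FALSE, skip = 16)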

#Tidy up column names before naming columns
#Tidy the row of column headers (same steps as for the locally stored header)
#Remove degree symbol
SFM_csv <- data.frame(lapply(SFM_csv, function(x) {
  gsub("\xb0C", "C", x)
}))
#Remove brackets
SFM_csv <- data.frame(lapply(SFM_csv, function(x) {
  gsub("\\*|\\(|\\)", "", x)
}))
#Replace /hr with phr
SFM_csv <- data.frame(lapply(SFM_csv, function(x) {
  gsub("/hr", "phr", x)
}))
#Remove double spaces
SFM_csv <- data.frame(lapply(SFM_csv, function(x) {
  gsub("  ", " ", x)
}))
#Remove excess whitespace
SFM_csv <- SFM_csv %>%
  mutate_if(is.character, trimws)
#Replace spaces with _
SFM_csv <- data.frame(lapply(SFM_csv, function(x) {
  gsub(" ", "_", x)
}))
#Name columns
colnames(SFM_csv) <- SFM_csv[1,]
SFM_csv <- SFM_csv[-c(1),]

With the header attached, the site coordinates can be added in the same way as for the CATM1 data; define them first (a sketch follows), then bind and rename them:
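
A minimal sketch for defining the coordinates, assuming they are entered manually; the values shown simply repeat the ICT International example coordinates and should be replaced with your own site.

#Define the site coordinates for the locally stored data (placeholder values - replace with your own site)
SFM_CSV_Latitude <- (-30.516904) #North/south (south prefixed by -n.n)
SFM_CSV_Longitude <- (151.651451) #East/west (west prefixed by -n.n)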

#Add these to the dataframes above (needs to be done each time the data is called)
SFM_csv <- cbind(SFM_csv,SFM_CSV_Latitude)
SFM_csv <- cbind(SFM_csv,SFM_CSV_Longitude)
#rename the columns for use later - as these are bound to the end of the dataframe, they should be as follows:
names(SFM_csv)[28] = "Latitude"
names(SFM_csv)[29] = "Longitude"

Time and Date

As with the CATM1 Data, the timezones and timestamps need to be organised into the correct format.

Firstly, join date and time

#Join Date and Time
#Date/Time is set to UTC/GMT
SFM_csv$Date_Time <- with(SFM_csv, dmy(Date) + hms(Time))

The following code cleans the ISO date stamps (this will allow the data to be used with Plotly and time-series tools):

#Confirm Timezone:
SFM_csv[1,30]
#Change Timezone to local time zone
##use Sys.timezone(location = TRUE) to identify local timezone, and then use this subsequently
#Beware of setting this if analysing data overseas from where the data was collected - it may need a manually configured timezone.
local_tz <- Sys.timezone(location = TRUE)
local_tz
SFM_csv$Date_Time <- strftime(SFM_csv$Date_Time, tz = local_tz)
#Confirm Timezone Change:
SFM_csv[1,30]

Importantly, if working remotely/overseas from the sensors, use this code to check and change the timezone:

#Use to check for Timezone name (tested with Europe/Paris - GMT+2hrs in first record)
ListTimeZone <- as.data.frame(OlsonNames())
View(ListTimeZone)
SFM_csv$Date_Time <- with_tz(SFM_csv$Date_Time,"Europe/Paris")
#Confirm Timezone Change (check the last column, as this holds the combined date and time field)
SFM_csv[1,30]

If you have data recorded before the installation date but haven't removed it, then this code can be used to set the cutoff date:

#Set the installation date for your sensors, removing Test data - use date for first recorded entry if necessary (in this case it's not)
#SFM_csv <- subset(SFM_csv, SFM_csv$Date_Time > SFMcsv_Installation_Date)
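
As with the CATM1 data, the installation date needs to be defined before the subset can be run; the date/time below is a placeholder.

#Example only: define the installation date before running the subset above (the date/time shown is a placeholder)
#SFMcsv_Installation_Date <- ymd_hms("2023-01-01 00:00:00", tz = local_tz)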

Furthermore, if you are using a comma (,) as the decimal separator, then the following code can be used:

#Optional: European format:
#SFM_csv <- format(SFM_csv, decimal.mark=",")

From here, the visualisation code is common to both applications.

Irrespective of the method used to access the data, the visualisation options below are the same.

Data Visualisation

With common data visualisation tools, it is possible to present the data from the Sap Flow Meter for analysis.

CATM1 Data

#ggplot for a point and line graph
Uncorrected_in_SFM001 <- ggplot(SFM_001, aes(x=Date_Time, y=Uncorrected_In_cmphr)) + geom_line() + geom_point()
Uncorrected_in_SFM001
png('Uncorrected_in_SFM001.png', width=1800, height=800, units = "px")
Uncorrected_in_SFM001
dev.off()

Using Plotly, it is possible to present a 3D graph as well:

#plotly for a 3D graph
plot_ly(data = SFM_001, x=SFM_001$Date_Time, y=SFM_001$Uncorrected_In_cmphr, z=SFM_001$Uncorrected_Out_cmphr, intensity = ~Uncorrected_In_cmphr, colorscale = list(c(0,'red'), c(0.33,'orange'), c(0.66, 'yellow'), c(1, 'green')), type="mesh3d" )

Locally Stored Data (CSV from SD Card or Bluetooth)

#ggplot
Uncorrected_in_SFMcsv <- ggplot(SFM_csv, aes(x=Date_Time, y=Uncorrected_In_cmphr)) + geom_line() + geom_point()
Uncorrected_in_SFMcsv
png('Uncorrected_in_SFMcsv.png', width=1800, height=800, units = "px")
Uncorrected_in_SFMcsv
dev.off()

For 3D visualisation, the following code is used:

#plotly
plot_ly(data = SFM_csv,
             x=SFM_csv$Date_Time,
             y=SFM_csv$Uncorrected_In_cmphr,
             z=SFM_csv$Uncorrected_Out_cmphr,
             intensity = ~Uncorrected_In_cmphr,
             colorscale = list(c(0,'red'),
                               c(0.33,'orange'),
                               c(0.66, 'yellow'),
                               c(1, 'green')),
             type="mesh3d" )

Geographical Visualisation

With the coordinates added to the data in the previous stages, it is possible to add maps to display sensor location:

Separate Maps

For separate maps, each map can be built individually

#CATM1 Map
SFM_001_map <- leaflet()
SFM_001_map <- addTiles(SFM_001_map)
SFM_001_map <- addMarkers(SFM_001_map,SFM_01_Longitude, SFM_01_Latitude, popup="SFM_001 Site")
SFM_001_map
#CSV Map
SFM_csv_map <- leaflet()
SFM_csv_map <- addTiles(SFM_csv_map)
SFM_csv_map <- addMarkers(SFM_csv_map,SFM_CSV_Longitude, SFM_CSV_Latitude, popup="SFM_CSV Site")
SFM_csv_map

Combined Maps

If you want to combine the sensor locations on a single map, then the following code can be used:

SFM_map <- leaflet()
SFM_map <- addTiles(SFM_map)
SFM_map <- addMarkers(SFM_map,SFM_01_Longitude, SFM_01_Latitude, popup="SFM_001 Site")
SFM_map <- addMarkers(SFM_map,SFM_CSV_Longitude, SFM_CSV_Latitude, popup="SFM_CSV Site")
SFM_map