Server Configuration Attributes


Introduction

Server configuration attributes are intended to be used by developers and administrators for many purposes, such as configuration and monitoring. These attributes are simple name-value pairs. Some of the pairs actually affect the performance of the network; e.g., setting hashSpanFix: AllowBalance to false means that the particular server's data directories will not be balanced automatically.

Each server has its own set of name-value pairs (or attributes, as I will frequently call them throughout this document). Adding, modifying or deleting any attribute will only affect that particular server, if the change affects the server at all.

Note that some attributes can be edited (see the section on permissions). To save the attributes, though, the user must have permission to set the configuration of the specific server, which is enforced by our public/private key security model. This is handled automatically when a user signs in. Any attempt to save attributes by a user with insufficient permissions will simply fail.

Any user, however, can easily start a server or their own network, and they will automatically be able to set the configuration for their server(s). For more information, see Setting up a Server.

Skip ahead to Directions & Screenshot to see how these server attributes are accessed and modified.

Name-value pair

Our configuration allows administrators to add, edit and remove simple name-value pairs (with some exceptions: see the section on permissions). These can be arbitrary: I could easily add my name to the configuration, assuming I have write permissions for a server's configuration:

"Tagged By" » "Bryan"

This wouldn't serve much of a purpose, but it would appear in the Configuration anytime I or anyone else opens it.

However, say I added the following:

"Tranche: Server Name" » "Tranche 209"

That value is used by various separate interfaces to show a name for the server, such as our Google Maps page. Without this attribute, we would not be able to provide a friendly name, and would have to use the less friendly Tranche URL, which is tranche://141.214.182.209:443.

Consider the following:

"hashSpanFix: AllowRun" » "false"

This value actually impacts server-side performance! This effectively disables the server-side service for that particular server that grabs chunks from other servers that it thinks it should have, as well as repairing broken chunks and performing other important server-side functionality.
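
As a minimal sketch (not Tranche's actual API), a server's attributes can be pictured as a plain mapping from names to string values. The helper names is_enabled and display_name, and the choice to treat a missing boolean attribute as enabled, are assumptions made for illustration; the attribute names and values are the ones used above.

```python
# Hypothetical model of one server's configuration: plain name-value pairs.
# All values are kept as strings, as they appear in the attribute editor.
attributes = {
    "Tagged By": "Bryan",                   # arbitrary, has no effect
    "Tranche: Server Name": "Tranche 209",  # read by other interfaces
    "hashSpanFix: AllowRun": "false",       # changes server-side behavior
}


def is_enabled(attrs, name, default=True):
    """Interpret a boolean attribute; fall back to `default` if it is absent."""
    value = attrs.get(name)
    if value is None:
        return default
    return value.strip().lower() == "true"


def display_name(attrs, url):
    """Prefer the friendly server name, else fall back to the Tranche URL."""
    return attrs.get("Tranche: Server Name", url)


print(display_name(attributes, "tranche://141.214.182.209:443"))
print(is_enabled(attributes, "hashSpanFix: AllowRun"))
```

The point of the sketch is that the same store holds both kinds of pairs: whether a pair matters depends only on whether some piece of software looks it up.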

Permissions

Any user with write permissions to the network (i.e., anyone who can log in) can open up a server configuration, though only certain users (our admins) can save any changes to the configuration. Any user can edit the configuration for any server on their computer (see the screenshot — it is a very simple editor), but unless they have permissions to save the configuration, any attempt to save will fail.

If you start a server of your own, you will be able to save changes to the configuration. This is because the user who starts a server is automatically an admin. (An admin can also add other admins. See Setting up a Server.)

However, there is another layer of permissions, as individual attributes can have permissions of their own (for example, hidden). Even an admin cannot bypass these permissions, though a developer can modify permission rules between releases.

Again, for users who cannot save changes to the configuration, the most that can be done is simply read; these permissions apply primarily to admin. However, hidden applies to both admin and non-admin.

Directions & Screenshot

Screenshot of server's configuration attributes.

To view the attributes of a local server (one on your computer):

  1. Click on Preferences > Server Configuration
  2. If you haven't started a server yet, pick a port (1500 is a good choice for almost any machine) and click Start the server. It will take a few seconds to start up.
  3. Click on the Attributes tab.

To view the attributes of a remote server (one on another computer — this is how we generally administrate servers):

  1. Click on Preferences > Server Configuration
  2. Click on the External Server tab.
  3. Type in the Tranche URL (e.g., tranche://141.214.182.209:443), then click Load Configuration.
  4. Click on the Attributes tab.

You can right-click (or control-click) on attributes to edit or remove them, and you can click on Attributes Menu > Add Attribute (scroll to top of attributes tab) to add a new attribute.
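
The Tranche URL typed into the External Server tab has the form tranche://host:port. As a rough sketch, such a URL can be split into its host and port with Python's standard library; the function name parse_tranche_url is hypothetical, and this only covers the host-and-port form shown in this document.

```python
from urllib.parse import urlsplit


def parse_tranche_url(url):
    """Split a Tranche URL such as tranche://host:port into (host, port)."""
    parts = urlsplit(url)
    if parts.scheme != "tranche" or parts.hostname is None or parts.port is None:
        raise ValueError(f"not a Tranche URL with host and port: {url!r}")
    return parts.hostname, parts.port


host, port = parse_tranche_url("tranche://141.214.182.209:443")
print(host, port)
```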

Useful attributes

As we illustrated earlier, attributes play different roles.

The following table lists the attributes and explains their default values. Note that any can be changed if the user has the proper admin privileges. (If you start your own server, you have the proper admin privileges.)

Name Default value Description
actualBytesUsed: /path/to/directory Varies Displays the number of bytes currently residing in a data directory used by this server. (Each server has one or more data directories, which can be on one or more disks or partitions.) If this server is using more than one directory, there will be multiple attributes of this kind.

Example value: 107374182400 if a data directory has exactly 100GB.
currentlyConnectedUsers Varies Displays the number of internet connections the server is currently servicing. These represent concurrent users (either actual end-users or services such as backup and monitoring tools).
dataBlockUtil: KnownDataFileCount Varies The number of data chunks that are loaded into the server. These are the Tranche file system's way to save and retrieve data, and do not necessarily match the number of files that were uploaded or can be downloaded.
dataBlockUtil: KnownMetaDataFileCount Varies The number of meta data chunks that are loaded into the server. These are the Tranche file system's way to restore the original files from data chunks, as well as assemble the project's files back into a project.
dataBlockUtil: MergeQueueSize Varies The current size of the queue (line) of files that need to be merged back into the Tranche file system. This will generally be zero unless there are current or recent uploads, and the Tranche file system is cleaning up. This will only be non-zero when the file system is doing something called splitting (similar to how self-balancing trees split data when they are unbalanced).
flatFileTrancheServer: KnownProjectCount Varies The number of meta data chunks on the server that represent a project. Note that this does not reflect the number of total projects that can be downloaded from that server alone, which varies along many dimensions. This is mostly intended for internal use, but can be used to make an imperfect, rule-of-thumb observation about the percentage of the network on the server.
freeMemory Varies The number of free bytes of memory available to the server at any given time. This changes rapidly in real-time on the server. This can be useful for tracking down memory issues to alleviate swap or find conditions that cause the server to run out of memory.
hashSpanFix: AllowDelete true Setting to false disables the delete feature in the server's healing thread. A server will only delete a chunk if the server has a hash span and the chunk is not in that hash span but has enough copies on the overall network to remain available after a delete.
hashSpanFix: AllowRun true Setting to false disables the healing thread completely, meaning that the server will not repair its chunks nor look for chunks to download or delete. This is a safety switch to allow an admin to stop behavior that she thinks might cause problems on the network. This also allows the admin to stop the healing thread if they think it requires too many resources.
hashSpanFix: DataCopied Varies The number of data chunks this server has downloaded based on its hash span(s). A server without a hash span will not copy data, since the hash span is intended to mirror a slice of the network.
hashSpanFix: DataDeleted Varies The number of data chunks this server has deleted based on its hash span(s). Data is deleted from a server if there are enough copies on the network and the data does not fit in any of the zero or more hash spans belonging to the server. (A server without a hash span will eventually delete everything if there are sufficient copies elsewhere.)
hashSpanFix: DataLocalChunkThrewException Varies The number of exceptions (in this case, errors) thrown when loading the data chunks. Unless there are I/O errors, this should remain zero. In the case of a non-zero value, there may be hardware (or software) issues that need to be addressed. This is used to monitor the servers to maintain data integrity and availability.
hashSpanFix: DataNotRepaired Varies This is a complex value, and a non-zero value is not necessarily bad (though zero is generally good). Every time the healing thread finds a bad chunk, it tries one server at a time to find a copy of the chunk. If it finds a good copy, it replaces the chunk. If it finds another bad copy, it increments this value by one to signify that another chunk was not replaced. However, this is normal if multiple bad chunks exist. As long as one good chunk exists, it will eventually be replaced. (We specifically test this with unit tests and stress tests.)
hashSpanFix: DataRepaired Varies A zero or non-zero value is not necessarily bad. If a bad chunk is replaced by a good copy of the chunk from another server in the server's healing thread, this value is incremented by one. In conditions where we know there are a lot of bad chunks on the network, this helps us know which servers are repairing themselves (along with hashSpanFix: DataNotRepaired).
hashSpanFix: DataSkipped Varies The number of chunks the server finds it already has when trying to mirror a slice of the network specified by its hash span. While the server does not have a hash span (which could be indefinite), this number will not change.
hashSpanFix: GlobalIterationCount Varies The total number of times that the healing thread has performed all of its operations once, including adding new data and meta data, deleting irrelevant data and meta data, and repairing any broken data and meta data. Depending on the amount of data and meta data on a server, a single iteration may take a long while, from hours to days, or even weeks or months.
hashSpanFix: GlobalLocalRepairCompleteIterations Varies In the case that the server does not have a hash span, this is the total number of times that the healing thread has checked all of its data and meta data for bad chunks and replaced any with any good copies it might find. This process continues as long as the server is running since bad chunks can appear (albeit rarely) at any time under any circumstances that involve I/O errors.
hashSpanFix: MetaDataCopied Varies The number of meta data chunks this server has downloaded based on its hash span(s). A server without a hash span will not copy any data (meta data or data), since the hash span is intended to mirror a slice of the network.
hashSpanFix: MetaDataDeleted Varies The number of meta data chunks this server has deleted based on its hash span(s). Meta data is deleted from a server if there are enough copies on the network and the meta data does not fit in any of the zero or more hash spans belonging to the server. (A server without a hash span will eventually delete everything if there are sufficient copies elsewhere.)
hashSpanFix: MetaDataLocalChunkThrewException Varies The number of exceptions (in this case, errors) thrown when loading the meta data chunks. Unless there are I/O errors, this should remain zero. In the case of a non-zero value, there may be hardware (or software) issues that need to be addressed. This is used to monitor the servers to maintain data integrity and availability.
hashSpanFix: MetaDataNotRepaired Varies This is a complex value, and a non-zero value is not necessarily bad. Every time the healing thread finds a bad chunk, it tries one server at a time to find a copy of the chunk. If it finds a good copy, it replaces the chunk. If it finds another bad copy, it increments this value by one to signify that another chunk was not replaced. However, this is normal if multiple bad chunks exist. As long as one good chunk exists, it will eventually be replaced. (We specifically test this with unit tests and stress tests.)
hashSpanFix: MetaDataRepaired Varies A zero or non-zero value is not necessarily bad. If a bad chunk is replaced by a good copy of the chunk from another server in the server's healing thread, this value is incremented by one. In conditions where we know there are a lot of bad chunks on the network, this helps us know which servers are repairing themselves (along with hashSpanFix: MetaDataNotRepaired).
hashSpanFix: MetaDataSkipped Varies The number of chunks the server finds it already has when trying to mirror a slice of the network specified by its hash span. While the server does not have a hash span (which could be indefinite), this number will not change.
hashSpanFix: NumRepsInHashSpanRequiredForDelete 1 If a server finds a chunk (data or meta data) that does not belong (either doesn't fit in one of its hash spans or it doesn't have a hash span), it checks various conditions to determine whether the chunk should be deleted. This particular value is the number of required copies of the chunk in other servers that have a hash span that includes that chunk. (I.e., with the default value of one, at least one other server must have a hash span that holds that chunk AND that server must already have a copy of the chunk). Increasing this value results in more stringent criteria before a deletion will occur, and lessening it (not recommended) results in less stringent criteria for a deletion.
hashSpanFix: NumTotalRepsRequiredForDelete 3 If a server finds a chunk (data or meta data) that does not belong (either doesn't fit in one of its hash spans or it doesn't have a hash span), it checks various conditions to determine whether the chunk should be deleted. This particular value is the number of other copies of this chunk that must exist before the chunk is deleted, regardless of hash spans. (I.e., with the default value of three, at least three other servers must currently have a copy of the chunk before considering a deletion.) Increasing this value results in more stringent criteria before a deletion will occur, and lessening it (not recommended) results in less stringent criteria for a deletion.
serverURL Varies The Tranche URL of the server.

Example value: tranche://141.214.182.209:443, which is one of the UM Biological Chemistry servers from the Phillip Andrews Lab.
totalMemory Varies The total amount of primary memory (i.e., RAM) available to the server. If a server runs out of memory, an OutOfMemoryError will occur, and the server might crash.
hashSpanFix: GlobalServerBeingChecked Varies The remote server with which the healing thread for the local server is currently communicating for downloading data or meta data to the local server, or for some other activity. (In essence, the remote server from which the particular server is currently pulling data.) This will be Unknown if not communicating with another server, which happens under a variety of conditions.
coreServer: mirrorEveryUpload Varies If set to true, the server will mirror every upload to the Tranche core Proteomics network. (Actually, every upload is sent to the server by the client's interpretation of this value. If a client cannot contact the server, the server will not receive a copy of the chunk until (actually, unless) the healing thread downloads a copy. For this reason, it is not an actual mirror.)

hashSpanFix: GlobalLocalDeleteCompleteIterations Varies The number of complete iterations that the healing thread has performed checking the server's chunks for data and meta data to delete, based on the configuration settings. All servers try to download data in their hash span(s) as well as delete chunks not in their hash spans (under ideal conditions and contingent on the various server configuration attributes). A server without any hash spans ideally deletes all of its chunks when they have sufficient copies.
hashSpanFix: BatchSizeForDownloadChunks 50 The size of a batch of chunks to check for potential new chunks in a server's hash span. One of four values that compete for the healing thread's time: downloading, deleting, balancing and healing.
hashSpanFix: BatchSizeForDeleteChunks 50 The size of a batch of chunks to check for potential chunks to delete from a server if not in the server's hash span and if there are sufficient copies. One of four values that compete for healing thread's time: downloading, deleting, balancing and healing.
hashSpanFix: BatchSizeForHealChunks 50 The size of a batch of chunks to check for chunks to repair (e.g., the chunks were corrupted). One of four values that compete for healing thread's time: downloading, deleting, balancing and healing.
hashSpanFix: BatchSizeForBalanceDirectories 50 The number of attempts to balance the data directories. This plays the important role of maintaining a relatively proportional use of a server's various data directories. (E.g., if a new data directory is added, it will be empty, even if another directory is full. This will slowly but steadily move data to less full directories.) One of four values that compete for healing thread's time: downloading, deleting, balancing and healing.
hashSpanFix: CurrentActivity Varies The current activity handled by the healing thread. This should generally involve deleting, downloading, healing or balancing, though it might mention the fact that it is preparing for startup or taking care of housekeeping duties.
hashSpanFix: TimeSpentDeleting Varies Time (actual time and percentage) spent by the healing thread looking for chunks to delete.
hashSpanFix: TimeSpentDoingNothing Varies Time (actual time and percentage) spent by the healing thread doing ancillary duties, such as waiting for startup.
hashSpanFix: TimeSpentDownloading Varies Time (actual time and percentage) spent by healing thread downloading chunks that belong on this server.
hashSpanFix: TimeSpentHealing Varies Time (actual time and percentage) spent by healing thread looking for corrupted chunks to heal.
hashSpanFix: TimeSpentHealing Varies Time (actual time and percentage) spent by healing thread balancing the server's data directories.
hashSpanFix: PauseInMillisAfterEachOperation 100 If you want to free up resources by slowing down the healing thread, you can set a value (in milliseconds) to pause between operations. (This is a candidate option that might be auto-adjusted after we develop a server performance monitor daemon.)
remoteTrancheServer: ServerSideCallbacks Varies Human-readable message about the number of callbacks, as well as the two servers causing the majority of callbacks. Helpful for determining troublesome servers on the network. (These not only slow down server-side operations, such as the healing thread, but can slow down client-side operations, such as uploading and downloading projects.)
flatFileTrancheServer: TotalSize Varies The total human-readable size available to the server as a sum of all the limits in the DataDirectoryConfiguration objects. (I.e., sum of all available space in configured data directories.) If overflow is detected, it will be noted.
flatFileTrancheServer: TotalSizeUsed Varies The total human-readable size used by the server as a sum of all the used space in the DataDirectoryConfiguration objects. (I.e., sum of all used space in configured data directories.) If overflow is detected, it will be noted.
flatFileTrancheServer: TotalSizeAvailable Varies Human-readable space remaining on the server, found by subtracting the value of flatFileTrancheServer: TotalSizeUsed from flatFileTrancheServer: TotalSize. If overflow is detected, it will be noted.
dataBlockUtil: MergedDataBlocksTotal Varies The number of data blocks that have been merged. This is the process by which the B-tree splits a leaf to become a node, and is mostly useful for debugging purposes. Note this may not be the sum of dataBlockUtil: MergedDataBlocksSuccesses and dataBlockUtil: MergedDataBlocksFailures if a failure occurs during a specific portion of the code.
dataBlockUtil: MergedDataBlocksSuccesses Varies The total number of successfully merged data blocks. See dataBlockUtil: MergedDataBlocksTotal for more information.
dataBlockUtil: MergedDataBlocksFailures Varies The total number of failed data block merges. This could happen if the data block was corrupted, there are permission problems, or there are any other I/O problems. Tranche servers are programmed to deal with this inevitability, and a small number of these should not cause alarm. However, if there are a large number (say, over 50), you should contact the development team for an investigation. See dataBlockUtil: MergedDataBlocksTotal for more information.
corruptedDataBlock: CorruptedDataBlockCount Varies The total number of corrupted data blocks detected on the server. Corrupted data blocks may happen under normal circumstances, such as a "hard shutdown" (whenever the process is killed), but may also occur when there are I/O errors (which might be a sign of failing hardware).
corruptedDataBlock: SalvagedChunksFromCorruptedBlockCount Varies The total number of chunks recovered from all corrupted data blocks by salvaging intact data. If something is not salvaged, it might be downloaded.
corruptedDataBlock: DownloadedChunksFromCorruptedBlockCount Varies The total number of chunks recovered from all corrupted data blocks by downloading the missing data from other servers. Anything not salvaged or downloaded is considered lost. However, salvaging and downloading should hopefully prevent lost data. (However, if information is lost from the header of a data block, it cannot be retried. The healing thread should make up for this sort of loss over time.)
corruptedDataBlock: LostChunksFromCorruptedDataBlockCount Varies If chunks from corrupted data blocks are not salvaged or downloaded, they are considered lost. There should not be many (if any) lost chunks, as salvaging and downloading combined should reconstitute the majority of lost data. The healing thread should make up for any other type of loss over time.
corruptedDataBlock: CorruptedDataBlockHeaderCount Varies All corrupted data blocks must be corrupted in either the header or the body. This is the total count of data blocks corrupted in the header. This variety is more troublesome because there is no way to determine what was lost. The healing thread, however, should make up for this sort of loss over time.
corruptedDataBlock: CorruptedDataBlockBodyCount Varies All corrupted data blocks must be corrupted in either the header or the body. This is the total count of data blocks corrupted in the body.
corruptedDataBlock: AllowedToFixCorruptedDataBlock true Whether the server is allowed to fix corrupted data blocks. Because this feature deletes data block files, the switch to turn it off (or on again) is available for troubleshooting.
coreServer: isServerReadOnly false If set to true, then a server will not accept new chunks, nor will the healing thread download chunks in its hash span. This allows a server to share its existing data, even if an administrator does not want the server to accept any new data.
hashSpanFix: AllowBalance Currently undecided If true, then the data directories will be balanced over time. This allows new (or under-used) data directories to assume data from other directories that are filling up.
hashSpanFix: RequiredPercentageDifferenceToBalanceDataDirectories 15.0 The required difference between a directory with a lot of used space and a directory with less. This is determined by calculating the percentage of space used by each and subtracting the values. Note that a higher value means less balancing will occur. A value at or near zero will simply shuffle data back and forth, which is non-productive. A good balance prevents non-productive shuffling and non-shuffling.
hashSpanFix: RequiredPercentageForMostUsedDataDirectoryBeforeBalance 60.0 This is the percentage of a directory's total space that must be used before it is a candidate for shuffling. This prevents early shuffling, which may serve little purpose. Since a new server (or a server that holds data only for short periods of time, like those without hash spans) does not need shuffling before a directory reaches a certain size, there is no need to shuffle.
hashSpanFix: TotalDataBlocksMovedToBalance Varies The total number of data blocks shuffled as a result of a server's balancing. Balancing is a slow process by design, and it takes resources. There is little reason to balance quickly (even moving one data block frees an average of 50MB, which means that a new upload could proceed for a while). However, a large number will result as the server accumulates enough data. (Unless the directories are well balanced by chance, which might not be uncommon.)
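
The two balancing thresholds described above can be read as a simple test. The following is only a sketch of that test under stated assumptions, not the server's real code: the DataDirectory class and should_balance function are hypothetical, the comparison is made between the most used and least used directory, and the thresholds are treated as inclusive lower bounds.

```python
from dataclasses import dataclass


@dataclass
class DataDirectory:
    path: str
    used_bytes: int
    limit_bytes: int

    @property
    def percent_used(self):
        # Percentage of this directory's configured limit that is in use.
        return 100.0 * self.used_bytes / self.limit_bytes


def should_balance(directories, required_difference=15.0, required_most_used=60.0):
    """Return (source, target) if a move is warranted, else None.

    required_difference mirrors
    hashSpanFix: RequiredPercentageDifferenceToBalanceDataDirectories and
    required_most_used mirrors
    hashSpanFix: RequiredPercentageForMostUsedDataDirectoryBeforeBalance.
    """
    if len(directories) < 2:
        return None
    most = max(directories, key=lambda d: d.percent_used)
    least = min(directories, key=lambda d: d.percent_used)
    if most.percent_used < required_most_used:
        return None  # too early: the fullest directory is not full enough
    if most.percent_used - least.percent_used < required_difference:
        return None  # too close: moving data would only shuffle it around
    return most, least


full = DataDirectory("/data1", used_bytes=90, limit_bytes=100)
new = DataDirectory("/data2", used_bytes=0, limit_bytes=100)
print(should_balance([full, new]))
```

With the defaults, a directory at 50% next to one at 10% is left alone (the fullest is under 60%), and so is a pair at 70% and 60% (the difference is under 15 points).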

* It is not a good idea to edit or remove an attribute unless you know what the impact will be. No changes will take place until the configuration is saved. Only some users can save configuration changes, and anyone who attempts to change them without proper permissions will get a simple message telling them that they do not have permissions to change the configuration.
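
To illustrate how the delete-related attributes in the table combine, here is a small sketch of the decision. It covers only the conditions this document describes (the server checks "various conditions", so the real thread may check more); the function name and parameter names are hypothetical.

```python
def may_delete_chunk(
    in_own_hash_span,
    copies_on_other_servers,
    copies_in_matching_hash_spans,
    allow_delete=True,
    num_total_reps_required=3,
    num_reps_in_hash_span_required=1,
):
    """Decide whether the healing thread may delete a local chunk.

    copies_on_other_servers: other servers currently holding a copy.
    copies_in_matching_hash_spans: how many of those servers also have a
        hash span that includes the chunk.
    """
    if not allow_delete:   # hashSpanFix: AllowDelete
        return False
    if in_own_hash_span:   # the chunk belongs here, so keep it
        return False
    if copies_on_other_servers < num_total_reps_required:
        return False       # hashSpanFix: NumTotalRepsRequiredForDelete
    if copies_in_matching_hash_spans < num_reps_in_hash_span_required:
        return False       # hashSpanFix: NumRepsInHashSpanRequiredForDelete
    return True


print(may_delete_chunk(False, copies_on_other_servers=3, copies_in_matching_hash_spans=1))
print(may_delete_chunk(False, copies_on_other_servers=2, copies_in_matching_hash_spans=1))
```

Raising either required count makes the function return False in more situations, which is the "more stringent criteria" the table refers to.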

Notes

Four primary functions of healing thread

The Tranche project has been modified extensively to accommodate the various failures that the developers and administrators have witnessed, such as disk failures and disconnected servers.

Each server on a Tranche network has a healing thread that runs as long as a server is running. This healing thread serves four primary functions:

  1. Downloading: looks for any new chunks on the network (other servers) and downloads to itself if the chunk falls in the server's hash span.
  2. Deleting: looks at the server's own chunks, and deletes any if not in the server's hash span and there are sufficient replications on the network.
  3. Repairing: looks at the server's own chunks, and repairs if the chunk is corrupted and the server finds another copy on the network.
  4. Balancing: balances between the data directories if disparities are disproportionate.

Special case: If a server does not have a hash span, as a few smaller or older servers do not, they will only hold on to data until it is replicated sufficiently. These servers will delete (everything, if sufficient resources) and repair, but not download.
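
The repairing function, together with the hashSpanFix: DataRepaired and hashSpanFix: DataNotRepaired counters from the table, can be sketched roughly as follows. This is an illustration only: remote servers are modeled as plain dictionaries, and a SHA-256 comparison stands in for whatever integrity check Tranche actually performs.

```python
import hashlib


def repair_chunk(chunk_hash, remote_servers, is_valid, counters):
    """Try remote servers one at a time until a good copy is found.

    remote_servers: list of dicts mapping chunk hash -> bytes (hypothetical).
    is_valid: callable(chunk_hash, data) -> bool, the integrity check.
    counters: dict with "DataRepaired" and "DataNotRepaired" tallies.
    Returns the good copy, or None if no server had one.
    """
    for server in remote_servers:
        data = server.get(chunk_hash)
        if data is None:
            continue                      # this server has no copy at all
        if is_valid(chunk_hash, data):
            counters["DataRepaired"] += 1
            return data                   # caller replaces the bad local chunk
        counters["DataNotRepaired"] += 1  # another bad copy was found
    return None


def sha256_matches(chunk_hash, data):
    # Stand-in integrity check; Tranche's real hash format is not modeled here.
    return hashlib.sha256(data).hexdigest() == chunk_hash


good = b"chunk contents"
chunk_hash = hashlib.sha256(good).hexdigest()
servers = [{chunk_hash: b"corrupted"}, {}, {chunk_hash: good}]
counters = {"DataRepaired": 0, "DataNotRepaired": 0}
repaired = repair_chunk(chunk_hash, servers, sha256_matches, counters)
print(repaired == good, counters)
```

In this run the first server holds a bad copy, so the not-repaired counter goes up even though the chunk is repaired from the third server. This is why a non-zero DataNotRepaired value is not necessarily a problem.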

Note that these four activities compete for the healing thread's time (i.e., in the parlance of programmers, they all run on a single thread). These were designed so that administrators can adjust the focus. (Eventually, we will design automatons to do this work intelligently.)

This is accomplished by assigning how much work each will do in succession. Each of the four handles one batch at a time, and the size of each batch can be changed by setting a different value for the following four configuration attributes:

  1. hashSpanFix: BatchSizeForDownloadChunks (default value of 50)
  2. hashSpanFix: BatchSizeForDeleteChunks (default value of 5)
  3. hashSpanFix: BatchSizeForHealChunks (default value of 10)
  4. hashSpanFix: BatchSizeForBalanceDirectories (default value of 2)

So using the default values, every time fifty chunks are checked for new data to download to a server, 5 more chunks are checked to be deleted from a server if there are sufficient replications and the chunk doesn't belong, and 10 chunks are checked to see whether they were corrupted. Also, if there is sufficient disparity between use of data directories, up to two data blocks will be moved to mitigate the difference.
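
One round of this single-threaded rotation can be sketched as below. This is a simplified model rather than the real healing thread: the function names are hypothetical, the activities are stubs, and the defaults are the ones given in the numbered list above.

```python
# Default batch sizes as given in the list above; each can be overridden
# by setting the attribute of the same name in the server configuration.
DEFAULT_BATCH_SIZES = {
    "hashSpanFix: BatchSizeForDownloadChunks": 50,
    "hashSpanFix: BatchSizeForDeleteChunks": 5,
    "hashSpanFix: BatchSizeForHealChunks": 10,
    "hashSpanFix: BatchSizeForBalanceDirectories": 2,
}


def batch_size(attributes, name):
    """Read a batch size from the attributes, falling back to the default."""
    try:
        return int(attributes[name])
    except (KeyError, ValueError):
        return DEFAULT_BATCH_SIZES[name]


def run_one_round(attributes, activities):
    """Run each activity once, in order, on a single thread.

    activities maps a batch-size attribute name to a callable that takes the
    batch size and returns how many items it handled.
    """
    if attributes.get("hashSpanFix: AllowRun", "true").strip().lower() == "false":
        return {}  # the healing thread is switched off
    handled = {}
    for name, activity in activities.items():
        handled[name] = activity(batch_size(attributes, name))
    return handled


# Usage: stub activities that just report the batch size they were given.
activities = {name: (lambda size: size) for name in DEFAULT_BATCH_SIZES}
print(run_one_round({}, activities))
print(run_one_round({"hashSpanFix: BatchSizeForHealChunks": "40"}, activities))
```

Because the attributes are read at the start of each round, a value saved while the server is running takes effect on the next round without a restart. That timing is an assumption of this sketch.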

So why would we want to change these values while the server is running?

It is not possible to predetermine every scenario, but runtime adjustments allow the network administrators to respond to non-ideal circumstances and protect data.


