
Dark Skies

TensorFlow solution for the Dark Skies Challenge

The main script that prepares the dataset in TensorFlow format is dk_build_image_data.py. Briefly, this script takes a structured directory of images and converts it to a sharded TFRecord that can be read by the Inception model.

The directory of training images is created with the following structure:

  train/astronaut/ISS013-E-88573.JPG
  train/astronaut/ISS020-E-29183.JPG
  ...
  train/aurora/ISS010-E-19304.JPG
  train/aurora/ISS010-E-33666.JPG
  ...
  train/black/ISS006-E-21563.JPG
  train/black/ISS006-E-21565.JPG
  ...
  train/city/ISS006-E-18390.JPG
  train/city/ISS006-E-21390.JPG
  ...
  train/none/ISS006-E-21548.JPG
  train/none/ISS006-E-21601.JPG
  ...
  train/stars/ISS007-E-15075.JPG
  train/stars/ISS013-E-78712.JPG
  ...
  train/unknown/ISS006-E-21633.JPG
  train/unknown/ISS006-E-22850.JPG
  ...

In the parent folder train, each unique label has its own sub-folder that holds the images belonging to that label.
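
Before converting, it can help to confirm that the directory really follows this layout. The following is a minimal sketch, not part of the repository: the helper name count_images_per_label and the assumption that images end in .jpg or .jpeg (in any case) are illustrative only.

```python
import os
from collections import Counter


def count_images_per_label(train_dir, extensions=(".jpg", ".jpeg")):
    """Return a Counter mapping each label sub-folder to its image count.

    Hypothetical helper: assumes one sub-folder per label directly under
    train_dir and that image files end in one of `extensions`.
    """
    counts = Counter()
    for label in sorted(os.listdir(train_dir)):
        label_dir = os.path.join(train_dir, label)
        if not os.path.isdir(label_dir):
            continue  # skip stray files in the parent folder
        counts[label] = sum(
            1
            for name in os.listdir(label_dir)
            if name.lower().endswith(extensions)
        )
    return counts
```

Calling count_images_per_label on the train folder returns one count per label, and sum(counts.values()) gives the total number of training images found.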

Once the data is arranged in this directory structure, we can run dk_build_image_data.py on the data to generate the sharded TFRecord dataset.

Set DK_ROOT to the folder cloned from the darkskies-challenge GitHub repo. To run dk_build_image_data.py, enter the following commands in the terminal:

# Prepare the dataset. Modify these paths for your computer.
# Here I assume that all folders and files are in $DK_ROOT.
# Location where the TFRecord data will be saved
OUTPUT_DIRECTORY=$DK_ROOT/tf_record
# location of downloaded images
TRAIN_DIR=$DK_ROOT/train

# please see below for label file
LABELS_FILE=$DK_ROOT/labels.txt 

# build the preprocessing script.
cd $DK_ROOT
bazel build -c opt inception/dk_build_image_data 

# convert the data. 
bazel-bin/inception/dk_build_image_data --train_directory="${TRAIN_DIR}" --output_directory="${OUTPUT_DIRECTORY}" --labels_file="${LABELS_FILE}" --train_shards=96 --num_threads=8

$LABELS_FILE is a text file, read by the script, that lists all of the labels. Concretely, $LABELS_FILE contains the following data:

astronaut
aurora
black
city
none
stars
unknown

Note that each row in the labels file corresponds to an entry in the final classifier of the model. That is, astronaut corresponds to classifier entry 1, aurora to entry 2, and so on. Label 0 is skipped and reserved as a background class.
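
The row-to-entry mapping described above can be sketched as follows. This is an illustration of the numbering only, not code taken from dk_build_image_data.py; the function name load_label_index is hypothetical.

```python
def load_label_index(labels_path):
    """Map each label in the labels file to its classifier entry.

    Hypothetical helper: entries start at 1 because entry 0 is
    reserved for the background class.
    """
    with open(labels_path) as f:
        # One label per row; ignore blank lines.
        labels = [line.strip() for line in f if line.strip()]
    return {label: index for index, label in enumerate(labels, start=1)}
```

With the labels file shown above, astronaut maps to 1, aurora to 2, and unknown to 7.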

Running this script produces files that look like the following:

  $OUTPUT_DIRECTORY/train-00000-of-00096
  $OUTPUT_DIRECTORY/train-00001-of-00096
  ...
  $OUTPUT_DIRECTORY/train-00095-of-00096

where 96 is the number of shards specified for the darkskies-challenge dataset. We aim to select the number of shards such that roughly 1024 images reside in each shard. Once this data set is built, you are ready to train or fine-tune an Inception model on it.
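
As a rough guide for choosing --train_shards on a dataset of a different size, the rule of thumb above can be sketched as below. The function name suggest_num_shards is hypothetical. Rounding up to a multiple of the thread count is an assumption: the upstream Inception build_image_data.py expects the shard count to be divisible by the number of threads, and dk_build_image_data.py may or may not keep that requirement.

```python
import math


def suggest_num_shards(num_images, images_per_shard=1024, num_threads=8):
    """Suggest a shard count with roughly images_per_shard images each.

    Hypothetical helper. The result is rounded up to a multiple of
    num_threads (an assumption, see the text above).
    """
    shards = max(1, math.ceil(num_images / images_per_shard))
    return math.ceil(shards / num_threads) * num_threads
```

For example, 96 * 1024 images with 8 threads yields 96 shards, matching the values used in the command above.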

Note: be sure that the value returned by num_examples_per_epoch() in dk_data.py matches the number of images you downloaded.