TensorFlow tf.data & Activeloop Hub. How to implement your TensorFlow data pipelines with Hub

    Data pipelines are simpler if you use Hub instead of tf.data. Learn how to load datasets, create datasets from directory or approach data augmentation and segmentation tasks effortlessly with Hub.
    Margaux Masson-Forsythe on Oct 4, 2021 · 13 min read


    The TensorFlow data API tf.data is a well-known tool for building complex Machine Learning (ML) input pipelines when training a model with TensorFlow. It is particularly useful for applying transformations to a whole dataset.

    In this tutorial, we will show how to use Hub instead of tf.data for several cases:

    1. How to load a common dataset: CIFAR10

    2. How to create a dataset from a directory: Flower Photos dataset

    3. How to conduct Data Augmentation

    4. How to work with segmentation datasets

    Before starting, we need to install and import the packages required for this tutorial:

    !pip install hub==2.0.7 # restart runtime after this
    

    Imports:

    import hub 
    import tensorflow as tf
    import pathlib
    import os
    import matplotlib.pyplot as plt
    import numpy as np
    from PIL import Image
    from tqdm import tqdm
    

    1) How to load the CIFAR10 dataset

    Let’s start with a simple task: loading the CIFAR10 dataset. The CIFAR10 dataset comprises 60000 32x32 colour images in 10 classes, with 6000 images per class. In total, there are 50000 training images and 10000 test images.

    • with tf.data:

      train, test = tf.keras.datasets.cifar10.load_data()
      
      images, labels = train
      images = images/255 # normalize
      
      dataset_cifar10_tf_data = tf.data.Dataset.from_tensor_slices((images, labels))
      

    ➡️ <TensorSliceDataset shapes: ((32, 32, 3), (1,)), types: (tf.float64, tf.uint8)>

    • with Hub:

      ds_cifar10_hub = hub.load('hub://activeloop/cifar10-train')
      
      def to_model_fit(item):
          x = item['images']/255 # normalize
          y = item['labels']
          return (x, y)
      
      ds_cifar10_hub_tf = ds_cifar10_hub.tensorflow()
      ds_cifar10_hub_tf = ds_cifar10_hub_tf.map(lambda x: to_model_fit(x))
      

    ➡️ <MapDataset shapes: ((32, 32, 3), (1,)), types: (tf.float32, tf.uint32)>

    2) How to create a dataset from a directory: Flower Photos dataset

    We are using the Flower Photos dataset from Kaggle to demonstrate how to create a Tensorflow dataset from a local directory.

    We first download the Flower Photos dataset:

    !export KAGGLE_USERNAME="xxxxx" && export KAGGLE_KEY="xxxxx" && kaggle datasets download -d batoolabbas91/flower-photos-by-the-tensorflow-team && unzip -n flower-photos-by-the-tensorflow-team.zip
    

    We take a look at what we have in this folder and gather more information:

    dataset_flowers_path = 'flower_photos'
    
    from imutils import paths
    files_list = sorted(list(paths.list_images(dataset_flowers_path)))
    classes_flowers = sorted(os.listdir(dataset_flowers_path))
    
    print(f'There are {len(classes_flowers)} classes of flowers in the dataset: {classes_flowers}')
    

    ➡️ There are 6 classes of flowers in the dataset: ['LICENSE.txt', 'daisy', 'dandelion', 'roses', 'sunflowers', 'tulips']

    This is incorrect: LICENSE.txt is a file, not a class, so we have 5 classes and not 6. Let’s fix this:

    # Removing the 'LICENSE.txt'
    classes_flowers.remove('LICENSE.txt')
    print(f'There are {len(classes_flowers)} classes of flowers in the dataset: {classes_flowers}')
    

    ➡️ There are 5 classes of flowers in the dataset: ['daisy', 'dandelion', 'roses', 'sunflowers', 'tulips']

    That’s better! Now let’s take a look at some of the images:

    for i in range(4):
      image = Image.open(files_list[i])
      image_size = image.size
      print(image_size)
      image.show()
    

    Visualize the first 4 images in the Kaggle [Flower Photos](https://www.kaggle.com/batoolabbas91/flower-photos-by-the-tensorflow-team) dataset — Image by author

    We can see that the images all have different sizes, so we need to keep this in mind for the rest of our coding.

    • with tf.data:

    We use the list files_list defined previously to create a Tensorflow dataset ds_flowers_tf_data:

    ds_flowers_tf_data = tf.data.Dataset.from_tensor_slices(files_list)
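    The code in the rest of this section relies on three shared variables, resize_size, batch_size and shuffle_common_seed, whose definitions are not shown. A minimal sketch of what they could look like: the image size and batch size are the values stated in the text, while the seed value 42 is an assumption (any fixed integer works as long as both pipelines use the same one).

```python
# Shared settings used by both the tf.data and the Hub pipelines.
resize_size = (256, 256)   # target image size, as stated in the text
batch_size = 10            # as stated in the text
shuffle_common_seed = 42   # assumed value, not given in the original
```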
    

    We then implement a function parse_image that takes the path to a file, reads the image, gets the label, and returns the normalized and resized image along with the encoded label. Since the images all have different sizes, we resize them to resize_size=(256,256):

    def parse_image(file_name):
      # read the image, decode it, resize it, and normalize it
      image = tf.io.read_file(file_name)
      image = tf.image.decode_jpeg(image, channels=3)
      image = tf.image.resize(image, resize_size) / 255.0
    
      # Find the label (name of the parent folder) and encode it as an integer
      label = tf.strings.split(file_name, os.path.sep)[-2]
      one_hot = label == classes_flowers
      encoded_label = tf.argmax(one_hot)
    
      # return the image and the integer encoded label
      return (image, encoded_label)
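    The label-encoding step of parse_image can be checked outside of TensorFlow. The sketch below is a plain-Python analogue (not the TensorFlow code itself): it takes the parent folder name as the label, builds a one-hot list against the class names, and returns the position of the match. The helper name encode_label and the file name some_image.jpg are made up for illustration.

```python
import posixpath

def encode_label(file_name, class_names):
    """Return the index of the file's parent folder name in class_names."""
    # The parent directory of the file is the class name,
    # e.g. 'flower_photos/roses/img.jpg' -> 'roses'.
    label = posixpath.basename(posixpath.dirname(file_name))
    # Same idea as `label == classes_flowers` followed by argmax:
    # build a one-hot list, then take the position of the True entry.
    one_hot = [label == name for name in class_names]
    return one_hot.index(True)

classes = ['daisy', 'dandelion', 'roses', 'sunflowers', 'tulips']
# 'some_image.jpg' is a made-up file name used only for illustration.
print(encode_label('flower_photos/roses/some_image.jpg', classes))  # 2
```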
    

    Now we can use this function to map the file paths to image/label pairs by applying it to ds_flowers_tf_data. We then batch (batch_size=10), shuffle using a common seed shuffle_common_seed, and prefetch. Because the shuffle comes after the batching, it reorders whole batches rather than individual images:

    ds_flowers_tf_data = (ds_flowers_tf_data
                          # Calling parse_image
                          .map(parse_image, num_parallel_calls=tf.data.AUTOTUNE)
                          .batch(batch_size)
                          .shuffle(len(ds_flowers_tf_data), seed=shuffle_common_seed)
                          .prefetch(tf.data.AUTOTUNE))
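    The order of batch and shuffle matters. The following plain-Python sketch is only an analogy for what the pipeline does, not how tf.data is implemented; the helper batch and the seed value 0 are illustrative. It shows that batching first and shuffling afterwards reorders whole batches, whereas shuffling first mixes individual samples across batches.

```python
import random

def batch(items, batch_size):
    """Group consecutive items into lists of at most batch_size elements."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

samples = list(range(12))

# Batch first, then shuffle: whole batches are reordered,
# but each batch keeps its original neighbours.
batches = batch(samples, 4)
rng = random.Random(0)  # fixed seed, arbitrary value
batch_then_shuffle = batches[:]
rng.shuffle(batch_then_shuffle)

# Shuffle first, then batch: individual samples are mixed across batches.
rng = random.Random(0)
shuffled = samples[:]
rng.shuffle(shuffled)
shuffle_then_batch = batch(shuffled, 4)

print(batch_then_shuffle)
print(shuffle_then_batch)
```

    The tutorial keeps the batch-then-shuffle order in both the tf.data and the Hub pipelines, which is what makes their first batches comparable.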
    

    We implement a function called visualize_img_label_in_first_batch_TF_ds that takes as inputs a batched dataset ds along with the batch_size used and displays the image, its shape and its label:

    def visualize_img_label_in_first_batch_TF_ds(ds, batch_size):
      for image, label in ds:
        for b in range(batch_size):
          print(f'Image size: {image.numpy()[b].shape}')
          print(label.numpy()[b])
          plt.imshow(image.numpy()[b])
          plt.show()
        break
    

    Now we can use this function on the batched dataset ds_flowers_tf_data we created with tf.data:

    visualize_img_label_in_first_batch_TF_ds(ds_flowers_tf_data, batch_size)
    

    First batch (image&label) in the dataset ds_flowers_tf_data— Image by author

    • with Hub: Now we want to do the exact same thing using Hub instead of tf.data. We still want to process the paths to the images, read them, get their labels, and return the normalized and resized images (resize_size=(256,256)) along with the encoded labels.

    First, we create the Hub dataset structure using the same list files_list as we did for the tf.data dataset and using the classes_flowers collected previously:

    with hub.empty('./flowers_hub') as ds_flowers_hub:
        # Create the tensors with names of your choice.
        ds_flowers_hub.create_tensor('images', htype = 'image', sample_compression = 'jpg')
        ds_flowers_hub.create_tensor('labels', htype = 'class_label', class_names = classes_flowers)
    
        # Iterate through the files and append to hub dataset
        for file in tqdm(files_list):
            label_text = os.path.basename(os.path.dirname(file))
            label_num = classes_flowers.index(label_text)
    
            # Append to images tensor using hub.read
            ds_flowers_hub.images.append(hub.read(file))  
    
            # Append to labels tensor
            ds_flowers_hub.labels.append(np.uint32(label_num))
    

    We have created a Hub dataset called ds_flowers_hub that contains two tensors: images (htype image) and labels (htype class_label).

    We resize the dataset using the Hub transformation feature:

    # Resize op
    @hub.compute
    def resize(sample_in, sample_out, new_size):    
        # Append the label and image to the output sample
        sample_out.labels.append(sample_in.labels.numpy())
        sample_out.images.append(np.array(Image.fromarray(sample_in.images.numpy()).resize(new_size)))
    
        return sample_out
    
    # name resized dataset
    path_dataset_resized = './flowers-dataset-resized-256x256'
    
    # hub.like is used to create an empty dataset with the same tensor structure
    ds_flowers_hub_resized = hub.like(path_dataset_resized, ds_flowers_hub, overwrite = True)
    
    # Resize the dataset ds_flowers_hub; the result is stored in ds_flowers_hub_resized
    resize(new_size=resize_size).eval(ds_flowers_hub, ds_flowers_hub_resized, num_workers = 2)
    

    Now we can create the Tensorflow resized, batched, shuffled and prefetched dataset from ds_flowers_hub_resized:

    def to_model_fit(item):
        x = item['images']/255 # normalize
        y = item['labels']
        return (x, y)
    
    ds_flowers_hub_tf = ds_flowers_hub_resized.tensorflow()
    
    ds_flowers_hub_tf =  (ds_flowers_hub_tf
                          # calling to_model_fit
                          .map(lambda x: to_model_fit(x))
                          .batch(batch_size)
                          .shuffle(len(ds_flowers_hub_resized), seed=shuffle_common_seed)
                          .prefetch(tf.data.AUTOTUNE))
    

    Using the same function visualize_img_label_in_first_batch_TF_ds as previously, we visualize the first batch of the dataset ds_flowers_hub_tf. It should be exactly the same as the first batch of ds_flowers_tf_data, because both pipelines use the same shuffling seed (shuffle_common_seed) and the same original dataset:

    visualize_img_label_in_first_batch_TF_ds(ds_flowers_hub_tf, batch_size)
    

    First batch (image&label) in the dataset ds_flowers_hub_tf— Image by author

    ➡️ The images in the first batch of ds_flowers_hub_tf and ds_flowers_tf_data are indeed identical. 🌸

    Finally, we can try to start a simple training with these datasets in order to check they are behaving correctly and can be used for an image classification training. First, we implement a function train_with_simple_CNN_function that defines the model to use (a very basic CNN), compiles the model and starts the training:

    def train_with_simple_CNN_function(ds):
      model = tf.keras.Sequential([
          tf.keras.layers.InputLayer(input_shape=(resize_size[0], resize_size[1], 3)),
          tf.keras.layers.Conv2D(16,3,padding='same',activation='relu'),
          tf.keras.layers.MaxPooling2D(),
          tf.keras.layers.Conv2D(32,3,padding='same',activation='relu'),
          tf.keras.layers.MaxPooling2D(),
          tf.keras.layers.Conv2D(64,3,padding='same',activation='relu'),
          tf.keras.layers.MaxPooling2D(),
          tf.keras.layers.Dropout(0.2),
          tf.keras.layers.Flatten(),
          tf.keras.layers.Dense(128,activation='relu'),
          tf.keras.layers.Dense(len(classes_flowers), activation='softmax')
      ])
    
      # Compile the model. We use the Adam optimizer, the SparseCategoricalCrossentropy loss
      # and SparseCategoricalAccuracy because our labels are integers, not one-hot vectors
      model.compile(
          optimizer='adam',
          loss=tf.keras.losses.SparseCategoricalCrossentropy(),
          metrics=[tf.keras.metrics.SparseCategoricalAccuracy()]
      )
    
      # Start training over 2 epochs and return the training history
      history = model.fit(ds, epochs = 2)
      return history
    

    Results:

    Logs: running training on 2 epochs with ds_flowers_hub_tf and ds_flowers_tf_data -- Image by author

    We see that both trainings go through the same number of iterations per epoch: 367, which makes sense because len(dataset)=3670 and batch_size=10. In this tutorial, we do not care about the metrics and only check whether each dataset is usable for training.
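    The number of iterations per epoch follows directly from the dataset size and the batch size. A small sketch of that arithmetic (the helper name steps_per_epoch is illustrative):

```python
import math

def steps_per_epoch(num_samples, batch_size):
    """Number of batches needed to see every sample once."""
    # The last batch may be smaller than the others, hence the ceiling.
    return math.ceil(num_samples / batch_size)

print(steps_per_epoch(3670, 10))  # 367
```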

    So, everything looks good! We can now work on training the best image classification model to differentiate the 5 classes of flowers we have with these datasets 🌻

    Photo by [Gérôme Bruneau](https://unsplash.com/@geromebruneau?utm_source=medium&utm_medium=referral) on [Unsplash](https://unsplash.com?utm_source=medium&utm_medium=referral)

    3) Data Augmentation 🌻🌻🌻

    Data augmentation is a common method used to reduce overfitting. Let’s see how we can implement it using tf.data and Hub:

    • tf.data:

    We use the same files_list and tf.data.Dataset.from_tensor_slices but this time, we add another mapping that will use the function augment_using_ops to augment our dataset:

    def augment_using_ops(images, labels):
     images = tf.image.random_flip_left_right(images)
     images = tf.image.random_flip_up_down(images)
     images = tf.image.rot90(images)
     return (images, labels)
    
    ds_flowers_tf_data = tf.data.Dataset.from_tensor_slices(files_list)
    
    # We parse, shuffle, cache, batch, augment and prefetch
    ds_directory_tf_data_data_aug = (ds_flowers_tf_data
                      .map(parse_image, num_parallel_calls=tf.data.AUTOTUNE)
                      .shuffle(len(ds_flowers_tf_data), seed=shuffle_common_seed)
                      .cache()
                      .batch(batch_size)
                      .map(augment_using_ops, num_parallel_calls=tf.data.AUTOTUNE)
                      .prefetch(tf.data.AUTOTUNE)
    )
    

    Then we want to see if the augmentation was successful (we should have flipped and rotated images now):

    visualize_img_label_in_first_batch_TF_ds(ds_directory_tf_data_data_aug, batch_size)
    

    First batch of the augmented tf dataset ds_directory_tf_data_data_aug— Image by author

    The data augmentation worked fine!

    • Hub:

    We do the exact same augmentation, but this time we use Hub to create the TF dataset. We re-use the Hub dataset ds_flowers_hub_resized constructed previously. We create the augmented TF dataset ds_flowers_hub_data_aug by replacing the function to_model_fit that was previously used in the mapping with the function normalize_and_augment, which normalizes and augments our dataset:

    def normalize_and_augment(item):
        x = item['images']/255 # normalize
        x = tf.image.random_flip_left_right(x)
        x = tf.image.random_flip_up_down(x)
        x = tf.image.rot90(x)
        y = item['labels']
        return (x, y)
    
    ds_flowers_hub_tf = ds_flowers_hub_resized.tensorflow()
    
    # We shuffle, cache, batch, augment and prefetch
    ds_flowers_hub_data_aug = (ds_flowers_hub_tf
                      .shuffle(len(ds_flowers_tf_data), seed=shuffle_common_seed)
                      .cache()
                      .batch(batch_size)
                      .map(normalize_and_augment, num_parallel_calls=tf.data.AUTOTUNE)
                      .prefetch(tf.data.AUTOTUNE)
    )
    

    We check the images in the first batch:

    First batch of the augmented tf dataset ds_flowers_hub_data_aug — Image by author

    We see that we have the same images in the first batch as in ds_directory_tf_data_data_aug (because we use the same shuffle_common_seed). However, the augmentations differ: the images are not flipped in the same directions in ds_flowers_hub_data_aug as in ds_directory_tf_data_data_aug, because the flips are random. The rotation, in contrast, is not random: tf.image.rot90 rotates by 90 degrees counter-clockwise by default.
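    To make the three augmentation steps concrete, here is a plain-Python sketch on a tiny 2x2 "image" stored as nested lists. It mirrors the logic of the augmentation functions (two flips, each applied with probability 0.5, followed by a fixed 90-degree counter-clockwise rotation) but it is an illustration, not the TensorFlow implementation; all helper names are made up.

```python
import random

def flip_left_right(img):
    """Mirror each row."""
    return [row[::-1] for row in img]

def flip_up_down(img):
    """Reverse the order of the rows."""
    return img[::-1]

def rot90(img):
    """Rotate counter-clockwise by 90 degrees (transpose, then reverse the rows)."""
    return [list(col) for col in zip(*img)][::-1]

def augment(img, rng):
    # Each flip is applied with probability 0.5; the rotation is always applied.
    if rng.random() < 0.5:
        img = flip_left_right(img)
    if rng.random() < 0.5:
        img = flip_up_down(img)
    return rot90(img)

img = [[1, 2],
       [3, 4]]
print(rot90(img))                      # [[2, 4], [1, 3]]
print(augment(img, random.Random()))   # varies from run to run
```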

    4) Segmentation Dataset: Image + Mask

    All of the previous examples used image classification datasets, but many Computer Vision projects focus on segmentation rather than classification. If you are just starting out: image segmentation is a pixel-wise classification method, in which every pixel of the image is assigned a class.

    For this example, we will use the Kaggle dataset: Accurate damaged flower shapes/segmentation:

    !export KAGGLE_USERNAME="xxxxx" && export KAGGLE_KEY="xxxxxx" && kaggle datasets download -d metavision/accurate-damaged-flower-shapessegmentation && unzip -n accurate-damaged-flower-shapessegmentation.zip
    

    Looking in the dataset’s structure — Image by author

    When looking in the dataset, we see that the images of the flowers are under the subfolder called “720p” and the corresponding masks are under the subfolder “mask”. We collect all the paths to the images in the list files_list_flowers_images and all the paths to the masks in the list files_list_flowers_masks. In total, we have 2544 pairs of image/mask.
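    The code that collects the two lists is not shown here. A minimal sketch of one way to do it, assuming that images are .jpg files inside folders named "720p", that masks are .png files inside folders named "mask", and that sorting both lists puts each image and its mask at the same index. The function name collect_image_mask_paths and the root path are hypothetical.

```python
import os

def collect_image_mask_paths(root, image_dir='720p', mask_dir='mask'):
    """Return sorted lists of image paths and mask paths found under root."""
    images, masks = [], []
    for folder, _, files in os.walk(root):
        parent = os.path.basename(folder)
        for name in files:
            path = os.path.join(folder, name)
            if parent == image_dir and name.lower().endswith(('.jpg', '.jpeg')):
                images.append(path)
            elif parent == mask_dir and name.lower().endswith('.png'):
                masks.append(path)
    return sorted(images), sorted(masks)

# 'path/to/dataset' is a placeholder for wherever the archive was unzipped.
files_list_flowers_images, files_list_flowers_masks = collect_image_mask_paths('path/to/dataset')
print(len(files_list_flowers_images), len(files_list_flowers_masks))
```

    Before going further, it is worth checking that both lists have the same length and that the file names at a given index correspond.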

    As usual, we want to take a look at these images:

    for i in range(4):
      img = Image.open(files_list_flowers_images[i])
      print(img.size)
      img.show()
    

    First 4 images in the dataset — Image by author

    for i in range(4):
      mask = Image.open(files_list_flowers_masks[i]).convert('L')
      print(mask.size)
      mask.show()
      print(np.unique(mask))
    

    First 4 masks in the dataset — Image by author

    For the masks, we also display the np.unique(mask) because we want to know if the values are binary or not. Ideally, they should be: for example 0 for background and 1 for flower since we are planning on doing binary segmentation. However, we see here, that we have np.unique(mask)=[ 0 20 74 77 78] for the first mask for example. So, we need to keep in mind that we will have to do something about this when creating the dataset.
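    The fix used later in this section is a simple threshold: every non-zero pixel becomes 1 and zeros stay 0. A plain-Python sketch of that idea on a tiny made-up mask that uses the values reported by np.unique above (the helper name binarize is illustrative):

```python
def binarize(mask):
    """Map every non-zero pixel to 1 and keep zeros as 0."""
    return [[1 if value > 0 else 0 for value in row] for row in mask]

# Tiny made-up mask using the values reported by np.unique above.
mask = [
    [0, 20, 74],
    [77, 78, 0],
]
binary = binarize(mask)
print(binary)                                      # [[0, 1, 1], [1, 1, 0]]
print(sorted({v for row in binary for v in row}))  # [0, 1]
```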

    This time we want to resize the images to 512x512 because the originals are too big to be used efficiently in a model on Google Colab (we would need far more resources). So we redefine some of the common variables:

    resize_size = (512, 512)
    batch_size = 4
    shuffle_common_seed = 21
    

    Okay, let’s start!

    • tf.data:

    We use the same tf.data.Dataset.from_tensor_slices approach as before to create the TF dataset with tf.data. But this time, we pass two file lists instead of one: files_list_flowers_images and files_list_flowers_masks:

    ds_flowers_tf_data_seg = tf.data.Dataset.from_tensor_slices((files_list_flowers_images, files_list_flowers_masks))
    
    ds_flowers_tf_data_seg = (ds_flowers_tf_data_seg
                            # Calling parse_image_mask
                            .map(parse_image_mask, num_parallel_calls=tf.data.AUTOTUNE)
                            .batch(batch_size)
                            .shuffle(len(ds_flowers_tf_data_seg), seed=shuffle_common_seed)
                            .prefetch(tf.data.AUTOTUNE))
    ds_flowers_tf_data_seg
    

    The mapping function must now read both the image and the mask, so we implement a new function parse_image_mask. It has to be defined before the pipeline above is built:

    def parse_image_mask(image_name, mask_name):
      # read the image, decode it, resize it, and normalize it
      image = tf.io.read_file(image_name)
      image = tf.image.decode_jpeg(image, channels=3)
      image = tf.image.resize(image, resize_size) / 255.0
    
      # read the mask, decode it
      mask = tf.io.read_file(mask_name)
      mask = tf.image.decode_jpeg(mask, channels=1)
    
      # Need to have binary values: 0 or 1
      mask = tf.cast(mask > 0, tf.int32)
    
      # Resize
      mask = tf.image.resize(mask, resize_size, method=tf.image.ResizeMethod.NEAREST_NEIGHBOR)
    
      # return the image and its mask
      return (image, mask)
    

    In this new mapping function, we read the image, then resize and normalize it. Then we read the mask, threshold it so that it only contains binary values (0 or 1) for training the model, and finally resize it.

    NB: we use method=tf.image.ResizeMethod.NEAREST_NEIGHBOR when resizing the masks so that the resize does not introduce values that were not initially in the masks (for example, a value other than 0 or 1).
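    The difference between the two kinds of resizing can be seen on a single row of a binary mask. The sketch below is a simplified 1-D illustration in plain Python, not the algorithm TensorFlow uses internally, and both helper names are made up: nearest-neighbour resizing only copies existing pixels, while linear interpolation blends neighbours and produces intermediate values.

```python
def resize_nearest(row, new_len):
    """Resize a 1-D row by copying the nearest source pixel."""
    scale = len(row) / new_len
    return [row[min(int((i + 0.5) * scale), len(row) - 1)] for i in range(new_len)]

def resize_linear(row, new_len):
    """Resize a 1-D row by linearly interpolating between neighbours."""
    out = []
    for i in range(new_len):
        # Position of the output pixel in source coordinates.
        pos = i * (len(row) - 1) / (new_len - 1)
        left = int(pos)
        right = min(left + 1, len(row) - 1)
        frac = pos - left
        out.append(row[left] * (1 - frac) + row[right] * frac)
    return out

row = [0, 0, 1, 1]             # one row of a binary mask
print(resize_nearest(row, 8))  # only 0s and 1s
print(resize_linear(row, 8))   # contains values strictly between 0 and 1
```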

    We implement a function show_img_mask_in_first_batch to visualize the images and masks in the first batch of a dataset ds:

    def show_img_mask_in_first_batch(ds, batch_size):
      for image, mask in ds:
        # first batch
        for b in range(batch_size):
          print(image.numpy()[b].shape)
          plt.imshow(image.numpy()[b])
          plt.show()
          print(mask.numpy()[b].shape)
          plt.imshow(mask.numpy()[b][:,:,0])
          plt.show()
          print(np.unique(mask.numpy()[b][:,:,0])) # we want only 0 and 1
        break
    

    And use it with our new segmentation dataset ds_flowers_tf_data_seg:

    show_img_mask_in_first_batch(ds_flowers_tf_data_seg, batch_size)
    

    Pairs image/mask in first batch of ds_flowers_tf_data_seg -- Image by author

    Here we see that we do have only 0 and 1 as values in the mask image. Both the images and masks are correctly resized.

    • Hub:

    First, we create the dataset ds_flowers_hub_seg locally at the path ./flowers_seg_hub, we populate it using the list files_list_flowers_images:

    with hub.empty('./flowers_seg_hub') as ds_flowers_hub_seg:
        # Create the tensors with names of your choice.
        ds_flowers_hub_seg.create_tensor('images', htype = 'image', sample_compression = 'jpg')
        ds_flowers_hub_seg.create_tensor('masks', htype = 'image', sample_compression = 'png')
    
        # Iterate through the files and append to hub dataset
        for file in tqdm(files_list_flowers_images):
            # Append to images tensor using hub.read
            ds_flowers_hub_seg.images.append(hub.read(file))
    
            path_to_mask = file.replace('image', 'mask').replace('720p','mask').replace('jpg','png')
            # Append to masks tensor using Pillow Image
            ds_flowers_hub_seg.masks.append(np.array(Image.open(path_to_mask)))
    

    NB: the paths to the masks are the same as the paths to the images if we replace the strings “image” with “mask”, “720p” with “mask” and “jpg” with “png”. This is what we do when defining the variable path_to_mask.
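    The path conversion can be tried on its own. In the sketch below, the function name image_path_to_mask_path and the example path are hypothetical; the real folder and file names in the dataset may differ. Note that str.replace swaps every occurrence of a substring, so this approach assumes that "image", "720p" and "jpg" appear nowhere else in the path.

```python
def image_path_to_mask_path(image_path):
    """Derive the mask path from an image path by chained string replacement."""
    # Same three replacements as in the tutorial code. str.replace swaps
    # every occurrence, so a parent folder containing 'image' or 'jpg'
    # in its name would be rewritten as well.
    return (image_path
            .replace('image', 'mask')
            .replace('720p', 'mask')
            .replace('jpg', 'png'))

# Hypothetical path; the real folder and file names may differ.
example = 'dataset/720p/image_0001.jpg'
print(image_path_to_mask_path(example))  # dataset/mask/mask_0001.png
```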

    Now that we have our dataset ds_flowers_hub_seg, we can create the TF dataset:

    ds_flowers_hub_seg_tf = ds_flowers_hub_seg.tensorflow()
    
    ds_flowers_hub_seg_tf =  (ds_flowers_hub_seg_tf
                            # calling to_model_fit
                            .map(lambda x: to_model_fit(x))
                            .batch(batch_size)
                            .shuffle(len(ds_flowers_hub_seg), seed=shuffle_common_seed)
                            .prefetch(tf.data.AUTOTUNE))
    

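    This time, to_model_fit is redefined as the following mapping function. It must be defined before the pipeline above is built, otherwise the earlier classification version of to_model_fit would be used: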

    def to_model_fit(item):
        x = tf.image.resize(item['images'], resize_size)/255
        y = item['masks']
    
        # 3 channels to 1 channel
        y = tf.image.rgb_to_grayscale(y)
    
        # Need to have binary values: 0 or 1
        y = tf.cast(y > 0, tf.int32)
    
        # Resize
        y = tf.image.resize(y, resize_size, method=tf.image.ResizeMethod.NEAREST_NEIGHBOR)
    
        return (x, y)
    

    This function resizes and normalizes the images, converts the RGB masks to grayscale (3 channels to 1 channel), performs the same thresholding as we did for ds_flowers_tf_data_seg, and finally resizes the masks.

    Let’s take a look at the first batch:

    show_img_mask_in_first_batch(ds_flowers_hub_seg_tf, batch_size)
    

    Pairs image/mask in first batch of ds_flowers_hub_seg_tf -- Image by author

    This looks good! Now, we want to check that these datasets are both usable to train a segmentation model.

    We use the Unet model architecture from the article Binary Semantic Segmentation: Cloud detection with U-net and Activeloop Hub:

    model = unet(input_shape = (512,512,3))
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
                  loss='binary_crossentropy',
                  metrics=['accuracy', tf.keras.metrics.Recall(name="recall"), 
                           tf.keras.metrics.Precision(name="precision"), 
                           tf.keras.metrics.MeanIoU(num_classes=2, name='iou')])
    

    and then train with the two datasets we just created:

    Testing if we can use the datasets to train Unet — Image by author

    Both datasets were usable to train Unet for 1 epoch 🌼

    The Notebook for this tutorial is available here.
