Release Notes: Hub 2.3.4 Adds Kaggle & PyCon Features
  • Hub v2.3.4 features

    We added support for the most common image formats in hub.ingest and hub.ingest_kaggle (so you can directly ingest popular datasets from Kaggle). Also, we introduced ds.summary() so you can easily understand your dataset layout. See what’s included in the screenshot!
    [Screenshot: dataset summary]

    Now you can return data in PyTorch Dataloaders as bytes instead of tensors, using ds.pytorch(..., tobytes=True). This lets you decompress and process your data with the libraries of your choice. We also shipped less intrusive locking for operations on different version control branches.
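
    As a rough illustration of what handling raw bytes on your side might look like, here is a minimal sketch. It does not call Hub at all: the sample layout (a dict with "images" and "labels" keys), the decode_sample helper, and the use of zlib as a stand-in for a real image codec are all assumptions made for the example.

```python
import zlib


def decode_sample(sample: dict) -> dict:
    """Decompress the raw "images" bytes and pass other fields through.

    Hypothetical helper: zlib stands in for whichever decoder you prefer.
    """
    decoded = dict(sample)
    decoded["images"] = zlib.decompress(sample["images"])
    return decoded


# Stand-in for one sample as a loader returning bytes might deliver it
# (assumed layout, not taken from the Hub documentation).
pixels = bytes(range(16))
raw_sample = {"images": zlib.compress(pixels), "labels": 3}

decoded = decode_sample(raw_sample)
print(len(decoded["images"]), decoded["labels"])
```

    In practice you would swap zlib.decompress for the decoder that matches your dataset's compression.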

    Community contributions

    New datasets were uploaded and documented by our community members Uday Uppal (KKanji) and Manas Gupta (EMNIST). We also changed the str return value to include tensor-wise information, which addresses an issue by Suhaas Neel.

    Additionally, Bikram Maharjan added support for additional image formats in hub.auto, Sai Nikhilesh Reddy contributed to ds.summary, and the README in Chinese was merged by Jinyi Chen.

    Events

    We’re presenting at PyCon next week (Apr 27 - May 1)! Stop by our booth if you’re around, or register for Davit Buniatyan’s workshop, "The Future of Handing off Data to Compute."

    Community feature

    Gradient Health, our partner company (Ouwen Huang), has published a great guide to Open-source Medical Imagery Datasets. Check it out and let us know what you think.
