18 Why do most deep learning papers not include an implementation? 2020-05-14T14:19:56.070

9 Where can I find the original paper that introduced RNNs? 2018-09-30T18:55:45.547

6 Are there human predictions of when a computer would have been better than a human at Go? 2018-07-01T15:41:13.983

5 Where to publish a reasonable article in Deep Reinforcement Learning? 2017-11-07T09:02:44.327

5 How does weight normalization work? 2018-02-16T19:36:22.933

5 What if the more fit parent has fewer nodes compared to the other, will the disjoint and excess genes be discarded? 2018-05-07T18:09:31.727

5 Are the ideas in the paper "Governance by Glass-Box: Implementing Transparent Moral Bounds for AI Behaviour" novel? 2019-05-20T14:16:37.620

5 What is the "semantic level"? 2019-11-18T09:07:36.480

4 Name of paper for encoding/representing XY coordinates in deep learning 2019-05-01T16:29:02.923

4 What is the meaning of "stationarity of statistics" and "locality of pixel dependencies"? 2019-07-30T18:02:28.337

4 How does the network know which objects to track in the paper "Label-Free Supervision of Neural Networks with Physics and Domain Knowledge"? 2019-09-18T15:29:43.083

4 What is the meaning of the square brackets in ant colony optimization? 2019-11-01T12:59:24.073

4 Are most things generally discovered because they work empirically and later justified mathematically, or vice-versa? 2019-12-12T05:50:54.390

4 Are the labels updated during training in the algorithm presented in "An algorithm for correcting mislabeled data"? 2020-03-14T21:32:46.653

4 How does the Ornstein-Uhlenbeck process work, and how is it used in DDPG? 2020-08-21T20:00:04.873

3 What is the significance of this Stanford University "Financial Market Time Series Prediction with RNN's" paper? 2018-04-17T15:29:02.883

3 What should I do when the potential value of a state is too high? 2018-05-08T22:23:53.637

3 What is a bad local minimum in machine learning? 2019-01-16T02:44:26.203

3 How is equation 8 derived in the paper "Self-critical sequence training for image captioning"? 2019-02-21T11:22:08.810

3 Why is the max a non-expansive operator? 2019-03-14T20:50:46.340

3 How are the observations stored in the RNN that encodes the state? 2019-08-03T01:00:28.310

3 Is unsupervised disentanglement really impossible? 2019-08-12T00:38:56.740

3 What is a non-starving policy in reinforcement learning? 2019-09-11T04:10:19.787

3 Understanding proof of lemma 1 (policy improvement bound) of the "Trust Region Policy Optimization" paper 2019-11-21T22:38:18.797

3 Do all expert trajectories have the same starting state in apprenticeship learning? 2020-03-27T09:08:16.220

3 How is the state-visitation frequency computed in "Maximum Entropy Inverse Reinforcement Learning"? 2020-04-07T12:16:40.490

3 Understanding the results of "Visualizing and Understanding Convolutional Networks" 2020-04-11T07:22:50.233

3 How does publishing in the deep learning world work, with respect to journals and arXiv? 2020-05-19T12:36:40.033

3 How can I read any AI paper? 2020-06-30T13:40:24.267

2 Why do we need 10 bits to represent the 1000 classes in AlexNet? 2018-03-15T23:53:52.083

2 Why do we use $D(x \mid y)$ and not $D(x,y)$ in conditional generative adversarial networks? 2018-09-15T06:05:05.613

2 How do we stack two U-Nets to yield one final prediction? 2018-09-20T01:27:25.590

2 Can you help me understand how weight normalization works? 2018-10-03T11:45:23.277

2 How could we estimate the square footage of a room from an image? 2019-03-12T09:57:37.507

2 Why does the BERT encoder have an intermediate layer between the attention and neural network layers with a bigger output? 2019-03-14T14:15:54.450

2 Infinite horizon in Reinforcement Learning 2019-06-01T00:58:30.793

2 What is "dense" in DensePose? 2019-07-09T07:17:33.867

2 What is the difference between Squeeze-and-excite and bottleneck modules from MobileNet v2? 2019-08-16T11:30:53.543

2 How does GoogleNet actually deal with reducing overfitting? 2019-11-29T21:05:57.427

2 YOLO 9000 about Better Stronger 2019-12-25T14:50:34.587

2 Why is this Monte Carlo approach scalable for a growing number of states variables and action variables? 2020-01-20T04:27:54.457

2 Is it feasible to use GAN for high-quality image synthesis other than human faces? 2020-01-29T03:56:33.950

2 How is the gradient with respect to weights derived in batch normalization? 2020-02-27T10:44:16.340

2 Recent algorithms for correcting mislabeled data using multilayer perceptrons 2020-03-16T09:48:20.373

2 How can I get to a final output of shape $224 \times 224$, without FC layers, from a tensor of specific shape, in OpenPose? 2020-03-21T10:55:22.077

2 Why does GAN loss converge to log(2) and not -log(2)? 2020-03-27T16:07:28.180

2 Which work originally introduced gradient clipping? 2020-03-31T23:37:06.820

2 Do I have to downsample the input and upsample the output of the neural network when implementing the NICE algorithm? 2020-04-01T06:55:19.237

2 How are the classical MDP and the object-oriented MDP views different? 2020-04-28T14:42:14.507

2 How can Siamese Networks be viewed as RNNs? 2020-04-30T03:53:39.437

2 Understanding the node information score in the paper "Hierarchical Graph Pooling with Structure Learning" 2020-05-16T09:03:48.750

2 Are the final states not being updated in this $n$-step Q-Learning algorithm? 2020-06-02T14:10:10.190

2 Deriving hyperparameter updates in Online Interactive Collaborative Filtering 2020-06-13T09:42:16.577

2 Why should the baseline's prediction be near zero, according to the Integrated Gradients paper? 2020-06-18T01:18:03.113

2 What is a heatmap in the CornerNet paper? 2020-07-10T22:03:36.810

2 What is convergence analysis, and why is it needed in reinforcement learning? 2020-07-15T15:21:38.493

2 What is meant by "arranging the final features of CNN in a grid" and how to do it? 2020-07-24T09:54:10.420

2 What is the memory complexity of the memory-efficient attention in Reformer? 2020-07-29T10:27:50.280

2 What is the surrogate loss function in imitation learning, and how is it different from the true cost? 2020-08-13T09:15:28.017

1 How is the word embedding represented in the paper "Recurrent neural network based language model"? 2018-02-12T13:05:53.910

1 Why are there transition layers in DenseNet? 2018-10-15T07:46:52.850

1 IQN bellman target: using Z vs using Q 2019-04-04T08:12:22.690

1 Understanding how the loss was calculated for the SQuAD task in BERT paper 2019-04-20T01:13:46.130

1 Understanding the reconstruction loss in the paper "Anomaly Detection using Deep Learning based Image Completion" 2019-07-18T15:03:50.343

1 What is an identity recurrent neural network? 2019-08-04T20:08:42.683

1 Reference request: one-hot encoding outperforming random orthogonal encoding 2019-09-07T22:42:13.813

1 Why are both sine and cosine used in positional encoding in the transformer model? 2019-09-12T02:03:04.950

1 How is the general return-based off-policy equation derived? 2019-11-16T10:56:13.993

1 How can I get the predicted box in Faster R-CNN? 2019-12-04T06:38:46.483

1 What is "temporal depth"? 2019-12-07T14:38:52.240

1 What is a cascaded convolutional neural network? 2020-01-10T06:54:22.467

1 Scoring feature vector with Support Vector Machine 2020-01-20T16:04:38.960

1 Can GraphRNN be used with very large graphs? 2020-02-10T17:41:40.633

1 What AI conferences in Europe should I consider submitting papers to explaining the ongoing work on RefPerSys? 2020-03-01T08:59:35.127

1 Is the TD-residual defined for timesteps $t$ past the length of the episode? 2020-04-03T16:09:34.237

1 What does the equation in the "related work" section of the GAN paper mean? 2020-04-04T10:19:30.697

1 Is the paper "Reducing the Dimensionality of Data with Neural Networks" by Hinton relevant? 2020-04-17T09:22:52.857

1 Does the paper "On the difficulty of training Recurrent Neural Networks" (2013) assume, falsely, that spectral radii are $\ge$ square matrix norms? 2020-04-18T21:43:31.520

1 Why do we set offset (0.5) in single shot detector? 2020-04-28T06:40:59.453

1 Why is this variable in equation 2 of the SQAIR paper a random vector of $n$ ones followed by a zero? 2020-04-30T08:37:46.187

1 How can transition models in RL be trained adversarially? 2020-05-02T04:41:17.237

1 Which reward function works for recommendation systems using knowledge graphs? 2020-05-02T15:09:08.137

1 What do the authors of this paper mean by the bias term in this picture of a neural network implementation? 2020-05-09T19:49:04.690

1 What is a Hidden Markov Model - Artificial Neural Network (HMM-ANN)? 2020-05-17T06:55:00.937

1 What are finite horizon look-ahead policies in reinforcement learning? 2020-05-28T11:37:24.503

1 What is the main contribution of the paper Disentangling by Factorising? 2020-05-31T23:01:53.473

1 Why can't neural networks be applied to preference learning problems? 2020-06-11T19:19:05.600

1 What do the notations $\sim$ and $\Delta (A) $ mean in the paper "Fairness Through Awareness"? 2020-06-19T22:19:03.553

1 How are the coefficients of the Region of Interest being selected? 2020-06-30T21:24:44.203

1 What does it mean when a model "statistically outperforms" another? 2020-07-03T10:07:58.203

1 What is the score used to visualize attention in this paper? 2020-07-07T09:22:15.887

1 What is meant by degrees of freedom of latent variables? 2020-07-14T08:04:50.097

1 Ways to keep up with the latest developments in Machine Learning and AI? 2020-07-21T10:29:31.883

1 How can a de-noising auto-encoder act as an anomaly detection model? 2020-07-27T15:19:16.963

1 How to understand this NN architecture? 2020-08-30T13:59:54.970

0 Will AI always depend on models and thus approximations? 2020-02-13T05:10:41.740

0 How do you perform a gradient based adversarial attack on an SVM based model? 2020-03-11T02:07:53.710

0 What does "In each generation, 25% of offspring resulted from mutation without crossover" mean in the context of NEAT? 2020-04-18T17:28:41.177

0 What is the KWIK Framework? 2020-04-29T13:57:32.883